The Definition of Vibe Coding
First, let's define vibe coding. My definition here is as follows:
No direct editing of the code itself.
However, the operations I still need to perform include:
1. Constructing the project's directory structure.
2. Manually writing and reviewing design documents.
3. Setting up the project's runtime environment using conda/pip and installing necessary libraries.
4. Running pytest.
5. Copying and pasting code into designated files.
6. Using git to commit code, and reverting changes if necessary.
7. Manually reviewing code review comments.
Someone with no programming experience at all might not know how to perform operations 3-7, so this is not something achievable with "zero background and zero experience." However, operations 3-6 only require a short period of learning (a typical command-line session is sketched right below), and the manual review in operation 7 primarily demands reading comprehension.
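For reference, operations 3-6 amount to a handful of command-line incantations. A sketch of a typical session follows; the environment name, package list, and commit message are placeholders, not prescriptions:

    conda create -n vibe_project python=3.11    # one-off: create an isolated environment
    conda activate vibe_project
    pip install pytest                          # plus whatever libraries the design documents call for
    pytest                                      # run the test suite
    git init                                    # one-off: start version control at the project root
    git add .
    git commit -m "Add DP1 module and its tests"
    git revert HEAD                             # undo the last commit if a change made things worse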
Additionally, because it involves writing complex design documents, it is also not the "one-liner programming" often seen in various videos and articles.
This is a serious, earnest vibe-coding practice.
AI Tool Selection
Recommended:
- Use two AIs from different sources simultaneously, such as ChatGPT (I use the o3 model) and Gemini 2.5 Pro. Always use their latest and most powerful versions.
- Long context support is necessary; the longer, the better.
If you don't have a paid ChatGPT account, using only Gemini 2.5 Pro on Google AI Studio is also very good. However, cross-referencing with AIs from different sources, especially for checking, provides more confidence.
Personally, I'm not keen on Claude: its context window is too short. It's fine for code completion or for writing one or two short functions, but the operations below work at the module or file level, for which Gemini is very well suited.
As for others:
- Is an AI-integrated editor or plugin like Cursor needed? No.
- Is a locally run AI needed? No.
Methodological Models
Since AIs are trained on human behavior, methods that enhance human capabilities are likely to improve AI capabilities at this stage. Management methods can be broadly categorized into two types: those focusing on tasks rather than people, and those focusing on people rather than tasks. The former tends to suppress human initiative, dehumanize, and alienate... like an assembly line. The latter provides emotional support, handles benefit distribution, and allocates responsibilities and rights.
We need the former.
We primarily use two methodological models: Axiomatic Design and Test-Driven Development (TDD). Especially when dealing with medium-to-large scale problems, Axiomatic Design and TDD are excellent models that allow for steady, step-by-step progress without excessive rework. These two models are very simple:
- Axiomatic Design
- Independence Axiom: Modules should be decoupled and not entangled.
- Information Axiom: Do not assume users/inputs will always follow the rules.
- TDD
- A program must be tested; only when its tests pass is it a good program.
For more detailed information, feel free to consult an AI.
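To make the Independence Axiom (and the "lower triangular" criterion that comes up in the design step below) concrete, here is a small illustrative sketch in Python. The FR/DP names are invented for illustration, and the check is just ordinary matrix inspection, not part of any required tooling:

    # Hypothetical design matrix for a small tool: rows are functional requirements (FRs),
    # columns are design parameters (DPs, i.e., modules); 1 means "this DP affects this FR".
    FRS = ["FR1 read the input file", "FR2 clean the data", "FR3 write the report"]
    DPS = ["DP1 reader module", "DP2 cleaner module", "DP3 reporter module"]

    DESIGN_MATRIX = [
        [1, 0, 0],  # FR1 depends only on DP1
        [1, 1, 0],  # FR2 depends on DP1's output and on DP2
        [0, 1, 1],  # FR3 depends on DP2's output and on DP3
    ]

    def is_lower_triangular(matrix: list[list[int]]) -> bool:
        """A decoupled (acceptable) design has no nonzero entries above the diagonal."""
        return all(
            matrix[i][j] == 0
            for i in range(len(matrix))
            for j in range(i + 1, len(matrix[i]))
        )

    if __name__ == "__main__":
        print("Decoupled design:", is_lower_triangular(DESIGN_MATRIX))  # prints True for this matrix

A purely diagonal matrix (each FR served by exactly one DP) is ideal; entries above the diagonal mean coupled modules, which is exactly what the redesign step in the flow below is meant to weed out.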
Main Operational Flow
- Discuss requirements with AIs (voice chat can be used), and then generate a requirements report, followed by manual revision.
- Having multiple AIs discuss at this stage is more effective.
- Some programs need to meet specific document format requirements, such as outputting binary files or JSON/YAML in a particular format. These must be repeatedly checked, and this task can be given to multiple AIs from different sources for cross-verification.
- AI understanding often has deviations, requiring item-by-item approval or revision.
- Have the AI write an Axiomatic Design document based on the requirements report.
- If the FR-DP matrix does not show a lower triangular form (you can ask an AI what this means), it indicates insufficient decoupling, and redesign is needed. Manual intervention may be necessary.
- This document is the most important. It's recommended to use ChatGPT o3 with Deep Research to write it.
- Handing it over to multiple AIs for cross-discussion can further improve the results.
- You can have the AI estimate the complexity or difficulty of each module. Those that are too difficult should be further broken down. 7-10 modules might be a good number.
- Give the finalized Axiomatic Design document to the AI to write a Function Architecture document. This step generally doesn't require manual review, but cross-discussion among AIs yields better results.
- Give the finalized Function Architecture document to the AI to write a Test Requirements document. This step generally doesn't require manual review, but cross-discussion among AIs yields better results.
- Based on the Axiomatic Design document, Function Architecture document, and Test Requirements document, write module and corresponding test pairs one by one. Repeat until tests pass completely, then move to the next pair.
- This will be discussed in more detail later.
The main part of this entire operation is designing the architecture. The level of abstraction is elevated from writing code to decomposing the overall process. It's more like being a product manager than a programmer. At the same time, by detaching from specific code writing, it means you can use languages or architectures you're unfamiliar with. For example, I've always disliked writing GUI interfaces; I find various "callbacks" difficult to understand, and I've never learned how to write JavaScript and its various derivatives. But under this workflow, whether the code is written in Python or JS isn't important. You could even write it in Python first and then convert it to equivalent JS, as long as corresponding libraries exist.
This also explains why the "one-liner programming" showcased in many videos and WeChat public-account articles is feasible: such programs are usually simple, like Tetris, a bouncing-ball animation, or at most a note-taking app. Their modules have been analyzed many times, are already well decomposed, and often have plenty of open-source examples. But if you're writing something like an optical-simulation-to-lens-design pipeline, or an integration from lens design to lathe machining, you can't get an AI to write the program with a single line.
Code Writing Part
The following uses Google AI Studio as an example, with Python as the language and pytest as the test runner. Assume you have configured your local runtime environment and have already written a pytest.ini that exports test results to an XML file, for example src\tests\unit_test_result.xml. (You can ask Gemini for the specifics.)
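For the impatient, a minimal pytest.ini along the following lines does the job. The paths simply mirror the layout used in this article; forward slashes work fine on Windows, and the pythonpath option requires pytest 7 or newer:

    [pytest]
    # Look for tests under src/tests, make src importable, and write a JUnit-style XML report.
    testpaths = src/tests
    pythonpath = src
    addopts = --junitxml=src/tests/unit_test_result.xml

With this in place, running pytest from the project root is all the "testing step" later in this article amounts to; the XML report is the file you will keep dragging into the AI Studio chat.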
Preparatory Work
- System prompt:

  You are proficient in Axiomatic Design principles and functional programming.
  You answer questions in Chinese.
  You always provide complete and elegant solutions.
  Current project directory structure:
  project_root
  ├───docs
  └───src
      ├───package_name
      └───tests

  You can ask an AI for this directory structure; it will give a good suggestion. To generate such a character-based directory tree, you can use the tree command in the PowerShell command line.
- Upload various documents:

  Upload the previously written Axiomatic Design document, Function Architecture document, and Test Requirements document to AI Studio. Using Alt+Enter allows you to upload without executing.
  If you have already written multiple modules and a newly written module needs to use previous content, then also upload the already written module programs to Gemini. However, the total token usage should preferably not exceed 80k. (There's a trick here, which will be explained later.)
Writing Module/Test Pairs
Prompt:
Based on the documents above, write the program for the DP1 module and its corresponding pytest tests.
Just say it that simply. Then wait for Gemini to write the complete code. In your editor, at the designated locations, create new files like src\package_name\DP1.py and src\tests\test_DP1.py, and paste the code in. Commit to git.
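For a sense of what comes back, a module/test pair for a hypothetical DP1 ("read and validate an input table", say) might look roughly like the sketch below. The file contents and function names are invented for illustration, not something Gemini is guaranteed to produce, and the import in the test assumes src is importable (e.g., via the pythonpath = src line in the pytest.ini sketched earlier).

src\package_name\DP1.py:

    # DP1: load a CSV table and reject files whose header violates the expected schema.
    import csv
    from pathlib import Path

    REQUIRED_COLUMNS = ("name", "value")

    def load_table(path: str) -> list[dict]:
        """Read a CSV file and return its rows; raise ValueError on a bad header."""
        with Path(path).open(encoding="utf-8", newline="") as f:
            rows = list(csv.DictReader(f))
        if rows and not all(col in rows[0] for col in REQUIRED_COLUMNS):
            raise ValueError(f"missing required columns: {REQUIRED_COLUMNS}")
        return rows

src\tests\test_DP1.py:

    import pytest
    from package_name.DP1 import load_table

    def test_load_table_reads_valid_file(tmp_path):
        f = tmp_path / "ok.csv"
        f.write_text("name,value\nfoo,1\n", encoding="utf-8")
        assert load_table(str(f)) == [{"name": "foo", "value": "1"}]

    def test_load_table_rejects_bad_header(tmp_path):
        f = tmp_path / "bad.csv"
        f.write_text("wrong,columns\nx,y\n", encoding="utf-8")
        with pytest.raises(ValueError):
            load_table(str(f))

Real modules will of course be larger; the point is the pairing: every DPn.py arrives together with its test_DPn.py.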
Testing/Debugging
This is the main event. If all you need is for the program to run at all, the module code written above might be enough. But when there are many modules, and you need every module to work, the large program assembled from them to work well, and future feature extensions to keep working, the odds of getting all of that right on the first try are extremely low. Therefore, every module you write must be tested as you go, to ensure each one passes its tests.
In Python, testing simply involves running pytest. If you have already configured pytest.ini, pytest will save the test results in the specified XML file. If all tests pass, then proceed to the next module.
If tests don't pass, follow these steps:
- Open a new AI Studio page and copy the system prompt and reference documents (Axiomatic Design document, Function Architecture document, Test Requirements document) from above.
- Write the prompt: "Based on the documents above, analyze the pytest results, modify the following program, and output the complete program code."
- Advanced prompt: "Based on the documents above, first perform a code review on the following program, then analyze the pytest results, modify the program, and output the complete program code."
- After writing the prompt, use Alt+Enter.
- Using File Explorer, drag the module program file, its corresponding test program file, and the unit test result file (e.g., src\package_name\DP1.py, src\tests\test_DP1.py, src\tests\unit_test_result.xml) into the AI Studio chat box.
- Then run with Ctrl+Enter.
After getting the results, copy and paste them into the corresponding files, and run pytest again. If there are still errors:
Option A:
- Drag the new unit test result file into the chat box, Ctrl+Enter.
Option B:
- At the location where you just wrote the prompt, click the three dots (...) in the upper right of the text box, then select 'Branch from here'. You will get a state that retains the reference files and prompt but clears other files and outputs.
- Using File Explorer, drag the newly modified module program file, its corresponding test program file, and the unit test result file (e.g., src\package_name\DP1.py, src\tests\test_DP1.py, src\tests\unit_test_result.xml) into the AI Studio chat box.
- Then run with Ctrl+Enter.
When to use Option A / Option B:
- If there are very few errors, e.g., 1-2, or if this is the first round of debugging, use Option A.
- If Gemini is already struggling and its 'intelligence' is declining, use Option B. Here are indicators of Gemini's declining performance:
- The token usage shown in the sidebar exceeds 100k.
- pytest reports only one error, but it's an import error or some other trivial type error (yes, you still need a little programming knowledge to recognize this).
- You asked for output in Chinese, but the reply contains not only a lot of English but also other languages such as German, Hindi, Russian, or Korean.
- These all indicate that Gemini is overloaded and the context needs to be cleared. Roughly, 20k tokens of context are like a one-hour human meeting; by 100k it's like a five-hour meeting, and a normal human would be mentally fatigued by then.
If you repeatedly fail tests N times (e.g., N=5 or 7), it means this module is too large. Go back to the Axiomatic Design stage, tell the AI that this module is too large and ask for help splitting it, then update the Axiomatic Design document, Function Architecture document, and Test Requirements document.
Compressing/Summarizing Written Code
Many AIs advertise their context length. However, context length actually varies. I believe there are several types of context length:
- The length that doesn't cause input errors or truncation. For Gemini 2.5 Pro, this can be up to 1M tokens.
- The 'needle-in-a-haystack' retrieval length. You embed a sentence in a long text and ask the AI to find it. This is roughly 60%-80% of the maximum length.
- The length for normal thinking and reasoning. For Gemini 2.5 Pro (as of mid-2025), this is approximately 100k-300k tokens: performance starts to decline past 100k, and by 300k it's quite muddled.
As mentioned earlier, at Gemini's current level, 20k tokens are like a 1-hour human meeting. Context lengths between 20-60k are optimal; all necessary background information is clearly provided, and the AI is highly motivated and at its peak 'intellectual' state. However, as the project progresses and more modules are written, the amount of information the AI needs to grasp beforehand increases. This material can accumulate to one or two hundred thousand tokens, and by the time the AI figures out all the context, it's already 'dizzy' and finds it hard to produce good code.
At this point, it's necessary to compress and summarize the already written code.
Open a new AI Studio page, upload the written programs sequentially up to about 80k tokens, then ask the AI to summarize the uploaded programs. You can discuss the summarization strategy with the AI. I usually request it to output all constants, all function signatures (names and types of input and output parameters), a one-sentence summary for each function, and a pseudocode description for complex functions. The result should be output as a document.
Then, save this summary document in the docs folder as well. Each time you write a new module, you also need to submit this summary document to the AI. This can compress your program to about 1/8 of its original size.
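If you prefer to produce the mechanical part of this summary locally rather than spending tokens on it, a small helper built on Python's standard ast module can pull out the module-level constants and function signatures. This is an optional convenience of my own sketching, not part of the workflow above, and it only covers plain positional parameters; the one-sentence summaries and pseudocode for complex functions still have to come from the AI (or from you):

    # summarize_modules.py (hypothetical helper): print constants and function signatures
    # for every module in the package, to be pasted into docs\code_summary.md.
    import ast
    from pathlib import Path

    def summarize(path: Path) -> str:
        """List module-level ALL_CAPS constants and top-level function signatures of one file."""
        tree = ast.parse(path.read_text(encoding="utf-8"))
        lines = [f"## {path}"]
        for node in tree.body:
            if isinstance(node, ast.Assign):
                # Treat module-level assignments to ALL_CAPS names as constants.
                if any(isinstance(t, ast.Name) and t.id.isupper() for t in node.targets):
                    lines.append(f"constant: {ast.unparse(node)}")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                params = []
                for a in node.args.args:  # plain positional parameters only
                    annotation = f": {ast.unparse(a.annotation)}" if a.annotation else ""
                    params.append(a.arg + annotation)
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                lines.append(f"def {node.name}({', '.join(params)}){returns}")
        return "\n".join(lines)

    if __name__ == "__main__":
        for module in sorted(Path("src/package_name").glob("*.py")):
            print(summarize(module))
            print()

Classes and their methods are omitted here for brevity; extend the sketch if your modules are class-heavy.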
The above describes the "serious vibe coding" process. After all the documents are written, much of it is repetitive, almost mindless operation, with no need to dig into the code itself. "Proper" programmers might feel that such repetitive manual work absolutely must be automated with code, but a "casual" programmer might be playing games, binge-watching shows, chopping vegetables, cooking, or cleaning in the meantime, thus not in a flow state, and won't mind.
You can also choose not to use the AI Studio web version and instead use plugins like Cursor, Trae, or Cline to call the API. This can be simpler operationally, especially during debugging, where you can use the up/down arrow keys to retrieve previous prompts and have the AI revise repeatedly. However, in my personal testing with Trae, the success rate was lower. I'm not sure if Trae performs some degrading operations, like limiting length or having an overly long built-in system prompt. Or perhaps, because a new conversation wasn't started, the actual context length became too long, entering the 'reduced intelligence' zone.
I personally recommend performing code review during debugging. I feel the code success rate improves significantly. With the context of the Axiomatic Design document, code review provides a certain global perspective. If you are using obscure and complex libraries (like PythonOCC, where function calls are prone to errors), you should also enable search and force the AI to look up the correct syntax for each function.