TLDR

  • Choose the best model
  • Success rates are highest on simple tasks, so break complex tasks down into simple ones
  • Humans still hold the advantage in the engineering skills needed to manage complex systems; strong engineering ability lets you describe requirements precisely enough to unlock AI's potential
  • High-quality documentation facilitates high-quality output, especially a project's own documentation

In the year since I started "hiring" AI as programmers, I've been looking for ways to use AI more heavily to replace human programmers on complex projects.

This is my progress report and set of best practices as of February 2025.

Choose the strongest model

Currently, the strongest coding model is still o1 pro. I suspect this is due to extended reasoning time and o1's solid foundation.

o1 pro is slow, but that isn't a drawback. Writing code is inherently about thinking slowly and writing quickly; correctness and sound reasoning matter more than speed.

Simple tasks yield the best results

Demonstrations of large language models' programming ability often have them write a complete but simple application, such as a ToDo app or a small game like Snake.

These tasks have clear boundaries, require almost no interaction with external systems, and are far from zero-shot for the model, so they're completed very well.

To exploit this advantage: first, use the strongest available model; second, define boundaries strictly when describing requirements, and build complex tasks by combining simple ones.

Regarding the second point, treat each module as if it were third-party: encapsulate internal logic, provide services through interfaces, and obtain external dependencies through agreed-upon contracts. This aligns with software engineering best practices, so following the principle also improves engineering quality.
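To make that boundary concrete, here's a minimal sketch of what I mean; the package and names are hypothetical, not from a real project. Internal logic stays unexported, consumers see only a small interface, and external resources arrive through a contract the caller supplies.

package subscription // a hypothetical module, for illustration only

import (
	"context"
	"errors"
)

// Store is the agreed-upon contract for the external resource this module
// depends on; the caller decides how it is backed (SQL, KV, in-memory, ...).
type Store interface {
	Save(ctx context.Context, email string) error
	Exists(ctx context.Context, email string) (bool, error)
}

// Service is the only surface other modules are allowed to use.
type Service interface {
	Subscribe(ctx context.Context, email string) error
}

// ErrAlreadySubscribed is part of the public contract.
var ErrAlreadySubscribed = errors.New("already subscribed")

// New wires the module to its dependencies; everything else stays unexported.
func New(store Store) Service {
	return &service{store: store}
}

type service struct {
	store Store
}

func (s *service) Subscribe(ctx context.Context, email string) error {
	exists, err := s.store.Exists(ctx, email)
	if err != nil {
		return err
	}
	if exists {
		return ErrAlreadySubscribed
	}
	return s.store.Save(ctx, email)
}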

In practice, a typical development request might use a prompt like this:

implement a golang module, here are requirements:
- requirement 1...
- requirement 2...
- ...

here are interfaces:
- method 1...
- method 2...
- ...

here are dummy functions you may need:
- function 1
- function 2
- ...

here are criteria:
- criterion 1
- criterion 2
- ...

Providing this prompt to o1 pro yields the best results currently possible.
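One way to make the criteria verifiable (a workflow habit of mine rather than anything o1 pro requires) is to restate each criterion as a table-driven Go test against the module's interface. Continuing the hypothetical subscription module from the sketch above:

package subscription // tests live alongside the hypothetical module above, in a _test.go file

import (
	"context"
	"errors"
	"testing"
)

// fakeStore is an in-memory stand-in for the Store contract.
type fakeStore struct{ emails map[string]bool }

func (f *fakeStore) Save(_ context.Context, email string) error {
	f.emails[email] = true
	return nil
}

func (f *fakeStore) Exists(_ context.Context, email string) (bool, error) {
	return f.emails[email], nil
}

func TestSubscribe(t *testing.T) {
	cases := []struct {
		name    string // one case per criterion in the prompt
		already bool
		wantErr error
	}{
		{name: "a new email is accepted", already: false, wantErr: nil},
		{name: "a duplicate email is rejected", already: true, wantErr: ErrAlreadySubscribed},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			store := &fakeStore{emails: map[string]bool{}}
			if c.already {
				store.emails["a@example.com"] = true
			}
			err := New(store).Subscribe(context.Background(), "a@example.com")
			if !errors.Is(err, c.wantErr) {
				t.Fatalf("got %v, want %v", err, c.wantErr)
			}
		})
	}
}

Each criterion then has a concrete pass/fail target instead of a prose description the model might interpret loosely.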

Engineering management should be done by humans

Large language models now perform well on both the highest and lowest level engineering tasks. The lowest level refers to the "simple tasks" mentioned earlier.

The highest level is abstract reasoning about the overall architecture, where the model acts as a consulting architect: you pose questions and get back options to use as reference, which can be very helpful.

However, for the intermediate engineering work that connects these two extremes, current large language models still struggle, even with agents like Devin and Cursor.

I believe there are two issues. First, a lot of implicit engineering knowledge is never captured in code or documentation, and design and architecture choices rarely have a single optimal answer; there is plenty of "dirty work". Second, as the model is fed more content, it loses clear boundaries, and hallucinations become severe.

For the second issue, more engineering work may be needed, such as incorporating static analysis results (a guess, as I'm not an expert in language analysis) or other elements to narrow the problem scope.
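As a rough sketch of that guess (only one possible shape of it, and nothing I actually ship): run the toolchain's own static analysis and attach the diagnostics to the prompt, so the model works inside a narrower, tool-verified scope.

package main // hypothetical helper: gather static-analysis findings for prompt context

import (
	"bytes"
	"fmt"
	"os/exec"
)

// vetReport runs `go vet` over the given package pattern and returns whatever
// diagnostics it produced, so they can be pasted into the prompt next to the
// requirements instead of letting the model guess at the problem scope.
func vetReport(pattern string) string {
	cmd := exec.Command("go", "vet", pattern)
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out // go vet writes its diagnostics to stderr
	_ = cmd.Run()     // a non-zero exit just means vet found something to report
	return out.String()
}

func main() {
	fmt.Println("static-analysis findings to include in the prompt:")
	fmt.Println(vetReport("./..."))
}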

For the first issue, code and documentation are essentially the same kind of explicit artifact, so if the second issue isn't solved, simply supplementing the model's knowledge might not be enough.

Regarding IDEs, I don't think there's a fundamental difference between VS Code, Cursor, and Windsurf. Given the models' limitations, an IDE that ultimately produces wrong predictions is certainly not as good as o1 pro getting it right the first time.

High-quality documentation

Although code should be self-explanatory, high-quality documentation improves the model's understanding and perception of the project.

This is particularly advantageous in a mono repo, since the documentation lives in the same repo; pointing the IDE at the relevant markdown file is enough.

For this reason, I'm gradually organizing projects into mono repos.
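As an illustration, a hypothetical layout (not Quaily's actual structure) keeps the documentation next to the code it describes, so a single repo-relative markdown path is all the model needs to be pointed at:

repo/
  docs/
    architecture.md    overall architecture and module boundaries
    conventions.md     naming, error-handling, and logging conventions
  services/
    api/
      README.md        what this service owns and which interfaces it exposes
    worker/
      README.md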

Feasibility of replacing human programmers

I agree with Sahil Lavingia's view:

No longer hiring junior or even mid-level software engineers.

I've also built things similar to Antiwork's products: for example, AI bots similar to Helper, and anti-spam measures similar to Iffy.

My practical experience shows:

  • Skilled engineers using AI appropriately can easily achieve 5 to 10 times the output. This means that if a company's profit model is based on technical output, reducing 50%-80% of the workforce is feasible.
  • With complexity properly divided, and under the supervision of excellent engineers, AI can iterate continuously on medium-scale projects without problems (for example, Quaily's core backend code is about 1M tokens).
  • The good news is that regardless of whether AGI arrives, the upper limit of LLM capabilities is sufficient to change many, many things.
    • The bad news is that its lower limit is higher than many humans' upper limit.
      • The worse news is that all those who don't fully understand the good news fall within the scope affected by the bad news.