Notes · 2026-04-01

Agentic Coding

It will be a long time before LLMs can manage big codebases without producing absolute garbage.

I use LLMs every day in what is currently the best harness: Cursor. There is no way in any universe that I can trust agents to manage the codebase autonomously, like just having a kanban board and letting my agents execute my tasks.

"But I vibe coded a SaaS for my uncle's business and it works! It would have cost me 3k and 1 month to develop, and I had it in a weekend with Claude Code!!"

It's great that you could help your uncle. You are right to take care of your human relationships, because soon they will be the only thing that matters.

I am speaking about decently sized projects (in terms of features) and projects that have users, where it sometimes becomes important to have a good architecture (even though you probably don't need a good architecture until very late).

These projects usually carry a heavy load of legacy code, bad practices from the intern's code in 2018 (or from the Staff Eng who got hired for the vibes), and an infrastructure that requires an infra team to maintain, usually in a different place than the codebase. I don't have much experience with this. It might be solvable, but it seems really hard.

These are things that confuse LLMs. Whether you tell the LLM to use best practices, give it skills, or even explicitly tell it to avoid useEffect, if the context of the discussion contains a lot of useEffects, the LLM will have to fight an internal battle not to output useEffects. This fight costs thinking tokens and thus intelligence. These are the limits of the transformer architecture.

For these reasons, LLMs don't work well in legacy codebases and cannot be trusted.

Even Greenfield Projects Go Weird

But rebuilding everything from scratch is not the whole solution either. I built a new project using agentic coding tools, and the AI agents still ended up doing super bad stuff that could easily slip through agentic reviews:

It created redundant props in my schema for no reason. These could easily be inferred from the value of the connected attributes.
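To make that concrete, here is a minimal sketch of the pattern. The schema, field names, and helper functions are all hypothetical, invented for illustration; they are not taken from my actual project. The redundant shape stores values that the items already imply, while the leaner shape derives them on read.

```typescript
// Hypothetical schema for illustration only; not the real project schema.
interface LineItem {
  unitPriceCents: number;
  quantity: number;
}

// Redundant shape: itemCount and totalCents duplicate what `items` already says,
// so every write has to keep three fields in sync.
interface OrderWithRedundantProps {
  items: LineItem[];
  itemCount: number;
  totalCents: number;
}

// Leaner shape: store only the source of truth.
interface Order {
  items: LineItem[];
}

// Derived values are computed on read, so they cannot drift from `items`.
function itemCount(order: Order): number {
  return order.items.reduce((sum, item) => sum + item.quantity, 0);
}

function totalCents(order: Order): number {
  return order.items.reduce(
    (sum, item) => sum + item.unitPriceCents * item.quantity,
    0,
  );
}

// The redundant version goes stale as soon as one write forgets a field.
const stale: OrderWithRedundantProps = {
  items: [{ unitPriceCents: 500, quantity: 2 }],
  itemCount: 2,
  totalCents: 1000,
};
// After this push, stale.itemCount and stale.totalCents no longer match the items.
stale.items.push({ unitPriceCents: 250, quantity: 1 });

// The derived version reads the same items and stays correct.
const order: Order = { items: stale.items };
console.log(itemCount(order), totalCents(order));
```

Nothing in the redundant version looks wrong in a diff, which is why this kind of thing passes review: the bug only appears later, when some write path updates the items and forgets the copies.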

Of course, I could add a directive, which I already saw in a skill: "Do not create useless state, prefer deriving value from other state." But no matter what I add, it feels like one of two things will happen:

  • It will still find dumb stuff to do, probably sneakier and harder to find because it always seems right from far away.
  • The context will rot under a list of instructions that takes up too much space.

I found that using the /simplify skill from Cursor works surprisingly well, but:

  • It still misses stuff sometimes.
  • It becomes very expensive (ralph loop).

Code review is great, but:

  • Lots of false positives when asking agents to review.
  • Expensive when using specific tools like CodeRabbit or Bugbot.

So Is It Just Bad?

In the same way that the meta in League of Legends is hard to balance for both beginners and pros, LLM post-training is hard to balance for both vibecoders and people who care about maintainability and code quality.

As an LLM company, you probably have the dilemma of optimizing for:

  • Vibe-coding tools that make code with lots of features, even if they are not polished, super reliable, or prod-ready.
  • Senior engineers who want elegant code.

Training LLMs to be smart in general, rather than on huge amounts of data, could solve this problem. But that sounds like AGI, and it is unlikely to happen soon.

The other option is to have coding models specialized for one or the other.