
Agentic Programming on Legacy Codebases

Using your favourite LLM to build a flashy product is all the rage these days. But most software in production has been live for decades, and a good chunk of it is held together by a shoestring and a prayer. This is where the usual agentic flows fall apart, because these systems expect a predictable baseline on which to do work. Each new session you open has a tiny fraction of the codebase in its memory, meaning the nasty stuff is never considered. Meanwhile, tutorials assume a clean repo, a modern stack, and a test suite. Legacy code offers none of those. We have spent three years pointing coding agents at these systems, and almost everything below comes from a specific moment where we got it wrong first.

You Cannot Skip the Boring Setup

When diving into a new codebase, it's really tempting to fire up Claude and take on a bug backlog like a pro, to show off to the rest of the team that you know what's up. But really, take a step back and spend a week or so auditing. Ask the team questions, then ask your AI the same questions. Map out the repository, figure out where the nasty tech debt is concentrated, explicitly single out large files as off limits for your agents (unless they absolutely must have a peek), and document the hell out of everything.

A few years ago, we inherited a .NET ERP. One solution file, close to fifty csproj directories hanging off it. I did the lazy thing. One prompt, "how can we improve this." The agent thought for a while, I waited, and it came back with a plan that looked solid. I almost clapped.

Then I read it. It had not got through five files, each one over three thousand lines long, before it started to hallucinate. What it had no way of knowing was that half those projects were abandoned, and others were carrying one or two live files between them. So the plan was confident, detailed, and completely fictional. It described a system that was not there.

That is the lesson, and it is boring on purpose. Spend real time on research and documentation before the agent changes anything, so it does not wander into dead code and build its worldview out of it. Give it somewhere to keep that knowledge, separate from the code:

/your-cool-legacy-project
  /context-docs
  /future-refactoring-plans
  /agent-instructions
  /actual-repo

The file map lives in context-docs, and it goes into every prompt afterward. Expect to run the mapping pass more than once before it is accurate. Skip this and every later task starts from a hallucinated model of the system, and you will not notice until the output is already broken.
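A first mapping pass can be as dumb as walking the tree and counting lines. The sketch below is one way to do it, not the script we actually ran: the 3000-line threshold, the skipped directory names, and the function names are all assumptions to adjust for your repo. It also cannot tell you which projects are abandoned. That annotation still has to come from the team and from later passes.

```python
import os
from pathlib import Path

LARGE_FILE_LINES = 3000  # assumed threshold for "too big for the agent by default"
SKIP_DIRS = {".git", "bin", "obj", "node_modules"}  # assumed build/vendor directories


def build_file_map(repo_root):
    """Return one entry per file: relative path, line count, large-file flag."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    line_count = sum(1 for _ in fh)
            except OSError:
                continue  # unreadable file: leave it out of the map
            entries.append({
                "path": path.relative_to(repo_root).as_posix(),
                "lines": line_count,
                "large": line_count >= LARGE_FILE_LINES,
            })
    return entries


def render_file_map(entries):
    """Format the map as plain text that can be saved under context-docs."""
    lines = []
    for entry in entries:
        flag = "  [LARGE: off limits unless required]" if entry["large"] else ""
        lines.append(f"{entry['path']} ({entry['lines']} lines){flag}")
    return "\n".join(lines)
```

Run `render_file_map(build_file_map("actual-repo"))`, save the output in context-docs, then annotate it by hand with what is dead and what is live before it goes anywhere near a prompt.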

One more thing while you are in there. Legacy repos are full of secrets: a database password in a committed config, an API key in a comment. You should probably search for those manually. Claude will happily post them to headquarters, and I bet you five sheep and a goat that they will get leaked some day. Once you've identified them, scrub them from the repo, or at the very least, exclude those files from your agent's reach.
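The manual search does not have to mean reading every file. A rough local scan narrows it down. The patterns below are illustrative assumptions, nowhere near a complete list, and the function names are made up for this sketch. It reports only the file, line number, and kind of hit, so the secret itself never ends up in a report.

```python
import re
from pathlib import Path

# Illustrative patterns only. Real repos need a longer, tuned list.
PATTERNS = {
    "password assignment": re.compile(r"(?i)\b(password|passwd|pwd)\b\s*[=:]\s*\S+"),
    "AWS access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, label) for every line that matches a pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, label))
    return hits


def scan_repo(repo_root):
    """Scan every readable file under repo_root, skipping the .git directory."""
    findings = []
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for line_no, label in scan_text(text):
            findings.append((path.as_posix(), line_no, label))
    return findings
```

Expect false positives. The point is a shortlist for a human to read, and a list of files to keep out of the agent's reach.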

The Codebase Has No Style, And Agents Don't Lead by Example

This is the failure that cost us the most, and it is the one nobody warns you about.

Legacy code has no single style. It has five, or fifteen. You have the original code, the rewrite that was abandoned halfway, the contractor's stuff, the panic patches, the intern's hot new feature that isn't used anymore because it's too buggy, and whatever the last person to quit left behind. The antipatterns are usually not isolated either.

One system we took on had an Angular frontend doing its best impression of React, and a Java backend with a perfectly good ORM sitting next to nothing but raw SQL. The worst part was the tenancy. It was a multi-tenant app, and the check that kept one tenant out of another's data lived inside each individual query, copied by hand, instead of in one central place. Miss it once and you have a breach.

Point an agent at that with a plain "do this" and it does exactly what you tell it to, but with the style of the neighbouring code. More raw SQL. Another fully custom styled button that is slightly different than the last one. Another hand-copied tenant check, or worse, a forgotten one. It photocopies the mistakes that were never cleaned up, and now the tech debt is bigger than when you started. The tempting response is "code is cheap, who cares." It bites you later, usually in the part of the system you were watching least.

So what's the big deal? Just tell Claude to be a good boy and only do things the "good way" and you're done, right? Well, not so easy. There isn't always a single "good way" of doing something. There are usually many correct approaches to good code, and proponents of one will usually say all the others aren't great. Agents form their own opinion on what the best pattern is each time you initialize a new context window, so sometimes you'll see one approach, other times another.

The fix is not subtle. Tell the agent, explicitly and every time, what good looks like in this repo, and that the surrounding code is not the reference. "Match the existing style" is what the agent already does by default, which is the problem, and "follow best practices" is too ambiguous to help. We keep a short list of examples from the codebase, file by file, in agent-instructions, and treat it as gospel. Then, when a new part of the code is fixed up, that part is added to the list of examples to use moving forward.
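One way to keep that list honest is to store it as data and render the same preamble into every prompt. This is a minimal sketch under assumptions: the class name, the function name, the wording of the preamble, and the example paths are all hypothetical, not our actual instruction file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceExample:
    path: str          # file in the repo that shows the pattern done right
    demonstrates: str  # the pattern the agent should copy from it


def render_style_preamble(examples):
    """Build the block of text that goes at the top of every prompt."""
    lines = [
        "The surrounding code is NOT the style reference for this repo.",
        "Copy patterns only from the reference files listed here:",
    ]
    for example in examples:
        lines.append(f"- {example.path}: {example.demonstrates}")
    return "\n".join(lines)


# Hypothetical entries. Replace them with files from your own codebase.
EXAMPLES = [
    ReferenceExample("src/orders/OrderRepository.java", "data access through the ORM"),
    ReferenceExample("web/shared/button.component.ts", "the one shared button component"),
]
```

When a cleaned-up area is ready to serve as a model, add one more entry to `EXAMPLES` and every later session picks it up.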

One Audit Is Not an Audit

A customer showed up with a Vue and PHP stack that was, to put it kindly, wide open. It sat on a bare EC2 instance, nothing in front of it. We spent two days scanning with Claude and a few other tools, hunting for holes. We found them. Lots of them. We patched everything in a few hours, shipped one big safety deploy, to much fanfare.

Then, mostly for laughs, we ran the whole scan again after the deploy. There it was, more critical stuff, just as bad, that the first pass had walked straight past.

Here is what I think happens: the model wants to write you an article. It drives me nuts, I hate it, but it is what it is. You get a clean report with two criticals, three highs, two mediums, five lows and a conclusion. It looks like a professional report, and looking the part is what the model was trained to produce. The real shape of a legacy system is closer to fifteen times each of those numbers, with more dimensions than a tidy list allows. Some issues are obscure enough that nobody would ever think to look, and bad enough to end the company if someone did. Others are virtually inoffensive, but so visible that they actively drive customers away every day. Some are only an issue in specific contexts that the agent simply assumes don't apply (but they might).

Treat audits as a recurring job, not a one-time event. Run them on a schedule, run them again after every batch of fixes, and assume the list is always shorter than the truth. The goal is not one perfect report, it is coverage that piles up over time. While we're at it, it's tempting to share your agent's alarmism about the highly critical security problems it finds. Challenge each finding first, and check whether it really is a problem in your context.
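Coverage only piles up if the runs are merged somewhere. The sketch below keeps a running ledger keyed by file and issue, and assumes each audit run has already been reduced to a list of dictionaries with `file` and `issue` fields. That input shape, the key format, and the names are assumptions for illustration, not the output of any particular tool.

```python
def merge_findings(ledger, run_id, findings):
    """Fold one audit run into the ledger and return the keys seen for the first time."""
    new_keys = []
    for finding in findings:
        key = f"{finding['file']}::{finding['issue']}"
        if key not in ledger:
            ledger[key] = {
                "first_seen": run_id,
                "seen_in": [run_id],
                "verified": False,  # stays False until a human has challenged it
            }
            new_keys.append(key)
        elif run_id not in ledger[key]["seen_in"]:
            ledger[key]["seen_in"].append(run_id)
    return new_keys


def unverified(ledger):
    """List the findings nobody has confirmed as a real problem yet."""
    return sorted(key for key, record in ledger.items() if not record["verified"])
```

Each new run adds whatever the earlier passes walked past, and `verified` records that someone actually challenged the finding instead of forwarding the agent's alarm.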

Refactoring Without Tests Guarantees You a Revert

We let an agent clean up a tangled order-processing chain of events. It tidied the logic, simplified a loop, and passed its own check. It also dropped an undocumented rule that skipped tax on a specific customer type. Nobody had written that rule down. There was no test. We found out when the numbers came back wrong, in production, a week later.

Legacy code is full of behaviour nobody documented, and the agent will optimize it away because it cannot see why it matters. Before any refactor, get characterization tests around the code: tests that pin down what it does today, right or wrong. You will need to understand what's happening, because an agent writing tests blindly leaves you in the same situation, but you don't need to write them yourself. An agent can knock out more tests than I've written in my career in one sitting, so take advantage of that.
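In its simplest form, a characterization test records what the current code returns for a set of inputs and fails when a refactor changes any of them. The sketch below uses a made-up stand-in for the order logic. The function names, the customer types, and the 20% tax rate are invented for illustration and are not the real system.

```python
def legacy_order_total(customer_type, subtotal):
    """Hypothetical stand-in for the legacy code, including its hidden rule."""
    if customer_type == "reseller":  # undocumented rule: this type pays no tax
        return subtotal
    return round(subtotal * 1.2, 2)  # assumed 20% tax, for illustration only


def refactored_order_total(customer_type, subtotal):
    """A 'tidied' version that silently dropped the special case."""
    return round(subtotal * 1.2, 2)


# Inputs chosen to cover every customer type you can find in real data.
CASES = [("retail", 100.0), ("reseller", 100.0), ("retail", 0.0)]


def record_snapshot(func, cases):
    """Capture current behaviour, right or wrong, as the baseline."""
    return {repr(case): func(*case) for case in cases}


def diff_against_snapshot(func, cases, snapshot):
    """Return (case, expected, actual) for every case whose output changed."""
    changes = []
    for case in cases:
        expected = snapshot[repr(case)]
        actual = func(*case)
        if actual != expected:
            changes.append((case, expected, actual))
    return changes
```

Record the snapshot from the legacy function before the agent touches anything, then run `diff_against_snapshot` on the refactored version. In this toy setup the reseller case shows up as a change, which is the kind of signal we did not have when the tax rule disappeared.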

Remember to Throw an Agent at the Git History

One particular project we had the pleasure of working on followed Git Flow in reverse. They had a development, staging and master branch just like every other company. Except they worked off master, and all changes were cherry-picked into the other two branches twice a week. Working out how it got to that point by reading the history yourself is perplexing at best, so we used Claude to figure it out. There is no lesson here, just a generally good tip. For bonus points, we included in the prompt that it had to sound like a soap opera.

You can get someone to tell you how things got to where they did, but part of what you'll hear is usually a regurgitation of what someone else told that person. A company does not get to a situation like this one in a few months, it takes generations of employees dealing with fun problems to get to this point. This means generations of git commits containing deep company lore that you just can't live without, but also can't easily read by yourself. It's tempting to point at a funny bug and laugh, but there's likely a very compelling reason it's there in the first place.

Pick Your Rewrites and Your Do-Not-Touches First

The mistake people make is treating the whole system as equally changeable. It is not.

Some things are baked in. The data model is the obvious one. Every report, integration, and stored procedure depends on it, and an agent that decides to "improve" a column name will take down six things you did not know were connected. We mark these as do-not-touch up front, in the agent's guidelines, and we mean it. Touching them is a project, not a prompt.
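Guidelines are easier to mean when something checks them. A small guard can compare the paths an agent changed against the do-not-touch list before anything is merged. The patterns and the function name below are hypothetical, and the list of changed paths is assumed to come from your own tooling.

```python
from fnmatch import fnmatchcase

# Hypothetical do-not-touch patterns. Keep the real list with the agent guidelines.
PROTECTED_PATTERNS = [
    "db/schema/*",
    "*.sql",
    "reports/stored_procedures/*",
]


def protected_changes(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that fall under a do-not-touch pattern."""
    return [
        path for path in changed_paths
        # fnmatch-style '*' also matches '/', so 'db/schema/*' covers nested files.
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

If the returned list is not empty, the change stops being a prompt and becomes a project, with a human deciding whether it goes ahead.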

Other things are quick wins, and you want those early. On one engagement the backend was terrifying, but the admin UI was just dated. We had the agent modernize the interface first. Low risk, nothing structural, and the client could suddenly see progress. That bought us the trust and the runway to go near the scary parts later. Momentum matters on legacy work, because most of the real changes are invisible for weeks. Spend an early win where someone can actually see it.

So

  • Map it before you trust it
  • Tell the agent what good looks like, or it copies the worst code in the room
  • Audit on repeat
  • Test before you refactor, and leave the weird code alone until you know why it is weird
  • Decide what is sacred and what is a quick win before you start

None of this is exotic. The thing to internalize is that the agent will not do the doubting for you. It raises alarms where nothing is wrong, and it accepts you and the code in front of it without a word of pushback, unless you tell it to argue. The skepticism has to come from you. The legacy systems were here before the agents, and they will outlive this blog post too.