A house you didn't build
More and more teams are stepping away from code review as we knew it. Human-to-human review disappears: an agent opens the PR, another agent reviews it, automated tests run against every change, and everything gets validated in staging environments before hitting production. Put that way, it sounds reasonable. Now that agents generate code at a speed no human can match, the traditional PR ceremony, with its author, human reviewer, comments, and feedback rounds, looks like a bottleneck inherited from a time when writing code was the most expensive problem to solve.
And it probably was. But before accepting the shift or rejecting it outright, it's worth pausing on a more basic question almost nobody is asking: what was code review actually doing all these years? Because what we're leaving behind may not be exactly what we think we're leaving behind.
How to manage PRs doesn't have a single answer across projects; it depends on context. On a POC with two people coding, it's reasonable to approve on the spot, because the cost of coordination outweighs the cost of an occasional bug and both have the whole system in their heads. On a long-running project with six devs, months of scope, and a product in production, the PR dynamic carries different weight. There are pieces that depend on each other, decisions that hold over time, people rotating in and out. That difference is what makes it worth pausing to think about what we're eliminating when we eliminate code review on projects of the second kind.
Ask anyone what code review is for and you'll almost always hear the same thing: to guarantee quality. Another pair of eyes on the change, keeping broken things from shipping, holding a minimum bar of cleanliness.
But the PR always did quite a bit more than that. The change description and the review comments ended up becoming, without anyone planning it, implicit documentation of the technical decision. And something even deeper and more invisible was happening: the PR sustained a shared mental map of the system. Just having code pass through more than one pair of eyes before merging was enough for a collective picture of the system (what was built, where, and how) to assemble itself as a silent byproduct.
When a human wrote every line, that mental map came for free. Building the system and understanding it were the same activity. Reviewing a teammate's PR forced you to read their code, and reading it taught you not just what they had done but how they had solved the problem, which pieces of the system they had touched, and what decisions they had made along the way. The team ended up with a reasonably current picture of what was built, without anyone having to sit down to produce that picture on purpose, and that let the technical and product conversations have enough resolution to be useful.
The agent breaks that coupling. You throw it a requirement, hit enter five times, and a feature gets merged that maybe even the person who shipped it didn't fully read. The code exists and works, but the team's mental map, the one that used to update as a natural byproduct of the work, no longer updates on its own. PRs go out faster, features ship in less time, velocity metrics improve. The only thing lost is something we weren't measuring and were taking for granted.
You're still the owner of the project, but you walk into the code and it's a house you didn't build. There's furniture you didn't buy and walls where doors used to be, and even though everything works, you don't know which drawer the cutlery is in. You get lost in your own house.
This loss doesn't show up on every project. On a small one, a POC, or a weekend experiment, it's not a serious problem. If everything breaks you toss it and start over, and speed wins over the mental model because there's nothing to defend. What's built is disposable, and therefore the knowledge about what's built is too.
Where the problem starts to hurt is on long-running projects, with large teams, with customers that depend on the product, and with features built on top of previous features. That shared mental map is, on projects like these, what lets you make product decisions with resolution. When someone proposes a new feature, you need to know what it impacts, what it breaks, and what it enables, and for that you need to have loaded in your head not just what exists but how it works, why it was decided that way, and what was discarded along the way. If nobody on the team has that map loaded, you end up arguing in the abstract, talking about the notifications module or the onboarding part without any real clarity of what's inside. And the conversation floats, because nobody understands the system well enough to evaluate anything being proposed.
What was until recently a typical problem for juniors just joining the team, who took weeks to build that mental map, can now be the problem of the entire team, including the seniors who have been on the project for years.
The picture gets even more complicated when product is separated from engineering. On teams where the two areas operate as distinct functions, someone defines a feature in a product document, a PM translates it into technical specs, a dev receives that spec and passes it to the agent, the agent codes it without the dev reading the output carefully, and the feature ends up merged into production.
Resolution is lost at every layer, and that's not new. The broken telephone between product and engineering has existed for as long as the two disciplines have been separated. What is new is that there used to be a layer that closed the loop at the end of the chain: the dev wrote the code and the dev understood what they were writing, and if at any point something didn't match the original intent it was the dev who noticed, raised their hand, and forced a conversation. That layer today is much weaker. The dev doesn't write, the dev doesn't read in depth, and the agent doesn't detect the drift because understanding the original intent isn't its job, it isn't even in a position to know what would count as drift.
The original feature had an intent that product understood well, but by the time it reached the code that intent had passed through four human translators and an agent that filled in the gaps as best it could. If you ask the team that built it how it really works, they won't know how to answer. And if you ask what was discarded, what was considered and rejected, what decisions were made along the way, the answer will be even poorer, because none of that is even in the code the agent wrote.
And then the question that matters appears: if the PR is no longer the moment where the team stays aligned, where is that moment now?
One possibility is to move alignment pre-code, to the technical discussion before kickoff or to the spec itself. Sounds logical: if the problem is that the team doesn't know what's being built, we fix it by talking before building. But there you hit a limit that's hard to jump. The spec captures intent, not learning. The surprises you run into while building, the things that turned out to be harder than expected, the paths you tried and discarded, none of that can be written before you start because it hasn't happened yet. And that, in the past, was captured in the PR description and in the review comments, in implicit documentation that generated itself as the work progressed. Today, if nobody writes that part deliberately, it simply evaporates.
Another possibility is to move alignment post-deploy, to demos, observability, or integration tests. This does work for something, but only for part of the problem. It helps you verify the system's behavior, to confirm that what was built does what it was supposed to do. It doesn't help the team understand how what they see working is built, nor which decisions led to that build, nor which pieces can be reused and which can't. Verifying behavior is one thing, maintaining a map of the system is another, and they're easily confused because from the outside both look like being aligned.
The third possibility is to accept that the shared mental map no longer holds with a single ceremony and will never hold that way again. You have to assemble several, each smaller than the original code review and each covering a portion of what the PR covered without naming it. Something like weekly demos that show what's new, living ADRs that record the underlying decisions as they get made, occasional walkthroughs of the system in which someone who didn't build a thing listens to it and questions it, moments of pair-thinking before important changes. A multi-layered human infrastructure to replace the single layer that used to be enough on its own. It's a solution that costs more, that requires discipline, and that will probably collide with the speed pressure that motivated delegating it to an agent in the first place. But maybe it's inevitable.
Underneath it all, what's happening is that humans are going to be more and more responsible for the what and less and less for the how. The machine will write the code, but the decision of what gets built, why, and whether what was built makes sense is still ours. And as long as there's a human in the process, someone who has to answer for the system, we'll have to design incentives and rituals that force the team not to lose the picture of what they're building.