
The Thing That Isn't There

Every line in the diff checks out, and the change is still wrong — because the bug is the permission check that was never written, and you can't review an absence. It has no surface to land your eye on. The answer is the oldest acceptance skill there is: externalize what the work was supposed to do, and walk it item by item.


Synthesized from the research corpus's Comprehension Without Authorship domain (the omission problem, mental models, reading practices) and the Acceptance Rituals editorial-process and inspection-and-audit lenses. The single most-guarded claim in the corpus lives here — the rate at which review misses an absence — and it is carried as data-not-found, on purpose. Source-reviewed, fact-reviewed, and gap-reviewed before publication.

You can’t review an absence. It has no surface.

Every line in the diff checks out. The naming is clean, the logic reads, the tests are green, and you sign off — and the change is still wrong, because the bug isn’t a line that’s there. It’s the permission check that was never written. The retry that handles the timeout but not the second timeout. The branch for the empty list that the function will, eventually, be handed. None of it is on the screen. You were reading the code that exists, and the defect is in the code that doesn’t.


Why the eye slides off it

Start with the part that isn’t about agents at all, because the weakness is older than the tools and it’s in you.

Review is decent at catching the wrong thing present and near-blind to the right thing missing. The direction of that asymmetry is overdetermined — it shows up in the defect taxonomy, in cognitive psychology, in how humans physically locate a known bug on a page — and I’m going to walk each leg, because the conclusion I’m not going to draw from them is the one the argument most wants.

The defect taxonomy has carried the distinction for decades. The standard schemes split a defect by whether something was done wrong or something was left out — “Missing — due to an omission” versus “Incorrect — due to a commission” — and the requirements-fault lineage out of Maryland leads with Missing Information, then subdivides omission into missing functionality, performance, environment, interface. Absence isn’t an edge case in the catalog. It’s a top-level category with its own branches. And it’s a heavily populated one: Robert Glass, gathering the field’s durable findings in Facts and Fallacies of Software Engineering, put “roughly 35 percent of software defects” as emerging “from missing logic paths” (Glass, via Hatton’s reproduction). A third of what breaks was never typed.

Then the cognitive leg, which is the one I find hardest to argue my way out of, because it’s not about code. People are bad at noticing what isn’t there. The effect has a name, omission neglect, and it has held up across repeated studies, which is more than many psychology findings can now claim: “people often fail to detect the absence of important missing information,” the research runs, because “missing information is not very salient or attention drawing” (Kardes et al., via Skeptical Inquirer). Its cousin is the feature-positive effect: we learn a rule far faster from the presence of a cue than from its absence, and the effect is so robust it’s been “observed for humans, pigeons, rats, cats, and monkeys” (same source). We are, across species, built to attend to what’s in front of us. An absence puts nothing in front of us. There’s nothing for the attention to catch on.

And the on-code leg, which is where I have to be most careful, because it’s the one most likely to be misread into a number it cannot support. Fry and Weimer ran a study — 65 participants, 1,830 judgments — and found humans “over five times more accurate at locating ‘extra statements’ than ‘missing statements’” (Fry & Weimer, ICSM 2010). Five times. A striking figure pointing the right direction — so be exact about what it measures. Their subjects were told a bug existed and asked to find where it lived in a specific range of lines. That’s localization — pinning a known-present fault — not review. It’s a floor under the problem, not a measurement of it. “Over five times” is how much harder it is to point at a missing statement once you already know one is there. It is not, and I will not let it become, a rate at which a reviewer misses an absence they were never told about.

Which brings me to the number I owe you and don’t have.


The number that isn’t there either

The honest thing to say is that there is no measured rate for this.

The clean statistic the whole section is reaching for — reviewers catch X percent of omissions versus Y percent of commissions — does not exist in the literature. No controlled study has put reviewers in front of code, unprompted, and measured how often they notice a wrong line versus a missing one as a side-by-side detection rate. I went looking. It isn’t there. The direction is one of the best-supported claims in this whole project; the magnitude is data-not-found, and I’d rather hand you that gap than a borrowed figure dressed up as this one.

There’s a small irony I’ll take rather than dodge: a piece about the danger of absences arrives missing the number it would most like to cite. Good. That’s the discipline the rest of the series keeps promising, applied to its own weakest spot. If I papered the gap with the Fry-and-Weimer figure — let “five times harder to localize” quietly drift into “review misses absences five times as often” — I’d be doing the exact thing this piece warns against: filling a hole so smoothly the reader can’t see it was ever there. The hole stays visible. We know review is far better at the wrong-thing-present than the right-thing-missing. We do not know by how much, and anyone who tells you the figure is selling you something.


What agents leave out, specifically

Now the agent, because the old human blindness meets a new and weirdly predictable supplier of the thing it’s blind to.

Agents have a characteristic way of leaving things out, and it isn’t random. When Dou and colleagues sorted the errors of seven frontier models, the single largest category was missing checks for corner cases — 54.4 percent of errors on one benchmark, 30.1 percent on another (Dou et al., 2024, arXiv 2407.06153). The code was, in their words, “functionally correct” and then “fail[ed] to handle edge cases.” That is the omission category from the defect taxonomy, reproduced at scale by the generator. The machine writes the path that works and quietly skips the guard for the path that doesn’t.
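Here is roughly what that looks like on the page. This is a minimal sketch of my own, not an example from Dou et al.; the function names and the choice to report "no data" as `None` are assumptions made for illustration. The first version reads cleanly and passes the one case it was shown. The guard it lacks is not a wrong line. It is no line at all.

```python
def mean_latency_ms(samples):
    """Return the mean of a list of latency samples, in milliseconds."""
    total = sum(samples)
    return total / len(samples)  # nothing here handles an empty list


def mean_latency_ms_guarded(samples):
    """Same computation, plus the branch the first version never wrote."""
    if not samples:
        return None  # assumption: 'no data' is reported as None
    return sum(samples) / len(samples)


# The happy path: the only case the first version was ever shown.
assert mean_latency_ms([10, 20, 30]) == 20

# The case nobody put in front of it.
try:
    mean_latency_ms([])
    empty_list_outcome = "returned a value"
except ZeroDivisionError:
    empty_list_outcome = "ZeroDivisionError"

print(empty_list_outcome)
```

Read the first function line by line and there is nothing to object to. The defect only appears when you ask what should be there.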

I have to flag those two numbers hard, because they’re the perishable part. They’re benchmark scores — HumanEval+ and MBPP+, on a slate of models that are already being superseded as I write this — not a measurement of agent output in a real repository under real requirements. The figures will move. The model lineup will turn over. What I expect to outlast the figures is the shape: the agent’s strongest tendency is to omit the handling for the case nobody put in front of it, and that shape maps precisely onto the oldest, hardest-to-catch defect class we have. The blindness in you and the bias in the tool are aimed at the same blind spot. The number is a snapshot; the alignment is the finding.

So the situation at handback is the worst version of an old problem. The flaws most likely to matter are absences. Your attention is structurally poor at absences. And the thing producing the code has a documented lean toward exactly those absences, dressed in clean syntax that gives your already-weak eye nothing to snag on.

Reading harder does not fix this. You cannot stare more intently at a blank space. The fix is to stop trying to see the absence and start checking against a list of what should be present — which is a different instrument entirely, and two trades older than software have built their whole practice around it.


The first master: the fact-checking desk

[Image: A worn cast-iron editor's copy spike standing on a graphite bench with several impaled paper slips skewered on its steel needle, and one blank slip lying flat beside the base, never spiked, lit by a single cool inspection lamp.]

The fact-checking desk's spike. What's impaled has been checked; the slip lying beside it (the source never called) is the absence the desk was built to hunt.

The magazine fact-checking desk is the closest another profession comes to your problem at handback. A person who didn’t write the piece is made personally responsible, before it ships, for confirming every checkable claim in it — working not from independent knowledge but from the writer’s own source material. Substitute “diff” for “piece” and that’s the acceptance seat.

The desk’s method is the part worth stealing. Peter Canby, who ran the checking department at The New Yorker, described it as reporting in reverse (WNYC, On the Media):

We take pieces apart and put them back together again. We talk to the writer’s sources. And we also try to get to the people who were mentioned in the story who maybe the reporter didn’t even speak to.

Read it again, because every clause is a technique and the last is the whole instrument. We take pieces apart and put them back together again — the check is a disassembly, not a read-through; you rebuild the piece from its parts and see whether they add up. We talk to the writer’s sources — verification goes around the artifact to the material behind it, never through its own confident surface. And then the clause that does the real work for an absence: the people the reporter didn’t even speak to. The checker isn’t vetting what made it onto the page; she’s hunting the source who should be on the page and isn’t. Swap the diff for the piece, and the people the reporter didn’t even speak to becomes the permission check the agent didn’t even write — the guard that no amount of reading the page will surface.

And here is exactly where it breaks, which is the part that earns its place in a piece about absence. In 2014 Rolling Stone published “A Rape on Campus,” and it collapsed, and the Columbia Journalism School review of the failure reads like a specification for the omission problem. The atomized facts had been checked. What hadn’t been checked was the un-atomizable core — the central narrative that didn’t decompose into discrete, traceable claims — and the people the story accused, who were never properly contacted. The review found a failure of journalistic standards in “not making contact with the people they were publishing derogatory information about” (Columbia Journalism School review, via Wikipedia summary). The desk confirmed every present atom and was blind to the absence: the call that was never made, the perspective that was never in the piece. Even when the checker did notice that a source “had not been interviewed, but … the article had pretended he had been,” the concern was overridden.

That’s the warning inside the technique, and it’s the same warning the agent diff carries. Atomized checking confirms the atoms. It says nothing about the atom that should have been there and isn’t. A green test suite is an atomized check — every assertion that exists, passing — and it is exactly as silent about the missing permission check as the desk was about the call no one placed.
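To make the green-suite point concrete, here is a small sketch under stated assumptions: `delete_document`, the in-memory `store`, and the user names are all hypothetical, invented for this illustration. Every assertion that exists passes. The assertion that would have caught the missing owner check was never written, so the suite has nothing to say about it.

```python
import io
import unittest


def delete_document(doc_id, requesting_user, store):
    """Delete a document and report whether anything was removed."""
    # What is absent: nothing compares requesting_user to the document's owner.
    if doc_id not in store:
        return False
    del store[doc_id]
    return True


class DeleteDocumentTests(unittest.TestCase):
    def test_owner_can_delete(self):
        store = {"doc-1": {"owner": "alice"}}
        self.assertTrue(delete_document("doc-1", "alice", store))
        self.assertNotIn("doc-1", store)

    def test_missing_document_returns_false(self):
        self.assertFalse(delete_document("nope", "alice", {}))


# Run the suite quietly and record whether it is green.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeleteDocumentTests)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
suite_is_green = result.wasSuccessful()

# The check no test asked about: a non-owner deleting someone else's document.
store = {"doc-1": {"owner": "alice"}}
stranger_deleted_it = delete_document("doc-1", "mallory", store)

print(suite_is_green, stranger_deleted_it)
```

Both values come out true at once, and that is the point: a passing suite reports on the assertions it contains and is silent on the rest.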


The second master: the audit sample

[Image: A worn metal audit specimen tray with a grid of round sample slots, most holding a small steel sample disc but one slot near the center conspicuously empty and dark (the sample that was never drawn), on a graphite bench under a cool inspection lamp.]

The audit sample. Sampling tests what's in the population; the empty slot is the transaction that was never recorded, and so never could be drawn.

The auditor faces the same wall from the other side and names it without flinching.

An audit cannot re-examine every transaction in a large company — so it samples, applying procedures “to less than 100% of items within a population … such that all sampling units have a chance of selection” (ISA 530 §5(a)). And the profession is unsentimental about what sampling can’t do. It names the failure mode it can’t escape — the risk of accepting a balance as fairly stated when it is in fact materially misstated (PCAOB AS 2315.12). Sampling tests the items that are in the population. It is, structurally, a present-item method.

Which means it is blind to the one thing that defines the worst frauds: the transaction that was never recorded. You cannot sample your way to a liability that isn’t on the books, because it isn’t in the population to be drawn. Wirecard is the case that makes it concrete — €1.9 billion that didn’t exist, on accounts the auditors “verified” by relying on screenshots and documents the company itself supplied (Fortune, 2020). The sample of recorded items could be impeccable and still tell you nothing about the money that was never there. The absence wasn’t in the sample because absences never are.

So the auditor’s answer to “what might be missing?” is never “look harder at the pile.” It’s “decide in advance which absences would matter most, and go hunt those specifically” — a discipline the next break in this series takes apart in full. Here, the point is narrower and sharper: a sample drawn from what’s present is the wrong instrument for finding what was never recorded, no matter how dense you make it.
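The structural claim can be shown in a few lines. This is a toy sketch, not an audit procedure: the ledger, the amounts, and the unrecorded item are all invented for illustration. However many samples you draw, and however large you make them, every draw comes from the recorded population, so the item that was never booked cannot appear.

```python
import random

# Hypothetical ledger: only what was recorded is in the population.
recorded = [{"id": i, "amount": 100 + i} for i in range(1, 1001)]

# Hypothetical item that exists in the world but was never booked.
unrecorded_liability = {"id": "never-booked", "amount": 250_000}


def draw_sample(population, size, seed):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def ever_drawn(item, population, size, trials):
    """Report whether the item shows up in any of `trials` independent samples."""
    return any(item in draw_sample(population, size, seed) for seed in range(trials))


found_by_sampling = ever_drawn(unrecorded_liability, recorded, size=50, trials=200)

# Density does not help: even a 100 percent "sample" is still only the recorded items.
full_sample = draw_sample(recorded, len(recorded), seed=0)
found_in_full = unrecorded_liability in full_sample

print(found_by_sampling, found_in_full)
```

Raising `size` or `trials` changes how well you know the pile. It does nothing for what is not in the pile.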

Both masters land in the same place, and it’s the place the cognitive science already told us we’d have to go. You cannot detect an absence by inspecting the artifact, because the artifact is precisely where the absence isn’t. You detect it by holding a model of what should be present and checking the work against that model, item by item.


The answer: externalize the intent, and walk it

[Image: A worn steel clipboard holding a checklist, each line carrying a hand-struck tick in its box but one box near the bottom left empty and unticked, with a small struck mark in the corner, on a graphite bench under a cool inspection lamp.]

The list is the surface. An absence has none of its own, so you give it one, and walk each item: is this present? Show me the line.

The technique has an unglamorous name, and the unglamorousness is the whole problem with adopting it. It’s traceability. You take the things the change was supposed to do — the requirements, the cases, the guards the intent implied — and you put them on a checkable artifact outside the diff, and you walk each one: is this present? where? show me the line. You convert the unanswerable question — “is anything missing?” — into a series of answerable ones: “does each item on my list appear in the code?” An absence has no surface, so you give it one. The list is the surface.
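The walk itself is small enough to sketch. What follows is one possible shape, not a prescribed tool: `IntentItem`, `Evidence`, `walk_intent`, and every key, path, and line number are hypothetical names I made up for the example. The intent list is written outside the diff; the reviewer fills in evidence while reading; anything without a line to point at is reported as missing rather than silently skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentItem:
    key: str
    description: str


@dataclass(frozen=True)
class Evidence:
    path: str
    line: int


def walk_intent(items, evidence):
    """Answer 'is this present? show me the line' for every item on the list."""
    present, missing = {}, []
    for item in items:
        found = evidence.get(item.key)
        if found is None:
            missing.append(item)  # no line to point at: the absence gets a surface
        else:
            present[item.key] = found
    return present, missing


# Written before reading the diff: what the change was supposed to do.
INTENT = [
    IntentItem("perm-check", "Only the owner may delete a document"),
    IntentItem("empty-list", "Empty input returns no result instead of raising"),
    IntentItem("second-timeout", "A retry that times out again is handled"),
]

# Filled in by the reviewer while reading; paths and line numbers are made up.
EVIDENCE = {
    "empty-list": Evidence("stats/latency.py", 12),
    "second-timeout": Evidence("net/retry.py", 48),
}

present, missing = walk_intent(INTENT, EVIDENCE)
for item in missing:
    print(f"MISSING  {item.key}: {item.description}")
```

The design choice that matters is that the loop runs over the intent list, not over the diff. An item with no evidence cannot fall out of the report, because the report is built from what should exist.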

The inspection literature has the one real number here, and it travels with a chaperone. Porter’s work at Maryland found that defect-class-targeted scenario reading — reviewers each hunting a specific class of fault, including one scenario aimed squarely at “Missing or Ambiguous Functionalities” — beat ad hoc and generic-checklist reading by “an improvement of about 35%” (Porter, Votta & Basili, via Maryland handout). But I have to carry two things with that figure or not carry it at all. First, it’s a requirements-document review result — reading a spec, not a code diff — and nobody has reproduced it as a code-acceptance number. Second, it’s contested on its own turf: the related claim that perspective diversity drives the gain failed to replicate, and the Verification Economics side of this research reads the evidence as showing the win comes from targeting a defect class, not from structured reading as such. A generic checklist, in Porter’s own results, was “no more effective than Ad Hoc” (Porter et al. fault study). So the honest version of the finding is narrow: aiming a reader at a specific class of absence helps; ritual structure for its own sake does not. The 35 percent is real, requirements-shaped, and disputed — which is exactly as much weight as I’ll put on it.

And here’s why the thesis doesn’t lean on that number anyway. Strip Porter out entirely and the answer still stands, because it doesn’t rest on his statistic. It rests on the part that is solid: review is structurally poor at absences (the cognitive mechanism is replicated, the defect class is real and large), and the only way anyone has ever caught an absence reliably is to externalize the intended items and check each one against the work. The fact-checking desk does it. The auditor does it. The number is a bonus; the instrument is the point.

I know how this lands on a senior, because it landed on me the same way. Requirement-by-requirement traceability sounds like the most junior work there is — the clipboard, the checklist, the thing you graduated out of two decades ago. It feels slow. It feels manual. It feels beneath the seat. That reflex is the trap, and watching the auditor and the fact-checker cured me of it. Neither of them treats “what’s missing?” as the menial last step. They treat it as the first question — the one the whole apparatus is built to answer, the one the most experienced person in the room owns. Tracing intent to artifact isn’t the apprentice’s chore. It’s the master technique for the one defect class your eye can’t see and the machine most reliably produces. The slowness is the cost of looking at the only place the bug could be hiding: the empty space the diff doesn’t draw.


The surface lied about what’s present; this break is the surface that can’t show what’s absent. The next one is about volume — there’s a size past which review stops working by construction, and the agent hands you that size on demand. That’s You Can’t Read It All, and the answer is the discipline auditors and aircraft inspectors built for exactly this: decide in advance, and in writing, what gets read in full and what gets sampled. (For the break before this one — why your at-a-glance read was reading the author, not the code — see The Surface Lies.)