Reviewing Output at Scale
Here's the honest problem: when agents are writing hundreds of lines across several tasks, you cannot read every line the way you'd review a ten-line diff. Pretending you can leads to either rubber-stamping (you approve without looking) or paralysis (you never trust anything). Neither ships.
The way through is to gate on a few layers instead of reading everything:
- Tests are the floor. No agent output merges without the relevant tests green and the type-checker clean. This is non-negotiable and it's automatic — the machine does this part.
- Spot-review the risky parts. You can't read all of it, so read the parts where blast radius is high: anything touching auth, money, data deletion, or external calls. Skim the rest for the red flags from the building-features chapter — files it shouldn't have touched, things quietly deleted, a tiny ask that produced a huge diff.
- Read the diff, not the codebase. `git diff --stat` tells you what changed and how much in seconds. A task scoped to one module that shows changes in six is a signal to look harder, before you read a single line.
- Run it. A green test suite is not a working app. Click the thing. The final check is always the same one from every other chapter: does it actually do what you asked, in front of you?
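The diff-level checks above can be partly scripted, so the warning signs show up before you open a single file. What follows is a minimal sketch in Python, not a finished tool. It assumes you feed it the output of `git diff --numstat` (the machine-readable sibling of `--stat`) and that the task was scoped to one directory. The names `RISKY_MARKERS`, `parse_numstat`, `triage`, and `numstat_from_git`, along with the thresholds, are made up for illustration; tune them to your own repo.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical path fragments that suggest high blast radius.
# These are assumptions for illustration; replace them with your own.
RISKY_MARKERS = ("auth", "billing", "payment", "delete", "migration")


@dataclass
class FileChange:
    path: str
    added: int
    deleted: int


def parse_numstat(text: str) -> List[FileChange]:
    """Parse `git diff --numstat` output: added<TAB>deleted<TAB>path per line."""
    changes = []
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        added, deleted, path = parts
        # Binary files report "-" for both counts; count them as 0 lines.
        changes.append(
            FileChange(
                path=path,
                added=int(added) if added.isdigit() else 0,
                deleted=int(deleted) if deleted.isdigit() else 0,
            )
        )
    return changes


def triage(
    changes: List[FileChange], scope_prefix: str, max_lines: int = 400
) -> Dict[str, object]:
    """Flag the red flags: out-of-scope files, risky paths, heavy deletions, size."""
    total = sum(c.added + c.deleted for c in changes)
    return {
        # Files the task should not have touched.
        "out_of_scope": [c.path for c in changes if not c.path.startswith(scope_prefix)],
        # Files to read line by line rather than skim.
        "risky": [
            c.path for c in changes if any(m in c.path.lower() for m in RISKY_MARKERS)
        ],
        # Files where much more was removed than added (arbitrary cutoff of 20 lines).
        "heavy_deletes": [
            c.path for c in changes if c.deleted > c.added and c.deleted >= 20
        ],
        "total_lines": total,
        # A tiny ask that produced a huge diff.
        "too_big": total > max_lines,
    }


def numstat_from_git(base: str = "main") -> str:
    """Run git in the current repo and return the raw numstat text against `base`."""
    result = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a real repo you might call `triage(parse_numstat(numstat_from_git("main")), scope_prefix="src/reports/")` and read anything listed under `risky` or `out_of_scope` first. The script only tells you where to look. It doesn't replace the reading, and it doesn't replace running the app.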
Scoping pays off twice here. A task you scoped tightly produces a diff you can actually review; a task you set loose produces one you can't. The review problem is mostly solved upstream, in the brief.