Reviewing Output at Scale

Here's the honest problem: when agents are writing hundreds of lines across several tasks, you cannot read every line the way you'd review a ten-line diff. Pretending you can leads to either rubber-stamping (you approve without looking) or paralysis (you never trust anything). Neither ships.

The way through is to gate on a few layers instead of reading everything:

Tests are the floor. No output merges without the relevant tests green and the type-checker clean. This is non-negotiable and it's automatic — the machine does this part.
Spot-review the risky parts. You can't read all of it, so read the parts where blast radius is high: anything touching auth, money, data deletion, or external calls. Skim the rest for the red flags from the building-features chapter — files it shouldn't have touched, things quietly deleted, a tiny ask that produced a huge diff.
Read the diff, not the codebase. git diff --stat tells you what changed and how much in seconds. A task scoped to one module that shows changes in six is a signal to look harder, before you read a single line.
Run it. A green test suite is not a working app. Click the thing. The final check is always the same one from every other chapter: does it actually do what you asked, in front of you?
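
The diff-first check above is easy to script. What follows is a minimal sketch, not a finished tool: it reads `git diff --numstat` (the machine-readable sibling of `--stat`) and reports files outside the task's scope, paths that look high-risk, files that only lost lines, and an oversized total. The keyword list, the `scope_prefix` argument, the 200-line threshold, and the `main` base branch are all assumptions made for illustration; swap in whatever fits your repo.

```python
import subprocess
from dataclasses import dataclass

# Hypothetical markers for high-blast-radius paths. Crude substring match:
# "auth" also matches "author", so treat hits as prompts to look, not verdicts.
RISKY_KEYWORDS = ("auth", "payment", "billing", "delete", "migration")


@dataclass
class FileChange:
    path: str
    added: int
    deleted: int


def numstat_from_git(base: str = "main") -> str:
    """Return `git diff --numstat` output for the current branch against `base`."""
    result = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_numstat(text: str) -> list[FileChange]:
    """Parse numstat rows of the form 'added<TAB>deleted<TAB>path'."""
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report '-' for both counts; count them as zero lines.
        changes.append(FileChange(
            path,
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
        ))
    return changes


def triage(changes: list[FileChange], scope_prefix: str, max_lines: int = 200) -> dict:
    """Collect reasons to look harder before reading a single line of code."""
    total = sum(c.added + c.deleted for c in changes)
    return {
        # Files it shouldn't have touched.
        "out_of_scope": [c.path for c in changes
                         if not c.path.startswith(scope_prefix)],
        # Where the blast radius is high.
        "risky": [c.path for c in changes
                  if any(k in c.path.lower() for k in RISKY_KEYWORDS)],
        # Things quietly deleted.
        "only_deletions": [c.path for c in changes
                           if c.deleted > 0 and c.added == 0],
        # A tiny ask that produced a huge diff.
        "total_lines": total,
        "oversized": total > max_lines,
    }
```

In a repo you would call something like `triage(parse_numstat(numstat_from_git()), scope_prefix="reports/")` and read whatever it flags first. It tells you where to look, nothing more; the tests, the spot-review, and running the thing yourself still apply.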

Scoping pays off twice here. A task you scoped tightly produces a diff you can actually review; a task you set loose produces one you can't. The review problem is mostly solved upstream, in the brief.
