Where it fails
Be honest about the limits, because this is where unsupervised vibe coding burns people:
- Deep, novel logic. Subtle algorithms, concurrency, anything where being 95% right means being broken. The model is confident even when wrong.
- Sprawling cross-file changes. It loses the thread across a large codebase and quietly breaks things it can't see.
- Security and money. Auth, payments, permissions, anything where a plausible-looking bug has real consequences. Review these like your job depends on it, because it might.
- Underspecified problems. If you don't know what correct looks like, the AI can't read your mind. Garbage spec in, garbage code out.
The failure mode is almost always the same: code that looks right, runs on the happy path, and is wrong in a way you'd have caught if you'd read it. The fix is the review step. There is no skipping it.
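That failure mode can be made concrete with a small sketch. The function names here (`split_cents_naive`, `split_cents`) are hypothetical and chosen only for illustration. The naive version reads fine and passes the obvious test, but it loses a cent whenever the total does not divide evenly. That is the kind of bug a read-through or one edge-case test would catch.

```python
def split_cents_naive(total_cents: int, people: int) -> list[int]:
    # Looks right, and is right whenever the total divides evenly.
    share = total_cents // people
    return [share] * people


def split_cents(total_cents: int, people: int) -> list[int]:
    # Hand the remainder out one cent at a time so no money disappears.
    share, remainder = divmod(total_cents, people)
    return [share + 1 if i < remainder else share for i in range(people)]


# Happy path: both versions agree, so a quick check passes.
assert split_cents_naive(900, 3) == [300, 300, 300]
assert split_cents(900, 3) == [300, 300, 300]

# Edge case: the naive version silently drops a cent.
assert sum(split_cents_naive(1000, 3)) == 999
assert sum(split_cents(1000, 3)) == 1000
```

The second pair of assertions is the review step in miniature: it checks an invariant (the shares sum to the total) rather than one convenient input.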
Two questions place any task on this map — is the pattern common, and is the result cheap to check:
             EASY TO VERIFY     HARD TO VERIFY
          ┌──────────────────┬──────────────────┐
COMMON    │ PURE UPSIDE      │ go slower,       │
PATTERN   │ (scaffold,       │ demand tests     │
          │ boilerplate)     │                  │
          ├──────────────────┼──────────────────┤
RARE /    │ read it, but     │ DANGER ZONE      │
NOVEL     │ doable           │ (novel logic,    │
          │                  │ auth, money)     │
          └──────────────────┴──────────────────┘
Notice that the failure cases are the inverse of the rule above: they are rare, subtle, or expensive to verify. Either the pattern is uncommon, so the model is guessing, or the bug is invisible on the happy path, so a quick check won't catch it. That doesn't mean you avoid AI here. It means you slow down, shrink the steps, demand tests, and read every line like it's hostile.
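The two questions can also be written down as a tiny lookup, which some people find easier to apply than a diagram. This is only a sketch: `review_posture` and its parameters are hypothetical names, and the returned strings paraphrase the four quadrants.

```python
def review_posture(common_pattern: bool, easy_to_verify: bool) -> str:
    """Map the two questions onto the four quadrants of the map."""
    if common_pattern and easy_to_verify:
        return "pure upside: scaffold, boilerplate"
    if common_pattern:
        # Common pattern, but hard to verify.
        return "go slower, demand tests"
    if easy_to_verify:
        # Rare or novel, but cheap to check.
        return "read it, but doable"
    # Rare or novel, and hard to verify.
    return "danger zone: shrink the steps, demand tests, read every line"


# Example: novel auth logic is neither a common pattern nor cheap to check.
print(review_posture(common_pattern=False, easy_to_verify=False))
```

Answering both questions before prompting sets the review effort up front, instead of discovering it after something breaks.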