Where it fails

Be honest about the limits, because this is where unsupervised vibe coding burns people:

Deep, novel logic. Subtle algorithms, concurrency, anything where being 95% right means being broken. The model is confident even when wrong.
Sprawling cross-file changes. It loses the thread across a large codebase and quietly breaks things it can't see.
Security and money. Auth, payments, permissions, anything where a plausible-looking bug has real consequences. Review these like your job depends on it, because it might.
Underspecified problems. If you can't say what correct looks like, the AI can't work it out for you. Garbage spec in, garbage code out.

The failure mode is almost always the same: code that looks right, runs on the happy path, and is wrong in a way you'd have caught if you'd read it. The fix is the review step. There is no skipping it.
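
To make that failure mode concrete, here is a small illustrative sketch. It is not taken from any real codebase, and `split_evenly` and `split_evenly_fixed` are hypothetical names. The first function passes a happy-path check and still loses a cent when the total doesn't divide cleanly, which is the kind of thing a read-through tends to catch:

```python
def split_evenly(total_cents: int, people: int) -> list[int]:
    """Plausible-looking version: everyone pays an equal share."""
    share = total_cents // people  # integer division silently drops the remainder
    return [share] * people


def split_evenly_fixed(total_cents: int, people: int) -> list[int]:
    """Reviewed version: leftover cents go to the first few people."""
    share, remainder = divmod(total_cents, people)
    return [share + 1 if i < remainder else share for i in range(people)]


# Happy path: 900 cents across 3 people divides cleanly, so both versions agree.
assert sum(split_evenly(900, 3)) == 900
assert sum(split_evenly_fixed(900, 3)) == 900

# Edge case: 1000 cents across 3 people. The first version loses a cent.
assert sum(split_evenly(1000, 3)) == 999
assert sum(split_evenly_fixed(1000, 3)) == 1000
```

Nothing crashes and no error is raised in the buggy version. The only signal is a total that no longer adds up.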

Two questions place any task on this map — is the pattern common, and is the result cheap to check:

                  EASY TO VERIFY      HARD TO VERIFY
                ┌──────────────────┬──────────────────┐
   COMMON       │   PURE UPSIDE    │   go slower,     │
   PATTERN      │   (scaffold,     │   demand tests   │
                │    boilerplate)  │                  │
                ├──────────────────┼──────────────────┤
   RARE /       │   read it, but   │   DANGER ZONE    │
   NOVEL        │   doable         │   (novel logic,  │
                │                  │    auth, money)  │
                └──────────────────┴──────────────────┘
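
The grid can also be written down as a tiny lookup, which may be handy as a pre-flight checklist. This is only a sketch: `place_task` and its two boolean parameters are assumed names, and the returned strings are simply the quadrant labels from the diagram.

```python
def place_task(common_pattern: bool, easy_to_verify: bool) -> str:
    """Map the two questions onto the four quadrants of the grid."""
    if common_pattern and easy_to_verify:
        return "pure upside"              # scaffold, boilerplate
    if common_pattern:
        return "go slower, demand tests"  # common pattern, hard to verify
    if easy_to_verify:
        return "read it, but doable"      # rare or novel, easy to verify
    return "danger zone"                  # novel logic, auth, money


# Example: a standard CRUD scaffold vs. a hand-rolled permissions check.
assert place_task(common_pattern=True, easy_to_verify=True) == "pure upside"
assert place_task(common_pattern=False, easy_to_verify=False) == "danger zone"
```

Answering the two questions honestly is the hard part. The lookup itself is trivial on purpose.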

Notice the inverse of the rule above: the failure cases are the ones that are rare, subtle, or expensive to verify. Either the pattern is uncommon, so the model is guessing, or the bug is invisible on the happy path, so a quick check won't catch it. That doesn't mean you avoid AI here. It means you slow down, shrink the steps, demand tests, and read every line like it's hostile.
