Question 1

Why not just ask the AI if the feature works?

Accepted Answer

Because the agent's idea of 'looks done' is documented to be looser than yours. Anthropic's own best-practices guide for Claude Code says: 'Claude stops when the work looks done.' The ladder is the procedure for closing the gap between 'looks done' and 'is done' when you cannot read the code yourself.

Question 2

Do I need to know how to code to run the ladder?

Accepted Answer

No. Four of the five rungs require only that you can click around your own product and read a list of filenames. The fifth rung asks the agent to show you evidence in a shape you can read (a screenshot, an output paste, a line of code), so you can see something is missing without having to write the missing thing yourself.

Question 3

How is this different from running tests?

Accepted Answer

Tests are automated, written by engineers, and scoped to the code an engineer thought to test. The verification ladder is manual, end-to-end, and run by the person who owns the product. The two catch different classes of bug. The ladder is especially good at catching regressions in flows nobody thought to write a test for, which in an app built by a non-engineer is most flows.
