Most AI projects fail before the first line of code is written. Not because the model is wrong, but because the planning step that would have caught the failure never got run.
I know this because I’ve shipped enough of them to recognize the pattern. The idea arrives. The editor opens. The first ambiguity shows up within an hour and gets fudged. The second arrives within two and gets fudged again. By week one, what’s running is whatever the unconscious choices added up to, and the project’s actual definition lives only in my head. Some of it survives a code review. Most of it doesn’t survive a question from someone three months later.
What follows is the six-step process I now run before writing a line of code on any new system. None of the steps are clever. All of them are deliberate. Each one exists because skipping it has cost me real time on real projects, and writing it down was the cheapest way to make sure the same category of cost didn’t show up twice.
The six steps
1. Research — what already exists
I start with open-ended exploration of prior art, vendor options, and ecosystem context. A web-grounded research session in Perplexity, GitHub code search for similar implementations, primary vendor documentation. The output is a short written summary of what’s out there and which 80%-fit thing is worth adopting, porting, or wrapping. The point isn’t to be exhaustive. The point is to refuse to start from scratch when something within reach already solves most of the problem.
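To make the code-search leg concrete, here’s a minimal sketch of the kind of query I run, assuming a GITHUB_TOKEN in the environment and GitHub’s REST code-search endpoint; the query itself is a placeholder, not from a real project:

```python
# Sketch: survey prior art via GitHub code search before deciding to build.
# Assumes a GITHUB_TOKEN environment variable; the query is a placeholder.
import os
import requests

def search_prior_art(query: str, per_page: int = 10) -> list[dict]:
    """Return repo/path/url hits from GitHub's code-search endpoint."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": query, "per_page": per_page},
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"repo": it["repository"]["full_name"], "path": it["path"], "url": it["html_url"]}
        for it in resp.json()["items"]
    ]

# Placeholder query: an 80%-fit rate limiter probably already exists.
for hit in search_prior_art("token bucket rate limiter language:python"):
    print(f"{hit['repo']}: {hit['path']}")
```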
This step also names the things not to build. Most projects have at least one component that looks worth writing and turns out, on inspection, to already exist as a stable open-source library. The cost of finding that out before the build is an hour. The cost of finding out after is whatever the build cost.
2. Whitepaper — synthesize the brief
Then I write a single document, in plain language. What this is. Who it’s for. What it must do. What it must not do. What success looks like, in numbers. What failure looks like, in numbers. What’s in scope. What’s deliberately out of scope.
The whitepaper is the project’s contract with itself. I refine it until it survives a third reading without revisions, which usually means the third reading happens a day after the second. Writing it forces every assumption that would otherwise live in my head into a written artifact that can be challenged.
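The skeleton itself is mechanical. A minimal scaffold, assuming an arbitrary WHITEPAPER.md file name; the headings are the list above:

```python
# Sketch: scaffold an empty whitepaper with the sections above.
# WHITEPAPER.md is an arbitrary name; the headings are my convention.
from pathlib import Path

SECTIONS = [
    "What this is",
    "Who it's for",
    "What it must do",
    "What it must not do",
    "What success looks like, in numbers",
    "What failure looks like, in numbers",
    "In scope",
    "Deliberately out of scope",
]

def scaffold(path: str = "WHITEPAPER.md") -> None:
    """Write the empty skeleton; refuse to clobber an existing brief."""
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"{path} exists; edit it, don't regenerate it")
    body = "# Whitepaper\n\n" + "\n\n".join(f"## {s}\n\nTODO" for s in SECTIONS)
    target.write_text(body + "\n")

scaffold()
```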
3. Refinement — second eyes on the brief
The whitepaper goes through a different chat surface — a different model, ideally — for stress-testing. The prompt is simple: read this brief, name the ambiguities, name the assumptions that look defensible but aren’t, name what’s missing.
This step exists because the model that drafted the brief shares my blind spots. A different model, given only the document, will catch what the drafting model missed. The output is a punch list of edits. The whitepaper gets tighter or the project gets killed. Both are acceptable outcomes.
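Mechanically, the step is a single call. A sketch, assuming the openai Python package and an OPENAI_API_KEY in the environment; the model name is an assumption, and the only requirement is that it isn’t the model that drafted the brief:

```python
# Sketch: stress-test the brief with a model that didn't draft it.
# Assumes the openai package and OPENAI_API_KEY; the model name is an assumption.
from pathlib import Path
from openai import OpenAI

REFINE_PROMPT = (
    "Read this project brief. Name the ambiguities. Name the assumptions "
    "that look defensible but aren't. Name what's missing. Output a punch "
    "list, one specific item per line."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
brief = Path("WHITEPAPER.md").read_text()

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model, as long as it isn't the drafting one
    messages=[{"role": "user", "content": f"{REFINE_PROMPT}\n\n---\n\n{brief}"}],
)
print(response.choices[0].message.content)
```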
4. CLAUDE.md — anchor the context
I drop the locked-down brief into the project folder as a CLAUDE.md (or whatever convention the implementing agent reads). Every subsequent agent invocation starts with the full project context loaded — the same scope, the same constraints, the same definitions of done.
This step costs nothing. It exists because skipping it produces the most common build-time failure: the implementing agent forgets, mid-project, what was decided about something the brief covered explicitly. With the brief loaded as context on every invocation, that failure becomes structurally impossible.
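The mechanics are one file write. A sketch, assuming the locked brief lives in WHITEPAPER.md; the preamble wording is mine, and CLAUDE.md is the file Claude Code loads as project context:

```python
# Sketch: pin the locked brief as CLAUDE.md so every agent invocation
# starts from the same context. The preamble wording is mine.
from pathlib import Path

PREAMBLE = (
    "# Project context (locked brief)\n\n"
    "Everything below is settled scope. Do not re-litigate it; flag\n"
    "conflicts instead of silently deviating.\n\n---\n\n"
)

Path("CLAUDE.md").write_text(PREAMBLE + Path("WHITEPAPER.md").read_text())
```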
5. What-if audit — external red-team
A separate agent, given only the whitepaper, gets asked to break it. What fails first. What’s missing. Which dependency could be pulled and bring the whole thing down. What part of the plan looks defensible from the inside but won’t survive contact with reality.
The output is a second punch list. Some of it goes into the whitepaper as additions. Some of it kills the project. Most of it lands in between — small, specific, easy to dismiss in the moment, expensive to dismiss in retrospect.
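Same mechanics as step 3, different prompt and, deliberately, a different vendor. A sketch, assuming the anthropic Python package and an ANTHROPIC_API_KEY in the environment; the model name is an assumption:

```python
# Sketch: the red-team pass, given only the whitepaper. Assumes the
# anthropic package and ANTHROPIC_API_KEY; the model name is an assumption.
from pathlib import Path
import anthropic

AUDIT_PROMPT = (
    "You have only this brief; assume nothing else exists. Try to break it. "
    "What fails first? What's missing? Which dependency, if pulled, brings "
    "the whole thing down? What looks defensible from the inside but won't "
    "survive contact with reality? Output a punch list."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
brief = Path("WHITEPAPER.md").read_text()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"{AUDIT_PROMPT}\n\n---\n\n{brief}"}],
)
print(message.content[0].text)
```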
6. Pre-mortem — imagine it shipped and failed
Last step before any code. I imagine the project has shipped, run for six weeks, and is now being shut down as a failure. I walk through every plausible cause. For each, I decide what — in the brief, in the architecture, in the schedule — would have prevented it. Then I bake the prevention in now, not after the actual postmortem.
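The output wants structure, because every imagined cause needs both a prevention and a place to live. A sketch of the shape, with entries that are illustrative rather than from a real project:

```python
# Sketch: the pre-mortem as a forcing function. Every imagined cause of
# failure maps to a prevention and to where it gets baked in. Entries
# are illustrative, not from a real project.
from dataclasses import dataclass

@dataclass
class Finding:
    cause: str       # why the shipped project died at week six
    prevention: str  # what would have stopped it
    baked_into: str  # "brief", "architecture", or "schedule"

PREMORTEM = [
    Finding("vendor API quota exhausted under real load",
            "rate budget and backoff defined up front", "architecture"),
    Finding("success metric never instrumented, so decay went unnoticed",
            "metrics named in numbers and wired before launch", "brief"),
    Finding("integration testing squeezed out of the final week",
            "test window protected as its own line item", "schedule"),
]

for f in PREMORTEM:
    print(f"{f.baked_into:>12} | {f.cause} -> {f.prevention}")
```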
The pre-mortem is the cheapest insurance I’ve ever bought. It runs in an hour. It surfaces the failure modes that the optimism of building hides from the planner. The most expensive bug I’ve ever shipped lived in a paper-trade gating layer: until spreads were modeled honestly, the layer was telling me a strategy was profitable when realistic spreads said it was a money-loser. It got caught only because the gating layer existed. I wrote about that one in the trading-bot case study. The pre-mortem is the same idea, applied to the project itself rather than to one component.
Why these six and not others
These steps aren’t chosen because they’re rigorous in the abstract. They’re chosen because each one catches a specific category of failure I’ve been bitten by before:
- Step 1 catches the “I rebuilt something that already existed” failure.
- Steps 2 and 3 catch the “we never agreed on what this was supposed to do” failure.
- Step 4 catches the “the implementing agent forgot half of what was decided” failure.
- Step 5 catches the “we never tested the plan against an outside read” failure.
- Step 6 catches the “we never asked what would actually kill this” failure.
Someone with different scars would calibrate differently. The discipline isn’t running these specific six steps. The discipline is running a small, deliberate set of steps that together force every category of pre-build failure into a written artifact before any code is allowed.
What the artifacts cost
Total time, for a project of any reasonable scope: an evening to a weekend. The whitepaper takes the longest because it’s where the actual thinking happens. Everything else is comparatively cheap — a research summary is a few hours, the CLAUDE.md is a copy-paste from the locked brief, the audit and the pre-mortem are an hour each with a different model from the one doing the build.
The cost of skipping any one of them is the cost of a partial rebuild. I’ve paid that cost. The outbound campaign I rebuilt from a no-code stack was a system I planned this way after the fact, not before — and the eight production incidents documented in that case study split roughly evenly between failures the planning would have caught and failures only production could have surfaced. The catchable half cost me real time that step 5 or step 6 would have saved.
What this post is the start of
This is the overview. The next two posts in this series go deep on the phases that carry the most weight:
- Part 2 — The context phase. Why the whitepaper does more work than any of the other artifacts, and what it actually needs to contain to be useful as a contract rather than a wishlist.
- Part 3 — The stress-test phase. Why the what-if audit and the pre-mortem are the two steps most operators skip and most regret, and what good outputs look like for each.
The series can be read in any order. The discipline only works in this one.