The numbers came back in three passes.
The backtest, run against historical resolution data with mid-price fills, returned +18.4%. The first paper-trade pass, still using mid-price assumptions but against a live order-book feed, returned +14.7%. The second paper-trade pass, this time charging the spread at the price a real taker would actually pay, returned −3.1%. Average spread cost: 21.5% of position size on a daily binary market.
The bot was shelved the same week. Zero live capital cleared settlement. The full story is in the case study. What follows is the principle the project actually proved.
What a backtest measures
A backtest run at mid-price is a statement about the signal. It is not a statement about the business. The business is the signal minus every cost the world charges to act on it: spread, fees, latency, slippage, partial fills. None of those costs are visible to a backtest that fills at mid-price. All of them are paid in real life.
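The decomposition can be made concrete with a toy sketch. The function and cost names below are illustrative, not the project's actual accounting; the figures plugged in are the ones quoted above, with fees and slippage assumed zero for simplicity.

```python
# Toy sketch: a mid-price backtest reports the gross signal return.
# The business return is what survives after every execution cost is charged.
def net_return(gross_signal: float, costs: dict[str, float]) -> float:
    """Subtract each execution cost (as a fraction of position size)
    from the gross return a mid-price backtest reports."""
    return gross_signal - sum(costs.values())

# Figures quoted in the article: +18.4% gross, 21.5% average spread cost.
# Fees and slippage are assumed zero here purely to keep the sketch small.
costs = {"spread": 0.215, "fees": 0.0, "slippage": 0.0}
print(net_return(0.184, costs))  # negative: the spread alone eats the edge
```

On these assumptions the spread alone flips the sign, which is roughly the shape of the −3.1% result above.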
The +18.4% I had in the spreadsheet wasn’t the business. It was the upper bound of what the strategy could plausibly have returned if execution costs had been zero. They weren’t zero. The bot proved that for me before any of my money was at risk.
The thing I needed wasn’t a better backtest. It was a paper-trade layer that would charge the real costs honestly. Then, and only then, could I tell whether the signal had any business surviving execution.
The gate that did the work
The paper-trade layer was the first piece of the project I built, before any live-trading code. Live execution was structurally gated behind it: a position could not be opened with real capital until the paper layer logged a statistically meaningful positive return against honest execution assumptions.
The fill simulator walked the order book the same way a real taker order would. Fees, minimum tick, partial fills — modeled. Latency between decision and submission — modeled. The ledger, the settlement loop, and the reconciliation logic were exercised against paper fills for weeks before any of them would have been trusted with money.
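A minimal version of that walk looks something like the sketch below. The names and the book representation (a list of `(price, size)` ask levels) are assumptions for illustration, not the project's actual code; the point is that the fill price comes from consuming real levels, not from the mid.

```python
# Sketch of a taker-side fill simulator that walks the ask levels of an
# order book the way a real market buy would. Illustrative only.
def simulate_taker_buy(asks: list[tuple[float, float]], qty: float):
    """Walk price levels best-first, consuming liquidity until qty is
    filled. Returns (filled_qty, average_fill_price); partial fills occur
    when the book is too thin."""
    filled, cost = 0.0, 0.0
    for price, size in sorted(asks):      # best (lowest) ask first
        take = min(size, qty - filled)
        filled += take
        cost += take * price
        if filled >= qty:
            break
    avg_price = cost / filled if filled else None
    return filled, avg_price

# A thin book: the mid might sit near 0.50, but the taker pays the walked price.
asks = [(0.55, 100), (0.60, 50), (0.70, 200)]
filled, avg = simulate_taker_buy(asks, 120)
print(filled, avg)
```

Running this fills 100 at 0.55 and 20 at 0.60, so the average fill lands well above any mid-price assumption, which is exactly the gap between the first and second paper-trade passes.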
When the honest-spread run came in negative, the gate did its job. No override. No “one more backtest.” No tuning the cost model lower until the result improved. The bot was shelved the same week.
That’s the actual deliverable from this project. Not a running strategy — a gating discipline that now gets applied to every strategy idea before a line of live-trading code is written.
What the rule looks like
Three sub-principles, in order:
1. Build the gate before the live code. The gating layer is the first thing in the repo, not the last. If the gate doesn't exist yet, the live-execution code doesn't exist yet either.
2. Model the costs honestly. Whatever the real-world cost is — spread, latency, fee, partial fill, rate limit, recipient suppression — the simulation has to charge it. A simulation that's been run too kindly is propaganda, not evidence.
3. Decide pass/fail criteria up front. Before the simulation runs, write down what "this is good enough to go live" looks like, in numbers. Once the simulation runs, no overrides. Failed gate, finished strategy.
The forbidden move, the one that ends the discipline if you allow it once, is “let me run one more backtest.” Once you re-tune the parameters because the gate failed, you’ve stopped doing science and started doing storytelling. The next strategy you ship will be the one whose story was best, not the one whose signal was real.
The pattern beyond trading
The same shape applies anywhere the cost of being wrong is real and the cost of finding out you were wrong scales with how long you waited to check.
A new outbound campaign before scaling beyond a manual test, where “real cost” is sender reputation and brand damage if the offer doesn’t work — see the next post in this thread.
A new pricing model before the next contract negotiation, where “real cost” is foregone revenue if the price is too low and lost deals if it’s too high.
A new automated decision system before it touches customers, where “real cost” is whatever it costs to take an action against a customer that you’d later need to apologise for or refund.
In each case, the shape is the same: build the gate first, model the costs honestly, decide pass/fail criteria up front, no overrides. The rule isn’t specific to trading. Trading just happens to be the domain that punishes the lack of it most quickly and most legibly.
What the project actually shipped
A trading bot that didn’t run. Zero live capital lost. One durable rule: model the spread first, and if the spread eats the edge, the strategy is finished. Not iterated, not optimised — finished.
That rule has saved more time on subsequent projects than the bot ever could have made in the market. The +18.4% in the backtest was a number. The discipline of refusing to act on it is the deliverable.