What kept it honest in production
Architecture is what you ship. Maturity is what survives the first six months. The system above did not run cleanly out of the box. It ran cleanly after eight specific incidents were caught, diagnosed, and codified into rules that prevented their recurrence. The rules are the actual deliverable.
The follow-up engine fired on a Sunday once because the cron-level weekday guard had been added on the sender but not the sequencer. Twenty-some bumps went out at three in the afternoon to people who had not asked to hear from anyone over a weekend. The fix was not the cron line. The fix was a code-level weekday guard at the top of every script that touches outbound, on the principle that any business rule which spans multiple entry points has to be enforced in the language the scripts are written in, not in the schedule that runs them.
The performance sync started failing every four hours after a deliberate scale-up, because it was polling the send-engine events one message at a time. At small volumes, fine. At the new volume, it ran into the engine's per-tag rate limit, sat in exponential backoff for hours, blew through a stale Airtable pagination token, and tripped an fcntl lock on the next run. The fix was structural — replace per-message polling with a single bulk events query paginated by date, group by message ID in process, and reconcile in one pass. Steady-state load dropped by about an order of magnitude. The lock collisions disappeared. The lesson was that any per-record loop over a rate-limited API is a bomb on a long fuse the moment volume changes.
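The bulk-and-group shape can be sketched as below. The `fetch_page` callable is a stand-in for whatever bulk events endpoint the send engine exposes — its signature and the event fields are assumptions for illustration — but the structure is the fix: one paginated pass over the API, all grouping done in memory:

```python
from collections import defaultdict
from typing import Callable


def sync_events(fetch_page: Callable, since: str) -> dict:
    """One bulk pass over the events feed instead of one call per message.

    fetch_page(since, cursor) -> (events, next_cursor) is a hypothetical
    wrapper around the engine's date-paginated bulk events query; each
    event is a dict carrying at least 'message_id' and 'type'.
    """
    by_message = defaultdict(list)
    cursor = None
    while True:
        events, cursor = fetch_page(since, cursor)
        for ev in events:
            # Group in process: the API is only ever asked for pages,
            # never for individual messages.
            by_message[ev["message_id"]].append(ev["type"])
        if cursor is None:
            break
    return dict(by_message)
```

The reconciliation step then walks `by_message` once against the local records, so the number of API calls scales with the date range, not with the number of messages.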
A research-API quota ran out one evening and the pipeline aborted partway through a batch. Half the leads got drafted, half did not, the sender dutifully tried to send the half that did, and the only sign that anything was wrong was the run of bounce-style empty subjects the operator found the next morning. The fix was a small set of error-message markers — `insufficient_quota`, `billing_hard_limit`, `invalid_api_key` — checked on every model call, with a hard system-exit on first match so the wrapping shell harness could surface the failure to Telegram immediately. Silent quota exhaustion is the worst kind of outage, because everything else looks fine.
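The marker check is small enough to sketch almost whole. The three marker strings are from the incident; the function name and the exact exit message are illustrative:

```python
import sys

# Substrings that mean the key or quota is dead, not that the call was
# transiently unlucky. Retrying past any of these only burns time.
FATAL_MARKERS = ("insufficient_quota", "billing_hard_limit", "invalid_api_key")


def check_fatal(error_text: str) -> None:
    """Run against the error text of every failed model call.

    Exits the process on first match so the wrapping shell harness sees
    a nonzero status and can alert immediately, instead of the batch
    limping on half-drafted.
    """
    for marker in FATAL_MARKERS:
        if marker in error_text:
            sys.exit(f"fatal API error ({marker}): aborting batch")
```

The deliberate crudeness is the feature: a substring match on the error body needs no SDK-specific exception taxonomy, and it fails loud the moment the provider's wording shows up.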
The follow-up engine was, for a time, ignoring local business hours. The cron schedule was UTC. The recipients were in twenty time zones. The fix was a small country-to-IANA-timezone resolver and a per-lead `in_business_hours()` gate inserted before the LLM draft call, so a follow-up to a Singapore recipient at three in the morning their time would skip silently and re-evaluate on the next hourly tick. Off-hours sends do not just damage open rates; they cost real model tokens that should never have been spent.
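A minimal version of that gate, assuming a country-code lookup table (the table fragment, the 9-to-5 window, and the fail-open default for unmapped countries below are all assumptions, not the author's stated policy):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical fragment of the country-to-IANA-timezone resolver.
COUNTRY_TZ = {"SG": "Asia/Singapore", "DE": "Europe/Berlin", "US": "America/New_York"}


def in_business_hours(country: str, now_utc: datetime,
                      start: int = 9, end: int = 17) -> bool:
    """True if it is currently within business hours in the lead's country.

    Runs before the LLM draft call: a False here skips the lead silently
    and lets the next hourly tick re-evaluate, so no tokens are spent on
    a draft that would land at 3 a.m. local time.
    """
    tz = COUNTRY_TZ.get(country)
    if tz is None:
        return True  # assumed fail-open default for unmapped countries
    local = now_utc.astimezone(ZoneInfo(tz))
    return start <= local.hour < end
```

Putting the gate before the draft call rather than before the send is the token-saving part: the expensive step is never reached for a lead that is asleep.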
Most recently, a recipient replied with a flat denial — "I don't work for that company." Pointing the same investigation he had triggered back at his own record showed the reply was a false alarm: he was correctly attributed, and his note was scope-deflection rather than a data error. But pulling on it surfaced the broader problem his record represented. The pipeline had never compared the email's domain to the company URL on the record, and across the full list, almost two percent of records were attributed to companies the people in question did not work for. Fifteen dollars of language-model verification, four hundred and fifty-three confirmed-bad records pulled out of the active pipeline, and a three-tier verification gate — deterministic domain comparison, a corpus-frequency allowlist of corporate alt-domains, an LLM fallback for the residual — that now runs on every new lead before any draft is generated. The cost of verifying the input was small. The cost of sending into a wrong-attributed list was the part the operator did not want to keep paying.
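The three tiers compose into one short gate. In the sketch below the allowlist entry, the function names, and the `llm_check` callable are illustrative stand-ins — the allowlist in the real system was built from corpus frequency, and the fallback is a model call — but the ordering is the design: deterministic first, cheap lookup second, the model only for the residual:

```python
from urllib.parse import urlparse

# Hypothetical allowlist of (company_domain, email_domain) pairs that the
# corpus-frequency pass confirmed as corporate alt-domains.
ALT_DOMAIN_ALLOWLIST = {("acme.com", "acmecorp.io")}


def verify_attribution(email: str, company_url: str, llm_check) -> str:
    """Three-tier attribution gate, run on every new lead before drafting.

    Returns 'ok' or 'suspect'. llm_check(email, company_url) -> bool is a
    stand-in for the language-model verification tier.
    """
    email_domain = email.rsplit("@", 1)[-1].lower()
    company_domain = urlparse(company_url).netloc.lower().removeprefix("www.")
    # Tier 1: deterministic — exact or subdomain match needs no model.
    if email_domain == company_domain or email_domain.endswith("." + company_domain):
        return "ok"
    # Tier 2: known corporate alt-domains, a set lookup.
    if (company_domain, email_domain) in ALT_DOMAIN_ALLOWLIST:
        return "ok"
    # Tier 3: only the residual ever costs a model call.
    return "ok" if llm_check(email, company_url) else "suspect"
```

The tier ordering is what made the fifteen-dollar bill possible: the deterministic and allowlist tiers absorb the bulk of the list for free, so the model is only paid for the genuinely ambiguous remainder.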