ABC Corp is a fictional company. Every name, number and date is invented. This is a reference artifact generated with an LLM coding agent; the brief that produces it is at the bottom of this page.
The duplicates were caught by the scheduled reconciliation check before the dispatch window opened: no customer ever saw a duplicate invoice and no customer communication was required.
Primary orders database fails over to the standby after a storage fault; application connections reset for 41 seconds.
batch-billing retry wrapper re-runs the invoice-generation stage for segments 7–12 after connection errors. The retry carries no idempotency key.
Run resumes against the new primary; re-inserted drafts for segments 7–12 are written as new rows. Duplicates begin accumulating silently.
Batch run completes with status SUCCESS. Row-count delta vs the previous night is +4.1%, inside the alerting threshold, so no alert fires.
Scheduled morning reconciliation check starts, comparing draft invoices against source orders.
Reconciliation flags 1,842 drafts sharing an order_ref with an existing draft from the same run. Pager alert sent to billing on-call.
On-call (M. Tanaka) acknowledges, confirms the pattern is run-wide, declares SEV-2 and opens INC-2417.
billing-ops and order-gateway on-call (J. Lindqvist) join the bridge. Outbound invoice dispatch queue paused as a precaution; dispatch window opens at 07:00, so customer exposure is zero while paused.
Duplicates traced to the 02:15 retry of segments 7–12 following the database failover. Duplicate set fully identified by (order_ref, run_id) pairs.
Cleanup script dry-run against a staging copy of the run output: 1,842 drafts matched, 0 false positives. Script and match-list peer-reviewed on the bridge.
Duplicates soft-deleted in production. Reconciliation re-run completes clean: draft count matches source orders exactly.
Dispatch queue resumed, well before the 07:00 window. All-clear posted to the incident channel; INC-2417 moved to monitoring, closed after the 07:00 dispatch ran clean.
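The cleanup step in the timeline (identify duplicates by (order_ref, run_id), dry-run, then soft-delete) could look roughly like the sketch below. It is illustrative only, not the script used on the bridge. The table name `draft_invoices`, the columns `id` and `deleted_at`, and the rule of keeping the earliest row per pair are all assumptions made for the example.

```python
import sqlite3

RUN_ID = "2026-06-03T02:00"  # run_id taken from the alert log


def make_demo_db():
    """Build a tiny in-memory table with two duplicated drafts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE draft_invoices ("
        "id INTEGER PRIMARY KEY, order_ref TEXT, run_id TEXT, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO draft_invoices (order_ref, run_id) VALUES (?, ?)",
        [("A-1", RUN_ID), ("A-2", RUN_ID), ("A-1", RUN_ID), ("A-2", RUN_ID), ("A-3", RUN_ID)],
    )
    conn.commit()
    return conn


def find_duplicate_drafts(conn, run_id):
    """Return ids of live drafts that repeat an (order_ref, run_id) pair.

    The earliest row of each pair is kept (an assumption); later rows are duplicates.
    """
    rows = conn.execute(
        """
        SELECT d.id FROM draft_invoices AS d
        WHERE d.run_id = ?
          AND d.deleted_at IS NULL
          AND d.id > (
              SELECT MIN(k.id) FROM draft_invoices AS k
              WHERE k.order_ref = d.order_ref
                AND k.run_id = d.run_id
                AND k.deleted_at IS NULL
          )
        ORDER BY d.id
        """,
        (run_id,),
    ).fetchall()
    return [row[0] for row in rows]


def soft_delete(conn, ids, dry_run=True):
    """Mark drafts as deleted; with dry_run=True nothing is written, only counted."""
    if not dry_run:
        conn.executemany(
            "UPDATE draft_invoices SET deleted_at = CURRENT_TIMESTAMP WHERE id = ?",
            [(i,) for i in ids],
        )
        conn.commit()
    return len(ids)
```

Running `find_duplicate_drafts` again after the real delete doubles as the "reconciliation re-run completes clean" check: it should return an empty list.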
# batch-billing · application log · 3 Jun 2026 (fictional excerpt)
02:14:52 ERROR db.pool connection reset by peer (orders-primary); 38 conns dropped
02:14:55 WARN retry.wrapper stage=invoice-generation attempt=1 failed: ConnectionError
02:15:09 INFO db.pool reconnected to orders-primary (now: standby-2, post-failover)
02:15:10 WARN retry.wrapper stage=invoice-generation attempt=2 starting (segments 7-12)
02:15:10 WARN retry.wrapper no idempotency key configured for stage; re-running whole stage
02:58:41 INFO run.controller run 2026-06-03T02:00 finished status=SUCCESS rows=46,217 (+4.1%)
# reconciliation-check · alert log · 3 Jun 2026 (fictional excerpt)
04:36:02 ALERT recon.invoice 1,842 draft invoices share order_ref with another draft
04:36:02 ALERT recon.invoice affected segments: 7,8,9,10,11,12 · run_id=2026-06-03T02:00
04:36:03 INFO recon.invoice dispatch queue state: HELD until 07:00 window · no drafts sent
04:36:04 INFO pager paging billing on-call · policy=sev-undetermined
Root cause: a stage-level retry without an idempotency key, executed across a database failover. This is a process gap (no idempotency standard for retried writes), not an individual error.
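To make the root cause concrete, here is a minimal sketch of an idempotent write, assuming a uniqueness constraint on (run_id, order_ref) and SQLite's `INSERT OR IGNORE`. The table `draft_invoices` and the `amount_cents` column are hypothetical; the actual change in batch-billing may be implemented differently.

```python
import sqlite3


def open_store():
    """Create an in-memory store whose schema rejects a second (run_id, order_ref)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE draft_invoices (
            run_id       TEXT    NOT NULL,
            order_ref    TEXT    NOT NULL,
            amount_cents INTEGER NOT NULL,
            UNIQUE (run_id, order_ref)  -- the idempotency key
        )
        """
    )
    return conn


def write_draft(conn, run_id, order_ref, amount_cents):
    """Insert a draft; a repeat of the same (run_id, order_ref) is a no-op.

    Returns True if a row was written, False if the write was ignored.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO draft_invoices (run_id, order_ref, amount_cents) "
        "VALUES (?, ?, ?)",
        (run_id, order_ref, amount_cents),
    )
    conn.commit()
    return cur.rowcount == 1


def count_drafts(conn):
    """Number of draft rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM draft_invoices").fetchone()[0]
```

With a key like this in place, re-running a whole stage after a connection reset writes each draft at most once, so a retry across a failover cannot create a second row for the same order.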
| Action | Owner | Due | Status |
|---|---|---|---|
| Add idempotency key (run_id, order_ref) to invoice writes; re-inserts become no-ops (batch-billing#498). | M. Tanaka | 12 Jun 2026 | DONE |
| Change the retry wrapper to refuse automatic retry of stages not marked idempotent; such stages now halt for operator decision. | J. Lindqvist | 10 Jun 2026 | DONE |
| Add mid-run duplicate-reference alerting so duplicates page on-call within minutes, not at the 04:30 reconciliation. | R. Okafor | 11 Jun 2026 | DONE |
| Run a failure drill: trigger a database failover during a full staging batch run and verify zero duplicates end-to-end. | S. de Wit | 03 Jul 2026 | OPEN |
| Add "retried writes must be idempotent" to the engineering standards and the design-review checklist; sweep existing batch jobs for the same pattern. | A. Kowalczyk | 10 Jul 2026 | OPEN |
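The retry-wrapper change in the remediation list (refuse automatic retry of stages not marked idempotent) might be sketched as follows. The `Stage` class, the `idempotent` flag, and the `NeedsOperatorDecision` exception are hypothetical names chosen for illustration; they are not taken from the batch-billing codebase.

```python
from dataclasses import dataclass
from typing import Callable


class NeedsOperatorDecision(Exception):
    """Raised when a non-idempotent stage fails and must not be retried automatically."""


@dataclass
class Stage:
    name: str
    run: Callable[[], None]
    idempotent: bool = False  # stages must opt in to automatic retry


def run_with_retry(stage: Stage, max_attempts: int = 3) -> int:
    """Run a stage, retrying on ConnectionError only if it is marked idempotent.

    Returns the number of attempts used on success.
    """
    attempt = 1
    while True:
        try:
            stage.run()
            return attempt
        except ConnectionError as exc:
            if not stage.idempotent:
                # Halt instead of re-running writes that may already have landed.
                raise NeedsOperatorDecision(
                    f"stage {stage.name!r} failed and is not marked idempotent"
                ) from exc
            if attempt >= max_attempts:
                raise
            attempt += 1
```

Defaulting `idempotent` to `False` is a deliberate choice in this sketch: a stage that nobody has reviewed halts for a human rather than silently re-running.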
No customer impact. All 1,842 duplicate drafts were detected and removed before the 07:00 dispatch window opened. Zero duplicate invoices were sent, no customer data was disclosed, and no customer notification or remediation was required. Financial records were reconciled to source before close of the run.
| Control | Type | Outcome |
|---|---|---|
| Idempotency on retried writes | Preventive | FAILED — absent; root cause |
| Row-count delta alerting | Detective | MISSED — +4.1% sat inside threshold |
| Scheduled reconciliation check | Detective | WORKED — caught all 1,842 |
| Dispatch-window hold | Containment | WORKED — zero customer exposure |
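The contrast between the two detective controls can be illustrated with a small sketch. A relative row-count check stays quiet whenever the delta sits inside its threshold, while a direct duplicate-reference check fires on any repeated order_ref. The 5% threshold and the toy data below are assumptions for the example, not ABC Corp's real configuration or row counts.

```python
from collections import Counter

DELTA_THRESHOLD = 0.05  # hypothetical alerting threshold (5%)


def row_count_delta(current_rows, previous_rows):
    """Relative change in row count versus the previous run."""
    return (current_rows - previous_rows) / previous_rows


def delta_alert(current_rows, previous_rows, threshold=DELTA_THRESHOLD):
    """True if the row-count change is large enough to alert."""
    return abs(row_count_delta(current_rows, previous_rows)) > threshold


def duplicate_refs(order_refs):
    """Return the order_refs that appear more than once in a run's drafts."""
    counts = Counter(order_refs)
    return sorted(ref for ref, n in counts.items() if n > 1)


# Toy run: 1,000 genuine drafts plus 41 re-inserted ones (a +4.1% delta).
refs = [f"ORD-{i:04d}" for i in range(1000)] + [f"ORD-{i:04d}" for i in range(41)]
```

On this toy run, `delta_alert(len(refs), 1000)` stays silent while `duplicate_refs(refs)` returns 41 references, which is why the mid-run duplicate-reference alert does not depend on tuning a threshold.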
3 Jun 2026): included above
(order_ref, run_id): 1,842 rows, peer-reviewed on the bridge
batch-billing#498
INC-2417
LH-731)

This review examined process, not people. The retry wrapper behaved exactly as designed, the on-call response was fast and careful, and the reconciliation safety net did its job; the gap was a missing standard, an unowned backlog item, and an alert that only ran once a day. Those are system properties, and the remediation list above fixes the system. No individual action contributed to this incident, and none of the follow-ups are directed at a person.
From this incident channel export and my timeline notes [paste], build a single-file HTML post-mortem for INC-2417: impact summary cards, a minute-by-minute timeline filterable by phase (detect/escalate/mitigate/resolve), log excerpts, a five-whys root cause chain, and a remediation checklist with owners and due dates. Blameless tone. One file; it becomes the archived incident record.
Paste the brief into any capable LLM: GPT, Claude, Gemini, Grok, DeepSeek, or the assistant your company
provides. Iterate a few rounds on layout and content until it reads well. Save the final answer as a
.html file and open it in any browser. Expect similar output, not identical: every model has its
own taste, and that is fine.
This reference artifact was built with Claude Code, an LLM coding agent, over several iterations. Treat it as the bar to aim for, not as a guaranteed first answer. All data on this page is fictional.