Tombstone
Production intelligence layer for 5,000+ feature flags.
Tombstone is a self-hosted production intelligence layer for feature flags at scale — built to answer the question that every SRE asks at 2am but no flag system answers: which flag caused this incident, and can I roll it back safely right now?
At its core, Tombstone treats flags as causal agents in a live production system, not boolean configuration. It combines an 8-service polyglot backend (Go for performance, Python for ML, TypeScript for the management UI) with a circuit-breaker auto-rollback engine, a causal dependency graph for "What Changed?" incident correlation, and a Merkle-linked audit trail connected to Sigstore Rekor for SOC2-grade immutability.
- Circuit-breaker auto-rollback — 5%+ error rate over 100 requests in 10s auto-disables the flag; no human in the loop
- Blast-radius gating — BLOCKED / HIGH / MEDIUM / LOW tiers; BLOCKED changes require a 10-char justification
- 3-model ensemble anomaly detection (Z-score + Isolation Forest + EWMA) with a 2-of-3 vote to reduce false positives
- Thompson Sampling + LinUCB bandit for ML-driven rollout recommendations
- Causal dependency graph — Redis sorted sets (O(log n) updates), daily rebuild at 02:00 UTC
- Merkle audit chains — SHA-256 coverage of every state transition, Rekor transparency log submission
- WASM evaluation engine (@flagmind/eval): zero-dependency, runs on Cloudflare Workers
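The circuit-breaker rule in the list above can be sketched in a few lines. This is a minimal illustration, not Tombstone's actual code: the class name `FlagCircuitBreaker`, its parameters, and the reading of the rule (trip once at least 100 requests fall inside a 10-second sliding window and 5% or more of them failed) are all assumptions.

```python
from collections import deque


class FlagCircuitBreaker:
    """Sliding-window breaker for one flag (hypothetical sketch)."""

    def __init__(self, threshold=0.05, min_requests=100, window_s=10.0):
        self.threshold = threshold        # assumed: trip at >= 5% errors
        self.min_requests = min_requests  # assumed: need >= 100 samples first
        self.window_s = window_s          # assumed: 10-second sliding window
        self.events = deque()             # (timestamp, is_error), oldest first
        self.errors = 0
        self.enabled = True

    def record(self, now, is_error):
        """Record one request outcome; return True while the flag stays enabled."""
        self.events.append((now, is_error))
        self.errors += int(is_error)
        # Drop samples that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)
        total = len(self.events)
        if (self.enabled and total >= self.min_requests
                and self.errors / total >= self.threshold):
            self.enabled = False  # auto-disable, no human in the loop
        return self.enabled
```

Once tripped, the breaker in this sketch stays off until something re-enables it; whether Tombstone re-enables flags automatically is not stated here.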
Inspired by Knight Capital's $440M flag incident (2012). 289 commits, 8 services, full Kubernetes operator.
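The audit-trail idea from the feature list can be illustrated with a simple linear hash chain, where each entry's SHA-256 covers the previous hash plus the state transition. This is a simplified sketch under assumptions: the entry layout, the `GENESIS` placeholder, and the function names are hypothetical, a real Merkle structure would be a tree rather than a list, and the Rekor submission step is omitted.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash, transition):
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    payload = json.dumps(transition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain, transition):
    """Append a state transition, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "transition": transition,
        "hash": entry_hash(prev_hash, transition),
    })
    return chain


def verify(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        expected = entry_hash(prev_hash, entry["transition"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

For example, appending two transitions for a flag and then editing the first one in place makes `verify` return `False`, because the stored hash no longer matches the recomputed one.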