How the failure corpus is built

The corpus contains 241 healthcare startups, each verified dead using a 5-signal composite: domain status, LinkedIn employee dispersal, Crunchbase closed flag, press coverage, and funding cessation. At least 3 of the 5 signals are required for inclusion. Each failure is then Gioia-coded with a 3-level causal chain: first-order observations, second-order themes, and a structural failure node.

Gioia coding methodology

Each corpus entry is coded using the Gioia three-level qualitative methodology, adapted from Gioia, Corley & Hamilton (2013) for structural failure analysis. The three levels:

Level 1 — First-order codes

Raw observations from primary sources: founder post-mortems, investor write-ups, press coverage, SEC filings, SBIR award databases. Direct claims and factual statements only.

Level 2 — Second-order themes

First-order codes grouped into structural patterns through constant comparison. A pattern of "buyers demanded proof we couldn't produce" codes to clinical validation gap. A pattern of "sales cycles exceeded runway" codes to CAC payback crisis.

Level 3 — Aggregate dimensions (13 nodes)

Themes mapped to one of 13 structural failure nodes. Each entry records the full Gioia chain for auditability. The taxonomy is append-only — new nodes require documented anchor entries and dual validation.

Coding sessions are date-stamped and linked to primary source URLs. Any entry whose primary source becomes unavailable is flagged for re-verification at the next audit cycle. The audit log is maintained in version control alongside the corpus.

Entries are tagged with one of three provenance levels: human_validated, auto_coded_verified, or auto_coded_unverified. Their Bayesian weights are 1.0×, 1.0×, and 0.7× respectively.
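A minimal sketch of how one corpus entry might carry its full Gioia chain, source links, coding date, and provenance weight. The three provenance tags and their weights come from the text above; the `CorpusEntry` class, its field names, and the example company and URL are hypothetical and are not the corpus's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

# Weights as stated in the text; the lookup table itself is illustrative.
PROVENANCE_WEIGHTS = {
    "human_validated": 1.0,
    "auto_coded_verified": 1.0,
    "auto_coded_unverified": 0.7,
}


@dataclass(frozen=True)
class CorpusEntry:
    """Hypothetical record holding one three-level Gioia chain."""
    company: str
    first_order_codes: Tuple[str, ...]    # Level 1: raw observations
    second_order_themes: Tuple[str, ...]  # Level 2: grouped patterns
    failure_node: str                     # Level 3: structural failure node
    source_urls: Tuple[str, ...]          # primary sources behind Level 1
    coded_on: date                        # date stamp of the coding session
    provenance: str

    def __post_init__(self):
        # Reject unknown provenance tags and incomplete chains.
        if self.provenance not in PROVENANCE_WEIGHTS:
            raise ValueError(f"unknown provenance tag: {self.provenance!r}")
        if not (self.first_order_codes and self.second_order_themes
                and self.failure_node and self.source_urls):
            raise ValueError("every Gioia level and a primary source are required")

    @property
    def weight(self) -> float:
        return PROVENANCE_WEIGHTS[self.provenance]


# Hypothetical entry, reusing the example quote from the Level 2 description.
entry = CorpusEntry(
    company="ExampleCo",
    first_order_codes=("sales cycles exceeded runway",),
    second_order_themes=("CAC payback crisis",),
    failure_node="CAC Payback Crisis",
    source_urls=("https://example.com/post-mortem",),
    coded_on=date(2024, 1, 15),
    provenance="auto_coded_unverified",
)
print(entry.weight)
```

How the weight enters downstream calculations is not specified here, so the sketch only exposes it as a property.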

Coding consistency

Classification consistency is assessed using a synthetic inter-rater reliability protocol: the same 30 thesis cases are run through the structural classification pipeline twice and the primary node assignments are compared using Cohen's κ.

Method

Synthetic IRR is a pipeline-vs-pipeline comparison: two independent classification runs on the same input. The structural matcher and pre-classifier contain no LLM calls and are deterministically reproducible at temperature 0. This is not human-vs-human coding agreement.

Metric

Cohen's κ on primary node agreement across 30 held-out thesis cases spanning ≥8 sub-verticals.

Target

κ ≥ 0.75 (substantial agreement). Marginal: 0.60 ≤ κ < 0.75. Fail: κ < 0.60.

Framework alignment

Methodology adapted from the startup failure taxonomy approach of Cantamessa et al. (2018), which applies structured qualitative coding with inter-rater verification to failure mode classification.

IRR results are logged to data/irr-log.json and re-run after every corpus expansion of ≥20 cases.
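The metric and target bands above can be sketched as follows. This is a plain implementation of the standard Cohen's κ formula, κ = (p_o − p_e) / (1 − p_e), applied to two lists of primary node assignments; the function names and the sample labels are illustrative assumptions, not the pipeline's actual code.

```python
from collections import Counter


def cohens_kappa(run_a, run_b):
    """Cohen's kappa for two equally long lists of primary node labels."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: share of cases where both runs chose the same node.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of each run's marginal label frequencies.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used one identical label throughout
    return (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Map kappa onto the bands stated above."""
    if kappa >= 0.75:
        return "target"
    if kappa >= 0.60:
        return "marginal"
    return "fail"


# Hypothetical four-case example: the runs disagree on the last case.
run_1 = ["A", "A", "B", "B"]
run_2 = ["A", "A", "B", "A"]
kappa = cohens_kappa(run_1, run_2)
print(kappa, kappa_band(kappa))  # 0.5 fail
```

In the example, observed agreement is 0.75 and chance agreement is 0.5, so κ is 0.5 even though three of four cases match.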

Coverage distribution

Shannon entropy measures how broadly failure cases are distributed across the 13 taxonomy nodes. It is a measure of coverage breadth — not coding accuracy, analytical validity, or predictive power.

A corpus with high entropy covers the failure landscape broadly (no single pattern dominates). A corpus with low entropy has cases concentrated in a few nodes, which is useful for that sub-domain but not representative of the full failure space.

Current Shannon entropy ratio: 0.87 (87% of theoretical maximum for 13 nodes). Directionally consistent with CB Insights 2024 failure category coverage.
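As a rough check, the ratio can be recomputed from the per-node case counts listed in the table in this section. The sketch assumes the ratio is Shannon entropy divided by its maximum for 13 equally likely nodes (ln 13), with the distribution taken over the per-node counts as listed; the function name is illustrative.

```python
import math

# Per-node case counts as listed in the per-node table in this section.
NODE_COUNTS = {
    "Clinical Validation Gap": 108,
    "Competitive Displacement": 74,
    "Reimbursement Dependency": 54,
    "CAC Payback Crisis": 41,
    "Regulatory Scope Creep": 31,
    "Buyer Fragmentation": 27,
    "Channel Mismatch": 22,
    "Preclinical Capital Exhaustion": 19,
    "Workflow Disruption": 18,
    "Pivot Exhaustion": 15,
    "Preclinical Translation Failure": 10,
    "Integration Hell": 9,
    "Capability Misrepresentation": 2,
}


def entropy_ratio(counts):
    """Shannon entropy of the count distribution over its maximum, log(k)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # empty nodes contribute 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


ratio = entropy_ratio(list(NODE_COUNTS.values()))
print(round(ratio, 2))  # approximately 0.87, in line with the reported ratio
```

Because the ratio normalizes by log(k), the choice of logarithm base cancels out.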

Per-node case count — all 13 failure nodes

| Failure Node | Case Count | Evidence Tier |
|---|---|---|
| Clinical Validation Gap | 108 | Robust (≥25) |
| Competitive Displacement | 74 | Robust (≥25) |
| Reimbursement Dependency | 54 | Robust (≥25) |
| CAC Payback Crisis | 41 | Robust (≥25) |
| Regulatory Scope Creep | 31 | Robust (≥25) |
| Buyer Fragmentation | 27 | Robust (≥25) |
| Channel Mismatch | 22 | Developing (10–24) |
| Preclinical Capital Exhaustion | 19 | Developing (10–24) |
| Workflow Disruption | 18 | Developing (10–24) |
| Pivot Exhaustion | 15 | Developing (10–24) |
| Preclinical Translation Failure | 10 | Developing (10–24) |
| Integration Hell | 9 | Thin (<10) |
| Capability Misrepresentation | 2 | Thin (<10) |

Evidence tiers: Robust ≥25 cases, Developing 10–24, Thin <10. The engine notes thin-node evidence on results pages. Thin nodes do not prevent analysis — they widen the uncertainty band.
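The tier thresholds reduce to a simple lookup. A minimal sketch, with a hypothetical function name:

```python
def evidence_tier(case_count):
    """Return the evidence tier for a node's case count."""
    if case_count >= 25:
        return "Robust"
    if case_count >= 10:
        return "Developing"
    return "Thin"


# Spot checks against rows of the per-node table.
print(evidence_tier(27))  # Robust (Buyer Fragmentation)
print(evidence_tier(10))  # Developing (Preclinical Translation Failure)
print(evidence_tier(9))   # Thin (Integration Hell)
```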

Death verification — 5-signal composite

A company is not included as a confirmed failure unless at least 3 of 5 signals are present:

  1. Domain expired or redirects to a parking page
  2. LinkedIn headcount dropped to zero or shows dispersal to other employers
  3. Crunchbase or PitchBook status marked as "closed" or "defunct"
  4. Press coverage of shutdown, acquisition failure, or pivot away from original mission
  5. No funding activity for ≥24 months with no product activity

No living companies appear in the failure corpus. Active companies, struggling companies, and acqui-hires with positive outcomes are excluded from all failure calculations.
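The 3-of-5 inclusion rule can be sketched as a count over five boolean signals. The field names below are illustrative shorthand for the signals listed above; how each signal is actually detected is outside the scope of this sketch.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class DeathSignals:
    """One boolean per signal in the 5-signal composite (names are assumed)."""
    domain_dead: bool         # 1. domain expired or parked
    linkedin_dispersed: bool  # 2. headcount zero or dispersed
    database_closed: bool     # 3. Crunchbase/PitchBook closed or defunct
    shutdown_press: bool      # 4. press coverage of shutdown
    funding_dormant: bool     # 5. no funding or product activity for >= 24 months


def signal_count(signals):
    """Number of signals present."""
    return sum(astuple(signals))


def is_confirmed_failure(signals, minimum=3):
    """True when at least `minimum` of the five signals are present."""
    return signal_count(signals) >= minimum


# Hypothetical company with two signals: not enough for inclusion.
candidate = DeathSignals(True, False, False, False, True)
print(signal_count(candidate), is_confirmed_failure(candidate))  # 2 False
```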

Quality controls

Node concentration ceiling (25%)

No single failure node may exceed 25% of all confirmed failures. This prevents the corpus from over-representing any one structural pattern. A pre-commit hook blocks corpus changes that would violate this ceiling.
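A sketch of the kind of check such a hook might run. It assumes each confirmed failure contributes exactly one primary node and that the share is computed over the number of confirmed failures; both are assumptions, as are the function name and the sample labels. The actual hook is not shown in this document.

```python
from collections import Counter


def nodes_over_ceiling(primary_nodes, ceiling=0.25):
    """Return {node: share} for every node whose share exceeds the ceiling."""
    total = len(primary_nodes)
    if total == 0:
        return {}
    counts = Counter(primary_nodes)
    return {
        node: count / total
        for node, count in counts.items()
        if count / total > ceiling  # "may not exceed": exactly 25% passes
    }


# Hypothetical staged corpus of 10 entries; node "A" holds 30%.
staged = ["A"] * 3 + ["B"] * 2 + ["C"] * 2 + ["D"] * 2 + ["E"]
violations = nodes_over_ceiling(staged)
print(violations)  # {'A': 0.3}
```

A hook built on this would exit non-zero whenever the returned dictionary is non-empty.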

Template placeholder guard

Pre-commit hooks block any corpus entry containing unresolved template strings ({{ }}). Every field must be a real value before an entry can be committed.
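A minimal sketch of a placeholder check, assuming entries are flat dictionaries of field names to values and that any `{{ ... }}` span in a string counts as unresolved. The function name and the sample entry are hypothetical.

```python
import re

# Matches any double-brace template span, including an empty "{{ }}".
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def unresolved_fields(entry):
    """Return the names of string fields that still contain a template span."""
    return [
        name
        for name, value in entry.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    ]


# Hypothetical staged entry with one field left unfilled.
staged_entry = {
    "company": "ExampleCo",
    "failure_node": "{{ failure_node }}",
    "signal_count": 4,
}
print(unresolved_fields(staged_entry))  # ['failure_node']
```

Nested structures would need a recursive walk, which is omitted here for brevity.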

Discovery agent verification

Automated discovery agents (SBIR Phase I, EDGAR, press archives) feed a staging pipeline. Every staged entry is reviewed before merging into the live corpus. The pipeline never auto-merges.

What is NOT in the corpus

Scope limitations are part of the methodology. The following are explicitly excluded:

  • UK/NHS dynamics: The corpus is US-focused. UK NHS procurement structures, commissioning pathways, and NICE approval processes differ structurally from US payer dynamics.

  • Active companies: No company currently operating appears in the failure corpus, regardless of financial health.

  • Pivoted companies: Companies that pivoted to an entirely different market are excluded. The original thesis is coded, not the pivot.

  • Non-structural failures: Fraud, deliberate misrepresentation, and founder misconduct are excluded. These are non-structural causes with no generalizable pattern.

  • Thin sub-verticals: Sub-verticals with fewer than 5 confirmed failures produce THIN evidence depth classifications. Healthcare B2B (Other) is the most saturated sub-vertical with 33 entries.

Corpus summary: 241 confirmed failures · 15 sub-verticals · 13 structural failure nodes · 5-signal death verification · Gioia three-level coding · 25% node concentration ceiling enforced at commit time. Data sources: SBIR Phase I awards, Crunchbase, PitchBook, SEC EDGAR, Rock Health, CB Insights, founder interviews, press archives.
