The adversarial engine

Three LLMs argue your thesis in a structured courtroom, drawing on 241 corpus precedents. A prosecutor presents the strongest evidence-based case for failure. A defense attorney argues structural differences from known failures. A judge weighs both and renders a verdict. The corpus selects the charge — the LLM narrates it. No single model controls the outcome.

Why three models, not one

Single-model risk assessment gravitates to the most common failure mode in its training data. A model asked "what are the risks?" will almost always answer "competitive displacement and reimbursement" because those appear most frequently — not because they are structurally relevant to your specific thesis.

The adversarial format forces specificity. The prosecution is constrained to charges backed by corpus evidence for your sub-vertical. The defense is seeded with structural differences drawn from companies that faced similar conditions and survived. The judge cannot simply agree with whoever sounds more confident.

Three independent model families (OpenAI, Google Gemini, Anthropic) ensure genuine adversarial independence. Same-family models produce correlated outputs that mimic debate without genuine opposition.

Corpus-led prosecution

The LLM does not select the prosecution charge. The structural matcher identifies consensus failure nodes deterministically — it computes which failure patterns appear most frequently among corpus entries that structurally resemble your thesis. The top node becomes the mandatory primary charge.
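A minimal sketch of that selection step, assuming each matched corpus entry carries a list of failure-node labels and that the structural matcher has already returned the entries resembling the thesis. The function names, the `failure_nodes` field, the node labels, and the alphabetical tie-break are illustrative assumptions, not the engine's actual interface.

```python
from collections import Counter


def rank_failure_nodes(matched_entries):
    """Rank failure nodes by how many matched corpus entries exhibit them."""
    counts = Counter()
    for entry in matched_entries:
        # Count each node once per entry so one entry cannot dominate the ranking.
        counts.update(set(entry["failure_nodes"]))
    # Descending count, then node name, so ties always resolve the same way.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def select_primary_charge(matched_entries):
    """Return the top-ranked node, or None when nothing matched."""
    ranked = rank_failure_nodes(matched_entries)
    return ranked[0][0] if ranked else None


# Hypothetical matched entries; the node names are placeholders.
matched = [
    {"id": "a", "failure_nodes": ["reimbursement_gap", "sales_cycle"]},
    {"id": "b", "failure_nodes": ["sales_cycle"]},
    {"id": "c", "failure_nodes": ["sales_cycle", "reimbursement_gap"]},
    {"id": "d", "failure_nodes": ["regulatory_delay"]},
]
primary = select_primary_charge(matched)
```

Because the ranking depends only on counts and a fixed tie-break, the same matched entries always produce the same primary charge, with no model call involved.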

This is the core architectural principle: the corpus classifies, the LLM narrates. The prosecutor receives the charge from the matcher and argues it persuasively — it cannot override the structural evidence with a more rhetorically convenient failure mode.

After the LLM generates its output, an enforcement layer checks that the charge matches the corpus-ranked primary node. If the LLM drifted to a secondary node, the primary charge is substituted programmatically. The model narrates, never classifies.
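The enforcement check could look roughly like the following. This is a sketch under assumptions: the prosecutor output is taken to be a dictionary with a `charge` field, and the `drifted_from` audit field is invented here for illustration. It only swaps the charge label and does not regenerate the narrative.

```python
def enforce_primary_charge(llm_output, ranked_nodes):
    """Return a copy of the prosecutor output whose charge is the top-ranked node."""
    primary = ranked_nodes[0]
    enforced = dict(llm_output)  # copy so the raw model output stays intact
    if enforced.get("charge") != primary:
        # Keep a record of the drift, then substitute the corpus-ranked charge.
        enforced["drifted_from"] = enforced.get("charge")
        enforced["charge"] = primary
    return enforced


# Hypothetical ranking and model outputs.
ranked = ["sales_cycle", "reimbursement_gap"]
drifted = {"charge": "reimbursement_gap", "narrative": "..."}
aligned = {"charge": "sales_cycle", "narrative": "..."}

fixed = enforce_primary_charge(drifted, ranked)
unchanged = enforce_primary_charge(aligned, ranked)
```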

How the judge works

The judge receives two constrained inputs: the prosecution case (already validated against the corpus charge list) and the defense evidence (drawn from structural survivors in the same sub-vertical). The judge's role is reconciliation, not independent analysis.

The verdict (KILLED, PASSED WITH CONDITIONS, PASSED) maps to a structural risk score range derived from Bayesian inference. The judge's narrative explains which structural factors tip the balance, but the Bayesian engine, not the judge's prose, determines the threshold classification. See "How scoring works" for the score mechanics.
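To illustrate the separation between score and prose, here is a hedged sketch of a threshold classifier. It assumes a risk score between 0 and 1 where higher means riskier. The cut-off values 0.7 and 0.4 are placeholders chosen for the example, not the engine's published thresholds.

```python
def classify_verdict(risk_score, kill_threshold=0.7, pass_threshold=0.4):
    """Map a structural risk score in [0, 1] to a verdict label.

    Threshold values are placeholders for illustration only.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= kill_threshold:
        return "KILLED"
    if risk_score >= pass_threshold:
        return "PASSED WITH CONDITIONS"
    return "PASSED"


def build_verdict(risk_score, judge_narrative):
    """Attach the judge's narrative to a label computed from the score alone."""
    # The narrative is stored as-is and never parsed to decide the label.
    return {"verdict": classify_verdict(risk_score), "narrative": judge_narrative}


# A narrative that leans toward passing does not change a high-score verdict.
result = build_verdict(0.85, "The defense raises credible structural differences.")
```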

Temperature and reproducibility

All three models run at temperature=0, and the OpenAI endpoint additionally receives seed=42. The structural matcher is entirely deterministic, so the same thesis and the same corpus always yield the same output. This is a published methodology commitment: the same thesis submitted twice should produce structurally identical outputs.

In practice, the reproducibility of the LLM narration is limited by what each provider guarantees at temperature=0. Google Gemini and Anthropic do not expose seed parameters. The structural classification layer (matcher, Bayesian engine, verdict threshold) is fully deterministic regardless of LLM variance.
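A small sketch of how these settings and the reproducibility check might be expressed. The provider keys, the result field names (`charge`, `risk_score`, `verdict`), and the fingerprint helper are assumptions made for illustration. No provider SDK is called here.

```python
import hashlib
import json


def sampling_params(provider, seed=42):
    """Build sampling parameters; only the OpenAI entry receives a seed."""
    params = {"temperature": 0}
    if provider == "openai":
        params["seed"] = seed
    return params


def structural_fingerprint(result):
    """Hash only the structural fields, ignoring free-text narration."""
    structural = {key: result[key] for key in ("charge", "risk_score", "verdict")}
    payload = json.dumps(structural, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


configs = {name: sampling_params(name) for name in ("openai", "gemini", "anthropic")}

# Two hypothetical runs whose narration differs but whose structure is identical.
run_a = {"charge": "sales_cycle", "risk_score": 0.62,
         "verdict": "PASSED WITH CONDITIONS", "narrative": "First wording."}
run_b = {"charge": "sales_cycle", "risk_score": 0.62,
         "verdict": "PASSED WITH CONDITIONS", "narrative": "Second wording."}
same_structure = structural_fingerprint(run_a) == structural_fingerprint(run_b)
```

Comparing fingerprints rather than full outputs matches the commitment as stated: structural fields must be identical across runs, while narration may vary.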

Architecture basis: The corpus-led prosecution architecture (corpus classifies, LLM narrates) was validated against a dual backtest on 15 sub-verticals. The adversarial format reduces single-model frequency bias because the prosecution charge is constrained to structural nodes with ≥3 matching corpus entries in the relevant sub-vertical. Nodes with thinner precedent are flagged as insufficient.
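The precedent floor can be sketched as a simple partition. The minimum of 3 comes from the text above. The function name, the input shape (a mapping from node to matching-entry count within one sub-vertical), and the node labels are assumptions.

```python
MIN_PRECEDENTS = 3  # from the text: at least 3 matching corpus entries


def partition_nodes(node_counts, min_precedents=MIN_PRECEDENTS):
    """Split nodes into chargeable ones and ones flagged as insufficient."""
    chargeable, insufficient = [], []
    # Order by descending count, then name, so the result is deterministic.
    ordered = sorted(node_counts.items(), key=lambda item: (-item[1], item[0]))
    for node, count in ordered:
        if count >= min_precedents:
            chargeable.append(node)
        else:
            insufficient.append(node)
    return chargeable, insufficient


# Hypothetical counts for one sub-vertical.
counts = {"sales_cycle": 5, "reimbursement_gap": 3, "regulatory_delay": 2}
chargeable, insufficient = partition_nodes(counts)
```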

Related methodology

How scoring works →
How the corpus is built →
Failure patterns →
