# Model Tier Defaults Parameters scale with model capability. Budget is not the constraint — the constraints are diminishing returns (more voters stop helping past a point) and the asymmetric noise floor (Haiku verifiers are individually less reliable, so the right response is width not depth). ## Haiku Width compensates for per-sample noise. Scaffolding is where the leverage is. - **Parallel solvers**: 12 (wide fan — each individual solve is weaker, so cast a wider net) - **Vote budget**: 7 verifiers, need 5-confirm / 3-refute (pigeonhole exit: stop when outcome decided) - **Abstain threshold**: 3 consecutive revise cycles fail - **Pattern sweep**: all 12 patterns — Haiku can follow a checklist, the patterns are the scaffold - **Presentation pass**: yes, 3 drafts, comparator picks cleanest. Haiku's raw output is rougher, so this matters MORE not less. - **Rationale**: The skill's value is highest where the base model is weakest. Give Haiku the full harness. The 3-refute threshold (higher than Sonnet's 2) accounts for Haiku verifiers being individually noisier — don't let 2 confused Haikus kill a correct proof. ## Sonnet Balanced. - **Parallel solvers**: 6 - **Vote budget**: 5 verifiers, need 4-confirm / 2-refute - **Abstain threshold**: 3 consecutive revise cycles fail - **Pattern sweep**: all 12 - **Presentation pass**: 2 drafts, comparator picks cleaner - **Rationale**: 4-of-5 tolerates one flake. 2 dissents is signal. ## Opus Depth. Each sample is strong, so invest in making the adversarial pass harder. - **Parallel solvers**: 4 - **Vote budget**: 5 general verifiers (4-confirm / 2-refute) PLUS one dedicated verifier per pattern in `verifier_patterns.md` (12 targeted attacks). Any pattern-specific HOLE FOUND counts toward refute. - **Abstain threshold**: 5 consecutive revise cycles fail (trust the model's ability to eventually fix) - **Pattern sweep**: all 12, each with its own dedicated agent - **Presentation pass**: 3 drafts with different instructions ("most elegant," "most elementary," "shortest"), comparator picks the best. Strong models can genuinely produce different _styles_ of proof. - **Rationale**: Opus can execute the deep patterns (#19 base-vs-derived, #22 mean-first) that need real mathematical judgment. The 12 dedicated pattern passes are where the model's capability is best spent — it's the difference between "be skeptical" and "check THIS specific thing." ## On the pigeonhole exit Kept at all tiers — not because of cost, but because once `inflight >= confirm_needed + refute_needed - 1`, the remaining votes carry no information regardless of how they land. Launching them anyway is pure latency. ## Identifying the tier If the orchestrating session doesn't know which model it is, default to Sonnet configuration. A reasonable heuristic: ask the model to self-identify in its first response and match against `haiku`/`sonnet`/`opus` in the output.