Model Tier Defaults
Parameters scale with model capability. Budget is not the constraint; the real constraints are diminishing returns (past a point, more voters stop helping) and the asymmetric noise floor (Haiku verifiers are individually less reliable, so the right response is width, not depth).
Haiku
Width compensates for per-sample noise. Scaffolding is where the leverage is.
- Parallel solvers: 12 (wide fan — each individual solve is weaker, so cast a wider net)
- Vote budget: 7 verifiers, need 5-confirm / 3-refute (pigeonhole exit: stop once the outcome is decided)
- Abstain threshold: 3 consecutive revise cycles fail
- Pattern sweep: all 12 patterns. Haiku can follow a checklist; the patterns are the scaffold.
- Presentation pass: yes, 3 drafts; the comparator picks the cleanest. Haiku's raw output is rougher, so this matters MORE, not less.
- Rationale: The skill's value is highest where the base model is weakest. Give Haiku the full harness. The 3-refute threshold (higher than Sonnet's 2) accounts for Haiku verifiers being individually noisier: don't let 2 confused Haikus kill a correct proof.
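The effect of the higher refute threshold can be checked with a quick calculation. This is a sketch under a simplifying assumption not made above: each verifier independently refutes a correct proof with a fixed probability `p_false`. With a budget of confirm + refute - 1 votes, exactly one threshold is always reached, so a correct proof is rejected exactly when at least `refute` of those votes are refutations, which makes the false-rejection rate a binomial tail. `p_false_reject` is a hypothetical helper name.

```python
from math import comb


def p_false_reject(confirm: int, refute: int, p_false: float) -> float:
    """Probability that a correct proof is rejected.

    Assumes (hypothetically) confirm + refute - 1 independent verifiers,
    each wrongly refuting with probability p_false. The proof is rejected
    iff at least `refute` of the votes are refutations.
    """
    n = confirm + refute - 1  # pigeonhole vote budget
    return sum(
        comb(n, k) * p_false**k * (1 - p_false) ** (n - k)
        for k in range(refute, n + 1)
    )


# Same 7-vote budget, two refute thresholds, an assumed 20% flake rate.
print(round(p_false_reject(5, 3, 0.2), 3))  # Haiku tier, 5-confirm / 3-refute: 0.148
print(round(p_false_reject(6, 2, 0.2), 3))  # hypothetical 6-confirm / 2-refute: 0.423
```

Under that assumed flake rate, requiring 3 refutes instead of 2 on the same 7-vote budget cuts the false-rejection rate from roughly 42% to roughly 15%.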
Sonnet
Balanced.
- Parallel solvers: 6
- Vote budget: 5 verifiers, need 4-confirm / 2-refute
- Abstain threshold: 3 consecutive revise cycles fail
- Pattern sweep: all 12
- Presentation pass: 2 drafts; the comparator picks the cleaner one
- Rationale: 4-of-5 tolerates one flake; 2 dissents are signal.
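Haiku and Sonnet share the same abstain threshold, which amounts to a bounded revise loop. A minimal sketch, assuming hypothetical `verify` and `revise` callables that stand in for the verifier vote and the revision agent; `revise_until_accepted` is likewise a hypothetical name. In this sketch an accepted draft ends the loop, so the failures it counts are always consecutive.

```python
from typing import Callable, Optional


def revise_until_accepted(
    draft: str,
    verify: Callable[[str], bool],  # hypothetical: True if the vote confirms
    revise: Callable[[str], str],   # hypothetical: one revision pass
    max_failed_cycles: int = 3,     # 3 for Haiku and Sonnet, 5 for Opus
) -> Optional[str]:
    """Return an accepted draft, or None to abstain."""
    if verify(draft):
        return draft
    for _ in range(max_failed_cycles):
        draft = revise(draft)  # one revise cycle...
        if verify(draft):      # ...succeeds if the revised draft is accepted
            return draft
    return None  # max_failed_cycles consecutive failures: abstain
```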
Opus
Depth. Each sample is strong, so invest in making the adversarial pass harder.
- Parallel solvers: 4
- Vote budget: 5 general verifiers (4-confirm / 2-refute) PLUS one dedicated verifier per pattern in verifier_patterns.md (12 targeted attacks). Any pattern-specific HOLE FOUND counts toward refute.
- Abstain threshold: 5 consecutive revise cycles fail (trust the model's ability to eventually fix)
- Pattern sweep: all 12, each with its own dedicated agent
- Presentation pass: 3 drafts with different instructions ("most elegant," "most elementary," "shortest"), comparator picks the best. Strong models can genuinely produce different styles of proof.
- Rationale: Opus can execute the deep patterns (#19 base-vs-derived, #22 mean-first) that need real mathematical judgment. The 12 dedicated pattern passes are where the model's capability is best spent — it's the difference between "be skeptical" and "check THIS specific thing."
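For reference, the three tiers above can be collected into one configuration table. This is a sketch only: `TierConfig`, `TIERS`, and the field names are hypothetical, not the plugin's actual schema. `pattern_agents` counts only dedicated per-pattern verifiers, so it is 0 for Haiku and Sonnet, which run the 12-pattern sweep without a dedicated agent per pattern. The loop at the end checks the pigeonhole invariant that each general vote budget equals confirm + refute - 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    parallel_solvers: int
    verifiers: int       # general verifiers per vote
    confirm_needed: int
    refute_needed: int
    abstain_after: int   # consecutive failed revise cycles before abstaining
    pattern_agents: int  # dedicated per-pattern verifiers (0 = none)
    drafts: int          # presentation-pass drafts for the comparator


TIERS = {
    "haiku": TierConfig(parallel_solvers=12, verifiers=7, confirm_needed=5,
                        refute_needed=3, abstain_after=3, pattern_agents=0, drafts=3),
    "sonnet": TierConfig(parallel_solvers=6, verifiers=5, confirm_needed=4,
                         refute_needed=2, abstain_after=3, pattern_agents=0, drafts=2),
    "opus": TierConfig(parallel_solvers=4, verifiers=5, confirm_needed=4,
                       refute_needed=2, abstain_after=5, pattern_agents=12, drafts=3),
}

for name, cfg in TIERS.items():
    # Pigeonhole invariant: with this budget, exactly one threshold is reached.
    assert cfg.verifiers == cfg.confirm_needed + cfg.refute_needed - 1, name
```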
On the pigeonhole exit
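Kept at all tiers. The reason is not cost: once
inflight >= confirm_needed + refute_needed - 1, the votes already launched are
guaranteed to decide the outcome (if fewer than confirm_needed of them confirm,
at least refute_needed must refute), so any further votes carry no information
regardless of how they land. Launching them anyway is pure latency.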
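The same reasoning applies mid-vote. A minimal sketch, assuming votes arrive as booleans (True = confirm) and that `inflight` counts only votes launched but not yet returned; `votes_worth_launching` and `run_vote` are hypothetical helper names. With no votes returned yet, the first function returns 0 exactly when inflight >= confirm_needed + refute_needed - 1. `run_vote` is a sequential simulation; the real harness launches verifiers in parallel.

```python
from itertools import islice
from typing import Iterable, Tuple


def votes_worth_launching(confirms: int, refutes: int, inflight: int,
                          confirm_needed: int, refute_needed: int) -> int:
    """How many more votes could still change the outcome."""
    if confirms >= confirm_needed or refutes >= refute_needed:
        return 0  # outcome already decided
    # By pigeonhole, this many further votes always force a decision.
    still_needed = (confirm_needed - confirms) + (refute_needed - refutes) - 1
    return max(0, still_needed - inflight)


def run_vote(votes: Iterable[bool], confirm_needed: int,
             refute_needed: int) -> Tuple[str, int]:
    """Tally votes in arrival order, stopping as soon as the outcome is decided."""
    budget = confirm_needed + refute_needed - 1  # never consume more than this
    confirms = refutes = 0
    for used, vote in enumerate(islice(votes, budget), start=1):
        if vote:
            confirms += 1
        else:
            refutes += 1
        if confirms >= confirm_needed:
            return "confirmed", used
        if refutes >= refute_needed:
            return "refuted", used
    raise ValueError("vote stream ended before the outcome was decided")
```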
Identifying the tier
If the orchestrating session doesn't know which model it is, default to the
Sonnet configuration. A reasonable heuristic: ask the model to self-identify in
its first response and match the output against haiku/sonnet/opus.
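A minimal sketch of that heuristic, assuming the self-identification arrives as plain text; `detect_tier` is a hypothetical helper. As a design choice not stated above, it also falls back to Sonnet when the reply names more than one tier, treating an ambiguous answer the same as no answer.

```python
import re

# Whole-word, case-insensitive match on the three tier names.
_TIER_PATTERN = re.compile(r"\b(haiku|sonnet|opus)\b", re.IGNORECASE)


def detect_tier(self_report: str, default: str = "sonnet") -> str:
    """Return the tier named in the text, or `default` if none or several appear."""
    found = {name.lower() for name in _TIER_PATTERN.findall(self_report)}
    if len(found) == 1:
        return found.pop()
    return default


print(detect_tier("I am the Opus model."))  # opus
print(detect_tier("I'm an AI assistant."))  # sonnet (fallback)
```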