Fixes #2159. The Stop hook emits feedback via
`hookSpecificOutput: {hookEventName: "Stop", additionalContext}`, but
`Stop`/`SubagentStop` are NOT members of CC's `hookSpecificOutput`
discriminated union (coreSchemas.ts — valid members are PreToolUse,
PostToolUse, UserPromptSubmit, SessionStart, etc.). So the emitted shape
violates CC's documented hook-output schema.
Impact depends on the CC version (an important nuance, established empirically):
- Reporter (array0224-cloud) on CLI 2.1.150 / 2.1.152: CC rejects the
Stop feedback; the block/reason never reaches the model, so the
auto-rewake/fix loop is lost. (Detection still runs + logs.)
- On CLI 2.1.160 (current) the asyncRewake completion path is lenient:
its gate is `isSyncHookJSONOutput` (hooks.ts) which is just
`!(json.async === true)` — NOT a strict schema parse. So the invalid
hookSpecificOutput is tolerated: metrics + rewakeSummary are still
consumed and the model still receives the findings. I could NOT
reproduce the rejection on 2.1.160, and BQ confirms Stop-path
vulns_found metrics are recorded normally (~21k with-vuln fires / 3d),
i.e. NOT dropped. (An earlier draft of this message claimed metrics
were dropped — that was wrong; corrected after checking telemetry +
repro'ing the old plugin on 2.1.160.)
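The version-dependent behavior above can be illustrated with a small Python sketch. It is not CC's code: `is_sync_hook_json_output` is a Python rendering of the `!(json.async === true)` gate quoted above, while `passes_strict_union` and `KNOWN_UNION_MEMBERS` are hypothetical stand-ins for a strict schema parse. The member set contains only the names listed in this message; the real union has more.

```python
# Union members named in this commit message only; the real list is longer ("etc.").
KNOWN_UNION_MEMBERS = {"PreToolUse", "PostToolUse", "UserPromptSubmit", "SessionStart"}


def is_sync_hook_json_output(payload: dict) -> bool:
    """Lenient gate: anything that is not explicitly async counts as sync output."""
    return payload.get("async") is not True


def passes_strict_union(payload: dict) -> bool:
    """Hypothetical strict check: hookSpecificOutput, if present, must name a union member."""
    specific = payload.get("hookSpecificOutput")
    if specific is None:
        return True
    return specific.get("hookEventName") in KNOWN_UNION_MEMBERS


# Shape emitted by the old plugin on Stop.
old_stop_output = {
    "hookSpecificOutput": {"hookEventName": "Stop", "additionalContext": "2 findings"}
}
```

Under these assumptions the old Stop output passes the lenient gate but fails the strict check, which matches the split between 2.1.160 and the reporter's versions.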
So this is defensive schema-correctness: the plugin should emit output
that conforms to CC's documented union regardless of how strictly a given
CC version validates it. The reporter's environment validates strictly;
relying on the current version's leniency is fragile.
Fix (CC's documented asyncRewake "clean pattern" — hooks.ts: "error text
on stderr, JSON on stdout"):
- For Stop/SubagentStop, emit_metrics writes guidance to stderr (the
asyncRewake body channel CC delivers via `stderr || stdout`) and sets
top-level `decision: "block"` + `reason` (valid SyncHookJSONOutput
fields; also the documented sync Stop-hook contract for the `-p`
fallback). It does NOT emit a Stop hookSpecificOutput.
- PostToolUse (commit-review, push-sweep) is unchanged — valid union
member, keeps the modern hookSpecificOutput protocol.
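A minimal sketch of the output split described above, for orientation only. `build_hook_output` and `emit` are hypothetical names (the real logic lives in emit_metrics and carries metrics that are omitted here). The field names `decision`, `reason`, `hookSpecificOutput`, `hookEventName` and `additionalContext` are the ones named in this message.

```python
import json
import sys
from typing import Tuple

STOP_EVENTS = {"Stop", "SubagentStop"}


def build_hook_output(event: str, guidance: str) -> Tuple[dict, str]:
    """Return (stdout_json, stderr_text) for a hook that has findings to report."""
    if event in STOP_EVENTS:
        # Stop-type events: top-level decision/reason, guidance on stderr,
        # and no hookSpecificOutput (Stop is not a union member).
        return {"decision": "block", "reason": guidance}, guidance
    # Events that are valid union members keep the hookSpecificOutput protocol.
    payload = {
        "hookSpecificOutput": {"hookEventName": event, "additionalContext": guidance}
    }
    return payload, ""


def emit(event: str, guidance: str) -> None:
    """Write error text to stderr and JSON to stdout."""
    payload, err_text = build_hook_output(event, guidance)
    if err_text:
        print(err_text, file=sys.stderr)
    print(json.dumps(payload))
```

For example, `emit("Stop", "Background security review found issues")` would print the guidance on stderr and a `{"decision": "block", "reason": ...}` object on stdout.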
Verified locally on macOS Python 3.13:
- py_compile clean.
- 11 new tests (test_2159_stop_hook_schema.py) pin the contract: Stop
output carries no hookSpecificOutput, uses top-level decision/reason,
writes guidance to stderr; the emitted hookEventName (when present) is
a valid union member. 2 existing tests that asserted the buggy
Stop->hookSpecificOutput shape were corrected. Full suite 464/464
pass + 2 skipped.
- END-TO-END in /tmux on CLI 2.1.160:
* FIXED plugin (2.0.3): staged pickle.loads + os.system, benign edit
pulls the file into review_set; Stop LLM review found 2 critical
vulns; CC delivered a clean rewake ("Background security review
found issues" + both findings). All hooks (UPS, PostToolUse[Edit]
pattern, PostToolUse[Bash] commit-review, Stop) fired clean; zero
schema rejections / errors / http_err in the debug log.
* OLD plugin (2.0.2) on the SAME 2.1.160: also delivered Stop feedback
(confirming the no-repro-on-latest finding above). Both versions
deliver on current CC, so the fix should carry no regression risk
there while making the output robust for the stricter versions where
it actually breaks.
Version bumped 2.0.2 -> 2.0.3 per the per-PR-bump policy (#2114: a bump is
the only way the fix reaches the existing fleet — relevant for users still
on the CC versions where this breaks).
Closes #2159.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sagemaker-ai was dropped from the marketplace in #1762 (validate-plugins
adoption) due to a manifest/YAML error. AWS has since fixed it; the plugin
validates clean at awslabs/agent-plugins@187edde (claude plugin validate passes).
Re-listed as a git-subdir source SHA-pinned to current monorepo HEAD,
matching its sibling AWS entries (deploy-on-aws, databases-on-aws).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both plugins in awslabs/agent-plugins had their subpaths edited in commit
187edde (after the morning bump cron pinned them to f16aaf2a), so they fell
behind again on merge. Manual catch-up bump to current monorepo HEAD.
- databases-on-aws: 4 files changed under plugins/databases-on-aws/ (v1.1.0)
- deploy-on-aws: 7 files changed under plugins/deploy-on-aws/ (v1.2.0)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #2112's telemetry visibility surfaced an immediate finding from
the first 3h of v2.0.1 data: **2,406 phase=2 / err=99 sessions** —
"venv stage / uncategorized" — dominating BUILD_FAILED. The original
err_kind detection patterns were all pip-flavored (pip_no_match,
dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure
modes, so they all collapsed to the catch-all _uncategorized (99)
bucket.
This PR fills the gap on two axes.
## 1. Five new venv-specific err_kind categories (codes 11-15)
Each gated on `err_phase == "venv"` so the same substring doesn't
mis-fire in pip-phase failures:
- 11 `venv_ensurepip_fail` — Debian/Ubuntu without python3-venv
installed; stderr matches "ensurepip is not available" or
"ensurepip ... returned non-zero". Predicted to be the biggest
chunk based on Linux distro market share.
- 12 `venv_path_too_long` — Windows MAX_PATH (260) or POSIX
ENAMETOOLONG. Triggered when state_dir + venv layout exceeds
the path limit (deep Lib/site-packages/<pkg>/<...> paths).
- 13 `venv_no_module` — `python3 -m venv` itself missing
("No module named 'venv'"). Rare but distinctive.
- 14 `venv_already_exists` — Errno 17 / "file exists" — sentinel
race past O_EXCL or stale dir survived `--clear`.
- 15 `venv_setup_failed` — generic "virtual environment was not
created successfully" catch-all for venv setup failures that
don't match a more specific category.
All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the
APPEND-ONLY contract from PR #2112.
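The phase gating can be sketched roughly as follows. This is an illustration, not the code in ensure_agent_sdk.main(): `classify_venv_failure` and `VENV_ERR_CODES` are hypothetical names, and the lowercase substrings (in particular "file name too long" for ENAMETOOLONG) are assumptions based on the descriptions above. Specific patterns are checked before the generic catch-all because the ensurepip message also contains the "not created successfully" text.

```python
from typing import Optional

# Codes 11-15 as listed above; the name of this map is hypothetical.
VENV_ERR_CODES = {
    "venv_ensurepip_fail": 11,
    "venv_path_too_long": 12,
    "venv_no_module": 13,
    "venv_already_exists": 14,
    "venv_setup_failed": 15,
}


def classify_venv_failure(err_phase: str, stderr: str) -> Optional[str]:
    """Return a venv err_kind, or None if the failure is not a recognized venv failure."""
    if err_phase != "venv":
        # Phase gate: never apply venv patterns to pip-phase stderr.
        return None
    text = stderr.lower()
    if "ensurepip is not available" in text or (
        "ensurepip" in text and "returned non-zero" in text
    ):
        return "venv_ensurepip_fail"
    if "file name too long" in text or "enametoolong" in text:
        return "venv_path_too_long"
    if "no module named 'venv'" in text:
        return "venv_no_module"
    if "errno 17" in text or "file exists" in text:
        return "venv_already_exists"
    if "virtual environment was not created successfully" in text:
        return "venv_setup_failed"
    return None
```

A `None` result would fall through to whatever handling already exists for unmatched stderr (the "other:<tail>" path).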
## 2. `sdk_bootstrap_stderr_sig` integer hash
For "other:<tail>" err_kinds (which encode to _uncategorized = 99),
emit a bounded integer hash (0-999) of the first ~30 chars of the
stderr tail. This restores cardinality to the _uncategorized bucket
in BQ aggregation without unbounded keyspace — same stderr message
always maps to the same bucket, so a real failure mode replicating
across thousands of machines clusters cleanly. Bounded at 1000
buckets: well below any "high cardinality" alarm but wide enough to
distinguish ~30 distinct dominant patterns (birthday-paradox
collision probability ~50% at ~37 distinct inputs).
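The birthday-paradox figure can be checked with a few lines of Python. This assumes the hash spreads inputs uniformly over the 1000 buckets, which a real hash only approximates; `collision_probability` is a name introduced here for illustration.

```python
def collision_probability(n_inputs: int, buckets: int = 1000) -> float:
    """Probability that at least two of n_inputs uniformly hashed values share a bucket."""
    p_all_distinct = 1.0
    for i in range(n_inputs):
        # The (i+1)-th input must avoid the i buckets already taken.
        p_all_distinct *= (buckets - i) / buckets
    return 1.0 - p_all_distinct
```

With 1000 buckets this gives roughly 0.49 for 37 distinct inputs and roughly 0.01 for 5.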
The field auto-omits (`if sig:` gate) when err_kind is categorized
— no key-budget cost on the common-case categorized failures.
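A sketch of how such a bounded signature and the `if sig:` gate could look. The actual hash function is not stated in this message; `zlib.crc32` is an assumption chosen because it is stable across processes (Python's built-in `hash()` on strings is randomized per process). `encode_stderr_sig` and `build_metrics` are hypothetical names, not the plugin's `_encode_stderr_sig`.

```python
import zlib
from typing import Optional


def encode_stderr_sig(err_kind: Optional[str], prefix_len: int = 30, buckets: int = 1000) -> int:
    """Map an "other:<tail>" err_kind to a stable bucket in [0, buckets); 0 otherwise."""
    if not err_kind or not err_kind.startswith("other:"):
        return 0
    tail = err_kind[len("other:"):]
    if not tail:
        return 0
    # Only the leading characters feed the hash, so stderr lines that differ
    # later on (for example in a path) but share the prefix land in one bucket.
    return zlib.crc32(tail[:prefix_len].encode("utf-8")) % buckets


def build_metrics(err_kind: Optional[str]) -> dict:
    """Attach the signature only when it is non-zero (the `if sig:` gate)."""
    metrics = {}
    sig = encode_stderr_sig(err_kind)
    if sig:
        metrics["sdk_bootstrap_stderr_sig"] = sig
    return metrics
```

One consequence of this design in the sketch: an uncategorized tail that happens to hash to 0 is also omitted, so bucket 0 is indistinguishable from "no signature".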
## Version bump 2.0.1 → 2.0.2
PR #2114 confirmed the version-bump mechanism is the only way to
propagate code changes to the existing fleet — without a bump, CC's
plugin updater short-circuits on string-equality of installation
version vs marketplace version. Following the policy we established:
**bump patch on every functional PR**.
By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was
already appearing in BQ. v2.0.2 should follow the same propagation
curve — ~30% adoption within 3 hours, full convergence within a few
days.
## Verified locally
- py_compile clean.
- 15 new tests in test_venv_failure_deepdive.py (added to internal
test suite at sg-staging/tests/, not in this PR):
* 5 parametrized: each new err_kind maps to its expected code (11-15).
* 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged.
* 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic
same-input → same-output; bounded to 0-999; distinct inputs →
distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus
(path-varying stderr strings sharing a 30-char prefix collide, as designed).
* 1 static-shape catcher: every new `err_kind = "venv_..."` branch
in main() is guarded by `err_phase == "venv"`. Catches the
regression where someone adds a venv pattern without the phase
gate and starts mis-categorizing pip-phase failures.
* 1 map-coverage: all err_kind strings assigned anywhere in
ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES
(catches new categories added in code but forgotten in the map).
* 1 emit-shape: the metric block uses `_encode_stderr_sig`, and the
`sdk_bootstrap_stderr_sig` key is written only under `if
sig:`. Catches the regression where someone removes the
helper or makes the emit unconditional (would pad every
categorized BUILD_FAILED row with a zero-valued field).
- Full suite: 452/452 pass + 2 skipped (live API tests, opt-in).
## What this unblocks in BQ
```sql
-- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1,
-- v2.0.2+ will split them across the new categories. Query:
SELECT
CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err,
CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig,
COUNT(*) AS sessions
FROM `proj-product-data-nhme.raw_events.claude_code_internal_event`
WHERE _PARTITIONTIME >= ...
AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3
AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2 -- venv
GROUP BY err, sig
ORDER BY sessions DESC
```
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>