mirror of
https://github.com/anthropics/claude-plugins-official.git
synced 2026-06-13 22:26:03 -03:00
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4e56d19dd8
|
security-guidance: handle signal-killed venv builds (memory) + cooldown (2.0.5 → 2.0.6)
The real dominant Linux failure, identified by a CCR Linux repro.
A CCR container reproduced the production signature — non-zero exit +
EMPTY stdout + EMPTY stderr (~60k fires/day, 4,485 Linux users on 2.0.4):
running `python -m venv` under a tight memory limit (ulimit -v) kills the
memory-heavy venv+ensurepip/pip subprocess with SIGSEGV (-11, RLIMIT_AS)
or SIGKILL (-9, kernel OOM-killer) BEFORE it writes anything. This is
NOT the ensurepip/packaging case (that always writes to stderr, code 11)
and NOT fixable by --target (a --target pip install is also memory-heavy
and gets killed too). Three earlier hypotheses (stdout, packaging,
Option A fixes Linux) were wrong — the repro corrected them.
Changes:
- Detect the signal kill (rc<0, or 128+sig: 134/137/139) in the venv/pip
and --target paths → err_kind "signal_killed:<rc>" (new code 16). The
returncode rides in a new sdk_bootstrap_rc metric so prod confirms
which signal dominates (-9 OOM-killer vs -11 RLIMIT_AS).
- Cooldown: on a signal kill, write a marker and return the new
SKIP_COOLDOWN outcome (9) on subsequent sessions for 24h — stops the
retry storm (every session was re-attempting a build that just gets
re-killed, burning the user's memory/CPU). Retries once per window so a
machine that frees memory still recovers.
- --no-cache-dir on both pip installs (venv + --target) trims pip's peak
memory; may get marginal machines under the OOM threshold.
No happy-path change: signal detection is at the top of the existing
failure handler; cooldown is checked only after all no-op probes
(NOOP_SYSTEM/VENV/TARGET short-circuit first).
Verified locally on macOS Python 3.13:
- py_compile clean.
- 35 new tests (test_signal_kill_cooldown.py): _is_signal_kill across
signals/exit-codes, rc decode, signal_killed→code 16, cooldown
lifecycle (none→write→expire), and an integration flow — simulated
SIGKILL'd venv → BUILD_FAILED/signal_killed:-9 + cooldown written →
2nd run SKIP_COOLDOWN without re-attempting → retry after window;
non-signal failure does NOT cool down; --no-cache-dir present on both
pip paths; sdk_bootstrap_rc emitted conditionally.
- End-to-end harness: the full kill→categorize→cooldown→skip→retry
chain confirmed in-process.
The original CCR repro (ulimit -v ≤7000 KB → rc=-11, empty streams) is
the ground truth this fix is built on. Can be re-validated on CCR with the
same ulimit approach.
Version 2.0.5 -> 2.0.6 per the per-PR-bump policy (#2114).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
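The signal-kill detection and 24h cooldown described in this commit can be sketched roughly as below. This is a minimal illustration, not the plugin's code: the function names, the marker-file format (a plain timestamp), and `DEMO_MARKER` are all assumptions. Only the return-code rule (rc<0, or 134/137/139), the `signal_killed:<rc>` err_kind shape, and the 24h window come from the message above.

```python
import os
import signal
import tempfile
import time

COOLDOWN_SECONDS = 24 * 60 * 60               # 24h window, per the commit message
SHELL_STYLE_SIGNAL_CODES = (134, 137, 139)    # 128+sig: SIGABRT, SIGKILL, SIGSEGV
# Hypothetical marker location, for illustration only.
DEMO_MARKER = os.path.join(tempfile.gettempdir(), "sdk-bootstrap-cooldown.demo")


def is_signal_kill(rc):
    """True when a return code looks like death-by-signal.

    subprocess reports -N when the child died from signal N; a shell in
    between reports 128+N instead.
    """
    return rc < 0 or rc in SHELL_STYLE_SIGNAL_CODES


def signal_name(rc):
    """Decode a signal-kill return code to a name such as 'SIGKILL'."""
    if not is_signal_kill(rc):
        return None
    number = -rc if rc < 0 else rc - 128
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def err_kind_for(rc):
    """Build the 'signal_killed:<rc>' err_kind, or None for other failures."""
    return f"signal_killed:{rc}" if is_signal_kill(rc) else None


def write_cooldown(marker, now=None):
    """Record when the build was last killed (assumed format: a timestamp)."""
    stamp = time.time() if now is None else now
    with open(marker, "w", encoding="utf-8") as fh:
        fh.write(str(stamp))


def in_cooldown(marker, now=None):
    """True while the marker is younger than the window; missing or
    unreadable markers never block a build attempt."""
    current = time.time() if now is None else now
    try:
        with open(marker, encoding="utf-8") as fh:
            stamp = float(fh.read().strip())
    except (OSError, ValueError):
        return False
    return current - stamp < COOLDOWN_SECONDS
```

In this sketch a caller would check `in_cooldown(DEMO_MARKER)` only after the no-op probes, and call `write_cooldown(DEMO_MARKER)` only when `is_signal_kill(rc)` is true, so non-signal failures never start a cooldown.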
||
|
|
e7fe15d9ba
|
security-guidance: pip --target fallback when venv can't bootstrap pip (2.0.4 → 2.0.5)
Option A, the data-gated fix for venv_ensurepip_fail (#2154 follow-up).
v2.0.4 telemetry made the call: of the venv_ensurepip_fail cohort, ~95% HAVE pip (sdk_has_pip=true) and run Python 3.11–3.14, so it's not the Apple-3.9 problem; it's modern interpreters where `python -m venv` can't bootstrap pip (Debian python3-venv absent, or python.org/pyenv builds without ensurepip) but pip itself works. `pip install --target` needs only pip, so it recovers the agentic reviewer for them instead of degrading to pattern + single-shot review.
Producer (ensure_agent_sdk.py):
- New outcomes BUILT_TARGET=7, NOOP_TARGET=8; new phase pip_target=5.
- _build_via_target(): `pip install --target <state>/agent-sdk-libs --upgrade --prefer-binary claude-agent-sdk`. Failures categorized via _pip_err_from_stderr (sibling of main()'s pip chain, kept separate to avoid disturbing the working venv categorizer); errno embedded for OSError-family exceptions.
- _target_sdk_importable(): probes a prior target install → NOOP_TARGET. Dir-check short-circuits before any subprocess, and it's only reached when there's no working venv, so the 81% NOOP_VENV cohort never pays.
- main() falls through to the target build ONLY on venv_ensurepip_fail; every other venv/pip failure stays terminal BUILD_FAILED. The sentinel is released before the target build so a retry isn't seen as SKIP_SENTINEL.
Consumer (llm.py):
- _inject_agent_sdk_venv_into_syspath() adds the flat agent-sdk-libs dir (packages sit directly in it, not under site-packages). The existing pywin32 .pth bootstrap applies (target installs don't run .pth either).
No change to the happy path: the new branch is taken only on the ensurepip failure, and the extra candidate dir is a no-op when absent.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 30 new tests (test_venv_target_fallback.py): outcome/phase codes (append-only, 4 stays retired), _pip_err_from_stderr categories, _build_via_target success/CalledProcessError/timeout/exc+errno (mocked subprocess), _target_sdk_importable dir-short-circuit, main() wiring (ensurepip→target fallthrough + NOOP_TARGET probe + sentinel release), consumer adds the flat dir. Full suite 533/533 pass + 2 skipped.
- END-TO-END harness (real install, simulated ensurepip failure): main() → BUILT_TARGET, target dir has claude_agent_sdk; 2nd run → NOOP_TARGET; consumer _inject → `import claude_agent_sdk` resolves FROM the --target dir. Full chain proven without needing a broken-ensurepip box.
- Real `pip install --target` + import confirmed independently (exit 0, SDK imports from the flat layout).
NOT validated in tmux: the ensurepip failure can't be reproduced on macOS (working ensurepip), so the fallback was proven via the real-install harness above instead. The happy path (NOOP_VENV / normal agentic review) is unchanged and covered by the existing hook-smoke suite.
Version 2.0.4 -> 2.0.5 per the per-PR-bump policy (#2114).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
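As a rough illustration of the producer/consumer split described in this commit, the sketch below builds the `pip install --target` command and adds the flat target directory to an import path. The function names and the choice to invoke pip as `<interpreter> -m pip` and to prepend rather than append are assumptions; the flags and the package name are taken from the message above.

```python
import os
import sys


def build_target_command(python_exe, target_dir):
    """Assemble the fallback install command (flags as quoted in the commit).

    Running pip as `<python> -m pip` is an assumption of this sketch.
    """
    return [
        python_exe, "-m", "pip", "install",
        "--target", target_dir,
        "--upgrade", "--prefer-binary",
        "claude-agent-sdk",
    ]


def inject_target_dir(target_dir, path=None):
    """Add the flat --target dir to the import path if it exists.

    Packages sit directly inside target_dir (no site-packages level), so the
    directory itself is what goes on the path. Returns False, changing
    nothing, when the directory is absent.
    """
    path = sys.path if path is None else path
    if not os.path.isdir(target_dir):
        return False
    if target_dir not in path:
        path.insert(0, target_dir)
    return True
```

A caller would pass the result of `build_target_command(sys.executable, target_dir)` to `subprocess.run` and only afterwards call `inject_target_dir(target_dir)` in the consumer process.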
||
|
|
43fcf6d513
|
security-guidance: encode exception type + errno + ensurepip instrumentation for venv BUILD_FAILED (2.0.3 → 2.0.4)
Follow-up to #2154. v2.0.3 telemetry showed the venv BUILD_FAILED bucket splits into two unexplained groups; this PR instruments both.
## 1. The exc: bucket: exception type + errno
The dominant remaining venv BUILD_FAILED (phase=venv, err=99) is ~99% sdk_bootstrap_stderr_sig=NULL: Python exceptions caught by the generic `except Exception` ("exc:<TypeName>"), not CalledProcessErrors with categorizable stderr. ~56k/30h, all opaque (stderr_sig only covers "other:<tail>").
- Handler embeds errno for OSError-family: "exc:OSError:28", etc.
- SDK_BOOTSTRAP_EXC_CODES maps the type → sdk_bootstrap_exc (FileNotFoundError=1 … OSError=6 … 99=other).
- errno decoded → sdk_bootstrap_errno (ENOENT/EACCES/ENOSPC/…).
## 2. venv_ensurepip_fail instrumentation (the other category)
venv_ensurepip_fail (code 11) is the top categorizable venv failure, and telemetry flipped the naive assumption: it's NOT just Debian/Ubuntu. macOS has the MOST distinct affected users (466 vs 121 linux), and linux is a retry storm (~172 fires/user). Before committing to a `pip install --target` fallback (Option A) we need to know (a) which interpreter these users run and (b) whether that interpreter even has pip (→ whether --target would work, vs needing a system package).
- sdk_hook_py (always emitted): interpreter version as major*100+minor (309/312). Disambiguates Apple-3.9 vs a 3.10+-with-broken-ensurepip, and also recovers the version for HOOK_PY_INCOMPATIBLE (whose "py_3.9" err_kind otherwise collapses to err=99).
- sdk_has_pip (only on err==11, to avoid an extra subprocess per healthy session): whether `<interpreter> -m pip --version` works. has_pip=true → the --target fallback would fix them; has_pip=false → they need a system package (python3-venv / a complete Python).
Both #1 and #2 are purely additive telemetry on the existing BUILD_FAILED path, with no behavior change to the bootstrap. They de-risk the Option A decision: ship A only if the affected cohort has pip.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 39 tests in test_exc_failure_encoding.py (34 exc/errno + 5 ensurepip instrumentation): type-code map, errno extraction + round-trip, APPEND-ONLY stability, handler-embeds-errno, _probe_has_pip returns bool + true-on-this-machine, sdk_hook_py always-emitted as major*100+minor, sdk_has_pip gated on err==11.
- Full suite: 503/503 pass + 2 skipped.
Version 2.0.3 -> 2.0.4 per the per-PR-bump policy (#2114).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
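The two encodings named in this commit, the `exc:<TypeName>:<errno>` err_kind and the major*100+minor interpreter version, might look something like the following. The helper names are hypothetical and the parsing side is an assumption; only the string shape and the arithmetic come from the message above.

```python
import errno
import sys


def encode_hook_py(version_info=sys.version_info):
    """Interpreter version as major*100+minor, e.g. 3.9 -> 309, 3.12 -> 312."""
    return version_info[0] * 100 + version_info[1]


def exc_err_kind(exc):
    """Build 'exc:<TypeName>' and append ':<errno>' for OSError-family errors."""
    kind = f"exc:{type(exc).__name__}"
    if isinstance(exc, OSError) and exc.errno is not None:
        kind += f":{exc.errno}"
    return kind


def parse_exc_err_kind(kind):
    """Split an 'exc:...' err_kind into (type name, errno, errno symbol).

    Returns None for err_kinds that are not in the exc: family. The symbol
    lookup uses the local platform's errno table.
    """
    parts = kind.split(":")
    if len(parts) < 2 or parts[0] != "exc":
        return None
    name = parts[1]
    code = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else None
    symbol = errno.errorcode.get(code) if code is not None else None
    return name, code, symbol
```

For example, `exc_err_kind(OSError(errno.ENOSPC, "disk full"))` yields the same shape as the "exc:OSError:28" example in the message on platforms where ENOSPC is 28.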
||
|
|
70c28b9c2f
|
security-guidance: emit schema-valid Stop-hook output (#2159) — 2.0.2 → 2.0.3
Fixes #2159. The Stop hook emits feedback via `hookSpecificOutput: {hookEventName: "Stop", additionalContext}`, but `Stop`/`SubagentStop` are NOT members of CC's `hookSpecificOutput` discriminated union (coreSchemas.ts; valid members are PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, etc.). So the emitted shape violates CC's documented hook-output schema.
Impact is CC-version-dependent (important nuance, established empirically):
- Reporter (array0224-cloud) on CLI 2.1.150 / 2.1.152: CC rejects the Stop feedback; the block/reason never reaches the model, so the auto-rewake/fix loop is lost. (Detection still runs + logs.)
- On CLI 2.1.160 (current) the asyncRewake completion path is lenient: its gate is `isSyncHookJSONOutput` (hooks.ts) which is just `!(json.async === true)`, NOT a strict schema parse. So the invalid hookSpecificOutput is tolerated: metrics + rewakeSummary are still consumed and the model still receives the findings. I could NOT reproduce the rejection on 2.1.160, and BQ confirms Stop-path vulns_found metrics are recorded normally (~21k with-vuln fires / 3d), i.e. NOT dropped. (An earlier draft of this message claimed metrics were dropped. That was wrong; corrected after checking telemetry + repro'ing the old plugin on 2.1.160.)
So this is defensive schema-correctness: the plugin should emit output that conforms to CC's documented union regardless of how strictly a given CC version validates it. The reporter's environment validates strictly; relying on the current version's leniency is fragile.
Fix (CC's documented asyncRewake "clean pattern"; hooks.ts: "error text on stderr, JSON on stdout"):
- For Stop/SubagentStop, emit_metrics writes guidance to stderr (the asyncRewake body channel CC delivers via `stderr || stdout`) and sets top-level `decision: "block"` + `reason` (valid SyncHookJSONOutput fields; also the documented sync Stop-hook contract for the `-p` fallback). It does NOT emit a Stop hookSpecificOutput.
- PostToolUse (commit-review, push-sweep) is unchanged: valid union member, keeps the modern hookSpecificOutput protocol.
Verified locally on macOS Python 3.13:
- py_compile clean.
- 11 new tests (test_2159_stop_hook_schema.py) pin the contract: Stop output carries no hookSpecificOutput, uses top-level decision/reason, writes guidance to stderr; the emitted hookEventName (when present) is a valid union member. 2 existing tests that asserted the buggy Stop->hookSpecificOutput shape were corrected. Full suite 464/464 pass + 2 skipped.
- END-TO-END in /tmux on CLI 2.1.160:
  * FIXED plugin (2.0.3): staged pickle.loads + os.system, benign edit pulls the file into review_set; Stop LLM review found 2 critical vulns; CC delivered a clean rewake ("Background security review found issues" + both findings). All hooks (UPS, PostToolUse[Edit] pattern, PostToolUse[Bash] commit-review, Stop) fired clean; zero schema rejections / errors / http_err in the debug log.
  * OLD plugin (2.0.2) on the SAME 2.1.160: also delivered Stop feedback (confirming the no-repro-on-latest finding above), which proves the fix carries NO regression risk on current CC while making the output robust for the stricter versions where it actually breaks.
Version bumped 2.0.2 -> 2.0.3 per the per-PR-bump policy (#2114: a bump is the only way the fix reaches the existing fleet, which is relevant for users still on the CC versions where this breaks).
Closes #2159.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
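The output contract this commit describes can be sketched as follows. This is only an illustration under stated assumptions: `build_hook_output` and `emit` are hypothetical names, not the plugin's `emit_metrics`, and the real payload carries more fields (metrics, rewake summary) than shown here. The split between top-level `decision`/`reason` for Stop events and `hookSpecificOutput` for PostToolUse follows the message above.

```python
import io
import json
import sys

STOP_EVENTS = ("Stop", "SubagentStop")


def build_hook_output(event, guidance):
    """Return the JSON payload for a hook event.

    Stop/SubagentStop are not hookSpecificOutput union members, so they use
    the top-level decision/reason fields instead.
    """
    if event in STOP_EVENTS:
        return {"decision": "block", "reason": guidance}
    return {
        "hookSpecificOutput": {
            "hookEventName": event,
            "additionalContext": guidance,
        }
    }


def emit(event, guidance, stdout=None, stderr=None):
    """Write guidance text to stderr for Stop events, JSON to stdout always."""
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    if event in STOP_EVENTS:
        stderr.write(guidance + "\n")
    stdout.write(json.dumps(build_hook_output(event, guidance)) + "\n")
```

Passing `io.StringIO()` objects for `stdout` and `stderr` makes the two channels easy to inspect in a test.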
||
|
|
009392eee4
|
security-guidance: 5 venv-specific err_kind categories + stderr_signature bucket (2.0.1 → 2.0.2)
PR #2112's telemetry visibility surfaced an immediate finding from the first 3h of v2.0.1 data: **2,406 phase=2 / err=99 sessions** ("venv stage / uncategorized") dominating BUILD_FAILED. The original err_kind detection patterns were all pip-flavored (pip_no_match, dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure modes, so they all collapsed to the catch-all _uncategorized (99) bucket. This PR fills the gap on two axes.
## 1. Five new venv-specific err_kind categories (codes 11-15)
Each gated on `err_phase == "venv"` so the same substring doesn't mis-fire in pip-phase failures:
- 11 `venv_ensurepip_fail`: Debian/Ubuntu without python3-venv installed; stderr matches "ensurepip is not available" or "ensurepip ... returned non-zero". Predicted to be the biggest chunk based on Linux distro market share.
- 12 `venv_path_too_long`: Windows MAX_PATH (260) or POSIX ENAMETOOLONG. Triggered when state_dir + venv layout exceeds the path limit (deep Lib/site-packages/<pkg>/<...> paths).
- 13 `venv_no_module`: `python3 -m venv` itself missing ("No module named 'venv'"). Rare but distinctive.
- 14 `venv_already_exists`: Errno 17 / "file exists"; sentinel race past O_EXCL or stale dir survived `--clear`.
- 15 `venv_setup_failed`: generic "virtual environment was not created successfully" catch-all for venv setup failures that don't match a more specific category.
All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the APPEND-ONLY contract from PR #2112.
## 2. `sdk_bootstrap_stderr_sig` integer hash
For "other:<tail>" err_kinds (which encode to _uncategorized = 99), emit a bounded integer hash (0-999) of the first ~30 chars of the stderr tail. This restores cardinality to the _uncategorized bucket in BQ aggregation without unbounded keyspace: the same stderr message always maps to the same bucket, so a real failure mode replicating across thousands of machines clusters cleanly.
Bounded at 1000 buckets: well below any "high cardinality" alarm but wide enough to distinguish ~30 distinct dominant patterns (birthday-paradox collision probability ~50% at ~37 distinct inputs). The field auto-omits (`if sig:` gate) when err_kind is categorized, so there is no key-budget cost on the common-case categorized failures.
## Version bump 2.0.1 → 2.0.2
PR #2114 confirmed the version-bump mechanism is the only way to propagate code changes to the existing fleet: without a bump, CC's plugin updater short-circuits on string-equality of installation version vs marketplace version. Following the policy we established: **bump patch on every functional PR**.
By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was already appearing in BQ. v2.0.2 should follow the same propagation curve: ~30% adoption within 3 hours, full convergence within a few days.
## Verified locally
- py_compile clean.
- 15 new tests in test_venv_failure_deepdive.py (added to internal test suite at sg-staging/tests/, not in this PR):
  * 5 parametrized: each new err_kind maps to its expected code (11-15).
  * 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged.
  * 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic same-input → same-output; bounded to 0-999; distinct inputs → distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus (path-varying stderr with shared 30-char prefix collide as designed).
  * 1 static-shape catcher: every new `err_kind = "venv_..."` branch in main() is guarded by `err_phase == "venv"`. Catches the regression where someone adds a venv pattern without the phase gate and starts mis-categorizing pip-phase failures.
  * 1 map-coverage: all err_kind strings assigned anywhere in ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES (catches new categories added in code but forgotten in the map).
  * 1 emit-shape: the metric block uses `_encode_stderr_sig`, the `sdk_bootstrap_stderr_sig` key is written conditionally on `if sig:`. Catches the regression where someone removes the helper or makes the emit unconditional (would pad every categorized BUILD_FAILED row with a zero-valued field).
- Full suite: 452/452 pass + 2 skipped (live API tests, opt-in).
## What this unblocks in BQ
```sql
-- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1,
-- v2.0.2+ will split them across the new categories. Query:
SELECT
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err,
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig,
  COUNT(*) AS sessions
FROM `proj-product-data-nhme.raw_events.claude_code_internal_event`
WHERE _PARTITIONTIME >= ...
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2 -- venv
GROUP BY err, sig
ORDER BY sessions DESC
```
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
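To make the bounded stderr signature and the birthday-paradox figures concrete, here is a small sketch. The hash construction (SHA-256 of the first 30 characters, reduced modulo 1000) is an assumption chosen because it is deterministic across processes; the commit message does not say which hash the plugin uses. The collision function simply computes the standard birthday bound for 1000 buckets.

```python
import hashlib

SIG_BUCKETS = 1000       # bounded keyspace, per the commit message
SIG_PREFIX_CHARS = 30    # only the leading characters of the tail are hashed


def encode_stderr_sig(err_kind):
    """Map an 'other:<tail>' err_kind to a bucket in 0..999.

    Categorized, empty, or missing err_kinds return 0. Note that in this
    sketch a real tail can also land in bucket 0 by chance.
    """
    if not err_kind or not err_kind.startswith("other:"):
        return 0
    tail = err_kind[len("other:"):]
    if not tail:
        return 0
    prefix = tail[:SIG_PREFIX_CHARS]
    digest = hashlib.sha256(prefix.encode("utf-8", "replace")).digest()
    return int.from_bytes(digest[:8], "big") % SIG_BUCKETS


def collision_probability(distinct_inputs, buckets=SIG_BUCKETS):
    """Probability that at least two of n uniformly hashed inputs collide."""
    p_all_distinct = 1.0
    for k in range(distinct_inputs):
        p_all_distinct *= (buckets - k) / buckets
    return 1.0 - p_all_distinct
```

With 1000 buckets, `collision_probability(5)` is about 1% and `collision_probability(37)` is just under 50%, matching the figures quoted in the message.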
||
|
|
17b532f92e
|
security-guidance: bump 2.0.0 → 2.0.1 to propagate 8 weeks-of-fixes to the existing fleet
The 8 PRs we shipped since 2026-05-26 (#2076, #2077, #2078, #2086, #2091, #2100, #2101, #2105) all changed plugin code without bumping the version. CC's plugin updater uses string equality for the freshness check (pluginOperations.ts:1835):
    const isUpToDate =
      installation.version === newVersion ||
      installation.installPath === versionedPath ||
      installation.installPath === zipPath
    if (isUpToDate) return { alreadyUpToDate: true }
Users who installed v2.0.0 anywhere between 2026-05-26 and 2026-05-31 have `installation.version === "2.0.0"` in their installed_plugins.json. The marketplace also advertises "2.0.0" (until this commit), so isUpToDate returns true and the plugin cache directory is never refreshed: they keep running whatever 2.0.0 code was current on the day they installed. The marketplace git pull happens; the per-user cache install does NOT.
Empirical evidence: in BQ today (5/31) on Windows v2.0.0 fires, **73% emit sdk_bootstrap outcome 4 (SKIP_WIN32)**, a code path retired in PR #2055's Windows-enable fix. Those users are running a plugin tree that pre-dates the fix, even though their telemetry shows pv=20000.
The fix is a one-line version bump. Once the marketplace advertises 2.0.1, every CC autoupdate cycle sees installation.version (2.0.0) != newVersion (2.0.1), installs the new version, and the user's next session loads the fixed code.
This PR:
1. plugins/security-guidance/.claude-plugin/plugin.json: 2.0.0 → 2.0.1
2. .claude-plugin/marketplace.json security-guidance entry: 2.0.0 → 2.0.1
What 2.0.1 carries (versus 2.0.0 as published 5/26):
- #2076: Graphite gt commit/push detection
- #2077: hookSpecificOutput.additionalContext on async-rewake exit-2
- #2078: CLAUDE_CONFIG_DIR support
- #2086: core.quotePath=false on diff feeders (Arabic/Hebrew/CJK paths)
- #2091: fix Bash(...|...) if-clause regression from #2076
- #2100: drop text=True from subprocess.run, bake PYTHONUTF8=1 (Windows non-cp1252 path crash)
- #2101: core.quotePath=false on GIT_CMD globally
- #2105: output_format → output_config.format API migration (#2098)
Verified locally:
- plugin.json + marketplace.json both valid JSON.
- _read_plugin_version_int() returns 20001 (was 20000).
- Existing test suite passes: 408 tests, no regressions caused by the version bump itself. (29 unrelated failures are from test_telemetry_failure_signals.py which expects PR #2112's not-yet-merged code.)
Going forward: bumping `patch` on every functional PR closes this gap entirely. Without that policy, every fix only reaches NEW installs, never the existing fleet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
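The effect of a string-equality freshness check, and why a patch bump is enough to defeat it, can be shown with a simplified Python model. This is not CC's updater: the install-path comparisons from the quoted TypeScript are omitted, and the integer encoding (major*10000 + minor*100 + patch) is an assumption inferred from the 2.0.0 → 20000 and 2.0.1 → 20001 values in the message.

```python
def is_up_to_date(installed_version, marketplace_version):
    """Simplified model of the freshness check: plain string equality."""
    return installed_version == marketplace_version


def version_int(version):
    """Encode 'major.minor.patch' as one integer (assumed scheme).

    '2.0.0' -> 20000 and '2.0.1' -> 20001, consistent with the pv values
    mentioned in the commit message.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major * 10000 + minor * 100 + patch


def needs_refresh(installed_version, marketplace_version):
    """True when the updater would reinstall the plugin cache."""
    return not is_up_to_date(installed_version, marketplace_version)
```

Under this model, a fleet installed at "2.0.0" is never refreshed while the marketplace still advertises "2.0.0", no matter how much the code behind that version string has changed; advertising "2.0.1" flips `needs_refresh` to true.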
||
|
|
0bde168648
|
Update security-guidance plugin | ||
|
|
19a119f97e
|
Update plugins library to include authors (#6)
* added Anthropic as author
* update figma
|
||
|
|
22d3def39e
|
Merge pull request #2 from anthropics/noahz/official_language
Add homepages, other cleanup |
||
|
|
4ca561fb85
|
creating initial scaffolding for claude code plugins |