claude-plugins-official

mirror of https://github.com/anthropics/claude-plugins-official.git synced 2026-06-13 22:26:03 -03:00

Author	SHA1	Message	Date
Mohamed Hegazy	475038edfc	security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding Fills two failure-visibility gaps in plugin telemetry. ## Gap 1: HTTP errors from _call_claude invisible Before: a 4xx/5xx response from the LLM API caused `_call_claude` to return None and produce ZERO fingerprint in tengu_hook_plugin_metrics. A failed call looked identical to "no review needed". The recent deprecation-400 outage (PR #2105, output_format → output_config.format, #2098) was invisible in aggregate dashboards until a user manually reported errors from their debug log. Cohort-specific or partial outages would never show up in BQ. Fix: add `http_err_last` (most recent status) and `http_err_count` to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()` snapshots them whenever count > 0 (skip-path no-pollute contract preserved when count == 0). All `_call_claude` error sites now call the new `_record_http_error()` helper alongside the existing `_last_call_claude_http_error` module-state assignment. Now any future API failure category is queryable in BQ in real time: SELECT DATE(server_timestamp, "America/Los_Angeles") AS d, CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code, COUNT() AS n FROM ... WHERE event_name = "tengu_hook_plugin_metrics" AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%" AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL GROUP BY d, code ORDER BY d, n DESC ## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip", "dns_fail"). CC's plugin-metrics pipeline silently drops plugin-emitted string values — only bool\|finite-number plugin metrics reach BigQuery. (CC-core fields like `subscription_type` are exempt because they're injected downstream of plugin validation.) Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2 days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL despite the Python code emitting them. ~28K BUILD_FAILED sessions/day had no diagnostic split — flying blind on whether the failures are pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc. Fix: encode phase + err_kind as stable integers via SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre, 2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98 reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>", and unmapped strings). APPEND-ONLY for telemetry stability. Also corrects the misleading "CC accepts string metric values" comment in ensure_agent_sdk.py that led to the bug originally. Verified locally on macOS Python 3.13: - py_compile clean. - 32 new tests in test_telemetry_failure_signals.py (added to internal test suite at sg-staging/tests/, not in this PR): 4 HTTP-error tracking unit tests: _record_http_error increments count + tracks most-recent; handles None/invalid; -1 for network/timeout. * 4 _usage_metrics emission tests: empty when no activity; successful call has no http_err fields; failure-only has http_err and no api_calls; mixed has both. * 1 contract test: every emitted value is bool\|finite-number (catches future regression of the string-dropping bug class). * 13 sdk_bootstrap encoding tests (parametrized over the 10 known err_kind categories + 5 catch-all shapes): each maps to the right integer; unknown phase = 0; unknown err = 99. * 1 static-shape regression catcher: every `err_kind = "..."` string in ensure_agent_sdk.main() must be in SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently collapse to 99). * 2 emit-shape regression catchers: the assignments in main() go through _encode_phase / _encode_err_kind helpers (no raw strings); no literal string values for sdk_bootstrap_phase/err. * 1 comment-accuracy: the misleading "CC accepts string metric values" comment is gone. - Full suite: 437/437 pass + 2 skipped (live API tests, opt-in). NOT verified end-to-end against BQ — would require shipping + observing in production for 24h to confirm the http_err and sdk_bootstrap_phase/err fields actually appear in tengu_hook_plugin_metrics rows. The unit tests pin the contract; if the wire shape is broken, BQ will show NULL for the new fields and we revisit (with the same diagnostic the BUILD_FAILED bug gave us). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 08:34:35 -07:00
Mohamed Hegazy	0d22ba3501	security-guidance: respect CLAUDE_CONFIG_DIR for plugin state files (#1868 ) Fixes #1868 — when CLAUDE_CONFIG_DIR is set to a non-default location (e.g. ~/.config/claude for XDG compliance, or a multi-tenant install path), the plugin still wrote state files to the hardcoded ~/.claude/ path, leaving stale state and breaking CLAUDE_CONFIG_DIR's purpose. Resolution precedence (highest first): 1. SECURITY_WARNINGS_STATE_DIR — plugin-specific override (existing) 2. CLAUDE_CONFIG_DIR/security — CC's config-dir env (new — #1868) 3. ~/.claude/security — default fallback (unchanged) Empty-string env vars (e.g. CLAUDE_CONFIG_DIR= in a misconfigured shell) are treated as not-set so the empty path doesn't collide with os.path.join and silently write to /security at the filesystem root. Implementation: a single state_dir() helper in _base.py is the source of truth for resolution. All five modules that previously had inline SECURITY_WARNINGS_STATE_DIR / ~/.claude/security resolutions (_base.py, session_state.py, ensure_agent_sdk.py, llm.py, and one site in security_reminder_hook.py) now call state_dir() instead. Re-implementing the precedence inline risks drift — one module gets a future fix, others don't. The helper is called per-invocation rather than cached at import time so test monkeypatches of the env vars take effect, and so a long- running test or future shared-process scenario can change the env between calls and have the next call observe the new value. The per-call cost is negligible compared to the subprocess-spawn cost the hooks pay every fire in production. Three hardcoded ~/.claude/security strings remain but are NOT functional resolutions: - _base.py:39: the fallback BRANCH inside state_dir() itself - ensure_agent_sdk.py:6, :11: docstring text describing default location for users Verified locally on macOS Python 3.13: - py_compile clean on all 5 modified files. - Existing 45 smoke + extensibility tests still pass. - 14 new tests in test_claude_config_dir.py (added to internal test suite at sg-staging/tests/, not in this PR): * 7 resolution-semantics: default fallback, CLAUDE_CONFIG_DIR override, SECURITY_WARNINGS_STATE_DIR beats both, tilde expansion, empty-string handling (CLAUDE_CONFIG_DIR= must fall back, NOT join to /security). * 4 static-shape: each of session_state / ensure_agent_sdk / llm / security_reminder_hook either imports state_dir from _base OR has zero resolution patterns. Catches the regression where someone adds a new state-file writer and re-implements resolution inline, missing the CLAUDE_CONFIG_DIR branch. * 3 end-to-end: with CLAUDE_CONFIG_DIR set, get_state_file / get_lock_file return paths under <CLAUDE_CONFIG_DIR>/security/; save_state round-trip writes a file to the redirected path and re-reads the same contents. - 59/59 pass total (45 existing + 14 new) in 2.54s. NOT verified end-to-end with a real CC instance setting CLAUDE_CONFIG_DIR. The shape tests catch the regression class (hardcoded ~/.claude/), and the end-to-end test pins the behavior that user state files actually land at the redirected path. Closes #1868. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:57:10 -07:00
Mohamed Hegazy	a67587c816	security-guidance: enable LLM review on default macOS Python 3.9 Fixes anthropics/claude-plugins-official#2071 — on macOS where the default `python3` is Apple's Command Line Tools Python 3.9.6, the plugin's agentic commit reviewer silently does not run, even when the user has a newer Python installed. Three compounding factors in the bug: 1. `sg-python.sh` only checks the major version (`3`), so it always picks 3.9 even when 3.10+ is on PATH. 2. `claude_agent_sdk` requires Python >=3.10 — pip install on 3.9 returns "No matching distribution" -> bootstrap returns BUILD_FAILED. 3. Even with a hand-built 3.12 venv, `llm.py` imports the SDK in-process into the hook's interpreter (still 3.9), which raises SyntaxError. The existing venv-probe in `ensure_agent_sdk.py` uses the venv's own Python (3.12) so it reports NOOP_VENV (healthy) while the consumer fails — misleading telemetry on top of silent feature degradation. Per BQ telemetry, 14,073 external macOS users hit sdk_bootstrap=BUILD_FAILED in the past 4 days (the default-macOS cohort), out of ~86K total external installed users. Combined with ~20K other users in similar broken-bootstrap states (Windows pre-#2055, Linux <3.10), about half the installed base has a silently-broken agentic reviewer. This PR implements the reporter's items #1, #3, and #4. Item #2 (running the SDK out-of-process) is deferred as a bigger refactor. Item #1 — hooks/sg-python.sh — prefer >=3.10 binaries via 3-pass probe: Pass 1: python3.13 / 3.12 / 3.11 / 3.10 (>=3.10 by name, highest wins) Pass 2: bare python3 / python / py -3 (accept only if reported >=3.10) Pass 3: bare python3 / python / py -3 (any Python 3, FALLBACK so pattern checks still work on macOS-default 3.9 — no regression vs today; SDK-dependent paths detect the version mismatch inside Python and degrade cleanly via item #4) Item #4 — ensure_agent_sdk.py — health-check honesty: Added HOOK_PY_INCOMPATIBLE=6 outcome with short-circuit at top of main(): if sys.version_info < (3, 10): return HOOK_PY_INCOMPATIBLE, "hook_py", f"py_{...}" Telemetry consequences after rollout: sdk_bootstrap=6 is a new clean bucket; some users currently miscounted in sdk_bootstrap=3 BUILD_FAILED (wasted pip cycles) and sdk_bootstrap=1 NOOP_VENV (falsely-healthy) move to sdk_bootstrap=6. The remaining NOOP_VENV count becomes trustworthy. Item #3 — ensure_agent_sdk.py — one-time user-visible notice: When outcome == HOOK_PY_INCOMPATIBLE and a marker file at `~/.claude/security/.agentic_unavailable_notice_v<pv>` doesn't exist, the SessionStart response includes hookSpecificOutput.additionalContext + systemMessage explaining the situation. Marker file is plugin- version-keyed so a future fix (e.g. shipping out-of-process SDK) can bump pv and re-notify users. BUILD_FAILED is intentionally excluded from the notice — it covers transient causes where a permanent banner would mislead. Verified locally on macOS Python 3.13: - py_compile clean on both files. - Existing 45-test smoke + extensibility suite: 45/45 PASS in 2.50s. - Unit test of simulated 3.9 path: HOOK_PY_INCOMPATIBLE returned with correct phase/kind; notice shown on first call, suppressed on second, reshown on bumped pv; BUILD_FAILED correctly does NOT trigger notice. NOT verified: actual Python 3.9 behavior end-to-end (would need a 3.9 install). Worth a follow-up smoke test in a 3.9 venv before next release. The unit test simulating 3.9 covers the logic but not the runtime invocation through the shim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 22:58:01 -07:00
Mohamed Hegazy	c11244778d	Address Windows verification: --prefer-binary + pywin32 bootstrap The first round of this PR removed SKIP_WIN32, fixed venv_py to use Scripts/python.exe, and added Lib/site-packages to the consumer glob — all necessary. Windows verification (Win11 ARM64, Py 3.13, Git Bash) showed two more blockers, both addressed here. 1. Pip dependency resolver picks unbuildable cryptography on ARM64. Without --prefer-binary, pip picks a cryptography version with no published ARM64 wheel and tries to build it from source. That needs Rust/Cargo, almost never present on user machines → BUILD_FAILED with err_kind=other:cryptography. A binary wheel exists for an adjacent version (cryptography-46.0.3-cp311-abi3-win_arm64.whl); --prefer-binary tells pip to pick it. Cross-platform safe (no-op where the latest version already has a wheel). 2. pywin32 .pth files aren't processed by sys.path.insert(). With the venv built, ensure_agent_sdk.py's post-build probe passes (it runs from venv_py, where Python's site.py at startup processes pywin32.pth and registers win32/, win32/lib/ plus runs pywin32_bootstrap.py to set the DLL search dir). But llm.py runs in the hook's SYSTEM Python and adds the venv via sys.path.insert(), which doesn't trigger site.py at all. Without the bootstrap, the SDK's mcp.client.stdio → mcp.os.win32.utilities chain raises ModuleNotFoundError: pywintypes and the agentic reviewer falls back to single-shot silently — exactly the symptom this PR is trying to fix. The probe says NOOP_VENV; the actual consumer fails. Probe and consumer use different Pythons. Replicate what site.py would do: after inserting site-packages, also insert win32/ and win32/lib/, then exec pywin32_bootstrap.py. Pulled into a shared helper _inject_agent_sdk_venv_into_syspath() so both consumer sites (3P SDK fallback, agentic_review fallback) call the same code — Windows handling stays in one place. Verified on macOS (POSIX path unchanged): - Helper end-to-end test: POSIX-layout venv detected + fake package imports successfully via the injected path - Windows-layout venv also detected; win32 branch correctly skipped via sys.platform check - Both files pass py_compile Credit: @mhegazy verified the previous commit on Win11 ARM64 / Py 3.13 / Git Bash, surfaced both issues end-to-end, and provided the exact fix patterns. This commit applies them with the pywin32 part factored into a shared helper (vs. inlining at both consumer sites). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 15:07:33 -07:00
Mohamed Hegazy	4decd2e3b2	Enable agentic commit reviewer on Windows The agentic reviewer is silently no-op on Windows today. SessionStart bootstrap (ensure_agent_sdk.py) short-circuits with SKIP_WIN32 because the consumer glob in llm.py only matches POSIX venv layout (lib/pythonX.Y/site-packages). On Windows, venvs use Lib/site-packages (capital L, no pythonX.Y subdir), so even if a venv got built the glob wouldn't find its contents. Result: Windows users on default installs (no system-wide claude_agent_sdk) get layer 1 (pattern warnings) and layer 2 (single- shot LLM diff review) but not layer 3 — the cross-file agentic review that catches IDOR, auth-bypass, cross-file SSRF, and other things that need to read related files. Plugin description claims layer 3 but it silently doesn't run. Three changes: 1. llm.py — extend the consumer glob (2 sites: 3P SDK fallback at ~L297, agentic_review fallback at ~L1090) to also match the Windows Lib/site-packages layout, so a venv built on Windows is actually discoverable. 2. ensure_agent_sdk.py — remove the sys.platform == 'win32' early-exit so the SessionStart bootstrap builds the venv on Windows too. Outcome code 4 (formerly SKIP_WIN32) is retired but not reused so pre-fix telemetry rows still decode correctly. 3. ensure_agent_sdk.py — venv_py path now branches on sys.platform: Windows venvs put the interpreter at Scripts\python.exe; POSIX uses bin/python. Previously assumed POSIX, so even with the glob fix, the post-build SDK-importability probe would fail on Windows. Verified locally on macOS: - glob test: both layouts now match (POSIX venv detected, simulated Windows venv also detected via the new Lib/site-packages branch) - both files pass py_compile - POSIX path unchanged (sys.platform != 'win32' so old branch runs) Not verified on Windows in this commit — needs an actual Windows runner to confirm the venv build + SDK import + subprocess plumbing all work end-to-end. The SDK spawns a child claude.exe; Windows process plumbing has its own quirks (shell semantics, path escaping) that may surface separately. Worth a controlled rollout (one-week soak under env-var opt-in before flipping default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 14:50:51 -07:00
Mohamed Hegazy	0bde168648	Update security-guidance plugin	2026-05-26 14:06:52 -07:00

6 Commits