10 Commits

Author SHA1 Message Date
Mohamed Hegazy
4e56d19dd8
security-guidance: handle signal-killed venv builds (memory) + cooldown (2.0.5 → 2.0.6)
The real dominant Linux failure, identified by a CCR Linux repro.

A CCR container reproduced the production signature — non-zero exit +
EMPTY stdout + EMPTY stderr (~60k fires/day, 4,485 Linux users on 2.0.4):
running `python -m venv` under a tight memory limit (ulimit -v) kills the
memory-heavy venv+ensurepip/pip subprocess with SIGSEGV (-11, RLIMIT_AS)
or SIGKILL (-9, kernel OOM-killer) BEFORE it writes anything. This is
NOT the ensurepip/packaging case (that always writes to stderr, code 11)
and NOT fixable by --target (a --target pip install is also memory-heavy
and gets killed too). Three earlier hypotheses (stdout, packaging,
Option A fixes Linux) were wrong — the repro corrected them.

Changes:
  - Detect the signal kill (rc<0, or 128+sig: 134/137/139) in the venv/pip
    and --target paths → err_kind "signal_killed:<rc>" (new code 16). The
    returncode rides in a new sdk_bootstrap_rc metric so prod confirms
    which signal dominates (-9 OOM-killer vs -11 RLIMIT_AS).
  - Cooldown: on a signal kill, write a marker and return the new
    SKIP_COOLDOWN outcome (9) on subsequent sessions for 24h — stops the
    retry storm (every session was re-attempting a build that just gets
    re-killed, burning the user's memory/CPU). Retries once per window so a
    machine that frees memory still recovers.
  - --no-cache-dir on both pip installs (venv + --target) trims pip's peak
    memory; may get marginal machines under the OOM threshold.

No happy-path change: signal detection is at the top of the existing
failure handler; cooldown is checked only after all no-op probes
(NOOP_SYSTEM/VENV/TARGET short-circuit first).

Verified locally on macOS Python 3.13:
  - py_compile clean.
  - 35 new tests (test_signal_kill_cooldown.py): _is_signal_kill across
    signals/exit-codes, rc decode, signal_killed→code 16, cooldown
    lifecycle (none→write→expire), and an integration flow — simulated
    SIGKILL'd venv → BUILD_FAILED/signal_killed:-9 + cooldown written →
    2nd run SKIP_COOLDOWN without re-attempting → retry after window;
    non-signal failure does NOT cool down; --no-cache-dir present on both
    pip paths; sdk_bootstrap_rc emitted conditionally.
  - End-to-end harness: the full kill→categorize→cooldown→skip→retry
    chain confirmed in-process.

The original CCR repro (ulimit -v ≤7000 KB → rc=-11, empty streams) is
the ground truth this fix is built on. Can be re-validated on CCR with the
same ulimit approach.

Version 2.0.5 -> 2.0.6 per the per-PR-bump policy (#2114).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 00:53:02 -07:00
Mohamed Hegazy
e7fe15d9ba
security-guidance: pip --target fallback when venv can't bootstrap pip (2.0.4 → 2.0.5)
Option A, the data-gated fix for venv_ensurepip_fail (#2154 follow-up).

v2.0.4 telemetry made the call: of the venv_ensurepip_fail cohort, ~95%
HAVE pip (sdk_has_pip=true) and run Python 3.11–3.14 — so it's not the
Apple-3.9 problem; it's modern interpreters where `python -m venv` can't
bootstrap pip (Debian python3-venv absent, or python.org/pyenv builds
without ensurepip) but pip itself works. `pip install --target` needs only
pip, so it recovers the agentic reviewer for them instead of degrading to
pattern + single-shot review.

Producer (ensure_agent_sdk.py):
  - New outcomes BUILT_TARGET=7, NOOP_TARGET=8; new phase pip_target=5.
  - _build_via_target(): `pip install --target <state>/agent-sdk-libs
    --upgrade --prefer-binary claude-agent-sdk`. Failures categorized via
    _pip_err_from_stderr (sibling of main()'s pip chain — kept separate to
    avoid disturbing the working venv categorizer); errno embedded for
    OSError-family exceptions.
  - _target_sdk_importable(): probes a prior target install → NOOP_TARGET.
    Dir-check short-circuits before any subprocess, and it's only reached
    when there's no working venv, so the 81% NOOP_VENV cohort never pays.
  - main() falls through to the target build ONLY on venv_ensurepip_fail;
    every other venv/pip failure stays terminal BUILD_FAILED. The sentinel
    is released before the target build so a retry isn't seen as SKIP_SENTINEL.

Consumer (llm.py):
  - _inject_agent_sdk_venv_into_syspath() adds the flat agent-sdk-libs dir
    (packages sit directly in it, not under site-packages). The existing
    pywin32 .pth bootstrap applies (target installs don't run .pth either).

No change to the happy path — the new branch is taken only on the
ensurepip failure, and the extra candidate dir is a no-op when absent.

Verified locally on macOS Python 3.13:
  - py_compile clean.
  - 30 new tests (test_venv_target_fallback.py): outcome/phase codes
    (append-only, 4 stays retired), _pip_err_from_stderr categories,
    _build_via_target success/CalledProcessError/timeout/exc+errno (mocked
    subprocess), _target_sdk_importable dir-short-circuit, main() wiring
    (ensurepip→target fallthrough + NOOP_TARGET probe + sentinel release),
    consumer adds the flat dir. Full suite 533/533 pass + 2 skipped.
  - END-TO-END harness (real install, simulated ensurepip failure):
    main() → BUILT_TARGET, target dir has claude_agent_sdk; 2nd run →
    NOOP_TARGET; consumer _inject → `import claude_agent_sdk` resolves
    FROM the --target dir. Full chain proven without needing a
    broken-ensurepip box.
  - Real `pip install --target` + import confirmed independently (exit 0,
    SDK imports from the flat layout).

NOT validated in tmux: the ensurepip failure can't be reproduced on macOS
(working ensurepip), so the fallback was proven via the real-install
harness above instead. The happy path (NOOP_VENV / normal agentic review)
is unchanged and covered by the existing hook-smoke suite.

Version 2.0.4 -> 2.0.5 per the per-PR-bump policy (#2114).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 23:31:55 -07:00
Mohamed Hegazy
43fcf6d513
security-guidance: encode exception type + errno + ensurepip instrumentation for venv BUILD_FAILED (2.0.3 → 2.0.4)
Follow-up to #2154. v2.0.3 telemetry showed the venv BUILD_FAILED bucket
splits into two unexplained groups; this PR instruments both.

## 1. The exc: bucket — exception type + errno

The dominant remaining venv BUILD_FAILED (phase=venv, err=99) is ~99%
sdk_bootstrap_stderr_sig=NULL — Python exceptions caught by the generic
`except Exception` ("exc:<TypeName>"), not CalledProcessErrors with
categorizable stderr. ~56k/30h, all opaque (stderr_sig only covers
"other:<tail>").

  - Handler embeds errno for OSError-family: "exc:OSError:28", etc.
  - SDK_BOOTSTRAP_EXC_CODES maps the type → sdk_bootstrap_exc
    (FileNotFoundError=1 … OSError=6 … 99=other).
  - errno decoded → sdk_bootstrap_errno (ENOENT/EACCES/ENOSPC/…).

## 2. venv_ensurepip_fail instrumentation (the other category)

venv_ensurepip_fail (code 11) is the top categorizable venv failure, and
telemetry flipped the naive assumption: it's NOT just Debian/Ubuntu —
macOS has the MOST distinct affected users (466 vs 121 linux), and linux
is a retry storm (~172 fires/user). Before committing to a `pip install
--target` fallback (Option A) we need to know (a) which interpreter these
users run and (b) whether that interpreter even has pip (→ whether
--target would work, vs needing a system package).

  - sdk_hook_py (always emitted): interpreter version as major*100+minor
    (309/312). Disambiguates Apple-3.9 vs a 3.10+-with-broken-ensurepip,
    and also recovers the version for HOOK_PY_INCOMPATIBLE (whose "py_3.9"
    err_kind otherwise collapses to err=99).
  - sdk_has_pip (only on err==11, to avoid an extra subprocess per healthy
    session): whether `<interpreter> -m pip --version` works. has_pip=true
    → the --target fallback would fix them; has_pip=false → they need a
    system package (python3-venv / a complete Python).

Both #1 and #2 are purely additive telemetry on the existing BUILD_FAILED
path — no behavior change to the bootstrap. They de-risk the Option A
decision: ship A only if the affected cohort has pip.

Verified locally on macOS Python 3.13:
  - py_compile clean.
  - 39 tests in test_exc_failure_encoding.py (34 exc/errno + 5 ensurepip
    instrumentation): type-code map, errno extraction + round-trip,
    APPEND-ONLY stability, handler-embeds-errno, _probe_has_pip returns
    bool + true-on-this-machine, sdk_hook_py always-emitted as
    major*100+minor, sdk_has_pip gated on err==11.
  - Full suite: 503/503 pass + 2 skipped.

Version 2.0.3 -> 2.0.4 per the per-PR-bump policy (#2114).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-04 14:55:55 -07:00
Mohamed Hegazy
009392eee4
security-guidance: 5 venv-specific err_kind categories + stderr_signature bucket (2.0.1 → 2.0.2)
PR #2112's telemetry visibility surfaced an immediate finding from
the first 3h of v2.0.1 data: **2,406 phase=2 / err=99 sessions** —
"venv stage / uncategorized" — dominating BUILD_FAILED. The original
err_kind detection patterns were all pip-flavored (pip_no_match,
dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure
modes, so they all collapsed to the catch-all _uncategorized (99)
bucket.

This PR fills the gap on two axes.

## 1. Five new venv-specific err_kind categories (codes 11-15)

Each gated on `err_phase == "venv"` so the same substring doesn't
mis-fire in pip-phase failures:

  - 11 `venv_ensurepip_fail` — Debian/Ubuntu without python3-venv
    installed; stderr matches "ensurepip is not available" or
    "ensurepip ... returned non-zero". Predicted to be the biggest
    chunk based on Linux distro market share.
  - 12 `venv_path_too_long` — Windows MAX_PATH (260) or POSIX
    ENAMETOOLONG. Triggered when state_dir + venv layout exceeds
    the path limit (deep Lib/site-packages/<pkg>/<...> paths).
  - 13 `venv_no_module` — `python3 -m venv` itself missing
    ("No module named 'venv'"). Rare but distinctive.
  - 14 `venv_already_exists` — Errno 17 / "file exists" — sentinel
    race past O_EXCL or stale dir survived `--clear`.
  - 15 `venv_setup_failed` — generic "virtual environment was not
    created successfully" catch-all for venv setup failures that
    don't match a more specific category.

All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the
APPEND-ONLY contract from PR #2112.

## 2. `sdk_bootstrap_stderr_sig` integer hash

For "other:<tail>" err_kinds (which encode to _uncategorized = 99),
emit a bounded integer hash (0-999) of the first ~30 chars of the
stderr tail. This restores cardinality to the _uncategorized bucket
in BQ aggregation without unbounded keyspace — same stderr message
always maps to the same bucket, so a real failure mode replicating
across thousands of machines clusters cleanly. Bounded at 1000
buckets: well below any "high cardinality" alarm but wide enough to
distinguish ~30 distinct dominant patterns (birthday-paradox
collision probability ~50% at ~37 distinct inputs).

The field auto-omits (`if sig:` gate) when err_kind is categorized
— no key-budget cost on the common-case categorized failures.

## Version bump 2.0.1 → 2.0.2

PR #2114 confirmed the version-bump mechanism is the only way to
propagate code changes to the existing fleet — without a bump, CC's
plugin updater short-circuits on string-equality of installation
version vs marketplace version. Following the policy we established:
**bump patch on every functional PR**.

By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was
already appearing in BQ. v2.0.2 should follow the same propagation
curve — ~30% adoption within 3 hours, full convergence within a few
days.

## Verified locally

  - py_compile clean.
  - 15 new tests in test_venv_failure_deepdive.py (added to internal
    test suite at sg-staging/tests/, not in this PR):

      * 5 parametrized: each new err_kind maps to its expected code (11-15).
      * 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged.
      * 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic
        same-input → same-output; bounded to 0-999; distinct inputs →
        distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus
        (path-varying stderr with shared 30-char prefix collide as designed).
      * 1 static-shape catcher: every new `err_kind = "venv_..."` branch
        in main() is guarded by `err_phase == "venv"`. Catches the
        regression where someone adds a venv pattern without the phase
        gate and starts mis-categorizing pip-phase failures.
      * 1 map-coverage: all err_kind strings assigned anywhere in
        ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES
        (catches new categories added in code but forgotten in the map).
      * 1 emit-shape: the metric block uses `_encode_stderr_sig`, the
        `sdk_bootstrap_stderr_sig` key is written conditionally on `if
        sig:`. Catches the regression where someone removes the
        helper or makes the emit unconditional (would pad every
        categorized BUILD_FAILED row with a zero-valued field).

  - Full suite: 452/452 pass + 2 skipped (live API tests, opt-in).

## What this unblocks in BQ

```sql
-- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1,
-- v2.0.2+ will split them across the new categories. Query:
SELECT
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err,
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig,
  COUNT(*) AS sessions
FROM `proj-product-data-nhme.raw_events.claude_code_internal_event`
WHERE _PARTITIONTIME >= ...
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2  -- venv
GROUP BY err, sig
ORDER BY sessions DESC
```

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 16:05:49 -07:00
Mohamed Hegazy
475038edfc
security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding
Fills two failure-visibility gaps in plugin telemetry.

## Gap 1: HTTP errors from _call_claude invisible

Before: a 4xx/5xx response from the LLM API caused `_call_claude` to
return None and produce ZERO fingerprint in tengu_hook_plugin_metrics.
A failed call looked identical to "no review needed". The recent
deprecation-400 outage (PR #2105, output_format → output_config.format,
#2098) was invisible in aggregate dashboards until a user manually
reported errors from their debug log. Cohort-specific or partial
outages would never show up in BQ.

Fix: add `http_err_last` (most recent status) and `http_err_count`
to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()`
snapshots them whenever count > 0 (skip-path no-pollute contract
preserved when count == 0). All `_call_claude` error sites now call
the new `_record_http_error()` helper alongside the existing
`_last_call_claude_http_error` module-state assignment.

Now any future API failure category is queryable in BQ in real time:

  SELECT
    DATE(server_timestamp, "America/Los_Angeles") AS d,
    CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code,
    COUNT(*) AS n
  FROM ... WHERE event_name = "tengu_hook_plugin_metrics"
    AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%"
    AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL
  GROUP BY d, code ORDER BY d, n DESC

## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ

Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip",
"dns_fail"). CC's plugin-metrics pipeline silently drops
plugin-emitted string values — only bool|finite-number plugin metrics
reach BigQuery. (CC-core fields like `subscription_type` are exempt
because they're injected downstream of plugin validation.)

Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2
days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL
despite the Python code emitting them. ~28K BUILD_FAILED sessions/day
had no diagnostic split — flying blind on whether the failures are
pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc.

Fix: encode phase + err_kind as stable integers via
SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre,
2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98
reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>",
and unmapped strings). APPEND-ONLY for telemetry stability.

Also corrects the misleading "CC accepts string metric values"
comment in ensure_agent_sdk.py that led to the bug originally.

Verified locally on macOS Python 3.13:

  - py_compile clean.
  - 32 new tests in test_telemetry_failure_signals.py (added to
    internal test suite at sg-staging/tests/, not in this PR):

      * 4 HTTP-error tracking unit tests: _record_http_error increments
        count + tracks most-recent; handles None/invalid; -1 for
        network/timeout.
      * 4 _usage_metrics emission tests: empty when no activity;
        successful call has no http_err fields; failure-only has http_err
        and no api_calls; mixed has both.
      * 1 contract test: every emitted value is bool|finite-number
        (catches future regression of the string-dropping bug class).
      * 13 sdk_bootstrap encoding tests (parametrized over the 10 known
        err_kind categories + 5 catch-all shapes): each maps to the
        right integer; unknown phase = 0; unknown err = 99.
      * 1 static-shape regression catcher: every `err_kind = "..."`
        string in ensure_agent_sdk.main() must be in
        SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently
        collapse to 99).
      * 2 emit-shape regression catchers: the assignments in main() go
        through _encode_phase / _encode_err_kind helpers (no raw
        strings); no literal string values for sdk_bootstrap_phase/err.
      * 1 comment-accuracy: the misleading "CC accepts string metric
        values" comment is gone.

  - Full suite: 437/437 pass + 2 skipped (live API tests, opt-in).

NOT verified end-to-end against BQ — would require shipping +
observing in production for 24h to confirm the http_err and
sdk_bootstrap_phase/err fields actually appear in
tengu_hook_plugin_metrics rows. The unit tests pin the contract; if
the wire shape is broken, BQ will show NULL for the new fields and we
revisit (with the same diagnostic the BUILD_FAILED bug gave us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 08:34:35 -07:00
Mohamed Hegazy
0d22ba3501
security-guidance: respect CLAUDE_CONFIG_DIR for plugin state files (#1868)
Fixes #1868 — when CLAUDE_CONFIG_DIR is set to a non-default location
(e.g. ~/.config/claude for XDG compliance, or a multi-tenant install
path), the plugin still wrote state files to the hardcoded ~/.claude/
path, leaving stale state and breaking CLAUDE_CONFIG_DIR's purpose.

Resolution precedence (highest first):
  1. SECURITY_WARNINGS_STATE_DIR  — plugin-specific override (existing)
  2. CLAUDE_CONFIG_DIR/security    — CC's config-dir env (new — #1868)
  3. ~/.claude/security            — default fallback (unchanged)

Empty-string env vars (e.g. CLAUDE_CONFIG_DIR= in a misconfigured
shell) are treated as not-set so the empty path doesn't collide with
os.path.join and silently write to /security at the filesystem root.

Implementation: a single state_dir() helper in _base.py is the source
of truth for resolution. All five modules that previously had inline
SECURITY_WARNINGS_STATE_DIR / ~/.claude/security resolutions
(_base.py, session_state.py, ensure_agent_sdk.py, llm.py, and one
site in security_reminder_hook.py) now call state_dir() instead.
Re-implementing the precedence inline risks drift — one module gets
a future fix, others don't.

The helper is called per-invocation rather than cached at import time
so test monkeypatches of the env vars take effect, and so a long-
running test or future shared-process scenario can change the env
between calls and have the next call observe the new value. The
per-call cost is negligible compared to the subprocess-spawn cost
the hooks pay every fire in production.

Three hardcoded ~/.claude/security strings remain but are NOT
functional resolutions:
  - _base.py:39: the fallback BRANCH inside state_dir() itself
  - ensure_agent_sdk.py:6, :11: docstring text describing default
                                location for users

Verified locally on macOS Python 3.13:

  - py_compile clean on all 5 modified files.
  - Existing 45 smoke + extensibility tests still pass.
  - 14 new tests in test_claude_config_dir.py (added to internal test
    suite at sg-staging/tests/, not in this PR):

      * 7 resolution-semantics: default fallback, CLAUDE_CONFIG_DIR
        override, SECURITY_WARNINGS_STATE_DIR beats both, tilde
        expansion, empty-string handling (CLAUDE_CONFIG_DIR= must
        fall back, NOT join to /security).
      * 4 static-shape: each of session_state / ensure_agent_sdk /
        llm / security_reminder_hook either imports state_dir from
        _base OR has zero resolution patterns. Catches the
        regression where someone adds a new state-file writer and
        re-implements resolution inline, missing the
        CLAUDE_CONFIG_DIR branch.
      * 3 end-to-end: with CLAUDE_CONFIG_DIR set, get_state_file /
        get_lock_file return paths under <CLAUDE_CONFIG_DIR>/security/;
        save_state round-trip writes a file to the redirected path
        and re-reads the same contents.

  - 59/59 pass total (45 existing + 14 new) in 2.54s.

NOT verified end-to-end with a real CC instance setting
CLAUDE_CONFIG_DIR. The shape tests catch the regression class
(hardcoded ~/.claude/), and the end-to-end test pins the behavior
that user state files actually land at the redirected path.

Closes #1868.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 23:57:10 -07:00
Mohamed Hegazy
a67587c816
security-guidance: enable LLM review on default macOS Python 3.9
Fixes anthropics/claude-plugins-official#2071 — on macOS where the
default `python3` is Apple's Command Line Tools Python 3.9.6, the
plugin's agentic commit reviewer silently does not run, even when the
user has a newer Python installed.

Three compounding factors in the bug:

1. `sg-python.sh` only checks the major version (`3`), so it always
   picks 3.9 even when 3.10+ is on PATH.
2. `claude_agent_sdk` requires Python >=3.10 — pip install on 3.9
   returns "No matching distribution" -> bootstrap returns BUILD_FAILED.
3. Even with a hand-built 3.12 venv, `llm.py` imports the SDK
   in-process into the hook's interpreter (still 3.9), which raises
   SyntaxError. The existing venv-probe in `ensure_agent_sdk.py` uses
   the venv's own Python (3.12) so it reports NOOP_VENV (healthy) while
   the consumer fails — misleading telemetry on top of silent feature
   degradation.

Per BQ telemetry, 14,073 external macOS users hit
sdk_bootstrap=BUILD_FAILED in the past 4 days (the default-macOS
cohort), out of ~86K total external installed users. Combined with
~20K other users in similar broken-bootstrap states (Windows pre-#2055,
Linux <3.10), about half the installed base has a silently-broken
agentic reviewer.

This PR implements the reporter's items #1, #3, and #4. Item #2
(running the SDK out-of-process) is deferred as a bigger refactor.

Item #1 — hooks/sg-python.sh — prefer >=3.10 binaries via 3-pass probe:

  Pass 1: python3.13 / 3.12 / 3.11 / 3.10 (>=3.10 by name, highest wins)
  Pass 2: bare python3 / python / py -3 (accept only if reported >=3.10)
  Pass 3: bare python3 / python / py -3 (any Python 3, FALLBACK so
          pattern checks still work on macOS-default 3.9 — no regression
          vs today; SDK-dependent paths detect the version mismatch
          inside Python and degrade cleanly via item #4)

Item #4 — ensure_agent_sdk.py — health-check honesty:

Added HOOK_PY_INCOMPATIBLE=6 outcome with short-circuit at top of main():

  if sys.version_info < (3, 10):
      return HOOK_PY_INCOMPATIBLE, "hook_py", f"py_{...}"

Telemetry consequences after rollout: sdk_bootstrap=6 is a new clean
bucket; some users currently miscounted in sdk_bootstrap=3 BUILD_FAILED
(wasted pip cycles) and sdk_bootstrap=1 NOOP_VENV (falsely-healthy)
move to sdk_bootstrap=6. The remaining NOOP_VENV count becomes
trustworthy.

Item #3 — ensure_agent_sdk.py — one-time user-visible notice:

When outcome == HOOK_PY_INCOMPATIBLE and a marker file at
`~/.claude/security/.agentic_unavailable_notice_v<pv>` doesn't exist,
the SessionStart response includes hookSpecificOutput.additionalContext
+ systemMessage explaining the situation. Marker file is plugin-
version-keyed so a future fix (e.g. shipping out-of-process SDK) can
bump pv and re-notify users.

BUILD_FAILED is intentionally excluded from the notice — it covers
transient causes where a permanent banner would mislead.

Verified locally on macOS Python 3.13:
  - py_compile clean on both files.
  - Existing 45-test smoke + extensibility suite: 45/45 PASS in 2.50s.
  - Unit test of simulated 3.9 path: HOOK_PY_INCOMPATIBLE returned with
    correct phase/kind; notice shown on first call, suppressed on
    second, reshown on bumped pv; BUILD_FAILED correctly does NOT
    trigger notice.

NOT verified: actual Python 3.9 behavior end-to-end (would need a 3.9
install). Worth a follow-up smoke test in a 3.9 venv before next
release. The unit test simulating 3.9 covers the logic but not the
runtime invocation through the shim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 22:58:01 -07:00
Mohamed Hegazy
c11244778d
Address Windows verification: --prefer-binary + pywin32 bootstrap
The first round of this PR removed SKIP_WIN32, fixed venv_py to use
Scripts/python.exe, and added Lib/site-packages to the consumer glob —
all necessary. Windows verification (Win11 ARM64, Py 3.13, Git Bash)
showed two more blockers, both addressed here.

1. Pip dependency resolver picks unbuildable cryptography on ARM64.

   Without --prefer-binary, pip picks a cryptography version with no
   published ARM64 wheel and tries to build it from source. That needs
   Rust/Cargo, almost never present on user machines → BUILD_FAILED
   with err_kind=other:cryptography. A binary wheel exists for an
   adjacent version (cryptography-46.0.3-cp311-abi3-win_arm64.whl);
   --prefer-binary tells pip to pick it. Cross-platform safe (no-op
   where the latest version already has a wheel).

2. pywin32 .pth files aren't processed by sys.path.insert().

   With the venv built, ensure_agent_sdk.py's post-build probe passes
   (it runs from venv_py, where Python's site.py at startup processes
   pywin32.pth and registers win32/, win32/lib/ plus runs
   pywin32_bootstrap.py to set the DLL search dir). But llm.py runs in
   the hook's SYSTEM Python and adds the venv via sys.path.insert(),
   which doesn't trigger site.py at all. Without the bootstrap, the
   SDK's mcp.client.stdio → mcp.os.win32.utilities chain raises
   ModuleNotFoundError: pywintypes and the agentic reviewer falls back
   to single-shot silently — exactly the symptom this PR is trying to
   fix. The probe says NOOP_VENV; the actual consumer fails. Probe and
   consumer use different Pythons.

   Replicate what site.py would do: after inserting site-packages,
   also insert win32/ and win32/lib/, then exec pywin32_bootstrap.py.
   Pulled into a shared helper _inject_agent_sdk_venv_into_syspath()
   so both consumer sites (3P SDK fallback, agentic_review fallback)
   call the same code — Windows handling stays in one place.

Verified on macOS (POSIX path unchanged):
- Helper end-to-end test: POSIX-layout venv detected + fake package
  imports successfully via the injected path
- Windows-layout venv also detected; win32 branch correctly skipped
  via sys.platform check
- Both files pass py_compile

Credit: @mhegazy verified the previous commit on Win11 ARM64 / Py 3.13
/ Git Bash, surfaced both issues end-to-end, and provided the exact
fix patterns. This commit applies them with the pywin32 part factored
into a shared helper (vs. inlining at both consumer sites).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:07:33 -07:00
Mohamed Hegazy
4decd2e3b2
Enable agentic commit reviewer on Windows
The agentic reviewer is silently no-op on Windows today. SessionStart
bootstrap (ensure_agent_sdk.py) short-circuits with SKIP_WIN32 because
the consumer glob in llm.py only matches POSIX venv layout
(lib/pythonX.Y/site-packages). On Windows, venvs use Lib/site-packages
(capital L, no pythonX.Y subdir), so even if a venv got built the
glob wouldn't find its contents.

Result: Windows users on default installs (no system-wide
claude_agent_sdk) get layer 1 (pattern warnings) and layer 2 (single-
shot LLM diff review) but not layer 3 — the cross-file agentic review
that catches IDOR, auth-bypass, cross-file SSRF, and other things that
need to read related files. Plugin description claims layer 3 but it
silently doesn't run.

Three changes:

1. llm.py — extend the consumer glob (2 sites: 3P SDK fallback at
   ~L297, agentic_review fallback at ~L1090) to also match the Windows
   Lib/site-packages layout, so a venv built on Windows is actually
   discoverable.

2. ensure_agent_sdk.py — remove the sys.platform == 'win32' early-exit
   so the SessionStart bootstrap builds the venv on Windows too.
   Outcome code 4 (formerly SKIP_WIN32) is retired but not reused so
   pre-fix telemetry rows still decode correctly.

3. ensure_agent_sdk.py — venv_py path now branches on sys.platform:
   Windows venvs put the interpreter at Scripts\python.exe; POSIX
   uses bin/python. Previously assumed POSIX, so even with the glob
   fix, the post-build SDK-importability probe would fail on Windows.

Verified locally on macOS:
- glob test: both layouts now match (POSIX venv detected, simulated
  Windows venv also detected via the new Lib/site-packages branch)
- both files pass py_compile
- POSIX path unchanged (sys.platform != 'win32' so old branch runs)

Not verified on Windows in this commit — needs an actual Windows
runner to confirm the venv build + SDK import + subprocess plumbing
all work end-to-end. The SDK spawns a child claude.exe; Windows
process plumbing has its own quirks (shell semantics, path escaping)
that may surface separately. Worth a controlled rollout (one-week
soak under env-var opt-in before flipping default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:50:51 -07:00
Mohamed Hegazy
0bde168648
Update security-guidance plugin 2026-05-26 14:06:52 -07:00