Mohamed Hegazy 475038edfc
security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding
Fills two failure-visibility gaps in plugin telemetry.

## Gap 1: HTTP errors from _call_claude invisible

Before: a 4xx/5xx response from the LLM API caused `_call_claude` to
return None and produce ZERO fingerprint in tengu_hook_plugin_metrics.
A failed call looked identical to "no review needed". The recent
deprecation-400 outage (PR #2105, output_format → output_config.format,
#2098) was invisible in aggregate dashboards until a user manually
reported errors from their debug log. Cohort-specific or partial
outages would never show up in BQ.

Fix: add `http_err_last` (most recent status) and `http_err_count`
to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()`
snapshots them whenever count > 0 (skip-path no-pollute contract
preserved when count == 0). All `_call_claude` error sites now call
the new `_record_http_error()` helper alongside the existing
`_last_call_claude_http_error` module-state assignment.

Now any future API failure category is queryable in BQ in real time:

  SELECT
    DATE(server_timestamp, "America/Los_Angeles") AS d,
    CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code,
    COUNT(*) AS n
  FROM ... WHERE event_name = "tengu_hook_plugin_metrics"
    AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%"
    AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL
  GROUP BY d, code ORDER BY d, n DESC

## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ

Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip",
"dns_fail"). CC's plugin-metrics pipeline silently drops
plugin-emitted string values — only bool|finite-number plugin metrics
reach BigQuery. (CC-core fields like `subscription_type` are exempt
because they're injected downstream of plugin validation.)

Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2
days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL
despite the Python code emitting them. ~28K BUILD_FAILED sessions/day
had no diagnostic split — flying blind on whether the failures are
pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc.

Fix: encode phase + err_kind as stable integers via
SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre,
2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98
reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>",
and unmapped strings). APPEND-ONLY for telemetry stability.

Also corrects the misleading "CC accepts string metric values"
comment in ensure_agent_sdk.py that led to the bug originally.

Verified locally on macOS Python 3.13:

  - py_compile clean.
  - 32 new tests in test_telemetry_failure_signals.py (added to
    internal test suite at sg-staging/tests/, not in this PR):

      * 4 HTTP-error tracking unit tests: _record_http_error increments
        count + tracks most-recent; handles None/invalid; -1 for
        network/timeout.
      * 4 _usage_metrics emission tests: empty when no activity;
        successful call has no http_err fields; failure-only has http_err
        and no api_calls; mixed has both.
      * 1 contract test: every emitted value is bool|finite-number
        (catches future regression of the string-dropping bug class).
      * 13 sdk_bootstrap encoding tests (parametrized over the 10 known
        err_kind categories + 5 catch-all shapes): each maps to the
        right integer; unknown phase = 0; unknown err = 99.
      * 1 static-shape regression catcher: every `err_kind = "..."`
        string in ensure_agent_sdk.main() must be in
        SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently
        collapse to 99).
      * 2 emit-shape regression catchers: the assignments in main() go
        through _encode_phase / _encode_err_kind helpers (no raw
        strings); no literal string values for sdk_bootstrap_phase/err.
      * 1 comment-accuracy: the misleading "CC accepts string metric
        values" comment is gone.

  - Full suite: 437/437 pass + 2 skipped (live API tests, opt-in).

NOT verified end-to-end against BQ — would require shipping +
observing in production for 24h to confirm the http_err and
sdk_bootstrap_phase/err fields actually appear in
tengu_hook_plugin_metrics rows. The unit tests pin the contract; if
the wire shape is broken, BQ will show NULL for the new fields and we
revisit (with the same diagnostic the BUILD_FAILED bug gave us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 08:34:35 -07:00
..
2026-05-26 14:06:52 -07:00

security-guidance

Security review for Claude-generated code. Three layers:

  1. Pattern warnings — instant regex-based reminders on Edit/Write for ~25 known-dangerous patterns (yaml.load, torch.load(weights_only=False), pickle.load on untrusted data, raw innerHTML, hardcoded secrets, etc.).
  2. LLM diff review — when Claude finishes a turn, the plugin sends the diff to a fast LLM call (Opus 4.7 by default) and feeds high-severity findings back to Claude so it can fix them before you see the response.
  3. Agentic commit review — on git commit, an SDK-driven reviewer reads related files (Read/Grep/Glob) to trace data flow across the codebase, catching multi-file vulnerabilities pattern matching misses (IDOR, auth bypass, cross-file SSRF).

Findings cover common web-vulnerability classes — injection, XSS, SSRF, hardcoded secrets, IDOR, auth bypass, unsafe deserialization, and path traversal among others.

Install

/plugin install security-guidance@claude-plugins-official

Marketplace ships enabled by default in Claude Code — no setup beyond having the CLI itself.

Prerequisites

  • Claude Code CLI ≥ v2.1.144
  • Python 3.8+ on PATH (python3, python, or py -3 — the plugin picks the first that works)
  • A working API path (subscription, API key, or 3P provider config)

Configuration

All configuration is via environment variables. None are required for default behavior.

Selecting a model

# 1P / gateway: a canonical model id
SECURITY_REVIEW_MODEL=claude-opus-4-7   # default

# Bedrock: use the inference-profile id
SECURITY_REVIEW_MODEL=us.anthropic.claude-opus-4-7

# Vertex: use the Vertex date-tag form
SECURITY_REVIEW_MODEL=claude-opus-4-7@20260218

SECURITY_REVIEW_MODEL controls the LLM diff review. SG_AGENTIC_MODEL (same syntax) controls the agentic commit reviewer; defaults to the same model.

Enabling/disabling layers

Variable Default What it does
SECURITY_GUIDANCE_DISABLE=1 unset Kill switch — disables the entire plugin
ENABLE_PATTERN_RULES=0 on Disable layer 1 (regex pattern warnings)
ENABLE_CODE_SECURITY_REVIEW=0 on Disable all LLM reviews (Stop hook + commit/push)
ENABLE_STOP_REVIEW=0 on Disable only the Stop-hook diff review, keeping commit/push reviews. Useful for multi-agent / shared-worktree setups where another agent can move HEAD between a worker's turns
ENABLE_COMMIT_REVIEW=0 on Disable layer 3 (agentic commit review)

Higher-recall mode

SG_DUAL_OR=on   # default off

Runs two parallel review calls and unions the findings. Catches a few percentage points more vulnerabilities in our testing, at roughly 2× the API cost per review. Most users don't need it.

Org-specific policies

Drop a claude-security-guidance.md in any of:

  • ~/.claude/claude-security-guidance.md — user-wide rules
  • <project>/.claude/claude-security-guidance.md — project rules, intended to be committed
  • <project>/.claude/claude-security-guidance.local.md — local overrides, intended to be .gitignore'd

All three are loaded and concatenated into the LLM diff review's prompt in the order user → project → project-local. If the combined size exceeds the 8 KB prompt budget, the tail is truncated, so user-wide rules are kept and project-local rules are dropped first. The agentic commit reviewer (layer 3) does not currently read this file. Example:

# Acme security rules

- All SELECTs against the `customers` or `orders` tables MUST go through `db.replica`,
  never `db.primary`. Primary is for writes only.
- Background jobs must not use the user-context auth token; they get
  service-account creds from `jobs.get_service_account()`.
- Calls to `requests.get(url)` with a user-controlled `url` need
  the SSRF-allowlist wrapper at `acme.net.safe_request`.

Built-in rules cover common web-vulnerability classes without it — claude-security-guidance.md is for things specific to your codebase that the model can't infer.

Privacy and data handling

The plugin sends data to a model endpoint to perform its reviews. Specifically, each Stop-hook diff review transmits the changed file paths, the diff hunks, and the relevant file contents in the diff; each agentic commit review additionally transmits any files the reviewer pulls in via Read/Grep/Glob while tracing data flow. Your claude-security-guidance.md contents (user, project, and local) are appended to the prompt on every review, so don't put secrets in it.

Where that data goes depends on your Claude Code configuration:

  • Default (Anthropic API / subscription): sent to api.anthropic.com and handled under Anthropic's Commercial Terms and Privacy Policy.
  • LLM gateway (ANTHROPIC_BASE_URL set): sent to your gateway URL instead. The gateway operator's terms apply.
  • 3rd-party providers (Bedrock / Vertex / Foundry / Mantle): sent to your configured provider endpoint. The provider's data-handling terms apply (e.g., AWS / GCP / Azure).

The plugin writes its own debug log to ~/.claude/security/log.txt (override with SECURITY_GUIDANCE_DEBUG_LOG). The log contains diffstate metadata and finding categories — no full file contents or model prompts — and rotates at 1 MB. Nothing is uploaded.

Limitations

This is a best-effort assistive tool, not a guarantee. Treat findings as suggestions, not as a substitute for human code review, SAST/DAST, dependency scanning, or pen-testing. The reviewer can miss vulnerabilities, produce false positives, and may behave differently across codebases, languages, and model versions. No warranty is provided — use is subject to Anthropic's Commercial Terms.

Troubleshooting

Plugin doesn't seem to fire — check that ~/.claude/claude-security-guidance.md (or hook activity) shows in debug logs. Run Claude Code with --debug-file /tmp/claude/debug.txt and grep for security_reminder_hook. The plugin also writes its own log to ~/.claude/security/log.txt.

Review never finds anything — verify your API path works. On 3P providers, check SECURITY_REVIEW_MODEL is set to a provider-specific id (not a bare claude-opus-4-7). On LLM gateways, check the gateway's logs for POST /v1/messages traffic from the plugin.

Too many false positives — drop SECURITY_REVIEW_MODEL to a cheaper model (claude-sonnet-4-6) and re-evaluate; if precision is the priority, stay on Opus 4.7.

Want to silence a specific finding — add a comment to the line explaining why it's safe; the LLM reviewer treats inline justifications as exclusions. For systemic exclusions, document them in your claude-security-guidance.md.

Reporting issues

Open an issue on the security-guidance plugin repo with:

  • The Claude Code CLI version (claude --version)
  • Provider setup (1P / Bedrock / Vertex / LLM gateway / etc.)
  • A minimal repro diff
  • The relevant section of ~/.claude/security/log.txt