claude-plugins-official

mirror of https://github.com/anthropics/claude-plugins-official.git synced 2026-08-02 06:27:09 -03:00

Author	SHA1	Message	Date
Morgan Lunt	746c982737	Merge pull request #2467 from anthropics/morganl/code-mod-secrets-redaction code-modernization: never write discovered credential values into findings	2026-06-09 08:49:47 -07:00
Morgan Lunt	ecc5139c30	code-modernization: legacy toolchain is advisory, not a transform blocker Legacy code often cannot build locally by nature — CICS/IMS programs have no local translator and the real runtime may be a mainframe the user doesn't have. Stopping transform on a failed legacy smoke compile would block exactly those systems. - transform Step 0a: the target toolchain remains required (its tests cannot run without it); a failed or impossible legacy compile no longer stops the run — the equivalence strategy switches to recorded traces / golden-master fixtures, and that downgrade is stated in the plan and in TRANSFORMATION_NOTES.md so reviewers know the strength of the proof - preflight: a red legacy toolchain now yields Ready-with-gaps for transform/reimagine instead of Not-ready	2026-06-09 08:48:05 -07:00
Morgan Lunt	0c40a74425	code-modernization: add topology viewer screenshot to README Rendered from AWS CardDemo (the public mainframe sample app) by running /modernize-map end to end and capturing the resulting TOPOLOGY.html.	2026-06-09 08:48:04 -07:00
Morgan Lunt	bec8d7c93a	code-modernization: vendor d3, viewer robustness, status command, pipeline fixes Viewer (assets/topology-viewer.html): - inline a minified d3 subset (hierarchy/pack, zoom, selection, interpolateZoom, ease; ISC license) instead of loading from a CDN — the page is now fully self-contained and works on air-gapped networks - handle duplicate node ids (unique-suffix; edges bind to the first occurrence) and store parent references directly, fixing level-of-detail and selection corruption with messy generated data - share one reveal rule between drawing, edge culling, and hit-testing so edges no longer draw into collapsed containers - pre-bucket edges by kind and keep a per-node adjacency map; the hover/selection pass no longer scans every edge each frame - cancel in-flight fly-to animations when a new one starts; clamp fly-to zoom to the zoom extent; derive max zoom from the smallest leaf so deep estates stay reachable - render dead-end candidates (new deadEnds field) with a dashed outline and a sidebar badge - clicking a node during a flow walkthrough exits the walkthrough; search results clear on selection and Escape; surrogate-safe label truncation; clearer stats line; explicit empty-topology message Commands: - new /modernize-status: read-only progress report — artifact inventory with timestamps, staleness flags, secrets-hygiene checks, next step - map: deadEnds in the topology schema; datastore names must be logical identifiers with credentials stripped from URLs/DSNs - brief: read topology.json + .mmd files (not the interactive HTML); staleness check against inputs; effort unit aligned to person-months - transform: secret-safe characterization-test prompt; diff -y fallback when delta is missing; credential-safe diff selection - reimagine: target vision is everything after the first argument (was silently truncated to one word); masking rules in spec/scaffold/ handoff prompts - brief/transform/reimagine: human-approval gates phrased as explicit stop-and-wait instead of 'enter plan mode' - preflight: delta in the tool table; brief added to the verdict list - README: preflight/status in the workflow; legacy/ deny list also covers Write; plugin + marketplace descriptions updated	2026-06-09 08:48:04 -07:00
Morgan Lunt	8745968186	code-modernization: harden topology viewer and template injection Fixes from an adversarial review of the new viewer: - pin d3 to 7.9.0 and load it via dynamic import with an explicit error panel when the CDN is unreachable (previously a blocked CDN produced a silent dark page — a real concern for restricted networks) - coerce ids/names/loc at intake: a single missing name or non-numeric loc previously threw inside the render loop or propagated NaN through the pack layout, blanking the canvas with no error - normalize flows/steps/edges defensively (null entries, missing steps, numeric ids vs string lookups) - mirror the level-of-detail reveal rule in the hit test so clicks can't select nodes that aren't drawn - scope the Escape shortcut so clearing the search box doesn't reset the viewport; set zoom clickDistance(4) so trackpad jitter doesn't swallow selection clicks - round canvas backing-store size (fractional devicePixelRatio caused a reallocation every frame on 125%/150% display scaling) - modernize-map: use braced ${CLAUDE_PLUGIN_ROOT} so substitution actually happens, assert the injection marker exists in the template, and correct the documented failure mode	2026-06-09 08:48:04 -07:00
Morgan Lunt	1c4a5cfded	code-modernization: interactive topology map, preflight command, persona flows modernize-map previously rendered the call graph and data lineage as static Mermaid diagrams, which become unreadable once a node has ~10+ edges — exactly the shape of real legacy systems. It now builds an interactive viewer from a shipped template (assets/topology-viewer.html): a zoomable circle-pack of domains/modules sized by LOC, rendered to canvas with level-of-detail reveal, dependency edges with per-kind toggles, search with fly-to, a per-node detail sidebar, and a flow walkthrough mode. Small domain-level .mmd exports remain for docs. - topology.json now has a documented schema (hierarchy + edges + entry points + observations + flows) consumed by the viewer - map traces 2-4 business flows anchored to personas (claimant, operator, auditor), each step in plain business language mapped to the modules that implement it; the viewer plays them as numbered paths - brief gains a Business Walkthroughs section connecting each persona flow to the phase that replaces it - new modernize-preflight command: detects the stack, checks analysis tooling, smoke-compiles a real source file with the legacy toolchain, inventories missing copybooks/descriptors/binary-only artifacts, and writes a per-command readiness verdict - transform now verifies legacy + target toolchains before its plan gate instead of failing at test time - README: commands updated, optional-tooling section reframed as 'what to give Claude'	2026-06-09 08:48:04 -07:00
Morgan Lunt	4f49895abd	code-modernization: assess writes the full quarantine ignore set assess only added SECRETS.local.md to analysis/.gitignore, leaving *.local.patch uncovered until harden's own Step 0 ran. Both patterns are now written by whichever command runs first.	2026-06-09 08:47:34 -07:00
Morgan Lunt	9d49c4b135	code-modernization: close remaining credential-leak paths A red-team pass found four ways credential values still reached shareable artifacts after the initial redaction: - the remediation patch: a diff removing a hardcoded secret carries the raw value on its '-' lines by construction. harden now splits output: non-credential hunks in the shareable security_remediation.patch, credential hunks in a gitignored security_remediation.local.patch with comment-only placeholders in the shareable file - the other four agents had no secret-handling rules. legacy-analyst (hardcoded-config evidence in tech-debt findings), business-rules-extractor (credentials recorded as rule parameters), test-engineer (legacy literals becoming committed test fixtures), and architecture-critic (quoted code in notes files) now all mask values and cite file:line; assess's tech-debt prompt and ASSESSMENT.md masking now cover every section, not just Security Findings - non-git projects: a .gitignore protects nothing under SVN/Mercurial. Both commands now refuse --show-secrets without git and write the quarantine file to ~/.modernize/<system>/ outside the project tree - the patch-apply instruction was wrong in both documented layouts (symlinked legacy/ broke relative paths). Patches are now written with project-root-relative paths and applied from the project root Also: --show-secrets is now position-independent in both commands, and the README documents the full model.	2026-06-09 08:47:34 -07:00
Morgan Lunt	ff5feaeb7f	code-modernization: never write discovered credential values into findings Legacy systems often contain live credentials, and assessment/findings files get committed and shared. Previously the security-auditor agent reported hardcoded secrets verbatim into ASSESSMENT.md and SECURITY_FINDINGS.md. - security-auditor: mandatory secret-handling rules — mask all credential values (file:line + 2-4 char preview), redact secrets from echoed tool output, recommend rotation for anything that looks live - assess/harden: gitignore-verified SECRETS.local.md quarantine file for the per-credential inventory; findings files get masked entries and a pointer only - new --show-secrets flag opts into raw values in the quarantine file (and only there) - README: document the behavior and advise users of earlier versions to check for already-committed findings and rotate	2026-06-09 08:47:33 -07:00
Mohamed Hegazy	70c28b9c2f	security-guidance: emit schema-valid Stop-hook output (#2159 ) — 2.0.2 → 2.0.3 Fixes #2159. The Stop hook emits feedback via `hookSpecificOutput: {hookEventName: "Stop", additionalContext}`, but `Stop`/`SubagentStop` are NOT members of CC's `hookSpecificOutput` discriminated union (coreSchemas.ts — valid members are PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, etc.). So the emitted shape violates CC's documented hook-output schema. Impact is CC-version-dependent — important nuance, established empirically: - Reporter (array0224-cloud) on CLI 2.1.150 / 2.1.152: CC rejects the Stop feedback; the block/reason never reaches the model, so the auto-rewake/fix loop is lost. (Detection still runs + logs.) - On CLI 2.1.160 (current) the asyncRewake completion path is lenient: its gate is `isSyncHookJSONOutput` (hooks.ts) which is just `!(json.async === true)` — NOT a strict schema parse. So the invalid hookSpecificOutput is tolerated: metrics + rewakeSummary are still consumed and the model still receives the findings. I could NOT reproduce the rejection on 2.1.160, and BQ confirms Stop-path vulns_found metrics are recorded normally (~21k with-vuln fires / 3d), i.e. NOT dropped. (An earlier draft of this message claimed metrics were dropped — that was wrong; corrected after checking telemetry + repro'ing the old plugin on 2.1.160.) So this is defensive schema-correctness: the plugin should emit output that conforms to CC's documented union regardless of how strictly a given CC version validates it. The reporter's environment validates strictly; relying on the current version's leniency is fragile. Fix (CC's documented asyncRewake "clean pattern" — hooks.ts: "error text on stderr, JSON on stdout"): - For Stop/SubagentStop, emit_metrics writes guidance to stderr (the asyncRewake body channel CC delivers via `stderr \|\| stdout`) and sets top-level `decision: "block"` + `reason` (valid SyncHookJSONOutput fields; also the documented sync Stop-hook contract for the `-p` fallback). It does NOT emit a Stop hookSpecificOutput. - PostToolUse (commit-review, push-sweep) is unchanged — valid union member, keeps the modern hookSpecificOutput protocol. Verified locally on macOS Python 3.13: - py_compile clean. - 11 new tests (test_2159_stop_hook_schema.py) pin the contract: Stop output carries no hookSpecificOutput, uses top-level decision/reason, writes guidance to stderr; the emitted hookEventName (when present) is a valid union member. 2 existing tests that asserted the buggy Stop->hookSpecificOutput shape were corrected. Full suite 464/464 pass + 2 skipped. - END-TO-END in /tmux on CLI 2.1.160: * FIXED plugin (2.0.3): staged pickle.loads + os.system, benign edit pulls the file into review_set; Stop LLM review found 2 critical vulns; CC delivered a clean rewake ("Background security review found issues" + both findings). All hooks (UPS, PostToolUse[Edit] pattern, PostToolUse[Bash] commit-review, Stop) fired clean; zero schema rejections / errors / http_err in the debug log. * OLD plugin (2.0.2) on the SAME 2.1.160: also delivered Stop feedback (confirming the no-repro-on-latest finding above) — which proves the fix carries NO regression risk on current CC while making the output robust for the stricter versions where it actually breaks. Version bumped 2.0.2 -> 2.0.3 per the per-PR-bump policy (#2114: a bump is the only way the fix reaches the existing fleet — relevant for users still on the CC versions where this breaks). Closes #2159. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-02 11:30:11 -07:00
Mohamed Hegazy	009392eee4	security-guidance: 5 venv-specific err_kind categories + stderr_signature bucket (2.0.1 → 2.0.2) PR #2112's telemetry visibility surfaced an immediate finding from the first 3h of v2.0.1 data: 2,406 phase=2 / err=99 sessions — "venv stage / uncategorized" — dominating BUILD_FAILED. The original err_kind detection patterns were all pip-flavored (pip_no_match, dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure modes, so they all collapsed to the catch-all _uncategorized (99) bucket. This PR fills the gap on two axes. ## 1. Five new venv-specific err_kind categories (codes 11-15) Each gated on `err_phase == "venv"` so the same substring doesn't mis-fire in pip-phase failures: - 11 `venv_ensurepip_fail` — Debian/Ubuntu without python3-venv installed; stderr matches "ensurepip is not available" or "ensurepip ... returned non-zero". Predicted to be the biggest chunk based on Linux distro market share. - 12 `venv_path_too_long` — Windows MAX_PATH (260) or POSIX ENAMETOOLONG. Triggered when state_dir + venv layout exceeds the path limit (deep Lib/site-packages/<pkg>/<...> paths). - 13 `venv_no_module` — `python3 -m venv` itself missing ("No module named 'venv'"). Rare but distinctive. - 14 `venv_already_exists` — Errno 17 / "file exists" — sentinel race past O_EXCL or stale dir survived `--clear`. - 15 `venv_setup_failed` — generic "virtual environment was not created successfully" catch-all for venv setup failures that don't match a more specific category. All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the APPEND-ONLY contract from PR #2112. ## 2. `sdk_bootstrap_stderr_sig` integer hash For "other:<tail>" err_kinds (which encode to _uncategorized = 99), emit a bounded integer hash (0-999) of the first ~30 chars of the stderr tail. This restores cardinality to the _uncategorized bucket in BQ aggregation without unbounded keyspace — same stderr message always maps to the same bucket, so a real failure mode replicating across thousands of machines clusters cleanly. Bounded at 1000 buckets: well below any "high cardinality" alarm but wide enough to distinguish ~30 distinct dominant patterns (birthday-paradox collision probability ~50% at ~37 distinct inputs). The field auto-omits (`if sig:` gate) when err_kind is categorized — no key-budget cost on the common-case categorized failures. ## Version bump 2.0.1 → 2.0.2 PR #2114 confirmed the version-bump mechanism is the only way to propagate code changes to the existing fleet — without a bump, CC's plugin updater short-circuits on string-equality of installation version vs marketplace version. Following the policy we established: bump patch on every functional PR. By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was already appearing in BQ. v2.0.2 should follow the same propagation curve — ~30% adoption within 3 hours, full convergence within a few days. ## Verified locally - py_compile clean. - 15 new tests in test_venv_failure_deepdive.py (added to internal test suite at sg-staging/tests/, not in this PR): * 5 parametrized: each new err_kind maps to its expected code (11-15). * 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged. * 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic same-input → same-output; bounded to 0-999; distinct inputs → distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus (path-varying stderr with shared 30-char prefix collide as designed). * 1 static-shape catcher: every new `err_kind = "venv_..."` branch in main() is guarded by `err_phase == "venv"`. Catches the regression where someone adds a venv pattern without the phase gate and starts mis-categorizing pip-phase failures. * 1 map-coverage: all err_kind strings assigned anywhere in ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES (catches new categories added in code but forgotten in the map). * 1 emit-shape: the metric block uses `_encode_stderr_sig`, the `sdk_bootstrap_stderr_sig` key is written conditionally on `if sig:`. Catches the regression where someone removes the helper or makes the emit unconditional (would pad every categorized BUILD_FAILED row with a zero-valued field). - Full suite: 452/452 pass + 2 skipped (live API tests, opt-in). ## What this unblocks in BQ ```sql -- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1, -- v2.0.2+ will split them across the new categories. Query: SELECT CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err, CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig, COUNT(*) AS sessions FROM `proj-product-data-nhme.raw_events.claude_code_internal_event` WHERE _PARTITIONTIME >= ... AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3 AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2 -- venv GROUP BY err, sig ORDER BY sessions DESC ``` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-01 16:05:49 -07:00
Mohamed Hegazy	fcdcd079e3	Merge pull request #2112 from anthropics/telemetry-failure-signals security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding (telemetry visibility)	2026-06-01 10:31:29 -07:00
Mohamed Hegazy	17b532f92e	security-guidance: bump 2.0.0 → 2.0.1 to propagate 8 weeks-of-fixes to the existing fleet The 8 PRs we shipped since 2026-05-26 (#2076, #2077, #2078, #2086, #2091, #2100, #2101, #2105) all changed plugin code without bumping the version. CC's plugin updater uses string equality for the freshness check (pluginOperations.ts:1835): const isUpToDate = installation.version === newVersion \|\| installation.installPath === versionedPath \|\| installation.installPath === zipPath if (isUpToDate) return { alreadyUpToDate: true } Users who installed v2.0.0 anywhere between 2026-05-26 and 2026-05-31 have `installation.version === "2.0.0"` in their installed_plugins.json. The marketplace also advertises "2.0.0" (until this commit), so isUpToDate returns true and the plugin cache directory is never refreshed — they keep running whatever 2.0.0 code was current on the day they installed. The marketplace git pull happens; the per-user cache install does NOT. Empirical evidence: in BQ today (5/31) on Windows v2.0.0 fires, 73% emit sdk_bootstrap outcome 4 (SKIP_WIN32) — a code path retired in PR #2055's Windows-enable fix. Those users are running a plugin tree that pre-dates the fix, even though their telemetry shows pv=20000. The fix is a one-line version bump. Once the marketplace advertises 2.0.1, every CC autoupdate cycle sees installation.version (2.0.0) != newVersion (2.0.1), installs the new version, and the user's next session loads the fixed code. This PR: 1. plugins/security-guidance/.claude-plugin/plugin.json: 2.0.0 → 2.0.1 2. .claude-plugin/marketplace.json security-guidance entry: 2.0.0 → 2.0.1 What 2.0.1 carries (versus 2.0.0 as published 5/26): - #2076 — Graphite gt commit/push detection - #2077 — hookSpecificOutput.additionalContext on async-rewake exit-2 - #2078 — CLAUDE_CONFIG_DIR support - #2086 — core.quotePath=false on diff feeders (Arabic/Hebrew/CJK paths) - #2091 — fix Bash(...\|...) if-clause regression from #2076 - #2100 — drop text=True from subprocess.run, bake PYTHONUTF8=1 (Windows non-cp1252 path crash) - #2101 — core.quotePath=false on GIT_CMD globally - #2105 — output_format → output_config.format API migration (#2098) Verified locally: - plugin.json + marketplace.json both valid JSON. - _read_plugin_version_int() returns 20001 (was 20000). - Existing test suite passes — 408 tests, no regressions caused by the version bump itself. (29 unrelated failures are from test_telemetry_failure_signals.py which expects PR #2112's not-yet-merged code.) Going forward: bumping `patch` on every functional PR closes this gap entirely. Without that policy, every fix only reaches NEW installs, never the existing fleet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 12:10:16 -07:00
Mohamed Hegazy	475038edfc	security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding Fills two failure-visibility gaps in plugin telemetry. ## Gap 1: HTTP errors from _call_claude invisible Before: a 4xx/5xx response from the LLM API caused `_call_claude` to return None and produce ZERO fingerprint in tengu_hook_plugin_metrics. A failed call looked identical to "no review needed". The recent deprecation-400 outage (PR #2105, output_format → output_config.format, #2098) was invisible in aggregate dashboards until a user manually reported errors from their debug log. Cohort-specific or partial outages would never show up in BQ. Fix: add `http_err_last` (most recent status) and `http_err_count` to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()` snapshots them whenever count > 0 (skip-path no-pollute contract preserved when count == 0). All `_call_claude` error sites now call the new `_record_http_error()` helper alongside the existing `_last_call_claude_http_error` module-state assignment. Now any future API failure category is queryable in BQ in real time: SELECT DATE(server_timestamp, "America/Los_Angeles") AS d, CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code, COUNT() AS n FROM ... WHERE event_name = "tengu_hook_plugin_metrics" AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%" AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL GROUP BY d, code ORDER BY d, n DESC ## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip", "dns_fail"). CC's plugin-metrics pipeline silently drops plugin-emitted string values — only bool\|finite-number plugin metrics reach BigQuery. (CC-core fields like `subscription_type` are exempt because they're injected downstream of plugin validation.) Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2 days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL despite the Python code emitting them. ~28K BUILD_FAILED sessions/day had no diagnostic split — flying blind on whether the failures are pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc. Fix: encode phase + err_kind as stable integers via SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre, 2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98 reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>", and unmapped strings). APPEND-ONLY for telemetry stability. Also corrects the misleading "CC accepts string metric values" comment in ensure_agent_sdk.py that led to the bug originally. Verified locally on macOS Python 3.13: - py_compile clean. - 32 new tests in test_telemetry_failure_signals.py (added to internal test suite at sg-staging/tests/, not in this PR): 4 HTTP-error tracking unit tests: _record_http_error increments count + tracks most-recent; handles None/invalid; -1 for network/timeout. * 4 _usage_metrics emission tests: empty when no activity; successful call has no http_err fields; failure-only has http_err and no api_calls; mixed has both. * 1 contract test: every emitted value is bool\|finite-number (catches future regression of the string-dropping bug class). * 13 sdk_bootstrap encoding tests (parametrized over the 10 known err_kind categories + 5 catch-all shapes): each maps to the right integer; unknown phase = 0; unknown err = 99. * 1 static-shape regression catcher: every `err_kind = "..."` string in ensure_agent_sdk.main() must be in SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently collapse to 99). * 2 emit-shape regression catchers: the assignments in main() go through _encode_phase / _encode_err_kind helpers (no raw strings); no literal string values for sdk_bootstrap_phase/err. * 1 comment-accuracy: the misleading "CC accepts string metric values" comment is gone. - Full suite: 437/437 pass + 2 skipped (live API tests, opt-in). NOT verified end-to-end against BQ — would require shipping + observing in production for 24h to confirm the http_err and sdk_bootstrap_phase/err fields actually appear in tengu_hook_plugin_metrics rows. The unit tests pin the contract; if the wire shape is broken, BQ will show NULL for the new fields and we revisit (with the same diagnostic the BUILD_FAILED bug gave us). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-31 08:34:35 -07:00
Mohamed Hegazy	84011d43b1	security-guidance: migrate from deprecated output_format to output_config.format (#2098 ) Fixes #2098. The Anthropic Messages API moved structured-output schema specification from a top-level `output_format` field to a nested `output_config.format` field, per https://platform.claude.com/docs/en/build-with-claude/structured-outputs. Per docs the old form "will continue working for a transition period" — and indeed for api-key + non-streaming auth it still returns HTTP 200 (verified via live API). But OAuth Bearer users with CLI 2.1.158 hit `invalid_request_error: output_format: This field is deprecated. Use 'output_config.format' instead.` consistently — reporter saw 462 errors in one day. The trigger appears to be auth mode + possibly stream:true (their controlled curl bypass used Bearer + stream=true); api-key + non-streaming was my initial repro attempt and didn't fire. The bug only affected `_call_claude` (the legacy direct-urllib path). The agentic `_agentic_review` path goes through claude_agent_sdk → subprocesses to the `claude` CLI binary, which already uses the new `output_config.format` shape correctly (per src/utils/sideQuery.ts:263 in claude-cli-internal). So this PR only needs to fix the plugin's direct HTTP path. This commit: 1. llm.py: rewrite the payload literal in `_call_claude` to use `output_config: { format: { type: 'json_schema', schema: ... } }` instead of top-level `output_format`. 2. llm.py: in the adaptive-thinking branch, MERGE `effort: "high"` into the existing `output_config` dict instead of reassigning. Reassignment would silently clobber the format schema set in (1). The pre-existing code did `payload["output_config"] = {"effort": "high"}` which was correct WHEN output_format was top-level (and output_config wasn't otherwise used). With the migration the existing dict carries the schema, so we extend it not replace it. Verified locally on macOS Python 3.13: - py_compile clean. - Existing 401 tests still pass — 0 regression. - 6 new tests in test_2098_output_config_format.py (added to internal test suite at sg-staging/tests/, not in this PR): * 2 static-shape: the `_call_claude` source no longer contains top-level `"output_format":` AND uses `output_config`. The adaptive-thinking branch does NOT reassign output_config (and DOES set output_config['effort']). Catches the regression class where a future refactor reintroduces either bug. * 2 payload-shape unit (mocked urllib): both thinking_budget=0 and thinking_budget>0+adaptive code paths produce a payload with the correct `output_config.format` shape AND no `output_format` top-level. The adaptive path verifies both `format` and `effort` coexist in output_config (i.e., the merge fix works). * 2 live-API gating (skip-on-no-key): the new shape returns HTTP 200 against api.anthropic.com; the old shape's current status is recorded for canary purposes (still 200 for api-key today, but reporter shows it's 400 for OAuth). - Full suite: 405/405 pass + 2 skipped (live API tests, opt-in). - The reporter's exact deprecation 400 message reproduces if you swap auth to OAuth Bearer + stream:true (could not test locally without extracting the keychain OAuth token, which was out of scope). The fix shape is API-contract-level so it doesn't depend on which auth mode triggers the 400. NOT verified end-to-end via OAuth-authenticated plugin invocation on my machine (auto-mode classifier correctly declined to extract the keychain token). Reporter's 462 production errors + the docs migration notice + the live-API HTTP 200 on the new form are sufficient evidence to ship. Closes #2098. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 20:11:41 -07:00
Mohamed Hegazy	a40c9f1e83	security-guidance: move core.quotePath=false to GIT_CMD globally (#2099 followup) Followup to PR #2086 (which added the flag to 4 specific git call sites) and PR #2100 (text=True purge for #2099). The Windows reporter for #2099 noticed more git invocations still lacked the flag — rev-parse path queries (--show-toplevel, --git-dir, --git-common-dir), reflog %gs subjects, and `git show <sha>:<path>` all output paths but the per-site PR #2086 approach missed them. The result: an Arabic-named directory shows up via _git_diff_range but rev-parse-emitted paths get C-quoted, breaking downstream os.path.isabs() checks. Fix: add `-c core.quotePath=false` to GIT_CMD itself as the 4th config-set. Every subprocess.run using the GIT_CMD splat picks it up automatically — diff feeders, rev-parse path queries, reflog log, ls-files, status, git show. No more per-site flag duplication. This commit: 1. gitutil.py: add -c core.quotePath=false to GIT_CMD. 2. Remove the now-redundant per-site flags at the 7 call sites that previously had inline -c core.quotePath=false (cleanup, since the global setting subsumes them): gitutil.py: _git_diff_range, _git_name_only, _git_status_porcelain, get_git_diff (4 sites) diffstate.py: _list_untracked git ls-files (1 site) security_reminder_hook.py: commit-review git diff + git show (2 sites) Verified locally on latest main (post PR #2100 merge) with macOS Python 3.13: - py_compile clean on all 3 modified files. - Bare main BEFORE my fix: 400/401 pass — 1 failure proves the gap (test_git_cmd_contains_quotepath_false catches the missing flag). - Main + my fix: 401/401 pass. - 23 new tests in test_quotepath_global.py (added to internal test suite at sg-staging/tests/, not in this PR): 1 GIT_CMD-level: GIT_CMD list contains core.quotePath=false as a (-c, value) pair. Single source of truth — single place a future PR will be caught if the flag gets dropped. * 10 static-shape (one per hooks/.py): every subprocess.run uses the GIT_CMD splat (no bare git invocation that would bypass the global flag). * 12 end-to-end (parametrized over Arabic, Hebrew, CJK directory names): real git repo, _git_diff_range emits unquoted diff, extract_file_paths_from_diff and parse_diff_into_files keep the non-ASCII path in their output, _git_toplevel returns the non-ASCII path intact. - 1 staleness fix in test_diff_parser_non_ascii.py (test_no_bare_git_diff_or_show_without_flag): updated to accept EITHER inline core.quotePath=false OR *GIT_CMD splat (which globally provides it). NOT verified end-to-end on Windows with a non-ASCII repo root path. The new global-flag test pins the contract permanently, and the parametrized macOS tests confirm parser behavior on ASCII-control paths in non-ASCII directories. The Windows-specific rev-parse quoting behavior follows from the same git contract our macOS test environment exercises (POSIX git always emits raw UTF-8 regardless of quotePath; on Windows the flag is what makes output raw). Closes the #2099 followup specifically about _git_diff_range / rev-parse --show-toplevel / git log %gs paths slipping past. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 12:19:38 -07:00
Mohamed Hegazy	1ecf3d1bac	security-guidance: purge text=True from subprocess.run + bake PYTHONUTF8=1 (#2099 ) URGENT WINDOWS FIX. Sibling of #2056 / PR #2075 but covering 14 more sites that PR #2075 missed. The bug class: on Windows with cp1252 default encoding (typical en-US locale), `subprocess.run(..., text=True)` decodes child stdout AND stderr via `locale.getpreferredencoding()`. When git emits a UTF-8 byte that's undefined in cp1252 (e.g. `0x81` from ف, present in any path/filename/branch ref/commit message containing Arabic/Hebrew/CJK), Python's internal `_readerthread` raises UnicodeDecodeError. The thread crash is silent in Python 3.13+ (only printed to stderr), but `subprocess.run` returns `stdout=None` and the caller AttributeErrors on `.strip()`. The user sees a misleading "WinError 267" or similar catch-all message instead of the real decode failure. PR #2075 fixed 6 specific helpers in `diffstate.py` / `gitutil.py`. This commit covers the 14 survivors. Plus a defense-in-depth belt: `PYTHONUTF8=1` exported by sg-python.sh. This commit: 1. sg-python.sh: `export PYTHONUTF8=1` (PEP 540). No-op on macOS/Linux (already UTF-8). On Windows, makes Python's `locale.getpreferredencoding()` return UTF-8 instead of cp1252 — so even if a future regression slips in text=True, the decode succeeds. Must be set BEFORE Python starts; changing it from inside the interpreter has no effect. 2. gitutil.py: convert 8 subprocess.run sites from `capture_output=True, text=True` to `capture_output=True` + manual `r.stdout.decode("utf-8", errors="replace")`: - _git_rev_parse_head (stdout = SHA, stderr risk) - _find_git_index (stdout = PATH, primary bug site) - _temp_index git add (returncode only, stderr risk) - _git_toplevel (stdout = PATH, primary bug site) - _git_dir (stdout = PATH, primary bug site) - _git_rev_list_range (stdout = SHAs, stderr risk) - _detect_main_branch (stdout = ref, stderr risk) - merge-base --is-ancestor (returncode only, stderr risk) 3. security_reminder_hook.py: convert 6 subprocess.run sites (rev-parse @{u}/@{u}@{1}/local_ref, merge-base, HEAD lookup, reflog SHA resolution) — same pattern. 4. security_reminder_hook.py: fix the misleading log line in handle_user_prompt_submit. Was: debug_log("Failed to capture git baseline (not a git repo?)") Now includes the cwd in the message so the next reporter doesn't waste an hour grepping for the real WinError, per reporter's secondary finding. Verified locally on macOS Python 3.13: - py_compile clean on all modified files. - bash -n sg-python.sh clean. - sg-python.sh actually propagates PYTHONUTF8=1 to child Python (verified via probe — sys.flags.utf8_mode=1). - Existing 353 tests still pass — 0 regression. - 25 new tests in test_2099_subprocess_text_true.py: * 10 static-shape catchers (one per hooks/.py file). Any future PR that reintroduces text=True OR encoding= in subprocess.run fails this check at PR time. Single source of truth for the regression class. 2 sg-python.sh verifiers (literal export + actual propagation to child Python). * 5 macOS end-to-end against a real git repo containing non-cp1252 content (`ف.py` filename): _git_toplevel, _git_dir, _find_git_index, _git_rev_parse_head, _git_rev_list_range all return clean values without AttributeError / UnicodeDecodeError. * 7 round-trip bytes-decode pattern verifiers (parametrized over Arabic ف, Hebrew א, Japanese 案, raw 0x81, multiple cp1252-undefined bytes, real-world git diff headers). * 1 sanity check that cp1252 strict DOES raise on 0x81 (proves the test environment can catch the bug class). - Full suite: 378/378 pass in 5.56s. - End-to-end tmux smoke test driving real claude 2.1.145 CLI: Made a git commit via Bash tool call. All 4 hooks fired through the fixed plugin path: 11:28:16.730 Hook called with args: …/plugin/hooks/security_reminder_hook.py 11:28:16.734 Processing: hook_event=UserPromptSubmit 11:28:16.825 Captured git baseline: 445f7f213256 11:28:19.923 Hook called with args: … 11:28:19.923 Processing: hook_event=PostToolUse, tool=Bash 11:28:19.971 Commit review: detected git commit in command 11:28:20.020 Commit review: 1/1 sha(s) resolved, 1 files 11:28:26.415 Hook called with args: … 11:28:26.416 Processing: hook_event=Stop 11:28:26.550 Stop hook: empty review set Confirms: PYTHONUTF8=1 export doesn't break anything; converted helpers (_git_rev_parse_head, _git_toplevel, _git_dir, _find_git_index) run end-to-end without issue on the happy path. NOT verified end-to-end on Windows with actual non-cp1252 content in path/filename/stderr. The static-shape catcher pins the regression class permanently. Reporter's PYTHONUTF8=1 workaround empirically proves the encoding-mode fix works for the affected scenario; this commit just bakes it in. Closes #2099. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-30 11:29:07 -07:00
Mohamed Hegazy	c40770ae5a	Merge pull request #2078 from anthropics/fix-1868-claude-config-dir security-guidance: respect CLAUDE_CONFIG_DIR for plugin state files (#1868)	2026-05-29 16:14:35 -07:00
Mohamed Hegazy	42487ee6fd	Merge pull request #2091 from anthropics/fix-2089-if-clause-regression URGENT: security-guidance: fix #2089 regression — split \|-joined if clauses	2026-05-29 13:39:45 -07:00
Mohamed Hegazy	bc07f7a1fd	security-guidance: 5 precise if entries fixing #2089 regression + gt support URGENT REGRESSION FIX. PR #2076 (Graphite gt workflow) gated the PostToolUse commit/push hooks with: "if": "Bash(git commit:)\|Bash(gt create:)\|Bash(gt modify:)" "if": "Bash(git push:)\|Bash(gt submit:)" mirroring the regex-OR idiom that `matcher` uses ("Edit\|Write\|MultiEdit\|NotebookEdit"). But `if` is NOT a regex — it's a SINGLE permission-rule string. The CC harness's dispatch filter parses the entire `if` value as one rule of shape `ToolName(rule_content)` via: let firstParen = H.indexOf("("); let lastParen = H.lastIndexOf(")"); // searches from END if (lastParen !== H.length - 1) return { toolName: H }; let toolName = H.slice(0, firstParen); let ruleContent = H.slice(firstParen + 1, lastParen); Applied to the broken commit clause: toolName = "Bash" ruleContent = "git commit:)\|Bash(gt create:)\|Bash(gt modify:" The garbled `ruleContent` never matches any real command, so the hook never fires — for ANY workflow, not just gt. The plugin's deepest review layer was dead in production for all users on builds shipping PR #2076. Fix shape: split into separate hook entries, each with its own well-formed single-rule `if` clause. The Python hook self-routes commit vs push via the bash-command regexes and dedups concurrent spawns via `_claim_bash_hook_once`, so multiple entries firing the same script is safe. This commit: 1. hooks.json: 5 precise entries (one per command shape) instead of the broken \|-joined 2-entry form. Restores the original commit/ push behavior bit-for-bit (`Bash(git commit:)` + `Bash(git push:)` are unchanged from pre-#2076), and adds 3 separate entries for the Graphite commands (`Bash(gt create:)`, `Bash(gt modify:)`, `Bash(gt submit:)`). No git behavior change. The earlier draft used the broader `Bash(git )` + `Bash(gt )` per the reporter's suggestion, but that has a real cost: every `git status` / `git log` / `git diff` would spawn the Python hook only to early-exit via the regex matcher. Precise per-command entries avoid the spawn overhead and match the pre-#2076 cost profile exactly. 2. security_reminder_hook.py: widen `_GIT_COMMIT_RE` to tolerate `git -C <path>` and `git -c k=v` global options between `git` and `commit` (mirrors `_GIT_PUSH_RE`'s long-standing tolerance). Without this, `git -C /repo commit` is silently dropped by the handler — reporter flagged this as the secondary finding. Verified locally on macOS Python 3.13: - hooks.json valid JSON, 5 `if` clauses each parses to a single `{toolName: "Bash", ruleContent: "<command>:"}` pair. - py_compile security_reminder_hook.py clean. - 9-case regex sanity: all 4 commit forms match (bare, -C path, -c k=v, gt create/modify); 3 non-commit forms reject (status, gt submit, gt log). Pre-fix would reject -C path form. - 7 new tests in test_2089_if_clause_validity.py + 2 updated tests in test_gt_graphite_workflow.py: * 12 sanity tests for a Python parser mirroring harness's BA(H) — pinned so a future refactor can't silently start accepting the broken form. * 2 hooks.json validity: every `if` clause parses as a single valid rule; at least one if-gated hook exists. * 1 post-fix structure: separate entries cover git AND gt. * 2 updated gt-coverage: SOME clause covers git, SOME clause covers gt (no longer requires both in the same \|-joined clause, which was the broken shape). TDD-verified the test catches the bug: temporarily restored main's broken \|-joined hooks.json, ran the new test, saw `test_every_if_clause_is_single_valid_rule` fail with a clear error explaining #2089's cause. Restored fix, test passes. - Full suite: 336/353 pass (17 unrelated failures from open PRs #2078 / #2086 not in this branch). NOT verified end-to-end with a real CC instance triggering the hooks on a git or gt commit. The static-shape tests catch the regression class and the regex sanity tests pin the `git -C` tolerance, but the asyncRewake feedback loop needs runtime verification. Closes #2089. Restores the closes for #2048 that PR #2076 attempted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 13:22:20 -07:00
Mohamed Hegazy	9e150cfd48	Merge pull request #2086 from anthropics/fix-2082-diff-parser-non-ascii security-guidance: pass core.quotePath=false to diff feeders (#2082)	2026-05-29 08:11:25 -07:00
Mohamed Hegazy	38b298d5b2	security-guidance: pass core.quotePath=false to diff feeders (#2082 ) Fixes anthropics/claude-plugins-official#2082 — diff feeders use git's default quotePath setting, which C-quotes any path with a non-ASCII byte. The downstream parsers in gitutil.parse_diff_into_files / gitutil.extract_file_paths_from_diff match the diff header with `re.match(r'^a/(.+?) b/(.+)$', ...)`, which only sees the raw `a/path b/path` form. The C-quoted `"a/\303\201vila/..."` form slips past the regex, the `continue` fires, and the file is silently dropped from review. Effect: a vulnerable file like `Ávila/payment.py` with `os.system('curl ' + user_input)` never reaches the LLM reviewer. False negative in exactly the direction the plugin exists to catch. Sibling of #2056 / #2075: those fixed the UTF-8 decode of the subprocess output (text=True crashed the reader thread on Windows cp1252). This one fixes the diff-feeder commands themselves — the name-only helpers (_git_name_only, _git_status_porcelain) already pass core.quotePath=false for this exact reason; the diff-text feeders were the holdouts. Fix: add `-c core.quotePath=false` to 4 git invocations: - gitutil._git_diff_range (push-sweep feed) - gitutil.get_git_diff (Stop-hook feed) - security_reminder_hook commit-review `git diff` (amend delta) - security_reminder_hook commit-review `git show` (post-amend) With the flag, git emits raw UTF-8 in the diff header (`a/Ávila/payment.py`), the regex matches, and both files (the non-ASCII vulnerable one + any ASCII control file) flow through to review correctly. Verified locally on macOS Python 3.13: - py_compile clean on both files. - Existing 45 smoke + extensibility tests still pass. - 8 new tests in test_diff_parser_non_ascii.py (added to internal test suite at sg-staging/tests/, not in this PR): * 2 static-shape: gitutil._git_diff_range and get_git_diff both contain `core.quotePath=false` in their source. * 2 commit-review static: every subprocess.run in handle_commit_review_posttooluse that mentions `"diff"` or `"show"` also passes the flag. Catches the regression class where a new diff/show call site is added without plumbing the flag through. * 4 end-to-end with a real git repo containing a `Ávila/payment.py` baseline-and-edit: - WITHOUT flag: header is C-quoted, both parsers drop the non-ASCII file (demonstrates the bug). - WITH flag: header is raw UTF-8, both parsers see the file. - parse_diff_into_files (the other parse path) also keeps the file with the flag. - get_git_diff end-to-end produces unquoted output whose file list includes the non-ASCII path. - 53/53 pass total (45 existing + 8 new) in 3.41s. NOT verified end-to-end with a real CC commit-review fire on a non-ASCII path. The static-shape tests catch the regression and the end-to-end git-repo tests pin parser behavior, but the actual LLM-review-with-vuln-found path requires runtime verification against an Anthropic-API-credentialed CC session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 07:56:22 -07:00
Mohamed Hegazy	8435428dfc	Merge pull request #2077 from anthropics/fix-1358-1375-1783-hook-output-protocol security-guidance: emit findings via hookSpecificOutput.additionalContext (#1358 #1375 #1783)	2026-05-29 00:20:48 -07:00
Mohamed Hegazy	0d22ba3501	security-guidance: respect CLAUDE_CONFIG_DIR for plugin state files (#1868 ) Fixes #1868 — when CLAUDE_CONFIG_DIR is set to a non-default location (e.g. ~/.config/claude for XDG compliance, or a multi-tenant install path), the plugin still wrote state files to the hardcoded ~/.claude/ path, leaving stale state and breaking CLAUDE_CONFIG_DIR's purpose. Resolution precedence (highest first): 1. SECURITY_WARNINGS_STATE_DIR — plugin-specific override (existing) 2. CLAUDE_CONFIG_DIR/security — CC's config-dir env (new — #1868) 3. ~/.claude/security — default fallback (unchanged) Empty-string env vars (e.g. CLAUDE_CONFIG_DIR= in a misconfigured shell) are treated as not-set so the empty path doesn't collide with os.path.join and silently write to /security at the filesystem root. Implementation: a single state_dir() helper in _base.py is the source of truth for resolution. All five modules that previously had inline SECURITY_WARNINGS_STATE_DIR / ~/.claude/security resolutions (_base.py, session_state.py, ensure_agent_sdk.py, llm.py, and one site in security_reminder_hook.py) now call state_dir() instead. Re-implementing the precedence inline risks drift — one module gets a future fix, others don't. The helper is called per-invocation rather than cached at import time so test monkeypatches of the env vars take effect, and so a long- running test or future shared-process scenario can change the env between calls and have the next call observe the new value. The per-call cost is negligible compared to the subprocess-spawn cost the hooks pay every fire in production. Three hardcoded ~/.claude/security strings remain but are NOT functional resolutions: - _base.py:39: the fallback BRANCH inside state_dir() itself - ensure_agent_sdk.py:6, :11: docstring text describing default location for users Verified locally on macOS Python 3.13: - py_compile clean on all 5 modified files. - Existing 45 smoke + extensibility tests still pass. - 14 new tests in test_claude_config_dir.py (added to internal test suite at sg-staging/tests/, not in this PR): * 7 resolution-semantics: default fallback, CLAUDE_CONFIG_DIR override, SECURITY_WARNINGS_STATE_DIR beats both, tilde expansion, empty-string handling (CLAUDE_CONFIG_DIR= must fall back, NOT join to /security). * 4 static-shape: each of session_state / ensure_agent_sdk / llm / security_reminder_hook either imports state_dir from _base OR has zero resolution patterns. Catches the regression where someone adds a new state-file writer and re-implements resolution inline, missing the CLAUDE_CONFIG_DIR branch. * 3 end-to-end: with CLAUDE_CONFIG_DIR set, get_state_file / get_lock_file return paths under <CLAUDE_CONFIG_DIR>/security/; save_state round-trip writes a file to the redirected path and re-reads the same contents. - 59/59 pass total (45 existing + 14 new) in 2.54s. NOT verified end-to-end with a real CC instance setting CLAUDE_CONFIG_DIR. The shape tests catch the regression class (hardcoded ~/.claude/), and the end-to-end test pins the behavior that user state files actually land at the redirected path. Closes #1868. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:57:10 -07:00
Mohamed Hegazy	37ffc76005	security-guidance: emit findings via hookSpecificOutput.additionalContext (#1358 #1375 #1783 ) Fixes #1358, #1375, and #1783 — three related complaints about the hook output protocol used at the three asyncRewake exit-2 sites (handle_commit_review_posttooluse, handle_push_sweep_posttooluse, handle_stop_hook). The old shape at each site was: emit_metrics({...}) # JSON to stdout (metrics) sys.stderr.write(banner + guidance + suffix) # plain text to stderr sys.exit(2) # asyncRewake trigger That triggered three reported problems: #1375: CC's hook system parsing stdout for a SyncHookJSONOutput sees only the bare metrics dict — no findings reason — and on older CC versions surfaces a 'json output validation failed' error because stderr's plain text isn't valid JSON. #1783: CC's UI shows 'Permission to use Edit has been denied' with no permissionDecisionReason — the stderr text is invisible to that UI surface; CC only renders fields it can find in the JSON. #1358: Reporters experienced the exit(2) as 'gating' behavior rather than 'warning' behavior. The pattern-warning path in main() was migrated to exit(0) + hookSpecificOutput.additionalContext long ago; these three asyncRewake sites were never updated. Fix: extend emit_metrics() to accept additional_context, system_message, and hook_event_name kwargs, and emit them in the same SyncHookJSONOutput line as the metrics. CC's parser stops scanning stdout after the first {-prefixed line, so the findings must ride in that same line — calling emit_metrics twice or adding a second print(json.dumps(...)) would silently drop the second emission. At each of the three call sites: route the guidance text that used to go to stderr through additional_context instead. The stderr.write is dropped — additionalContext carries the same text to the model via the JSON channel, and the legacy stderr surface is what triggered #1375's JSON validation error on older CC clients. exit(2) is preserved at all three sites. That's the documented mechanism for triggering the asyncRewake 'force fix' feedback loop (per the inline comment at the stop-hook site); switching to exit(0) without verifying CC's protocol-version support risks dropping the rewake entirely and silently losing all the findings the hook just computed. For push-sweep specifically: emit_metrics had to move from an unconditional pre-emission (line ~1680) to two conditional sites (one in the no-vulns branch with exit(0), one in the with-vulns branch with exit(2)) because the with-vulns branch needs to attach additional_context and CC reads only the first JSON line — a second emit would be ignored. Behavior is preserved: every push-sweep fire emits exactly one metrics line, just at a slightly later point in the function body. Verified locally on macOS Python 3.13: - py_compile clean. - Existing 45 smoke + extensibility tests still pass. - 21 new tests in test_hook_output_protocol.py (added to internal test suite at sg-staging/tests/, not in this PR): * 6 backward-compat: emit_metrics with metrics only, with rewake_summary, etc. — verifies the legacy callers still produce the same output shape. * 5 additional_context shape: lands in hookSpecificOutput, round-trips the value, default hook_event_name is sensible, empty/None doesn't pollute the JSON with an empty hSO block. * 3 system_message shape: lands in systemMessage, empty/None suppressed, round-trips. * 1 combined: metrics + rewake_summary + additional_context + system_message + hook_event_name all merge into one JSON line. * 6 round-trip safety: emoji, quotes, backslashes, newlines, Unicode (山田太郎 + 🎉), tabs, null bytes — all survive the json.dumps cycle. * 6 static-shape: each of the three asyncRewake handlers (commit_review, push_sweep, stop_hook) is checked to confirm it passes additional_context to emit_metrics and no longer writes the PROVENANCE_BANNER guidance to stderr. Catches the regression class where a new exit(2) site forgets to plumb guidance through the JSON channel. - 66/66 pass total (45 existing + 21 new) in 2.57s. NOT verified end-to-end with a real CC instance triggering all three hooks. The static-shape tests + the JSON round-trip tests should catch any regression in the emit_metrics output, but the actual interaction with CC's asyncRewake / rewakeMessage flow (especially: does hookSpecificOutput.additionalContext successfully appear in the rewakeMessage that CC sends to the model?) needs runtime verification against a CC version that supports the modern protocol. The reporter for #1375 specifically called out that CC's older versions surfaced 'json output validation failed' on the old stderr- only output; this fix changes the stdout shape to valid JSON with the findings included, which should resolve that error class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:53:04 -07:00
Mohamed Hegazy	982070e51f	Merge pull request #2076 from anthropics/fix-2048-graphite-gt-workflow security-guidance: detect Graphite (gt) commands as commit/push events (#2048)	2026-05-28 23:44:58 -07:00
Mohamed Hegazy	68a700837c	Merge pull request #2075 from anthropics/fix-2056-windows-unicode-decode security-guidance: lenient UTF-8 decode in 6 git-subprocess helpers (#2056)	2026-05-28 23:36:36 -07:00
Mohamed Hegazy	5212308979	security-guidance: detect Graphite (gt) commands as commit/push events (#2048 ) Fixes anthropics/claude-plugins-official#2048 — teams using Graphite for stacked PRs (`gt create` / `gt modify` / `gt submit`) never get the commit/push agentic review because the hook matcher only catches literal `git commit` / `git push` Bash calls. gt shells out to git as a subprocess, but the hook fires on Claude's top-level tool call, which is `gt create` — not the `git commit` invocation inside the gt subprocess that Claude Code never observes. Per-edit pattern checks and end-of-turn Stop review still fire (those don't depend on detecting the commit command), so the silent-coverage- gap is bounded to the deepest review layer for Graphite users. Still: that's exactly the layer designed to catch IDOR / auth-bypass / cross-file SSRF, so the gap matters. Semantic mapping (per the reporter): gt create -> commit (like `git commit`) gt modify -> commit + amend (like `git commit --amend`) gt submit -> push (like `git push`) Changes: 1. hooks/hooks.json: extend two PostToolUse `if` matchers. "Bash(git commit:)" -> "Bash(git commit:)\|Bash(gt create:)\|Bash(gt modify:)" "Bash(git push:)" -> "Bash(git push:)\|Bash(gt submit:)" Without this, the hook subprocess never spawns for gt invocations and the Python regex changes below are dead code. 2. hooks/security_reminder_hook.py: extend three regexes that classify the bash command line. _GIT_COMMIT_RE: now also matches `gt create` and `gt modify`. Used at 4 sites (handler gate, multi-commit count, prompt detection, event classification). Compound commands like `gt create -am a && gt submit` now correctly trigger both the commit and push paths. _GIT_AMEND_RE: now also matches `gt modify` (semantically an amend). The amend code path uses reflog to find the pre-amend SHA and diff against THAT instead of HEAD~1 — same code path now applies to `gt modify`. _GIT_PUSH_RE: now also matches `gt submit`. Tolerates the same `git -C path` / `git -c k=v` global options as before for the git form; gt has its own flag layer that doesn't conflict. Verified locally on macOS Python 3.13: - JSON valid (hooks.json roundtrips). - Existing 45 smoke + extensibility tests still pass. - 76 new tests in test_gt_graphite_workflow.py (added to internal test suite this PR doesn't ship — kept in sg-staging tests/ until we have a story for shipping plugin tests publicly): 16 parametrized commit-match: native git commit variants + all gt create / gt modify variants from the reporter's repro. * 11 parametrized commit-reject: gt submit, gt log, gtoolkit (word-boundary), agt create, etc. * 9 parametrized amend-match: git commit --amend variants + gt modify variants + chained git+gt. * 7 parametrized amend-reject: regular git commit, gt create, gt submit, echo'd substring noise. * 11 parametrized push-match: git push variants + gt submit variants + chained. * 12 parametrized push-reject: git commit, gt log, gt fetch, gt down, gt restack, gh pr create, agt submit. * 3 compound-command class tests: git+gt mixtures trigger both paths; gt modify chained with gt submit triggers amend + push. * 3 commit-invocation-count tests: gt commands contribute to the multi-commit-detection findall count. * 2 hooks.json static config tests: read the JSON, verify the commit and push `if` clauses include the gt cases. Catches the easy regression where someone updates the Python regex but forgets to widen the matcher. - 121/121 pass total (45 existing + 76 new) in 2.50s. NOT verified end-to-end with a real `gt` install. Reporter has the deterministic Graphite workflow and offered to retest. The regex + matcher widening is a clean superset — current git-only matching still works (verified by the 45-test smoke suite that uses `git commit` / `git push` exclusively), and the new gt cases are pure additions. Not in this PR: - `gt prev` / `gt next` / `gt up` / `gt down` etc. — pure navigation, no commit / push side effect. - `gt restack` — could in principle rewrite commits (so the plugin's reviewed-shas cache becomes stale), but it doesn't create reviewable new content. Out of scope. - `gh pr create` — already explicitly NOT a separate matcher per the existing comment in _GIT_PUSH_RE (gh invokes git push as a child process; the bash hook only sees the top-level `gh pr create`). Same architectural issue as gt but with a different cost/benefit per the existing comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:33:14 -07:00
Mohamed Hegazy	3d349d40b9	Merge pull request #2074 from anthropics/fix-xss-rules-non-js-false-positives security-guidance: gate XSS pattern rules to JS-family files	2026-05-28 23:18:17 -07:00
Mohamed Hegazy	6a63e35e75	security-guidance: lenient UTF-8 decode in 6 git-subprocess helpers (#2056 ) Fixes anthropics/claude-plugins-official#2056 — on Windows, when the worktree contains an untracked file whose name has a character undefined in cp1252 (accented capitals like Á Í Ï Ð Ý, most CJK, emoji), the UserPromptSubmit hook crashes: Exception in thread Thread-5 (_readerthread): UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 Traceback (most recent call last): File diffstate.py, line 338, in _list_untracked for p in r.stdout.split('\\0'): AttributeError: 'NoneType' object has no attribute 'split' Non-blocking (UPS failures still let the prompt through) but the baseline-untracked snapshot is silently lost, so the Stop-hook review mis-handles pre-existing untracked files. Root cause (reporter's diagnosis, verified): 1. core.quotePath=false makes git emit raw UTF-8 for non-ASCII filenames. 2. subprocess.run(..., text=True) decodes via locale.getpreferredencoding(False) in strict mode — on Windows that is cp1252, in which 0x81 / 0x8D / 0x8F / 0x90 / 0x9D are undefined. Those bytes appear in the UTF-8 encodings of Á (C3 81), Í (C3 8D), Ï (C3 8F), Ð (C3 90), Ý (C3 9D), and a large fraction of CJK / emoji codepoints. 3. The decode runs in the subprocess reader thread. The thread raises UnicodeDecodeError, threading prints 'Exception in thread Thread-N', subprocess.run returns with stdout=None. The handler then does None.split('\\0') -> AttributeError, which is NOT in the narrow except (TimeoutExpired, FileNotFoundError, OSError) tuple, so it escapes the helper, propagates out of UserPromptSubmit's ThreadPoolExecutor.result(), and exits the hook non-zero. This is internally inconsistent: gitutil._git_diff_range, security_reminder_hook._reflog_amend_lookup (line ~540), and the commit diff loop (line ~1115) already do bytes + decode utf-8/replace, with comments explicitly noting that text=True would crash. The fix below extends that established pattern to the helpers that were holdouts. Affected helpers (6 total): - diffstate._list_untracked <- reporter, hot path, CRITICAL - diffstate.capture_git_baseline <- reporter, latent - diffstate.get_baseline_file_content <- audit, file content read, HIGH - gitutil._git_name_only <- reporter, latent - gitutil._git_status_porcelain <- reporter, latent - gitutil._git_reflog_recent_commits <- audit, embeds %gs commit msg, HIGH For each one: - Drop text=True from subprocess.run. - Decode r.stdout / r.stderr as .decode('utf-8', errors='replace'). - Add ValueError to the except tuple as defense against any future strict-decode regression (UnicodeDecodeError is a ValueError subclass; including it explicitly degrades the helper to its empty/None return instead of escaping out of the hook). Verified locally on macOS Python 3.13: - py_compile clean on both files. - 45 existing smoke + extensibility tests still pass. - 21 new internal tests (not in this PR — added to the team's local test suite at staging/tests/test_unicode_decode.py): * 18 static-shape parametrized: each of the 6 fixed helpers has no text=True in its subprocess calls, contains errors='replace', and lists ValueError in its except. * Deterministic end-to-end: create real git repo + Ávila_report.txt untracked, call _list_untracked, verify it returns {'Ávila_report.txt': <mtime>} without crashing. * Deterministic end-to-end: same for capture_git_baseline (verifies the latent stderr-warning case stays valid). * Deterministic end-to-end: get_baseline_file_content on a file whose content has 山田太郎 + 🎉; verify the bytes round-trip through the decode. - 66/66 tests pass total (45 existing + 21 new). NOT verified end-to-end on Windows — would need actual cp1252 strict decode to fire. Reporter has the deterministic repro and will re-verify on their Win11 / Python 3.14.x setup before merge. Not in this PR (defense-in-depth, lower risk): - 3 git rev-parse calls returning path output (gitutil._find_git_index, _git_toplevel, _git_dir) could fail on Windows if cwd is in a non-ASCII install directory. Same fix shape but unreported and much lower probability — worth a separate follow-up if anyone actually hits it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:15:16 -07:00
Mohamed Hegazy	12a5376e20	security-guidance: gate XSS pattern rules to JS-family files Closes #410, #2037, #2045, #1640, #1280, #1329, #1341, #255, anthropics/claude-code#46720 (partial closes on overlap with other rules). The plugin's substring-only XSS / browser-DOM rules (new_function_injection, react_dangerously_set_html, document_write_xss, innerHTML_xss, outerHTML_xss, insertAdjacentHTML_xss) fired on any file containing the trigger substring — including: * Markdown documentation explaining XSS sinks * Blog posts / READMEs that name browser APIs * Python tutorials referencing dangerouslySetInnerHTML * Plugin skill files with example HTML strings * .yaml / .json configs that happen to contain the literal string * .gitignore / Dockerfile / Makefile These constructs have no meaning outside JS/TS source. Add a path_filter: lambda p: p.endswith(_JS_EXTS) to each so they fire only on .js, .jsx, .ts, .tsx, .mjs, .cjs, .mts, .cts, .vue, .svelte. Cross-checked against the existing _JS_EXTS-gated rules (regex_exec_substring, child_process_exec, exec_substring) — same pattern, same constant, same intent. Uses the module-level _JS_EXTS tuple so future extension changes propagate to all 6 rules atomically. Verified locally on macOS Python 3.13: - py_compile clean. - 45-test existing smoke + extensibility suite still passes. - 151 new parametrized tests in test_xss_gate.py (added to internal test suite this PR doesn't ship): each gated rule x every JS-family extension accepts, x every non-JS path (.md / .py / .yaml / .json / .txt / .html / Dockerfile / Makefile / .gitignore / .sh / .go / .rs / .rb) rejects. 196 tests pass total. Doesn't address everything in the false-positive cluster — issues that require Python-rule gating (#1114 .env.schema exec), tighter substring scoping (#660 pickle in usernames), or hook-protocol changes (#1358 exit-2 vs warning, #1375 plain-text-vs-JSON output) need separate PRs. This PR covers the JS-substring subset cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:07:53 -07:00
Mohamed Hegazy	a67587c816	security-guidance: enable LLM review on default macOS Python 3.9 Fixes anthropics/claude-plugins-official#2071 — on macOS where the default `python3` is Apple's Command Line Tools Python 3.9.6, the plugin's agentic commit reviewer silently does not run, even when the user has a newer Python installed. Three compounding factors in the bug: 1. `sg-python.sh` only checks the major version (`3`), so it always picks 3.9 even when 3.10+ is on PATH. 2. `claude_agent_sdk` requires Python >=3.10 — pip install on 3.9 returns "No matching distribution" -> bootstrap returns BUILD_FAILED. 3. Even with a hand-built 3.12 venv, `llm.py` imports the SDK in-process into the hook's interpreter (still 3.9), which raises SyntaxError. The existing venv-probe in `ensure_agent_sdk.py` uses the venv's own Python (3.12) so it reports NOOP_VENV (healthy) while the consumer fails — misleading telemetry on top of silent feature degradation. Per BQ telemetry, 14,073 external macOS users hit sdk_bootstrap=BUILD_FAILED in the past 4 days (the default-macOS cohort), out of ~86K total external installed users. Combined with ~20K other users in similar broken-bootstrap states (Windows pre-#2055, Linux <3.10), about half the installed base has a silently-broken agentic reviewer. This PR implements the reporter's items #1, #3, and #4. Item #2 (running the SDK out-of-process) is deferred as a bigger refactor. Item #1 — hooks/sg-python.sh — prefer >=3.10 binaries via 3-pass probe: Pass 1: python3.13 / 3.12 / 3.11 / 3.10 (>=3.10 by name, highest wins) Pass 2: bare python3 / python / py -3 (accept only if reported >=3.10) Pass 3: bare python3 / python / py -3 (any Python 3, FALLBACK so pattern checks still work on macOS-default 3.9 — no regression vs today; SDK-dependent paths detect the version mismatch inside Python and degrade cleanly via item #4) Item #4 — ensure_agent_sdk.py — health-check honesty: Added HOOK_PY_INCOMPATIBLE=6 outcome with short-circuit at top of main(): if sys.version_info < (3, 10): return HOOK_PY_INCOMPATIBLE, "hook_py", f"py_{...}" Telemetry consequences after rollout: sdk_bootstrap=6 is a new clean bucket; some users currently miscounted in sdk_bootstrap=3 BUILD_FAILED (wasted pip cycles) and sdk_bootstrap=1 NOOP_VENV (falsely-healthy) move to sdk_bootstrap=6. The remaining NOOP_VENV count becomes trustworthy. Item #3 — ensure_agent_sdk.py — one-time user-visible notice: When outcome == HOOK_PY_INCOMPATIBLE and a marker file at `~/.claude/security/.agentic_unavailable_notice_v<pv>` doesn't exist, the SessionStart response includes hookSpecificOutput.additionalContext + systemMessage explaining the situation. Marker file is plugin- version-keyed so a future fix (e.g. shipping out-of-process SDK) can bump pv and re-notify users. BUILD_FAILED is intentionally excluded from the notice — it covers transient causes where a permanent banner would mislead. Verified locally on macOS Python 3.13: - py_compile clean on both files. - Existing 45-test smoke + extensibility suite: 45/45 PASS in 2.50s. - Unit test of simulated 3.9 path: HOOK_PY_INCOMPATIBLE returned with correct phase/kind; notice shown on first call, suppressed on second, reshown on bumped pv; BUILD_FAILED correctly does NOT trigger notice. NOT verified: actual Python 3.9 behavior end-to-end (would need a 3.9 install). Worth a follow-up smoke test in a 3.9 venv before next release. The unit test simulating 3.9 covers the logic but not the runtime invocation through the shim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 22:58:01 -07:00
Bryan Thompson	fc49e6815f	recommender: add Convex to database/backend MCP coverage (#1981 ) Picked up from sethconvex's PR #1966 (auto-closed by membership gate), split off from #1980 (Convex plugin entry refresh) so the editorial addition to claude-automation-recommender gets its own review. Changes: - SKILL.md: add `convex` to the package.json dep-detection grep, update the Database row in the indicator table to name Convex, and add a Convex MCP row to the MCP recommendation table. - references/mcp-servers.md: new "Convex MCP" section in the Databases group (Supabase / Convex / PostgreSQL / Neon / Turso), and a row in the Detection Patterns quick reference. Convex publishes its MCP server via the `convex` npm package (`npx convex mcp start`), exposing tables, function-spec, data, run-once-query, logs, env list/set/get. Same row pattern as the existing database/backend MCP entries. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 07:59:33 -05:00
Mohamed Hegazy	c11244778d	Address Windows verification: --prefer-binary + pywin32 bootstrap The first round of this PR removed SKIP_WIN32, fixed venv_py to use Scripts/python.exe, and added Lib/site-packages to the consumer glob — all necessary. Windows verification (Win11 ARM64, Py 3.13, Git Bash) showed two more blockers, both addressed here. 1. Pip dependency resolver picks unbuildable cryptography on ARM64. Without --prefer-binary, pip picks a cryptography version with no published ARM64 wheel and tries to build it from source. That needs Rust/Cargo, almost never present on user machines → BUILD_FAILED with err_kind=other:cryptography. A binary wheel exists for an adjacent version (cryptography-46.0.3-cp311-abi3-win_arm64.whl); --prefer-binary tells pip to pick it. Cross-platform safe (no-op where the latest version already has a wheel). 2. pywin32 .pth files aren't processed by sys.path.insert(). With the venv built, ensure_agent_sdk.py's post-build probe passes (it runs from venv_py, where Python's site.py at startup processes pywin32.pth and registers win32/, win32/lib/ plus runs pywin32_bootstrap.py to set the DLL search dir). But llm.py runs in the hook's SYSTEM Python and adds the venv via sys.path.insert(), which doesn't trigger site.py at all. Without the bootstrap, the SDK's mcp.client.stdio → mcp.os.win32.utilities chain raises ModuleNotFoundError: pywintypes and the agentic reviewer falls back to single-shot silently — exactly the symptom this PR is trying to fix. The probe says NOOP_VENV; the actual consumer fails. Probe and consumer use different Pythons. Replicate what site.py would do: after inserting site-packages, also insert win32/ and win32/lib/, then exec pywin32_bootstrap.py. Pulled into a shared helper _inject_agent_sdk_venv_into_syspath() so both consumer sites (3P SDK fallback, agentic_review fallback) call the same code — Windows handling stays in one place. Verified on macOS (POSIX path unchanged): - Helper end-to-end test: POSIX-layout venv detected + fake package imports successfully via the injected path - Windows-layout venv also detected; win32 branch correctly skipped via sys.platform check - Both files pass py_compile Credit: @mhegazy verified the previous commit on Win11 ARM64 / Py 3.13 / Git Bash, surfaced both issues end-to-end, and provided the exact fix patterns. This commit applies them with the pywin32 part factored into a shared helper (vs. inlining at both consumer sites). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 15:07:33 -07:00
Mohamed Hegazy	4decd2e3b2	Enable agentic commit reviewer on Windows The agentic reviewer is silently no-op on Windows today. SessionStart bootstrap (ensure_agent_sdk.py) short-circuits with SKIP_WIN32 because the consumer glob in llm.py only matches POSIX venv layout (lib/pythonX.Y/site-packages). On Windows, venvs use Lib/site-packages (capital L, no pythonX.Y subdir), so even if a venv got built the glob wouldn't find its contents. Result: Windows users on default installs (no system-wide claude_agent_sdk) get layer 1 (pattern warnings) and layer 2 (single- shot LLM diff review) but not layer 3 — the cross-file agentic review that catches IDOR, auth-bypass, cross-file SSRF, and other things that need to read related files. Plugin description claims layer 3 but it silently doesn't run. Three changes: 1. llm.py — extend the consumer glob (2 sites: 3P SDK fallback at ~L297, agentic_review fallback at ~L1090) to also match the Windows Lib/site-packages layout, so a venv built on Windows is actually discoverable. 2. ensure_agent_sdk.py — remove the sys.platform == 'win32' early-exit so the SessionStart bootstrap builds the venv on Windows too. Outcome code 4 (formerly SKIP_WIN32) is retired but not reused so pre-fix telemetry rows still decode correctly. 3. ensure_agent_sdk.py — venv_py path now branches on sys.platform: Windows venvs put the interpreter at Scripts\python.exe; POSIX uses bin/python. Previously assumed POSIX, so even with the glob fix, the post-build SDK-importability probe would fail on Windows. Verified locally on macOS: - glob test: both layouts now match (POSIX venv detected, simulated Windows venv also detected via the new Lib/site-packages branch) - both files pass py_compile - POSIX path unchanged (sys.platform != 'win32' so old branch runs) Not verified on Windows in this commit — needs an actual Windows runner to confirm the venv build + SDK import + subprocess plumbing all work end-to-end. The SDK spawns a child claude.exe; Windows process plumbing has its own quirks (shell semantics, path escaping) that may surface separately. Worth a controlled rollout (one-week soak under env-var opt-in before flipping default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 14:50:51 -07:00
Mohamed Hegazy	390c2fe785	Convert POSIX script paths to Windows form before exec on Git Bash Fixes #2043. On Git Bash for Windows, Claude Code hands script paths to the shim in POSIX form (`/c/Users/...`). We exec a Windows `python.exe` (the `python3` Microsoft Store stub fails the probe), and Windows Python interprets the leading `/` as the root of the current drive — `/c/...` becomes `C:\c\Users\...` or `D:\c\Users\...` depending on which drive the shell happens to be on, fails with ENOENT, and every Edit/Write/MultiEdit blocks until the session restarts. Convert absolute path args via `cygpath -w` (a Git Bash builtin) before exec. Guarded by `command -v cygpath` so macOS/Linux fall straight through unchanged; `cygpath -w` is idempotent on already-Windows paths so the rare mixed-form case is safe. Only `/*` paths are converted — Windows-form paths reaching the shim are already openable by python.exe. Verified locally: - cygpath absent on macOS → guard skips → POSIX behavior unchanged - end-to-end shim invocation with a POSIX path on macOS exits 0 - stubbed cygpath -w on /c/Users/test/hook.py produces C:\Users\test\hook.py Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 14:04:09 -07:00
Mohamed Hegazy	0bde168648	Update security-guidance plugin	2026-05-26 14:06:52 -07:00
Den Delimarsky	529d105a78	Rename mcp-gateway -> mcp-proxy throughout Aligns the compose service name, local config filename, and all log/restart commands with the image and binary name. Adds an explicit -config arg since the image CMD still defaults to the legacy /etc/mcp-gateway path. 🏠 Remote-Dev: homespace	2026-05-21 06:06:15 +00:00
Den Delimarsky	12482fd9e2	Link to public MCP tunnels docs and use public mcp-proxy image - Replace doc references with platform.claude.com URLs (overview, quickstart, security, deploy-compose, deploy-helm, console, troubleshooting, reference, WIF) - Swap the POC mcp-proxy image for the public registry digest used in the published quickstart 🏠 Remote-Dev: homespace	2026-05-21 05:39:34 +00:00
Den Delimarsky	0a6ff87909	Add mcp-tunnels plugin Adds the /create-docker-mcp-tunnel command, which drives the MCP tunnels Docker Compose quickstart end to end: preflight checks, certificate generation, proxy config, cloudflared, an optional sample FastMCP server, and verification from Managed Agents and the Messages API. Migrated from anthropic-experimental/mcp-tunnel-skills. 🏠 Remote-Dev: homespace	2026-05-21 05:24:44 +00:00
Mohamed Hegazy	b58bdbf551	fix: quote \${CLAUDE_PLUGIN_ROOT} in hookify and security-guidance hook commands Paths containing spaces (common on Windows, e.g. C:\Users\Some User\...) cause shell word-splitting when CLAUDE_PLUGIN_ROOT is unquoted, resulting in hooks erroring with "No such file or directory" on every tool call. Wraps the path in double quotes for all five affected hook commands. Fixes the pattern reported in issue #57946. Closes the fix surfaced in PR #1921. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 17:53:52 -07:00
Morgan Lunt	573ecf32cd	Merge pull request #1820 from anthropics/morganl/code-modernization-plugin code-modernization: fix pipeline gaps, redesign harden, dry-run hardening	2026-05-12 09:58:41 -07:00
Morgan Lunt	5e4a45001d	code-modernization: harden writes a patch instead of editing legacy; make map/security guidance language-agnostic - modernize-harden: never edits legacy/ anymore. Writes findings plus a reviewed unified diff to analysis/<system>/security_remediation.patch. A second security-auditor pass reviews each hunk (RESOLVES / PARTIAL / INTRODUCES-RISK) before presenting. The user reviews and applies the patch deliberately, then re-runs to verify. This makes every command consistent with the recommended deny Edit(legacy/**) workspace setting, so the README's exception note is gone. - modernize-map: restructure the parse-target list around three stack- agnostic principles (dispatcher targets are variables; code-storage joins live in config; entry points live in deployment descriptors), with COBOL/Java/web/CLI examples on equal footing rather than COBOL-dominant. Same protections against false dead-code findings, less stack-specific. - security-auditor agent: rephrase coverage items in stack-neutral terms (record layouts/temp datasets, resource ACLs, deployment scripts/job definitions, batch input records) so the checklist reads naturally for COBOL, Java EE, .NET, and web targets alike. - README: drop the harden exception note; describe the patch workflow.	2026-05-11 16:46:03 -07:00
Morgan Lunt	22a1b25977	Harden code-modernization plugin from a real CardDemo dry run Fixes found by running the discovery workflow against the AWS CardDemo mainframe sample (~50 KLOC of COBOL/CICS/JCL/BMS/VSAM): - modernize-assess: add scc -> cloc -> find/wc fallback chain with the COCOMO-II formula so Step 1 works when scc isn't installed; same for portfolio-mode cloc/lizard. Drop the reference to a specific agent-spawning tool name (just "in parallel"). Sharpen the structural- map subagent prompt: 5-12 domains, subgraph clustering, ~40-edge cap, repo-relative paths, dangling-reference check. - modernize-map: expand the parse-target list with the things a literal-minded reader would miss on a real mainframe codebase — CICS CSD DEFINE TRANSACTION/FILE for entry points and online file I/O, EXEC CICS file ops, SELECT...ASSIGN TO joined with JCL DD, EXEC SQL table refs (not JCL DD), SEND/RECEIVE MAP, dynamic data-name XCTL resolution, COBOL fixed-format column slicing. Without these the dead-code list is wrong (most CICS programs look unreachable). Also write a machine-readable topology.json alongside the summary. - modernize-extract-rules: add a Priority (P0/P1/P2) field with a heuristic, and an optional Suspected-defect field. modernize-brief reads P0 rules to build the behavior contract, but the Rule Card had no priority slot — the chain was broken. - modernize-brief: read the new P0 tags; flag low-confidence P0 rules as SME blockers. - modernize-reimagine: drop "for the demo" wording. - security-auditor agent: add mainframe/COBOL coverage items (RACF, JCL/PROC creds, BMS field validation, DB2 dynamic SQL, copybook PII) and mark web-only items as such so it adapts to the target stack. - README: add Optional Tooling section and a symlink example for the expected layout.	2026-05-11 16:28:27 -07:00
Morgan Lunt	718818146e	Fix code-modernization plugin: align README with commands, fix pipeline gaps - modernize-brief: read TOPOLOGY.html (what modernize-map actually produces) instead of nonexistent TOPOLOGY.md, and tell the user which command produces each missing input. - README: rewrite the Commands section to match actual command behavior — correct output filenames, ordering (brief is the synthesis/approval gate after discovery, not the first step), agent attributions, and required args. Add a workspace-layout note and an explicit callout that modernize-harden edits legacy/, which conflicts with the recommended deny rule. Reconcile the Overview and Typical Workflow sequences. - modernize-assess: generalize the production-runtime overlay step so it no longer assumes a specific MCP server/tool; mark it optional. Fix app/jcl/ -> legacy/$1/jcl/ for layout consistency. - modernize-map: make TOPOLOGY.html self-contained (load Mermaid from a CDN) so it renders in any browser; drop assumptions about an external artifact renderer. Generalize the telemetry annotation note. - business-rules-extractor agent: fix command cross-reference to the actual command name. - plugin.json: include the brief step in the workflow description.	2026-05-11 16:17:59 -07:00
Daisy S. Hollman	06f52cd3ac	cwc-makers: curl+tar fallback when git is missing (#1731 ) /maker-setup now falls back to GitHub's tarball endpoint when git isn't on PATH, instead of detouring through a package-manager git install. curl and tar ship with macOS, Linux, and Windows 10 1803+ out of the box, so this is zero-install on every target platform — and a CwC attendee just needs the files once to flash a device, not git history. - maker-setup.md: git-clone fast path, curl\|tar (Unix) / curl+tar+ Rename-Item (PowerShell) fallback, normalizes the -main suffix - m5-onboard/SKILL.md: drop git from required deps + per-OS git bootstrap block; keep Python bootstrap - README: git now listed as optional Linear: CC-1975 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 🏠 Remote-Dev: homespace	2026-05-05 18:04:20 -07:00
Daisy S. Hollman	574a879067	Add cwc-makers plugin: /maker-setup Cardputer onboarding (#1730 ) * Add cwc-makers plugin: /keep-thinking Cardputer onboarding Packages the Code-with-Claude Makers (claude.com/cwc-makers) Cardputer experience as a one-command flow for event attendees: - commands/keep-thinking.md: user entry point — clones moremas/build-with-claude and runs the m5-onboard provisioning flow - skills/m5-onboard/SKILL.md: vendored from upstream onboard/SKILL.md; Installation section replaced with clone-location note; explicit 'relay physical button steps to user' directive added - skills/cardputer-buddy/SKILL.md: post-onboarding app iteration All three are user-invocable; /keep-thinking is the intended entry point. Skill content is Apache-2.0 from the upstream repo. Linear: CC-1975 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 🏠 Remote-Dev: homespace * Rename /keep-thinking → /start-making Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 🏠 Remote-Dev: homespace * Rename /start-making → /maker-setup Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> 🏠 Remote-Dev: homespace	2026-05-05 17:16:30 -07:00
Pete Blois	ce721c1f1d	Replace transcript-style agent examples with prose trigger descriptions Several agent files used <example> blocks containing user: "..." / assistant: "..." turn markers, embedded as \n-escaped strings inside the YAML frontmatter description: field. Replace those with flat prose trigger descriptions in description: and a 'When to invoke' section in the agent body containing prose-bullet scenarios. Affected files: - 5 agent definitions: - plugins/hookify/agents/conversation-analyzer.md - plugins/pr-review-toolkit/agents/code-reviewer.md - plugins/pr-review-toolkit/agents/pr-test-analyzer.md - plugins/pr-review-toolkit/agents/type-design-analyzer.md - plugins/pr-review-toolkit/agents/comment-analyzer.md - 5 agent-development skill files updated to teach the new format: - plugins/plugin-dev/skills/agent-development/SKILL.md - .../references/triggering-examples.md - .../references/agent-creation-system-prompt.md - .../examples/complete-agent-examples.md - .../examples/agent-creation-prompt.md Routing is unaffected: each description: still names the trigger surface in plain English.	2026-04-28 13:56:22 -07:00
Morgan Lunt	38b5056c44	Merge pull request #1582 from anthropics/morganl/code-modernization-plugin Add code-modernization plugin	2026-04-28 10:02:28 -07:00
Tobin South	f4b5494fb4	mcp-server-dev: hosting, payload-cap, lifecycle, and directory guidance (#1566 )	2026-04-28 04:46:23 +01:00

1 2 3

102 Commits