495 Commits

Author SHA1 Message Date
github-actions[bot]
89a3934ffa
bump(desktop-commander): 9c44119a → ce4669cc (#2173)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:34:38 -05:00
github-actions[bot]
4053de6b37
bump(duende-skills): 2c803785 → 72e39de9 (#2175)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:33:23 -05:00
github-actions[bot]
907ad63743
bump(mcp-apps): 9a37ad71 → 7d4434e9 (#2177)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:32:08 -05:00
github-actions[bot]
75c86cb7ad
bump(nvidia-skills): 62b685a2 → 91dbdf23 (#2179)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:30:54 -05:00
github-actions[bot]
b9447ff795
bump(pydantic-ai): eb17c0da → 56dadc9c (#2181)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:29:39 -05:00
github-actions[bot]
9a8303c6d9
bump(sentry-cli): db907679 → 5abcacdf (#2182)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:28:22 -05:00
github-actions[bot]
9557d751dc
bump(teamcity-cli): 9436b94b → 533c8cb2 (#2185)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:27:08 -05:00
github-actions[bot]
eac1df92db
bump(togetherai-skills): a1277729 → f957e292 (#2186)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:25:52 -05:00
github-actions[bot]
4263502749
bump(vibe-prospecting): c00b11db → 7ed0c4e2 (#2187)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:24:36 -05:00
github-actions[bot]
c0c282b87b
bump(azure): 7cb89c22 → d3440b8a (#2125)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:23:22 -05:00
github-actions[bot]
abeea5843f
bump(clickhouse): 36889764 → bfd22b9c (#2130)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:21:33 -05:00
github-actions[bot]
a5d43627ef
bump(codspeed): 407dd3c9 → f79d57d2 (#2131)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:18:56 -05:00
github-actions[bot]
6a8591db5f
bump(convex): ece93250 → 002f9c83 (#2132)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:16:18 -05:00
github-actions[bot]
31fd7f0923
bump(dash0): d1ad56f8 → 0ec3db6b (#2134)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:13:58 -05:00
github-actions[bot]
9cb21aab75
bump(firecrawl): e71cec48 → 81178096 (#2137)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:11:53 -05:00
github-actions[bot]
7328989616
bump(hunter): 9b614652 → 4eb5fbbc (#2139)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:09:48 -05:00
github-actions[bot]
8a7f6912b2
bump(hyperframes): bc3701f5 → 3c7e2f36 (#2140)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:07:26 -05:00
github-actions[bot]
a6ce4ca3d5
bump(outputai): 0eeffece → 5a87ebc1 (#2142)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:05:21 -05:00
github-actions[bot]
785a75e88b
bump(pigment): 4bf16c80 → abf36e64 (#2143)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:03:34 -05:00
github-actions[bot]
697a046997
bump(sentry): d6123be3 → 849303a8 (#2148)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 09:00:06 -05:00
github-actions[bot]
8d45b83d6c
bump(alloydb): 4a756532 → bbf4eb36 (#2162)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:58:31 -05:00
github-actions[bot]
87e08885b5
bump(atlan): b0efcc8e → cda594f4 (#2164)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:57:16 -05:00
github-actions[bot]
2de71f5542
bump(dominodatalab): 47c6e0a7 → 56c3fc39 (#2174)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:56:01 -05:00
github-actions[bot]
798cb06aa3
bump(logfire): eb17c0da → 56dadc9c (#2176)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:54:47 -05:00
github-actions[bot]
2a22053549
bump(postiz): 238aede6 → 41c5a9db (#2180)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:53:31 -05:00
github-actions[bot]
754f7f2f54
bump(stripe): 99425a01 → 38cc559c (#2183)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:52:16 -05:00
github-actions[bot]
05107962e7
bump(superpowers): f2cbfbef → 6fd45076 (#2184)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:51:01 -05:00
github-actions[bot]
56a8f8df52
bump(wix): c5b343f2 → 2da8231f (#2188)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-02 08:49:04 -05:00
Mohamed Hegazy
bf7e852731
Merge pull request #2154 from anthropics/venv-failure-deepdive
security-guidance: 5 venv-specific err_kind categories + stderr_signature bucket (2.0.1 → 2.0.2)
2026-06-01 17:29:36 -07:00
Bryan Thompson
3866e34b15
Add sagemaker-ai plugin (re-list after YAML fix) (#2158)
sagemaker-ai was dropped from the marketplace in #1762 (validate-plugins
adoption) due to a manifest/YAML error. AWS has since fixed it; the plugin
validates clean at awslabs/agent-plugins@187edde (claude plugin validate passes).

Re-listed as a git-subdir source SHA-pinned to current monorepo HEAD,
matching its sibling AWS entries (deploy-on-aws, databases-on-aws).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 17:25:46 -07:00
Bryan Thompson
3d490adc34
bump(databases-on-aws, deploy-on-aws): f16aaf2a → 187edde (#2157)
Both plugins in awslabs/agent-plugins had their subpaths edited in commit
187edde (after the morning bump cron pinned them to f16aaf2a), so they fell
behind again on merge. Manual catch-up bump to current monorepo HEAD.

- databases-on-aws: 4 files changed under plugins/databases-on-aws/ (v1.1.0)
- deploy-on-aws:    7 files changed under plugins/deploy-on-aws/ (v1.2.0)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 17:25:16 -07:00
github-actions[bot]
08d1b59559
bump(amazon-location-service): 9d46cc0a → f16aaf2a (#2120)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 19:08:14 -05:00
github-actions[bot]
49880c89fe
bump(databases-on-aws): 9d46cc0a → f16aaf2a (#2135)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 18:42:30 -05:00
github-actions[bot]
7951b76e19
bump(deploy-on-aws): 9d46cc0a → f16aaf2a (#2136)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 18:41:19 -05:00
Bryan Thompson
9cc1748a65
Add aws-startup-advisor plugin (#2156) 2026-06-02 00:38:25 +01:00
Mohamed Hegazy
009392eee4
security-guidance: 5 venv-specific err_kind categories + stderr_signature bucket (2.0.1 → 2.0.2)
PR #2112's telemetry visibility surfaced an immediate finding from
the first 3h of v2.0.1 data: **2,406 phase=2 / err=99 sessions** —
"venv stage / uncategorized" — dominating BUILD_FAILED. The original
err_kind detection patterns were all pip-flavored (pip_no_match,
dns_fail, ssl_verify, etc.) and didn't catch venv-creation failure
modes, so they all collapsed to the catch-all _uncategorized (99)
bucket.

This PR fills the gap on two axes.

## 1. Five new venv-specific err_kind categories (codes 11-15)

Each gated on `err_phase == "venv"` so the same substring doesn't
mis-fire in pip-phase failures:

  - 11 `venv_ensurepip_fail` — Debian/Ubuntu without python3-venv
    installed; stderr matches "ensurepip is not available" or
    "ensurepip ... returned non-zero". Predicted to be the biggest
    chunk based on Linux distro market share.
  - 12 `venv_path_too_long` — Windows MAX_PATH (260) or POSIX
    ENAMETOOLONG. Triggered when state_dir + venv layout exceeds
    the path limit (deep Lib/site-packages/<pkg>/<...> paths).
  - 13 `venv_no_module` — `python3 -m venv` itself missing
    ("No module named 'venv'"). Rare but distinctive.
  - 14 `venv_already_exists` — Errno 17 / "file exists" — sentinel
    race past O_EXCL or stale dir survived `--clear`.
  - 15 `venv_setup_failed` — generic "virtual environment was not
    created successfully" catch-all for venv setup failures that
    don't match a more specific category.

All 5 occupy reserved slots in SDK_BOOTSTRAP_ERR_CODES per the
APPEND-ONLY contract from PR #2112.

## 2. `sdk_bootstrap_stderr_sig` integer hash

For "other:<tail>" err_kinds (which encode to _uncategorized = 99),
emit a bounded integer hash (0-999) of the first ~30 chars of the
stderr tail. This restores cardinality to the _uncategorized bucket
in BQ aggregation without unbounded keyspace — same stderr message
always maps to the same bucket, so a real failure mode replicating
across thousands of machines clusters cleanly. Bounded at 1000
buckets: well below any "high cardinality" alarm but wide enough to
distinguish ~30 distinct dominant patterns (birthday-paradox
collision probability ~50% at ~37 distinct inputs).

The field auto-omits (`if sig:` gate) when err_kind is categorized
— no key-budget cost on the common-case categorized failures.

## Version bump 2.0.1 → 2.0.2

PR #2114 confirmed the version-bump mechanism is the only way to
propagate code changes to the existing fleet — without a bump, CC's
plugin updater short-circuits on string-equality of installation
version vs marketplace version. Following the policy we established:
**bump patch on every functional PR**.

By 17:31:42Z on 2026-06-01 (1m22s after #2114 merged), v2.0.1 was
already appearing in BQ. v2.0.2 should follow the same propagation
curve — ~30% adoption within 3 hours, full convergence within a few
days.

## Verified locally

  - py_compile clean.
  - 15 new tests in test_venv_failure_deepdive.py (added to internal
    test suite at sg-staging/tests/, not in this PR):

      * 5 parametrized: each new err_kind maps to its expected code (11-15).
      * 1 APPEND-ONLY regression: existing codes 1-10 + 99 unchanged.
      * 6 stderr_sig: non-other inputs → 0; None/empty → 0; deterministic
        same-input → same-output; bounded to 0-999; distinct inputs →
        distinct hashes (5/5 with P(collision) ≈ 1%); leading-chars focus
        (path-varying stderr with shared 30-char prefix collide as designed).
      * 1 static-shape catcher: every new `err_kind = "venv_..."` branch
        in main() is guarded by `err_phase == "venv"`. Catches the
        regression where someone adds a venv pattern without the phase
        gate and starts mis-categorizing pip-phase failures.
      * 1 map-coverage: all err_kind strings assigned anywhere in
        ensure_agent_sdk.main() are present in SDK_BOOTSTRAP_ERR_CODES
        (catches new categories added in code but forgotten in the map).
      * 1 emit-shape: the metric block uses `_encode_stderr_sig`, the
        `sdk_bootstrap_stderr_sig` key is written conditionally on `if
        sig:`. Catches the regression where someone removes the
        helper or makes the emit unconditional (would pad every
        categorized BUILD_FAILED row with a zero-valued field).

  - Full suite: 452/452 pass + 2 skipped (live API tests, opt-in).

## What this unblocks in BQ

```sql
-- For the 2,406 sessions/3h that were phase=2/err=99 on v2.0.1,
-- v2.0.2+ will split them across the new categories. Query:
SELECT
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_err") AS INT64) AS err,
  CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_stderr_sig") AS INT64) AS sig,
  COUNT(*) AS sessions
FROM `proj-product-data-nhme.raw_events.claude_code_internal_event`
WHERE _PARTITIONTIME >= ...
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap") AS INT64) = 3
  AND CAST(JSON_VALUE(additional_metadata, "$.sdk_bootstrap_phase") AS INT64) = 2  -- venv
GROUP BY err, sig
ORDER BY sessions DESC
```

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 16:05:49 -07:00
Mohamed Hegazy
9f6eae5114
Merge pull request #2155 from anthropics/fix-nvidia-skills-sha
fix(nvidia-skills): add missing source.sha (validator invariant I5; unblocks all PRs touching marketplace.json)
2026-06-01 15:57:55 -07:00
github-actions[bot]
1fe78a3f60
bump(carta-crm): e66d331c → f512df80 (#2127)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 17:49:39 -05:00
Mohamed Hegazy
267c4e6f06
fix(nvidia-skills): add missing source.sha (validator invariant I5)
The nvidia-skills entry was added in PR #2088 with:

  "source": {
    "source": "git-subdir",
    "url": "https://github.com/NVIDIA/skills.git",
    "path": "plugins/nvidia-skills",
    "ref": "main"
  }

It's missing the required `sha` field. The marketplace validator
enforces invariant I5 ("source.sha is missing or not a 40-char hex
SHA") on every git-subdir source — without it, the action fails:

  ##[error]invariant I5: nvidia-skills: source.sha is missing or not
  a 40-char hex SHA

This has been silently failing the "Validate Plugins" CI on every
PR that touches marketplace.json since #2088 merged on 2026-05-03.
Confirmed by checking the last 5 completed validate runs on main —
all 5 , including PR #2114 (security-guidance bump that you merged
earlier today). The validator failure was getting swallowed because
all the other PR-level checks (Check MCP URLs, Scan Plugins, Validate
Plugin Licenses) were passing, and humans were `gh pr merge --admin`-ing
through it.

Fix: add the sha field pinned to the current upstream HEAD of
github.com/NVIDIA/skills.git on the `main` branch.

  Resolved via: git ls-remote https://github.com/NVIDIA/skills.git refs/heads/main
  SHA:          62b685a20ac45285cafd1e22782abbed33172c17

This mirrors the shape of other git-subdir entries with both `ref`
and `sha` (e.g. 42crunch-api-security-testing pins ref="v1.5.5",
sha="b404d99a...", adobe-for-creativity pins ref="main", sha="8d74ee6b...").

Unblocks every in-flight PR that touches marketplace.json — including
PR #2154 (security-guidance venv-deepdive) which is currently
red-blocked on this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 15:24:29 -07:00
github-actions[bot]
12b3721b22
bump(carta-cap-table): e66d331c → f512df80 (#2126)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 15:28:15 -05:00
github-actions[bot]
e11db042eb
bump(aws-serverless): 9d46cc0a → f16aaf2a (#2124)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 15:28:09 -05:00
github-actions[bot]
b92bc59595
bump(aws-amplify): 9d46cc0a → f16aaf2a (#2123)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 15:25:31 -05:00
Mohamed Hegazy
fcdcd079e3
Merge pull request #2112 from anthropics/telemetry-failure-signals
security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding (telemetry visibility)
2026-06-01 10:31:29 -07:00
Mohamed Hegazy
5adb5a2d26
Merge pull request #2114 from anthropics/bump-2-0-1-propagate-fixes
security-guidance: bump 2.0.0 → 2.0.1 to propagate 8 weeks-of-fixes to existing users
2026-06-01 10:30:17 -07:00
github-actions[bot]
a63dc11763
bump(atomic-agents): bb9708ec → 57d6099f (#2121)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 12:12:10 -05:00
github-actions[bot]
025f4d4477
bump(adobe-for-creativity): 0a015c06 → 8d74ee6b (#2119)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-06-01 11:12:45 -05:00
Bryan Thompson
e586a0fc00
Add nvidia-skills plugin (#2088) 2026-06-01 09:09:21 -07:00
Bryan Thompson
0d82eac145
bump: switch to per-entry PR mode (one PR per stale plugin) (#2051)
* bump: switch to per-entry PR mode (one PR per stale plugin)

Replaces the single batched bump PR with one PR per stale plugin so a
single failing plugin no longer blocks the rest. Pins to a feature
branch of the bump-plugin-shas action that adds 'pr-mode: per-entry';
re-pin to the merge commit on the action's main when that lands.

- pr-mode: per-entry → one PR per plugin on bump/<slug>
- max_bumps default lowered 130 → 30 (per-entry scans cost more)
- scan dispatch fanned out over pr-urls JSON (one per per-entry branch)
- header comments updated for per-entry semantics

* bump: re-pin to merged composite action SHA on -community main

The pr-mode: per-entry input now lives on main of the bump-plugin-shas
action (merged at e2019b2a). Update the pin and drop the now-stale
header comment that tracked the feature branch.

* bump: dispatch all three required checks per per-entry PR

Bump PRs are opened with GITHUB_TOKEN, which doesn't fire on:pull_request
(recursion guard). The per-entry cutover already dispatched scan-plugins.yml
per branch to satisfy the `scan` required check, but `check` (Check MCP URLs)
and `validate` (Validate Plugins) are also required on main and likewise
never fired — leaving every bump PR BLOCKED on missing checks (observed on
the batched #2079, which only cleared after a human-authored push re-fired
the pull_request workflows).

Fix: dispatch all three workflows per per-entry bump branch. Each runs its
job unconditionally on workflow_dispatch, so the check run lands on the
branch HEAD (= PR head) and satisfies the required check.

- validate-plugins.yml: add workflow_dispatch trigger (check-mcp-urls.yml
  already had one). gh workflow run requires the trigger on the default
  branch; this lands together with the per-entry bump so main stays
  consistent.
- bump-plugin-shas.yml: loop the dispatch over
  {scan-plugins,check-mcp-urls,validate-plugins}; tolerate a single
  transient dispatch failure (warn, don't abort) so one hiccup can't
  strand the rest of the batch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* bump: fail the per-entry check-dispatch step when a dispatch fails

The dispatch step logged each failed gh workflow run as a warning and exited 0, so a transient API error or rate limit could leave a per-entry bump PR missing a required check while the bump run still showed green. The composite action skips slugs with an open PR, so the stranded PR was never retried.

Attempt every dispatch (one failure must not strand the other branches), record failures via a temp file (the while loop runs in a pipe subshell), then emit an error and exit non-zero if any dispatch failed, so the bump run goes red and the affected PR can be re-dispatched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 06:38:22 +01:00
Mohamed Hegazy
17b532f92e
security-guidance: bump 2.0.0 → 2.0.1 to propagate 8 weeks-of-fixes to the existing fleet
The 8 PRs we shipped since 2026-05-26 (#2076, #2077, #2078, #2086,
#2091, #2100, #2101, #2105) all changed plugin code without bumping
the version. CC's plugin updater uses string equality for the
freshness check (pluginOperations.ts:1835):

    const isUpToDate =
      installation.version === newVersion ||
      installation.installPath === versionedPath ||
      installation.installPath === zipPath
    if (isUpToDate) return { alreadyUpToDate: true }

Users who installed v2.0.0 anywhere between 2026-05-26 and 2026-05-31
have `installation.version === "2.0.0"` in their installed_plugins.json.
The marketplace also advertises "2.0.0" (until this commit), so
isUpToDate returns true and the plugin cache directory is never
refreshed — they keep running whatever 2.0.0 code was current on the
day they installed. The marketplace git pull happens; the per-user
cache install does NOT.

Empirical evidence: in BQ today (5/31) on Windows v2.0.0 fires,
**73% emit sdk_bootstrap outcome 4 (SKIP_WIN32)** — a code path
retired in PR #2055's Windows-enable fix. Those users are running a
plugin tree that pre-dates the fix, even though their telemetry
shows pv=20000.

The fix is a one-line version bump. Once the marketplace advertises
2.0.1, every CC autoupdate cycle sees installation.version (2.0.0)
!= newVersion (2.0.1), installs the new version, and the user's next
session loads the fixed code.

This PR:

1. plugins/security-guidance/.claude-plugin/plugin.json: 2.0.0 → 2.0.1
2. .claude-plugin/marketplace.json security-guidance entry: 2.0.0 → 2.0.1

What 2.0.1 carries (versus 2.0.0 as published 5/26):

  - #2076 — Graphite gt commit/push detection
  - #2077 — hookSpecificOutput.additionalContext on async-rewake exit-2
  - #2078 — CLAUDE_CONFIG_DIR support
  - #2086 — core.quotePath=false on diff feeders (Arabic/Hebrew/CJK paths)
  - #2091 — fix Bash(...|...) if-clause regression from #2076
  - #2100 — drop text=True from subprocess.run, bake PYTHONUTF8=1 (Windows non-cp1252 path crash)
  - #2101 — core.quotePath=false on GIT_CMD globally
  - #2105 — output_format → output_config.format API migration (#2098)

Verified locally:

  - plugin.json + marketplace.json both valid JSON.
  - _read_plugin_version_int() returns 20001 (was 20000).
  - Existing test suite passes — 408 tests, no regressions caused by
    the version bump itself. (29 unrelated failures are from
    test_telemetry_failure_signals.py which expects PR #2112's
    not-yet-merged code.)

Going forward: bumping `patch` on every functional PR closes this
gap entirely. Without that policy, every fix only reaches NEW
installs, never the existing fleet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 12:10:16 -07:00
Mohamed Hegazy
475038edfc
security-guidance: emit HTTP error codes + fix sdk_bootstrap phase/err encoding
Fills two failure-visibility gaps in plugin telemetry.

## Gap 1: HTTP errors from _call_claude invisible

Before: a 4xx/5xx response from the LLM API caused `_call_claude` to
return None and produce ZERO fingerprint in tengu_hook_plugin_metrics.
A failed call looked identical to "no review needed". The recent
deprecation-400 outage (PR #2105, output_format → output_config.format,
#2098) was invisible in aggregate dashboards until a user manually
reported errors from their debug log. Cohort-specific or partial
outages would never show up in BQ.

Fix: add `http_err_last` (most recent status) and `http_err_count`
to the existing `_USAGE` accumulator in `_base.py`. `_usage_metrics()`
snapshots them whenever count > 0 (skip-path no-pollute contract
preserved when count == 0). All `_call_claude` error sites now call
the new `_record_http_error()` helper alongside the existing
`_last_call_claude_http_error` module-state assignment.

Now any future API failure category is queryable in BQ in real time:

  SELECT
    DATE(server_timestamp, "America/Los_Angeles") AS d,
    CAST(JSON_VALUE(additional_metadata, "$.http_err_last") AS INT64) AS code,
    COUNT(*) AS n
  FROM ... WHERE event_name = "tengu_hook_plugin_metrics"
    AND JSON_VALUE(additional_metadata, "$.pluginId") LIKE "%security-guidance%"
    AND JSON_VALUE(additional_metadata, "$.http_err_count") IS NOT NULL
  GROUP BY d, code ORDER BY d, n DESC

## Gap 2: sdk_bootstrap_phase / sdk_bootstrap_err always NULL in BQ

Before: ensure_agent_sdk.py emitted these as strings (e.g. "pip",
"dns_fail"). CC's plugin-metrics pipeline silently drops
plugin-emitted string values — only bool|finite-number plugin metrics
reach BigQuery. (CC-core fields like `subscription_type` are exempt
because they're injected downstream of plugin validation.)

Confirmed empirically: ~185K BUILD_FAILED rows in BQ over the past 2
days had `sdk_bootstrap_phase` = NULL and `sdk_bootstrap_err` = NULL
despite the Python code emitting them. ~28K BUILD_FAILED sessions/day
had no diagnostic split — flying blind on whether the failures are
pip-no-match vs dns-fail vs ssl-verify vs proxy-auth etc.

Fix: encode phase + err_kind as stable integers via
SDK_BOOTSTRAP_PHASE_CODES and SDK_BOOTSTRAP_ERR_CODES. Phase: 1=pre,
2=venv, 3=pip, 4=main. Err: 10 known categories (1-10), 11-98
reserved, 99 = uncategorized catch-all (covers "exc:<X>", "other:<X>",
and unmapped strings). APPEND-ONLY for telemetry stability.

Also corrects the misleading "CC accepts string metric values"
comment in ensure_agent_sdk.py that led to the bug originally.

Verified locally on macOS Python 3.13:

  - py_compile clean.
  - 32 new tests in test_telemetry_failure_signals.py (added to
    internal test suite at sg-staging/tests/, not in this PR):

      * 4 HTTP-error tracking unit tests: _record_http_error increments
        count + tracks most-recent; handles None/invalid; -1 for
        network/timeout.
      * 4 _usage_metrics emission tests: empty when no activity;
        successful call has no http_err fields; failure-only has http_err
        and no api_calls; mixed has both.
      * 1 contract test: every emitted value is bool|finite-number
        (catches future regression of the string-dropping bug class).
      * 13 sdk_bootstrap encoding tests (parametrized over the 10 known
        err_kind categories + 5 catch-all shapes): each maps to the
        right integer; unknown phase = 0; unknown err = 99.
      * 1 static-shape regression catcher: every `err_kind = "..."`
        string in ensure_agent_sdk.main() must be in
        SDK_BOOTSTRAP_ERR_CODES (otherwise new err_kinds silently
        collapse to 99).
      * 2 emit-shape regression catchers: the assignments in main() go
        through _encode_phase / _encode_err_kind helpers (no raw
        strings); no literal string values for sdk_bootstrap_phase/err.
      * 1 comment-accuracy: the misleading "CC accepts string metric
        values" comment is gone.

  - Full suite: 437/437 pass + 2 skipped (live API tests, opt-in).

NOT verified end-to-end against BQ — would require shipping +
observing in production for 24h to confirm the http_err and
sdk_bootstrap_phase/err fields actually appear in
tengu_hook_plugin_metrics rows. The unit tests pin the contract; if
the wire shape is broken, BQ will show NULL for the new fields and we
revisit (with the same diagnostic the BUILD_FAILED bug gave us).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-31 08:34:35 -07:00