Mohamed Hegazy 43fcf6d513
security-guidance: encode exception type + errno + ensurepip instrumentation for venv BUILD_FAILED (2.0.3 → 2.0.4)
Follow-up to #2154. v2.0.3 telemetry showed the venv BUILD_FAILED bucket
splits into two unexplained groups; this PR instruments both.

## 1. The exc: bucket — exception type + errno

The dominant remaining venv BUILD_FAILED (phase=venv, err=99) is ~99%
sdk_bootstrap_stderr_sig=NULL — Python exceptions caught by the generic
`except Exception` ("exc:<TypeName>"), not CalledProcessErrors with
categorizable stderr. ~56k/30h, all opaque (stderr_sig only covers
"other:<tail>").

  - Handler embeds errno for OSError-family: "exc:OSError:28", etc.
  - SDK_BOOTSTRAP_EXC_CODES maps the type → sdk_bootstrap_exc
    (FileNotFoundError=1 … OSError=6 … 99=other).
  - errno decoded → sdk_bootstrap_errno (ENOENT/EACCES/ENOSPC/…).

## 2. venv_ensurepip_fail instrumentation (the other category)

venv_ensurepip_fail (code 11) is the top categorizable venv failure, and
telemetry flipped the naive assumption: it's NOT just Debian/Ubuntu —
macOS has the MOST distinct affected users (466 vs 121 linux), and linux
is a retry storm (~172 fires/user). Before committing to a `pip install
--target` fallback (Option A) we need to know (a) which interpreter these
users run and (b) whether that interpreter even has pip (→ whether
--target would work, vs needing a system package).

  - sdk_hook_py (always emitted): interpreter version as major*100+minor
    (309/312). Disambiguates Apple-3.9 vs a 3.10+-with-broken-ensurepip,
    and also recovers the version for HOOK_PY_INCOMPATIBLE (whose "py_3.9"
    err_kind otherwise collapses to err=99).
  - sdk_has_pip (only on err==11, to avoid an extra subprocess per healthy
    session): whether `<interpreter> -m pip --version` works. has_pip=true
    → the --target fallback would fix them; has_pip=false → they need a
    system package (python3-venv / a complete Python).

Both #1 and #2 are purely additive telemetry on the existing BUILD_FAILED
path — no behavior change to the bootstrap. They de-risk the Option A
decision: ship A only if the affected cohort has pip.

Verified locally on macOS Python 3.13:
  - py_compile clean.
  - 39 tests in test_exc_failure_encoding.py (34 exc/errno + 5 ensurepip
    instrumentation): type-code map, errno extraction + round-trip,
    APPEND-ONLY stability, handler-embeds-errno, _probe_has_pip returns
    bool + true-on-this-machine, sdk_hook_py always-emitted as
    major*100+minor, sdk_has_pip gated on err==11.
  - Full suite: 503/503 pass + 2 skipped.

Version 2.0.3 -> 2.0.4 per the per-PR-bump policy (#2114).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-04 14:55:55 -07:00
..
2026-05-26 14:06:52 -07:00