Fixesanthropics/claude-plugins-official#2056 — on Windows, when the
worktree contains an untracked file whose name has a character undefined
in cp1252 (accented capitals like Á Í Ï Ð Ý, most CJK, emoji), the
UserPromptSubmit hook crashes:
Exception in thread Thread-5 (_readerthread):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81
Traceback (most recent call last):
File diffstate.py, line 338, in _list_untracked
for p in r.stdout.split('\\0'):
AttributeError: 'NoneType' object has no attribute 'split'
Non-blocking (UPS failures still let the prompt through) but the
baseline-untracked snapshot is silently lost, so the Stop-hook review
mis-handles pre-existing untracked files.
Root cause (reporter's diagnosis, verified):
1. core.quotePath=false makes git emit raw UTF-8 for non-ASCII filenames.
2. subprocess.run(..., text=True) decodes via
locale.getpreferredencoding(False) in strict mode — on Windows that
is cp1252, in which 0x81 / 0x8D / 0x8F / 0x90 / 0x9D are undefined.
Those bytes appear in the UTF-8 encodings of Á (C3 81), Í (C3 8D),
Ï (C3 8F), Ð (C3 90), Ý (C3 9D), and a large fraction of CJK / emoji
codepoints.
3. The decode runs in the subprocess reader thread. The thread raises
UnicodeDecodeError, threading prints 'Exception in thread Thread-N',
subprocess.run returns with stdout=None. The handler then does
None.split('\\0') -> AttributeError, which is NOT in the narrow
except (TimeoutExpired, FileNotFoundError, OSError) tuple, so it
escapes the helper, propagates out of UserPromptSubmit's
ThreadPoolExecutor.result(), and exits the hook non-zero.
This is internally inconsistent: gitutil._git_diff_range,
security_reminder_hook._reflog_amend_lookup (line ~540), and the commit
diff loop (line ~1115) already do bytes + decode utf-8/replace, with
comments explicitly noting that text=True would crash. The fix below
extends that established pattern to the helpers that were holdouts.
Affected helpers (6 total):
- diffstate._list_untracked <- reporter, hot path, CRITICAL
- diffstate.capture_git_baseline <- reporter, latent
- diffstate.get_baseline_file_content <- audit, file content read, HIGH
- gitutil._git_name_only <- reporter, latent
- gitutil._git_status_porcelain <- reporter, latent
- gitutil._git_reflog_recent_commits <- audit, embeds %gs commit msg, HIGH
For each one:
- Drop text=True from subprocess.run.
- Decode r.stdout / r.stderr as .decode('utf-8', errors='replace').
- Add ValueError to the except tuple as defense against any future
strict-decode regression (UnicodeDecodeError is a ValueError
subclass; including it explicitly degrades the helper to its
empty/None return instead of escaping out of the hook).
Verified locally on macOS Python 3.13:
- py_compile clean on both files.
- 45 existing smoke + extensibility tests still pass.
- 21 new internal tests (not in this PR — added to the team's local
test suite at staging/tests/test_unicode_decode.py):
* 18 static-shape parametrized: each of the 6 fixed helpers has
no text=True in its subprocess calls, contains errors='replace',
and lists ValueError in its except.
* Deterministic end-to-end: create real git repo + Ávila_report.txt
untracked, call _list_untracked, verify it returns
{'Ávila_report.txt': <mtime>} without crashing.
* Deterministic end-to-end: same for capture_git_baseline (verifies
the latent stderr-warning case stays valid).
* Deterministic end-to-end: get_baseline_file_content on a file
whose content has 山田太郎 + 🎉; verify the bytes round-trip
through the decode.
- 66/66 tests pass total (45 existing + 21 new).
NOT verified end-to-end on Windows — would need actual cp1252 strict
decode to fire. Reporter has the deterministic repro and will
re-verify on their Win11 / Python 3.14.x setup before merge.
Not in this PR (defense-in-depth, lower risk):
- 3 git rev-parse calls returning path output (gitutil._find_git_index,
_git_toplevel, _git_dir) could fail on Windows if cwd is in a
non-ASCII install directory. Same fix shape but unreported and
much lower probability — worth a separate follow-up if anyone
actually hits it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>