Morgan Lunt 22a1b25977
Harden code-modernization plugin from a real CardDemo dry run
Fixes found by running the discovery workflow against the AWS CardDemo
mainframe sample (~50 KLOC of COBOL/CICS/JCL/BMS/VSAM):

- modernize-assess: add scc -> cloc -> find/wc fallback chain with the
  COCOMO-II formula so Step 1 works when scc isn't installed; same for
  portfolio-mode cloc/lizard. Drop the reference to a specific
  agent-spawning tool name (just "in parallel"). Sharpen the structural-
  map subagent prompt: 5-12 domains, subgraph clustering, ~40-edge cap,
  repo-relative paths, dangling-reference check.
- modernize-map: expand the parse-target list with the things a
  literal-minded reader would miss on a real mainframe codebase — CICS
  CSD DEFINE TRANSACTION/FILE for entry points and online file I/O,
  EXEC CICS file ops, SELECT...ASSIGN TO joined with JCL DD,
  EXEC SQL table refs (not JCL DD), SEND/RECEIVE MAP, dynamic
  data-name XCTL resolution, COBOL fixed-format column slicing. Without
  these the dead-code list is wrong (most CICS programs look unreachable).
  Also write a machine-readable topology.json alongside the summary.
- modernize-extract-rules: add a Priority (P0/P1/P2) field with a
  heuristic, and an optional Suspected-defect field. modernize-brief
  reads P0 rules to build the behavior contract, but the Rule Card had
  no priority slot — the chain was broken.
- modernize-brief: read the new P0 tags; flag low-confidence P0 rules as
  SME blockers.
- modernize-reimagine: drop "for the demo" wording.
- security-auditor agent: add mainframe/COBOL coverage items (RACF,
  JCL/PROC creds, BMS field validation, DB2 dynamic SQL, copybook PII)
  and mark web-only items as such so it adapts to the target stack.
- README: add Optional Tooling section and a symlink example for the
  expected layout.
2026-05-11 16:28:27 -07:00

100 lines
4.4 KiB
Markdown

---
description: Dependency & topology mapping — call graphs, data lineage, batch flows, rendered as navigable diagrams
argument-hint: <system-dir>
---
Build a **dependency and topology map** of `legacy/$1` and render it visually.
The assessment gave us domains. Now go one level deeper: how do the *pieces*
connect? This is the map an engineer needs before touching anything.
## What to produce
Write a one-off analysis script (Python or shell — your choice) that parses
the source under `legacy/$1` and extracts the four datasets below. Cover
the parse targets that are real for the stack you're looking at — these are
the ones LLMs reliably miss:
- **Program/module call graph** — who calls whom.
- COBOL/CICS: `CALL '...'` and `EXEC CICS LINK/XCTL PROGRAM(...)`. Most
`PROGRAM(...)` targets are **data-names, not literals** — resolve them
against working-storage `VALUE` clauses and any menu/route copybooks
before declaring an edge unresolvable.
- Java: class-level imports/invocations. Node: `require`/`import`.
- **Data dependency graph** — which programs read/write which data stores.
- COBOL batch: `SELECT ... ASSIGN TO <ddname>` joined with JCL `DD`
statements (this is the *only* way to attribute file I/O to a program).
- COBOL/CICS online: `EXEC CICS READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/
READPREV ... FILE(...)` joined with `DEFINE FILE` in the CSD.
- DB2: `EXEC SQL ... END-EXEC` table references — *not* JCL DD; DB2 access
is via plan/package binds.
- BMS: `SEND MAP`/`RECEIVE MAP` ↔ map source under `bms/` and copybooks
under `cpy-bms/` (or wherever the maps live).
- Java: JPA/MyBatis entities & tables. Node: model files.
- **Entry points** — whatever the stack's outermost invokers are. Mainframe:
JCL `EXEC PGM=` steps **and** CICS `DEFINE TRANSACTION ... PROGRAM(...)`
from the CSD — without the CSD, every online program looks unreachable.
Web: HTTP routes. CLI: argv parsing.
- **Dead-end candidates** — modules with no inbound edges. **Only trust this
once the entry-point and call-edge types above are all in the graph**, and
suppress the dead claim for any module that could be the target of an
unresolved dynamic call. A naive grep-only graph will mark most CICS
programs dead.
For COBOL fixed-format, slice columns 8-72 and skip `*` indicator lines
(column 7) before regex matching, or you'll match sequence numbers and
commented-out code.
Save the script as `analysis/$1/extract_topology.py` (or `.sh`) so it can be
re-run and audited. Have it write a machine-readable
`analysis/$1/topology.json` and print a human summary. Run it; show the
summary (cap at ~200 lines for very large estates).
## Render
From the extracted data, generate **three Mermaid diagrams** and write them
to `analysis/$1/TOPOLOGY.html` as a self-contained page that renders in any
browser.
The HTML page must use: dark `#1e1e1e` background, `#d4d4d4` text,
`#cc785c` for `<h2>`/accents, `system-ui` font, all CSS **inline** (no
external stylesheets). Load Mermaid from a CDN in `<head>`:
```html
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: true, theme: 'dark' });
</script>
```
Each diagram goes in a `<pre class="mermaid">...</pre>` block. Do **not**
wrap diagrams in markdown ` ``` ` fences inside the HTML.
1. **`graph TD` — Module call graph.** Cluster by domain (use `subgraph`).
Highlight entry points in a distinct style. Cap at ~40 nodes — if larger,
show domain-level with one expanded domain.
2. **`graph LR` — Data lineage.** Programs → data stores.
Mark read vs write edges.
3. **`flowchart TD` — Critical path.** Trace ONE end-to-end business flow
(e.g., "monthly billing run" or "process payment") through every program
and data store it touches, in execution order. If production telemetry is
available (see `/modernize-assess` Step 4), annotate each step with its
p50/p99 wall-clock.
Also export the three diagrams as standalone `.mmd` files for re-use:
`analysis/$1/call-graph.mmd`, `analysis/$1/data-lineage.mmd`,
`analysis/$1/critical-path.mmd`.
## Annotate
Below each `<pre class="mermaid">` block in TOPOLOGY.html, add a `<ul>`
with 3-5 **architect observations**: tight coupling clusters, single
points of failure, candidates for service extraction, data stores
touched by too many writers.
## Present
Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser.