Morgan Lunt 22a1b25977
Harden code-modernization plugin from a real CardDemo dry run
Fixes found by running the discovery workflow against the AWS CardDemo
mainframe sample (~50 KLOC of COBOL/CICS/JCL/BMS/VSAM):

- modernize-assess: add scc -> cloc -> find/wc fallback chain with the
  COCOMO-II formula so Step 1 works when scc isn't installed; same for
  portfolio-mode cloc/lizard. Drop the reference to a specific
  agent-spawning tool name (just "in parallel"). Sharpen the structural-
  map subagent prompt: 5-12 domains, subgraph clustering, ~40-edge cap,
  repo-relative paths, dangling-reference check.
- modernize-map: expand the parse-target list with the things a
  literal-minded reader would miss on a real mainframe codebase — CICS
  CSD DEFINE TRANSACTION/FILE for entry points and online file I/O,
  EXEC CICS file ops, SELECT...ASSIGN TO joined with JCL DD,
  EXEC SQL table refs (not JCL DD), SEND/RECEIVE MAP, dynamic
  data-name XCTL resolution, COBOL fixed-format column slicing. Without
  these the dead-code list is wrong (most CICS programs look unreachable).
  Also write a machine-readable topology.json alongside the summary.
- modernize-extract-rules: add a Priority (P0/P1/P2) field with a
  heuristic, and an optional Suspected-defect field. modernize-brief
  reads P0 rules to build the behavior contract, but the Rule Card had
  no priority slot — the chain was broken.
- modernize-brief: read the new P0 tags; flag low-confidence P0 rules as
  SME blockers.
- modernize-reimagine: drop "for the demo" wording.
- security-auditor agent: add mainframe/COBOL coverage items (RACF,
  JCL/PROC creds, BMS field validation, DB2 dynamic SQL, copybook PII)
  and mark web-only items as such so it adapts to the target stack.
- README: add Optional Tooling section and a symlink example for the
  expected layout.
2026-05-11 16:28:27 -07:00

description: Dependency & topology mapping (call graphs, data lineage, batch flows), rendered as navigable diagrams
argument-hint: <system-dir>

Build a dependency and topology map of legacy/$1 and render it visually.

The assessment gave us domains. Now go one level deeper: how do the pieces connect? This is the map an engineer needs before touching anything.

What to produce

Write a one-off analysis script (Python or shell, your choice) that parses the source under legacy/$1 and extracts the four datasets below. Cover every parse target that applies to the stack you're analyzing; these are the ones LLMs reliably miss:

  • Program/module call graph — who calls whom.
    • COBOL/CICS: CALL '...' and EXEC CICS LINK/XCTL PROGRAM(...). Most PROGRAM(...) targets are data-names, not literals — resolve them against working-storage VALUE clauses and any menu/route copybooks before declaring an edge unresolvable.
    • Java: class-level imports/invocations. Node: require/import.
  • Data dependency graph — which programs read/write which data stores.
    • COBOL batch: SELECT ... ASSIGN TO <ddname> joined with the DD statements of the JCL step that runs the program (EXEC PGM=). This is the only way to attribute file I/O to a program.
    • COBOL/CICS online: EXEC CICS READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/READPREV ... FILE(...) joined with DEFINE FILE in the CSD.
    • DB2: EXEC SQL ... END-EXEC table references — not JCL DD; DB2 access is via plan/package binds.
    • BMS: SEND MAP/RECEIVE MAP ↔ map source under bms/ and copybooks under cpy-bms/ (or wherever the maps live).
    • Java: JPA/MyBatis entities & tables. Node: model files.
  • Entry points — whatever the stack's outermost invokers are. Mainframe: JCL EXEC PGM= steps and CICS DEFINE TRANSACTION ... PROGRAM(...) from the CSD — without the CSD, every online program looks unreachable. Web: HTTP routes. CLI: argv parsing.
  • Dead-end candidates — modules with no inbound edges. Only trust this once the entry-point and call-edge types above are all in the graph, and suppress the dead claim for any module that could be the target of an unresolved dynamic call. A naive grep-only graph will mark most CICS programs dead.
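Below is a minimal Python sketch of the call-graph, entry-point, and dead-end logic described above. It assumes two things. First, each program's source has already been reduced to its code area (columns 8-72). Second, copybooks are not expanded.

Everything in it is hypothetical: the regexes, the names `build_call_graph` and `dead_end_candidates`, and the sample program, transaction, and step names. Data-names are resolved only against `PIC X(n) VALUE '...'` clauses. Menu or route copybooks would need the same treatment once expanded.

When any dynamic call stays unresolved, the sketch takes a conservative reading of the suppression rule. It reports every module with no inbound edge as unconfirmed rather than dead.

```python
import re
from collections import defaultdict

# Hypothetical patterns for a sketch; real sources also need continuation-line
# handling and copybook expansion before these are reliable.
CALL_LIT = re.compile(r"\bCALL\s+'([A-Z0-9$#@-]+)'", re.I)
CICS_STMT = re.compile(r"\bEXEC\s+CICS\b(.*?)\bEND-EXEC\b", re.I | re.S)
PROGRAM_ARG = re.compile(r"\bPROGRAM\s*\(\s*('?)([A-Z0-9$#@-]+)\1\s*\)", re.I)
VALUE_CLAUSE = re.compile(r"\b([A-Z0-9-]+)\s+PIC\s+X\(\d+\)\s+VALUE\s+'([^']+)'", re.I)
CSD_TRAN = re.compile(
    r"DEFINE\s+TRANSACTION\(\s*([^)\s]+)\s*\)(?:(?!DEFINE\b).)*?PROGRAM\(\s*([^)\s]+)\s*\)",
    re.I | re.S)
JCL_PGM = re.compile(r"\bEXEC\s+PGM=([A-Z0-9$#@]+)", re.I)


def build_call_graph(programs, csd_text, jcl_text):
    """programs: {name: code-area text}. Returns (edges, entry points, unresolved calls)."""
    edges, unresolved = set(), []
    for name, src in programs.items():
        values = defaultdict(set)  # data-name -> literal VALUEs it can hold
        for data_name, literal in VALUE_CLAUSE.findall(src):
            values[data_name.upper()].add(literal.strip().upper())
        edges.update((name, callee.upper()) for callee in CALL_LIT.findall(src))
        for stmt in CICS_STMT.findall(src):
            m = PROGRAM_ARG.search(stmt)
            if not re.match(r"\s*(LINK|XCTL)\b", stmt, re.I) or not m:
                continue
            quoted, target = m.group(1), m.group(2).upper()
            if quoted:                      # PROGRAM('LITERAL')
                edges.add((name, target))
            elif values.get(target):        # PROGRAM(data-name) resolved via VALUE
                edges.update((name, lit) for lit in values[target])
            else:                           # dynamic call we could not resolve
                unresolved.append((name, target))
    entries = {pgm.upper() for _, pgm in CSD_TRAN.findall(csd_text)}
    entries |= {pgm.upper() for pgm in JCL_PGM.findall(jcl_text)}
    return edges, entries, unresolved


def dead_end_candidates(programs, edges, entries, unresolved):
    """Return (dead, unconfirmed): modules with no inbound edge that are not entry points.
    Any unresolved dynamic call could target any of them, so none is then reported dead."""
    candidates = sorted(set(programs) - {callee for _, callee in edges} - entries)
    return ([], candidates) if unresolved else (candidates, [])


# Hypothetical sample members, already reduced to their code area.
programs = {
    "SIGNON1": """
       01  WS-VARS.
           05 WS-NEXT-PGM   PIC X(08) VALUE 'MENU01'.
       PROCEDURE DIVISION.
           EXEC CICS XCTL PROGRAM(WS-NEXT-PGM) END-EXEC.
    """,
    "MENU01": "           CALL 'DATEUTL' USING WS-DATE.",
    "DATEUTL": "           GOBACK.",
    "ORPHAN1": "           GOBACK.",
}
csd = "DEFINE TRANSACTION(SG01) GROUP(DEMO)\n       PROGRAM(SIGNON1)"
jcl = "//STEP01   EXEC PGM=IDCAMS"
edges, entries, unresolved = build_call_graph(programs, csd, jcl)
dead, unconfirmed = dead_end_candidates(programs, edges, entries, unresolved)
print(sorted(edges), sorted(entries), dead, unconfirmed)
```

Here the XCTL through `WS-NEXT-PGM` resolves to a real edge via its VALUE clause, and the CSD transaction keeps SIGNON1 from looking unreachable. Only the truly orphaned member is left as a dead-end candidate.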

For COBOL fixed-format source, slice columns 8-72 and skip comment lines (* or / in the indicator area, column 7) before regex matching. Otherwise you'll match sequence numbers and commented-out code.
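Here is a sketch of that slicing step in Python. `code_area` and `strip_source` are hypothetical helper names.

The sketch drops `*` and `/` comment lines. It leaves continuation lines (`-` in column 7) and debugging lines (`D`) for the caller to handle.

```python
from typing import Optional


def code_area(line: str) -> Optional[str]:
    """Return columns 8-72 of one fixed-format COBOL line, or None for a comment line.

    Columns 1-6 hold sequence numbers, column 7 is the indicator area
    ('*' or '/' marks a comment), and columns 73-80 are the identification area.
    """
    line = line.rstrip("\r\n")
    if line[6:7] in ("*", "/"):
        return None
    return line[7:72]


def strip_source(text: str) -> str:
    """Reduce a whole member to its code area, dropping comment lines."""
    kept = (code_area(line) for line in text.splitlines())
    return "\n".join(c for c in kept if c is not None)


print(strip_source("000100 IDENTIFICATION DIVISION.\n000200*    CALL 'OLDPGM'."))
```

Run the regexes from the parse-target list against `strip_source(...)` output rather than the raw member.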

Save the script as analysis/$1/extract_topology.py (or .sh) so it can be re-run and audited. Have it write a machine-readable analysis/$1/topology.json and print a human summary. Run it; show the summary (cap at ~200 lines for very large estates).
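Below is one way the script could persist its results and produce the capped summary, sketched in Python. The JSON keys (`calls`, `data`, `entry_points`, `dead_end_candidates`) and the `write_topology` name are assumptions, not a schema this command defines.

The cap follows the ~200-line guidance above.

```python
import json
import tempfile
from pathlib import Path


def write_topology(out_dir, calls, data_edges, entries, dead, max_lines=200):
    """Write topology.json under out_dir and return a summary of at most max_lines lines.

    calls: set of (caller, callee); data_edges: set of (program, store, 'read'|'write').
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    topology = {
        "calls": [{"from": a, "to": b} for a, b in sorted(calls)],
        "data": [{"program": p, "store": s, "mode": m} for p, s, m in sorted(data_edges)],
        "entry_points": sorted(entries),
        "dead_end_candidates": sorted(dead),
    }
    (out / "topology.json").write_text(json.dumps(topology, indent=2) + "\n")

    lines = [f"{len(calls)} call edges, {len(data_edges)} data edges, "
             f"{len(entries)} entry points, {len(dead)} dead-end candidates"]
    lines += [f"  {a} -> {b}" for a, b in sorted(calls)]
    lines += [f"  {p} {m.upper()} {s}" for p, s, m in sorted(data_edges)]
    if len(lines) > max_lines:  # keep the summary readable on very large estates
        hidden = len(lines) - (max_lines - 1)
        lines = lines[: max_lines - 1] + [f"  ... {hidden} more lines, see topology.json"]
    return "\n".join(lines)


# Usage with hypothetical program and store names, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    summary = write_topology(tmp, {("PGMA", "PGMB")}, {("PGMB", "ACCTFILE", "read")}, {"PGMA"}, [])
    saved = json.loads((Path(tmp) / "topology.json").read_text())
    big = write_topology(tmp, {(f"P{i}", "UTIL") for i in range(500)}, set(), set(), [])
print(summary)
```

Keeping the JSON complete and truncating only the printed summary means later steps (rendering, annotation) never work from a clipped view.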

Render

From the extracted data, generate three Mermaid diagrams and write them to analysis/$1/TOPOLOGY.html as a single-file page (all CSS inline; only Mermaid is loaded from the CDN) that renders in any modern browser.

The HTML page must use: dark #1e1e1e background, #d4d4d4 text, #cc785c for <h2>/accents, system-ui font, all CSS inline (no external stylesheets). Load Mermaid from a CDN in <head>:

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'dark' });
</script>

Each diagram goes in a <pre class="mermaid">...</pre> block. Do not wrap diagrams in markdown ``` fences inside the HTML.
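Below is a Python sketch of assembling that page. It assumes the diagram sources and observation bullets are already built; `render_page` and its `sections` argument are hypothetical.

Diagram text is HTML-escaped so that a `<` or `&` in a label cannot break the markup. Browsers already serialize `>` as `&gt;` inside the `<pre>`, and Mermaid decodes entities before parsing. Escaping therefore should not change what renders, though this is worth confirming for the Mermaid version you pin.

```python
import html

MERMAID_HEAD = """<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'dark' });
</script>"""

STYLE = """<style>
  body { background: #1e1e1e; color: #d4d4d4; font-family: system-ui, sans-serif; margin: 2rem; }
  h1, h2 { color: #cc785c; }
  li::marker { color: #cc785c; }
</style>"""


def render_page(title, sections):
    """sections: list of (heading, mermaid_source, observations) tuples."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title>",
        STYLE,
        MERMAID_HEAD,
        "</head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for heading, mmd, observations in sections:
        parts.append(f"<h2>{html.escape(heading)}</h2>")
        # No markdown fences: the <pre class="mermaid"> block holds raw diagram text.
        parts.append(f'<pre class="mermaid">\n{html.escape(mmd, quote=False)}\n</pre>')
        items = "".join(f"<li>{html.escape(note)}</li>" for note in observations)
        parts.append(f"<ul>{items}</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = render_page("Topology", [("Module call graph", "graph TD\n  PGMA --> PGMB",
                                 ["PGMA is the only entry point."])])
print(page)
```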

  1. graph TD — Module call graph. Cluster by domain (use subgraph). Highlight entry points in a distinct style. Cap at ~40 nodes; if the graph is larger, show a domain-level view with one domain expanded.

  2. graph LR — Data lineage. Programs → data stores. Mark read vs write edges.

  3. flowchart TD — Critical path. Trace ONE end-to-end business flow (e.g., "monthly billing run" or "process payment") through every program and data store it touches, in execution order. If production telemetry is available (see /modernize-assess Step 4), annotate each step with its p50/p99 wall-clock.
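Here is a sketch of diagram 1 in Python. It assumes the extractor already produced three inputs: call edges, a program-to-domain mapping (for example, from the assessment's domains), and the entry-point set.

`call_graph_mmd`, the `dom_`/`sg_` ID prefixes, and the entry-point colors are illustrative choices. Above the node cap, the sketch collapses every domain except `expand` into a single node; this is one reading of "a domain-level view with one domain expanded." Self-loops are dropped, including those that collapsing creates.

```python
import re
from collections import defaultdict


def _safe(name):
    """Keep Mermaid node and subgraph IDs to word characters."""
    return re.sub(r"\W", "_", name)


def call_graph_mmd(edges, domain_of, entries, expand=None, max_nodes=40):
    """Return a `graph TD` call graph clustered by domain.

    edges: set of (caller, callee); domain_of: {program: domain}; entries: entry points.
    Above max_nodes programs, every domain except `expand` collapses into one node.
    """
    programs = {p for edge in edges for p in edge} | set(entries)
    collapse = len(programs) > max_nodes

    def dom(p):
        return domain_of.get(p, "unassigned")

    def node_id(p):
        return f"dom_{_safe(dom(p))}" if collapse and dom(p) != expand else _safe(p)

    clusters = defaultdict(dict)  # expanded domain -> {node id: label}
    collapsed = {}                # collapsed-domain node id -> label
    for p in programs:
        if collapse and dom(p) != expand:
            collapsed[node_id(p)] = f"{dom(p)} (collapsed)"
        else:
            clusters[dom(p)][node_id(p)] = p

    lines = ["graph TD"]
    for d in sorted(clusters):
        lines.append(f"  subgraph sg_{_safe(d)} [{d}]")
        lines += [f'    {nid}["{label}"]' for nid, label in sorted(clusters[d].items())]
        lines.append("  end")
    lines += [f'  {nid}["{label}"]' for nid, label in sorted(collapsed.items())]
    # Drop self-loops, including the ones created by collapsing a domain.
    drawn = {(node_id(a), node_id(b)) for a, b in edges}
    lines += [f"  {a} --> {b}" for a, b in sorted(drawn) if a != b]
    entry_ids = sorted({node_id(p) for p in entries})
    lines.append("  classDef entry fill:#cc785c,stroke:#d4d4d4,color:#1e1e1e")
    if entry_ids:
        lines.append(f"  class {','.join(entry_ids)} entry")
    return "\n".join(lines)


small = call_graph_mmd({("SIGNON1", "MENU01"), ("MENU01", "DATEUTL")},
                       {"SIGNON1": "auth", "MENU01": "auth", "DATEUTL": "shared"},
                       {"SIGNON1"})
big = call_graph_mmd({("SIGNON1", f"BILL{i:02d}") for i in range(50)},
                     {"SIGNON1": "auth", **{f"BILL{i:02d}": "billing" for i in range(50)}},
                     {"SIGNON1"}, expand="auth")
print(small)
```

Node labels are quoted so that parentheses in a label (as in the collapsed-domain nodes) do not trip the Mermaid parser.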

Also export the three diagrams as standalone .mmd files for re-use: analysis/$1/call-graph.mmd, analysis/$1/data-lineage.mmd, analysis/$1/critical-path.mmd.
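Below is a sketch of diagram 2 and the .mmd export, in Python. It assumes data edges arrive as `(program, store, verb)` triples from the file-I/O parse targets; `edge_mode`, `data_lineage_mmd`, and `export_mmd` are hypothetical names.

Treating WRITE/REWRITE/DELETE as writes and every other verb as a read is a simplifying assumption. Writes draw as solid labelled arrows and reads as dotted ones, both pointing from program to store.

```python
import re
import tempfile
from pathlib import Path

WRITE_VERBS = {"WRITE", "REWRITE", "DELETE"}  # assumption: every other verb counts as a read


def edge_mode(verb):
    return "write" if verb.upper() in WRITE_VERBS else "read"


def _safe(name):
    """Keep Mermaid node IDs to word characters."""
    return re.sub(r"\W", "_", name)


def data_lineage_mmd(data_edges):
    """data_edges: iterable of (program, store, verb). Returns a `graph LR` diagram."""
    edges = sorted({(p, s, edge_mode(v)) for p, s, v in data_edges})  # one edge per mode
    lines = ["graph LR"]
    lines += [f'  p_{_safe(p)}["{p}"]' for p in sorted({p for p, _, _ in edges})]
    stores = sorted({s for _, s, _ in edges})
    lines += [f'  s_{_safe(s)}["{s}"]' for s in stores]
    for p, s, mode in edges:
        arrow = "-- write -->" if mode == "write" else "-. read .->"
        lines.append(f"  p_{_safe(p)} {arrow} s_{_safe(s)}")
    if stores:
        lines.append("  classDef store stroke:#cc785c,stroke-width:2px")
        lines.append("  class " + ",".join(f"s_{_safe(s)}" for s in stores) + " store")
    return "\n".join(lines)


def export_mmd(out_dir, diagrams):
    """diagrams: {file name: Mermaid source}, e.g. call-graph.mmd, data-lineage.mmd."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, source in diagrams.items():
        (out / name).write_text(source + "\n")


lineage = data_lineage_mmd([("PGMA", "ACCTFILE", "REWRITE"), ("PGMA", "ACCTFILE", "WRITE"),
                            ("PGMB", "ACCTFILE", "READNEXT")])
with tempfile.TemporaryDirectory() as tmp:
    export_mmd(tmp, {"data-lineage.mmd": lineage})
    written = sorted(p.name for p in Path(tmp).iterdir())
print(lineage)
```

Prefixing program and store IDs (`p_`, `s_`) keeps a program and a data store with the same name from merging into one node.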

Annotate

Below each <pre class="mermaid"> block in TOPOLOGY.html, add a <ul> with 3-5 architect observations: tight coupling clusters, single points of failure, candidates for service extraction, data stores touched by too many writers.
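The observations call for an architect's judgment, but two of them can be seeded from the topology data. Below is a hypothetical Python helper that drafts bullets for shared-write data stores and for high fan-in programs (as single-point-of-failure candidates).

The thresholds are arbitrary defaults. The output is a starting list to edit.

```python
from collections import defaultdict


def draft_observations(calls, data_edges, writer_threshold=3, top=3):
    """Draft <ul> bullets from topology data.

    calls: iterable of (caller, callee); data_edges: iterable of (program, store, 'read'|'write').
    """
    notes = []
    writers = defaultdict(set)  # store -> programs that write it
    for prog, store, mode in data_edges:
        if mode == "write":
            writers[store].add(prog)
    for store, progs in sorted(writers.items()):
        if len(progs) >= writer_threshold:
            notes.append(f"{store} is written by {len(progs)} programs ({', '.join(sorted(progs))}).")

    callers = defaultdict(set)  # callee -> distinct callers (fan-in)
    for caller, callee in calls:
        callers[callee].add(caller)
    ranked = sorted(callers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for prog, who in ranked[:top]:
        if len(who) > 1:
            notes.append(f"{prog} is called by {len(who)} programs; a likely single point of failure.")
    return notes


notes = draft_observations(
    calls=[("PGMA", "UTIL"), ("PGMB", "UTIL"), ("PGMC", "UTIL"), ("PGMA", "PGMB")],
    data_edges=[("PGMA", "ACCTFILE", "write"), ("PGMB", "ACCTFILE", "write"),
                ("PGMC", "ACCTFILE", "write"), ("PGMD", "ACCTFILE", "read")],
)
for note in notes:
    print("-", note)
```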

Present

Tell the user to open analysis/$1/TOPOLOGY.html in a browser.