Fixes found by running the discovery workflow against the AWS CardDemo mainframe sample (~50 KLOC of COBOL/CICS/JCL/BMS/VSAM):

- modernize-assess: add scc -> cloc -> find/wc fallback chain with the COCOMO-II formula so Step 1 works when scc isn't installed; same for portfolio-mode cloc/lizard. Drop the reference to a specific agent-spawning tool name (just "in parallel"). Sharpen the structural-map subagent prompt: 5-12 domains, subgraph clustering, ~40-edge cap, repo-relative paths, dangling-reference check.
- modernize-map: expand the parse-target list with the things a literal-minded reader would miss on a real mainframe codebase — CICS CSD DEFINE TRANSACTION/FILE for entry points and online file I/O, EXEC CICS file ops, SELECT...ASSIGN TO joined with JCL DD, EXEC SQL table refs (not JCL DD), SEND/RECEIVE MAP, dynamic data-name XCTL resolution, COBOL fixed-format column slicing. Without these the dead-code list is wrong (most CICS programs look unreachable). Also write a machine-readable topology.json alongside the summary.
- modernize-extract-rules: add a Priority (P0/P1/P2) field with a heuristic, and an optional Suspected-defect field. modernize-brief reads P0 rules to build the behavior contract, but the Rule Card had no priority slot — the chain was broken.
- modernize-brief: read the new P0 tags; flag low-confidence P0 rules as SME blockers.
- modernize-reimagine: drop "for the demo" wording.
- security-auditor agent: add mainframe/COBOL coverage items (RACF, JCL/PROC creds, BMS field validation, DB2 dynamic SQL, copybook PII) and mark web-only items as such so it adapts to the target stack.
- README: add Optional Tooling section and a symlink example for the expected layout.
| description | argument-hint |
|---|---|
| Dependency & topology mapping — call graphs, data lineage, batch flows, rendered as navigable diagrams | <system-dir> |
Build a dependency and topology map of legacy/$1 and render it visually.
The assessment gave us domains. Now go one level deeper: how do the pieces connect? This is the map an engineer needs before touching anything.
## What to produce
Write a one-off analysis script (Python or shell — your choice) that parses
the source under legacy/$1 and extracts the four datasets below. Cover
the parse targets that are real for the stack you're looking at — these are
the ones LLMs reliably miss:
- Program/module call graph — who calls whom.
  - COBOL/CICS: `CALL '...'` and `EXEC CICS LINK/XCTL PROGRAM(...)`. Most
    `PROGRAM(...)` targets are data-names, not literals — resolve them against
    working-storage `VALUE` clauses and any menu/route copybooks before
    declaring an edge unresolvable.
  - Java: class-level imports/invocations. Node: `require`/`import`.
- Data dependency graph — which programs read/write which data stores.
  - COBOL batch: `SELECT ... ASSIGN TO <ddname>` joined with JCL `DD`
    statements (this is the only way to attribute file I/O to a program).
  - COBOL/CICS online: `EXEC CICS READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/READPREV ... FILE(...)`
    joined with `DEFINE FILE` in the CSD.
  - DB2: `EXEC SQL ... END-EXEC` table references — not JCL DD; DB2 access is
    via plan/package binds.
  - BMS: `SEND MAP`/`RECEIVE MAP` ↔ map source under `bms/` and copybooks
    under `cpy-bms/` (or wherever the maps live).
  - Java: JPA/MyBatis entities & tables. Node: model files.
- Entry points — whatever the stack's outermost invokers are. Mainframe: JCL
  `EXEC PGM=` steps and CICS `DEFINE TRANSACTION ... PROGRAM(...)` from the
  CSD — without the CSD, every online program looks unreachable. Web: HTTP
  routes. CLI: argv parsing.
- Dead-end candidates — modules with no inbound edges. Only trust this once
  the entry-point and call-edge types above are all in the graph, and suppress
  the dead claim for any module that could be the target of an unresolved
  dynamic call. A naive grep-only graph will mark most CICS programs dead.
For COBOL fixed-format, slice columns 8-72 and skip * indicator lines
(column 7) before regex matching, or you'll match sequence numbers and
commented-out code.
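For the COBOL pieces above, a minimal sketch of that column slicing plus literal and dynamic call-edge extraction, assuming Python; the function and regex names are illustrative, not a prescribed implementation:

```python
import re
from pathlib import Path

# Illustrative patterns only -- tighten against the real source in legacy/$1.
CALL_LITERAL = re.compile(r"\bCALL\s+'([A-Z0-9@#$-]+)'", re.IGNORECASE)
CICS_LINK_XCTL = re.compile(
    r"EXEC\s+CICS\s+(LINK|XCTL)\b.*?PROGRAM\s*\(\s*('?[A-Z0-9@#$-]+'?)\s*\)",
    re.IGNORECASE | re.DOTALL,
)

def cobol_code_lines(path: Path):
    """Yield the code area (columns 8-72) of a fixed-format COBOL member,
    skipping comment / page-eject lines flagged in indicator column 7."""
    for raw in path.read_text(errors="replace").splitlines():
        if len(raw) > 6 and raw[6] in "*/":
            continue
        yield raw[7:72]

def call_edges(path: Path):
    """Return (caller, callee, kind) tuples found in one program source."""
    caller = path.stem.upper()          # assumes member name == program name
    text = "\n".join(cobol_code_lines(path))
    edges = [(caller, c.upper(), "CALL") for c in CALL_LITERAL.findall(text)]
    for kind, target in CICS_LINK_XCTL.findall(text):
        # Unquoted PROGRAM(...) targets are data-names: resolve them against
        # working-storage VALUE clauses / menu copybooks in a later pass
        # rather than silently dropping the edge.
        edges.append((caller, target.strip("'").upper(), f"CICS {kind.upper()}"))
    return edges
```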
Save the script as analysis/$1/extract_topology.py (or .sh) so it can be
re-run and audited. Have it write a machine-readable
analysis/$1/topology.json and print a human summary. Run it; show the
summary (cap at ~200 lines for very large estates).
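The exact JSON layout is not mandated here; one workable shape, sketched with assumed field names and placeholder entries, is simply the four datasets as top-level arrays:

```python
import json
from pathlib import Path

# Assumed schema -- the only requirement is that all four datasets land in one
# machine-readable file. Entries below are placeholders, not real program names.
topology = {
    "call_graph":          [{"from": "PROGA", "to": "PROGB", "kind": "CICS XCTL"}],
    "data_access":         [{"program": "PROGC", "store": "FILEA", "mode": "read"}],
    "entry_points":        [{"type": "cics-transaction", "id": "XX00", "program": "PROGA"}],
    "dead_end_candidates": ["PROGZ"],  # only meaningful once dynamic calls are resolved
}

out = Path("analysis") / "SYSTEM" / "topology.json"   # substitute the real $1
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(topology, indent=2))
print(f"{len(topology['call_graph'])} call edges, "
      f"{len(topology['data_access'])} data edges, "
      f"{len(topology['entry_points'])} entry points")
```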
## Render
From the extracted data, generate three Mermaid diagrams and write them
to analysis/$1/TOPOLOGY.html as a self-contained page that renders in any
browser.
The HTML page must use: dark `#1e1e1e` background, `#d4d4d4` text,
`#cc785c` for `<h2>`/accents, system-ui font, all CSS inline (no
external stylesheets). Load Mermaid from a CDN in `<head>`:
<script type="module">
import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
mermaid.initialize({ startOnLoad: true, theme: 'dark' });
</script>
Each diagram goes in a `<pre class="mermaid">...</pre>` block. Do not
wrap diagrams in markdown ``` fences inside the HTML.
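A minimal page skeleton that satisfies those constraints could look like the following; the single `A --> B` diagram is a stand-in for the generated content:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Topology</title>
  <style>
    body { background: #1e1e1e; color: #d4d4d4; font-family: system-ui, sans-serif; margin: 2rem; }
    h2 { color: #cc785c; }
  </style>
  <script type="module">
    import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
    mermaid.initialize({ startOnLoad: true, theme: 'dark' });
  </script>
</head>
<body>
  <h2>Module call graph</h2>
  <pre class="mermaid">
graph TD
  A --> B
  </pre>
  <!-- repeat for data lineage (graph LR) and critical path (flowchart TD), each with its annotation list -->
</body>
</html>
```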
- `graph TD` — Module call graph. Cluster by domain (use `subgraph`). Highlight
  entry points in a distinct style. Cap at ~40 nodes — if larger, show
  domain-level with one expanded domain. (A minimal shape sketch follows this
  list.)
- `graph LR` — Data lineage. Programs → data stores. Mark read vs write edges.
- `flowchart TD` — Critical path. Trace ONE end-to-end business flow (e.g.,
  "monthly billing run" or "process payment") through every program and data
  store it touches, in execution order. If production telemetry is available
  (see `/modernize-assess` Step 4), annotate each step with its p50/p99
  wall-clock.
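A shape reference for the clustered call graph; the domain and module names below are hypothetical:

```mermaid
graph TD
  %% one subgraph per domain from the assessment
  subgraph Accounts
    ACCT01 --> ACCT02
  end
  subgraph Cards
    CARD01 --> CARD02
  end
  MENU01 --> ACCT01
  MENU01 --> CARD01
  %% entry point styled distinctly
  classDef entry fill:#cc785c,color:#1e1e1e
  class MENU01 entry
```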
Also export the three diagrams as standalone .mmd files for re-use:
analysis/$1/call-graph.mmd, analysis/$1/data-lineage.mmd,
analysis/$1/critical-path.mmd.
## Annotate
Below each `<pre class="mermaid">` block in TOPOLOGY.html, add a `<ul>`
with 3-5 architect observations: tight coupling clusters, single
points of failure, candidates for service extraction, data stores
touched by too many writers.
## Present
Tell the user to open analysis/$1/TOPOLOGY.html in a browser.