Morgan Lunt 1c4a5cfded
code-modernization: interactive topology map, preflight command, persona flows
modernize-map previously rendered the call graph and data lineage as
static Mermaid diagrams, which become unreadable once a node has ~10+
edges — exactly the shape of real legacy systems. It now builds an
interactive viewer from a shipped template (assets/topology-viewer.html):
a zoomable circle-pack of domains/modules sized by LOC, rendered to
canvas with level-of-detail reveal, dependency edges with per-kind
toggles, search with fly-to, a per-node detail sidebar, and a flow
walkthrough mode. Small domain-level .mmd exports remain for docs.

- topology.json now has a documented schema (hierarchy + edges + entry
  points + observations + flows) consumed by the viewer
- map traces 2-4 business flows anchored to personas (claimant,
  operator, auditor), each step in plain business language mapped to
  the modules that implement it; the viewer plays them as numbered
  paths
- brief gains a Business Walkthroughs section connecting each persona
  flow to the phase that replaces it
- new modernize-preflight command: detects the stack, checks analysis
  tooling, smoke-compiles a real source file with the legacy toolchain,
  inventories missing copybooks/descriptors/binary-only artifacts, and
  writes a per-command readiness verdict
- transform now verifies legacy + target toolchains before its plan
  gate instead of failing at test time
- README: commands updated, optional-tooling section reframed as 'what
  to give Claude'
2026-06-09 08:48:04 -07:00

163 lines
7.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
description: Dependency & topology mapping — call graphs, data lineage, batch flows, rendered as navigable diagrams
argument-hint: <system-dir>
---
Build a **dependency and topology map** of `legacy/$1` and render it visually.
The assessment gave us domains. Now go one level deeper: how do the *pieces*
connect? This is the map an engineer needs before touching anything.
## What to produce
Write a one-off analysis script (Python or shell — your choice) that parses
the source under `legacy/$1` and extracts the four datasets below. Three
principles apply across stacks; getting them wrong produces a misleading map:
1. **Edges live in two places** — direct calls in source, *and* dispatcher/
router calls whose targets are variables (config tables, route maps,
dependency injection, dynamic dispatch). Resolve variables against config
before declaring an edge unresolvable.
2. **The code↔storage join is usually external configuration**, not source —
job/deployment descriptors map logical names to physical stores.
3. **Entry points usually live in deployment config**, not source — without
parsing it, every top-level module looks unreachable.
Extract:
- **Program/module call graph** — direct calls (`CALL`, method invocations,
`import`/`require`) *and* dispatcher calls (`EXEC CICS LINK/XCTL`, DI
container wiring, framework routing, reflection/factory). Resolve variable
call targets against route tables, copybooks, config, or constant pools.
- **Data dependency graph** — which modules read/write which data stores,
joined through the relevant config: `SELECT…ASSIGN TO` ↔ JCL `DD` (batch
COBOL), `EXEC CICS READ/WRITE…FILE()` ↔ CSD `DEFINE FILE` (CICS online),
`EXEC SQL` table refs (embedded SQL), ORM annotations/mappings (Java/.NET),
model files (Node/Python/Ruby). Include UI/screen bindings (BMS maps, JSPs,
templates) — they're dependencies too.
- **Entry points** — whatever the stack's outermost invoker is, read from
where it's defined: JCL `EXEC PGM=` and CICS CSD `DEFINE TRANSACTION`
(mainframe), `web.xml`/route annotations/route files (web), `main()`/argv
parsing (CLI), queue/scheduler subscriptions (event-driven).
- **Dead-end candidates** — modules with no inbound edges. **Only meaningful
once all the entry-point and call-edge types above are in the graph.**
Suppress the dead claim for anything that could be the target of an
unresolved dynamic call. A grep-only graph will mark most dispatcher-driven
modules (CICS programs, Spring controllers, ORM-bound DAOs) dead when they
aren't.
If the source is fixed-column (COBOL columns 872, RPG, etc.), slice the
code area and strip comment lines before regex matching, or you'll match
sequence numbers and commented-out code.
Save the script as `analysis/$1/extract_topology.py` (or `.sh`) so it can be
re-run and audited. Have it write a machine-readable
`analysis/$1/topology.json` and print a human summary. Run it; show the
summary (cap at ~200 lines for very large estates).
`topology.json` must follow this schema — it feeds the interactive viewer:
```json
{
"system": "<display name>",
"root": {
"id": "sys", "name": "<system>", "kind": "system",
"children": [
{ "id": "dom:<domain>", "name": "<Domain>", "kind": "domain",
"children": [
{ "id": "<MODULE>", "name": "<MODULE>", "kind": "module",
"language": "cobol", "loc": 1234, "file": "src/MODULE.cbl" }
] },
{ "id": "dom:data", "name": "Data stores", "kind": "domain",
"children": [
{ "id": "ds:<NAME>", "name": "<NAME>", "kind": "datastore" }
] }
]
},
"edges": [
{ "source": "<id>", "target": "<id>", "kind": "call" }
],
"entryPoints": ["<id>", "..."],
"observations": ["<architect observation>", "..."],
"flows": [
{ "name": "<business flow>", "persona": "<who experiences it>",
"description": "<one sentence, plain language>",
"steps": [
{ "label": "<business-language step>", "nodes": ["<id>", "<id>"] }
] }
]
}
```
- Group leaf modules under `domain` containers (use the domains from
`/modernize-assess` if available). Leaf kinds: `module`, `datastore`,
`job`, `screen`. `loc` drives circle size — include it for modules.
- Edge kinds: `call` (direct), `dispatch` (dynamic/router), `read`,
`write`. Every edge endpoint must be a leaf id that exists in the tree.
- `observations`: 37 architect observations — tight coupling clusters,
single points of failure, service-extraction candidates, data stores
with too many writers.
- `flows` is the **persona walkthrough** section — see below.
## Persona flows
Trace **24 end-to-end business flows**, each anchored to a persona —
the people who experience the system, not the people who maintain it
(e.g. for a benefits system: the claimant, the caseworker, the auditor;
for billing: the customer, the billing operator). For each flow:
- `name` + one-sentence `description` in plain business language —
something a steering committee member relates to ("a claimant files a
weekly claim"), not a data-flow label ("CLM batch ingest").
- `steps`: 38 steps, each with a business-language `label` and the
`nodes` (programs + data stores) that implement that step, in
execution order.
This is the bridge between the technical map and non-technical
stakeholders: the same diagram answers "which program does X" for
engineers and "what happens when someone files a claim" for everyone else.
## Render
`analysis/$1/TOPOLOGY.html` is an **interactive map**: a zoomable
circle-pack of the whole system (domains as containers, modules sized by
LOC) with dependency edges, search, per-node detail sidebar, edge-kind
toggles, and a flow-walkthrough mode that plays each persona flow as a
numbered path. Build it from the template that ships with this plugin —
do not hand-write the viewer:
```bash
python3 - "$CLAUDE_PLUGIN_ROOT/assets/topology-viewer.html" analysis/$1 <<'EOF'
import json, sys
tpl_path, out_dir = sys.argv[1], sys.argv[2]
tpl = open(tpl_path).read()
data = json.dumps(json.load(open(f"{out_dir}/topology.json")))
html = tpl.replace("/*__TOPOLOGY_DATA__*/ null", "/*__TOPOLOGY_DATA__*/ " + data)
open(f"{out_dir}/TOPOLOGY.html", "w").write(html)
print(f"wrote {out_dir}/TOPOLOGY.html ({len(html):,} bytes)")
EOF
```
The viewer loads d3 from a CDN, so opening it needs network access; the
rest is self-contained. If the data injection marker is missing from the
output, the template was not found — check `$CLAUDE_PLUGIN_ROOT`.
Mermaid stays for **small, exportable** diagrams. Generate standalone
`.mmd` files for reuse in docs and PRs — but keep each under ~40 edges;
collapse to domain level if the full graph is bigger (dense Mermaid
becomes unreadable, which is exactly what the interactive map is for):
- `analysis/$1/call-graph.mmd` — domain-level `graph TD`, entry points
highlighted
- `analysis/$1/data-lineage.mmd``graph LR`, programs → data stores,
read vs write marked
- `analysis/$1/critical-path.mmd``flowchart TD` of the primary flow
from `flows`, annotated with p50/p99 wall-clock if telemetry is
available (see `/modernize-assess` Step 4)
## Present
Tell the user to open `analysis/$1/TOPOLOGY.html` in a browser, and to
try: search for a module, click it to see its connections, and pick a
persona flow from the walkthrough dropdown.