diff --git a/plugins/code-modernization/README.md b/plugins/code-modernization/README.md
index 65ada5b..048db6d 100644
--- a/plugins/code-modernization/README.md
+++ b/plugins/code-modernization/README.md
@@ -14,7 +14,15 @@ The discovery commands (`assess`, `map`, `extract-rules`) build artifacts under
 ## Expected layout
 
-Commands assume the system being modernized lives at `legacy//`. Discovery artifacts go to `analysis//`, transformed code to `modernized//…`. Adjust the paths in the commands or symlink if your layout differs.
+Commands take a `` argument and assume the system being modernized lives at `legacy//`. Discovery artifacts go to `analysis//`, transformed code to `modernized//…`. If your codebase lives elsewhere, symlink it in:
+
+```bash
+mkdir -p legacy && ln -s /path/to/your/legacy/codebase legacy/billing
+```
+
+## Optional tooling
+
+`/modernize-assess` works best with [`scc`](https://github.com/boyter/scc) (LOC + complexity + COCOMO) or [`cloc`](https://github.com/AlDanial/cloc), and falls back to `find`/`wc` if neither is installed. Portfolio mode also benefits from [`lizard`](https://github.com/terryyin/lizard) (cyclomatic complexity). The commands degrade gracefully without them, but the metrics will be coarser.
 
 ## Commands
 
@@ -24,7 +32,7 @@ The commands are designed to be run in order, but each produces a standalone art
 Inventory the legacy codebase: languages, line counts, complexity, build system, integrations, technical debt, security posture, documentation gaps, and a COCOMO-derived effort estimate. Produces `analysis//ASSESSMENT.md` and `analysis//ARCHITECTURE.mmd`. Spawns `legacy-analyst` (×2) and `security-auditor` in parallel for deep reads. With `--portfolio`, sweeps every subdirectory of a parent directory and writes a sequencing heat-map to `analysis/portfolio.html`.
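The graceful degradation described under "Optional tooling" (scc, then cloc, then plain `find`/`wc`) can be sketched as a small shell helper. This is an illustrative sketch, not part of the plugin: the `count_loc` function name and the temp-dir demo are assumptions.

```shell
# Sketch of the assess tooling fallback chain: prefer scc, then cloc,
# then a coarse find/wc count. All names here are illustrative.
count_loc() {
  target="$1"
  if command -v scc >/dev/null 2>&1; then
    scc "$target"                      # LOC + complexity + COCOMO estimate
  elif command -v cloc >/dev/null 2>&1; then
    cloc --quiet "$target"             # LOC by language, no COCOMO
  else
    # Coarsest fallback: file counts per extension, then total line count
    find "$target" -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn
    find "$target" -type f -exec cat {} + | wc -l
  fi
}

demo=$(mktemp -d)                      # stand-in for legacy/billing
printf 'MOVE A TO B.\n' > "$demo/prog.cbl"
count_loc "$demo"
```

Whichever branch runs, the command notes which tool produced the figures, since COCOMO numbers from scc are not comparable to a raw `wc -l` total.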
 ### `/modernize-map `
-Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and one traced critical-path business flow. Writes a re-runnable extraction script and produces `analysis//TOPOLOGY.html` (rendered Mermaid + architect observations) plus standalone `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd`.
+Build a dependency and topology map of the **legacy** system: program/module call graph, data lineage (programs ↔ data stores), entry points, dead-end candidates, and one traced critical-path business flow. Writes a re-runnable extraction script and produces `analysis//topology.json` (machine-readable), `analysis//TOPOLOGY.html` (rendered Mermaid + architect observations), and standalone `call-graph.mmd`, `data-lineage.mmd`, and `critical-path.mmd`.
 
 ### `/modernize-extract-rules [module-pattern]`
 Mine the business rules embedded in the legacy code — calculations, validations, eligibility, state transitions, policies — into Given/When/Then "Rule Cards" with `file:line` citations and confidence ratings. Spawns three `business-rules-extractor` agents in parallel (calculations, validations, lifecycle). Produces `analysis//BUSINESS_RULES.md` and `analysis//DATA_OBJECTS.md`.
diff --git a/plugins/code-modernization/agents/security-auditor.md b/plugins/code-modernization/agents/security-auditor.md
index f26aac5..e33ee94 100644
--- a/plugins/code-modernization/agents/security-auditor.md
+++ b/plugins/code-modernization/agents/security-auditor.md
@@ -11,20 +11,28 @@ engineer can fix.
 ## Coverage checklist
 
-Work through systematically:
-- **Injection** (SQL, NoSQL, OS command, LDAP, XPath, template) — trace every
-  user-controlled input to every sink
+Adapt to the target stack — web items don't apply to a batch COBOL system,
+mainframe items don't apply to a SPA.
+Work through what's relevant:
+
+- **Injection** (SQL, NoSQL, OS command, LDAP, XPath, template, dynamic
+  DB2 SQL, JCL/PARM injection) — trace every user-controlled input to every sink
 - **Authentication / session** — hardcoded creds, weak session handling,
-  missing auth checks on sensitive routes
-- **Sensitive data exposure** — secrets in source, weak crypto, PII in logs
-- **Access control** — IDOR, missing ownership checks, privilege escalation paths
-- **XSS / CSRF** — unescaped output, missing tokens
+  missing auth checks on sensitive routes/transactions
+- **Sensitive data exposure** — secrets in source, weak crypto, PII/PAN/SSN in
+  logs, cleartext data in copybooks/flat files
+- **Access control** — IDOR, missing ownership checks, privilege escalation;
+  for CICS: missing/permissive RACF transaction & resource definitions,
+  unguarded admin transactions
+- **XSS / CSRF** — unescaped output, missing tokens (web targets only)
 - **Insecure deserialization** — pickle/yaml.load/ObjectInputStream on
   untrusted data
 - **Vulnerable dependencies** — run `npm audit` / `pip-audit` / read manifests
   and flag versions with known CVEs
-- **SSRF / path traversal / open redirect**
-- **Security misconfiguration** — debug mode, verbose errors, default creds
+- **SSRF / path traversal / open redirect** (web targets only)
+- **Input validation** — for CICS/3270: unvalidated BMS field input,
+  missing length/range/format checks before file/DB writes
+- **Security misconfiguration** — debug mode, verbose errors, default creds,
+  hardcoded passwords/userids in JCL, PROCs, or sign-on programs
 
 ## Tooling
 
diff --git a/plugins/code-modernization/commands/modernize-assess.md b/plugins/code-modernization/commands/modernize-assess.md
index 44997aa..188c2d4 100644
--- a/plugins/code-modernization/commands/modernize-assess.md
+++ b/plugins/code-modernization/commands/modernize-assess.md
@@ -23,6 +23,10 @@ cloc --quiet --csv / # LOC by language
 lizard -s cyclomatic_complexity / 2>/dev/null | tail -1
 ```
+
+If `cloc`/`lizard` are not installed, fall back to `scc /`
+(LOC + complexity) or `find` + `wc -l` grouped by extension, and estimate
+complexity by counting decision keywords per file. Note which tool you used.
+
 Capture: total SLOC, dominant language, file count, mean & max cyclomatic
 complexity (CCN). For dependency freshness, locate the manifest
 (`package.json`, `pom.xml`, `*.csproj`, `requirements*.txt`,
@@ -69,6 +73,17 @@ scc legacy/$1
 Then run `scc --by-file -s complexity legacy/$1 | head -25` to identify the
 highest-complexity files. Capture the COCOMO effort/cost estimate scc
 provides.
+
+If `scc` is not installed, fall back in order:
+1. `cloc legacy/$1` for the LOC table, then compute COCOMO-II effort
+   yourself: `PM = 2.94 × (KSLOC)^1.10` (nominal scale factors). Show the
+   inputs.
+2. If `cloc` is also missing, use `find` + `wc -l` grouped by extension
+   for LOC, and rank file complexity by counting decision keywords
+   (`IF`/`EVALUATE`/`WHEN`/`PERFORM` for COBOL; `if`/`for`/`while`/`case`/
+   `catch` for C-family). Compute COCOMO from KSLOC as above.
+
+Note in the assessment which tool was used so the figures are reproducible.
+
 ## Step 2 — Technology fingerprint
 
 Identify, with file evidence:
@@ -80,12 +95,15 @@ Identify, with file evidence:
 ## Step 3 — Parallel deep analysis
 
-Spawn three subagents **concurrently** using the Task tool:
+Spawn three subagents **in parallel**:
 
 1. **legacy-analyst** — "Build a structural map of legacy/$1: what are the
-   5-10 major functional domains, which source files belong to each, and how
-   do they depend on each other? Return a markdown table + a Mermaid
-   `graph TD` of domain-level dependencies. Cite file paths."
+   5-12 major functional domains (group optional/feature-gated subsystems
+   under one umbrella), which source files belong to each, and how do they
+   depend on each other (control flow + shared data)? Return a markdown
+   table + a Mermaid `graph TD` of domain-level dependencies — use
+   `subgraph` to cluster and cap at ~40 edges. Cite repo-relative file
+   paths. Flag dangling references (defined but no source, or unused)."
 
 2. **legacy-analyst** — "Identify technical debt in legacy/$1: dead code,
    deprecated APIs, copy-paste duplication, god objects/programs, missing
diff --git a/plugins/code-modernization/commands/modernize-brief.md b/plugins/code-modernization/commands/modernize-brief.md
index 86265cd..28eeb62 100644
--- a/plugins/code-modernization/commands/modernize-brief.md
+++ b/plugins/code-modernization/commands/modernize-brief.md
@@ -37,8 +37,11 @@ fewest-dependencies first. For each phase:
 Render the phases as a Mermaid `gantt` chart.
 
 ### 4. Behavior Contract
-List the **P0 behaviors** from BUSINESS_RULES.md that MUST be proven
-equivalent before any phase ships. These become the regression suite.
+List the **P0 rules** from BUSINESS_RULES.md (the ones tagged `Priority: P0` —
+money, regulatory, data integrity) that MUST be proven equivalent before any
+phase ships. These become the regression suite. Flag any P0 rule with
+Confidence < High as a blocker requiring SME confirmation before its phase
+starts.
 
 ### 5. Validation Strategy
 State which combination applies: characterization tests, contract tests,
diff --git a/plugins/code-modernization/commands/modernize-extract-rules.md b/plugins/code-modernization/commands/modernize-extract-rules.md
index 34e6247..1fe7979 100644
--- a/plugins/code-modernization/commands/modernize-extract-rules.md
+++ b/plugins/code-modernization/commands/modernize-extract-rules.md
@@ -38,6 +38,7 @@ Merge the three result sets. Deduplicate. For each distinct rule, write a
 ```
 ### RULE-NNN: 
 **Category:** Calculation | Validation | Lifecycle | Policy
+**Priority:** P0 | P1 | P2
 **Source:** `path/to/file.ext:line-line`
 **Plain English:** One sentence a business analyst would recognize.
 **Specification:**
@@ -47,11 +48,18 @@ Merge the three result sets. Deduplicate. For each distinct rule, write a
 [And ]
 **Parameters:** 
 **Edge cases handled:** 
-**Confidence:** High | Medium | Low — 
+**Suspected defect:** 
+**Confidence:** High | Medium | Low — 
 ```
+
+Priority heuristic — default to **P1**. Assign **P0** if the rule moves money,
+enforces a regulatory/compliance requirement, or guards data integrity (and
+flag P0 rules at ` joined with JCL `DD`
+  statements (this is the *only* way to attribute file I/O to a program).
+  - COBOL/CICS online: `EXEC CICS READ/WRITE/REWRITE/DELETE/STARTBR/READNEXT/
+    READPREV ... FILE(...)` joined with `DEFINE FILE` in the CSD.
+  - DB2: `EXEC SQL ... END-EXEC` table references — *not* JCL DD; DB2 access
+    is via plan/package binds.
+  - BMS: `SEND MAP`/`RECEIVE MAP` ↔ map source under `bms/` and copybooks
+    under `cpy-bms/` (or wherever the maps live).
+  - Java: JPA/MyBatis entities & tables. Node: model files.
+- **Entry points** — whatever the stack's outermost invokers are. Mainframe:
+  JCL `EXEC PGM=` steps **and** CICS `DEFINE TRANSACTION ... PROGRAM(...)`
+  from the CSD — without the CSD, every online program looks unreachable.
+  Web: HTTP routes. CLI: argv parsing.
+- **Dead-end candidates** — modules with no inbound edges. **Only trust this
+  once the entry-point and call-edge types above are all in the graph**, and
+  suppress the dead claim for any module that could be the target of an
+  unresolved dynamic call. A naive grep-only graph will mark most CICS
+  programs dead.
+
+For COBOL fixed-format, slice columns 8-72 and skip `*` indicator lines
+(column 7) before regex matching, or you'll match sequence numbers and
+commented-out code.
 
 Save the script as `analysis/$1/extract_topology.py` (or `.sh`) so it can be
-re-run and audited. Run it. Show the raw output.
+re-run and audited. Have it write a machine-readable
+`analysis/$1/topology.json` and print a human summary. Run it; show the
+summary (cap at ~200 lines for very large estates).
 
 ## Render
 
diff --git a/plugins/code-modernization/commands/modernize-reimagine.md b/plugins/code-modernization/commands/modernize-reimagine.md
index c676601..c16523a 100644
--- a/plugins/code-modernization/commands/modernize-reimagine.md
+++ b/plugins/code-modernization/commands/modernize-reimagine.md
@@ -57,8 +57,9 @@ Enter plan mode. Present the architecture. Wait for approval.
 ## Phase E — Parallel scaffolding
 
-For each service in the approved architecture (cap at 3 for the demo), spawn
-a **general-purpose agent in parallel**:
+For each service in the approved architecture (cap at 3 to keep the run
+tractable; tell the user which you deferred), spawn a **general-purpose agent
+in parallel**:
 
 "Scaffold the service per analysis/$1/REIMAGINED_ARCHITECTURE.md and
 AI_NATIVE_SPEC.md. Create: project skeleton, domain model, API stubs
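The fixed-format preprocessing that the modernize-map change prescribes (slice columns 8-72, skip lines whose column-7 indicator is `*`) can be sketched with `awk`. This is a simplified sketch: the `cobol_code_area` function name and the sample program are hypothetical, and other indicator values such as `/` (page eject) and `D` (debug) are ignored for brevity.

```shell
# Strip COBOL fixed-format scaffolding before regex matching:
# cols 1-6 = sequence numbers, col 7 = indicator, cols 8-72 = code area.
# Comment lines (indicator '*') are dropped entirely.
cobol_code_area() {
  awk 'substr($0, 7, 1) != "*" { print substr($0, 8, 65) }' "$1"
}

# Hypothetical fixed-format snippet: one comment line, one statement.
sample=$(mktemp)
cat > "$sample" <<'EOF'
000100* THIS COMMENT WOULD OTHERWISE MATCH A CALL REGEX
000200     CALL 'PAYCALC' USING WS-REC.
EOF
cobol_code_area "$sample"
```

Running this prints only the `CALL` statement's code area; both the comment line and the leading sequence numbers are gone, which is exactly what keeps a grep-built call graph from chasing commented-out or sequence-number matches.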