Diagnose failures with Doctor

io.github.shafthq:shaft-doctor collects explicitly selected local test evidence into a portable, redacted bundle and applies ordered deterministic rules. It does not require shaft-ai, a provider credential, or network access. The complete baseline works with pilot.ai.enabled=false.

Optional provider analysis is advisory only. It is disabled unless explicitly requested, receives only minimized already-redacted evidence, and never replaces the deterministic diagnosis, findings, confidence, or remediation.

Outputs

Each analysis writes:

  • doctor-evidence.json: versioned EvidenceBundle with checksums, provenance, relative paths, size-limit decisions, and a redaction summary;
  • doctor-report.json: the bundle plus versioned Diagnosis, cited Finding records, confidence, uncertainty, and Remediation actions, plus a separately identified advisory when provider analysis is requested;
  • doctor-report.md: a portable human-readable report and evidence index;
  • doctor-triage.json and doctor-triage.md: deterministic counts for failing attempts, retry-hidden failures, recurring signatures, primary signature, summary, and cited evidence IDs;
  • execution-intelligence.json and execution-intelligence.md: a compact execution trend summary with primary cause, confidence, hidden retry count, and recurring failure count;
  • artifacts/: approved binary evidence such as screenshots.

Reports contain no original absolute machine paths. Evidence IDs and bundle IDs are content-derived, and JSON formatting uses LF line endings so repeated analysis of identical inputs is byte stable.
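
The byte-stability property above can be illustrated with a minimal sketch. It assumes SHA-256 and a hypothetical "ev-" prefix with a 16-character digest; Doctor's real ID format and hash choice are not documented here, so treat every name as illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: derives a stable ID from normalized content. */
final class ContentIdSketch {

    /** Normalizes CRLF and lone CR to LF so the hash does not depend on the platform. */
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Returns a hypothetical "ev-" ID built from the first 16 hex characters of a SHA-256 digest. */
    static String contentId(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalizeLineEndings(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return "ev-" + hex.substring(0, 16);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the ID depends only on normalized content, the same evidence collected on Windows and Linux maps to the same ID in this sketch.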

Run Doctor

java -cp "$MCP_CP" com.shaft.mcp.ShaftMcpApplication doctor analyze --input allure-results --allowed-root "$PWD" --output-dir target/shaft-doctor

The command writes doctor-evidence.json, doctor-report.json, doctor-report.md, doctor-triage.*, and execution-intelligence.* under target/shaft-doctor. It assumes MCP_CP already holds the shaft-mcp classpath, as defined in the CLI reference.

From an MCP chat:

Use doctor_analyze_failed_allure on the allure-results directory. Allow only the current project root, write results to target/shaft-doctor, and do not collect screenshots.

CLI reference

The MCP main class exposes Doctor as a local command. On Windows, use ; instead of : as the classpath separator in MCP_CP:

MCP_CP="shaft-mcp/target/shaft-mcp-<version>.jar:shaft-mcp/target/dependency/*"
MCP_MAIN="com.shaft.mcp.ShaftMcpApplication"
java -cp "$MCP_CP" "$MCP_MAIN" doctor analyze \
--input allure-results \
--input target/shaft-logs \
--allowed-root "$PWD" \
--output-dir target/shaft-doctor \
--minimum-results 1

Every readable input must resolve under an explicit allowed root. Symlink targets are resolved before collection. The output directory must also be inside a declared root. Add repeated --history options to correlate recurring signatures from older doctor-evidence.json files.
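
A minimal sketch of the allowed-root check described above, assuming the standard java.nio.file API. The class and method names are hypothetical and this is not Doctor's actual implementation; it only covers paths that already exist, so a not-yet-created output directory would need separate handling.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Illustrative only: accepts a path when its symlink-resolved form lies under an allowed root. */
final class AllowedRootSketch {

    static boolean isUnderAllowedRoot(Path candidate, List<Path> allowedRoots) {
        try {
            // toRealPath() follows symlinks, so a link that points outside a root is rejected.
            Path real = candidate.toRealPath();
            for (Path root : allowedRoots) {
                if (real.startsWith(root.toRealPath())) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            // Missing or unreadable paths are treated as not allowed.
            return false;
        }
    }
}
```

Comparing resolved paths rather than raw strings is what keeps a symlink inside the project from exposing files outside it.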

Screenshots and page snapshots are excluded by default. Retain them only with explicit approval:

java -cp "$MCP_CP" "$MCP_MAIN" doctor analyze \
--input allure-results \
--allowed-root "$PWD" \
--output-dir target/shaft-doctor \
--include-screenshots \
--include-page-snapshots

Use --max-item-bytes and --max-bundle-bytes to lower the conservative retention limits. Use --minimum-results when the expected run size is known; an empty, malformed, truncated, or unexpectedly small Allure run is reported as incomplete and is never interpreted as successful.
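
The completeness rule can be sketched as follows. The status names and parameters are assumptions made for illustration; the point is only that an empty, damaged, or undersized run never falls through to a successful state.

```java
/** Illustrative only: an incomplete run is never reported as complete. */
final class RunCompletenessSketch {

    enum Status { COMPLETE, INCOMPLETE }

    /**
     * @param parsedResults  number of Allure result files that parsed cleanly
     * @param damagedFiles   number of malformed or truncated result files
     * @param minimumResults expected minimum, as passed with --minimum-results
     */
    static Status classify(int parsedResults, int damagedFiles, int minimumResults) {
        if (parsedResults == 0 || damagedFiles > 0 || parsedResults < minimumResults) {
            return Status.INCOMPLETE;
        }
        return Status.COMPLETE;
    }
}
```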

Optional provider advisory

Add the shaft-ai dependency when invoking DoctorAnalyzer.analyzeWithAi(...) directly. The shaft-mcp runtime classpath already includes the OpenAI, Anthropic, Gemini, and Ollama adapters. CLI provider analysis also requires --ai, and Pilot properties must independently enable the provider, processing location, model, and every submitted evidence category.

Local Ollama example:

java \
-Dpilot.ai.enabled=true \
-Dpilot.ai.provider=ollama \
-Dpilot.ai.consent.local=true \
-Dpilot.ai.allowedEvidenceCategories=TEXT,LOG,CONFIGURATION \
-Dpilot.ai.ollama.model=<local-model> \
-cp "$MCP_CP" "$MCP_MAIN" doctor analyze \
--input allure-results \
--allowed-root "$PWD" \
--output-dir target/shaft-doctor \
--ai

Ollama defaults to http://127.0.0.1:11434/api/chat. Changing the endpoint does not weaken consent, redaction, minimization, schema validation, or evidence-ID checks.

For OpenAI, Anthropic, or Gemini, select the provider and model, approve remote processing, and approve the same evidence categories. Credentials remain in the provider-specific environment variable documented in optional provider controls; they are never Doctor arguments or report fields.

Doctor submits the deterministic diagnosis, its explicit uncertainty, and only the textual evidence cited by deterministic findings. Unknown cases may submit the smallest available textual evidence set. Provider output must match the versioned shaft-doctor-advisory-1.0 schema and may contain observations, hypotheses with confidence, missing evidence, recommended actions, and limitations. References outside the submitted evidence-ID allowlist reject the entire advisory. Uncited claims and hypotheses that contradict the deterministic primary cause are visibly marked.
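
The evidence-ID allowlist rule can be sketched as below. The outcome names and the shape of the input (one citation list per claim) are assumptions, not the shaft-doctor-advisory-1.0 schema.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only: one unknown evidence ID rejects the whole advisory. */
final class AdvisoryCheckSketch {

    enum Outcome { ACCEPTED, ACCEPTED_WITH_UNCITED_CLAIMS, REJECTED }

    /**
     * @param submittedIds      evidence IDs that were actually sent to the provider
     * @param citationsPerClaim evidence IDs cited by each claim in the provider output
     */
    static Outcome check(Set<String> submittedIds, List<List<String>> citationsPerClaim) {
        boolean uncited = false;
        for (List<String> citations : citationsPerClaim) {
            if (!submittedIds.containsAll(citations)) {
                return Outcome.REJECTED;   // invented or out-of-scope reference
            }
            if (citations.isEmpty()) {
                uncited = true;            // kept, but marked for the reader
            }
        }
        return uncited ? Outcome.ACCEPTED_WITH_UNCITED_CLAIMS : Outcome.ACCEPTED;
    }
}
```

Rejecting the entire advisory rather than dropping the single bad claim keeps a partly invented response from looking trustworthy.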

Timeout, rate limit, invalid credentials, unavailable provider, malformed JSON, schema violation, oversized output, invented evidence, and budget exhaustion produce an explicit fallback advisory while retaining the complete deterministic report. Reports contain provider/model/configuration identifiers, duration, usage when available, cache state, and a safe fallback reason. They never contain credentials, raw provider responses, or hidden reasoning.

Use --ai-cache to explicitly cache successful safe structured advisories under the output directory. Cache keys include the evidence bundle checksum, deterministic diagnosis checksum, and a non-secret provider/configuration checksum. Failures and raw evidence are never cached.
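
A minimal sketch of a cache key built from the three checksums named above. The use of SHA-256 and a newline separator is an assumption; the real key derivation may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: combines three non-secret checksums into one cache key. */
final class AdvisoryCacheKeySketch {

    static String cacheKey(String bundleChecksum, String diagnosisChecksum, String providerConfigChecksum) {
        // A separator keeps ("ab", "c") and ("a", "bc") from producing the same key material.
        String material = String.join("\n", bundleChecksum, diagnosisChecksum, providerConfigChecksum);
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(material.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Any change to the evidence, the deterministic diagnosis, or the provider configuration yields a different key, so a stale advisory is not reused.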

MCP

doctor_analyze_failed_allure accepts explicit input paths, historical bundle paths, allowed roots, an output directory, screenshot/page-snapshot approvals, and the minimum expected Allure result count. The tool calls the same DoctorAnalyzer used by the CLI. It remains deterministic when Pilot AI is disabled; when the MCP server is explicitly started with an enabled provider, the same separate advisory and fallback rules apply.

Use doctor_analyze_failed_allure and doctor_suggest_fix for default Selenium/WebDriver remediation snippets. Use playwright_doctor_analyze_failed_allure and playwright_doctor_suggest_fix when the failed test is written with SHAFT.GUI.Playwright; the evidence model is the same, but the returned code blocks use Playwright assertions and actions.

ChatGPT, Codex, Claude, Gemini, and GitHub Copilot can invoke doctor_analyze_failed_allure as external MCP clients. Their model authentication stays in the client and is not ingested by SHAFT. Copilot support is MCP interoperability, not a generic Copilot API-key adapter. Download the credential-free representative invocations.

MCP healer loop

healer_run_failed_test builds on Doctor for failing Selenium tests. It reruns an allowlisted Maven test command under guardrails, snapshots the populated Allure results that changed during each attempt, and sends the fresh failing evidence to doctor_analyze_failed_allure before returning repair suggestions. Use playwright_healer_run_failed_test for SHAFT.GUI.Playwright tests; it uses the same execution guardrails and sends evidence through the Playwright Doctor tool surface.

The healer gives the MCP agent an explicit replay handoff: the agent may use its own LLM plus SHAFT MCP browser, DOM, screenshot, element, and natural-action tools to inspect WebDriver failures, or the playwright_* inspection and replay tools for Playwright failures, even when no SHAFT provider API key is configured. Configured SHAFT provider advisories remain optional and require the same Pilot consent as Doctor.

The boundary is still review-only. The healer can suggest locator, wait, test-data, assertion, or setup fixes, or report a suspected product bug, but it does not edit files, skip tests, quarantine tests, publish branches, or bypass user confirmation.

Reviewed repair proposals

Doctor repair is a separate, approval-gated workflow. propose-fix requires an exact 40-character base commit SHA, explicit repository-relative file allowlists, structured full-file patches, and tokenized Maven validation commands. It creates codex/doctor-<issue-or-session>-<proposal> in a temporary Git worktree, applies changes only there, and returns a persisted manifest with the complete unified diff, patch checksums, diagnosis/evidence references, exact validation commands, populated Allure counts, residual risk, rollback guidance, and a one-proposal approval token.
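
The input checks implied above can be sketched as follows. The names are hypothetical, and the sketch assumes lowercase hexadecimal SHA-1 object names; it is not the validation code Doctor ships.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Illustrative only: validates a full commit SHA and a repository-relative path. */
final class ProposalInputSketch {

    private static final Pattern FULL_SHA = Pattern.compile("[0-9a-f]{40}");

    /** Abbreviated SHAs, branch names, and tags are all rejected. */
    static boolean isFullCommitSha(String value) {
        return value != null && FULL_SHA.matcher(value).matches();
    }

    /** Rejects absolute paths and any path that climbs out of the repository with "..". */
    static boolean isRepositoryRelative(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        Path path = Paths.get(value);
        if (path.isAbsolute()) {
            return false;
        }
        for (Path segment : path) {
            if (segment.toString().equals("..")) {
                return false;
            }
        }
        return true;
    }
}
```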

Example repair-input.json:

{
  "patches": [
    {
      "path": "src/test/java/example/CheckoutTest.java",
      "operation": "REPLACE",
      "content": "package example;\n\nfinal class CheckoutTest {}\n",
      "rationale": "Apply the reviewed diagnosis.",
      "evidenceIds": ["allure-result-1"]
    }
  ],
  "validationCommands": [
    ["mvn", "-pl", "shaft-engine", "-am", "test", "-Dtest=CheckoutTest"],
    ["mvn", "-pl", "shaft-engine", "-am", "compile", "-DskipTests"]
  ]
}

java -cp "$MCP_CP" "$MCP_MAIN" doctor propose-fix \
--repository "$PWD" \
--base-sha <full-approved-sha> \
--diagnosis target/shaft-doctor/doctor-report.json \
--evidence-bundle target/shaft-doctor/doctor-evidence.json \
--issue 2857 \
--allowed-path src/test/java/example/CheckoutTest.java \
--repair-input repair-input.json \
--output-dir target/shaft-doctor/repairs

Only Maven compile, test, package, install, verification, Surefire/Failsafe, and JavaDoc goals are accepted. Commands are executed as argument arrays without a shell. Release, deployment, SCM, and Versions Plugin goals, shell metacharacters, and arbitrary executables are rejected. Test-running commands are forced to include -DheadlessExecution=true; a zero process exit alone is insufficient when populated passing Allure results are expected. Maven runs offline by default. CLI users must add --approve-network-validation, or MCP clients must set networkValidationApproved=true, before validation may access the network.
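
A minimal sketch of tokenized command validation, written against the rules above. The goal set, the options treated as taking a value, and the metacharacter list are assumptions chosen for illustration; Doctor's real allowlist is likely broader and stricter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: the goal and option sets below are assumptions, not Doctor's real allowlist. */
final class MavenCommandSketch {

    private static final Set<String> ALLOWED_GOALS =
            Set.of("compile", "test", "package", "install", "verify", "javadoc:javadoc");
    // Lifecycle phases at or after "test" run tests unless they are skipped.
    private static final Set<String> TEST_RUNNING_GOALS = Set.of("test", "package", "install", "verify");
    // Options whose next token is a value rather than a goal (assumed subset).
    private static final Set<String> OPTIONS_WITH_VALUE = Set.of("-pl", "--projects", "-f", "--file");
    private static final String SHELL_METACHARACTERS = ";&|<>`$\n";
    private static final String HEADLESS = "-DheadlessExecution=";

    /** Returns the command to execute, or throws when any token is not acceptable. */
    static List<String> validate(List<String> command) {
        if (command.isEmpty() || !"mvn".equals(command.get(0))) {
            throw new IllegalArgumentException("Only the mvn executable is accepted");
        }
        List<String> result = new ArrayList<>();
        result.add("mvn");
        boolean runsTests = false;
        boolean expectValue = false;
        for (String token : command.subList(1, command.size())) {
            for (char c : token.toCharArray()) {
                if (SHELL_METACHARACTERS.indexOf(c) >= 0) {
                    throw new IllegalArgumentException("Shell metacharacter in: " + token);
                }
            }
            if (expectValue) {
                expectValue = false;            // value of the previous option, e.g. a module name
            } else if (token.startsWith("-")) {
                if (token.startsWith(HEADLESS)) {
                    continue;                   // dropped here, re-added below when tests run
                }
                expectValue = OPTIONS_WITH_VALUE.contains(token);
            } else if (ALLOWED_GOALS.contains(token)) {
                runsTests |= TEST_RUNNING_GOALS.contains(token);
            } else {
                throw new IllegalArgumentException("Goal not allowed: " + token);
            }
            result.add(token);
        }
        if (runsTests) {
            result.add(HEADLESS + "true");      // forced for every test-running command
        }
        return result;
    }
}
```

The validated list can be passed to new ProcessBuilder(list), which starts the process directly from the argument array without invoking a shell.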

--ai can request an optional provider-generated patch. The provider receives only the deterministic diagnosis and exact approved regular source files under explicit TEXT and SOURCE consent. Output must match the versioned repair patch schema. Invented paths, commands, symlinks, binary or oversized content, protected workflows, generated paths, and secret-like material are rejected. Provider, consent, timeout, or schema failure returns no patch and does not create a worktree.

Publishing is always a later explicit action:

java -cp "$MCP_CP" "$MCP_MAIN" doctor publish-draft-pr \
--manifest target/shaft-doctor/repairs/repair-proposal-<id>.json \
--approval-token <token-from-reviewed-proposal> \
--approve

Failed validation blocks publication by default. An explicit --override-failed-validation also requires --override-rationale, which is recorded in the manifest and pull-request body. Publication stages only the manifested files, creates a Doctor-identified commit, pushes the dedicated branch, and creates or reuses an open draft pull request through authenticated gh. It never marks a PR ready, merges, releases, deploys, resets, cleans, or switches the user's current worktree. The temporary worktree is removed after publication or explicit cancellation; the published branch remains.
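
The publication gate can be summarized as a small decision function. This is a sketch under assumed parameter names; it only restates the conditions above and performs no Git or gh operations.

```java
/** Illustrative only: restates when a reviewed proposal may be published. */
final class PublicationGateSketch {

    static boolean mayPublish(boolean approved,
                              boolean tokenMatchesProposal,
                              boolean validationPassed,
                              boolean overrideFailedValidation,
                              String overrideRationale) {
        if (!approved || !tokenMatchesProposal) {
            return false;                       // explicit approval and the exact token are always required
        }
        if (validationPassed) {
            return true;
        }
        // Failed validation publishes only with an explicit override and a recorded rationale.
        return overrideFailedValidation
                && overrideRationale != null
                && !overrideRationale.trim().isEmpty();
    }
}
```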

The MCP equivalents are doctor_propose_fix and doctor_publish_draft_pr. The latter requires the same separate approved boolean and exact proposal token.

Evidence

The collector recognizes populated Allure *-result.json, normalized exception chains, SHAFT logs/action history, environment metadata, dependency/build metadata, configuration summaries, screenshots, and page snapshots. Text and structured JSON are redacted before retention or hashing. Password fields, authorization and cookie headers, tokens, private keys, common credential fields, and configured sensitive names are replaced without retaining their original values.
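
A minimal sketch of key-based redaction before retention. The sensitive-name list and the "[REDACTED]" placeholder are assumptions; the real redactor also handles structured JSON, headers, and private keys, which this sketch does not.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: replaces values of sensitive-looking fields without keeping the originals. */
final class RedactionSketch {

    // Assumed name fragments; a real configuration would add project-specific sensitive names.
    private static final Set<String> SENSITIVE_FRAGMENTS =
            Set.of("password", "authorization", "cookie", "token", "secret", "apikey");

    static Map<String, String> redact(Map<String, String> fields) {
        Map<String, String> redacted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            boolean sensitive = false;
            for (String fragment : SENSITIVE_FRAGMENTS) {
                if (key.contains(fragment)) {
                    sensitive = true;
                    break;
                }
            }
            redacted.put(entry.getKey(), sensitive ? "[REDACTED]" : entry.getValue());
        }
        return redacted;
    }
}
```

Hashing the redacted map rather than the original means checksums and content-derived IDs never encode a secret value.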

Allure attempts are grouped by history ID and ordered by their recorded start time. Non-final failed, broken, and skipped attempts remain visible even when the final attempt passes. Historical bundles are optional and can be copied with their relative artifacts/ directory for offline analysis.
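
The grouping and retry visibility described above can be sketched as follows. The Attempt class is a hypothetical reduction of an Allure result to its historyId, status, and start fields; it is not Doctor's evidence model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: groups attempts per test and detects failures hidden by a passing retry. */
final class AttemptGroupingSketch {

    static final class Attempt {
        final String historyId;
        final String status;   // Allure statuses such as passed, failed, broken, skipped
        final long start;      // recorded start time in epoch milliseconds

        Attempt(String historyId, String status, long start) {
            this.historyId = historyId;
            this.status = status;
            this.start = start;
        }
    }

    /** Groups by history ID and orders each group by recorded start time. */
    static Map<String, List<Attempt>> groupByHistory(List<Attempt> attempts) {
        Map<String, List<Attempt>> grouped = new TreeMap<>();
        for (Attempt attempt : attempts) {
            grouped.computeIfAbsent(attempt.historyId, key -> new ArrayList<>()).add(attempt);
        }
        for (List<Attempt> group : grouped.values()) {
            group.sort(Comparator.comparingLong((Attempt a) -> a.start));
        }
        return grouped;
    }

    /** True when the final attempt passed but an earlier attempt did not. */
    static boolean hasRetryHiddenFailure(List<Attempt> orderedAttempts) {
        if (orderedAttempts.size() < 2) {
            return false;
        }
        Attempt last = orderedAttempts.get(orderedAttempts.size() - 1);
        if (!"passed".equals(last.status)) {
            return false;
        }
        for (Attempt earlier : orderedAttempts.subList(0, orderedAttempts.size() - 1)) {
            if (!"passed".equals(earlier.status)) {
                return true;
            }
        }
        return false;
    }
}
```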

Diagnosis

The ordered rule engine classifies primary and contributing causes as:

  • PRODUCT
  • TEST
  • LOCATOR
  • DATA
  • TIMING_SYNCHRONIZATION
  • ENVIRONMENT_CONFIGURATION
  • INFRASTRUCTURE
  • UNKNOWN

Rules cover locator-not-found, duplicate, stale, hidden/covered/interactable, frame/window context, assertion and test-data mismatches, timeout symptoms, driver/browser startup, Grid/Appium/network/filesystem/resource failures, setup/cleanup failures, parallel shared-state symptoms, retry-hidden failures, and recurring historical signatures. Every inference cites evidence IDs and is kept separate from observations. Unknown and contradictory cases remain unknown and list the missing evidence needed to narrow them.
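
An ordered rule engine of this kind can be sketched as a first-match-wins list. The three rules and their mapping from Selenium exception names to causes are assumptions for illustration; the real rules inspect cited evidence, not a single message string.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative only: the first matching rule decides the cause, otherwise UNKNOWN. */
final class OrderedRulesSketch {

    enum Cause {
        PRODUCT, TEST, LOCATOR, DATA, TIMING_SYNCHRONIZATION,
        ENVIRONMENT_CONFIGURATION, INFRASTRUCTURE, UNKNOWN
    }

    static final class Rule {
        final Cause cause;
        final Predicate<String> matches;

        Rule(Cause cause, Predicate<String> matches) {
            this.cause = cause;
            this.matches = matches;
        }
    }

    // Order matters: more specific symptoms come before generic ones.
    private static final List<Rule> RULES = List.of(
            new Rule(Cause.LOCATOR, message -> message.contains("NoSuchElementException")),
            new Rule(Cause.INFRASTRUCTURE, message -> message.contains("SessionNotCreatedException")),
            new Rule(Cause.TIMING_SYNCHRONIZATION, message -> message.contains("TimeoutException")));

    static Cause classify(String exceptionMessage) {
        for (Rule rule : RULES) {
            if (rule.matches.test(exceptionMessage)) {
                return rule.cause;
            }
        }
        return Cause.UNKNOWN;   // no guess is made when nothing matches
    }
}
```

Because the list is fixed and evaluated in order, identical evidence always produces the same primary cause.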

Sharing

Review doctor-evidence.json, doctor-report.json, and any approved artifacts/ before sharing. Screenshots and page source can contain personal or confidential data even after deterministic redaction and therefore remain opt-in. Doctor never uploads evidence automatically.

Validation

Run from the repository root:

mvn -pl shaft-doctor,shaft-ai,shaft-mcp -am test
mvn -pl shaft-doctor -am javadoc:javadoc