Diagnose failures with Doctor

Name: SHAFT Engine
Author: ShaftHQ

io.github.shafthq:shaft-doctor collects explicitly selected local test evidence into a portable, redacted bundle and applies ordered deterministic rules. It does not require shaft-ai, a provider credential, or network access. The complete baseline works with pilot.ai.enabled=false.

Optional provider analysis is advisory only. It is disabled unless explicitly requested, receives only minimized already-redacted evidence, and never replaces the deterministic diagnosis, findings, confidence, or remediation.

Outputs

Each analysis writes:

doctor-evidence.json: versioned EvidenceBundle with checksums, provenance, relative paths, size-limit decisions, and a redaction summary;
doctor-report.json: the bundle plus versioned Diagnosis, cited Finding records, confidence, uncertainty, and Remediation actions, plus a separately identified advisory when provider analysis is requested;
doctor-report.md: a portable human-readable report and evidence index, including a Ranked Root Causes section listing each candidate cause with its trust percentage, rationale, and a copy/paste-ready fix prompt;
doctor-triage.json and doctor-triage.md: deterministic counts for failing attempts, retry-hidden failures, recurring signatures, primary signature, summary, and cited evidence IDs;
execution-intelligence.json and execution-intelligence.md: a compact execution trend summary with primary cause, confidence, hidden retry count, and recurring failure count;
artifacts/: approved binary evidence such as screenshots.

Reports contain no original absolute machine paths. Evidence IDs and bundle IDs are content-derived, and JSON formatting uses LF line endings so repeated analysis of identical inputs is byte stable.
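
The byte-stability property above can be illustrated with a minimal sketch, assuming a SHA-256 digest over LF-normalized, already-redacted text. The class name ContentDerivedId, the evidence- prefix, and the 16-character truncation are hypothetical choices for illustration, not Doctor's actual ID format. HexFormat requires Java 17 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: derives an ID from content only, never from paths or timestamps.
final class ContentDerivedId {
    private ContentDerivedId() {}

    /** Normalizes CRLF and lone CR to LF so the same text hashes identically on every platform. */
    static String normalizeLineEndings(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Returns an illustrative ID: a fixed prefix plus the first 16 hex characters of SHA-256. */
    static String evidenceId(String redactedText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(
                    normalizeLineEndings(redactedText).getBytes(StandardCharsets.UTF_8));
            return "evidence-" + HexFormat.of().formatHex(hash).substring(0, 16);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}
```

Because the ID depends only on normalized content, the same input collected on Windows and Linux maps to the same ID in this sketch.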

Run Doctor

Use the Doctor command documented on Connect shaft-mcp. The canonical doctor analyze command and its --allowed-root option are documented there.

The command writes doctor-evidence.json, doctor-report.json, doctor-report.md, doctor-triage.*, and execution-intelligence.* under target/shaft-doctor.

Doctor input is flexible: point it at an allure-results directory, an individual populated *-result.json file, or a SHAFT single-file Allure HTML report (AllureReport.html or a timestamped variant such as AllureReport-20260713-101500.html). Doctor extracts the embedded results from the HTML report the same way it reads a results directory. When no path is given, Doctor auto-discovers the newest evidence in the workspace: the most recently populated allure-results directory, or otherwise the newest AllureReport.html. The SHAFT Assistant routes natural-language phrasing such as "diagnose my last run" or "why did my tests fail" through the same auto-discovery.

From an MCP chat:

Use doctor_analyze_failed_allure on the allure-results directory. Allow only the current project root, write results to target/shaft-doctor, and do not collect screenshots.

CLI reference

The MCP main class exposes Doctor as a local command. Keep runnable MCP commands on the MCP command reference so classpath, Windows separator, and local/remote setup guidance stay in one place.

Every readable input must resolve under an explicit allowed root. Symlink targets are resolved before collection. The output directory must also be inside a declared root. Add repeated --history options to correlate recurring signatures from older doctor-evidence.json files.
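
The allowed-root rule above can be sketched as follows. This is a minimal illustration, assuming symlinks are resolved with Path.toRealPath before the containment check. The class and method names are hypothetical and are not part of Doctor's API.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: fail-closed containment check against declared roots.
final class AllowedRoots {
    private AllowedRoots() {}

    /**
     * Returns true only when the input, after symlink resolution, sits under one of the roots.
     * Both sides are resolved so a link inside a root cannot point outside it unnoticed.
     */
    static boolean isUnderAllowedRoot(Path input, List<Path> allowedRoots) {
        Path realInput;
        try {
            realInput = input.toRealPath();
        } catch (IOException e) {
            return false; // Missing or unreadable input is rejected rather than guessed at.
        }
        for (Path root : allowedRoots) {
            try {
                if (realInput.startsWith(root.toRealPath())) {
                    return true;
                }
            } catch (IOException e) {
                // An unresolvable root grants nothing; keep checking the remaining roots.
            }
        }
        return false;
    }
}
```

Comparing resolved paths with Path.startsWith works on whole path segments, so a root of /work/project does not accidentally admit /work/project-other.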

Screenshots and page snapshots are excluded by default. Retain them only with explicit approval through the Doctor command options documented on Connect shaft-mcp.

Use --max-item-bytes and --max-bundle-bytes to lower the conservative retention limits. Use --minimum-results when the expected run size is known; an empty, malformed, truncated, or unexpectedly small Allure run is reported as incomplete and is never interpreted as successful.

Optional provider advisory

Add the shaft-ai module as a dependency when invoking DoctorAnalyzer.analyzeWithAi(...) directly. The shaft-mcp runtime classpath already includes the OpenAI, Anthropic, Gemini, and Ollama adapters. CLI provider analysis also requires --ai. In addition, Pilot properties must independently enable the provider, processing location, model, and every submitted evidence category.

Local Ollama follows the same Doctor command path after Pilot properties enable local advisory analysis.

Ollama defaults to http://127.0.0.1:11434/api/chat. Changing the endpoint does not weaken consent, redaction, minimization, schema validation, or evidence-ID checks.

For OpenAI, Anthropic, or Gemini, select the provider and model, approve remote processing, and approve the same evidence categories. Credentials remain in the provider-specific environment variable documented in optional provider controls; they are never Doctor arguments or report fields.

Doctor submits the deterministic diagnosis, its explicit uncertainty, and only the textual evidence cited by deterministic findings. Unknown cases may submit the smallest available textual evidence set. Provider output must match the versioned shaft-doctor-advisory-1.0 schema and may contain observations, hypotheses with confidence, missing evidence, recommended actions, and limitations. References outside the submitted evidence-ID allowlist reject the entire advisory. Uncited claims and hypotheses that contradict the deterministic primary cause are visibly marked.
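
The evidence-ID allowlist check described above can be sketched like this. It is a minimal illustration under assumptions: the Claim record and both method names are hypothetical and do not reflect the shaft-doctor-advisory-1.0 schema. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the allowlist rule: one invented reference rejects everything.
final class AdvisoryAllowlist {
    private AdvisoryAllowlist() {}

    /** Illustrative advisory claim: some text plus the evidence IDs it cites. */
    record Claim(String text, List<String> citedEvidenceIds) {}

    /** Returns false when any claim cites an ID that was never submitted to the provider. */
    static boolean isAcceptable(List<Claim> claims, Set<String> submittedEvidenceIds) {
        return claims.stream()
                .flatMap(claim -> claim.citedEvidenceIds().stream())
                .allMatch(submittedEvidenceIds::contains);
    }

    /** Uncited claims are not rejected, but a caller can flag them for the reader. */
    static boolean isUncited(Claim claim) {
        return claim.citedEvidenceIds().isEmpty();
    }
}
```

Rejecting the whole advisory on a single unknown ID is the stricter of two possible designs; dropping only the offending claim would let a partially invented advisory through.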

Timeout, rate limit, invalid credentials, unavailable provider, malformed JSON, schema violation, oversized output, invented evidence, and budget exhaustion produce an explicit fallback advisory while retaining the complete deterministic report. Reports contain provider/model/configuration identifiers, duration, usage when available, cache state, and a safe fallback reason. They never contain credentials, raw provider responses, or hidden reasoning.

Use --ai-cache to explicitly cache successful safe structured advisories under the output directory. Cache keys include the evidence bundle checksum, deterministic diagnosis checksum, and a non-secret provider/configuration checksum. Failures and raw evidence are never cached.

MCP

doctor_analyze_failed_allure accepts explicit input paths, historical bundle paths, allowed roots, an output directory, screenshot/page-snapshot approvals, and the minimum expected Allure result count. The tool calls the same DoctorAnalyzer used by the CLI. It remains deterministic when Pilot AI is disabled; when the MCP server is explicitly started with an enabled provider, the same separate advisory and fallback rules apply.

doctor_analyze_failed_allure and doctor_suggest_fix default to Selenium/WebDriver remediation snippets. Pass backend=playwright to either tool when the failed test is written with SHAFT.GUI.Playwright. This option absorbs the former playwright_doctor_analyze_failed_allure and playwright_doctor_suggest_fix tool names. The evidence model is the same, but the returned code blocks use Playwright assertions and actions.

doctor_suggest_fix caps its remediation output at the top 5 ranked causes and tags each block with its category and trust score, for example LOCATOR (trust 82%), so an MCP client (including the IntelliJ Assistant's /doctor command) can present causes in trust order without re-deriving the ranking itself.

ChatGPT, Codex, Claude, Gemini, and GitHub Copilot can invoke doctor_analyze_failed_allure as external MCP clients. Their model authentication stays in the client and is not ingested by SHAFT. The Copilot integration is MCP interoperability, not a generic Copilot API-key adapter. Download the credential-free representative invocations.

MCP healer loop

healer_run_failed_test builds on Doctor for failing Selenium tests. It reruns an allowlisted Maven test command under guardrails, snapshots the populated Allure results that changed during each attempt, and sends the fresh failing evidence to doctor_analyze_failed_allure before returning repair suggestions. Pass backend=playwright for SHAFT.GUI.Playwright tests (absorbing the former playwright_healer_run_failed_test tool name); it uses the same execution guardrails and sends evidence through the same Doctor tool surface.

The healer gives the MCP agent an explicit replay handoff: the agent may use its own LLM plus the same SHAFT MCP browser, DOM, screenshot, and element inspection/replay tools to inspect either WebDriver or Playwright failures, dispatching to whichever engine the session's driver_initialize call selected, even when no SHAFT provider API key is configured. Configured SHAFT provider advisories remain optional and require the same Pilot consent as Doctor.

The boundary is still review-only. The healer can suggest locator, wait, test-data, assertion, or setup fixes, or report a suspected product bug, but it does not edit files, skip tests, quarantine tests, publish branches, or bypass user confirmation.

Reviewed repair proposals

Doctor repair is a separate, approval-gated workflow. propose-fix requires an exact 40-character base commit SHA, explicit repository-relative file allowlists, structured full-file patches, and tokenized Maven validation commands. It creates codex/doctor-<issue-or-session>-<proposal> in a temporary Git worktree, applies changes only there, and returns a persisted manifest with the complete unified diff, patch checksums, diagnosis/evidence references, exact validation commands, populated Allure counts, residual risk, rollback guidance, and a one-proposal approval token.
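
The input constraints above (an exact 40-character SHA, repository-relative paths, and the documented branch pattern) can be sketched as simple checks. This is a minimal illustration: the class name, the method names, and the choice to accept only lowercase hexadecimal are assumptions, not Doctor's actual validation.

```java
import java.util.regex.Pattern;

// Hypothetical pre-flight checks for a repair proposal request.
final class ProposalInputs {
    // Assumption: a full SHA-1 commit ID written as 40 lowercase hex characters.
    private static final Pattern FULL_SHA = Pattern.compile("[0-9a-f]{40}");

    private ProposalInputs() {}

    /** Abbreviated SHAs, branch names, and tags are all rejected. */
    static boolean isExactBaseCommit(String sha) {
        return sha != null && FULL_SHA.matcher(sha).matches();
    }

    /** Accepts only forward-slash relative paths with no parent or empty segments. */
    static boolean isRepositoryRelative(String path) {
        if (path == null || path.isEmpty() || path.startsWith("/")
                || path.contains("\\") || path.contains(":")) {
            return false;
        }
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals("..")) {
                return false;
            }
        }
        return true;
    }

    /** Builds the documented codex/doctor-<issue-or-session>-<proposal> branch name. */
    static String branchName(String issueOrSession, String proposal) {
        return "codex/doctor-" + issueOrSession + "-" + proposal;
    }
}
```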

Example repair-input.json:

{
  "patches": [
    {
      "path": "src/test/java/example/CheckoutTest.java",
      "operation": "REPLACE",
      "content": "package example;\n\nfinal class CheckoutTest {}\n",
      "rationale": "Apply the reviewed diagnosis.",
      "evidenceIds": ["allure-result-1"]
    }
  ],
  "validationCommands": [
    ["mvn", "-pl", "shaft-engine", "-am", "test", "-Dtest=CheckoutTest"],
    ["mvn", "-pl", "shaft-engine", "-am", "compile", "-DskipTests"]
  ]
}

Run the reviewed repair command from Connect shaft-mcp after preparing the reviewed input.

Only Maven compile, test, package, install, verification, Surefire/Failsafe, and JavaDoc goals are accepted. Commands are executed as argument arrays without a shell. Release, deployment, SCM, and Versions Plugin goals are rejected, as is any input containing shell metacharacters or naming an arbitrary executable. Test-running commands are forced to include -DheadlessExecution=true. A zero process exit alone is insufficient when populated passing Allure results are expected. Maven runs offline by default: CLI users must add --approve-network-validation, and MCP clients must set networkValidationApproved=true, before validation may access the network.
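
The command rules above can be illustrated with a minimal sketch of a tokenized allowlist. Every set below (goals, flags, options that take a value, test-running goals) is an illustrative assumption standing in for the documented categories, and the class name is hypothetical. Doctor's real validator is likely stricter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical validator for one tokenized Maven command.
final class ValidationCommands {
    // Assumptions: small illustrative allowlists, not the actual accepted sets.
    private static final Set<String> ALLOWED_GOALS =
            Set.of("compile", "test", "package", "install", "verify", "javadoc:javadoc");
    private static final Set<String> TEST_RUNNING_GOALS =
            Set.of("test", "package", "install", "verify");
    private static final Set<String> ALLOWED_FLAGS = Set.of("-am", "-B", "-q", "-o");
    private static final Set<String> OPTIONS_WITH_VALUE = Set.of("-pl", "--projects");
    private static final Pattern SHELL_METACHARACTERS = Pattern.compile("[;&|<>`$(){}\\n\\r]");
    private static final String HEADLESS = "-DheadlessExecution=true";

    private ValidationCommands() {}

    /** Validates the tokens and appends the headless flag to test-running commands. */
    static List<String> normalize(List<String> command) {
        if (command.isEmpty() || !"mvn".equals(command.get(0))) {
            throw new IllegalArgumentException("Only tokenized mvn commands are accepted");
        }
        List<String> result = new ArrayList<>(command);
        boolean runsTests = false;
        boolean expectValue = false;
        for (String token : command.subList(1, command.size())) {
            if (SHELL_METACHARACTERS.matcher(token).find()) {
                throw new IllegalArgumentException("Shell metacharacter in: " + token);
            }
            if (expectValue) { // Value of the previous option, for example a module name.
                expectValue = false;
                continue;
            }
            if (OPTIONS_WITH_VALUE.contains(token)) {
                expectValue = true;
                continue;
            }
            if (token.startsWith("-D") || ALLOWED_FLAGS.contains(token)) {
                continue;
            }
            if (!ALLOWED_GOALS.contains(token)) {
                throw new IllegalArgumentException("Rejected goal or option: " + token);
            }
            runsTests |= TEST_RUNNING_GOALS.contains(token);
        }
        if (runsTests && !result.contains(HEADLESS)) {
            result.add(HEADLESS);
        }
        return List.copyOf(result);
    }
}
```

Because the command stays a list of tokens and is never joined into a shell string, the metacharacter check is a second line of defense rather than the only one.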

--ai can request an optional provider-generated patch. The provider receives only the deterministic diagnosis and exact approved regular source files under explicit TEXT and SOURCE consent. Output must match the versioned repair patch schema. Invented paths, commands, symlinks, binary or oversized content, protected workflows, generated paths, and secret-like material are rejected. Provider, consent, timeout, or schema failure returns no patch and does not create a worktree.

Publishing is always a later explicit action using the matching command from Connect shaft-mcp.

Failed validation blocks publication by default. An explicit --override-failed-validation also requires --override-rationale, which is recorded in the manifest and pull-request body. Publication stages only the manifested files, creates a Doctor-identified commit, pushes the dedicated branch, and creates or reuses an open draft pull request through authenticated gh. It never marks a PR ready, merges, releases, deploys, resets, cleans, or switches the user's current worktree. The temporary worktree is removed after publication or explicit cancellation; the published branch remains.

The MCP equivalents are doctor_propose_fix and doctor_publish_draft_pr. The latter requires the same separate approved boolean and exact proposal token.

Evidence

The collector recognizes populated Allure *-result.json, normalized exception chains, SHAFT logs/action history, environment metadata, dependency/build metadata, configuration summaries, shaft-diagnostics.zip attachments, screenshots, and page snapshots. When a failed SHAFT test attaches shaft-diagnostics.zip, Doctor reads only diagnostics.json from the archive and treats it as sanitized SHAFT log evidence for deterministic rules and MCP handoff. Text and structured JSON are redacted before retention or hashing. Password fields, authorization and cookie headers, tokens, private keys, common credential fields, and configured sensitive names are replaced without retaining their original values.
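
Redaction before retention or hashing can be sketched as below. This is a minimal illustration, assuming regex-based replacement. The field-name list is a small illustrative subset, the [REDACTED] placeholder is an assumption, and the class name is hypothetical. Doctor's actual redaction rules are not shown in this document.

```java
import java.util.regex.Pattern;

// Hypothetical redactor: replaces sensitive values without keeping the originals.
final class Redactor {
    // Assumption: an illustrative subset of sensitive JSON field names.
    private static final Pattern SENSITIVE_JSON_FIELD = Pattern.compile(
            "(\"(?:password|token|authorization|cookie|apiKey|secret)\"\\s*:\\s*)"
                    + "\"(?:[^\"\\\\]|\\\\.)*\"",
            Pattern.CASE_INSENSITIVE);
    // Matches whole Authorization/Cookie header lines in plain-text logs.
    private static final Pattern AUTH_HEADER = Pattern.compile(
            "(?im)^((?:authorization|cookie)[ \\t]*:[ \\t]*).*$");

    private Redactor() {}

    /** Keeps the field or header name and replaces only its value. */
    static String redact(String text) {
        String withoutFields =
                SENSITIVE_JSON_FIELD.matcher(text).replaceAll("$1\"[REDACTED]\"");
        return AUTH_HEADER.matcher(withoutFields).replaceAll("$1[REDACTED]");
    }
}
```

Running the redactor before hashing matters: a checksum over unredacted text could otherwise be used to confirm a guessed secret.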

Allure attempts are grouped by history ID and ordered by their recorded start time. Non-final failed, broken, and skipped attempts remain visible even when the final attempt passes. Historical bundles are optional and can be copied with their relative artifacts/ directory for offline analysis.
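
The grouping rule above can be sketched as follows. This is a minimal illustration: the Attempt record is a hypothetical reduced view of an Allure result (its historyId, start, and status fields), and the class and method names are not Doctor's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical detector for failures hidden behind a passing retry.
final class AttemptGrouping {
    private AttemptGrouping() {}

    /** Illustrative minimal view of one Allure result. */
    record Attempt(String historyId, long startMillis, String status) {}

    /** Groups attempts by history ID, each group ordered by recorded start time. */
    static Map<String, List<Attempt>> groupByHistory(List<Attempt> attempts) {
        return attempts.stream()
                .sorted(Comparator.comparingLong(Attempt::startMillis))
                .collect(Collectors.groupingBy(
                        Attempt::historyId, TreeMap::new, Collectors.toList()));
    }

    /** Returns non-final attempts that did not pass although the final attempt passed. */
    static List<Attempt> retryHiddenFailures(List<Attempt> attempts) {
        List<Attempt> hidden = new ArrayList<>();
        for (List<Attempt> group : groupByHistory(attempts).values()) {
            Attempt last = group.get(group.size() - 1);
            if (!"passed".equals(last.status())) {
                continue; // A failing final attempt is visible already, not hidden.
            }
            for (Attempt earlier : group.subList(0, group.size() - 1)) {
                if (!"passed".equals(earlier.status())) {
                    hidden.add(earlier);
                }
            }
        }
        return hidden;
    }
}
```

A TreeMap keeps the group order independent of input order, which fits the deterministic output the rest of the report aims for.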

Diagnosis

The ordered rule engine classifies primary and contributing causes as:

PRODUCT
TEST
LOCATOR
DATA
TIMING_SYNCHRONIZATION
ENVIRONMENT_CONFIGURATION
INFRASTRUCTURE
UNKNOWN

Rules cover locator-not-found, duplicate, stale, hidden/covered/interactable, frame/window context, assertion and test-data mismatches, timeout symptoms, driver/browser startup, Grid/Appium/network/filesystem/resource failures, setup/cleanup failures, parallel shared-state symptoms, retry-hidden failures, and recurring historical signatures. Every inference cites evidence IDs and is kept separate from observations. Unknown and contradictory cases remain unknown and list the missing evidence needed to narrow them.

Each ranked cause additionally carries a deterministic trust percentage (5-95%) computed by a factor model. The model combines a confidence band for the matched rule, the evidence citations backing it, and rule precedence when multiple rules match the same signature, then subtracts a contradiction penalty when another finding disagrees. The same evidence always yields the same trust score. The Ranked Root Causes section of doctor-report.md lists causes highest-trust first, each with its rationale and a fenced, copy/paste-ready fix prompt that can be sent to any AI assistant or agent to apply the fix.
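
The shape of such a factor model can be sketched as below. Only the 5-95% range and the four factor names come from the description above. Every weight, the citation cap, and the class and parameter names are illustrative assumptions, not Doctor's actual formula.

```java
// Hypothetical factor model: integer arithmetic only, so results are reproducible.
final class TrustScore {
    private TrustScore() {}

    /**
     * Combines the factors into a percentage clamped to the documented 5-95 range.
     * Assumption: each citation adds 5 points, capped at three citations.
     */
    static int trustPercent(int confidenceBandBase, int citationCount,
                            int precedenceBonus, int contradictionPenalty) {
        int citationBonus = Math.min(Math.max(citationCount, 0), 3) * 5;
        int raw = confidenceBandBase + citationBonus + precedenceBonus - contradictionPenalty;
        return Math.max(5, Math.min(95, raw));
    }
}
```

With no clock, randomness, or floating-point rounding involved, identical inputs always produce the identical score, which is the property the ranking relies on.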

Sharing

Review doctor-evidence.json, doctor-report.json, and any approved artifacts/ before sharing. Screenshots and page source can contain personal or confidential data even after deterministic redaction and therefore remain opt-in. Doctor never uploads evidence automatically.

Validation

Run from the repository root:

mvn -pl shaft-doctor,shaft-ai,shaft-mcp -am test
mvn -pl shaft-doctor -am javadoc:javadoc
