How SHAFT reduces flakiness

Name: SHAFT Engine
Author: ShaftHQ

SHAFT reduces flakiness by moving timing, locator, retry, and evidence capture out of individual tests and into the engine. It does not make every failure pass: product defects, broken test data, and real environment outages should still fail. The difference is that common UI timing and locator churn have one configured path instead of many hand-written sleeps and custom retry loops.

Semantic Locators

Raw Selenium locators often bind tests to implementation details: generated IDs, CSS classes, absolute XPath, or DOM depth. SHAFT gives tests a higher level locator vocabulary:

SHAFT.GUI.Locator.inputField("Email") resolves inputs by user-facing signals such as placeholders, ARIA labels, IDs, adjacent labels, and nearby text.
SHAFT.GUI.Locator.clickableField("Sign in") resolves buttons, links, submit inputs, ARIA-labelled controls, role-based controls, and visible text.
driver.element().type("Email", value) and driver.element().click("Sign in") are intent-shaped overloads that route through those Smart Locators in both WebDriver and Playwright sessions.
ARIA role locators let tests target semantic roles and text instead of brittle markup structure.
driver.act(...) can express a guarded workflow intent, then SHAFT plans it into ordinary browser, element, and touch actions only when natural actions are enabled and the plan passes the configured trust threshold.

SemanticLocators.java
driver.element()
        .type("Email", "qa@example.com")
        .type("Search", "invoice 1001")
        .click("Apply filter");

By search = SHAFT.GUI.Locator.inputField("Search");
By save = SHAFT.GUI.Locator.clickableField("Save");

driver.element()
        .type(search, "invoice 1001")
        .click(save);

The practical effect is simple: when a team renames a CSS class or wraps a button in a new container, the test can keep describing what the user sees. When the user-facing label itself changes, the failing test points to a real product or requirements change instead of a hidden selector detail.

Automatic Synchronization

Most flaky UI tests fail because they click before the browser is ready. SHAFT element actions use a configured fluent wait, poll for retriable Selenium states, scroll the element into view, and then perform the action. Browser and element actions also call lazy-loading synchronization where applicable: SHAFT waits for document readiness, active fetch/XHR quiet time, resource timing changes, jQuery activity, and Angular readiness when those signals exist on the page. For native user actions, SHAFT re-locates the element after this synchronization and verifies that the final target is displayed and enabled before acting on it.

The default element lookup budget comes from defaultElementIdentificationTimeout; condition-specific waits use waitForUiStateTimeout unless you pass a shorter Duration. Lazy-loading synchronization uses waitForLazyLoadingTimeout; its network quiet behavior can be tuned with lazyLoadingNetworkIdleInitialObservationMillis and lazyLoadingNetworkIdleQuietWindowMillis.

Lazy-loading readiness itself is layered, so the default wait adapts to what a page actually does without any test-code changes:

JS readiness engine (default, always on) polls document.readyState, in-flight fetch/XHR/sendBeacon activity, resource-timing changes, jQuery and Angular activity, and a main-thread idle probe (requestIdleCallback → requestAnimationFrame → setTimeout fallback) every lazyLoadingPollingIntervalMillis (default 200ms). Long-lived EventSource/WebSocket connections are observed but never block idle, so SSE and long-poll pages don't pin every action at the timeout ceiling.
DOM-stability quiet window (opt-in, off by default) adds a MutationObserver-backed check: once enabled, readiness also requires lazyLoadingDomStabilityQuietWindowMillis of DOM-mutation silence. It defaults to 0 (disabled) because carousels, tickers, and React re-renders would otherwise pin every action at the 30-second ceiling; turn it on only for pages you know settle and stay settled.
Advisory BiDi network layer (gated on platform.enableBiDi()) tracks in-flight requests via WebDriver BiDi network events when BiDi is enabled. This signal is advisory only: it can extend the quiet window while requests are genuinely in flight (aged out after 10 seconds; SSE/WebSocket upgrades excluded), but it never hard-gates readiness, preserving the same tolerance for long-poll and SSE-style pages as the JS-only path.

src/main/resources/properties/custom.properties
lazyLoadingPollingIntervalMillis=200
lazyLoadingDomStabilityQuietWindowMillis=0
waitForLazyLoadingTimeout=30

For content that only loads on scroll — infinite lists, or sections behind an IntersectionObserver that hydrate only once visible — call scrollToLoadAll() right before a full-page assertion or screenshot. It sweeps the page in bounded, viewport-height steps (up to lazyLoadingScrollSweepMaxSteps, default 20), waiting for lazy-loading readiness between each step, then restores the original scroll position. It is never invoked automatically by any readiness wait — sweeping an entire page is not a safe default before an arbitrary action — so call it explicitly:

ScrollToLoadAll.java
driver.browser().navigateToURL("https://example.test/catalog")
        .scrollToLoadAll()
        .captureScreenshot();

ExplicitStateWait.java
driver.browser().navigateToURL("https://example.test/orders")
        .and().element().click("Refresh orders")
        .waitUntil(webDriver ->
                webDriver.findElement(By.id("order-count")).getText().equals("25"));

Use explicit waits for business states the browser cannot infer: a queue finishing, a toast disappearing, a calculated value appearing, or a backend job reaching a terminal state. Do not add sleeps around SHAFT actions by default; that duplicates the engine wait and makes failures slower without making them more accurate.

For strict WebDriver form-entry checks, enable forceCheckTextWasTypedCorrectly=true. SHAFT then verifies the final value after type, typeSecure, and typeAppend; special key sequences are skipped because they do not map to a stable literal field value.

Retry With Evidence

Retry is a diagnostic boundary, not a cure. By default, retryMaximumNumberOfAttempts=0, so SHAFT does not hide failures unless you opt in. When retries are enabled, SHAFT's TestNG retry analyzer and JUnit extension use the same retry budget. If forceCaptureSupportingEvidenceOnRetry=true, the retry attempt turns on richer evidence such as video, animated GIF, WebDriver logs, page source on failure, and Playwright tracing when retry-only tracing is enabled.

retryMaximumNumberOfAttempts means additional attempts after the original failure: 1 allows one retry, for two executions total. TestNG delegates this to its retry analyzer. JUnit schedules the retry after the failed attempt's @AfterEach lifecycle has finished, then runs the retry through JUnit again so @BeforeEach setup is executed for the retry attempt. Keep browser/session creation in setup and cleanup in teardown so each retry starts from isolated state.

src/main/resources/properties/custom.properties
retryMaximumNumberOfAttempts=1
forceCaptureSupportingEvidenceOnRetry=true
playwright.tracing.onRetryOnly=true

Keep the retry budget small. A pass-after-retry is still a flaky signal that should be reviewed; the value is that the retry attempt carries enough evidence to tell timing, locator, environment, and product failures apart.

Flake Profiler

The flake profiler is disabled by default. Enable it when you need a run-level and per-test view of where time is going during element actions, assertions, wait polling, retries, and evidence capture.

src/main/resources/properties/custom.properties
shaft.flakeProfiler.enabled=true
shaft.flakeProfiler.attachPerTest=true
shaft.flakeProfiler.failOnSevereFlakeRisk=false
shaft.flakeProfiler.slowActionThresholdMs=2000

The Allure attachments show slow actions, wait-heavy actions, locator lookup counts, match counts, stale element retries, healing attempts, retry history, and evidence costs. Element action duration excludes screenshot capture and report attachment time, while assertion and verification duration is measured around the validation step itself.

Leave shaft.flakeProfiler.failOnSevereFlakeRisk=false during investigation. Use the JSON profile to set a realistic shaft.flakeProfiler.slowActionThresholdMs before making severe flake risk fail the run.

Optional Self Healing

For web locator churn that survives the locator strategy above, SHAFT Heal can recover eligible locator-not-found failures. It is optional, disabled by default, and provided by the separate io.github.shafthq:shaft-heal artifact.

src/main/resources/properties/custom.properties
healing.strategy=shaft-heal
healing.minimumTrustPercentage=85
healing.ambiguityMargin=0.10

SHAFT first tries the original locator. Recovery runs only after a web locator-not-found result, and the action proceeds only when the provider returns exactly one validated element. The provider scores deterministic evidence such as accessibility names, labels, configured test IDs, stable IDs/names, semantic attributes, DOM fingerprints, native state, ancestor context, and bounded local history. Low trust, ties, changed frame locators, and changed shadow-host locators preserve the original failure.

Use healing as a safety net while updating locators, not as a reason to ignore broken tests. Reports under target/shaft-heal/reports and Allure attachments show why a candidate was accepted or rejected.

What To Use First

Flakiness source	First SHAFT feature to use	Why
Generated IDs, wrapper markup, or CSS churn	Smart Locators and ARIA locators	Tests follow user-facing meaning instead of DOM implementation.
Clicks before render, XHR, or framework settling	Built-in synchronization and explicit waits	Timing policy lives in one engine path.
Unknown action or assertion slowdown	Flake profiler	Allure shows action, wait, retry, and evidence timings separately.
Intermittent CI browser or infrastructure blips	Small retry budget with retry evidence	The suite can classify a transient failure without losing the original signal.
A known web locator changed after a release	SHAFT Heal	Recovery is bounded, trust-gated, and reported.
Hard-to-triage failures	Allure evidence, Doctor, and retry diagnostics	The failure carries screenshots, logs, source snapshots, and retry context.

Semantic Locators​

Automatic Synchronization​

Retry With Evidence​

Flake Profiler​

Optional Self Healing​

What To Use First​

Related​