QA for AI-drafted SDLCs.
A multi-app QA platform that survives AI-speed shipping. HTTP and API-based invariant tests guard system-wide rules. Self-healing absorbs UI drift on what UI coverage remains. A backend orchestrator runs everything with proper queueing, test-account locking, and a dashboard that surfaces the operating model (invariants, components, ownership, coverage).
- Playwright
- Cypress
- Node.js / TypeScript
- Postgres / pgvector
- BullMQ
- Jenkins
- Docker
- Next.js 15
- OpenAI (self-healing)
01 / the shift
Traditional automation (Cypress, Playwright scripted E2E) was built for a slow SDLC. Scripts are tightly coupled to implementation: selectors, URLs, DOM structure, form fields. When code moves slowly, this coupling is fine. Maintenance is a small tax paid once a week.
In an AI-drafted SDLC the math breaks. Idea, AI drafts the spec, AI writes the code, developer reviews, ship. Cycle times collapse from weeks to hours. Selectors shift, URLs move, DOM structure is reshaped, and a test suite that was green on Monday is red on Tuesday for reasons that have nothing to do with correctness.
The honest outcome at that velocity: the team ships ten features while the automation is stuck fixing one. Script maintenance cost exceeds script value. The response can't be "auto-heal coupled scripts forever." It has to be: decouple at the layer below.
02 / two-layer test strategy
Layer 1: HTTP (always)
Verifies server behavior. Pure API requests, no browser. Status codes, response headers, API responses, session lifecycle, role enforcement at the network edge. Fast, stable, no selectors to break.
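The shape of a Layer-1 check can be sketched as a pure predicate over the HTTP response, with the actual request issued by `fetch` or a Playwright request context. A minimal sketch, assuming an illustrative admin-only endpoint and field names not taken from the real app:

```typescript
// Hypothetical Layer-1 invariant: a non-admin session must be rejected
// at the network edge with a 403 and no payload leakage. No browser,
// no selectors -- just the response. Endpoint/field names are assumed.
type HttpResult = { status: number; body: Record<string, unknown> };

function roleInvariantHolds(res: HttpResult): boolean {
  return res.status === 403 && !("data" in res.body);
}

// In a real spec `res` would come from something like
//   await fetch(`${BASE_URL}/api/admin/users`, { headers: nonAdminAuth })
const res: HttpResult = { status: 403, body: { error: "forbidden" } };
console.log(roleInvariantHolds(res)); // → true
```

Because the assertion is a pure function of the response, it stays green through any amount of UI refactoring.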
Layer 2: Browser (only when needed)
Added on top only when the invariant involves something the server cannot verify alone: React-side access control, redirects, role-based rendering. Hybrid auth pattern: authenticate over HTTP (fast), extract cookies or JWT, inject into a browser context, then run the minimal browser assertion. No slow login forms in the critical path.
HTTP layer always comes first. Browser layer is added on top only when the invariant involves page access, redirects, or role-based rendering. If the test only needs a status code or response body, never open a browser.
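The cookie-injection half of the hybrid auth pattern reduces to a small adapter from `Set-Cookie` headers to the object shape Playwright's `browserContext.addCookies()` expects. A sketch with illustrative cookie and domain names:

```typescript
// Convert cookies captured from an HTTP login into Playwright's
// addCookies() shape. Cookie/domain names here are illustrative.
interface PwCookie { name: string; value: string; domain: string; path: string; }

function toPlaywrightCookies(setCookieHeaders: string[], domain: string): PwCookie[] {
  return setCookieHeaders.map((h) => {
    const [pair] = h.split(";"); // drop attributes like HttpOnly, Path
    const eq = pair.indexOf("=");
    return { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1), domain, path: "/" };
  });
}

// Usage (hedged): after the HTTP login returns Set-Cookie headers,
//   await context.addCookies(toPlaywrightCookies(headers, "app.example.com"));
// then navigate and run the minimal browser assertion.
const cookies = toPlaywrightCookies(["sid=abc123; HttpOnly; Path=/"], "app.example.com");
console.log(cookies[0].name, cookies[0].value); // → sid abc123
```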
03 / invariants: the constitution of the app
Invariants are rules that must always hold regardless of implementation. Payment idempotency. Order-total consistency across DB, UI, PDF, and email. Inventory tracking. Role enforcement at the API layer. Session lifecycle. These rules don't change when the UI is refactored, which makes them a stable substrate to test against.
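The order-total invariant, for example, is expressible as a pure check over the totals read from each surface. A sketch with illustrative field names:

```typescript
// Order-total consistency: the amount read from the DB row, the rendered
// UI, the generated PDF, and the confirmation email must agree to the
// cent, no matter how each surface is implemented.
type TotalsBySurface = Record<"db" | "ui" | "pdf" | "email", number>;

function orderTotalsConsistent(t: TotalsBySurface): boolean {
  const cents = Object.values(t).map((v) => Math.round(v * 100)); // avoid float noise
  return cents.every((c) => c === cents[0]);
}

console.log(orderTotalsConsistent({ db: 19.99, ui: 19.99, pdf: 19.99, email: 19.99 })); // → true
console.log(orderTotalsConsistent({ db: 19.99, ui: 19.99, pdf: 19.98, email: 19.99 })); // → false
```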
test account pool
Any invariant that needs a test account must acquire it from a pool, never from a hardcoded list. Accounts are locked for the duration of the test and released in `afterAll`. Without this, parallel runs collide and results become non-deterministic.
```typescript
const pool = new TestAccountPool(ENVIRONMENT);
const account = await pool.acquire('corporate_buyer', 'I2-001-order-totals');
try {
  // run invariant
} finally {
  await pool.releaseAll();
}
```
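The pool's contract can be sketched in memory. The real pool persists locks in the orchestrator's database; this illustrative version only shows the semantics: an acquired account is invisible to any parallel pool until released.

```typescript
// Minimal in-memory sketch of the pool contract (the real implementation
// backs LOCKED with a DB lock table). Account data is illustrative.
const LOCKED = new Set<string>(); // shared lock registry
const ACCOUNTS = [
  { id: "buyer-1", type: "corporate_buyer" },
  { id: "buyer-2", type: "corporate_buyer" },
];

class TestAccountPool {
  private held: string[] = [];
  constructor(private env: string) {}

  async acquire(type: string, testId: string): Promise<string> {
    const free = ACCOUNTS.find((a) => a.type === type && !LOCKED.has(a.id));
    if (!free) throw new Error(`no free '${type}' account for ${testId}`);
    LOCKED.add(free.id);
    this.held.push(free.id);
    return free.id;
  }

  async releaseAll(): Promise<void> {
    for (const id of this.held) LOCKED.delete(id);
    this.held = [];
  }
}

// Two "parallel" runs never share an account:
const a = new TestAccountPool("staging");
const b = new TestAccountPool("staging");
console.log(await a.acquire("corporate_buyer", "I2-001")); // → buyer-1
console.log(await b.acquire("corporate_buyer", "I2-001")); // → buyer-2
await a.releaseAll(); // buyer-1 is free again; b still holds buyer-2
```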
read vs write DB access
The database helper exposes two connection types: a read-only `query()` for SELECTs and a write-capable `executeWrite()` for mutations. Using the wrong connection fails loudly instead of corrupting data silently.
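A sketch of the split. The keyword guard below is illustrative; a production setup would also back `query()` with a Postgres role that only has SELECT grants, so the database itself enforces the boundary:

```typescript
// Two-connection helper: query() refuses mutations so a mis-routed
// UPDATE fails loudly rather than corrupting data silently.
class Db {
  query(sql: string): string {
    if (/^\s*(insert|update|delete|truncate|alter|drop)\b/i.test(sql)) {
      throw new Error("query() is read-only; use executeWrite()");
    }
    return "rows"; // would execute on the read-only connection
  }
  executeWrite(sql: string): string {
    return "ok"; // would execute on the write-capable connection
  }
}

const db = new Db();
console.log(db.query("SELECT total FROM orders WHERE id = 1")); // → rows
// db.query("UPDATE orders SET total = 0")  -> throws loudly
```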
capture, don't guess
When building a new pure-API page object, walk the UI flow in a real browser, inspect the network tab, and derive the exact request contract: endpoint, method, parameter location (query vs body vs header), headers (like `X-CSRF-TOKEN`), response shape. No reverse-engineering from backend source.
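Once captured, the contract becomes data that the page object builds requests from. A sketch, with endpoint, header, and field names that are illustrative rather than the real app's:

```typescript
// A captured request contract, written down exactly as observed in the
// network tab, drives the pure-API page object.
interface RequestContract {
  method: "GET" | "POST";
  endpoint: string;
  paramLocation: "query" | "body";
  requiredHeaders: string[]; // e.g. ["X-CSRF-TOKEN"]
}

const addToCart: RequestContract = {
  method: "POST",
  endpoint: "/api/cart/items",
  paramLocation: "body",
  requiredHeaders: ["X-CSRF-TOKEN"],
};

function buildRequest(c: RequestContract, params: Record<string, string>, token: string) {
  return {
    method: c.method,
    url: c.paramLocation === "query"
      ? `${c.endpoint}?${new URLSearchParams(params).toString()}`
      : c.endpoint,
    headers: Object.fromEntries(c.requiredHeaders.map((h) => [h, token])),
    body: c.paramLocation === "body" ? JSON.stringify(params) : undefined,
  };
}

console.log(buildRequest(addToCart, { sku: "A-1" }, "tok").body); // → {"sku":"A-1"}
```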
04 / self-healing for the UI-critical remainder
Even after decoupling, some tests genuinely need the UI. For those, a preflight system absorbs selector drift automatically:
- Extract all `data-cy` attributes from the current page.
- Compare against expected selectors declared in the Page Object Model.
- If a selector is missing, ask AI (GPT-4o-mini) to suggest the replacement, using the selector's declared `description` as grounding.
- Consensus voting: 3 parallel calls, 2/3 agreement required. No consensus? Fall back to a stronger model (GPT-4.1) with a screenshot.
- Healed selectors persist via an auto-PR on a fixed branch (`auto-heal/selectors`). Subsequent healings push to the same branch; once the PR is merged and the branch deleted, the next run creates a fresh PR.
One LLM call can hallucinate a plausible-but-wrong selector. Three parallel calls with a 2/3 agreement threshold turns low-cost inference into a reliable signal. Cheap, fast, good enough for a preflight check.
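The voting gate itself is a few lines. The suggestions would come from three parallel model calls; they are stubbed here:

```typescript
// 2-of-3 consensus: accept a selector only if at least two independent
// suggestions agree; otherwise signal escalation to the stronger model.
function consensus(suggestions: string[], threshold = 2): string | null {
  const counts = new Map<string, number>();
  for (const s of suggestions) counts.set(s, (counts.get(s) ?? 0) + 1);
  for (const [sel, n] of counts) if (n >= threshold) return sel;
  return null; // no agreement -> fall back to GPT-4.1 + screenshot
}

console.log(consensus(["[data-cy=save]", "[data-cy=save]", "[data-cy=submit]"]));
// → [data-cy=save]
console.log(consensus(["a", "b", "c"])); // → null (escalate)
```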
05 / orchestration (the backend)
A dedicated service owns the operating model. Express + TypeScript on Postgres (with pgvector), BullMQ for queues. Multi-project from the root, so one orchestrator can manage several app test frameworks.
runs, queues, locks
- `spec_runs`: individual spec executions grouped by `execution_id`. Statuses: queued, waiting for resources, running, done.
- `schedule_runs` and `suite_runs`: cron-based and named-suite executions.
- `test_account_locks`: prevent parallel runs from sharing an account. 409 retries with configurable timeout.
- `resource_locks`: generic lock table for any shared resource (accounts, data pools, API keys, email inboxes).
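The 409-retry behaviour around the lock tables can be sketched as a polling loop; `tryLock` here is a stand-in for the real HTTP call to the orchestrator:

```typescript
// Retry a lock acquisition until granted (200) or the configurable
// timeout elapses; 409 means another run holds the lock.
async function acquireLock(
  tryLock: () => Promise<number>, // returns HTTP status: 200 or 409
  timeoutMs: number,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await tryLock()) === 200) return true; // lock granted
    await new Promise((r) => setTimeout(r, intervalMs)); // held elsewhere
  }
  return false; // caller parks the run as "waiting for resources"
}
```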
manifest (versioned test catalog)
Test frameworks push their catalog to the orchestrator. Each upload is a snapshot with the repo commit. Specs and tests carry rich metadata: feature area, flow, priority, required account types, preconditions, expected behavior, invariants covered. The dashboard reads these to render coverage from the spec side, not only from execution results.
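One manifest entry might look like the following; the field names approximate the metadata listed above and are a guess at the shape, not the orchestrator's actual schema:

```typescript
// Illustrative manifest snapshot entry (assumed shape, assumed values).
const manifestEntry = {
  commit: "abc1234",
  spec: "checkout/order-totals.spec.ts",
  tests: [
    {
      name: "I2-001 order totals consistent across surfaces",
      featureArea: "checkout",
      flow: "purchase",
      priority: "P1",
      accountTypes: ["corporate_buyer"],
      preconditions: ["seeded catalog"],
      invariants: ["I2-001"],
    },
  ],
};
console.log(manifestEntry.tests.length); // → 1
```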
results and triage
Jenkins posts back per-spec results. Failures are preserved individually (`jenkins_test_failures`), triaged through `failure_investigations` with status, screenshots, and AI analysis. Failures without an investigation row default to open.
the operating model
The schema encodes quality as more than just "tests that pass":
- `invariants` + `invariant_spec_mappings`: which rules the system protects, and which specs verify each.
- `components` + `component_sme` + `component_suites`: ownership, criticality, and suite coverage per component.
- `knowledge_entries` (pgvector): semantic search across accumulated QA knowledge.
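The `knowledge_entries` lookup reduces to a nearest-neighbour query. A hedged sketch using pgvector's cosine-distance operator `<=>`; the column names and the `embed()` helper are assumptions:

```typescript
// Assumed query shape for semantic search over knowledge_entries.
const sql = `
  SELECT title, body
  FROM knowledge_entries
  ORDER BY embedding <=> $1::vector
  LIMIT 5`;

// Usage (assumed helpers):
//   const hits = await db.query(sql, [await embed(question)]);
```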
06 / the dashboard
Next.js 15 App Router with a typed client over the orchestrator API. Surfaces:
- Real-time run monitoring (polling, status transitions).
- Coverage matrix: manifest tests vs features, per app.
- Failure triage with AI analysis and screenshots.
- The operating model view: invariants, components, ownership, SME by area.
- Schedule management and Jenkins deploy-to-test mapping.
07 / what I learned
- The shift from coupled UI automation to HTTP + invariants is the single most important bet in modern QA. Everything downstream depends on it.
- Invariants are easier to write, harder to break, and easier to reason about than flows. Test rules, not implementations.
- Self-healing cannot save coupled tests. Decouple first, heal second.
- Proper orchestration (queues, locks, manifest, triage) is 80% of the engineering. The tests are the visible 20%.
- Test account pools are table stakes for parallel execution. Learned this the hard way, twice.
When code moves AI-fast, QA can't be selector-deep. Protect the constitution of the app. Its invariants. Everything else is implementation detail that changes by the day.