QA for AI-drafted SDLCs.
A multi-app QA platform that survives AI-speed shipping. HTTP and API-based invariant tests guard system-wide rules. Self-healing absorbs UI drift on what UI coverage remains. A backend orchestrator runs everything with proper queueing, test-account locking, and a dashboard that surfaces the operating model (invariants, components, ownership, coverage).
- Playwright
- Cypress
- Node.js / TypeScript
- Postgres / pgvector
- BullMQ
- Jenkins
- Docker
- Next.js 15
- OpenAI (self-healing)
01 / the shift
Traditional automation (Cypress, Playwright scripted E2E) was built for a slow SDLC. Scripts are tightly coupled to implementation: selectors, URLs, DOM structure, form fields. When code moves slowly, this coupling is fine. Maintenance is a small tax paid once a week.
In an AI-drafted SDLC the math breaks. Idea, AI drafts the spec, AI writes the code, developer reviews, ship. Cycle times collapse from weeks to hours. Selectors shift, URLs move, DOM structure is reshaped, and a test suite that was green on Monday is red on Tuesday for reasons that have nothing to do with correctness.
The honest outcome at that velocity: the team ships ten features while the automation is stuck fixing one. Script maintenance cost exceeds script value. The response can't be "auto-heal coupled scripts forever." It has to be: decouple at the layer below.
02 / two-layer test strategy
Layer 1: HTTP (always)
Verifies server behavior. Pure API requests, no browser. Status codes, response headers, API responses, session lifecycle, role enforcement at the network edge. Fast, stable, no selectors to break.
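The shape of a Layer-1 check can be sketched as a pure predicate over the HTTP response, with the actual request issued by `fetch` or a Playwright request context. A minimal sketch, assuming an illustrative admin-only endpoint and field names not taken from the real app:

```typescript
// Hypothetical Layer-1 invariant: a non-admin session must be rejected
// at the network edge with a 403 and no payload leakage. No browser,
// no selectors -- just the response. Endpoint/field names are assumed.
type HttpResult = { status: number; body: Record<string, unknown> };

function roleInvariantHolds(res: HttpResult): boolean {
  return res.status === 403 && !("data" in res.body);
}

// In a real spec `res` would come from something like
//   await fetch(`${BASE_URL}/api/admin/users`, { headers: nonAdminAuth })
const res: HttpResult = { status: 403, body: { error: "forbidden" } };
console.log(roleInvariantHolds(res)); // → true
```

Because the assertion is a pure function of the response, it stays green through any amount of UI refactoring.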
Layer 2: Browser (only when needed)
Added on top only when the invariant involves something the server cannot verify alone: React-side access control, redirects, role-based rendering. Hybrid auth pattern: authenticate over HTTP (fast), extract cookies or JWT, inject into a browser context, then run the minimal browser assertion. No slow login forms in the critical path.
HTTP layer always comes first. Browser layer is added on top only when the invariant involves page access, redirects, or role-based rendering. If the test only needs a status code or response body, never open a browser.
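The cookie-injection half of the hybrid auth pattern reduces to a small adapter from `Set-Cookie` headers to the object shape Playwright's `browserContext.addCookies()` expects. A sketch with illustrative cookie and domain names:

```typescript
// Convert cookies captured from an HTTP login into Playwright's
// addCookies() shape. Cookie/domain names here are illustrative.
interface PwCookie { name: string; value: string; domain: string; path: string; }

function toPlaywrightCookies(setCookieHeaders: string[], domain: string): PwCookie[] {
  return setCookieHeaders.map((h) => {
    const [pair] = h.split(";"); // drop attributes like HttpOnly, Path
    const eq = pair.indexOf("=");
    return { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1), domain, path: "/" };
  });
}

// Usage (hedged): after the HTTP login returns Set-Cookie headers,
//   await context.addCookies(toPlaywrightCookies(headers, "app.example.com"));
// then navigate and run the minimal browser assertion.
const cookies = toPlaywrightCookies(["sid=abc123; HttpOnly; Path=/"], "app.example.com");
console.log(cookies[0].name, cookies[0].value); // → sid abc123
```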
03 / invariants: the constitution of the app
Invariants are rules that must always hold regardless of implementation. Payment idempotency. Order-total consistency across DB, UI, PDF, and email. Inventory tracking. Role enforcement at the API layer. Session lifecycle. These rules don't change when the UI is refactored, which makes them a stable substrate to test against.
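The order-total invariant, for example, is expressible as a pure check over the totals read from each surface. A sketch with illustrative field names:

```typescript
// Order-total consistency: the amount read from the DB row, the rendered
// UI, the generated PDF, and the confirmation email must agree to the
// cent, no matter how each surface is implemented.
type TotalsBySurface = Record<"db" | "ui" | "pdf" | "email", number>;

function orderTotalsConsistent(t: TotalsBySurface): boolean {
  const cents = Object.values(t).map((v) => Math.round(v * 100)); // avoid float noise
  return cents.every((c) => c === cents[0]);
}

console.log(orderTotalsConsistent({ db: 19.99, ui: 19.99, pdf: 19.99, email: 19.99 })); // → true
console.log(orderTotalsConsistent({ db: 19.99, ui: 19.99, pdf: 19.98, email: 19.99 })); // → false
```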
test account pool
Any invariant that needs a test account must acquire it from a pool, never from a hardcoded list. Accounts are locked for the duration of the test and released in `afterAll`. Without this, parallel runs collide and results become non-deterministic.
```typescript
const pool = new TestAccountPool(ENVIRONMENT);
const account = await pool.acquire('corporate_buyer', 'I2-001-order-totals');
try {
  // run invariant
} finally {
  await pool.releaseAll();
}
```
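The pool's contract can be sketched in memory. The real pool persists locks in the orchestrator's database; this illustrative version only shows the semantics: an acquired account is invisible to any parallel pool until released.

```typescript
// Minimal in-memory sketch of the pool contract (the real implementation
// backs LOCKED with a DB lock table). Account data is illustrative.
const LOCKED = new Set<string>(); // shared lock registry
const ACCOUNTS = [
  { id: "buyer-1", type: "corporate_buyer" },
  { id: "buyer-2", type: "corporate_buyer" },
];

class TestAccountPool {
  private held: string[] = [];
  constructor(private env: string) {}

  async acquire(type: string, testId: string): Promise<string> {
    const free = ACCOUNTS.find((a) => a.type === type && !LOCKED.has(a.id));
    if (!free) throw new Error(`no free '${type}' account for ${testId}`);
    LOCKED.add(free.id);
    this.held.push(free.id);
    return free.id;
  }

  async releaseAll(): Promise<void> {
    for (const id of this.held) LOCKED.delete(id);
    this.held = [];
  }
}

// Two "parallel" runs never share an account:
const a = new TestAccountPool("staging");
const b = new TestAccountPool("staging");
console.log(await a.acquire("corporate_buyer", "I2-001")); // → buyer-1
console.log(await b.acquire("corporate_buyer", "I2-001")); // → buyer-2
await a.releaseAll(); // buyer-1 is free again; b still holds buyer-2
```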
read vs write DB access
The database helper exposes two connection types: a read-only `query()` for SELECTs and a write-capable `executeWrite()` for mutations. Using the wrong connection fails loudly instead of corrupting data silently.
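A sketch of the split. The keyword guard below is illustrative; a production setup would also back `query()` with a Postgres role that only has SELECT grants, so the database itself enforces the boundary:

```typescript
// Two-connection helper: query() refuses mutations so a mis-routed
// UPDATE fails loudly rather than corrupting data silently.
class Db {
  query(sql: string): string {
    if (/^\s*(insert|update|delete|truncate|alter|drop)\b/i.test(sql)) {
      throw new Error("query() is read-only; use executeWrite()");
    }
    return "rows"; // would execute on the read-only connection
  }
  executeWrite(sql: string): string {
    return "ok"; // would execute on the write-capable connection
  }
}

const db = new Db();
console.log(db.query("SELECT total FROM orders WHERE id = 1")); // → rows
// db.query("UPDATE orders SET total = 0")  -> throws loudly
```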
capture, don't guess
When building a new pure-API page object, walk the UI flow in a real browser, inspect the network tab, and derive the exact request contract: endpoint, method, parameter location (query vs body vs header), headers (like `X-CSRF-TOKEN`), response shape. No reverse-engineering from backend source.
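Once captured, the contract becomes data that the page object builds requests from. A sketch, with endpoint, header, and field names that are illustrative rather than the real app's:

```typescript
// A captured request contract, written down exactly as observed in the
// network tab, drives the pure-API page object.
interface RequestContract {
  method: "GET" | "POST";
  endpoint: string;
  paramLocation: "query" | "body";
  requiredHeaders: string[]; // e.g. ["X-CSRF-TOKEN"]
}

const addToCart: RequestContract = {
  method: "POST",
  endpoint: "/api/cart/items",
  paramLocation: "body",
  requiredHeaders: ["X-CSRF-TOKEN"],
};

function buildRequest(c: RequestContract, params: Record<string, string>, token: string) {
  return {
    method: c.method,
    url: c.paramLocation === "query"
      ? `${c.endpoint}?${new URLSearchParams(params).toString()}`
      : c.endpoint,
    headers: Object.fromEntries(c.requiredHeaders.map((h) => [h, token])),
    body: c.paramLocation === "body" ? JSON.stringify(params) : undefined,
  };
}

console.log(buildRequest(addToCart, { sku: "A-1" }, "tok").body); // → {"sku":"A-1"}
```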
04 / self-healing for the UI-critical remainder
Even after decoupling, some tests genuinely need the UI. For those, a preflight system absorbs selector drift automatically:
- Extract all `data-cy` attributes from the current page.
- Compare against expected selectors declared in the Page Object Model.
- If a selector is missing, ask AI (GPT-4o-mini) to suggest the replacement, using the selector's declared `description` as grounding.
- Consensus voting: 3 parallel calls, 2/3 agreement required. No consensus? Fall back to a stronger model (GPT-4.1) with a screenshot.
- Healed selectors persist via an auto-PR on a fixed branch (`auto-heal/selectors`). Subsequent healings push to the same branch; once the PR is merged and the branch deleted, the next run creates a fresh PR.
One LLM call can hallucinate a plausible-but-wrong selector. Three parallel calls with a 2/3 agreement threshold turns low-cost inference into a reliable signal. Cheap, fast, good enough for a preflight check.
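The voting gate itself is a few lines. The suggestions would come from three parallel model calls; they are stubbed here:

```typescript
// 2-of-3 consensus: accept a selector only if at least two independent
// suggestions agree; otherwise signal escalation to the stronger model.
function consensus(suggestions: string[], threshold = 2): string | null {
  const counts = new Map<string, number>();
  for (const s of suggestions) counts.set(s, (counts.get(s) ?? 0) + 1);
  for (const [sel, n] of counts) if (n >= threshold) return sel;
  return null; // no agreement -> fall back to GPT-4.1 + screenshot
}

console.log(consensus(["[data-cy=save]", "[data-cy=save]", "[data-cy=submit]"]));
// → [data-cy=save]
console.log(consensus(["a", "b", "c"])); // → null (escalate)
```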
05 / orchestration (the backend)
A dedicated service owns the operating model. Express + TypeScript on Postgres (with pgvector), BullMQ for queues. Multi-project from the root, so one orchestrator can manage several app test frameworks.
runs, queues, locks
- `spec_runs`: individual spec executions grouped by `execution_id`. Statuses: queued, waiting for resources, running, done.
- `schedule_runs` and `suite_runs`: cron-based and named-suite executions.
- `test_account_locks`: prevent parallel runs from sharing an account. 409 retries with configurable timeout.
- `resource_locks`: generic lock table for any shared resource (accounts, data pools, API keys, email inboxes).
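The 409-retry behaviour around the lock tables can be sketched as a polling loop; `tryLock` here is a stand-in for the real HTTP call to the orchestrator:

```typescript
// Retry a lock acquisition until granted (200) or the configurable
// timeout elapses; 409 means another run holds the lock.
async function acquireLock(
  tryLock: () => Promise<number>, // returns HTTP status: 200 or 409
  timeoutMs: number,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await tryLock()) === 200) return true; // lock granted
    await new Promise((r) => setTimeout(r, intervalMs)); // held elsewhere
  }
  return false; // caller parks the run as "waiting for resources"
}
```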
manifest (versioned test catalog)
Test frameworks push their catalog to the orchestrator. Each upload is a snapshot with the repo commit. Specs and tests carry rich metadata: feature area, flow, priority, required account types, preconditions, expected behavior, invariants covered. The dashboard reads these to render coverage from the spec side, not only from execution results.
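One manifest entry might look like the following; the field names approximate the metadata listed above and are a guess at the shape, not the orchestrator's actual schema:

```typescript
// Illustrative manifest snapshot entry (assumed shape, assumed values).
const manifestEntry = {
  commit: "abc1234",
  spec: "checkout/order-totals.spec.ts",
  tests: [
    {
      name: "I2-001 order totals consistent across surfaces",
      featureArea: "checkout",
      flow: "purchase",
      priority: "P1",
      accountTypes: ["corporate_buyer"],
      preconditions: ["seeded catalog"],
      invariants: ["I2-001"],
    },
  ],
};
console.log(manifestEntry.tests.length); // → 1
```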
results and triage
Jenkins posts back per-spec results. Failures are preserved individually (`jenkins_test_failures`), triaged through `failure_investigations` with status, screenshots, and AI analysis. Failures without an investigation row default to open.
the operating model
The schema encodes quality as more than just "tests that pass":
- `invariants` + `invariant_spec_mappings`: which rules the system protects, and which specs verify each.
- `components` + `component_sme` + `component_suites`: ownership, criticality, and suite coverage per component.
- `knowledge_entries` (pgvector): semantic search across accumulated QA knowledge.
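The `knowledge_entries` lookup reduces to a nearest-neighbour query. A hedged sketch using pgvector's cosine-distance operator `<=>`; the column names and the `embed()` helper are assumptions:

```typescript
// Assumed query shape for semantic search over knowledge_entries.
const sql = `
  SELECT title, body
  FROM knowledge_entries
  ORDER BY embedding <=> $1::vector
  LIMIT 5`;

// Usage (assumed helpers):
//   const hits = await db.query(sql, [await embed(question)]);
```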
06 / the dashboard
Next.js 15 App Router with a typed client over the orchestrator API. Surfaces:
- Real-time run monitoring (polling, status transitions).
- Coverage matrix: manifest tests vs features, per app.
- Failure triage with AI analysis and screenshots.
- The operating model view: invariants, components, ownership, SME by area.
- Schedule management and Jenkins deploy-to-test mapping.
07 / what I learned
- The shift from coupled UI automation to HTTP + invariants is the single most important bet in modern QA. Everything downstream depends on it.
- Invariants are easier to write, harder to break, and easier to reason about than flows. Test rules, not implementations.
- Self-healing cannot save coupled tests. Decouple first, heal second.
- Proper orchestration (queues, locks, manifest, triage) is 80% of the engineering. The tests are the visible 20%.
- Test account pools are table stakes for parallel execution. Learned this the hard way, twice.
When code moves AI-fast, QA can't be selector-deep. Protect the constitution of the app. Its invariants. Everything else is implementation detail that changes by the day.