Every engineering team we talk to in 2026 is shipping faster than they were eighteen months ago. By a lot. Some report 3×, some 5×, a few outliers 10×. The mechanism is the same across all of them: AI coding assistants — Cursor, GitHub Copilot, Claude Code, Codex — have moved from novelty to default.
Engineering leaders love the velocity. Product leaders love the velocity. Customers love the velocity.
QA teams are drowning.
The velocity gap
Traditional test automation was designed for traditional development velocity. When a team shipped 20 UI changes per week, a QA team could keep test coverage current. When a team ships 200 UI changes per week because developers are working with AI assistants in tight iteration loops, test coverage falls behind. Fast.
The symptoms are the same across most teams we’ve observed:
- Test suite skip rate creeping up week by week
- Regression failures going unchased because nobody has time to diagnose them
- QA engineering consumed by maintaining selectors that rot faster than they can be fixed
- Production incidents increasing because coverage no longer reflects the product
- Leadership asking why QA is a bottleneck when engineering is moving so fast
It’s not a people problem. It’s an architecture problem. The tools that QA teams are working with weren’t designed for this kind of velocity.
Why AI-generated code breaks traditional testing
Three specific dynamics:
1. UI iteration velocity
AI assistants make it easy to iterate on UI. A developer asks Cursor to “clean up this dashboard component” and it restructures the JSX, renames classes, moves elements. The UI looks the same; the selectors that Selenium tests depend on are all different.
2. Component library churn
AI assistants don’t respect your component library commitments. Ask Claude Code to build a new form and it might import Button from whatever looked most appropriate at the time. Over weeks, your codebase picks up inconsistent component imports. Selectors are silently invalidated.
3. Volume of new code
There’s simply more code. More features, more pages, more components. Test coverage scales with QA effort, which is linear at best; AI-assisted development output scales much faster.
The insight: use AI to test AI-generated code
If AI is generating the code, AI should generate and execute the tests. This is the premise behind Karate Agent.
But it’s not quite as simple as “LLMs generate tests.” The right workflow is tighter: put the verification inside the generation loop, not after it.
The MCP pattern
Model Context Protocol (MCP) is an open standard that lets AI assistants call tools. Anthropic introduced it in late 2024, and most major AI coding assistants now support it. MCP turns any tool into something Claude Code, Cursor, Copilot, or similar can invoke.
Karate Agent exposes a karate_eval MCP tool. When properly configured, this means:
- Developer asks the AI assistant to build a feature
- AI generates the code
- AI also invokes Karate Agent via MCP to verify the feature end-to-end
- If verification fails, AI iterates on the code until it passes
- The verification test becomes a regression asset for future runs
Feature work and test work happen in one loop, driven by the same assistant the developer already uses. There’s no handoff, no separate QA phase for routine verification, no delay.
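Wiring this up is a small configuration change on the assistant side. A minimal sketch, assuming Karate Agent exposes an MCP endpoint over HTTP on localhost port 8080 — the server name, URL, and transport here are illustrative assumptions, not documented values; check the Karate Agent docs for the real ones:

```json
{
  "mcpServers": {
    "karate-agent": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```

Once the assistant can see the server, the karate_eval tool shows up alongside its built-in tools, and the generate-verify-iterate loop above needs no further plumbing.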
“But won’t AI-generated tests miss the bugs in AI-generated code?”
A reasonable concern. If the AI wrote the code and also wrote the tests, doesn’t that compound the risk?
The mitigation: tests operate at the user-facing level. They describe what the user should be able to do. “Add two items to cart, apply discount code, verify total is $47.85.” That behavior spec is the thing that matters. Whether the underlying code is clean or not, if the user-facing behavior is correct, the feature is correct.
AI-generated tests are good at this because they’re not “testing the implementation” — they’re testing the contract between the app and its users. That’s what functional testing was always supposed to be.
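In Karate’s Gherkin-style DSL, that cart behavior spec might look roughly like the sketch below. The URL, locators, and discount code are illustrative assumptions, not taken from a real app:

```gherkin
Feature: cart checkout with a discount code

Scenario: two items plus a discount code yields the correct total
  # hypothetical URL and element locators -- adjust to the app under test
  * driver 'https://shop.example.com'
  * click('#add-item-1')
  * click('#add-item-2')
  * input('#discount-code', 'SAVE10')
  * click('#apply-discount')
  # the assertion targets user-visible behavior, not implementation details
  * match text('#cart-total') == '$47.85'
```

Nothing in this scenario references component internals or JSX structure, which is why it survives the AI-driven refactoring churn described above as long as the user-facing behavior holds.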
The new QA engineering
When AI handles routine test generation and verification, what’s left for QA engineers?
A lot, actually. More than before:
- Test strategy: what to cover, what to prioritize, where the risks are
- Test architecture: how tests compose, what shared setup and data patterns make sense
- Quality analysis: reviewing failure modes, identifying systemic issues, triaging regressions
- Production-grade verification: end-to-end flows with real data, auth, integrations
- Exploratory testing: the creative part that AI doesn’t do well, directed at likely problem areas
- Quality bar ownership: being the function that owns “is this good enough to ship”
Teams that adopt this model report that their QA engineers become more strategic, more influential, and more satisfied. They stop being the bottleneck; they become the quality architects.
Getting started
A narrow pilot path:
- Pick a feature that a developer is about to build with Cursor, Copilot, or Claude Code
- Deploy Karate Agent in Docker (under 10 minutes)
- Configure the MCP connection from the AI assistant to Karate Agent
- As the developer builds, have the assistant also write and run Karate Agent verifications
- Observe: the assistant fixes its own bugs before handing off; the test becomes a durable regression asset
Once this pattern clicks for one feature, it scales naturally to every feature the team ships. Soon the regression suite is larger, more current, and more trusted than manual test authoring could produce.
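For step 2, the deployment might look something like this — the image name, tag, and port are assumptions for illustration; confirm the real values against the Karate Agent documentation:

```shell
# hypothetical image name and port -- check the official docs before running
docker run -d --name karate-agent -p 8080:8080 karatelabs/karate-agent:latest
```

With the container up, the MCP configuration in step 3 just needs to point the assistant at the container’s exposed endpoint.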