How It Works

Two modes.
One platform.

Interactive mode — your LLM agent drives a live browser through our grid. Autonomous mode — submit a job and walk away.

Scripted flows run at native JS speed; the LLM only wakes up when something deviates.

100% Self-Hosted · Bring Your Own LLM · Java 21 + Docker
karate-agent — zsh
$ java -jar karate-agent.jar dashboard
────────────────────────────────────────
Dashboard  http://localhost:4444
REST       /api/sessions · /api/jobs
MCP        /mcp · tool: karate_eval
Workers    4 ready

$ curl -sX POST localhost:4444/api/jobs \
    -d '{"prompt":"submit a quote",
         "flowFiles":["login.js"],
         "maxIterations":20}'
{ "jobId": "j_8c3b1d", "status": "running" }

Two Modes

One platform, two ways to run

Interactive mode keeps the LLM on your side. Autonomous mode puts the LLM inside the container, so jobs run unattended.

Mode 1

Interactive

Your LLM coding agent — Claude Code, Cursor, Copilot — sends JavaScript commands via REST or MCP. The Dashboard proxies them to a Worker container that drives a real browser. No LLM runs on the Dashboard itself.

LLM coding agent

Claude Code
Cursor
Copilot · curl

Client-side LLM

REST · MCP · JS commands

Dashboard

POST /api/sessions
POST /sessions/$id/proxy
GET  /sessions/$id/prompt
MCP  /mcp

Proxy · No LLM

Proxies JS agent.* calls

Worker container

Real browser
noVNC · H.264

host.docker.internal

Session s_8c3b1d

claude-code — exploring
# POST /sessions/$id/proxy { "js": ... }
> agent.go("https://app.local/login")
> agent.look()
{ h1: "Sign In", inputs: [...] }
> agent.act("type admin in Username")
> agent.act("click Sign in")
> agent.wait("{h1}Dashboard")
Flow.run() → ok: true
  • No LLM needed on the grid — your client-side agent drives
  • Exploratory testing, debugging, live demos
  • Discover locators, build flows interactively
  • Connect anything that can curl
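The proxy loop above can be sketched as a tiny client. The endpoints (POST /api/sessions, POST /sessions/$id/proxy) and the { "js": ... } payload come from this page; the `openSession`/`runJs` names and the stub transport are illustrative only — swap the stub for a real HTTP call (e.g. fetch) against your Dashboard:

```javascript
// Minimal sketch of an interactive-mode client. The `transport` argument
// stands in for a real HTTP call so the sketch is self-contained.
const BASE = "http://localhost:4444";

async function openSession(transport) {
  const res = await transport(`${BASE}/api/sessions`, { method: "POST" });
  return res.sessionId;
}

async function runJs(transport, sessionId, js) {
  // The proxy endpoint forwards raw JS (agent.* calls) to the worker.
  return transport(`${BASE}/sessions/${sessionId}/proxy`, {
    method: "POST",
    body: JSON.stringify({ js }),
  });
}

// Illustrative stub standing in for the Dashboard.
const stub = async (url, opts) =>
  url.endsWith("/api/sessions")
    ? { sessionId: "s_8c3b1d" }
    : { ok: true, echoed: JSON.parse(opts.body).js };

(async () => {
  const id = await openSession(stub);
  const result = await runJs(stub, id, 'agent.go("https://app.local/login")');
  console.log(id, result.ok); // s_8c3b1d true
})();
```

Anything that can make these two HTTP calls — an LLM coding agent, a script, or plain curl — can drive a session.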
Mode 2

Autonomous

A worker-side LLM drives the observe-decide-act loop inside the karate-agent container. Flows run first at native speed — the LLM only takes over for unknown steps. Poll status with GET /api/jobs/$id; download the full report.zip when it completes.

Client → Dashboard

Submit Job

POST /api/jobs

{
  prompt,
  flowFiles,
  model,
  maxIterations
}
karate-agent container

LLM Worker

  • observe page
  • plan next step
  • choose locator
act ↓ · ↑ observe · decide

Browser

  • execute action
  • render result
  • snapshot DOM
Budget: maxIterations

Dashboard → Client

report.zip

GET /api/jobs/$id/download

  • transcript
  • screenshots
  • structured report
  • video (optional)
job — running
# flow found, running scripted path
login.js               2.1s    0 tok
create-quote.js        3.4s    0 tok
submit.js              LLM invoked → look() · analyze · recover
submit.js (recovered)  2.8s  512 tok
verify.js              1.2s    0 tok
done · 12/12 · ~$0.02
  • LLM configured on the worker — runs independently
  • CI/CD, scheduled tests, batch jobs
  • Flows run first (fast), LLM handles unknowns
  • Token budget enforced via maxIterations
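The submit-and-poll loop uses the endpoints documented above (POST /api/jobs, GET /api/jobs/$id, GET /api/jobs/$id/download). A hedged sketch — the stub transport and the "complete" terminal status name are assumptions for illustration, not confirmed API values:

```javascript
// Sketch of submitting an autonomous job and polling until it finishes.
// `http` stands in for a real HTTP call so the sketch runs without a grid.
const BASE = "http://localhost:4444";

async function submitJob(http, job) {
  const res = await http(`${BASE}/api/jobs`, {
    method: "POST",
    body: JSON.stringify(job),
  });
  return res.jobId;
}

async function waitForJob(http, jobId, { intervalMs = 2000, attempts = 100 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const { status } = await http(`${BASE}/api/jobs/${jobId}`);
    if (status !== "running") return status; // then GET .../download for report.zip
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("timed out waiting for job");
}

// Illustrative stub: reports "running" once, then "complete".
let polls = 0;
const stub = async (url, opts) =>
  opts?.method === "POST"
    ? { jobId: "j_8c3b1d", status: "running" }
    : { status: polls++ < 1 ? "running" : "complete" };

(async () => {
  const jobId = await submitJob(stub, {
    prompt: "submit a quote",
    flowFiles: ["login.js"],
    maxIterations: 20,
  });
  console.log(jobId, await waitForJob(stub, jobId, { intervalMs: 1 }));
})();
```

The same loop drops straight into a CI step: submit, poll, then archive report.zip as a build artifact.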

Development Workflow

From exploration to autonomy

Six steps to move a new app from zero coverage to zero-token regression.

1

Explore

Discover locators via interactive mode.

> look()
{ h1: "Login" }
2

Create

LLM writes .js flow files.

3

Test

Flow.run() → ok: true

4

Compose

Chain flows into orchestrators.

5

Autonomy

Submit as a job; the LLM handles unknowns.

6

Report

Review deviations, fix, repeat.
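Step 4 (Compose) can be sketched as an orchestrator that chains flow files and stops on the first deviation. Note the hedging: this page only shows a no-argument Flow.run(), so the Flow.run(file) signature below is an assumption, stubbed out to keep the sketch self-contained:

```javascript
// Hedged sketch of an orchestrator flow. Flow.run(file) is an ASSUMED
// signature; the stub stands in for real execution in a worker browser.
const Flow = {
  run(file) {
    return { ok: true, file }; // stub: a real run drives the browser
  },
};

function orchestrate(files) {
  const results = [];
  for (const file of files) {
    const result = Flow.run(file);
    results.push(result);
    if (!result.ok) break; // deviation: stop and report (or hand off to LLM)
  }
  return results;
}

const run = orchestrate(["login.js", "create-quote.js", "submit.js", "verify.js"]);
console.log(run.every((r) => r.ok) ? "ok: true" : "deviation"); // ok: true
```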

Architecture

One jar, one Docker image

No microservices, no databases, no message queues.

Karate Agent grid architecture diagram: a grid server exposes a REST API, a proxy, and a job runner, which manages isolated Docker worker containers

Runtime

Java 21

Container

Docker 24+

Artifacts

1 jar · 1 image

Per Session

~2 GB RAM

API

REST + MCP

Scalability

Scales without changing anything

Same jar, same image — runs on a laptop, a shared server, or dedicated CI infrastructure.

Solo Developer

On your laptop. Test against localhost. Flows in a local directory.

$ java -jar karate-agent.jar

Shared Team Server

Mac Mini or EC2. Dashboard URL shared with the team. Flows backed by git.

git · dashboard · team-wide

CI-Integrated

Dedicated infra. Jobs submitted by pipelines. Reports as artifacts. Kubernetes for scale.

GitHub Actions · Jenkins · Azure DevOps · Docker

Real-World Proof

Guidewire PolicyCenter

12-step Personal Auto submission — from login through quote creation. Watch the cost collapse as flows take over from the LLM.

Stage            Iterations       Cost · Time
Pure explore     39 iterations    ~$0.50 · 4 min
With hints       35 iterations    ~$0.40 · 3 min
Guided + flows    8 iterations    ~$0.02 · 2 min
Pure flows        0 iterations    $0 · ~15 s

Iterations

39 → 0

Cost per run

$0.50 → $0

Run time

4 min → 15 s

Each step was developed as an independent flow via Interactive mode, then composed into a single orchestrator. The orchestrator runs all 12 flows sequentially at native JS speed. The LLM is invoked only if a step deviates from the expected path.
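The flows-first pattern behind that orchestrator — run each step scripted, invoke recovery only on deviation, capped by an iteration budget — can be sketched generically. The step and recovery callbacks below are stand-ins (the recovery slot is where the LLM's look/analyze/retry loop would plug in):

```javascript
// Generic sketch: scripted steps run at native speed; a recovery
// callback is invoked only when a step deviates, within a budget.
function runWithRecovery(steps, recover, maxIterations = 20) {
  let iterations = 0;
  for (const step of steps) {
    let result = step();
    while (!result.ok) {
      if (++iterations > maxIterations) throw new Error("budget exhausted");
      result = recover(result); // LLM: look() · analyze · retry
    }
  }
  return { ok: true, llmIterations: iterations };
}

// Illustration: one step deviates once, then recovers.
let deviated = false;
const steps = [
  () => ({ ok: true }),
  () => (deviated ? { ok: true } : { ok: false, reason: "locator not found" }),
  () => ({ ok: true }),
];
const recover = () => { deviated = true; return { ok: true }; };
console.log(runWithRecovery(steps, recover)); // { ok: true, llmIterations: 1 }
```

On the happy path `llmIterations` stays at 0 — the Stage 4 case where a full run costs no tokens at all.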

Guidewire PolicyCenter is a complex enterprise SPA where standard locator strategies fail. Karate Agent's cursor-pointer discovery handles the non-standard <div onclick> targets that Selenium and Playwright struggle with.

Progressive Adoption

Start with zero AI cost. Graduate as confidence grows.

Each stage reduces LLM dependency. Most teams land at Stage 3 or 4, where the happy path is deterministic and the LLM only covers edge cases.

Stage 1

Explore

39 iters

~$0.50 · 4 min

LLM explores app from scratch, discovers locators, navigates with no guidance.

Stage 2

Hints

35 iters

~$0.40 · 3 min

Markdown task file provides context. LLM still drives, but with fewer wrong turns.

Most teams land here

Stage 3

Guided + Flows

8 iters

~$0.02 · 2 min

Scripted flows handle known steps. LLM only on failure.

Stage 4

Full Flows

0 iters

$0 · ~15 s

Pure deterministic JS. No LLM, no tokens, native speed.

See the workflow in action

Let us walk you through Interactive mode, flows, and Autonomous mode on your application.