How It Works

Two modes.
One platform.

Interactive mode — your LLM agent drives a live browser through our grid. Autonomous mode — submit a job and walk away.

Scripted flows run at native JS speed; the LLM only wakes up when something deviates.

100% Self-Hosted · Bring Your Own LLM · Java 21 + Docker
karate-agent — zsh
$ java -jar karate-agent.jar dashboard
────────────────────────────────────────
Dashboard  http://localhost:4444
REST       /api/sessions · /api/jobs
MCP        /mcp · tool: karate_eval
Workers    4 ready

$ curl -sX POST localhost:4444/api/jobs \
    -d '{"prompt":"submit a quote",
         "flowFiles":["login.js"],
         "maxIterations":20}'
{ "jobId": "j_8c3b1d", "status": "running" }

Two Modes

One platform, two ways to run

Interactive mode keeps the LLM on your side. Autonomous mode puts the LLM inside the container, so jobs run unattended.

Mode 1

Interactive

Your LLM coding agent — Claude Code, Cursor, Copilot — sends JavaScript commands via REST or MCP. The Dashboard proxies them to a Worker container that drives a real browser. No LLM runs on the Dashboard itself.

LLM coding agent

Claude Code
Cursor
Copilot · curl

Client-side LLM

REST · MCP · JS commands

Dashboard

POST /api/sessions
POST /sessions/$id/proxy
GET  /sessions/$id/prompt
MCP  /mcp

Proxy · No LLM

Proxies JS agent.* calls

Worker container

Real browser
noVNC · H.264

host.docker.internal

Session s_8c3b1d

claude-code — exploring
# POST /sessions/$id/proxy { "js": ... }
> agent.go("https://app.local/login")
> agent.look()
{ h1: "Sign In", inputs: [...] }
> agent.act("type admin in Username")
> agent.act("click Sign in")
> agent.wait("{h1}Dashboard")
Flow.run() → ok: true
  • No LLM needed on the grid — your client-side agent drives
  • Exploratory testing, debugging, live demos
  • Discover locators, build flows interactively
  • Connect anything that can curl
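The proxy loop above can be sketched as a tiny client. The endpoints (POST /api/sessions, POST /sessions/$id/proxy) and the { "js": ... } payload come from this page; the `openSession`/`runJs` names and the stub transport are illustrative only — swap the stub for a real HTTP call (e.g. fetch) against your Dashboard:

```javascript
// Minimal sketch of an interactive-mode client. The `transport` argument
// stands in for a real HTTP call so the sketch is self-contained.
const BASE = "http://localhost:4444";

async function openSession(transport) {
  const res = await transport(`${BASE}/api/sessions`, { method: "POST" });
  return res.sessionId;
}

async function runJs(transport, sessionId, js) {
  // The proxy endpoint forwards raw JS (agent.* calls) to the worker.
  return transport(`${BASE}/sessions/${sessionId}/proxy`, {
    method: "POST",
    body: JSON.stringify({ js }),
  });
}

// Illustrative stub standing in for the Dashboard.
const stub = async (url, opts) =>
  url.endsWith("/api/sessions")
    ? { sessionId: "s_8c3b1d" }
    : { ok: true, echoed: JSON.parse(opts.body).js };

(async () => {
  const id = await openSession(stub);
  const result = await runJs(stub, id, 'agent.go("https://app.local/login")');
  console.log(id, result.ok); // s_8c3b1d true
})();
```

Anything that can make these two HTTP calls — an LLM coding agent, a script, or plain curl — can drive a session.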
Mode 2

Autonomous

A worker-side LLM drives the observe-decide-act loop inside the karate-agent container. Flows run first at native speed — the LLM only takes over for unknown steps. Poll status with GET /api/jobs/$id; download the full report.zip when it completes.

Client → Dashboard

Submit Job

POST /api/jobs

{
  prompt,
  flowFiles,
  model,
  maxIterations
}
karate-agent container

LLM Worker

  • observe page
  • plan next step
  • choose locator
act ↓ · ↑ observe · decide

Browser

  • execute action
  • render result
  • snapshot DOM
Budget: maxIterations

Dashboard → Client

report.zip

GET /api/jobs/$id/download

  • transcript
  • screenshots
  • structured report
  • video (optional)
job — running
# flow found, running scripted path
login.js               2.1s    0 tok
create-quote.js        3.4s    0 tok
submit.js              LLM invoked → look() · analyze · recover
submit.js (recovered)  2.8s  512 tok
verify.js              1.2s    0 tok
done · 12/12 · ~$0.02
  • LLM configured on the worker — runs independently
  • CI/CD, scheduled tests, batch jobs
  • Flows run first (fast), LLM handles unknowns
  • Token budget enforced via maxIterations
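The submit-and-poll loop uses the endpoints documented above (POST /api/jobs, GET /api/jobs/$id, GET /api/jobs/$id/download). A hedged sketch — the stub transport and the "complete" terminal status name are assumptions for illustration, not confirmed API values:

```javascript
// Sketch of submitting an autonomous job and polling until it finishes.
// `http` stands in for a real HTTP call so the sketch runs without a grid.
const BASE = "http://localhost:4444";

async function submitJob(http, job) {
  const res = await http(`${BASE}/api/jobs`, {
    method: "POST",
    body: JSON.stringify(job),
  });
  return res.jobId;
}

async function waitForJob(http, jobId, { intervalMs = 2000, attempts = 100 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const { status } = await http(`${BASE}/api/jobs/${jobId}`);
    if (status !== "running") return status; // then GET .../download for report.zip
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("timed out waiting for job");
}

// Illustrative stub: reports "running" once, then "complete".
let polls = 0;
const stub = async (url, opts) =>
  opts?.method === "POST"
    ? { jobId: "j_8c3b1d", status: "running" }
    : { status: polls++ < 1 ? "running" : "complete" };

(async () => {
  const jobId = await submitJob(stub, {
    prompt: "submit a quote",
    flowFiles: ["login.js"],
    maxIterations: 20,
  });
  console.log(jobId, await waitForJob(stub, jobId, { intervalMs: 1 }));
})();
```

The same loop drops straight into a CI step: submit, poll, then archive report.zip as a build artifact.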

Development Workflow

From exploration to autonomy

Six steps to move a new app from zero coverage to zero-token regression.

1

Explore

Discover locators via interactive mode.

> look()
{ h1: "Login" }
2

Create

LLM writes .js flow files.

3

Test

Flow.run() → ok: true

4

Compose

Chain flows into orchestrators.

5

Autonomy

Submit as a job; the LLM handles unknowns.

6

Report

Review deviations, fix, repeat.
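Step 4 (Compose) can be sketched as an orchestrator that chains flow files and stops on the first deviation. Note the hedging: this page only shows a no-argument Flow.run(), so the Flow.run(file) signature below is an assumption, stubbed out to keep the sketch self-contained:

```javascript
// Hedged sketch of an orchestrator flow. Flow.run(file) is an ASSUMED
// signature; the stub stands in for real execution in a worker browser.
const Flow = {
  run(file) {
    return { ok: true, file }; // stub: a real run drives the browser
  },
};

function orchestrate(files) {
  const results = [];
  for (const file of files) {
    const result = Flow.run(file);
    results.push(result);
    if (!result.ok) break; // deviation: stop and report (or hand off to LLM)
  }
  return results;
}

const run = orchestrate(["login.js", "create-quote.js", "submit.js", "verify.js"]);
console.log(run.every((r) => r.ok) ? "ok: true" : "deviation"); // ok: true
```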

Architecture

One jar, one Docker image

No microservices, no databases, no message queues.

Karate Agent grid architecture diagram: a grid server exposes a REST API, a proxy, and a job runner, which manages isolated Docker worker containers

Runtime

Java 21

Container

Docker 24+

Artifacts

1 jar · 1 image

Per Session

~2 GB RAM

API

REST + MCP

Scalability

Scales without changing anything

Same jar, same image — runs on a laptop, a shared server, or dedicated CI infrastructure.

Solo Developer

On your laptop. Test against localhost. Flows in a local directory.

$ java -jar karate-agent.jar

Shared Team Server

Mac Mini or EC2. Dashboard URL shared with the team. Flows backed by git.

git · dashboard · team-wide

CI-Integrated

Dedicated infra. Jobs submitted by pipelines. Reports as artifacts. Kubernetes for scale.

GitHub Actions · Jenkins · Azure DevOps · Docker

Real-World Proof

Guidewire PolicyCenter

12-step Personal Auto submission — from login through quote creation. Watch the cost collapse as flows take over from the LLM.

Stage            Iterations       Cost · Time
Pure explore     39 iterations    ~$0.50 · 4 min
With hints       35 iterations    ~$0.40 · 3 min
Guided + flows    8 iterations    ~$0.02 · 2 min
Pure flows        0 iterations    $0 · ~15 s

Iterations

39 → 0

Cost per run

$0.50 → $0

Run time

4 min → 15 s

Each step was developed as an independent flow via Interactive mode, then composed into a single orchestrator. The orchestrator runs all 12 flows sequentially at native JS speed. The LLM is invoked only if a step deviates from the expected path.
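The flows-first pattern behind that orchestrator — run each step scripted, invoke recovery only on deviation, capped by an iteration budget — can be sketched generically. The step and recovery callbacks below are stand-ins (the recovery slot is where the LLM's look/analyze/retry loop would plug in):

```javascript
// Generic sketch: scripted steps run at native speed; a recovery
// callback is invoked only when a step deviates, within a budget.
function runWithRecovery(steps, recover, maxIterations = 20) {
  let iterations = 0;
  for (const step of steps) {
    let result = step();
    while (!result.ok) {
      if (++iterations > maxIterations) throw new Error("budget exhausted");
      result = recover(result); // LLM: look() · analyze · retry
    }
  }
  return { ok: true, llmIterations: iterations };
}

// Illustration: one step deviates once, then recovers.
let deviated = false;
const steps = [
  () => ({ ok: true }),
  () => (deviated ? { ok: true } : { ok: false, reason: "locator not found" }),
  () => ({ ok: true }),
];
const recover = () => { deviated = true; return { ok: true }; };
console.log(runWithRecovery(steps, recover)); // { ok: true, llmIterations: 1 }
```

On the happy path `llmIterations` stays at 0 — the Stage 4 case where a full run costs no tokens at all.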

Guidewire PolicyCenter is a complex enterprise SPA where standard locator strategies fail. Karate Agent's cursor-pointer discovery handles the non-standard <div onclick> targets that Selenium and Playwright struggle with.

Progressive Adoption

Start with zero AI cost. Graduate as confidence grows.

Each stage reduces LLM dependency. Most teams land at Stage 3 or 4, where the happy path is deterministic and the LLM only covers edge cases.

Stage 1

Explore

39 iters

~$0.50 · 4 min

LLM explores app from scratch, discovers locators, navigates with no guidance.

Stage 2

Hints

35 iters

~$0.40 · 3 min

Markdown task file provides context. LLM still drives, but with fewer wrong turns.

Most teams land here

Stage 3

Guided + Flows

8 iters

~$0.02 · 2 min

Scripted flows handle known steps. LLM only on failure.

Stage 4

Full Flows

0 iters

$0 · ~15 s

Pure deterministic JS. No LLM, no tokens, native speed.

See the workflow in action

Let us walk you through Interactive mode, flows, and Autonomous mode on your application.