Self-Hosted AI Testing | Open-Weight LLM Browser Automation, Air-Gap Ready


Why self-host

Three reasons most teams have no real choice

Regulatory compliance. Financial services, insurance, and healthcare have data-residency, processing-jurisdiction, and audit-trail requirements that no SaaS testing vendor can satisfy by writing them on a marketing page. The compliance officer wants the application data to stay inside the perimeter. That’s self-hosted.

IP protection. Test data is often production-like, or is anonymized production data. Sending it through a third-party LLM, however well-meaning the vendor, adds an attack surface most security teams will not sign off on. Self-hosting removes the question.

Cost predictability at scale. Cloud testing tools price per-seat or per-test. Self-hosted is amortized infrastructure cost. At 50,000+ test runs/month, the break-even moves decisively toward self-hosted, even before factoring in negotiated hardware pricing.

The architecture

Four components, all in your network

Karate Agent server

Lightweight orchestrator. Receives test requests, manages browser sessions, coordinates LLM calls.

~500MB RAM, minimal CPU

Chrome containers

One headless Chrome per test session. Pinned versions, ephemeral state, no cross-contamination.

~1GB each, parallel

Local LLM server

Ollama, vLLM, or TGI serving Gemma 4 or Qwen 3.6, both benchmark-verified. The reasoning brain of the system.

~24GB VRAM for the verified models at 4-bit; 64–140GB for a 70B model

Report storage

HTML reports, JUnit XML, Cucumber JSON, session video. Stored on your S3-compatible bucket or NFS.

No vendor blob store

Zero outbound calls. Once deployed with a local LLM, the system makes no network requests outside your cluster. No phone-home telemetry, no license-server pings, no auto-update calls. The only network traffic is between the agent, its browser containers, the LLM endpoint, and the application under test, all inside your perimeter.
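The no-outbound property can also be enforced at the cluster level rather than taken on trust. A minimal sketch, assuming a Kubernetes cluster whose CNI plugin enforces NetworkPolicy and a hypothetical namespace named karate that holds the agent, browsers, and LLM server:

```yaml
# Sketch only: the namespace "karate" and the policy name are hypothetical.
# Requires a CNI plugin that enforces NetworkPolicy (for example Calico or Cilium).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: karate
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    # Pod-to-pod traffic inside the same namespace (agent to LLM server, agent to browsers)
    - to:
        - podSelector: {}
    # DNS lookups against the cluster DNS service in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

If the application under test lives in another namespace or outside the cluster, it needs its own explicit egress rule; everything else is dropped.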

In practice

Air-gapped Kubernetes deployment

The shape of a real production deployment. Configure once, scale by adding nodes.

# docker-compose.yml (or equivalent Kubernetes manifest)
version: '3.8'
services:

  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1           # 1x H100 or A100
              capabilities: [gpu]  # required by Compose for GPU device reservations
    volumes:
      - ./models:/root/.ollama
    command: "serve"

  karate-agent:
    image: karatelabs/karate-agent:latest
    depends_on: [ollama]
    environment:
      # Point at the local Ollama, not a cloud LLM
      LLM_BASE_URL: http://ollama:11434/v1
      LLM_MODEL: llama3.3:70b
    ports: ["8080:8080"]

Same pattern in Kubernetes: a Deployment for Ollama with GPU resource requests, a Deployment for the agent, a Service routing between them. Standard ops — no special operators or proprietary CRDs.
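The Kubernetes shape described above might look roughly like the following. This is a sketch rather than a vendor-published manifest: the resource names, labels, and the ollama-models PersistentVolumeClaim are assumptions, while the images, ports, and environment variables are carried over from the Compose file.

```yaml
# Sketch only: names, labels, and the PVC are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels: { app: ollama }
  template:
    metadata:
      labels: { app: ollama }
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1        # needs the NVIDIA device plugin on the node
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models   # hypothetical PVC holding pre-loaded model weights
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  selector: { app: ollama }
  ports:
    - port: 11434
      targetPort: 11434
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karate-agent
spec:
  replicas: 1
  selector:
    matchLabels: { app: karate-agent }
  template:
    metadata:
      labels: { app: karate-agent }
    spec:
      containers:
        - name: karate-agent
          image: karatelabs/karate-agent:latest
          ports:
            - containerPort: 8080
          env:
            - name: LLM_BASE_URL
              value: http://ollama:11434/v1   # resolves through the Service above
            - name: LLM_MODEL
              value: llama3.3:70b
```

In an air-gapped cluster both images would be pulled from an internal registry mirror, and the model weights would be loaded onto the volume ahead of time, since neither can be fetched at runtime.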

What you actually need

Component	Lean dev	Team prod	Enterprise scale
Agent + browsers	2 vCPU / 4GB	8 vCPU / 16GB	K8s, autoscaling
LLM (verified open-weight)	RTX 4090 24GB + Gemma 4 26B	RTX 4090 24GB + Qwen 3.6 35B-A3B	—
LLM (production)	—	A100 80GB + Llama 70B	2–4× H100 + 70B model
Parallel sessions	~10	~50	500+

Security properties

What stays inside, by design

Application data

Never leaves the cluster. The agent talks to your app, extracts DOM, sends only structured data to the local LLM. No screenshots transmitted off-host.

Credentials

Test credentials, API tokens, secrets — all live in your secrets manager. Karate Agent reads them via standard Kubernetes Secrets / HashiCorp Vault / AWS Secrets Manager.

Reports & artifacts

HTML, video, JUnit XML, Cucumber JSON — all written to your storage. Datadog, Splunk, Elastic, S3-compatible buckets — pick yours.

Telemetry

None. Karate Agent does not phone home, does not call out to license servers, does not report anonymous usage stats. Zero outbound connections by design.
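For the Credentials point above, the Kubernetes Secrets route could be wired up along these lines. Everything named here is illustrative: the Secret name, its keys, and the APP_TEST_USERNAME / APP_TEST_PASSWORD variables are hypothetical, and the variables the agent actually reads should be taken from its documentation.

```yaml
# Sketch only: Secret name, keys, and env var names are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: test-credentials
type: Opaque
stringData:
  username: qa-user
  password: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: karate-agent
spec:
  containers:
    - name: karate-agent
      image: karatelabs/karate-agent:latest
      env:
        - name: APP_TEST_USERNAME        # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: test-credentials
              key: username
        - name: APP_TEST_PASSWORD        # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: test-credentials
              key: password
```

The point of the pattern is that credentials are injected at pod start and never appear in the image, the manifest repository, or the test scripts.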

Observability

Plugs into the stack you already run

Karate Agent emits structured JSON logs (parseable by anything), Prometheus metrics (success rate, latency, LLM token usage), and JUnit XML + HTML reports per run.

Tested integrations: Datadog, Grafana / Loki, Splunk, Elastic Stack, New Relic. Anything that consumes JSON logs or Prometheus metrics works without configuration.

For air-gapped deployments specifically, the metrics endpoint stays on-cluster — you scrape into your local Prometheus, no cloud telemetry endpoint required.
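An on-cluster scrape could be configured roughly as follows. The metrics path and port are assumptions (the text above does not state them), as is the karate-agent DNS name:

```yaml
# prometheus.yml fragment. Assumed, not confirmed by the vendor text:
# metrics served at /metrics on port 8080, agent reachable as "karate-agent".
scrape_configs:
  - job_name: karate-agent
    metrics_path: /metrics
    scrape_interval: 30s
    static_configs:
      - targets: ["karate-agent:8080"]
```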

FAQ

Frequently asked questions

What does “self-hosted AI testing” mean?

Self-hosted AI testing runs the entire test automation pipeline — agent, browser, LLM — on infrastructure you control. No calls to a vendor’s cloud for orchestration. No application data or screenshots sent to third parties for model inference. For regulated industries, self-hosted is the only viable deployment model.

Is self-hosted the same as air-gapped?

Air-gapped is a specific variant of self-hosted in which the system has no internet connectivity at all. A self-hosted deployment can still reach external LLMs (Claude, GPT) over the internet if you choose. Air-gapped means the LLM runs locally too, typically via Ollama or vLLM with open-weight models. See bring your own LLM.

What hardware do I need for self-hosted AI testing?

The agent server is lightweight (~500MB RAM, minimal CPU). The browser containers are standard headless Chrome (~1GB each, scales with parallelism). Local LLMs are the heaviest component: a 70B model needs 64–140GB VRAM depending on quantization. A single A100 or H100 can serve dozens of concurrent test sessions; smaller deployments use RTX 4090 or similar.
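The dependence on quantization follows from simple arithmetic. As a rough rule of thumb (weights only, ignoring KV cache and runtime overhead), a model with N parameters stored at b bits per parameter needs about:

```latex
M_{\text{weights}} \approx \frac{N \cdot b}{8}\ \text{bytes}

\begin{aligned}
N = 70\times10^{9},\ b = 16 &:\quad 140\ \text{GB} \\
N = 70\times10^{9},\ b = 8  &:\quad 70\ \text{GB} \\
N = 70\times10^{9},\ b = 4  &:\quad 35\ \text{GB} \\
N = 35\times10^{9},\ b = 4  &:\quad 17.5\ \text{GB} \\
N = 26\times10^{9},\ b = 4  &:\quad 13\ \text{GB}
\end{aligned}
```

The 16-bit row matches the 140GB upper bound, and the last two rows are consistent with 26B and 35B models fitting a 24GB GPU at 4-bit. Real deployments need headroom beyond these figures for the KV cache, which grows with context length and the number of concurrent sessions.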

Can I use Karate Agent with my existing OpenAI / Anthropic API keys?

Yes, in a self-hosted-but-cloud-LLM pattern: agent runs on your infrastructure, but calls out to Anthropic or OpenAI for inference. Data that goes to the LLM (structured DOM, not screenshots) flows through their APIs under your existing enterprise agreement. Good fit for teams without data residency constraints.

Which open-source LLMs are viable for self-hosted testing?

Two are benchmark-verified across the full karate-agent UI-automation suite: Gemma 4 26B (Google, 4B active MoE) and Qwen 3.6 35B-A3B (Alibaba, 3B active MoE, Apache 2.0). Each fits a single 24 GB GPU at 4-bit via Ollama, and both handle form fills, vision reads, data extraction and deep navigation at parity. Llama, DeepSeek, Mistral, GLM and Kimi also run, but only Gemma 4 and Qwen 3.6 carry the benchmark stamp. See LLM browser automation for the full matrix.

How do I deploy a local LLM alongside Karate Agent?

Two containers: ollama/ollama for the model server, karatelabs/karate-agent for the test agent. Karate Agent’s config points to the Ollama endpoint instead of a cloud API. Kubernetes manifest or Docker Compose — both well supported.

Can the self-hosted deployment integrate with our existing observability stack?

Yes. Karate Agent emits structured logs in JSON, exports Prometheus metrics, and produces JUnit XML + HTML reports per run. Integrates naturally with Datadog, Grafana, Splunk, Elastic, or any standard observability stack.

How does this compare to cloud-based AI testing tools?

Self-hosted accepts more operational complexity in exchange for data sovereignty, cost control, and regulatory compliance. Cloud-based tools are easier to start with (no infrastructure to run), but expose you to data-residency risk, vendor lock-in, and per-token pricing surprises at scale. For enterprise and regulated industries, self-hosted is the dominant pattern. See enterprise AI testing.

The whole pipeline,
inside your perimeter.

Air-gapped AI testing
your CISO will sign off on.
