Self-Hosted AI Testing

The whole pipeline, inside your perimeter.

Browser. Agent. LLM. Reports. Every component runs on infrastructure you control. No outbound calls, no telemetry, no hosted control plane — the only AI testing architecture regulated industries can actually deploy.

Why self-host

Three reasons most teams have no real choice

Regulatory compliance. Financial services, insurance, and healthcare have data-residency, processing-jurisdiction, and audit-trail requirements that no SaaS testing vendor can satisfy by writing them on a marketing page. The compliance officer wants the application data to stay inside the perimeter. That’s self-hosted.

IP protection. Test data is often production-like data, or anonymized production data. Sending it through a third-party LLM, however well-meaning the vendor, introduces an attack surface most security teams will never sign off on. Self-hosted removes the question.

Cost predictability at scale. Cloud testing tools price per seat or per test. Self-hosted cost is amortized infrastructure. At 50,000+ test runs per month, the break-even moves decisively toward self-hosted, even before factoring in negotiated hardware pricing.

The architecture

Four components, all in your network

1. Karate Agent server

Lightweight orchestrator. Receives test requests, manages browser sessions, coordinates LLM calls.

~500MB RAM, minimal CPU

2. Chrome containers

One headless Chrome per test session. Pinned versions, ephemeral state, no cross-contamination.

~1GB each, parallel

3. Local LLM server

Ollama, vLLM, or TGI serving Llama / Qwen / DeepSeek. The reasoning brain of the system.

64–140GB VRAM for 70B

4. Report storage

HTML reports, JUnit XML, Cucumber JSON, session video. Stored on your S3-compatible bucket or NFS.

No vendor blob store

Zero outbound calls. Once deployed with a local LLM, the system makes no network requests outside your cluster. No phone-home telemetry, no license-server pings, no auto-update calls. The only network traffic is from the agent to the LLM endpoint and from the agent to the application under test, both inside your perimeter.
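If you want the cluster itself to enforce that boundary, rather than relying on application behavior alone, one option is a Kubernetes NetworkPolicy that limits egress. The sketch below is illustrative: the `ai-testing` namespace and the `10.0.0.0/8` range are assumptions to replace with your own values, and it only takes effect if your CNI plugin enforces NetworkPolicy.

```yaml
# Sketch only. Namespace name and CIDR are placeholders, not Karate Agent defaults.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress-to-cluster
  namespace: ai-testing            # hypothetical namespace holding agent, browsers, LLM
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes: [Egress]
  egress:
    # Pod-to-pod traffic inside the namespace (agent -> LLM, agent -> browsers)
    - to:
        - podSelector: {}
    # DNS lookups against the cluster DNS in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # The application under test; replace with your internal address range
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
```

Because egress not matched by a rule is dropped once a pod is selected by an Egress policy, anything headed for the public internet fails at the network layer, which gives auditors a control they can verify independently of the agent.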

In practice

Air-gapped Kubernetes deployment

The shape of a real production deployment. Configure once, scale by adding nodes.

# docker-compose.yml (or equivalent Kubernetes manifest)
version: '3.8'
services:

  ollama:
    image: ollama/ollama:latest    # pin a tag or digest and mirror it internally for air-gapped use
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1           # 1x H100 or A100
              capabilities: [gpu]  # required by Compose for GPU device reservations
    volumes:
      - ./models:/root/.ollama
    command: "serve"

  karate-agent:
    image: karatelabs/karate-agent:latest    # pin a tag or digest in production
    depends_on: [ollama]
    environment:
      # Point at the local Ollama, not a cloud LLM
      LLM_BASE_URL: http://ollama:11434/v1
      LLM_MODEL: llama3.3:70b
    ports: ["8080:8080"]

Same pattern in Kubernetes: a Deployment for Ollama with GPU resource requests, a Deployment for the agent, a Service routing between them. Standard ops — no special operators or proprietary CRDs.
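A minimal sketch of that Kubernetes shape follows. The image names, ports, and `LLM_*` variables come from the Compose file above; the resource names, labels, and the `ollama-models` PersistentVolumeClaim are illustrative assumptions. The GPU limit assumes the NVIDIA device plugin is installed on the cluster.

```yaml
# Sketch only. Names, labels, and the PVC are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels: {app: ollama}
  template:
    metadata:
      labels: {app: ollama}
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest      # pin and mirror internally
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1            # exposed by the NVIDIA device plugin
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models       # hypothetical PVC with pre-loaded model files
---
apiVersion: v1
kind: Service
metadata:
  name: ollama                             # gives the agent a stable in-cluster hostname
spec:
  selector: {app: ollama}
  ports:
    - port: 11434
      targetPort: 11434
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karate-agent
spec:
  replicas: 1
  selector:
    matchLabels: {app: karate-agent}
  template:
    metadata:
      labels: {app: karate-agent}
    spec:
      containers:
        - name: karate-agent
          image: karatelabs/karate-agent:latest
          ports:
            - containerPort: 8080
          env:
            - name: LLM_BASE_URL
              value: http://ollama:11434/v1   # resolves through the Service above
            - name: LLM_MODEL
              value: llama3.3:70b
```

In an air-gapped cluster the model files cannot be pulled at startup, so the volume would need to be populated ahead of time from an internal source.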

Hardware

What you actually need

| Component | Lean dev | Team prod | Enterprise scale |
| --- | --- | --- | --- |
| Agent + browsers | 2 vCPU / 4GB | 8 vCPU / 16GB | K8s, autoscaling |
| LLM (small) | RTX 4090 + Llama 3.1 8B | RTX 4090 + Qwen 32B | - |
| LLM (production) | - | A100 80GB + Llama 70B | 2–4× H100 + 70B model |
| Parallel sessions | ~10 | ~50 | 500+ |

The agent and browsers scale linearly with CPU/RAM — standard ops. The LLM is the budget item. A single H100 can serve dozens of concurrent test sessions; most enterprise deployments need 2–4 GPUs for sustained CI throughput.

Security properties

What stays inside, by design

Application data

Never leaves the cluster. The agent talks to your app, extracts DOM, sends only structured data to the local LLM. No screenshots transmitted off-host.

Credentials

Test credentials, API tokens, secrets — all live in your secrets manager. Karate Agent reads them via standard Kubernetes Secrets / HashiCorp Vault / AWS Secrets Manager.
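As an illustration of the Kubernetes Secrets route, the fragment below injects a test credential into the agent container as an environment variable. The Secret name, key, and variable name are all hypothetical; how the agent expects to receive credentials is not specified here, so treat this as the generic Kubernetes pattern rather than Karate Agent's documented interface.

```yaml
# Fragment of the karate-agent container spec. All names are placeholders.
env:
  - name: APP_TEST_PASSWORD            # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: app-test-credentials     # hypothetical Secret in the same namespace
        key: password
```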

Reports & artifacts

HTML, video, JUnit XML, Cucumber JSON — all written to your storage. Datadog, Splunk, Elastic, S3-compatible buckets — pick yours.

Telemetry

None. Karate Agent does not phone home, does not call out to license servers, does not report anonymous usage stats. Zero outbound connections by design.

Observability

Plugs into the stack you already run

Karate Agent emits structured JSON logs (parseable by anything), Prometheus metrics (success rate, latency, LLM token usage), and JUnit XML + HTML reports per run.

Tested integrations: Datadog, Grafana / Loki, Splunk, Elastic Stack, New Relic. Anything that consumes JSON logs or Prometheus metrics works without a dedicated plugin.

For air-gapped deployments specifically, the metrics endpoint stays on-cluster — you scrape into your local Prometheus, no cloud telemetry endpoint required.
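A scrape job for an on-cluster Prometheus might look like the sketch below. The `/metrics` path is the Prometheus default, and the target reuses the agent's service name and port 8080 from the Compose example; whether the agent exposes metrics on that port and path is an assumption to check against your deployment.

```yaml
# prometheus.yml fragment. Target and path are assumptions.
scrape_configs:
  - job_name: karate-agent
    metrics_path: /metrics               # Prometheus default; confirm for the agent
    scrape_interval: 30s
    static_configs:
      - targets: ["karate-agent:8080"]   # in-cluster hostname, no external endpoint
```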

FAQ

Frequently asked questions

What does “self-hosted AI testing” mean?

Self-hosted AI testing runs the entire test automation pipeline — agent, browser, LLM — on infrastructure you control. No calls to a vendor’s cloud for orchestration. No application data or screenshots sent to third parties for model inference. For regulated industries, self-hosted is the only viable deployment model.

Is self-hosted the same as air-gapped?

Air-gapped is a specific variant of self-hosted where the system has no internet connectivity at all. Self-hosted can still reach external LLMs (Claude, GPT) over the internet if you choose. Air-gapped means the LLM runs locally too — typically via Ollama or vLLM with open-source models. See bring your own LLM.

What hardware do I need for self-hosted AI testing?

The agent server is lightweight (~500MB RAM, minimal CPU). The browser containers are standard headless Chrome (~1GB each, scales with parallelism). Local LLMs are the heaviest component: a 70B model needs 64–140GB VRAM depending on quantization. A single A100 or H100 can serve dozens of concurrent test sessions; smaller deployments use RTX 4090 or similar.

Can I use Karate Agent with my existing OpenAI / Anthropic API keys?

Yes, in a self-hosted-but-cloud-LLM pattern: the agent runs on your infrastructure but calls out to Anthropic or OpenAI for inference. Data sent to the LLM (structured DOM, not screenshots) flows through their APIs under your existing enterprise agreement. A good fit for teams without data-residency constraints.
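In Compose terms, this pattern would presumably amount to repointing the agent's LLM settings at the provider. In the sketch below, `LLM_BASE_URL` and `LLM_MODEL` are the variables from the deployment example; `LLM_API_KEY` and the model placeholder are hypothetical, so check the agent's configuration reference for the actual names.

```yaml
# Compose fragment: same agent, cloud inference instead of a local Ollama.
services:
  karate-agent:
    image: karatelabs/karate-agent:latest
    environment:
      LLM_BASE_URL: https://api.openai.com/v1   # OpenAI's public API base URL
      LLM_MODEL: your-model-name                # placeholder; use a model your agreement covers
      LLM_API_KEY: ${OPENAI_API_KEY}            # hypothetical variable name, read from the host env
    ports: ["8080:8080"]
```

This variant needs outbound access to the provider's API, so it does not apply to air-gapped clusters.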

Which open-source LLMs are viable for self-hosted testing?

Strong performers in 2026 for DOM-first testing: Llama 3.3 70B, Qwen 2.5 72B, DeepSeek V3, Mistral Large 2. Smaller viable models for cost-sensitive deployments: Qwen 2.5 32B, Llama 3.1 8B (with careful prompt engineering). See LLM browser automation for the full matrix.

How do I deploy a local LLM alongside Karate Agent?

Two containers: ollama/ollama for the model server, karatelabs/karate-agent for the test agent. Karate Agent’s config points to the Ollama endpoint instead of a cloud API. Kubernetes manifest or Docker Compose — both well supported.

Can the self-hosted deployment integrate with our existing observability stack?

Yes. Karate Agent emits structured logs in JSON, exports Prometheus metrics, and produces JUnit XML + HTML reports per run. Integrates naturally with Datadog, Grafana, Splunk, Elastic, or any standard observability stack.

How does this compare to cloud-based AI testing tools?

Self-hosted trades operational complexity for data sovereignty, cost control, and regulatory compliance. Cloud-based tools are easier to start (no infra), but expose you to data residency risk, vendor lock-in, and per-token pricing surprises at scale. For enterprise and regulated industries, self-hosted is the dominant pattern. See enterprise AI testing.

Air-gapped AI testing your CISO will sign off on.

Standard Kubernetes manifests, standard Docker Compose, standard observability integrations. The most enterprise-friendly AI testing deployment available.