The TruLayer control loop watches your production traces, spots systematic failures, proposes a prompt fix, validates it in a sandboxed A/B replay, and then either ships it automatically or parks it for your review. This guide walks through the full cycle and shows how to configure the two key safety gates: HITL approval and the cascade depth limit.

Requirements: Team plan or above. Owner role is required for any mutation (approve, reject, rollback, policy changes); Viewer and Member roles can read all control-loop data.
Step 1 — Detect failures
TruLayer’s failure detector (running in the consumer pipeline) clusters incoming traces by their error signature. You do not need to configure anything for detection to work — every span with an error field set, and every trace that fails an active eval rule, is automatically fed into the cluster engine.
To confirm detection is working, go to Dashboard → Failures. You should see clusters appearing within a few minutes of traces arriving. Each cluster has a signature (a stable hash of the project and error type) that links it to any prompt deployment the system proposes later.
For programmatic access, use GET /v1/failures/clusters:
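For example, using Python's standard library (the api.trulayer.ai host, the Bearer auth header, and the proj_123 ID are assumptions, not documented values):

```python
import os
import urllib.request

# Hypothetical host and project ID; check the API reference for the real values.
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/failures/clusters?project_id=proj_123",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# Sending the request needs network access and a valid key:
# import json
# with urllib.request.urlopen(req) as resp:
#     clusters = json.load(resp)
```

Each returned cluster carries the signature hash that later links it to a proposed deployment.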
Step 2 — Cluster and propose a prompt diff
When a cluster reaches a threshold size, the cluster-to-diff worker asks the LLM to synthesise a candidate prompt. The result is stored as a prompt deployment in the proposed state.
You can list pending proposals via the API:
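A minimal sketch; the status and project_id filters come from the endpoint reference below, while the host and project ID are assumptions:

```python
import os
import urllib.request

# Filter deployments to those still awaiting A/B replay or review.
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/prompts/deployments?status=proposed&project_id=proj_123",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# with urllib.request.urlopen(req) as resp: ...
```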
Step 3 — A/B replay and review
Once a deployment is proposed, the A/B harness replays a held-out trace set against both the current prompt and the candidate. The deployment moves through ab_running and lands on either ab_passed (candidate recommended) or ab_failed (no improvement detected). The ab_report field on the deployment carries the full per-metric delta report.
Fetch a deployment by ID to see where it is:
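For example (the host and the dep_123 ID are assumptions; substitute an ID from the list call):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# The response body carries status (e.g. ab_running, ab_passed) and,
# once the replay finishes, the ab_report per-metric delta report.
```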
Step 4 — Ship the winning prompt
There are two ways to ship an ab_passed deployment: HITL approval (the default) or auto-ship.
HITL approval (default)
With prompt_autoship_enabled set to false on the project (the default), every ab_passed deployment waits for an owner to approve it. To approve via the API:
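A sketch with Python's standard library; the host, the dep_123 ID, and the empty JSON body are assumptions:

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/approve",
    data=b"{}",  # empty JSON body assumed; the endpoint may accept extra fields
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```

Remember this call requires the Owner role.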
On success, the deployment status becomes shipped and the new prompt is immediately live for that project. The approved_by field records your user ID.
To reject a candidate:
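The reject call has the same shape (host, ID, and empty body are again assumptions):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/reject",
    data=b"{}",
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```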
Auto-ship
If you want the platform to ship ab_passed deployments automatically without human review, set prompt_autoship_enabled to true on the project. You do this via Dashboard → Projects → [project] → Settings or via PATCH /v1/projects/{id}:
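A sketch of the PATCH call; the prompt_autoship_enabled field is documented, while the host and proj_123 ID are assumptions:

```python
import json
import os
import urllib.request

body = json.dumps({"prompt_autoship_enabled": True}).encode()
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/projects/proj_123",  # hypothetical host and project ID
    data=body,
    method="PATCH",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```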
After an auto-ship, approved_by is set to "system:autoship" so you can distinguish automated approvals from human ones in the audit trail.
Step 5 — Monitor after shipping
Once a deployment is shipped, it moves to monitoring. The regression monitor watches the rolling metric window and compares it to the A/B candidate baseline. You can read the current regression metric from the deployment:
The regression_metric field carries the current rolling value. If it falls below the threshold, the deployment moves to regressed and a banner appears across the dashboard. Check Dashboard → Prompt Improvements for the regression banner and the post-ship monitoring chart.
Step 6 — Rollback if regressed
When a deployment is in regressed or shipped/monitoring, you can roll it back to the previous prompt:
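For example (host, ID, and empty body are assumptions):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/rollback",
    data=b"{}",
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```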
The deployment moves to rolled_back and the prior prompt is restored immediately. If auto-rollback is configured, the platform fires this transition automatically; approved_by is set to "system:auto_rollback" in that case.
Configuring the safety gates
prompt_autoship_enabled (per project)
| Value | Behaviour |
|---|---|
| false (default) | ab_passed deployments wait for an owner to approve via dashboard or API. |
| true | ab_passed deployments ship automatically. approved_by is "system:autoship". |
Change it via PATCH /v1/projects/{id} or in Dashboard → Projects → [project] → Settings.
max_retry_depth (per policy)
Policies with action_type: retry will retry a trace up to max_retry_depth times. When this limit is reached, the next retry is automatically converted to an escalate action and the trace enters the HITL queue. This prevents unbounded retry loops.
For example, consider a policy with max_retry_depth: 2. The trace hits the policy and is retried once (control_loop_depth: 1). The retry fails the eval again and is retried a second time (control_loop_depth: 2). The second retry also fails. On the third attempt, the retry is auto-converted to escalate and the trace goes to the HITL queue. No further retries occur; a human must approve or reject it.
max_cascade_depth (per policy)
max_cascade_depth is a broader safety gate than max_retry_depth. Where max_retry_depth counts only retry actions for a single policy, max_cascade_depth counts every remediation action — retry, fallback_model, and prompt_modification — across all policies on a trace. When the total reaches the cap, the next remediation is auto-converted to escalate and parked in the HITL queue.
Both gates run on every control-loop execution. The cascade gate runs first because it spans the wider budget. If neither gate fires, normal action execution proceeds.
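The two gates can be sketched as follows. This is illustrative pseudologic, not the platform's actual implementation; the function and argument names are invented, but the ordering and escalation reasons mirror the documented behaviour:

```python
def gate_check(proposed_action, control_loop_depth, total_remediations,
               max_retry_depth, max_cascade_depth):
    """Return the action to execute and an escalation reason, if any.

    Sketch only: the cascade gate runs first because it spans the
    wider budget; the retry gate covers retry actions alone.
    """
    # Cascade gate: counts every remediation across all policies on the trace.
    if total_remediations >= max_cascade_depth:
        return "escalate", "cascade_depth_exhausted"
    # Retry gate: counts only retry actions already performed for this policy.
    if proposed_action == "retry" and control_loop_depth >= max_retry_depth:
        return "escalate", "retry_threshold_exceeded"
    # Neither gate fired: normal action execution proceeds.
    return proposed_action, None
```

With max_retry_depth: 2, a third retry attempt (two retries already performed) converts to escalate, matching the walkthrough above.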
Consider a trace that fails an eval and is processed by three policies in sequence (retry → fallback_model → prompt_modification) with max_cascade_depth: 2:
- Action 1 (retry): The retry policy fires. The trace is retried (total remediation count: 1). The retry still fails the eval.
- Action 2 (fallback_model): A second policy fires and switches the model (total remediation count: 2). The fallback model response still fails the eval.
- Action 3 (prompt_modification) — cascade gate trips: A third policy fires and would apply a prompt modification. The cascade gate runs first and sees total remediation count: 2, which equals max_cascade_depth: 2. The prompt modification is auto-converted to escalate and the trace enters the HITL pending-approval queue. No further automatic remediations occur.
The escalation_reason field in the action metadata is set to cascade_depth_exhausted so you can distinguish this from a retry_threshold_exceeded escalation produced by max_retry_depth.
Use max_cascade_depth when your policies can chain across action types — for example, a retry policy followed by a fallback-model policy on the same eval rule. Without a cascade cap, a trace that fails every action type in sequence could accumulate an unbounded number of remediations before any human sees it. See max_retry_depth above for the retry-only gate, and the API reference for the full Policy schema.
Endpoint reference
| Method | Path | What it does |
|---|---|---|
| GET | /v1/prompts/deployments | List deployments. Filter by status, project_id. |
| GET | /v1/prompts/deployments/{id} | Get a single deployment by ID. |
| POST | /v1/prompts/deployments/{id}/approve | Approve an ab_passed (or proposed) deployment. Owner only. |
| POST | /v1/prompts/deployments/{id}/reject | Reject any non-terminal deployment. Owner only. |
| POST | /v1/prompts/deployments/{id}/rollback | Roll back a shipped or monitoring/regressed deployment. Owner only. |
Next steps
- Prompt improvements dashboard guide — review deployments, the diff view, and the monitoring chart in the UI
- Control loop dashboard guide — manage policies, the HITL queue, and the kill-switch
- Remediation diffs guide — inspect per-span output diffs after a retry action
- API reference — full PromptDeployment schema and parameter details
- Changelog — recent updates