The TruLayer control loop watches your production traces, spots systematic failures, proposes a prompt fix, validates it in a sandboxed A/B replay, and then either ships it automatically or parks it for your review. This guide walks through the full cycle and shows how to configure the two key safety gates: HITL approval and the cascade depth limit.

Requirements: Team plan or above. Owner role is required for any mutation (approve, reject, rollback, policy changes); Viewer and Member roles can read all control-loop data.
Step 1 — Detect failures
TruLayer’s failure detector (running in the consumer pipeline) clusters incoming traces by their error signature. You do not need to configure anything for detection to work — every span with an error field set, and every trace that fails an active eval rule, is automatically fed into the cluster engine.
To confirm detection is working, go to Dashboard → Failures. You should see clusters appearing within a few minutes of traces arriving. Each cluster has a signature (a stable hash of the project and error type) that links it to any prompt deployment the system proposes later.
For programmatic access, use GET /v1/failures/clusters:
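For example, using Python's standard library (the api.trulayer.ai host, the Bearer auth header, and the proj_123 ID are assumptions, not documented values):

```python
import os
import urllib.request

# Hypothetical host and project ID; check the API reference for the real values.
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/failures/clusters?project_id=proj_123",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# Sending the request needs network access and a valid key:
# import json
# with urllib.request.urlopen(req) as resp:
#     clusters = json.load(resp)
```

Each returned cluster carries the signature hash that later links it to a proposed deployment.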
Step 2 — Cluster and propose a prompt diff
When a cluster reaches a threshold size, the cluster-to-diff worker asks the LLM to synthesise a candidate prompt. The result is stored as a prompt deployment in the proposed state.
You can list pending proposals via the API:
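A minimal sketch; the status and project_id filters come from the endpoint reference below, while the host and project ID are assumptions:

```python
import os
import urllib.request

# Filter deployments to those still awaiting A/B replay or review.
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/prompts/deployments?status=proposed&project_id=proj_123",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# with urllib.request.urlopen(req) as resp: ...
```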
Step 3 — A/B replay and review
Once a deployment is proposed, the A/B harness replays a held-out trace set against both the current prompt and the candidate. The deployment moves through ab_running and lands on either ab_passed (candidate recommended) or ab_failed (no improvement detected). The ab_report field on the deployment carries the full per-metric delta report.
Fetch a deployment by ID to see where it is:
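For example (the host and the dep_123 ID are assumptions; substitute an ID from the list call):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}",
    headers={"Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}"},
)
# The response body carries status (e.g. ab_running, ab_passed) and,
# once the replay finishes, the ab_report per-metric delta report.
```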
Step 4 — Ship the winning prompt
There are two ways to ship an ab_passed deployment: HITL approval (the default) or auto-ship.
HITL approval (default)
With prompt_autoship_enabled set to false on the project (the default), every ab_passed deployment waits for an owner to approve it. To approve via the API:
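A sketch with Python's standard library; the host, the dep_123 ID, and the empty JSON body are assumptions:

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/approve",
    data=b"{}",  # empty JSON body assumed; the endpoint may accept extra fields
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```

Remember this call requires the Owner role.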
On success, the deployment status becomes shipped and the new prompt is immediately live for that project. The approved_by field records your user ID.
To reject a candidate:
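The reject call has the same shape (host, ID, and empty body are again assumptions):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/reject",
    data=b"{}",
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```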
Auto-ship
If you want the platform to ship ab_passed deployments automatically without human review, set prompt_autoship_enabled to true on the project. You do this via Dashboard → Projects → [project] → Settings or via PATCH /v1/projects/{id}:
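A sketch of the PATCH call; the prompt_autoship_enabled field is documented, while the host and proj_123 ID are assumptions:

```python
import json
import os
import urllib.request

body = json.dumps({"prompt_autoship_enabled": True}).encode()
req = urllib.request.Request(
    "https://api.trulayer.ai/v1/projects/proj_123",  # hypothetical host and project ID
    data=body,
    method="PATCH",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```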
After an auto-ship, approved_by is set to "system:autoship" so you can distinguish automated approvals from human ones in the audit trail.
Step 5 — Monitor after shipping
Once a deployment is shipped, it moves to monitoring. The regression monitor watches the rolling metric window and compares it to the A/B candidate baseline. You can read the current regression metric from the deployment:
The regression_metric field carries the current rolling value. If it falls below the threshold, the deployment moves to regressed and a banner appears across the dashboard. Check Dashboard → Prompt Improvements for the regression banner and the post-ship monitoring chart.
Step 6 — Rollback if regressed
When a deployment is in regressed or shipped/monitoring, you can roll it back to the previous prompt:
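For example (host, ID, and empty body are assumptions):

```python
import os
import urllib.request

deployment_id = "dep_123"  # hypothetical deployment ID
req = urllib.request.Request(
    f"https://api.trulayer.ai/v1/prompts/deployments/{deployment_id}/rollback",
    data=b"{}",
    method="POST",
    headers={
        "Authorization": f"Bearer {os.environ.get('TRULAYER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# with urllib.request.urlopen(req) as resp: ...
```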
The deployment moves to rolled_back and the prior prompt is restored immediately. If auto-rollback is configured, the platform fires this transition automatically; approved_by is set to "system:auto_rollback" in that case.
Configuring the safety gates
prompt_autoship_enabled (per project)
| Value | Behaviour |
|---|---|
| false (default) | ab_passed deployments wait for an owner to approve via dashboard or API. |
| true | ab_passed deployments ship automatically. approved_by is "system:autoship". |
Change it via PATCH /v1/projects/{id} or in Dashboard → Projects → [project] → Settings.
max_retry_depth (per policy)
Policies with action_type: retry will retry a trace up to max_retry_depth times. When this limit is reached, the next retry is automatically converted to an escalate action and the trace enters the HITL queue. This prevents unbounded retry loops.
For example, consider a policy with max_retry_depth: 2. The trace hits the policy and is retried once (control_loop_depth: 1). The retry fails the eval again and is retried a second time (control_loop_depth: 2). The second retry also fails. On the third attempt, the retry is auto-converted to escalate and the trace goes to the HITL queue. No further retries occur; a human must approve or reject it.
max_cascade_depth (per policy)
max_cascade_depth is a broader safety gate than max_retry_depth. Where max_retry_depth counts only retry actions for a single policy, max_cascade_depth counts every remediation action — retry, fallback_model, and prompt_modification — across all policies on a trace. When the total reaches the cap, the next remediation is auto-converted to escalate and parked in the HITL queue.
Both gates run on every control-loop execution. The cascade gate runs first because it spans the wider budget. If neither gate fires, normal action execution proceeds.
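The two gates can be sketched as follows. This is illustrative pseudologic, not the platform's actual implementation; the function and argument names are invented, but the ordering and escalation reasons mirror the documented behaviour:

```python
def gate_check(proposed_action, control_loop_depth, total_remediations,
               max_retry_depth, max_cascade_depth):
    """Return the action to execute and an escalation reason, if any.

    Sketch only: the cascade gate runs first because it spans the
    wider budget; the retry gate covers retry actions alone.
    """
    # Cascade gate: counts every remediation across all policies on the trace.
    if total_remediations >= max_cascade_depth:
        return "escalate", "cascade_depth_exhausted"
    # Retry gate: counts only retry actions already performed for this policy.
    if proposed_action == "retry" and control_loop_depth >= max_retry_depth:
        return "escalate", "retry_threshold_exceeded"
    # Neither gate fired: normal action execution proceeds.
    return proposed_action, None
```

With max_retry_depth: 2, a third retry attempt (two retries already performed) converts to escalate, matching the walkthrough above.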
Consider a trace that fails an eval and is processed by three policies in sequence (retry → fallback_model → prompt_modification) with max_cascade_depth: 2:
- Action 1 (retry): The retry policy fires. The trace is retried (total remediation count: 1). The retry still fails the eval.
- Action 2 (fallback_model): A second policy fires and switches the model (total remediation count: 2). The fallback model response still fails the eval.
- Action 3 (prompt_modification) — cascade gate trips: A third policy fires and would apply a prompt modification. The cascade gate runs first and sees total remediation count: 2, which equals max_cascade_depth: 2. The prompt modification is auto-converted to escalate and the trace enters the HITL pending-approval queue. No further automatic remediations occur.
The escalation_reason field in the action metadata is set to cascade_depth_exhausted so you can distinguish this from a retry_threshold_exceeded escalation produced by max_retry_depth.
Use max_cascade_depth when your policies can chain across action types — for example, a retry policy followed by a fallback-model policy on the same eval rule. Without a cascade cap, a trace that fails every action type in sequence could accumulate an unbounded number of remediations before any human sees it. See max_retry_depth above for the retry-only gate, and the API reference for the full Policy schema.
Endpoint reference
| Method | Path | What it does |
|---|---|---|
| GET | /v1/prompts/deployments | List deployments. Filter by status, project_id. |
| GET | /v1/prompts/deployments/{id} | Get a single deployment by ID. |
| POST | /v1/prompts/deployments/{id}/approve | Approve an ab_passed (or proposed) deployment. Owner only. |
| POST | /v1/prompts/deployments/{id}/reject | Reject any non-terminal deployment. Owner only. |
| POST | /v1/prompts/deployments/{id}/rollback | Roll back a shipped or monitoring/regressed deployment. Owner only. |
Next steps
- Prompt improvements dashboard guide — review deployments, the diff view, and the monitoring chart in the UI
- Control loop dashboard guide — manage policies, the HITL queue, and the kill-switch
- Remediation diffs guide — inspect per-span output diffs after a retry action
- API reference — full PromptDeployment schema and parameter details
- Changelog — recent updates