Remediation diffs

After the control loop remediates a failing LLM call, you can retrieve a structured diff that shows exactly what changed between the original span and the remediated one. This guide explains how to fetch that diff, how to handle the lazy-computation pattern, and what each field means.

What is a remediation diff?

When a control action runs and produces a remediated span, TruLayer computes a structured before/after diff. The diff endpoint (GET /v1/control/actions/{id}/diff) compares the two span outputs and returns:

Token length delta — how many tokens the remediated output added or removed
Latency delta — how many milliseconds faster or slower the remediated call was
Embedding similarity — cosine similarity between the original and remediated outputs, computed by Claude Haiku 4.5
Score deltas — for each eval rule that scored both spans, the before and after scores and the delta

Use remediation diffs to audit whether a remediation improved quality overall, to tune your policies, or to catch cases where a fix on one eval dimension caused a regression on another.

Which action types produce a diff?

All three action types that produce a remediated span support the diff endpoint:

Action type	Produces a diff?
`retry`	Yes
`fallback_model`	Yes
`prompt_modification`	Yes

For all three action types, the remediated span is published to the trace and eval rules re-fire on it. Once evaluation completes on both spans, the diff is available. No action type in the current API returns 422 from the diff endpoint; that status is reserved for future action types that do not produce a new output span.

Fetching a diff

curl https://api.trulayer.ai/v1/control/actions/{id}/diff \
  -H "Authorization: Bearer tl_..."

Replace {id} with the UUID of the control action. This endpoint is dashboard-accessible for all Team+ plan roles.

The 202 → poll-until-200 pattern

Diff computation is lazy: the first GET triggers the computation, which runs asynchronously. If the underlying evaluation has not yet completed, the API returns 202:

{
  "status": "pending",
  "reason": "evaluation_incomplete"
}

Poll until you receive 200. A simple polling loop:

async function waitForDiff(actionId: string, token: string) {
  while (true) {
    const res = await fetch(
      `https://api.trulayer.ai/v1/control/actions/${actionId}/diff`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    if (res.status === 200) return await res.json();
    if (res.status !== 202) throw new Error(`Unexpected status: ${res.status}`);
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}

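The same loop in Python, using httpx:

import time
import httpx

def wait_for_diff(action_id: str, token: str) -> dict:
    while True:
        r = httpx.get(
            f"https://api.trulayer.ai/v1/control/actions/{action_id}/diff",
            headers={"Authorization": f"Bearer {token}"},
        )
        if r.status_code == 200:
            return r.json()
        if r.status_code != 202:
            # Raises for error statuses; any other unexpected status
            # falls through to the explicit error below instead of looping.
            r.raise_for_status()
            raise RuntimeError(f"Unexpected status: {r.status_code}")
        time.sleep(2)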

Once computed, the diff is cached. Subsequent requests for the same action return 200 immediately.
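The loops above poll indefinitely. If the evaluation never completes, a caller may prefer to give up after a deadline. The following is a minimal sketch of that idea, not part of any TruLayer SDK: `poll_diff`, its parameters, and the injected `fetch` callable (which should return a `(status_code, body)` pair for one GET to the diff endpoint) are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple


def poll_diff(
    fetch: Callable[[], Tuple[int, dict]],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it reports 200, giving up after timeout_s seconds.

    fetch is a hypothetical callable that performs one GET against the diff
    endpoint and returns (status_code, decoded_json_body).
    """
    deadline = clock() + timeout_s
    while True:
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"Unexpected status: {status}")
        # Stop before sleeping if the next attempt would land past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError("diff was still pending when the timeout expired")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting; in production code the defaults are sufficient.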

Response schema

A 200 response returns a RemediationDiff object:

{
  "action_id": "018f1234-...",
  "original_span_id": "018f1235-...",
  "remediated_span_id": "018f1236-...",
  "token_length_delta": -42,
  "latency_delta_ms": 120,
  "embedding_similarity": 0.87,
  "score_deltas": [
    {
      "eval_rule_id": "018f1237-...",
      "rule_name": "Correctness",
      "original_score": 0.3,
      "remediated_score": 0.9,
      "delta": 0.6
    }
  ],
  "summary": "Remediated output 12% shorter, embedding similarity 0.87, correctness +0.6"
}
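To work with the response in typed code, one option is to mirror the JSON above in dataclasses. This is a sketch based only on the example payload shown here: the class and function names are illustrative, the field types are inferred from the sample values, and the full schema in the API reference may contain fields this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoreDelta:
    eval_rule_id: str
    rule_name: str
    original_score: float
    remediated_score: float
    delta: float


@dataclass(frozen=True)
class RemediationDiff:
    action_id: str
    original_span_id: str
    remediated_span_id: str
    token_length_delta: int
    latency_delta_ms: int
    embedding_similarity: float
    score_deltas: List[ScoreDelta]
    summary: str


def parse_diff(payload: dict) -> RemediationDiff:
    """Build a RemediationDiff from a decoded 200 response body.

    Reads only the fields shown in the example payload; a missing field
    raises KeyError, and unknown extra fields are ignored.
    """
    deltas = [
        ScoreDelta(
            eval_rule_id=d["eval_rule_id"],
            rule_name=d["rule_name"],
            original_score=d["original_score"],
            remediated_score=d["remediated_score"],
            delta=d["delta"],
        )
        for d in payload["score_deltas"]
    ]
    return RemediationDiff(
        action_id=payload["action_id"],
        original_span_id=payload["original_span_id"],
        remediated_span_id=payload["remediated_span_id"],
        token_length_delta=payload["token_length_delta"],
        latency_delta_ms=payload["latency_delta_ms"],
        embedding_similarity=payload["embedding_similarity"],
        score_deltas=deltas,
        summary=payload["summary"],
    )
```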

`embedding_similarity`

Embedding similarity is the cosine similarity between the original and remediated output embeddings, computed by Claude Haiku 4.5. The value ranges from 0.0 to 1.0. A value of -1.0 is a sentinel meaning the embedding computation failed — treat it as unavailable and do not render it as a score. This can occur when the model is temporarily unavailable or when the output is too short to embed meaningfully.
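Because -1.0 is a sentinel and not a score, client code should convert it to an explicit "unavailable" state before display. A minimal sketch, with hypothetical helper names:

```python
from typing import Optional

# Sentinel documented above: the embedding computation failed.
EMBEDDING_FAILED = -1.0


def similarity_or_none(value: float) -> Optional[float]:
    """Return the similarity, or None when the sentinel marks it unavailable."""
    if value == EMBEDDING_FAILED:
        return None
    return value


def format_similarity(value: float) -> str:
    """Render the similarity for display, never showing the sentinel as a score."""
    similarity = similarity_or_none(value)
    return "unavailable" if similarity is None else f"{similarity:.2f}"
```

Mapping the sentinel to `None` early means later code, such as averaging similarity across many diffs, cannot silently include -1.0 in a calculation.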

`score_deltas`

Each entry in score_deltas corresponds to one eval rule that scored both the original and remediated spans. delta is remediated_score - original_score, so a positive delta means the remediation improved the score on that rule and a negative delta means it lowered the score. If an eval rule did not score one or both spans (for example, because the rule was added after the original trace was ingested), it does not appear in the array.
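One way to catch a fix that helped one eval dimension but hurt another is to scan score_deltas for negative deltas. The sketch below assumes the entry shape from the example response; the function name, the `tolerance` parameter, and the "Conciseness" rule in the usage example are hypothetical.

```python
from typing import Dict, List


def find_regressions(score_deltas: List[Dict], tolerance: float = 0.0) -> List[str]:
    """Return the names of eval rules whose score dropped by more than tolerance."""
    return [
        entry["rule_name"]
        for entry in score_deltas
        if entry["delta"] < -tolerance
    ]


deltas = [
    {"rule_name": "Correctness", "original_score": 0.3, "remediated_score": 0.9, "delta": 0.6},
    {"rule_name": "Conciseness", "original_score": 0.8, "remediated_score": 0.5, "delta": -0.3},
]
regressed = find_regressions(deltas)  # only the rule with a negative delta
```

A small positive `tolerance` can filter out noise when scores fluctuate slightly between runs.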

Access and plan requirements

The diff endpoint is gated to the Team+ plan. Starter and Pro tenants receive 402 with code: "plan_upgrade_required". The endpoint is dashboard-accessible — it cannot be reached via API key. All three dashboard roles (owner, member, viewer) can read diffs. Only owners can execute control actions that produce them.

Error reference

Status	Condition
`200`	Diff is available
`202`	Evaluation still running — poll again
`404`	Action not found or belongs to a different tenant
`422`	Reserved — action type produces no new output span (none currently)
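Client code can map these statuses, plus the 402 described under Access and plan requirements, to explicit outcomes so it does not branch on raw integers. The enum and function names below are illustrative and not part of any SDK.

```python
from enum import Enum


class DiffOutcome(Enum):
    READY = "ready"                        # 200: diff is available
    PENDING = "pending"                    # 202: evaluation still running
    UPGRADE_REQUIRED = "upgrade_required"  # 402: plan does not include diffs
    NOT_FOUND = "not_found"                # 404: unknown action or other tenant
    NO_DIFF = "no_diff"                    # 422: reserved, no new output span


_OUTCOMES = {
    200: DiffOutcome.READY,
    202: DiffOutcome.PENDING,
    402: DiffOutcome.UPGRADE_REQUIRED,
    404: DiffOutcome.NOT_FOUND,
    422: DiffOutcome.NO_DIFF,
}


def classify_diff_status(status: int) -> DiffOutcome:
    """Translate a diff-endpoint HTTP status into an outcome, or raise."""
    try:
        return _OUTCOMES[status]
    except KeyError:
        raise ValueError(f"Unexpected status: {status}") from None
```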

Auto-escalation and the retry depth cap

When a policy’s action type is retry, the control loop automatically stops retrying and escalates to human-in-the-loop (HITL) review once a single trace has been retried too many times. This prevents unbounded retry cascades.

`max_retry_depth`

Every retry-action policy has a max_retry_depth field (integer, 1–10, default 3). Set it when creating or updating a policy:

# Create a policy with a custom retry cap
curl -X POST https://api.trulayer.ai/v1/control/policies \
  -H "Authorization: Bearer tl_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Hallucination retry",
    "action_type": "retry",
    "max_retry_depth": 2,
    ...
  }'

# Update the cap on an existing policy
curl -X PATCH https://api.trulayer.ai/v1/control/policies/{id} \
  -H "Authorization: Bearer tl_..." \
  -H "Content-Type: application/json" \
  -d '{ "max_retry_depth": 5 }'

Values outside [1, 10] return 422 with a field-level validation error.
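To avoid a round trip that ends in a 422, a client can check the value before sending it. This is a sketch of a client-side check that mirrors the documented range; the function name is hypothetical, and the server remains the source of truth.

```python
MIN_RETRY_DEPTH = 1
MAX_RETRY_DEPTH = 10


def validate_max_retry_depth(value: object) -> int:
    """Return value if it is an integer in [1, 10], else raise.

    bool is rejected explicitly because it is a subclass of int in Python.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max_retry_depth must be an integer")
    if not MIN_RETRY_DEPTH <= value <= MAX_RETRY_DEPTH:
        raise ValueError(
            f"max_retry_depth must be between {MIN_RETRY_DEPTH} and {MAX_RETRY_DEPTH}"
        )
    return value
```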

How escalation fires

When the control loop is about to fire a retry and the trace’s retry_count is already equal to or greater than max_retry_depth, the action is automatically converted to an escalate action:

require_approval is set to true.
The action’s metadata includes escalation_reason: "retry_threshold_exceeded" and retry_count: <N>.
The trace is routed to the HITL pending-approval queue — no further automatic retries occur for that policy on that trace.

Example action metadata when auto-escalation fires:

{
  "escalation_reason": "retry_threshold_exceeded",
  "retry_count": 3
}
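The escalation rule described above can be modeled as a small decision function. This is an illustrative model of the documented behavior, not TruLayer's implementation: the function name and the shape of the returned dictionary are assumptions, and only the comparison and the metadata fields come from this guide.

```python
def next_action(retry_count: int, max_retry_depth: int = 3) -> dict:
    """Decide whether the next step on a trace is a retry or an escalation.

    Escalates when retry_count has already reached max_retry_depth,
    matching the "equal to or greater than" rule described above.
    """
    if retry_count >= max_retry_depth:
        return {
            "action_type": "escalate",
            "require_approval": True,
            "metadata": {
                "escalation_reason": "retry_threshold_exceeded",
                "retry_count": retry_count,
            },
        }
    return {"action_type": "retry", "metadata": {}}
```

With the default cap of 3, a trace with retry_count 0, 1, or 2 gets another retry, and the attempt that would follow the third retry is converted to an escalation.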

Retry cap hit metric

Monitor how often auto-escalation fires across your project using the retry_cap_hit project metric:

curl "https://api.trulayer.ai/v1/projects/{id}/metrics?metric=retry_cap_hit&window=7d" \
  -H "Authorization: Bearer tl_..."

Response:

{ "metric": "retry_cap_hit", "value": 5, "window": "7d" }

In the dashboard, the Retry cap hit count appears on the project overview. Clicking it opens a trace list filtered to traces whose escalation_reason is "retry_threshold_exceeded", so you can inspect each case and decide whether to raise the cap, fix the policy trigger, or take no action.
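When querying the metric from a script, building the query string with a URL encoder avoids quoting mistakes. A minimal sketch, assuming the endpoint and parameters shown in the curl example; the helper name is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.trulayer.ai"


def retry_cap_hit_url(project_id: str, window: str = "7d") -> str:
    """Build the metrics URL for the retry_cap_hit project metric."""
    query = urlencode({"metric": "retry_cap_hit", "window": window})
    return f"{API_BASE}/v1/projects/{project_id}/metrics?{query}"
```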

`control_loop_depth` on traces

GET /v1/traces/{id} includes a control_loop_depth integer field counting the number of retry actions that executed on the trace. Escalation actions are not counted. Use this field to understand how many attempts the system made before succeeding or escalating. See Control for policy configuration in the dashboard and Traces for how control_loop_depth appears in the trace detail view.

Next steps

Control loop dashboard guide — view and manage control actions in the UI
API reference — full RemediationDiff and ScoreDelta schema definitions
Metrics — retry_cap_hit project metric reference
Changelog — recent additions and changes
