> ## Documentation Index
> Fetch the complete documentation index at: https://docs.trulayer.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Remediation diffs

> Compare original and remediated span outputs after a control-loop action.

After the control loop remediates a failing LLM call, you can retrieve a structured diff that shows exactly what changed between the original span and the remediated one. This guide explains how to fetch that diff, how to handle the lazy-computation pattern, and what each field means.

## What is a remediation diff?

When a control action runs and produces a remediated span, TruLayer computes a structured before/after diff. The diff endpoint (`GET /v1/control/actions/{id}/diff`) compares the two span outputs and returns:

* **Token length delta** — how many tokens the remediated output added or removed
* **Latency delta** — how many milliseconds faster or slower the remediated call was
* **Embedding similarity** — cosine similarity between the original and remediated outputs, computed by Claude Haiku 4.5
* **Score deltas** — for each eval rule that scored both spans, the before and after scores and the delta

Use remediation diffs to audit whether a remediation improved quality holistically, to tune your policies, or to catch regressions where a fix on one eval dimension introduced a regression on another.

## Which action types produce a diff?

All three action types that produce a remediated span support the diff endpoint:

| Action type           | Produces a diff? |
| --------------------- | ---------------- |
| `retry`               | Yes              |
| `fallback_model`      | Yes              |
| `prompt_modification` | Yes              |

For all three action types, the remediated span is published to the trace and eval rules re-fire on it. Once evaluation completes on both spans, the diff is available.

There is no action type in the current API that returns `422` from the diff endpoint. HTTP `422` is reserved for future action types that do not produce a new output span.

## Fetching a diff

```bash theme={null}
curl https://api.trulayer.ai/v1/control/actions/{id}/diff \
  -H "Authorization: Bearer tl_..."
```

Replace `{id}` with the UUID of the control action. This endpoint is dashboard-accessible for all Team+ plan roles.

## The 202 → poll-until-200 pattern

Diff computation is lazy: the first GET triggers the computation, which runs asynchronously. If the underlying evaluation has not yet completed, the API returns `202`:

```json theme={null}
{
  "status": "pending",
  "reason": "evaluation_incomplete"
}
```

Poll until you receive `200`. A simple polling loop:

```typescript theme={null}
async function waitForDiff(actionId: string, token: string) {
  while (true) {
    const res = await fetch(
      `https://api.trulayer.ai/v1/control/actions/${actionId}/diff`,
      { headers: { Authorization: `Bearer ${token}` } }
    );
    if (res.status === 200) return await res.json();
    if (res.status !== 202) throw new Error(`Unexpected status: ${res.status}`);
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```

```python theme={null}
import time
import httpx

def wait_for_diff(action_id: str, token: str) -> dict:
    while True:
        r = httpx.get(
            f"https://api.trulayer.ai/v1/control/actions/{action_id}/diff",
            headers={"Authorization": f"Bearer {token}"},
        )
        if r.status_code == 200:
            return r.json()
        if r.status_code != 202:
            r.raise_for_status()
        time.sleep(2)
```

Once computed, the diff is cached. Subsequent requests for the same action return `200` immediately.

## Response schema

A `200` response returns a `RemediationDiff` object:

```json theme={null}
{
  "action_id": "018f1234-...",
  "original_span_id": "018f1235-...",
  "remediated_span_id": "018f1236-...",
  "token_length_delta": -42,
  "latency_delta_ms": 120,
  "embedding_similarity": 0.87,
  "score_deltas": [
    {
      "eval_rule_id": "018f1237-...",
      "rule_name": "Correctness",
      "original_score": 0.3,
      "remediated_score": 0.9,
      "delta": 0.6
    }
  ],
  "summary": "Remediated output 12% shorter, embedding similarity 0.87, correctness +0.6"
}
```

### `embedding_similarity`

Embedding similarity is the cosine similarity between the original and remediated output embeddings, computed by Claude Haiku 4.5. The value ranges from `0.0` to `1.0`.

A value of **`-1.0`** is a sentinel meaning the embedding computation failed — treat it as unavailable and do not render it as a score. This can occur when the model is temporarily unavailable or when the output is too short to embed meaningfully.

### `score_deltas`

Each entry in `score_deltas` corresponds to one eval rule that scored both the original and remediated spans. `delta` is `remediated_score - original_score`, so a positive delta means the retry improved the score on that rule.

If an eval rule did not score one or both spans — for example, because the rule was added after the original trace was ingested — it does not appear in the array.

## Access and plan requirements

The diff endpoint is gated to the Team+ plan. Starter and Pro tenants receive `402` with `code: "plan_upgrade_required"`. The endpoint is dashboard-accessible — it cannot be reached via API key.

All three dashboard roles (owner, member, viewer) can read diffs. Only owners can execute control actions that produce them.

## Error reference

| Status | Condition                                                           |
| ------ | ------------------------------------------------------------------- |
| `200`  | Diff is available                                                   |
| `202`  | Evaluation still running — poll again                               |
| `404`  | Action not found or belongs to a different tenant                   |
| `422`  | Reserved — action type produces no new output span (none currently) |

## Auto-escalation and the retry depth cap

When a policy's action type is `retry`, the control loop will automatically stop retrying and escalate to HITL if a single trace has been retried too many times. This prevents unbounded retry cascades.

### `max_retry_depth`

Every `retry`-action policy has a `max_retry_depth` field (integer, 1–10, default `3`). Set it when creating or updating a policy:

```bash theme={null}
# Create a policy with a custom retry cap
curl -X POST https://api.trulayer.ai/v1/control/policies \
  -H "Authorization: Bearer tl_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Hallucination retry",
    "action_type": "retry",
    "max_retry_depth": 2,
    ...
  }'

# Update the cap on an existing policy
curl -X PATCH https://api.trulayer.ai/v1/control/policies/{id} \
  -H "Authorization: Bearer tl_..." \
  -H "Content-Type: application/json" \
  -d '{ "max_retry_depth": 5 }'
```

Values outside `[1, 10]` return `422` with a field-level validation error.

### How escalation fires

When the control loop is about to fire a retry and the trace's `retry_count` is already equal to or greater than `max_retry_depth`, the action is automatically converted to an `escalate` action:

* `require_approval` is set to `true`.
* The action's `metadata` includes `escalation_reason: "retry_threshold_exceeded"` and `retry_count: <N>`.
* The trace is routed to the HITL pending-approval queue — no further automatic retries occur for that policy on that trace.

Example action metadata when auto-escalation fires:

```json theme={null}
{
  "escalation_reason": "retry_threshold_exceeded",
  "retry_count": 3
}
```

### Retry cap hit metric

Monitor how often auto-escalation fires across your project using the `retry_cap_hit` project metric:

```bash theme={null}
curl "https://api.trulayer.ai/v1/projects/{id}/metrics?metric=retry_cap_hit&window=7d" \
  -H "Authorization: Bearer tl_..."
```

Response:

```json theme={null}
{ "metric": "retry_cap_hit", "value": 5, "window": "7d" }
```

In the dashboard, the **Retry cap hit** count appears on the project overview. Clicking it opens a filtered trace list showing only traces where `escalation_reason: "retry_threshold_exceeded"` — so you can inspect each case and decide whether to raise the cap, fix the policy trigger, or take no action.

### `control_loop_depth` on traces

`GET /v1/traces/{id}` includes a `control_loop_depth` integer field counting the number of retry actions that executed on the trace. Escalation actions are not counted. Use this field to understand how many attempts the system made before succeeding or escalating.

See [Control](/dashboard/control#policies) for policy configuration in the dashboard and [Traces](/dashboard/traces#header) for how `control_loop_depth` appears in the trace detail view.

## Next steps

* [Control loop dashboard guide](/dashboard/control) — view and manage control actions in the UI
* [API reference](/api-reference/introduction) — full `RemediationDiff` and `ScoreDelta` schema definitions
* [Metrics](/concepts/metrics#project-scoped-metrics) — `retry_cap_hit` project metric reference
* [Changelog](/guides/changelog) — recent additions and changes
