
# Failure behavior

> What happens when the TruLayer ingest API is unreachable, and how to opt in to blocking semantics.

The Python SDK is designed so that a TruLayer ingest outage never becomes an application outage. This page documents the default behavior and the one opt-in knob for teams who deliberately want the opposite tradeoff.

## Default — drop and warn

When the ingest API is unreachable (network error) or returns a transient status (`5xx`), the SDK:

1. Retries the batch up to **3×** with exponential backoff (500 ms, 1 s, 2 s).
2. Once the retries are exhausted, discards the in-memory batch and emits a single `warnings.warn(...)`.
3. Suppresses warnings for subsequent failures within a **60-second window** to avoid log flooding — a fresh warning is emitted once the window rolls over.
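
The three steps above can be sketched in plain Python. This is an illustration of the policy, not the SDK's internal code: `ThrottledWarner` and `send_with_retry` are hypothetical names, the sketch assumes one initial attempt followed by three retries, and it treats only `ConnectionError` and `TimeoutError` as transient.

```python
import time
import warnings
from typing import Callable, Optional, Sequence

BACKOFF_SECONDS = (0.5, 1.0, 2.0)  # delay before each retry, as listed above
WARN_WINDOW_SECONDS = 60.0         # at most one warning per window


class ThrottledWarner:
    """Hypothetical helper: emit at most one warning per time window."""

    def __init__(self, window: float = WARN_WINDOW_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window
        self._clock = clock
        self._last_warned: Optional[float] = None

    def warn(self, message: str) -> bool:
        now = self._clock()
        if self._last_warned is not None and now - self._last_warned < self._window:
            return False  # suppressed: still inside the current window
        self._last_warned = now
        warnings.warn(message, RuntimeWarning, stacklevel=2)
        return True


def send_with_retry(send: Callable[[Sequence], None], batch: Sequence,
                    warner: ThrottledWarner,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt once, then retry after each backoff delay. True if delivered."""
    for delay in (0.0, *BACKOFF_SECONDS):
        if delay:
            sleep(delay)
        try:
            send(batch)
            return True
        except (ConnectionError, TimeoutError):
            continue  # transient failure: move on to the next attempt
    # Retries exhausted: the batch is dropped and the caller is never interrupted.
    warner.warn(f"dropping batch of {len(batch)} traces after retries were exhausted")
    return False
```

Injecting `sleep` and `clock` keeps the sketch testable without real delays.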

User code never blocks on network I/O and never sees a batch failure surface as an exception. Trace capture runs on the caller thread; transport runs on a background flush thread that owns the retry logic.
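
The split between the caller thread and the flush thread can be pictured with a bounded queue and a daemon worker. This is a minimal sketch of the general pattern, not the SDK's implementation; `BackgroundFlusher` and its `max_pending` parameter are hypothetical.

```python
import queue
import threading
from typing import Any, Callable


class BackgroundFlusher:
    """Hypothetical sketch: callers enqueue, a worker thread does the I/O."""

    def __init__(self, send: Callable[[Any], None], max_pending: int = 1000) -> None:
        self._send = send
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, batch: Any) -> bool:
        """Runs on the caller thread and never waits on the network."""
        try:
            self._queue.put_nowait(batch)
            return True
        except queue.Full:
            return False  # drop rather than block the caller

    def _run(self) -> None:
        while True:
            batch = self._queue.get()
            if batch is None:  # sentinel pushed by shutdown()
                break
            try:
                self._send(batch)
            except Exception:
                pass  # failures stay on this thread; callers never see them

    def shutdown(self, timeout: float = 5.0) -> None:
        """Ask the worker to stop after draining what is already queued."""
        self._queue.put(None)
        self._thread.join(timeout)
```

In this sketch a failed send is simply swallowed; the retry and warning policy would live inside `_run`.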

This is the right default for almost every production service. A dead ingest endpoint should degrade observability, not customer-facing behavior.

## Opt-in — `TRULAYER_FAIL_MODE=block`

Set `TRULAYER_FAIL_MODE=block` to make `trulayer.shutdown()` (and the flush that runs during shutdown) raise a typed `TruLayerFlushError` when a batch exhausts its retries.

```python
import os
import trulayer
from trulayer.errors import TruLayerFlushError

os.environ["TRULAYER_FAIL_MODE"] = "block"

trulayer.init(
    api_key=os.environ["TRULAYER_API_KEY"],
    project_name="critical-eval-pipeline",
)

try:
    with trulayer.trace("nightly-eval") as trace:
        ...
    trulayer.shutdown()
except TruLayerFlushError as err:
    # Alert, mark the run as failed, or abort the job deliberately.
    print(f"ingest failed: {err} (batch size {err.batch_size})")
    raise
```

`TruLayerFlushError` exposes two fields:

* `batch_size: int` — number of traces in the failed batch.
* `__cause__` — the underlying network or HTTP error (standard Python exception chaining).
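
To show how exception chaining exposes the underlying error, the sketch below uses a stand-in class. `FlushError` is a hypothetical substitute for `TruLayerFlushError` so the example runs without the SDK; only `batch_size` and `__cause__` mirror the fields described above.

```python
class FlushError(Exception):
    """Hypothetical stand-in for TruLayerFlushError, for illustration only."""

    def __init__(self, message: str, batch_size: int) -> None:
        super().__init__(message)
        self.batch_size = batch_size


def flush(batch: list) -> None:
    try:
        raise ConnectionError("connection refused")  # simulated transport failure
    except ConnectionError as exc:
        # "raise ... from exc" is what populates __cause__ on the new exception.
        raise FlushError("batch exhausted its retries", len(batch)) from exc


def describe(err: FlushError) -> str:
    cause = err.__cause__
    return f"{err.batch_size} traces lost; cause: {type(cause).__name__}: {cause}"


try:
    flush(["t1", "t2", "t3"])
except FlushError as err:
    summary = describe(err)
    print(summary)  # 3 traces lost; cause: ConnectionError: connection refused
```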

### When to use block mode

Block mode is a niche tool. Reach for it only when:

* The workload is a **batch job** whose entire value depends on TruLayer receiving the output (eval pipelines, backfills, scheduled quality runs).
* Silently losing traces is **materially worse** than surfacing an error to the operator.
* The caller is prepared to handle `TruLayerFlushError` — typically by failing the job and retrying the whole run.

Do **not** use block mode for:

* User-facing request handlers (ASGI/WSGI apps). A transient ingest outage will cascade into customer-visible failures.
* Background services that must survive observability outages (payment processors, auth flows, webhooks).
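
For the batch-job case, "failing the job and retrying the whole run" might look like the sketch below. `run_with_retries` is a hypothetical helper, not part of the SDK; it takes the exception type as a parameter so the example runs on its own.

```python
from typing import Callable, Type


def run_with_retries(job: Callable[[], None],
                     retry_on: Type[Exception],
                     max_runs: int = 3) -> int:
    """Run job until it succeeds or max_runs is reached.

    Returns the number of runs used and re-raises the last error
    if every run fails.
    """
    for run in range(1, max_runs + 1):
        try:
            job()
            return run
        except retry_on:
            if run == max_runs:
                raise  # give up: let the scheduler mark the job as failed
    raise ValueError("max_runs must be at least 1")
```

In a real job, `retry_on` would be `TruLayerFlushError` and `job` would wrap the traced work plus the `trulayer.shutdown()` call, so that each run flushes before it is counted as successful.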

## Zero-network — `TRULAYER_MODE=local`

For CI and offline development, set `TRULAYER_MODE=local`. The SDK swaps the HTTP sender for an in-memory `LocalBatchSender` that stores every trace for inspection, never touches the network, and never warns.

```bash
TRULAYER_MODE=local pytest
```

Combine with the [`trulayer.testing`](/sdks/python/testing) helpers for assertions on captured traces.

## Replay — `TRULAYER_MODE=replay`

Set `TRULAYER_MODE=replay` together with `TRULAYER_REPLAY_FILE=<path>` to load a previously captured JSONL file on `init()`. Useful for golden-file regression tests and reproducing a production trace locally.

```bash
TRULAYER_MODE=replay \
TRULAYER_REPLAY_FILE=fixtures/golden.jsonl \
  pytest
```

`TRULAYER_MODE=replay` implies `local` — replayed traces never escape to the live API, because they were produced by a previous capture and would double-count in the dashboard. Malformed JSONL lines are skipped with a warning.
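
The skip-with-a-warning behavior for malformed lines can be approximated as follows. This is a sketch, not the SDK's loader: `load_jsonl` is a hypothetical name, and it assumes only that the file holds one JSON value per line; the schema of a captured trace is not shown here.

```python
import json
import warnings
from pathlib import Path
from typing import Any, List


def load_jsonl(path: str) -> List[Any]:
    """Parse a JSONL file, skipping blank and malformed lines."""
    records: List[Any] = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored silently
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Malformed line: warn with its location and keep going.
                warnings.warn(
                    f"{path}:{lineno}: skipping malformed line ({exc.msg})",
                    RuntimeWarning,
                )
    return records
```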

## Decision guide

| Scenario                                      | Recommended mode                                |
| --------------------------------------------- | ----------------------------------------------- |
| Production HTTP service                       | Default (drop + warn)                           |
| Background worker with SLO on ingest          | Default (drop + warn)                           |
| Nightly eval / backfill job                   | `TRULAYER_FAIL_MODE=block`                      |
| CI unit tests                                 | `TRULAYER_MODE=local`                           |
| CI integration tests against a golden capture | `TRULAYER_MODE=replay` + `TRULAYER_REPLAY_FILE` |
| Local development without an API key          | `TRULAYER_MODE=local`                           |

<h2 id="archived-project">
  Archived project
</h2>

HTTP `403` responses with `code: "error.project.archived"` are treated differently from other errors. They indicate that the project associated with your API key has been archived — a deliberate configuration change, not a transient failure.

When the SDK receives this response:

1. It logs an ERROR-level message via the standard Python `logging` module (logger name `trulayer`):
   ```
   ERROR trulayer: Ingest permanently disabled — the project associated with this API key has been archived.
   Unarchive the project at https://app.trulayer.ai/projects to resume, then restart the process or create a new client.
   ```
2. The exporter is **permanently disabled** for that client instance. Subsequent flush attempts are no-ops and produce no further log output.
3. Your application continues running normally — only TruLayer observability is suspended.
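
The latch described in these steps can be sketched as below. `Exporter` and its `post` callable are hypothetical and stand in for the SDK's transport; the sketch only models the archived-project case and a plain success check.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("trulayer")

ARCHIVED_CODE = "error.project.archived"


class Exporter:
    """Hypothetical exporter that latches off after an archived-project 403."""

    def __init__(self, post: Callable[[Any], Tuple[int, Dict[str, Any]]]) -> None:
        self._post = post      # returns (status_code, parsed_json_body)
        self.disabled = False

    def flush(self, batch: Any) -> bool:
        if self.disabled:
            return False       # no-op: nothing is sent and nothing is logged
        status, body = self._post(batch)
        if status == 403 and body.get("code") == ARCHIVED_CODE:
            self.disabled = True  # permanent for this instance
            logger.error("Ingest permanently disabled: the project has been archived.")
            return False
        return 200 <= status < 300
```

Because the flag lives on the instance, a freshly constructed exporter starts enabled, which is why creating a new client resumes ingest after the project is unarchived.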

### Why the exporter does not retry

A `403` with this code is an authoritative refusal: no retry can succeed until the project is unarchived, so retrying would only produce noise. The SDK handles it much as a browser handles an HTTP 403: it stops, logs the error once, and does not retry.

### Resuming after unarchiving

Unarchiving the project (from **Projects settings** at [app.trulayer.ai/projects](https://app.trulayer.ai/projects)) restores ingest immediately — no key rotation needed.

However, any already-running client instance that received the 403 will not automatically resume. You must either:

* **Restart the process** — the new process starts a fresh client that will send normally.
* **Create a new client** — call `trulayer.init(...)` (or instantiate a new `TruLayerClient`) again with the same API key. The new client is independent of the disabled one.

### Detecting the error programmatically

In block mode (`TRULAYER_FAIL_MODE=block`), the `TruLayerFlushError` raised on permanent 403s carries a `status_code` attribute you can inspect:

```python
import trulayer
from trulayer.errors import TruLayerFlushError

try:
    trulayer.shutdown()
except TruLayerFlushError as err:
    if getattr(err, "status_code", None) == 403:
        print("Project may be archived — check app.trulayer.ai/projects")
    raise
```

See [Project lifecycle](/guides/projects) for full details on archiving, unarchiving, and the one-active-project constraint.

## See also

* [Testing helpers](/sdks/python/testing) — in-memory assertions against captured traces.
* [Python SDK reference](/sdks/python/reference) — full signatures for `TruLayerFlushError` and the error hierarchy.
* [Project lifecycle](/guides/projects) — archiving, unarchiving, and the effect on active API keys.
