
# Rate limits

Scout enforces rate limits at three layers: per-account credits, a
shared concurrency pool, and upstream provider limits we have to live
within. Knowing which is which makes it easier to size a workload and
recover from a `429` or `503`.

## Account-level limits

| Plan    | Monthly credits | Notes                                                      |
| ------- | --------------- | ---------------------------------------------------------- |
| Free    | 1,500           | Hard cap. Requests past it return `429 Too Many Requests`. |
| Builder | 5,500           | Soft cap — overage billed at \$9 / 1,000 credits.          |
| Scale   | 25,500          | Soft cap — overage billed at \$6 / 1,000 credits.          |

The credit counter resets on the first of each month in UTC. The
[Settings → Usage](https://platform.usescout.sh) page shows live
remaining balance.
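
As a rough way to size a workload against the table above, the sketch below estimates overage cost for a given monthly usage. The included credits and rates come from the table; the assumption that overage is prorated per credit (rather than billed in whole blocks of 1,000) is ours and is not confirmed on this page. `estimate_overage_usd` is a hypothetical helper, not part of any Scout SDK.

```python
from typing import Dict, Optional

# Included credits and overage rate (USD per 1,000 credits) from the plan table.
# A rate of None marks a hard cap: no overage is possible.
PLANS: Dict[str, Dict[str, Optional[float]]] = {
    "free": {"included": 1_500, "overage_per_1k": None},
    "builder": {"included": 5_500, "overage_per_1k": 9.0},
    "scale": {"included": 25_500, "overage_per_1k": 6.0},
}


def estimate_overage_usd(plan: str, credits_used: int) -> float:
    """Estimate overage in USD, ASSUMING linear per-credit proration."""
    cfg = PLANS[plan]
    extra = max(0, credits_used - int(cfg["included"]))
    if extra == 0:
        return 0.0
    if cfg["overage_per_1k"] is None:
        # Hard-capped plans get a 429 instead of an overage bill.
        raise ValueError(f"{plan} plan has a hard cap; requests past it return 429")
    return extra / 1000 * cfg["overage_per_1k"]


# 2,000 credits over the Builder allowance at 9 USD per 1,000 credits.
print(estimate_overage_usd("builder", 7_500))  # 18.0
```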

## Concurrency

Scout runs every account on a shared pool. Today we cap simultaneous
in-flight work at the service level, not per key. The practical limits
to expect:

* **Synchronous endpoints** (`/v1/search`, `/v1/extract`,
  `/v1/chat/completions`): designed to return in under 30 seconds.
  Requests that would push the browser pool past its capacity are
  queued for up to 10 seconds; if no capacity frees up in that window,
  they return `503 Service Unavailable`. A retry with a small backoff
  is usually enough.
* **Asynchronous endpoints** (`/v1/task`, `/v1/search?depth=deep`,
  `/v1/findall`): the API responds with a `task_id` or `search_id` in
  under a second, and the heavy work runs in the background. You will
  not see a queue delay on creation, but the run itself may take longer
  to reach `completed` if the pool is busy.
* **Webhook deliveries**: retried up to 3 times with exponential backoff
  on any non-2xx response. See [Webhooks](/webhooks).

Per-key concurrency caps are on the roadmap; until then a runaway
client can degrade tail latency for its own account. Use a polite
parallelism cap (start at 5–10 concurrent requests per key) and let the
async endpoints handle anything fan-out-shaped.
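
One way to enforce a client-side parallelism cap is a semaphore around each request. The sketch below uses Python's `asyncio`; `run_capped` and the `worker` coroutine are hypothetical names, and `worker` stands in for whatever function actually calls the Scout API in your code. The default of 5 matches the low end of the suggested range.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_capped(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 5,
) -> List[R]:
    """Run worker(item) for every item, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the max_concurrency slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

Calling `asyncio.run(run_capped(urls, fetch_one, max_concurrency=5))` with your own `fetch_one` coroutine keeps at most five requests in flight, regardless of how many items are queued.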

## When you see a 429 or 503

| Status                    | Reason                                                                                                         | What to do                                                                                              |
| ------------------------- | -------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `429 Too Many Requests`   | Monthly credit cap reached on a free plan, or per-account quota exceeded.                                      | Wait for the next reset or upgrade in the dashboard. Retrying immediately will keep returning `429`.    |
| `503 Service Unavailable` | The browser pool was saturated, an upstream fetch was blocked, or the LLM provider returned a transient error. | Retry with exponential backoff (start at 1s, double up to 30s, give up after 5 attempts).               |
| `502 Bad Gateway`         | A specific upstream URL was unreachable.                                                                       | Usually a problem with the target page, not Scout. Retrying the same URL rarely helps; reroute or skip. |

The exact reason for a `503` is in the `detail` field of the JSON body
when it is safe to surface. See [Errors & status codes](/errors).
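
The retry guidance in the table can be sketched as a small loop: retry only on `503`, start at 1 second, double each time, cap at 30 seconds, and stop after five attempts. This is a minimal illustration, not an official client. `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`; the code also assumes that `Retry-After`, when present, is an integer number of seconds under exactly that header key, and that `detail` is a string.

```python
import json
import time
from typing import Callable, List, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def backoff_delays(attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Waits between attempts: 1, 2, 4, 8 seconds for five attempts."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]


def error_detail(body: str) -> Optional[str]:
    """Return the `detail` field of a JSON error body, if there is one."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    detail = payload.get("detail") if isinstance(payload, dict) else None
    return detail if isinstance(detail, str) else None


def call_with_backoff(
    send: Callable[[], Response],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on 503 only; 429 and 502 are returned to the caller untouched."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        response = send()
        status, headers, _body = response
        if status != 503 or attempt == attempts - 1:
            return response
        wait = delays[attempt]
        retry_after = headers.get("Retry-After")  # assumed: integer seconds
        if retry_after is not None and retry_after.strip().isdigit():
            wait = max(wait, float(retry_after))
        sleep(wait)
    raise AssertionError("unreachable")
```

A `429` or `502` falls straight through to the caller, which matches the table: waiting for the reset or skipping the URL is a decision for your application, not for the retry loop.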

## Upstream limits we live within

Scout's web fetch path goes through a residential and datacenter proxy
pool, then to the open web. Two upstreams set ceilings we cannot exceed:

* **Google SERP rate limits.** Google throttles aggressively on
  repeated identical queries from the same exit IP. Scout caches every
  SERP for a short window and rotates IPs across the pool, but bursts
  of the same query within seconds still risk a throttle and a `503`.
* **Per-page anti-bot challenges.** Some sites serve a challenge page
  to non-human traffic. Scout retries with a fresh browser fingerprint
  and IP up to three times before returning the result with an
  `errors[]` entry.

On heavy or repetitive workloads these show up as occasional `503`s, or
as `errors[]` entries on an otherwise returned result. Diversifying
queries and spacing them out by even a few hundred milliseconds is
usually enough to stay under the limit.
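
A simple way to avoid bursts of identical queries is to de-duplicate a batch and pause briefly between requests. The sketch below assumes a hypothetical `run` callable that executes one query; the 0.3 second gap is our illustrative reading of "a few hundred milliseconds", not a documented threshold.

```python
import time
from typing import Callable, Dict, Iterable, TypeVar

R = TypeVar("R")


def paced_unique(
    queries: Iterable[str],
    run: Callable[[str], R],
    min_gap_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, R]:
    """Run each distinct query once, pausing min_gap_s between requests."""
    results: Dict[str, R] = {}
    for query in queries:
        if query in results:
            continue  # identical query already sent in this batch
        if results:
            sleep(min_gap_s)  # space requests out; no pause before the first
        results[query] = run(query)
    return results
```

Repeated entries in the batch reuse the first result instead of hitting the same SERP again within seconds.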

## Recommended client behavior

* **Set a small connection pool.** 5–10 concurrent requests is a good
  starting point for the synchronous endpoints. Async endpoints can
  fan out further because they return immediately.
* **Honor `Retry-After`.** When a response includes one, wait at least
  that long; retrying sooner will not succeed any faster.
* **Use exponential backoff.** Start at 1 second, double each attempt,
  cap at 30 seconds. Five tries is enough; past that, surface the error
  to your caller.
* **Cache idempotent responses on your side.** Scout already caches the
  SERP layer, but if the same `(query, country, language)` will be
  consumed many times by your app, cache the result in your own store
  to keep both your latency and your credit spend low.
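
The client-side cache in the last bullet could look like the sketch below: an in-memory store keyed by `(query, country, language)` with a time-to-live. `SearchCache` and `fetch` are hypothetical names, `fetch` stands in for your own function that calls Scout, and the 300 second TTL is an arbitrary placeholder to tune for how fresh your results need to be.

```python
import time
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (query, country, language)


class SearchCache:
    """Small in-memory TTL cache in front of a search call."""

    def __init__(
        self,
        fetch: Callable[[str, str, str], Any],
        ttl_s: float = 300.0,  # assumption: pick a TTL that suits your data
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[Key, Tuple[float, Any]] = {}

    def get(self, query: str, country: str, language: str) -> Any:
        key = (query, country, language)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # fresh entry: no request, no credits spent
        value = self._fetch(query, country, language)
        self._store[key] = (now, value)
        return value
```

For a multi-process deployment the same keying scheme carries over to a shared store; the in-memory dictionary here only keeps the example self-contained.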