A Practical Introduction to Circuit Breakers and Fallback Design in FastAPI: Real-World Patterns for Preventing External API Failures from Becoming System-Wide Failures

greeden

4 hours ago

Summary

A circuit breaker is a design pattern that temporarily stops calls to an external dependency after repeated failures, so that a slow or failing dependency cannot drag your entire API down with it. Martin Fowler describes the basic form: once failures reach a threshold, the breaker opens, and subsequent calls fail immediately instead of being attempted.
FastAPI is strong at asynchronous I/O and works very well with external API integrations, but when a dependency is slow, requests waiting on it pile up, hold connections from the pool, and raise latency across the whole application. That is why it is important to think about timeouts, retries, connection pooling, and fault isolation together.
HTTPX provides features such as shared AsyncClient instances, Timeout, Limits, and transport-level support for connection retries, which makes it a good foundation for external API client design. Timeouts are split into connect, read, write, and pool, and connection counts can also be controlled explicitly.
In Python, libraries such as PyBreaker implement the Circuit Breaker pattern. PyBreaker describes itself as a “Python implementation of the Circuit Breaker pattern.”
A fallback means that, instead of simply failing during an outage, you return cached data, degrade functionality, or hand the work to an alternative path that retries later. In practice, combining a circuit breaker with fallback behavior and monitoring tends to produce more stable systems for both user experience and operations than using a circuit breaker alone.

Who will benefit from reading this

Individual developers and learners

This is for people who have started using one or two external APIs and are struggling with situations like “sometimes it is slow” or “sometimes it fails.”
It is especially useful if you are calling httpx.get() or AsyncClient.get() directly inside FastAPI and have seen your own API become slow just because the external service is unhealthy. FastAPI is well suited for async code and makes it easy to design for external I/O, but unless you add mechanisms to absorb delays from external dependencies, you cannot take full advantage of that strength.

Backend engineers in small teams

This is for teams calling multiple external services such as shipping, payments, notifications, or authentication platforms from FastAPI.
If you find yourselves saying things like “How far should we retry?”, “Right now, when an external service is down, all we can do is return 500s,” or “Timeouts and connection limits differ depending on who wrote the code,” this article will help you organize those issues from the perspective of circuit breakers and fallbacks. HTTPX provides AsyncClient, timeout controls, connection limit controls, and transport configuration, making it a natural fit for a shared client layer.

SaaS teams and startups

This is for teams where trouble with external APIs directly affects your product’s SLOs and support ticket volume.
If you are at the stage where you want to say, “Even if one external API is unhealthy, we do not want it to become a system-wide outage,” or “We want to survive by returning cached results or degrading gracefully,” or “We also want proper observability during failures,” then circuit breakers and fallbacks are worth treating as core infrastructure, on the same level as authorization, auditing, and job queues. Circuit Breaker is widely known as a representative pattern for isolating failure, and it is also frequently discussed as an important pattern in microservice architectures.

Accessibility notes

The article starts with a summary, then proceeds step by step through “why this is necessary,” “how to design it,” and “how to implement it.”
Technical terms are explained briefly on first use, and the same terminology is used consistently afterward to make the flow easier to follow.
Code examples are split into short sections so that each block shows just one responsibility.
Each chapter is written so it can be read independently, with the needed context provided where relevant.
The target level is roughly equivalent to WCAG AA readability expectations.

1. What is a circuit breaker?

A circuit breaker is a mechanism that temporarily stops calls to an external dependency when failures continue to occur.
Martin Fowler explains the basic form as wrapping a protected function call with a breaker object, and when the number of failures reaches a threshold, the breaker opens the circuit and stops making further calls, returning errors immediately instead. In practice, this is usually considered together with monitoring and alerting.

The electrical analogy makes this easy to understand.
If something is close to short-circuiting, you do not keep letting current flow until everything burns out. You cut it off once to prevent the damage from spreading. The same applies to applications. If an external API is taking dozens of seconds to respond, returning 5xx continuously, or causing connections to pile up, then if you keep calling it, your entire FastAPI application can get dragged down. The Circuit Breaker pattern exists specifically to stop that chain reaction.

2. Why this matters especially in FastAPI

FastAPI is built around async def and asynchronous I/O, so it is very well suited to I/O-bound work such as external APIs and databases.
At the same time, if an external API becomes very slow or starts failing repeatedly, the number of tasks waiting on the event loop increases, and this can affect connection pools and the perceived performance of the entire application. FastAPI’s documentation explains that async processing is especially useful for I/O-bound workloads, and external API calls are exactly that kind of work.

FastAPI’s lifespan feature is also well suited for managing resources at application startup and shutdown, which makes it a natural place to create and clean up shared HTTP clients. Its dependency injection system also makes it easy to assemble shared external API clients as common dependencies. In other words, FastAPI already provides the foundation needed to place circuit breakers and fallbacks cleanly.

3. The overall picture to understand first: timeout, retry, breaker, fallback

A circuit breaker alone is not enough to make your system resilient to external API failures.
In practice, it becomes much easier to reason about things if you think in terms of these four pieces together.

Timeout
- Decide how long you are willing to wait in the first place
Retry
- Retry briefly if the failure seems temporary
Circuit breaker
- Temporarily stop calling a dependency that keeps failing
Fallback
- Decide what to return when you cannot call it

HTTPX gives you timeouts, connection controls, and transport-level support for connection retries. Tenacity is a good fit for retry design with exponential backoff. PyBreaker can be used for the Circuit Breaker pattern itself. So the tooling around FastAPI is already quite mature.

4. Start with the foundation: keep a shared `AsyncClient` in lifespan

Before you even get to circuit breakers, it is easier to extend your design later if you first build a shared external API client foundation.
HTTPX recommends using AsyncClient in asynchronous environments and reusing the client instance. It also explicitly advises against creating clients repeatedly inside hot loops.

from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    timeout = httpx.Timeout(connect=2.0, read=5.0, write=5.0, pool=1.0)
    limits = httpx.Limits(
        max_keepalive_connections=20,
        max_connections=100,
        keepalive_expiry=5.0,
    )
    app.state.http_client = httpx.AsyncClient(
        timeout=timeout,
        limits=limits,
    )
    try:
        yield
    finally:
        await app.state.http_client.aclose()

app = FastAPI(lifespan=lifespan)

The Timeout and Limits used here are both official HTTPX features. Timeouts are divided into connect, read, write, and pool, and connection limits can be adjusted with max_connections and related settings.

5. Inject the shared client through a dependency

Using FastAPI dependencies, you can naturally pass the shared AsyncClient into each client class.
FastAPI describes its dependency system as powerful and intuitive, and it works very well for reusable components and testability.

import httpx
from fastapi import Request

def get_http_client(request: Request) -> httpx.AsyncClient:
    return request.app.state.http_client

On top of that, you can create a class for a specific external service.

import httpx

class BillingClient:
    def __init__(self, client: httpx.AsyncClient, base_url: str, api_key: str):
        self.client = client
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

Once you structure it this way, you can keep the circuit breaker and fallback logic inside the client layer.
That prevents HTTPX-specific details from being scattered across routers and service layers, and it makes later improvements much easier.

6. Explicitly define timeouts: the minimum defense before a breaker

HTTPX enables timeouts by default (5 seconds of network inactivity), but in practice it is safer to define them explicitly.
The official documentation explains the four kinds of timeouts: connect, read, write, and pool. For example, pool timeout means how long to wait for an available connection from the pool.

import httpx

DEFAULT_TIMEOUT = httpx.Timeout(
    connect=1.5,
    read=3.0,
    write=3.0,
    pool=0.5,
)

What matters is recognizing that different external APIs deserve different waiting times.
A fast internal service might justify a much shorter timeout, while a generative AI API or heavy reporting API might need a bit longer. But a design that waits indefinitely, or simply “for a long time,” is dangerous even before you consider circuit breakers. If you keep waiting on a slow dependency, your own application becomes slow too.
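HTTPX's read timeout limits the wait for each chunk of data rather than the whole exchange, so a response that trickles in slowly can still take longer than any single phase timeout. One way to cap the total time is an overall deadline around the call. The following is a minimal sketch using asyncio.wait_for; `call_with_deadline`, `DeadlineExceededError`, and the two stand-in dependency functions are hypothetical names for illustration, not part of HTTPX or FastAPI.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class DeadlineExceededError(Exception):
    """Raised when a call does not finish within its total time budget."""


async def call_with_deadline(func: Callable[[], Awaitable[T]], deadline_sec: float) -> T:
    # asyncio.wait_for cancels the awaited call once the deadline passes.
    try:
        return await asyncio.wait_for(func(), timeout=deadline_sec)
    except asyncio.TimeoutError as exc:
        raise DeadlineExceededError(f"no result within {deadline_sec}s") from exc


async def slow_dependency() -> str:
    # Stand-in for an external call that takes too long.
    await asyncio.sleep(1.0)
    return "done"


async def fast_dependency() -> str:
    # Stand-in for an external call that answers promptly.
    await asyncio.sleep(0)
    return "done"
```

In a real client, the function passed in would be the HTTPX request itself, and the deadline would be chosen per dependency, just like the phase timeouts.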

7. Convert HTTPX exceptions into your own exception types

HTTPX has a well-structured exception hierarchy including RequestError, HTTPStatusError, and TimeoutException.
Instead of letting those bubble upward as-is, it is better to convert them into your own application exceptions so you can handle them consistently. That also makes it easier to define circuit breaker and fallback conditions. HTTPX’s quickstart and exception docs show how to use raise_for_status() and handle those exceptions.

class ExternalAPIError(Exception):
    pass

class ExternalAPITimeoutError(ExternalAPIError):
    pass

class ExternalAPIUnavailableError(ExternalAPIError):
    pass

class ExternalAPIBadResponseError(ExternalAPIError):
    pass

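import httpx

async def safe_request(client: httpx.AsyncClient, method: str, url: str, **kwargs) -> httpx.Response:
    try:
        response = await client.request(method, url, **kwargs)
        response.raise_for_status()
        return response
    except httpx.TimeoutException as exc:
        # Must come before RequestError, because TimeoutException is a subclass of it.
        raise ExternalAPITimeoutError(str(exc)) from exc
    except httpx.HTTPStatusError as exc:
        # 5xx means the dependency itself is unhealthy; 4xx is usually a caller-side problem.
        if exc.response.status_code >= 500:
            raise ExternalAPIUnavailableError(str(exc)) from exc
        raise ExternalAPIBadResponseError(str(exc)) from exc
    except httpx.RequestError as exc:
        raise ExternalAPIUnavailableError(str(exc)) from exc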

Once your exceptions are organized at this level, it becomes much easier to express decisions like “open the breaker after this many timeouts” or “do not open the breaker on 4xx errors.”

8. Understand the breaker states: Closed / Open / Half-Open

A circuit breaker is easiest to reason about if you think in terms of these three states.
This is the common basic form used by many implementations and explanations, and it aligns with Martin Fowler’s description.

Closed
- Normal state. Calls to the external API are allowed
Open
- Failures have continued, so the breaker is now blocking calls. Requests fail immediately
Half-Open
- A small number of trial calls are allowed to check whether the dependency has recovered

Because of these three states, you avoid both extremes:
not “stop forever because it failed once,”
and not “wait for the full timeout every single time forever.”
Instead, you get a balanced recovery behavior.

9. A minimal implementation idea using PyBreaker

PyBreaker describes itself as a “Python implementation of the Circuit Breaker pattern.”
It also offers Redis-backed state storage, so breaker state can be shared across processes.

Conceptually, you wrap risky calls with a breaker like this:

import pybreaker

billing_breaker = pybreaker.CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
)

However, it is important to note that PyBreaker is most commonly used in synchronous call contexts.
If you want to use it directly in FastAPI with asynchronous HTTPX calls, you need to think a little more carefully about the surrounding implementation. In practice, one of these two approaches tends to work best.

Manage only the breaker state while wrapping async calls through a compatible abstraction
Start with a small custom “simple breaker” implementation that is easy to swap out later

In this article, to make the idea easier to understand in FastAPI, I will show the second approach first.

10. Designing a simple circuit breaker for FastAPI

If you want to start small, even a minimal self-built breaker can be very useful.
The core idea is this:

Count failures
Once the threshold is reached, move to Open for a fixed time
While Open, fail immediately
After the Open period expires, allow a trial request
If the trial succeeds, return to Closed

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class SimpleCircuitBreaker:
    fail_max: int
    reset_timeout_sec: int
    failure_count: int = 0
    opened_at: datetime | None = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        now = datetime.now(timezone.utc)
        return now < self.opened_at + timedelta(seconds=self.reset_timeout_sec)

    def allow_request(self) -> bool:
        return not self.is_open()

    def record_success(self) -> None:
        self.failure_count = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failure_count += 1
        if self.failure_count >= self.fail_max:
            self.opened_at = datetime.now(timezone.utc)

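This is not a production-ready implementation, but it is enough to understand the basic behavior of a circuit breaker. Two limitations are worth knowing: once the Open period expires it lets every request through until the next failure is recorded, rather than a limited number of trial calls, and its state lives in a single process, so each worker keeps its own count.
Even this small step of no longer hammering a dependency that keeps failing goes a long way toward preventing system-wide outages.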
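The simple breaker above has no explicit Half-Open state: after the Open period it admits every request again. If you want the three states from section 8 to be visible in code, one possible variant looks like the sketch below. `ThreeStateBreaker`, `State`, and the injectable `clock` parameter are names invented for this illustration, and the rule of "one trial per reset window" is an assumption, not something prescribed by Fowler or PyBreaker.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class ThreeStateBreaker:
    fail_max: int
    reset_timeout_sec: float
    clock: Callable[[], float] = time.monotonic  # injectable so tests can control time
    state: State = State.CLOSED
    failure_count: int = 0
    opened_at: float = 0.0

    def allow_request(self) -> bool:
        if self.state is State.CLOSED:
            return True
        if self.clock() - self.opened_at < self.reset_timeout_sec:
            # Still inside the Open period, or a trial is already in flight.
            return False
        # The wait is over: admit one trial and restart the timer, so a trial
        # whose outcome is never recorded cannot block recovery forever.
        self.state = State.HALF_OPEN
        self.opened_at = self.clock()
        return True

    def record_success(self) -> None:
        self.state = State.CLOSED
        self.failure_count = 0

    def record_failure(self) -> None:
        self.failure_count += 1
        # A failed trial reopens immediately; otherwise open at the threshold.
        if self.state is State.HALF_OPEN or self.failure_count >= self.fail_max:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Using a monotonic clock instead of wall-clock time avoids surprises when the system clock is adjusted, and passing the clock in makes the timing logic easy to test without sleeping.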

11. Put the breaker into the client layer

Next, use the simple breaker inside your external API client.

class CircuitOpenError(Exception):
    pass

class ShippingClient:
    def __init__(self, client, base_url: str, api_key: str, breaker: SimpleCircuitBreaker):
        self.client = client
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.breaker = breaker

    async def get_quote(self, payload: dict) -> dict:
        if not self.breaker.allow_request():
            raise CircuitOpenError("shipping circuit is open")

        try:
            response = await safe_request(
                self.client,
                "POST",
                f"{self.base_url}/quotes",
                json=payload,
                headers={"Authorization": f"Bearer {self.api_key}"},
            )
            self.breaker.record_success()
            return response.json()
        except (ExternalAPITimeoutError, ExternalAPIUnavailableError):
            self.breaker.record_failure()
            raise

Once you do this, decisions like
“Which exceptions count as failures?”
or “Should HTTP 4xx open the breaker?”
can stay inside the client layer.

For example, if a 400 is caused by a user input mistake, that is not a dependency outage.
So in many cases, treating every HTTPStatusError as a breaker failure would not be the right design.

12. What is fallback? Decide what to do when you cannot call the dependency

A circuit breaker is a defensive mechanism that says “do not call it,” but fallback is the idea of deciding how your system should behave when it cannot call it.
Typical fallback options include:

Return the last successful cached result
Return a degraded response
Omit some information but still keep the screen or endpoint usable
Clearly say “currently unavailable” while offering a way to try again later
Stop doing the work synchronously and switch to background job submission

The Circuit Breaker pattern as described by Martin Fowler also assumes it will be combined with monitoring and operational behaviors, not just raw blocking logic. In microservice contexts it is often discussed together with things like timeouts and bulkheads, and fallback is the practical expression of that idea at the application level.

13. Fallback implementation example 1: return a cached response

One of the most practical forms of fallback is to return the most recent successful result for a limited time.

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class CacheEntry:
    value: dict
    expires_at: datetime

class SimpleResponseCache:
    def __init__(self):
        self.data: dict[str, CacheEntry] = {}

    def get(self, key: str) -> dict | None:
        entry = self.data.get(key)
        if not entry:
            return None
        if datetime.now(timezone.utc) > entry.expires_at:
            return None
        return entry.value

    def set(self, key: str, value: dict, ttl_sec: int) -> None:
        self.data[key] = CacheEntry(
            value=value,
            expires_at=datetime.now(timezone.utc) + timedelta(seconds=ttl_sec),
        )

class ShippingClient:
    def __init__(self, client, base_url: str, api_key: str, breaker: SimpleCircuitBreaker, cache: SimpleResponseCache):
        self.client = client
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.breaker = breaker
        self.cache = cache

    async def get_quote_with_fallback(self, payload: dict) -> dict:
        cache_key = f"quote:{payload.get('zip')}:{payload.get('weight')}"

        if not self.breaker.allow_request():
            cached = self.cache.get(cache_key)
            if cached is not None:
                return {"source": "cache", "data": cached}
            raise CircuitOpenError("shipping circuit is open")

        try:
            response = await safe_request(
                self.client,
                "POST",
                f"{self.base_url}/quotes",
                json=payload,
                headers={"Authorization": f"Bearer {self.api_key}"},
            )
            data = response.json()
            self.cache.set(cache_key, data, ttl_sec=60)
            self.breaker.record_success()
            return {"source": "live", "data": data}
        except (ExternalAPITimeoutError, ExternalAPIUnavailableError):
            self.breaker.record_failure()
            cached = self.cache.get(cache_key)
            if cached is not None:
                return {"source": "cache", "data": cached}
            raise

With this design, even if the shipping quote API is temporarily unhealthy, the screen can still remain usable if a recent successful result exists.
That said, you always need to decide carefully whether returning slightly stale data is acceptable in that specific use case.

14. Fallback implementation example 2: stop doing it synchronously and move to a job

Some types of external APIs are not a good fit for cache-based fallback.
For example, “report generation,” “heavy external aggregation,” or “large file conversion” may not complete quickly enough for synchronous request-response flows.

In those cases, a fallback can be to give up on synchronous completion and switch to an asynchronous job model.

During normal operation
- Call the external API during the request and return the result
During outages
- Return “accepted for processing” and retry later through a job

This is often kinder from a UX perspective than total failure, and it also protects your API better when the dependency is unhealthy.
FastAPI includes BackgroundTasks, which can be used for small post-response tasks. For heavier work, a dedicated job queue is a better fit.
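The switch from synchronous completion to "accepted for processing" can be sketched as follows. This is a simplified illustration under stated assumptions: an in-process asyncio.Queue stands in for a real job queue, `generate_report_live` is a hypothetical external call that always fails here to simulate an outage, and `DependencyDownError` stands in for CircuitOpenError or a converted external API error.

```python
import asyncio
import uuid


class DependencyDownError(Exception):
    """Stand-in for CircuitOpenError or a converted external API error."""


async def generate_report_live(params: dict) -> dict:
    # Hypothetical external call; it always fails here to simulate an outage.
    raise DependencyDownError("report API is unavailable")


async def request_report(params: dict, jobs: asyncio.Queue) -> dict:
    try:
        result = await generate_report_live(params)
        return {"status": "completed", "result": result}
    except DependencyDownError:
        # Fallback: accept the work now and let a worker retry it later.
        job_id = str(uuid.uuid4())
        await jobs.put({"job_id": job_id, "params": params})
        return {"status": "accepted", "job_id": job_id}
```

An endpoint built on this would typically return the "accepted" payload with HTTP 202 and let the client look up the result later by job_id.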

15. The relationship with retries: adding a breaker does not make retries unnecessary

Adding a circuit breaker does not mean retries are no longer needed.
In practice, their roles are different.

Retry
- Absorb small temporary glitches
Circuit breaker
- Stop calling a dependency that is continuously failing
Fallback
- Decide what to return when you cannot call it

HTTPX transport-level retries can be used for ConnectError and ConnectTimeout. For broader retry behavior or exponential backoff, Tenacity is usually more suitable.

A practical mental model is something like this:

First, let HTTPX do a single connection retry
If that still fails, do limited retry with Tenacity
If failures continue, open the breaker
While the breaker is open, route requests to a fallback

This lets you build resilience gradually, without jumping immediately to something extreme.
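The "limited retry" step can be illustrated with a small hand-rolled helper. This is a stand-in for what a library such as Tenacity provides, not Tenacity's API: `retry_with_backoff` and `TransientError` are hypothetical names, and the default delays are arbitrary example values. The sketch assumes the breaker records one failure per logical call, after the retries are exhausted.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for ExternalAPITimeoutError / ExternalAPIUnavailableError."""


async def retry_with_backoff(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay_sec: float = 0.2,
    max_delay_sec: float = 2.0,
) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except TransientError:
            if attempt == attempts:
                # Out of attempts: let the caller record a breaker failure.
                raise
            # Exponential backoff with jitter: the ceiling doubles on each
            # attempt, capped at max_delay_sec; the sleep is random below it.
            ceiling = min(max_delay_sec, base_delay_sec * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Keeping the attempt count small matters here: retries multiply the load on a dependency that is already struggling, which is exactly the situation the breaker is meant to cut off.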

16. Monitoring and metrics: a breaker is dangerous if nobody can see when it opens

Circuit breakers are useful, but they become dangerous if they open and nobody notices.
At minimum, it is good to make these kinds of metrics visible:

Success rate per external API
Timeout rate
Retry count
Number of times the breaker opens
Number of times fallback is used
Cache return rate

Martin Fowler’s description also notes that when a breaker opens, you typically want monitoring and alerting. In other words, a circuit breaker is not something you “just add and forget.” It belongs together with observability.

On the FastAPI side, if you are already using structured logs and metrics, including fields like circuit_state="open" or fallback="cache" makes later diagnosis much easier.
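To make the metrics listed above concrete, here is a minimal in-memory sketch. `DependencyMetrics` and the event names are hypothetical; in production you would more likely export equivalent counters to whatever metrics system you already run.

```python
from collections import Counter


class DependencyMetrics:
    """In-memory counters keyed by (dependency, event)."""

    def __init__(self) -> None:
        self.counts: Counter[tuple[str, str]] = Counter()

    def incr(self, dependency: str, event: str) -> None:
        self.counts[(dependency, event)] += 1

    def rate(self, dependency: str, event: str, total_event: str = "request") -> float:
        # Share of `event` among all recorded `total_event`s; 0.0 if none yet.
        total = self.counts[(dependency, total_event)]
        return self.counts[(dependency, event)] / total if total else 0.0
```

A client would call incr("shipping", "request") on every attempt and add events such as "timeout", "retry", "breaker_open", or "fallback_cache" as they happen, which is enough to derive the timeout rate and cache return rate per dependency.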

17. Logging design: record breaker state transitions

Your logs should capture not only external API errors themselves, but also changes in breaker state.

import logging

logger = logging.getLogger("circuit_breaker")

def log_breaker_open(name: str) -> None:
    logger.warning("circuit opened", extra={"circuit": name})

def log_breaker_closed(name: str) -> None:
    logger.info("circuit closed", extra={"circuit": name})

If you can see when the breaker opened, when it recovered from Half-Open, and when a response was served via fallback,
it becomes much easier to answer questions like,
“Why did the user receive a degraded response here?”
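If the breaker is consulted on every request, logging its state each time would flood the logs. A possible refinement, sketched below with the hypothetical helper `TransitionLogger`, is to log only when the observed state differs from the previous one and to include both the old and the new state.

```python
import logging

logger = logging.getLogger("circuit_breaker")


class TransitionLogger:
    """Logs a breaker's state only when it differs from the last state seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.last_state = "closed"

    def observe(self, state: str) -> bool:
        if state == self.last_state:
            return False
        # Opening is the event operators must notice, so it gets WARNING.
        level = logging.WARNING if state == "open" else logging.INFO
        logger.log(
            level,
            "circuit state changed",
            extra={"circuit": self.name, "from_state": self.last_state, "to_state": state},
        )
        self.last_state = state
        return True
```

The return value tells the caller whether a transition happened, which is also a convenient place to increment a "breaker opened" metric.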

18. Testing strategy: the minimum cases worth protecting

Circuit breakers and fallbacks are fragile if you only test the happy path.
At minimum, these cases are worth testing:

The breaker opens after repeated timeouts
While Open, the external API is not actually called, and the system fails immediately or uses fallback
If fallback cache exists, it is returned
After recovery, a successful trial request returns the breaker to Closed
User-caused errors such as 4xx do not open the breaker

Code close to policy logic, such as record_failure() or allow_request(), becomes much easier to test if you keep it in small classes or nearly pure functions.
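As an illustration of how small such tests can be, the sketch below covers three of the cases above without any network access or sleeping. Everything in it is a test double invented for this example: `Breaker` is a compact stand-in for the simple breaker with an injectable clock, and `FakeDependency` simulates an external API that times out until it is marked healthy.

```python
import asyncio


class Breaker:
    """Compact stand-in for SimpleCircuitBreaker with an injectable clock."""

    def __init__(self, fail_max: int, reset_timeout_sec: float, clock):
        self.fail_max, self.reset_timeout_sec, self.clock = fail_max, reset_timeout_sec, clock
        self.failure_count, self.opened_at = 0, None

    def allow_request(self) -> bool:
        return self.opened_at is None or self.clock() >= self.opened_at + self.reset_timeout_sec

    def record_success(self) -> None:
        self.failure_count, self.opened_at = 0, None

    def record_failure(self) -> None:
        self.failure_count += 1
        if self.failure_count >= self.fail_max:
            self.opened_at = self.clock()


class FakeDependency:
    def __init__(self) -> None:
        self.calls = 0
        self.healthy = False

    async def fetch(self) -> str:
        self.calls += 1
        if not self.healthy:
            raise TimeoutError("simulated timeout")
        return "ok"


async def guarded_fetch(breaker: Breaker, dep: FakeDependency) -> str:
    if not breaker.allow_request():
        return "fallback"
    try:
        result = await dep.fetch()
    except TimeoutError:
        breaker.record_failure()
        return "fallback"
    breaker.record_success()
    return result


def test_breaker_opens_blocks_and_recovers() -> None:
    now = [0.0]
    breaker = Breaker(fail_max=2, reset_timeout_sec=30, clock=lambda: now[0])
    dep = FakeDependency()

    # Two timeouts open the breaker.
    for _ in range(2):
        assert asyncio.run(guarded_fetch(breaker, dep)) == "fallback"
    assert dep.calls == 2

    # While Open, the dependency is not called at all.
    assert asyncio.run(guarded_fetch(breaker, dep)) == "fallback"
    assert dep.calls == 2

    # After the reset timeout, a successful trial closes the breaker.
    now[0] = 30.0
    dep.healthy = True
    assert asyncio.run(guarded_fetch(breaker, dep)) == "ok"
    assert breaker.opened_at is None and breaker.failure_count == 0
```

The call counter on the fake is what proves the key property: an open breaker must not merely return a fallback, it must not touch the dependency at all.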

19. Common failure patterns

19.1 Adding only a breaker, but no timeouts

If one failure takes too long to be recognized, the whole system can become slow before the breaker has any chance to help. It is safer to set up HTTPX timeouts first.

19.2 Counting both 4xx and 5xx as the same kind of failure

If user mistakes or authorization errors also count toward the breaker, it can open even when the dependency itself is healthy.
Usually it is more natural to count only errors that truly represent dependency instability.

19.3 Using careless fallbacks that return stale data too freely

Returning cached values is powerful, but not every kind of data can safely be stale.
For things like payment status or inventory, freshness matters much more.

19.4 Letting the breaker open without anyone noticing

Without alerts or metrics, you can end up in a state where the feature has been quietly degraded for a long time. Fowler also treats breaker monitoring as an important part of the pattern.

19.5 Different rules for every client

If your shipping API, payment API, and notification API each behave in completely different ways, team operations become painful.
Exceptions are fine, but your basic pattern should still be consistent.
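One way to keep the basic pattern consistent is to describe each dependency with the same policy object and let clients read their settings from it. The sketch below is illustrative only: `DependencyPolicy`, `FallbackMode`, and `policy_for` are hypothetical names, and the per-dependency values are example assumptions rather than recommendations.

```python
from dataclasses import dataclass, replace
from enum import Enum


class FallbackMode(Enum):
    HARD_FAIL = "hard_fail"
    DEGRADED = "degraded"
    CACHE = "cache"


@dataclass(frozen=True)
class DependencyPolicy:
    read_timeout_sec: float
    max_retries: int
    fail_max: int
    reset_timeout_sec: int
    fallback: FallbackMode


# Team-wide defaults; individual dependencies override only what differs.
DEFAULT_POLICY = DependencyPolicy(
    read_timeout_sec=3.0,
    max_retries=2,
    fail_max=5,
    reset_timeout_sec=30,
    fallback=FallbackMode.HARD_FAIL,
)

POLICIES = {
    # Quotes may be slightly stale, so a cached answer is acceptable.
    "shipping": replace(DEFAULT_POLICY, fallback=FallbackMode.CACHE),
    # Assumption: payment writes are not retried automatically.
    "payment": replace(DEFAULT_POLICY, max_retries=0),
}


def policy_for(dependency: str) -> DependencyPolicy:
    return POLICIES.get(dependency, DEFAULT_POLICY)
```

Because every exception to the default is a visible one-line override, reviewing how the shipping, payment, and notification clients differ becomes a matter of reading one table.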

20. A roadmap by reader type

Individual developers and learners

First add a shared AsyncClient, timeouts, and connection limits
Convert external API exceptions into your own exceptions
Introduce a small simple breaker for just one client
Try cache-based fallback in one place
Leave the results in logs

Engineers in small teams

Inventory each external API by importance and freshness requirements
Build a shared client layer
Align timeout, retry, and breaker rules across the team
Turn breaker-open counts and fallback counts into metrics
Revisit write operations with idempotency and job-queue options in mind

SaaS teams and startups

Define per-dependency policies for “hard fail,” “degraded mode,” and “cache return”
Centralize circuit breaker and fallback logic in the client layer
Build audit logs, alerts, and dashboards
Run outage drills or chaos-style tests to verify behavior when dependencies go down
If needed, move toward Redis-backed shared breaker state or more production-grade implementations

Reference links

FastAPI
HTTPX
Circuit Breaker
- Martin Fowler: Circuit Breaker
- PyBreaker (PyPI)
Retry
- Tenacity Documentation

Conclusion

A circuit breaker is a very practical pattern for preventing your entire FastAPI application from being dragged down by repeatedly calling an external API that continues to fail. As Martin Fowler explains, opening the circuit after a failure threshold and stopping further calls helps prevent cascading damage.
In FastAPI, a realistic approach is to build on top of shared AsyncClient, timeout settings, connection limits, and exception conversion, and then gradually layer in retries, breakers, and fallbacks. HTTPX already provides much of the groundwork for this.
A fallback is not just “swallowing errors.” It is a design choice that protects both user experience and system stability through things like cached responses, degraded operation, or background-job handoff. In practice, circuit breakers are much stronger when designed together with fallbacks and monitoring.
You do not need a perfect implementation from day one. Even if you start with just one external API client, adding timeouts, exception conversion, a simple breaker, and a cache fallback will make the design much more tangible and easier to understand.

A natural next article after this would be something like “Design Patterns for Internal Admin APIs in FastAPI” or “Job Queue Design and Graceful Degradation in FastAPI.”
