FastAPI Performance Tuning & Caching Strategy 101: A Practical Recipe for Growing a Slow API into a “Lightweight, Fast API”
A quick big-picture overview (summary)
- In most cases, what slows down FastAPI isn’t the framework itself, but design-side issues like DB queries, the N+1 problem, missing/insufficient indexes, and a lack of caching.
- Start by getting async endpoints right and by covering SQLAlchemy 2.x connection pooling and query optimization so you can reduce wasted waiting time.
- Then combine HTTP cache headers (Cache-Control, ETag) with fastapi-cache2, Redis, and related tools to cache responses and function results, so you stop repeating expensive work.
- By stacking FastAPI-specific micro-optimizations (Uvicorn worker count, JSON response choice, middleware review, etc.), you gradually get an API that’s lighter, faster, and easier to operate.
- Finally, this guide includes a roadmap for “where to start,” and explains the impact for learners, small teams, and SaaS/startup teams.
Who benefits from reading this (concrete personas)
1) Solo developers / learners
- You run a FastAPI app on Heroku, a VPS, or a small cloud setup.
- Users and data have grown a bit, and you sometimes feel responses are “kind of sluggish.”
- You think, “I made it async, so it should be fast… right?” but you’re not fully sure.
For you, this guide focuses on what to suspect first and what order to improve things, in steps that are practical and low-friction.
2) Backend engineers on a small team
- A 3–5 person team builds a FastAPI + SQLAlchemy API, and as features pile up, latency slowly worsens.
- You want to deal with N+1 queries and missing indexes, but you’re unsure where to start.
- You want caching, but you’re overwhelmed by choices like HTTP caching, Redis, and fastapi-cache2.
For you, this guide organizes DB + caching strategy into a three-layer model you can share as a baseline inside your team.
3) SaaS teams / startups
- You already have meaningful users/traffic, and peak-time latency & throughput directly affect the business.
- You’ve begun using Redis/CDNs, but deciding “how much caching should live at the API layer?” is tricky.
- You’re thinking ahead to multi-instance scaling or Kubernetes—and want a design you won’t regret later.
For you, this guide helps with finding bottlenecks, practical SQLAlchemy + caching patterns, and a multi-layer caching strategy including HTTP caching.
Accessibility self-check (readability & clarity)
- Structure: first the “how to think about bottlenecks,” then deeper dives in order: DB optimization → caching strategy → FastAPI-specific tuning → roadmap.
- Terminology: key terms (N+1, connection pooling, HTTP cache headers) get short first-use explanations.
- Code: shorter blocks with minimal comments to reduce visual scanning cost.
- Target: mainly “post-tutorial intermediate” readers, while keeping sections independently readable.
Overall, it aims for the kind of clarity and structure you’d expect around WCAG AA for technical writing.
1) The real causes of performance issues: start by questioning “what is slow”
The first thing to stress is that “FastAPI is slow” is rarer than you might think.
Common bottlenecks look like:
- DB layer
- N+1 queries (looping and triggering a SELECT each time)
- Missing or insufficient indexes
- Fetching unnecessary columns/tables
- Network layer
- Calling external APIs sequentially
- Weak timeout/retry strategy leading to snowballing waits
- Application layer
- No caching, so the same expensive computation repeats
- CPU-heavy work (image processing, large JSON shaping, etc.) stuck in a single worker
FastAPI-specific tweaks (JSON response class changes, worker-count tuning) usually start paying off only after you’ve addressed these root causes.
2) Async endpoints and SQLAlchemy basics for tuning
2.1 The “must-respect line” for async/await
FastAPI supports both sync (def) and async (async def) endpoints.
But “making everything async” doesn’t automatically make it fast.
- I/O-bound work (DB, external APIs, file I/O) benefits greatly from async
- CPU-bound work (image transforms, heavy math) hits limits due to the GIL; threads/async won’t magically scale it—process separation (e.g., Celery) is often better
When writing async endpoints, always confirm that the libraries you call are also async-capable. If you call a synchronous SQLAlchemy session repeatedly from async def, throughput gains may be smaller than expected.
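When a blocking call is unavoidable inside an async def endpoint, one common escape hatch is to run it in a worker thread. A minimal sketch, where slow_query is a hypothetical stand-in for a blocking call:

```python
import asyncio
import time

def slow_query():
    # hypothetical stand-in for a blocking call (e.g., a sync SQLAlchemy query)
    time.sleep(0.01)
    return [{"id": 1}]

async def get_items():
    # asyncio.to_thread (Python 3.9+) runs the blocking function in a worker
    # thread, so the event loop stays free to serve other requests meanwhile
    return await asyncio.to_thread(slow_query)

result = asyncio.run(get_items())  # → [{"id": 1}]
```

FastAPI offers the same safety valve implicitly: a plain def endpoint is executed in a threadpool automatically, so sometimes the simplest fix is to not mark the endpoint async at all.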
2.2 SQLAlchemy 2.x + connection pooling
Opening a new DB connection every time can be a huge overhead.
With SQLAlchemy, you can configure the connection pool via create_engine():
```python
# app/infra/db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "postgresql+psycopg://user:pass@localhost:5432/app"

engine = create_engine(
    DATABASE_URL,
    pool_size=20,    # connections kept open in the pool
    max_overflow=0,  # no extra connections beyond pool_size (tune per environment)
)

SessionLocal = sessionmaker(bind=engine, autoflush=False, autocommit=False)
```
Then in FastAPI, reuse sessions via a dependency:
```python
# app/deps/db.py
from app.infra.db import SessionLocal

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
```
This “pool + dependency” pattern is a solid baseline for most FastAPI + SQLAlchemy apps.
2.3 N+1 queries and indexes
The N+1 problem is a classic ORM trap:
- Example: loop through users and access .posts, triggering an extra SELECT per user
- Fix: eager load using selectinload() / joinedload(), or rewrite with explicit JOINs
Also, if the columns used in WHERE clauses or JOINs lack indexes, queries can slow dramatically as tables grow. Check DB execution plans (EXPLAIN) and add the right indexes.
3) Caching fundamentals: what to cache, and where
Now for caching strategy.
3.1 Think in “three cache layers”
Caching can be roughly divided by “where it lives”:
1. Client/CDN/proxy side (HTTP cache headers)
   - Browsers/CDNs reuse responses
   - Controlled by Cache-Control, ETag, etc.
2. Application side (memory/Redis, etc.)
   - “Close” to FastAPI
   - Cache function results or query results
3. DB side
   - Materialized views or precomputed tables for heavy aggregates
This guide focuses mainly on (1) and (2).
3.2 What you should NOT cache (or must treat carefully)
- Putting personal data or session-specific info into a shared cache is dangerous
- Use Cache-Control: private, or include user-specific keys (e.g., user_id) on the app side
- Separate “safe-for-everyone” data from user-specific data
4) Handling HTTP cache headers in FastAPI
Start with HTTP caching—often high impact with low implementation cost, making it a great first move.
4.1 Add Cache-Control
For example, an endpoint whose ranking can be cached for 60 seconds:
```python
# app/api/ranking.py
from fastapi import APIRouter, Response

router = APIRouter(prefix="/ranking", tags=["ranking"])

@router.get("")
def get_ranking(response: Response):
    # imagine this is computed from the DB
    data = {"items": ["A", "B", "C"]}
    # cacheable for 60 seconds (including intermediate proxies)
    response.headers["Cache-Control"] = "public, max-age=60"
    return data
```
Now browsers/CDNs can reuse the response for a short time.
4.2 ETag and conditional requests
A common pattern: “If unchanged, return 304 Not Modified.”
```python
import hashlib
import json

from fastapi import Request, Response

@router.get("/popular")
def get_popular_items(request: Request, response: Response):
    data = {"items": ["A", "B", "C"]}
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    # ETag values are quoted strings per the HTTP spec
    etag = f'"{hashlib.sha1(body).hexdigest()}"'

    if request.headers.get("if-none-match") == etag:
        return Response(status_code=304)

    response.headers["ETag"] = etag
    response.headers["Cache-Control"] = "public, max-age=120"
    return data
```
In production, you’ll often compute ETags from DB version fields or updated timestamps.
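For example, a weak version-based ETag can be built from the row’s id and updated_at, avoiding full-body serialization on every request (etag_for is a hypothetical helper of our own, not a FastAPI API):

```python
from datetime import datetime, timezone

def etag_for(item_id: int, updated_at: datetime) -> str:
    # weak ETag ("W/" prefix): changes whenever updated_at changes,
    # without serializing and hashing the whole response body
    return f'W/"{item_id}-{int(updated_at.timestamp())}"'

tag = etag_for(42, datetime(2024, 1, 1, tzinfo=timezone.utc))  # → W/"42-1704067200"
```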
5) Add application-side caching with fastapi-cache2 + Redis
When HTTP caching alone isn’t enough, add an application-layer cache using Redis. Here’s an example using fastapi-cache2.
5.1 Install
```shell
pip install fastapi-cache2 redis
```
Run Redis (example with Docker):
```shell
docker run -d --name redis -p 6379:6379 redis:7
```
5.2 Initialization code
```python
# app/core/cache.py
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
import redis.asyncio as redis

async def init_cache():
    client = redis.from_url(
        "redis://localhost:6379/0", encoding="utf8", decode_responses=True
    )
    FastAPICache.init(
        backend=RedisBackend(client),
        prefix="fastapi-cache:",
    )
```

```python
# app/main.py
from fastapi import FastAPI
from app.core.cache import init_cache

app = FastAPI(title="FastAPI Cache Example")

@app.on_event("startup")
async def on_startup():
    await init_cache()
```
5.3 Cache an endpoint (or function)
The @cache() decorator makes it easy to cache results:
```python
from fastapi import APIRouter
from fastapi_cache.decorator import cache

router = APIRouter(prefix="/articles", tags=["articles"])

@router.get("")
@cache(expire=60)  # cache for 60 seconds
async def list_articles():
    # assume this is a heavy DB query
    return [{"id": 1, "title": "Hello"}, {"id": 2, "title": "World"}]
```
First call stores the result in Redis; subsequent calls return quickly from Redis for 60 seconds.
5.4 Add user-specific cache keys
You can customize the key via namespace or key_builder. Helpful for per-user caching:
```python
from fastapi import Depends, Request
from fastapi_cache.decorator import cache

from app.deps.auth import get_current_user

def user_cache_key_builder(
    func, namespace="", *, request=None, response=None, args=(), kwargs=None
):
    # fastapi-cache2 passes the endpoint's own arguments via `kwargs`
    user = kwargs["current_user"]
    return f"user:{user.id}:path:{request.url.path}"

@router.get("/me")
@cache(expire=30, key_builder=user_cache_key_builder)
async def get_my_dashboard(
    request: Request,
    current_user=Depends(get_current_user),
):
    # heavy per-user aggregation
    ...
```
In real systems, be careful about argument handling, key collisions, and invalidation.
6) Multi-layer caching and how to think about TTL and invalidation
Caching isn’t just “add cache.” The key is “how long is it valid?” and “how do we invalidate it?”
6.1 How to choose TTL
- Data changes often, but a small delay is OK
  - Short TTL (seconds to minutes): rankings, trending, etc.
- Data rarely changes
  - Longer TTL (minutes to hours): master data
- Data must never be stale
  - Don’t cache, or use Cache-Control: no-cache (forces revalidation) for HTTP caching

Start by caching “things that can be slightly stale” first.
6.2 Invalidation patterns
- Let it expire naturally (TTL-based)
- Delete on update events (e.g., article update clears only that article’s cache)
- Versioned keys (v1:ranking → switch to v2:ranking when updating)
With complex Redis usage, share a team-wide cache key naming convention so it stays maintainable.
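The versioned-key pattern can be sketched without any Redis dependency (the naming scheme and the in-process version counter below are our own; in production the counter itself would live in Redis, e.g. via INCR, so all instances agree):

```python
# in production this counter would live in Redis so every instance sees it
_versions = {"ranking": 1}

def cache_key(name: str) -> str:
    # readers always build keys through the current version
    return f"v{_versions[name]}:{name}"

def invalidate(name: str) -> None:
    # bumping the version "invalidates" old entries without deleting them;
    # the stale v1 entries simply expire via their TTL
    _versions[name] += 1

key_before = cache_key("ranking")  # → "v1:ranking"
invalidate("ranking")
key_after = cache_key("ranking")   # → "v2:ranking"
```

The appeal of this pattern is that invalidation is a single cheap write, with no need to enumerate and delete every affected key.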
7) FastAPI-specific tuning points
After design-level fixes, micro-optimizations become worthwhile.
7.1 Uvicorn worker count
You can run multiple processes with --workers:
```shell
uvicorn app.main:app \
  --host 0.0.0.0 \
  --port 8000 \
  --workers 4
```
- Start around 1× to 2× the number of CPU cores
- If workloads are truly heavy, consider offloading to Celery or similar rather than only raising the worker count
7.2 Choosing a JSON response class
For large JSON outputs, swapping response classes can help:
```python
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

app = FastAPI(default_response_class=ORJSONResponse)
```
orjson is known for speed; if JSON serialization is CPU-heavy, this can be worthwhile.
7.3 Middleware count and order
Too many middlewares add overhead per request.
- Keep only what you truly need
- If a middleware is heavy, limit it to certain routes or a sub-application
8) Measurement and verification: track improvements in numbers
Performance tuning needs measurement so you don’t end up with “it feels faster.”
- Local/staging:
  - Load test with ab, hey, wrk, etc.
  - Track P95/P99 latency, error rate, and RPS changes
- Production:
  - Visualize latency and throughput with Prometheus + Grafana
  - Ideally also track cache hit rate and DB query count
Try to verify changes like: “P95 went from X ms to Y ms after caching.”
9) Impact by reader type and where to start
9.1 For solo devs / learners
- Start with HTTP cache headers (Cache-Control) and SQLAlchemy connection pooling.
- Then try fastapi-cache2 on heavy read-only APIs and feel how much work caching removes.
- Even at tens of users, perceived “snappiness” and system “headroom” can improve a lot.
9.2 For small-team engineers
- When aligning as a team, it’s easier to talk in three steps:
- Fix N+1 and missing indexes
- Set connection pooling properly
- Add caching (HTTP + Redis)
- Use logs/metrics to find the bottleneck first, then prioritize the biggest wins.
9.3 For SaaS teams / startups
- Assume a multi-layer strategy (HTTP/Redis/DB) and document “what goes where.”
- Before adding more Redis/CDN complexity, you can often see massive gains by improving query quality, schema design, and N+1 fixes.
- Then combine fastapi-cache2 and/or a custom cache layer to scale smoothly.
10) A step-by-step roadmap (so you can proceed gradually)
1. Measure current state
   - Measure latency and RPS for representative endpoints
   - Log DB query count and external API call count
2. Review DB and queries
   - Fix N+1 and missing indexes
   - Tune connection pooling
3. Introduce HTTP cache headers
   - Add Cache-Control to read-only endpoints that can be slightly stale
   - Consider ETag and/or Last-Modified when needed
4. Add application caching
   - Cache specific endpoints/functions via Redis + fastapi-cache2
   - Define TTLs, key design, and invalidation strategy
5. FastAPI-specific tuning
   - Review Uvicorn workers, response class, middleware setup
   - If CPU is the bottleneck, consider offloading to Celery or separate processes
6. Continuous observation and tuning in production
   - Iterate: improve → measure → improve
   - Update cache strategy and DB design as your product grows
Further reading (if you want to go deeper)
Note: These reflect the state at the time of writing; check each source for the latest info.
- Performance / overall optimization
- SQLAlchemy / DB optimization
- Caching (HTTP / Redis / FastAPI)
  - Caching in FastAPI: Unlocking High-Performance Development
  - Caching Strategies for FastAPI: Redis, In-Memory, and HTTP Cache Headers
  - fastapi-cache2 (PyPI)
  - fastapi-cache GitHub repository
  - FastAPI Caching at Scale: What Worked for Me (and What Didn’t)
  - Implement Caching in FastAPI (HTTP headers)
  - FastAPI + HTTP Caching with Stale-While-Revalidate
- Redis + FastAPI
- HTTP caching in general
Closing
Performance tuning and caching strategy can feel “hard” or “hard to change later.”
But in practice, even small steps—starting from heavily-used endpoints and data that can be slightly stale—can drastically improve perceived speed.
- First, find the bottleneck
- Next, fix DB/query issues
- Then combine HTTP caching and Redis caching so heavy work isn’t repeated
By progressing through these three, your FastAPI app can become a much “lighter and more reliable partner.”
Try one step at a time at your own pace.
I’ll be quietly rooting for your API to keep running smoothly.

