Blazing-Fast & Rock-Solid! A Complete FastAPI Caching Guide — Redis, HTTP Caching, ETag, Rate Limiting, and Compression
✅ TL;DR (Inverted Pyramid)
- What you’ll achieve here
  Design caching in FastAPI that boosts perceived speed by running a two-tier strategy: app layer (Redis) and HTTP layer (Cache-Control / ETag / 304). We’ll also add rate limiting and GZip compression to balance performance and stability.
- Main topics
  - Caching basics and where to apply it (app / HTTP / data)
  - Redis (async) key design, TTL, and invalidation
  - HTTP caching: ETag / If-None-Match / Cache-Control with 304 responses
  - FastAPI examples: caching paginated results, tag-based invalidation, partial updates & consistency
  - Rate limiting (token bucket) and GZip compression in tandem
- Benefits
  - Handles concurrency and peak traffic while cutting costs & latency
  - Flattens backend load (DB / external APIs)
  - Protocol-level reproducibility (304 / headers) makes front-end & CDN cooperation straightforward
🎯 Who will get the most out of this? (Concrete personas)
- Solo dev A (senior undergrad)
  Campus API gets suddenly slow under bursts. Wants to supercharge list endpoints using Redis + HTTP caching.
- Contract dev B (3-person team)
  Client dashboard API sees per-second spikes. Wants rate limiting for protection and 304 responses to leverage front-end caching.
- SaaS C (startup)
  Heavy dependence on external APIs with tight pricing and rate caps. Wants to cache fetch results intelligently to reduce cost and incidents.
♿ Accessibility Notes (Readability of this article)
- Level: ~AA. Structured with headings/bullets; terms briefly defined on first use. Code is commented and monospaced.
- Considerations: A summary up front to help screen-reader users grasp the flow; minimal runnable samples at key points.
- Audience: Clear starting points for beginners; key design / consistency / invalidation for intermediates.
1. Where should caching kick in?
Think in three layers:
- HTTP layer (browser / CDN)
  - Use protocol features like Cache-Control, ETag, and If-None-Match.
  - Great match for lists and detail fetches with low change rates. 304 Not Modified saves bandwidth.
- App layer (KVS like Redis)
  - Store computed results or external API responses on the server.
  - Flexible TTL, tag invalidation, and recomputation control.
- Data layer (e.g., DB materialized views)
  - Precompute aggregates; plan refresh cadence. (This article focuses on the app and HTTP layers.)
Key takeaways
- HTTP caching is “free optimization”: just set headers; browsers/CDNs do the work.
- Redis is a “high-agency tool”: design invalidation with TTLs and tags.
- Start with HTTP headers → Redis in that order. ♡
2. Sample setup & dependencies (assumptions)
2.1 Dependencies (example)
```
fastapi
uvicorn[standard]
redis>=5.0          # official client with asyncio support
pydantic
pydantic-settings
```
2.2 Settings class
```python
# app/core/settings.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "FastAPI Cache Guide"
    redis_url: str = "redis://localhost:6379/0"
    default_ttl_seconds: int = 60  # default TTL

    class Config:
        env_file = ".env"
        extra = "ignore"


def get_settings() -> Settings:
    return Settings()
```
2.3 Redis client (async)
```python
# app/core/redis_client.py
from redis import asyncio as aioredis

from app.core.settings import get_settings

redis: aioredis.Redis | None = None


async def connect_redis():
    global redis
    settings = get_settings()
    # decode_responses=False: keep raw bytes, so cached bodies can be
    # hashed into ETags exactly as stored
    redis = aioredis.from_url(
        settings.redis_url,
        decode_responses=False,
    )


async def disconnect_redis():
    global redis
    if redis:
        await redis.aclose()  # aclose() supersedes the deprecated close()
        redis = None
```
Key takeaways
- Use the official redis package’s async API (redis.asyncio).
- Open/close the connection pool on app startup/shutdown.
3. HTTP caching essentials: ETag and 304
ETag is a “fingerprint” of the response body. The client sends If-None-Match: <etag> on subsequent requests. If the server decides the content hasn’t changed, it can return 304 without a body.
3.1 Adding ETag and conditional GET
```python
# app/utils/http_cache.py
import hashlib

from fastapi import Request, Response


def make_etag(payload: bytes) -> str:
    # Hash the body into a strong ETag
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def conditional_response(request: Request, response: Response, body_bytes: bytes):
    etag = make_etag(body_bytes)
    client_etag = request.headers.get("if-none-match")
    response.headers["ETag"] = etag
    # If it matches, return 304 and omit the body
    if client_etag == etag:
        response.status_code = 304
        response.body = b""
    else:
        response.body = body_bytes
```
3.2 Designing Cache-Control
- Publicly cacheable (let CDN/shared caches store it): Cache-Control: public, max-age=60
- User-specific (auth responses): Cache-Control: private, max-age=0, no-cache (lean on ETag)
- Rarely changes: use a longer max-age. Ensure the ETag changes when the backend updates.
Key takeaways
- ETag + 304 saves bandwidth and time.
- Cache-Control clarifies public/private and lifetime (max-age).
- For authenticated responses, private is the default (keep out of shared caches).
4. App-layer caching: use Redis to “not compute”
4.1 Key design (robust patterns)
- Key granularity: make a unique key from URL + query, e.g., <resource>:<path_hash>.
- Versioning: prefix with v1: so you can invalidate everything on breaking changes.
- Per-user: include user:<id> when auth is involved (protects privacy).
- Tags: register keys into sets like tag:<category> to bulk-delete by tag.
4.2 Read/write/TTL/tag registration
```python
# app/utils/app_cache.py
import hashlib

from fastapi import Request

# Import the module (not the bare name) so we always see the client
# that connect_redis() assigns at startup
from app.core import redis_client
from app.core.settings import get_settings


def cache_key_from_request(request: Request, prefix="v1:list") -> str:
    # Hash path + sorted query pairs
    pairs = sorted(request.query_params.multi_items())
    raw = request.url.path + "?" + "&".join(f"{k}={v}" for k, v in pairs)
    return f"{prefix}:{hashlib.sha256(raw.encode()).hexdigest()}"


async def cache_get(key: str) -> bytes | None:
    if not redis_client.redis:
        return None
    return await redis_client.redis.get(key)


async def cache_set(key: str, value: bytes, ttl: int | None = None):
    if not redis_client.redis:
        return
    settings = get_settings()
    await redis_client.redis.set(key, value, ex=ttl or settings.default_ttl_seconds)


async def tag_add(tag: str, key: str):
    # Register the key in the tag’s set
    if redis_client.redis:
        await redis_client.redis.sadd(f"tag:{tag}", key)


async def tag_invalidate(tag: str):
    if not redis_client.redis:
        return 0
    tag_key = f"tag:{tag}"
    members = await redis_client.redis.smembers(tag_key)
    if members:
        await redis_client.redis.delete(*members)  # delete all at once
    await redis_client.redis.delete(tag_key)
    return len(members or [])
```
Key takeaways
- Think key = function inputs (path, query, user).
- Tags enable category-wide invalidation.
- Start TTL at ~60s to feel impact, then tune.
5. Implementing pagination caching (concrete example)
5.1 Router
```python
# app/main.py
import json

from fastapi import FastAPI, Query, Request, Response
from fastapi.middleware.gzip import GZipMiddleware

from app.core.redis_client import connect_redis, disconnect_redis
from app.utils.app_cache import cache_get, cache_key_from_request, cache_set, tag_add, tag_invalidate
from app.utils.http_cache import conditional_response

app = FastAPI(title="FastAPI Cache Demo")
app.add_middleware(GZipMiddleware, minimum_size=600)  # compress to save bandwidth (see below)


@app.on_event("startup")
async def on_startup():
    await connect_redis()


@app.on_event("shutdown")
async def on_shutdown():
    await disconnect_redis()


# Pseudo data source (use a DB in production)
FAKE_POSTS = [
    {"id": i, "title": f"post-{i}", "category": "tech" if i % 2 == 0 else "life"}
    for i in range(1, 501)
]


@app.get("/posts")
async def list_posts(
    request: Request,
    response: Response,
    limit: int = Query(20, ge=1, le=100),
    offset: int = Query(0, ge=0),
    category: str | None = Query(None),
):
    # 1) Try the app-layer cache first
    key = cache_key_from_request(request, prefix="v1:posts")
    cached = await cache_get(key)
    if cached:
        # 2) Cache hit: run the ETag/304 check to save bandwidth
        conditional_response(request, response, cached)
        response.headers["Cache-Control"] = "public, max-age=30"
        return Response(
            content=response.body,
            status_code=response.status_code or 200,  # propagate a 304
            media_type="application/json",
            headers=response.headers,
        )
    # 3) Build the data (DB query in prod)
    items = [p for p in FAKE_POSTS if (category is None or p["category"] == category)]
    total = len(items)
    page = items[offset : offset + limit]
    body = json.dumps({"total": total, "limit": limit, "offset": offset, "items": page}).encode()
    # 4) Save + tag-register (allows category-wide invalidation)
    await cache_set(key, body, ttl=60)
    if category:
        await tag_add(f"posts:cat:{category}", key)
    # 5) HTTP cache (ETag/304) and headers
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=30"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate a 304
        media_type="application/json",
        headers=response.headers,
    )
```
5.2 Invalidation (e.g., when an admin updates posts)
```python
@app.post("/admin/posts/invalidate")
async def invalidate_posts(category: str):
    # Bulk-delete cached pages for this category
    deleted = await tag_invalidate(f"posts:cat:{category}")
    return {"invalidated": deleted}
```
Key takeaways
- Redis first → on a hit, also apply ETag for 304 → both compute and bandwidth get lighter.
- Keys are URL + query; category tags enable “invalidate in batches.”
- GZipMiddleware adds compression to save network time.
6. Rate limiting (defensive side of caching)
Count requests per time window and return 429 when the limit is exceeded; Redis’s atomic INCR keeps this safe under concurrency. (Strictly speaking, the code below is a fixed-window counter — a simpler approximation of a token bucket.)
```python
# app/utils/rate_limit.py
import time

from fastapi import HTTPException, status


# Example: 60 requests per minute per user/IP.
# Note: despite the name, this is a fixed-window counter — the count
# resets automatically when the window rolls over.
async def token_bucket(redis, bucket_key: str, limit=60, refill_seconds=60):
    now = int(time.time())
    window = now // refill_seconds
    key = f"ratelimit:{bucket_key}:{window}"
    current = await redis.incr(key)  # atomic INCR is safe under concurrency
    if current == 1:
        await redis.expire(key, refill_seconds)
    if current > limit:
        raise HTTPException(
            status_code=status.HTTP_429_TOO_MANY_REQUESTS,
            detail="Rate limit exceeded. Please try again later.",
        )
```
Apply per-IP:
```python
# in app/main.py (Request is already imported there)
from app.core import redis_client  # module import: picks up the client set at startup
from app.utils.rate_limit import token_bucket


@app.get("/protected")
async def protected(request: Request):
    ip = request.client.host if request.client else "unknown"
    await token_bucket(redis_client.redis, bucket_key=f"ip:{ip}", limit=120, refill_seconds=60)
    return {"ok": True}
```
Key takeaways
- Protects the backend from aggressive traffic.
- Switch bucket keys between IP / user / API key.
- Apply per endpoint as needed.
7. Partial updates & consistency: designing cache invalidation
Invalidation is the hard part. Practical guidance:
- Min-scope invalidation: delete only affected ranges (e.g., category change → that category’s tag set)
- Versioned prefixes: v1: → easy global switch on breaking changes
- Short TTL + overwrite: simpler ops by allowing slight staleness
- Write-time invalidation: POST/PUT/PATCH/DELETE must clear the relevant tags
- Jobs: do bulk invalidation via async jobs (Celery, etc.)
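To make “write-time invalidation” concrete, here is a minimal sketch using an in-memory stand-in for the Redis tag sets (all names here are hypothetical; in the real app you would call tag_invalidate from your POST/PUT/PATCH/DELETE handlers):

```python
# In-memory stand-in for the Redis cache + tag sets, to show the flow
class FakeTagCache:
    def __init__(self):
        self.cache: dict[str, bytes] = {}  # key -> cached body
        self.tags: dict[str, set] = {}     # tag -> keys registered under it

    def set_with_tag(self, key: str, value: bytes, tag: str):
        self.cache[key] = value
        self.tags.setdefault(tag, set()).add(key)

    def invalidate_tag(self, tag: str) -> int:
        keys = self.tags.pop(tag, set())
        for k in keys:
            self.cache.pop(k, None)  # surgical delete; other tags untouched
        return len(keys)


def update_post_category(store: FakeTagCache, category: str) -> int:
    # 1) write the change to the primary datastore (omitted here)
    # 2) immediately clear every cached list for the affected category
    return store.invalidate_tag(f"posts:cat:{category}")


store = FakeTagCache()
store.set_with_tag("v1:posts:page1", b"...", "posts:cat:tech")
store.set_with_tag("v1:posts:page2", b"...", "posts:cat:tech")
store.set_with_tag("v1:posts:other", b"...", "posts:cat:life")
deleted = update_post_category(store, "tech")
# deleted == 2; the "life" entry stays cached
```

The write path deletes only the affected tag’s keys, which is exactly the “min-scope” behavior from the list above.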
Key takeaways
- Express “what’s affected” with tags → surgical deletes.
- Short TTL + tag invalidation is a realistic, low-trouble combo. ♡
8. Authenticated responses and caching: private / Vary
- Responses with Authorization should not land in shared caches.
- Consider Cache-Control: private, no-store (security first).
- If you must cache per user, include the user ID in the key and send Vary: Authorization to signal intent downstream.
Key takeaways
- Auth responses: private, ideally no-store.
- If caching, isolate per-user strictly.
9. Compression (GZip) and header strategy
- Use GZipMiddleware(minimum_size=600) to compress large bodies only.
- JSON compresses well; it helps on slow links.
- In this article the ETag is computed from the pre-compression body (GZipMiddleware compresses after the endpoint returns). What matters is consistency: the same representation must always yield the same ETag.
Key takeaways
- Include compression by default—it directly improves perceived speed.
- The team must align on what the ETag represents (pre- or post-compression bytes).
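A quick stdlib check of why JSON responses are worth compressing (the exact sizes depend on the payload, so treat the numbers as illustrative):

```python
import gzip
import json

# A typical repetitive JSON list payload: same keys, similar values
payload = json.dumps(
    [{"id": i, "title": f"post-{i}", "category": "tech"} for i in range(200)]
).encode()
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
# Repeated keys compress very well; expect a large saving
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

This is the saving GZipMiddleware gives you automatically for bodies over minimum_size.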
10. Common pitfalls and how to avoid them
| Symptom | Cause | Fix |
| --- | --- | --- |
| Stale data lingers | Missed invalidation / TTL too long | Enforce tag invalidation; start with a short TTL |
| 304 never triggers | ETag mismatch | Ensure identical responses yield identical ETags; mind pre/post compression |
| User data leaks | Stored in shared caches | Use Cache-Control: private; no-store when in doubt |
| Rate limiting doesn’t bite | Bad bucket key design | Choose IP / user / API key per requirement |
| Overreliance on Redis | Global slowdown on outage | Implement a fallback (compute directly when Redis is down) |
Key takeaways
- Avoid the trio “stale / mixed / ineffective” via key design, tags, and headers.
11. Testing strategy (toward trustworthy caching)
- Functional tests:
  - Verify miss → set → hit transitions.
  - Send If-None-Match and confirm 304.
- Invalidation tests:
- Update a post → invalidate tag → refetch returns fresh data.
- Load tests:
- Spike list APIs and observe Redis hit ratio and backend load.
- Fallback:
- Kill Redis connection; app should still serve (compute directly).
- Rate limiting:
- Exceed limit → 429; confirm recovery after window rollover.
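The miss → set → hit check can be expressed without a running server by swapping Redis for a dict. A sketch (all names hypothetical):

```python
# Functional test sketch: the backend must be computed exactly once per key
calls = {"compute": 0}
cache: dict[str, bytes] = {}


def get_posts_cached(key: str) -> bytes:
    if key in cache:
        return cache[key]      # hit: served from cache
    calls["compute"] += 1      # miss: compute, then store
    body = b'{"items": []}'
    cache[key] = body
    return body


first = get_posts_cached("v1:posts:k1")   # miss -> set
second = get_posts_cached("v1:posts:k1")  # hit
assert first == second
assert calls["compute"] == 1  # backend was touched exactly once
```

The same shape works against a real test Redis: flush it, call the endpoint twice, and assert the data source was queried once.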
Key takeaways
- The five pillars: functional, invalidation, load, fallback, and limits. ♡
12. Mini-sample: add only ETag to a detail API (shortest first step)
```python
import json

from fastapi import HTTPException, Request, Response

from app.utils.http_cache import conditional_response

POSTS = {1: {"id": 1, "title": "hello"}, 2: {"id": 2, "title": "world"}}


@app.get("/posts/{pid}")
async def get_post(pid: int, request: Request, response: Response):
    post = POSTS.get(pid)
    if not post:
        raise HTTPException(404)
    body = json.dumps(post).encode()
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=60"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate a 304
        media_type="application/json",
        headers=response.headers,
    )
```
Start here, then add Redis once you feel the gains.
13. Phased rollout roadmap
- Step 1: Add ETag + Cache-Control to details/lists.
- Step 2: Add Redis caching (~60s) to read-heavy lists.
- Step 3: Add tag invalidation by category/ID.
- Step 4: Add rate limiting and GZip for protection & bandwidth.
- Step 5: Load test → optimize TTL/keys/tags; verify fallback.
Key takeaways
- Start small and expand gradually. Avoid early complexity.
14. Impact by reader type (more concrete)
- Solo developers: Even just ETag delivers a big perceived boost. Adding Redis visibly reduces backend load.
- Small teams: Tag invalidation + rate limiting lowers operational incidents, cutting late-night on-calls.
- Growing SaaS: When CDN and HTTP caching click, bandwidth costs drop and peak-time availability rises.
Summary (Ship an API that’s “fast & light” starting today ♡)
- HTTP caching (ETag/Cache-Control) is high ROI: low implementation effort, fast payoff.
- Add Redis where it matters (lists, external API calls). Design keys as URL + query + user, keep consistency with tag invalidation.
- Pair with rate limiting and GZip to optimize defense and bandwidth.
- Try the minimal sample, then refine TTL and tags with metrics. Small steady gains grow an API that’s fast and hard to break. I’m cheering you on. ♡