Blazing-Fast & Rock-Solid! A Complete FastAPI Caching Guide — Redis, HTTP Caching, ETag, Rate Limiting, and Compression
✅ TL;DR (Inverted Pyramid)
- What you’ll achieve here
  Design caching in FastAPI that boosts perceived speed by running a two-tier strategy: app layer (Redis) and HTTP layer (Cache-Control / ETag / 304). We’ll also add rate limiting and GZip compression to balance performance and stability.
- Main topics
  - Caching basics and where to apply it (app / HTTP / data)
  - Redis (async) key design, TTL, and invalidation
  - HTTP caching: ETag / If-None-Match / Cache-Control with 304 responses
  - FastAPI examples: caching paginated results, tag-based invalidation, partial updates & consistency
  - Rate limiting (token bucket) and GZip compression in tandem
- Benefits
  - Handles concurrency and peak traffic while cutting costs & latency
  - Flattens backend load (DB / external APIs)
  - Protocol-level reproducibility (304 / headers) makes front-end & CDN cooperation straightforward
🎯 Who will get the most out of this? (Concrete personas)
- Solo dev A (senior undergrad)
  Campus API gets suddenly slow under bursts. Wants to supercharge list endpoints using Redis + HTTP caching.
- Contract dev B (3-person team)
  Client dashboard API sees per-second spikes. Wants rate limiting for protection and 304 responses to leverage front-end caching.
- SaaS C (startup)
  Heavy dependence on external APIs with tight pricing and rate caps. Wants to cache fetch results intelligently to reduce cost and incidents.
♿ Accessibility Notes (Readability of this article)
- Level: ~AA. Structured with headings/bullets; terms briefly defined on first use. Code is commented and monospaced.
- Considerations: A summary up front to help screen-reader users grasp the flow; minimal runnable samples at key points.
- Audience: Clear starting points for beginners; key design / consistency / invalidation for intermediates.
1. Where should caching kick in?
Think in three layers:
- HTTP layer (browser / CDN)
  - Use protocol features like Cache-Control, ETag, and If-None-Match.
  - Great match for lists and detail fetches with low change rates. 304 Not Modified saves bandwidth.
- App layer (KVS like Redis)
  - Store computed results or external API responses on the server.
  - Flexible TTL, tag invalidation, and recomputation control.
- Data layer (e.g., DB materialized views)
  - Precompute aggregates; plan refresh cadence. (This article focuses on the app and HTTP layers.)
Key takeaways
- HTTP caching is “free optimization”: just set headers; browsers/CDNs do the work.
- Redis is a “high-agency tool”: design invalidation with TTLs and tags.
- Start with HTTP headers → Redis in that order. ♡
2. Sample setup & dependencies (assumptions)
2.1 Dependencies (example)
```
fastapi
uvicorn[standard]
redis>=5.0          # official client with asyncio support
pydantic
pydantic-settings
```
2.2 Settings class
```python
# app/core/settings.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    app_name: str = "FastAPI Cache Guide"
    redis_url: str = "redis://localhost:6379/0"
    default_ttl_seconds: int = 60  # default TTL

    class Config:
        env_file = ".env"
        extra = "ignore"


def get_settings() -> Settings:
    return Settings()
```
2.3 Redis client (async)
```python
# app/core/redis_client.py
from redis import asyncio as aioredis

from app.core.settings import get_settings

redis: aioredis.Redis | None = None


async def connect_redis():
    global redis
    settings = get_settings()
    # decode_responses=False: keep raw bytes, so cached bodies can be
    # hashed into ETags exactly as stored
    redis = aioredis.from_url(
        settings.redis_url,
        decode_responses=False,
    )


async def disconnect_redis():
    global redis
    if redis:
        await redis.aclose()  # aclose() supersedes the deprecated close()
        redis = None
```
Key takeaways
- Use the official redis package’s async API (redis.asyncio).
- Open/close the connection pool on app startup/shutdown.
3. HTTP caching essentials: ETag and 304
ETag is a “fingerprint” of the response body. The client sends If-None-Match: <etag> on subsequent requests. If the server decides the content hasn’t changed, it can return 304 without a body.
3.1 Adding ETag and conditional GET
```python
# app/utils/http_cache.py
import hashlib

from fastapi import Request, Response


def make_etag(payload: bytes) -> str:
    # Hash the body into a strong ETag
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def conditional_response(request: Request, response: Response, body_bytes: bytes):
    etag = make_etag(body_bytes)
    client_etag = request.headers.get("if-none-match")
    response.headers["ETag"] = etag
    # If it matches, return 304 and omit the body
    if client_etag == etag:
        response.status_code = 304
        response.body = b""
    else:
        response.body = body_bytes
```
3.2 Designing Cache-Control
- Publicly cacheable (let CDN/shared caches store it): Cache-Control: public, max-age=60
- User-specific (auth responses): Cache-Control: private, max-age=0, no-cache (lean on ETag)
- Rarely changes: use a longer max-age. Ensure the ETag changes when the backend updates.
Key takeaways
- ETag + 304 saves bandwidth and time.
- Cache-Control clarifies public/private and lifetime (max-age).
- For authenticated responses, private is the default (keep out of shared caches).
4. App-layer caching: use Redis to “not compute”
4.1 Key design (robust patterns)
- Key granularity: make a unique key from URL + query, e.g., <resource>:<path_hash>.
- Versioning: prefix with v1: so you can invalidate everything on breaking changes.
- Per-user: include user:<id> when auth is involved (protects privacy).
- Tags: register keys into sets like tag:<category> to bulk-delete by tag.
4.2 Read/write/TTL/tag registration
```python
# app/utils/app_cache.py
import hashlib

from fastapi import Request

# Import the module (not the bare name) so we always see the client
# that connect_redis() assigns at startup
from app.core import redis_client
from app.core.settings import get_settings


def cache_key_from_request(request: Request, prefix="v1:list") -> str:
    # Hash path + sorted query pairs
    pairs = sorted(request.query_params.multi_items())
    raw = request.url.path + "?" + "&".join(f"{k}={v}" for k, v in pairs)
    return f"{prefix}:{hashlib.sha256(raw.encode()).hexdigest()}"


async def cache_get(key: str) -> bytes | None:
    if not redis_client.redis:
        return None
    return await redis_client.redis.get(key)


async def cache_set(key: str, value: bytes, ttl: int | None = None):
    if not redis_client.redis:
        return
    settings = get_settings()
    await redis_client.redis.set(key, value, ex=ttl or settings.default_ttl_seconds)


async def tag_add(tag: str, key: str):
    # Register the key in the tag’s set
    if redis_client.redis:
        await redis_client.redis.sadd(f"tag:{tag}", key)


async def tag_invalidate(tag: str):
    if not redis_client.redis:
        return 0
    tag_key = f"tag:{tag}"
    members = await redis_client.redis.smembers(tag_key)
    if members:
        await redis_client.redis.delete(*members)  # delete all at once
    await redis_client.redis.delete(tag_key)
    return len(members or [])
```
Key takeaways
- Think key = function inputs (path, query, user).
- Tags enable category-wide invalidation.
- Start TTL at ~60s to feel impact, then tune.
5. Implementing pagination caching (concrete example)
5.1 Router
```python
# app/main.py
import json

from fastapi import FastAPI, Query, Request, Response
from fastapi.middleware.gzip import GZipMiddleware

from app.core.redis_client import connect_redis, disconnect_redis
from app.utils.app_cache import cache_get, cache_key_from_request, cache_set, tag_add, tag_invalidate
from app.utils.http_cache import conditional_response

app = FastAPI(title="FastAPI Cache Demo")
app.add_middleware(GZipMiddleware, minimum_size=600)  # compress to save bandwidth (see below)


@app.on_event("startup")
async def on_startup():
    await connect_redis()


@app.on_event("shutdown")
async def on_shutdown():
    await disconnect_redis()


# Pseudo data source (use a DB in production)
FAKE_POSTS = [
    {"id": i, "title": f"post-{i}", "category": "tech" if i % 2 == 0 else "life"}
    for i in range(1, 501)
]


@app.get("/posts")
async def list_posts(
    request: Request,
    response: Response,
    limit: int = Query(20, ge=1, le=100),
    offset: int = Query(0, ge=0),
    category: str | None = Query(None),
):
    # 1) Try the app-layer cache first
    key = cache_key_from_request(request, prefix="v1:posts")
    cached = await cache_get(key)
    if cached:
        # 2) Cache hit: run the ETag/304 check to save bandwidth
        conditional_response(request, response, cached)
        response.headers["Cache-Control"] = "public, max-age=30"
        return Response(
            content=response.body,
            status_code=response.status_code or 200,  # propagate a 304
            media_type="application/json",
            headers=response.headers,
        )
    # 3) Build the data (DB query in prod)
    items = [p for p in FAKE_POSTS if (category is None or p["category"] == category)]
    total = len(items)
    page = items[offset : offset + limit]
    body = json.dumps({"total": total, "limit": limit, "offset": offset, "items": page}).encode()
    # 4) Save + tag-register (allows category-wide invalidation)
    await cache_set(key, body, ttl=60)
    if category:
        await tag_add(f"posts:cat:{category}", key)
    # 5) HTTP cache (ETag/304) and headers
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=30"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate a 304
        media_type="application/json",
        headers=response.headers,
    )
```
5.2 Invalidation (e.g., when an admin updates posts)
```python
@app.post("/admin/posts/invalidate")
async def invalidate_posts(category: str):
    # Bulk-delete cached pages for this category
    deleted = await tag_invalidate(f"posts:cat:{category}")
    return {"invalidated": deleted}
```
Key takeaways
- Redis first → on a hit, also apply ETag for 304 → both compute and bandwidth get lighter.
- Keys are URL + query; category tags enable “invalidate in batches.”
- GZipMiddleware adds compression to save network time.
6. Rate limiting (defensive side of caching)
Count requests per time window and return 429 when the limit is exceeded; Redis’s atomic INCR keeps this safe under concurrency. (Strictly speaking, the code below is a fixed-window counter — a simpler approximation of a token bucket.)
```python
# app/utils/rate_limit.py
import time

from fastapi import HTTPException, status


# Example: 60 requests per minute per user/IP.
# Note: despite the name, this is a fixed-window counter — the count
# resets automatically when the window rolls over.
async def token_bucket(redis, bucket_key: str, limit=60, refill_seconds=60):
    now = int(time.time())
    window = now // refill_seconds
    key = f"ratelimit:{bucket_key}:{window}"
    current = await redis.incr(key)  # atomic INCR is safe under concurrency
    if current == 1:
        await redis.expire(key, refill_seconds)
    if current > limit:
        raise HTTPException(
            status_code=status.HTTP_429_TOO_MANY_REQUESTS,
            detail="Rate limit exceeded. Please try again later.",
        )
```
Apply per-IP:
```python
# in app/main.py (Request is already imported there)
from app.core import redis_client  # module import: picks up the client set at startup
from app.utils.rate_limit import token_bucket


@app.get("/protected")
async def protected(request: Request):
    ip = request.client.host if request.client else "unknown"
    await token_bucket(redis_client.redis, bucket_key=f"ip:{ip}", limit=120, refill_seconds=60)
    return {"ok": True}
```
Key takeaways
- Protects the backend from aggressive traffic.
- Switch bucket keys between IP / user / API key.
- Apply per endpoint as needed.
7. Partial updates & consistency: designing cache invalidation
Invalidation is the hard part. Practical guidance:
- Min-scope invalidation: delete only affected ranges (e.g., category change → that category’s tag set)
- Versioned prefixes: v1: → easy global switch on breaking changes
- Short TTL + overwrite: simpler ops by allowing slight staleness
- Write-time invalidation: POST/PUT/PATCH/DELETE must clear the relevant tags
- Jobs: do bulk invalidation via async jobs (Celery, etc.)
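To make “write-time invalidation” concrete, here is a minimal sketch using an in-memory stand-in for the Redis tag sets (all names here are hypothetical; in the real app you would call tag_invalidate from your POST/PUT/PATCH/DELETE handlers):

```python
# In-memory stand-in for the Redis cache + tag sets, to show the flow
class FakeTagCache:
    def __init__(self):
        self.cache: dict[str, bytes] = {}  # key -> cached body
        self.tags: dict[str, set] = {}     # tag -> keys registered under it

    def set_with_tag(self, key: str, value: bytes, tag: str):
        self.cache[key] = value
        self.tags.setdefault(tag, set()).add(key)

    def invalidate_tag(self, tag: str) -> int:
        keys = self.tags.pop(tag, set())
        for k in keys:
            self.cache.pop(k, None)  # surgical delete; other tags untouched
        return len(keys)


def update_post_category(store: FakeTagCache, category: str) -> int:
    # 1) write the change to the primary datastore (omitted here)
    # 2) immediately clear every cached list for the affected category
    return store.invalidate_tag(f"posts:cat:{category}")


store = FakeTagCache()
store.set_with_tag("v1:posts:page1", b"...", "posts:cat:tech")
store.set_with_tag("v1:posts:page2", b"...", "posts:cat:tech")
store.set_with_tag("v1:posts:other", b"...", "posts:cat:life")
deleted = update_post_category(store, "tech")
# deleted == 2; the "life" entry stays cached
```

The write path deletes only the affected tag’s keys, which is exactly the “min-scope” behavior from the list above.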
Key takeaways
- Express “what’s affected” with tags → surgical deletes.
- Short TTL + tag invalidation is a realistic, low-trouble combo. ♡
8. Authenticated responses and caching: private / Vary
- Responses with Authorization should not land in shared caches.
- Consider Cache-Control: private, no-store (security first).
- If you must cache per user, include the user ID in the key and send Vary: Authorization to signal intent downstream.
Key takeaways
- Auth responses: private, ideally no-store.
- If caching, isolate per-user strictly.
9. Compression (GZip) and header strategy
- Use GZipMiddleware(minimum_size=600) to compress large bodies only.
- JSON compresses well; it helps on slow links.
- In this article the ETag is computed from the pre-compression body (GZipMiddleware compresses after the endpoint returns). What matters is consistency: the same representation must always yield the same ETag.
Key takeaways
- Include compression by default—it directly improves perceived speed.
- The team must align on what the ETag represents (pre- or post-compression bytes).
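A quick stdlib check of why JSON responses are worth compressing (the exact sizes depend on the payload, so treat the numbers as illustrative):

```python
import gzip
import json

# A typical repetitive JSON list payload: same keys, similar values
payload = json.dumps(
    [{"id": i, "title": f"post-{i}", "category": "tech"} for i in range(200)]
).encode()
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
# Repeated keys compress very well; expect a large saving
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

This is the saving GZipMiddleware gives you automatically for bodies over minimum_size.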
10. Common pitfalls and how to avoid them
| Symptom | Cause | Fix |
| --- | --- | --- |
| Stale data lingers | Missed invalidation / TTL too long | Enforce tag invalidation; start with a short TTL |
| 304 never triggers | ETag mismatch | Ensure identical responses yield identical ETags; mind pre/post compression |
| User data leaks | Stored in shared caches | Use Cache-Control: private; no-store when in doubt |
| Rate limiting doesn’t bite | Bad bucket key design | Choose IP / user / API key per requirement |
| Overreliance on Redis | Global slowdown on outage | Implement a fallback (compute directly when Redis is down) |
Key takeaways
- Avoid the trio “stale / mixed / ineffective” via key design, tags, and headers.
11. Testing strategy (toward trustworthy caching)
- Functional tests:
  - Verify miss → set → hit transitions.
  - Send If-None-Match and confirm 304.
- Invalidation tests:
- Update a post → invalidate tag → refetch returns fresh data.
- Load tests:
- Spike list APIs and observe Redis hit ratio and backend load.
- Fallback:
- Kill Redis connection; app should still serve (compute directly).
- Rate limiting:
- Exceed limit → 429; confirm recovery after window rollover.
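The miss → set → hit check can be expressed without a running server by swapping Redis for a dict. A sketch (all names hypothetical):

```python
# Functional test sketch: the backend must be computed exactly once per key
calls = {"compute": 0}
cache: dict[str, bytes] = {}


def get_posts_cached(key: str) -> bytes:
    if key in cache:
        return cache[key]      # hit: served from cache
    calls["compute"] += 1      # miss: compute, then store
    body = b'{"items": []}'
    cache[key] = body
    return body


first = get_posts_cached("v1:posts:k1")   # miss -> set
second = get_posts_cached("v1:posts:k1")  # hit
assert first == second
assert calls["compute"] == 1  # backend was touched exactly once
```

The same shape works against a real test Redis: flush it, call the endpoint twice, and assert the data source was queried once.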
Key takeaways
- The five pillars: functional, invalidation, load, fallback, and limits. ♡
12. Mini-sample: add only ETag to a detail API (shortest first step)
```python
import json

from fastapi import HTTPException, Request, Response

from app.utils.http_cache import conditional_response

POSTS = {1: {"id": 1, "title": "hello"}, 2: {"id": 2, "title": "world"}}


@app.get("/posts/{pid}")
async def get_post(pid: int, request: Request, response: Response):
    post = POSTS.get(pid)
    if not post:
        raise HTTPException(404)
    body = json.dumps(post).encode()
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=60"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate a 304
        media_type="application/json",
        headers=response.headers,
    )
```
Start here, then add Redis once you feel the gains.
13. Phased rollout roadmap
- Step 1: Add ETag + Cache-Control to details/lists.
- Step 2: Add Redis caching (~60s) to read-heavy lists.
- Step 3: Add tag invalidation by category/ID.
- Step 4: Add rate limiting and GZip for protection & bandwidth.
- Step 5: Load test → optimize TTL/keys/tags; verify fallback.
Key takeaways
- Start small and expand gradually. Avoid early complexity.
14. Impact by reader type (more concrete)
- Solo developers: Even just ETag delivers a big perceived boost. Adding Redis visibly reduces backend load.
- Small teams: Tag invalidation + rate limiting lowers operational incidents, cutting late-night on-calls.
- Growing SaaS: When CDN and HTTP caching click, bandwidth costs drop and peak-time availability rises.
Summary (Ship an API that’s “fast & light” starting today ♡)
- HTTP caching (ETag/Cache-Control) is high ROI: low implementation effort, fast payoff.
- Add Redis where it matters (lists, external API calls). Design keys as URL + query + user, keep consistency with tag invalidation.
- Pair with rate limiting and GZip to optimize defense and bandwidth.
- Try the minimal sample, then refine TTL and tags with metrics. Small steady gains grow an API that’s fast and hard to break. I’m cheering you on. ♡