
Blazing-Fast & Rock-Solid! A Complete FastAPI Caching Guide — Redis, HTTP Caching, ETag, Rate Limiting, and Compression


✅ TL;DR (Inverted Pyramid)

  • What you’ll achieve here
    Design caching in FastAPI that boosts perceived speed by running a two-tier strategy: app layer (Redis) and HTTP layer (Cache-Control / ETag / 304). We’ll also add rate limiting and GZip compression to balance performance and stability.
  • Main topics
    1. Caching basics and where to apply it (app / HTTP / data)
    2. Redis (async) key design, TTL, and invalidation
    3. HTTP caching: ETag / If-None-Match / Cache-Control with 304 responses
    4. FastAPI examples: caching paginated results, tag-based invalidation, partial updates & consistency
    5. Rate limiting (token bucket) and GZip compression in tandem
  • Benefits
    • Handles concurrency and peak traffic while cutting costs & latency
    • Flattens backend load (DB / external APIs)
    • Protocol-level reproducibility (304/headers) makes front-end & CDN cooperation straightforward

🎯 Who will get the most out of this? (Concrete personas)

  • Solo dev A (senior undergrad)
    Campus API gets suddenly slow under bursts. Wants to supercharge list endpoints using Redis + HTTP caching.
  • Contract dev B (3-person team)
    Client dashboard API sees per-second spikes. Wants rate limiting for protection and 304 responses to leverage front-end caching.
  • SaaS C (startup)
    Heavy dependence on external APIs with tight pricing and rate caps. Wants to cache fetch results intelligently to reduce cost and incidents.

♿ Accessibility Notes (Readability of this article)

  • Level: ~AA. Structured with headings/bullets; terms briefly defined on first use. Code is commented and monospaced.
  • Considerations: A summary up front to help screen-reader users grasp the flow; minimal runnable samples at key points.
  • Audience: Clear starting points for beginners; key design / consistency / invalidation for intermediates.

1. Where should caching kick in?

Think in three layers:

  1. HTTP layer (browser / CDN)
    • Use protocol features like Cache-Control, ETag, If-None-Match.
    • Great match for lists and detail fetches with low change rates. 304 Not Modified saves bandwidth.
  2. App layer (KVS like Redis)
    • Store computed results or external API responses on the server.
    • Flexible TTL, tag invalidation, and recomputation control.
  3. Data layer (e.g., DB materialized views)
    • Precompute aggregates; plan refresh cadence. (This article focuses on app/HTTP layers.)

Key takeaways

  • HTTP caching is “free optimization”: just set headers; browsers/CDNs do the work.
  • Redis is a “high-agency tool”: design invalidation with TTLs and tags.
  • Start with HTTP headers → Redis in that order. ♡

2. Sample setup & dependencies (assumptions)

2.1 Dependencies (example)

fastapi
uvicorn[standard]
redis>=5.0  # official client with asyncio support
pydantic
pydantic-settings

2.2 Settings class

# app/core/settings.py
from functools import lru_cache
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    app_name: str = "FastAPI Cache Guide"
    redis_url: str = "redis://localhost:6379/0"
    default_ttl_seconds: int = 60  # default TTL

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

@lru_cache
def get_settings() -> Settings:
    # Cache the settings object itself so .env is parsed only once
    return Settings()

2.3 Redis client (async)

# app/core/redis_client.py
from redis import asyncio as aioredis
from app.core.settings import get_settings

redis: aioredis.Redis | None = None

async def connect_redis():
    global redis
    settings = get_settings()
    redis = aioredis.from_url(
        settings.redis_url,
        decode_responses=False,  # keep raw bytes so cached JSON bodies round-trip unchanged
    )

async def disconnect_redis():
    global redis
    if redis:
        await redis.aclose()  # aclose() replaces the deprecated close() in redis>=5
        redis = None

Key takeaways

  • Use the official redis package’s async API.
  • Open/close a connection pool on app startup/shutdown.

3. HTTP caching essentials: ETag and 304

ETag is a “fingerprint” of the response body. The client sends If-None-Match: <etag> on subsequent requests. If the server decides content hasn’t changed, it can return 304 without a body.

3.1 Adding ETag and conditional GET

# app/utils/http_cache.py
import hashlib
from fastapi import Request, Response

def make_etag(payload: bytes) -> str:
    # Hash the body into an ETag
    return '"' + hashlib.sha256(payload).hexdigest() + '"'

def conditional_response(request: Request, response: Response, body_bytes: bytes) -> None:
    """Set ETag and, on a match, switch to 304 with an empty body.

    Callers must send back both response.body and response.status_code.
    """
    etag = make_etag(body_bytes)
    client_etag = request.headers.get("if-none-match")
    response.headers["ETag"] = etag
    # If it matches, return 304 and omit the body
    if client_etag == etag:
        response.status_code = 304
        response.body = b""
    else:
        response.body = body_bytes

3.2 Designing Cache-Control

  • Publicly cacheable (let CDN/shared caches store it): Cache-Control: public, max-age=60
  • User-specific (auth responses): Cache-Control: private, max-age=0, no-cache (lean on ETag)
  • Rarely changes: use a longer max-age. Ensure ETag changes when the backend updates.

Key takeaways

  • ETag + 304 saves bandwidth and time.
  • Cache-Control clarifies public/private and lifetime (max-age).
  • For authenticated responses, private is the default (keep out of shared caches).

4. App-layer caching: use Redis to “not compute”

4.1 Key design (robust patterns)

  • Key granularity: make a unique key from URL + query like <resource>:<path_hash>.
  • Versioning: prefix with v1: so you can invalidate everything on breaking changes.
  • Per-user: include user:<id> when auth is involved (protects privacy).
  • Tags: register keys into sets like tag:<category> to bulk-delete by tag.

4.2 Read/write/TTL/tag registration

# app/utils/app_cache.py
import hashlib
from fastapi import Request
from app.core import redis_client  # import the module, not the name, so we
                                   # always see the client assigned at startup
from app.core.settings import get_settings

def cache_key_from_request(request: Request, prefix="v1:list") -> str:
    # Hash path + sorted query pairs into a stable key
    query = "&".join(f"{k}={v}" for k, v in sorted(request.query_params.multi_items()))
    raw = request.url.path + "?" + query
    return f"{prefix}:{hashlib.sha256(raw.encode()).hexdigest()}"

async def cache_get(key: str) -> bytes | None:
    if not redis_client.redis:
        return None
    value = await redis_client.redis.get(key)
    if isinstance(value, str):  # defensive: handle decode_responses=True setups
        return value.encode()
    return value

async def cache_set(key: str, value: bytes, ttl: int | None = None):
    if not redis_client.redis:
        return
    settings = get_settings()
    await redis_client.redis.set(key, value, ex=ttl or settings.default_ttl_seconds)

async def tag_add(tag: str, key: str):
    # Register the key in the tag's set
    if redis_client.redis:
        await redis_client.redis.sadd(f"tag:{tag}", key)

async def tag_invalidate(tag: str) -> int:
    if not redis_client.redis:
        return 0
    tag_key = f"tag:{tag}"
    members = await redis_client.redis.smembers(tag_key)
    if members:
        await redis_client.redis.delete(*members)   # delete all at once
    await redis_client.redis.delete(tag_key)
    return len(members or [])

Key takeaways

  • Think key = function inputs (path, query, user).
  • Tags enable category-wide invalidation.
  • Start TTL at ~60s to feel impact, then tune.

5. Implementing pagination caching (concrete example)

5.1 Router

# app/main.py
from fastapi import FastAPI, Request, Response, Query, HTTPException
from fastapi.middleware.gzip import GZipMiddleware
from app.core.settings import get_settings
from app.core.redis_client import connect_redis, disconnect_redis
from app.utils.http_cache import conditional_response
from app.utils.app_cache import cache_key_from_request, cache_get, cache_set, tag_add, tag_invalidate
import json
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open the Redis pool at startup, close it at shutdown
    # (lifespan replaces the deprecated on_event hooks)
    await connect_redis()
    yield
    await disconnect_redis()

app = FastAPI(title="FastAPI Cache Demo", lifespan=lifespan)
app.add_middleware(GZipMiddleware, minimum_size=600)  # compress to save bandwidth (see below)

# Pseudo data source (use a DB in production)
FAKE_POSTS = [
    {"id": i, "title": f"post-{i}", "category": "tech" if i % 2 == 0 else "life"}
    for i in range(1, 501)
]

@app.get("/posts")
async def list_posts(
    request: Request,
    response: Response,
    limit: int = Query(20, ge=1, le=100),
    offset: int = Query(0, ge=0),
    category: str | None = Query(None)
):
    # 1) Try app-layer cache first
    key = cache_key_from_request(request, prefix="v1:posts")
    cached = await cache_get(key)
    if cached:
        # 2) Cache hit: run ETag/304 check to save bandwidth
        conditional_response(request, response, cached)
        response.headers["Cache-Control"] = "public, max-age=30"
        return Response(
            content=response.body,
            status_code=response.status_code or 200,  # propagate 304 on an ETag match
            media_type="application/json",
            headers=response.headers,
        )

    # 3) Build the data (DB query in prod)
    items = [p for p in FAKE_POSTS if (category is None or p["category"] == category)]
    total = len(items)
    page = items[offset: offset + limit]
    body = json.dumps({"total": total, "limit": limit, "offset": offset, "items": page}).encode()

    # 4) Save + tag-register (allow category-wide invalidation)
    await cache_set(key, body, ttl=60)
    if category:
        await tag_add(f"posts:cat:{category}", key)

    # 5) HTTP cache (ETag/304) and headers
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=30"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate 304 on an ETag match
        media_type="application/json",
        headers=response.headers,
    )

5.2 Invalidation (e.g., when an admin updates posts)

@app.post("/admin/posts/invalidate")
async def invalidate_posts(category: str):
    # Bulk-delete cache by category
    deleted = await tag_invalidate(f"posts:cat:{category}")
    return {"invalidated": deleted}

Key takeaways

  • Redis first → on hit, also apply ETag for 304 → both compute and bandwidth get lighter.
  • Keys are URL + query; category tags enable “invalidate in batches.”
  • GZipMiddleware adds compression to save network time.

6. Rate limiting (defensive side of caching)

The sketch below is a fixed-window counter, a simple and practical approximation of a token bucket: count requests within each window and return 429 once the limit is exceeded. Redis's atomic INCR keeps the count safe under concurrency.

# app/utils/rate_limit.py
import time
from fastapi import HTTPException, status

# Example: 60 requests per minute per user/IP (fixed-window counter)
async def token_bucket(redis, bucket_key: str, limit=60, refill_seconds=60):
    if redis is None:
        return  # fail open when Redis is unavailable
    now = int(time.time())
    # Count within a time window; the counter resets on window rollover
    window = now // refill_seconds
    key = f"ratelimit:{bucket_key}:{window}"
    current = await redis.incr(key)
    if current == 1:
        await redis.expire(key, refill_seconds)  # first hit sets the window's TTL
    if current > limit:
        raise HTTPException(
            status_code=status.HTTP_429_TOO_MANY_REQUESTS,
            detail="Rate limit exceeded. Please try again later.",
        )

Apply per-IP:

from app.utils.rate_limit import token_bucket
from app.core import redis_client  # import the module so we see the client set at startup

@app.get("/protected")
async def protected(request: Request):
    ip = request.client.host if request.client else "unknown"
    await token_bucket(redis_client.redis, bucket_key=f"ip:{ip}", limit=120, refill_seconds=60)
    return {"ok": True}

Key takeaways

  • Protects the backend from aggressive traffic.
  • Switch bucket keys between IP / user / API key.
  • Apply per endpoint as needed.

7. Partial updates & consistency: designing cache invalidation

Invalidation is the hard part. Practical guidance:

  1. Min-scope invalidation: delete only affected ranges (e.g., category change → that category tag set)
  2. Versioned prefixes: v1: → easy global switch on breaking changes
  3. Short TTL + overwrite: simpler ops by allowing slight staleness
  4. Write-time invalidation: POST/PUT/PATCH/DELETE must clear relevant tags
  5. Jobs: do bulk invalidation via async jobs (Celery, etc.)
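To make points 1 and 4 concrete, here is a small sketch of the write path using an in-memory stand-in for the Redis tag helpers of section 4 (the `update_post` function and its arguments are illustrative):

```python
# In-memory stand-in for the Redis tag helpers (illustration only)
cache: dict[str, bytes] = {}
tags: dict[str, set[str]] = {}

def tag_add(tag: str, key: str) -> None:
    tags.setdefault(tag, set()).add(key)

def tag_invalidate(tag: str) -> int:
    # Delete every cached entry registered under the tag, then the tag itself
    keys = tags.pop(tag, set())
    for k in keys:
        cache.pop(k, None)
    return len(keys)

def update_post(post_id: int, category: str) -> None:
    # 1) Persist the change (a DB write in production)
    # 2) Min-scope, write-time invalidation: clear only the affected category's lists
    tag_invalidate(f"posts:cat:{category}")
```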

Key takeaways

  • Express “what’s affected” with tags → surgical deletes.
  • Short TTL + tag invalidation is a realistic, low-trouble combo. ♡

8. Authenticated responses and caching: private / Vary

  • Responses with Authorization should not land in shared caches.
  • Consider Cache-Control: private, no-store (security first).
  • If you must cache per user, include the user ID in the key and send Vary: Authorization to signal intent downstream.

Key takeaways

  • Auth responses: private, ideally no-store.
  • If caching, isolate per-user strictly.

9. Compression (GZip) and header strategy

  • Use GZipMiddleware(minimum_size=600) so only larger bodies get compressed.
  • JSON compresses well; this helps a lot on slow links.
  • Decide what the ETag represents: in this article it is computed from the uncompressed body before GZipMiddleware runs, so identical content yields an identical ETag whether or not compression kicks in.

Key takeaways

  • Always include compression—it directly improves perceived speed.
  • Team must align on what ETag represents.

10. Common pitfalls and how to avoid them

Symptom | Cause | Fix
Stale data lingers | Missed invalidation / TTL too long | Enforce tag invalidation; start with a short TTL
304 never triggers | ETag mismatch | Ensure identical responses yield identical ETags; mind pre/post compression
User data leaks | Stored in shared caches | Use Cache-Control: private; no-store when in doubt
Rate limiting doesn't bite | Bad bucket key design | Choose IP / user / API key per requirement
Overreliance on Redis | Global slowdown on outage | Implement a fallback (compute directly when Redis is down)

Key takeaways

  • Avoid the trio “stale / mixed / ineffective” via key design, tags, and headers.
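The "compute directly when Redis is down" fallback can be sketched as a small wrapper. This assumes the redis_client module from section 2.3; the `get_or_compute` helper itself is an illustration, not part of the article's modules:

```python
from typing import Awaitable, Callable

async def get_or_compute(key: str, compute: Callable[[], Awaitable[bytes]], ttl: int = 60) -> bytes:
    """Serve from Redis when possible; compute directly when it is down."""
    r = None
    try:
        from app.core import redis_client  # assumed module from section 2.3
        r = redis_client.redis
    except Exception:
        pass  # Redis layer unavailable: fall through and compute
    if r is not None:
        try:
            cached = await r.get(key)
            if cached is not None:
                return cached if isinstance(cached, bytes) else cached.encode()
        except Exception:
            r = None  # treat errors as "Redis down"
    body = await compute()
    if r is not None:
        try:
            await r.set(key, body, ex=ttl)  # best-effort write-back
        except Exception:
            pass
    return body
```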

11. Testing strategy (toward trustworthy caching)

  1. Functional tests:
    • Verify miss → set → hit transitions.
    • Send If-None-Match and confirm 304.
  2. Invalidation tests:
    • Update a post → invalidate tag → refetch returns fresh data.
  3. Load tests:
    • Spike list APIs and observe Redis hit ratio and backend load.
  4. Fallback:
    • Kill Redis connection; app should still serve (compute directly).
  5. Rate limiting:
    • Exceed limit → 429; confirm recovery after window rollover.

Key takeaways

  • The five pillars: functional, invalidation, load, fallback, and limits. ♡

12. Mini-sample: add only ETag to a detail API (shortest first step)

import json
from fastapi import Request, Response, HTTPException
from app.utils.http_cache import conditional_response

POSTS = {1: {"id": 1, "title": "hello"}, 2: {"id": 2, "title": "world"}}

@app.get("/posts/{pid}")
async def get_post(pid: int, request: Request, response: Response):
    post = POSTS.get(pid)
    if not post:
        raise HTTPException(404)
    body = json.dumps(post).encode()
    conditional_response(request, response, body)
    response.headers["Cache-Control"] = "public, max-age=60"
    return Response(
        content=response.body,
        status_code=response.status_code or 200,  # propagate 304 on an ETag match
        media_type="application/json",
        headers=response.headers,
    )

Start here, then add Redis once you feel the gains.


13. Phased rollout roadmap

  1. Step 1: Add ETag + Cache-Control to details/lists.
  2. Step 2: Add Redis caching (~60s) to read-heavy lists.
  3. Step 3: Add tag invalidation by category/ID.
  4. Step 4: Add rate limiting and GZip for protection & bandwidth.
  5. Step 5: Load test → optimize TTL/keys/tags; verify fallback.

Key takeaways

  • Start small and expand gradually. Avoid early complexity.

14. Impact by reader type (more concrete)

  • Solo developers: Even just ETag delivers a big perceived boost. Adding Redis visibly reduces backend load.
  • Small teams: Tag invalidation + rate limiting lowers operational incidents, cutting late-night on-calls.
  • Growing SaaS: When CDN and HTTP caching click, bandwidth costs drop and peak-time availability rises.

Summary (Ship an API that’s “fast & light” starting today ♡)

  • HTTP caching (ETag/Cache-Control) is high ROI: low implementation effort, fast payoff.
  • Add Redis where it matters (lists, external API calls). Design keys as URL + query + user, keep consistency with tag invalidation.
  • Pair with rate limiting and GZip to optimize defense and bandwidth.
  • Try the minimal sample, then refine TTL and tags with metrics. Small steady gains grow an API that’s fast and hard to break. I’m cheering you on. ♡

By greeden
