
FastAPI Logging & Observability Starter Guide: Growing a Production-Ready API with Structured Logs, Metrics, and Tracing


Summary (Get the Big Picture First)

  • To make a FastAPI app truly “production-ready,” it’s important to build not only features but also logging, metrics, and health checks—the observability layer.
  • By combining Python’s standard logging with FastAPI middleware/dependencies, you can easily implement structured (JSON) logs and request-ID–tagged logs.
  • Collect metrics such as latency, request volume, and error rate, expose them to Prometheus, and visualize them in dashboards like Grafana.
  • Adding health checks (/health) and readiness checks (/ready) makes it easier to integrate with container orchestration and load balancers—and to automatically remove unhealthy instances.
  • This article provides a step-by-step roadmap and practical code examples, aimed at solo developers, small teams, and SaaS/startup teams.

Who Benefits from Reading This (Concrete Reader Personas)

Solo Developers / Learners

  • You’re building small APIs or services with FastAPI.
  • When errors occur, you mostly “just stare at uvicorn logs,” and root cause analysis takes too long.
  • You’ve heard of structured logs and metrics, but you’re not sure where to start.

For you, we’ll begin with standard logging + a simple middleware, then gradually improve the logging setup step by step.

Backend Engineers in Small Teams

  • You’re operating a FastAPI service in a team of 3–5 people and occasionally get dragged into production incident response.
  • You feel pain like “errors show up but it’s hard to tell which user/request caused them” or “we can only guess which endpoint is slow.”
  • You want to standardize “logging rules” and “monitoring items” across the team to reduce operational load.

For you, a practical template combining request-ID logs, structured JSON logs, metrics, and health checks is likely to be immediately useful.

SaaS Teams / Startups

  • Your user base has grown, and peak-time outages or latency regressions directly affect the business.
  • You operate FastAPI in multi-instance/container-orchestrated environments (e.g., Kubernetes), or plan to.
  • You want to set up the FastAPI side properly for log aggregation (ELK/Loki), metrics (Prometheus/Cloud Monitoring), and tracing (OpenTelemetry).

For you, we’ll organize design points from the angle of “what the app should emit” and “how to split responsibilities with infra monitoring.”


Accessibility Evaluation (Readability and Ease of Understanding)

  • The article uses an inverted-pyramid structure—“summary → personas → why observability matters → logging basics → structured logs & request IDs → metrics → health checks → roadmap”—so you can skim and still grasp the flow.
  • Technical terms (structured logs, metrics, tracing, latency, etc.) are briefly explained the first time they appear.
  • Code blocks are kept short and lightly commented to remain friendly for screen readers and read-aloud setups.
  • Headings (##, ###) are properly layered so skimming headings alone reveals the whole structure.

As a technical article, the text structure aims roughly at WCAG AA-level readability.


1. Why Logging & Monitoring Matter So Much

FastAPI makes it easy to build REST APIs quickly, but “it runs” and “it’s operable” are different problems.

1.1 Logs Are Essential for Bug Investigation and Incident Response

  • Which endpoint threw the error?
  • What request parameters were sent?
  • Which user action triggered it?

If you can’t answer these, investigations become “guesswork and grit.”
Good logs speed up root cause isolation and leave room to think about prevention instead of only firefighting.

1.2 The Real Cause of “Feels Slow” Is Usually Latency and Error Rate

When users say “it’s slow lately” or “it sometimes breaks,” it often means:

  • latency is high on a specific endpoint, or
  • error rates spike under certain conditions.

If you can see these as numbers via metrics, you can calmly evaluate:

  • how performance degrades under load, and
  • whether error rates increased after a release.

1.3 Three Core Elements of Healthy Service Operations

In this guide, we break healthy operations into:

  1. Logs
  2. Metrics
  3. Health checks / Tracing

…and walk through how to implement each from the FastAPI side.


2. Basic Integration: Python Standard logging + FastAPI

Let’s start with what you can do without adding special libraries.

2.1 Basic logging Setup

Python includes the logging module by default.
It’s handy to prepare a config file like the following for a FastAPI app.

# app/core/logging_config.py
import logging
from logging.config import dictConfig

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    # Output destination (console in this example)
    "handlers": {
        "default": {
            "level": "INFO",
            "class": "logging.StreamHandler",
            "formatter": "default",
        },
    },
    # Log format
    "formatters": {
        "default": {
            "format": "%(asctime)s %(levelname)s [%(name)s] %(message)s",
        },
    },
    # Root logger
    "root": {
        "level": "INFO",
        "handlers": ["default"],
    },
    # Align uvicorn loggers too
    "loggers": {
        "uvicorn": {"level": "INFO", "handlers": ["default"], "propagate": False},
        "uvicorn.error": {"level": "INFO", "handlers": ["default"], "propagate": False},
        "uvicorn.access": {"level": "INFO", "handlers": ["default"], "propagate": False},
        "app": {"level": "INFO", "handlers": ["default"], "propagate": False},
    },
}

def setup_logging():
    dictConfig(LOGGING)
    logging.getLogger("app").info("Logging is configured")

Load it at app startup.

# app/main.py
from fastapi import FastAPI
from app.core.logging_config import setup_logging
import logging

setup_logging()
logger = logging.getLogger("app")

app = FastAPI(title="Logging Example")

@app.on_event("startup")
async def on_startup():
    logger.info("Application startup")

@app.get("/ping")
def ping():
    logger.info("Ping endpoint called")
    return {"message": "pong"}

Even this alone consolidates uvicorn logs and your application logs into one consistent stream.

2.2 How to Use Loggers in Practice

Common real-world patterns include:

  • defining logger = logging.getLogger(__name__) per module
  • logging only important events at INFO or above
  • enabling DEBUG logs only in dev/staging environments

For example:

# app/domain/users/service.py
import logging

logger = logging.getLogger(__name__)

def create_user(email: str) -> int:
    logger.debug("Creating user %s", email)
    # Actual creation logic...
    user_id = 1
    logger.info("User created: id=%s, email=%s", user_id, email)
    return user_id
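
The last bullet in the list above (DEBUG only outside production) is easiest to realize by driving the level from configuration rather than code. Below is a minimal sketch, assuming a LOG_LEVEL environment variable (not part of the earlier config) and reusing the LOGGING dict from section 2.1:

# app/core/logging_config.py (variation of setup_logging; LOG_LEVEL is an assumed env var)
import os
from logging.config import dictConfig

def setup_logging() -> None:
    # Default to INFO; set LOG_LEVEL=DEBUG in dev/staging to get verbose logs
    level = os.getenv("LOG_LEVEL", "INFO").upper()
    LOGGING["root"]["level"] = level
    LOGGING["loggers"]["app"]["level"] = level
    dictConfig(LOGGING)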

3. Make Logs “Traceable”: Structured Logs + Request IDs

Plain text logs work, but at scale you’ll hit issues like “which request is this?” and “hard to parse.”
That’s where structured logs and request IDs help.

3.1 What Are Structured Logs?

Structured logs record data as key/value fields rather than just free text.

Example:

  • Plain log:
    2025-01-01 00:00:00 INFO [app] User created: id=1 email=foo@example.com
  • Structured (JSON) log:
    {"time":"2025-01-01T00:00:00Z","level":"INFO","logger":"app.domain.users.service","msg":"User created","user_id":1,"email":"foo@example.com"}

With JSON logs:

  • log aggregation systems can parse fields easily
  • searching for user_id=1 becomes straightforward

3.2 A JSON Formatter for logging

Even with standard logging, you can write a JSON formatter.

# app/core/json_formatter.py
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        log_record = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Include fields passed via `extra`
        if hasattr(record, "request_id"):
            log_record["request_id"] = record.request_id
        if hasattr(record, "path"):
            log_record["path"] = record.path
        return json.dumps(log_record, ensure_ascii=False)

Wire it into the LOGGING config:

# Replace parts of logging_config.LOGGING
"formatters": {
    "json": {
        "()": "app.core.json_formatter.JsonFormatter",
    },
},
"handlers": {
    "default": {
        "level": "INFO",
        "class": "logging.StreamHandler",
        "formatter": "json",
    },
},

3.3 Middleware to Attach a Request ID

To trace “which logs came from which HTTP request,” add a request ID (trace ID).

# app/middleware/request_id.py
import uuid
import logging
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response

logger = logging.getLogger(__name__)

REQUEST_ID_HEADER = "X-Request-ID"

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Reuse existing ID if provided by a reverse proxy, etc.
        request_id = request.headers.get(REQUEST_ID_HEADER, str(uuid.uuid4()))

        response: Response = await call_next(request)
        response.headers[REQUEST_ID_HEADER] = request_id

        logger.info(
            "Request completed",
            extra={
                "request_id": request_id,
                "path": request.url.path,
                "method": request.method,
                "status_code": response.status_code,
            },
        )
        return response

Attach it:

# app/main.py (excerpt)
from app.middleware.request_id import RequestIDMiddleware

app = FastAPI(title="Logging Example with Request ID")
app.add_middleware(RequestIDMiddleware)

If you also attach request_id when logging inside handlers/services, you can later group logs by a single request in your log tool.
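
Passing extra={"request_id": ...} by hand in every handler gets tedious. One common pattern is a logging filter that copies the request ID from a contextvar onto every record, so the JsonFormatter above picks it up automatically. The sketch below is an assumption, not code from this article: the module name and filter are illustrative, and the contextvar is the same idea as the request_id_var introduced later in section 6.2. The middleware would call request_id_ctx.set(request_id) before await call_next(request), and the filter would be attached to the "default" handler via the "filters" key of the dictConfig.

# app/core/request_context.py (illustrative sketch)
import logging
from contextvars import ContextVar

request_id_ctx: ContextVar[str | None] = ContextVar("request_id", default=None)

class RequestIDLogFilter(logging.Filter):
    """Copy the current request ID (if any) onto every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        request_id = request_id_ctx.get()
        if request_id is not None:
            record.request_id = request_id
        return True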


4. Metrics: See Latency, Request Volume, and Error Rate as Numbers

Next: metrics—how you quantify “how slow” and “how broken.”

4.1 Which Metrics Should You Collect?

For API servers like FastAPI, these three are especially important:

  • Request volume (RPS: Requests Per Second)
  • Latency (especially P95/P99 tail latency)
  • Error rate (how many 5xx/4xx)

Also valuable when possible:

  • DB query time and count
  • External API call count and failure count

These make bottleneck discovery much easier.
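
For the DB query time mentioned above, one option is an extra Histogram plus a small context manager. This is only a sketch; the metric name, label, and helper are assumptions, not something the article defines:

# app/core/metrics.py (additional, illustrative)
import time
from contextlib import contextmanager
from prometheus_client import Histogram

DB_QUERY_LATENCY = Histogram(
    "db_query_duration_seconds",
    "Database query latency (seconds)",
    ["operation"],
)

@contextmanager
def track_db_query(operation: str):
    # Measure wall-clock time of the wrapped block and record it under the given label
    start = time.perf_counter()
    try:
        yield
    finally:
        DB_QUERY_LATENCY.labels(operation=operation).observe(time.perf_counter() - start)

Usage would look like: with track_db_query("select_user"): result = db.execute(...).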

4.2 Minimal Setup with Prometheus Client

Install:

pip install prometheus-client

Define metrics:

# app/core/metrics.py
import time
from prometheus_client import Counter, Histogram

REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "path", "status_code"],
)

REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency (seconds)",
    ["method", "path"],
)

Measure in middleware:

# app/middleware/metrics.py
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response
from app.core.metrics import REQUEST_COUNT, REQUEST_LATENCY

class MetricsMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        method = request.method
        path = request.url.path

        with REQUEST_LATENCY.labels(method=method, path=path).time():
            response: Response = await call_next(request)

        REQUEST_COUNT.labels(
            method=method,
            path=path,
            status_code=response.status_code,
        ).inc()

        return response

Expose /metrics:

# app/main.py (excerpt)
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
from fastapi.responses import Response
from app.middleware.metrics import MetricsMiddleware

app.add_middleware(MetricsMiddleware)

@app.get("/metrics")
def metrics():
    data = generate_latest()
    return Response(content=data, media_type=CONTENT_TYPE_LATEST)

Now Prometheus (or similar) can scrape /metrics, and you can visualize in Grafana.

4.3 Using a Dedicated Library

If you want an easier setup, you can use prometheus-fastapi-instrumentator.

pip install prometheus-fastapi-instrumentator

from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

@app.on_event("startup")
async def _startup():
    Instrumentator().instrument(app).expose(app)

This can provide a baseline set of HTTP metrics without writing your own definitions and middleware.


5. Health and Readiness Checks: Automatically Remove Broken Instances

Health checks are essential in the container era.

5.1 Health vs. Readiness

  • Health check (Liveness):

    • “Is the process alive?”
    • If hung or crashed, /health fails and orchestration restarts it.
  • Readiness check:

    • “Is this instance ready to accept traffic?”
    • If the cache isn’t warmed or the DB isn’t reachable, temporarily report failure (e.g., 503) so the load balancer stops sending traffic to the instance.

5.2 A Simple Health Check

# app/api/health.py
from fastapi import APIRouter

router = APIRouter(tags=["health"])

@router.get("/health")
def health():
    return {"status": "ok"}

Include it:

# app/main.py (excerpt)
from app.api import health

app.include_router(health.router)

5.3 Readiness Check Including DB/External Dependencies

Example readiness endpoint checking DB connectivity:

# app/api/health.py (continued)
from sqlalchemy import text
from sqlalchemy.orm import Session
from app.deps.db import get_db
from fastapi import Depends, HTTPException, status

@router.get("/ready")
def ready(db: Session = Depends(get_db)):
    try:
        # Lightweight check; SQLAlchemy 2.x requires text() for raw SQL strings
        db.execute(text("SELECT 1"))
    except Exception:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="database not available",
        )
    return {"status": "ready"}

In Kubernetes, you can commonly configure:

  • livenessProbe → /health
  • readinessProbe → /ready

…and automate “restart if dead” and “remove from traffic while not ready.”


6. Tracing Entry Point: Follow the Flow per Request

As operations mature, you may want distributed tracing.
Here we won’t dive deep into full OpenTelemetry setup—just how to prepare from FastAPI.

6.1 Trace IDs and Spans

  • Trace ID: identifies an end-to-end user action (one request)
  • Span: a unit of work within it (API call, DB query, etc.)

Sending these to a tracing backend makes it possible to visualize where time is spent.
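
To make the vocabulary concrete, here is a minimal sketch (an assumption, not this article's setup) using the OpenTelemetry Python SDK and its FastAPI instrumentation, exporting spans to the console; a real setup would swap the exporter for Jaeger, Tempo, or similar:

# Illustrative; assumes: pip install opentelemetry-sdk opentelemetry-instrumentation-fastapi
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# app is the FastAPI instance created in main.py
FastAPIInstrumentor.instrument_app(app)  # one span per incoming request

tracer = trace.get_tracer(__name__)

def expensive_step():
    # Wrap an expensive step in its own span to see where time goes
    with tracer.start_as_current_span("expensive_step"):
        ...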

6.2 Start by Passing Request IDs via Headers

Extend the earlier RequestID approach so outbound calls also include X-Request-ID.
That makes later correlation easier when you add tracing infrastructure.

# app/infra/http_client.py
import httpx
from contextvars import ContextVar

request_id_var: ContextVar[str | None] = ContextVar("request_id", default=None)

async def get_client() -> httpx.AsyncClient:
    headers = {}
    request_id = request_id_var.get()
    if request_id:
        headers["X-Request-ID"] = request_id
    return httpx.AsyncClient(headers=headers, timeout=10.0)

Set it in the middleware:

# app/middleware/request_id.py (excerpt)
from app.infra.http_client import request_id_var

class RequestIDMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        request_id = request.headers.get(REQUEST_ID_HEADER, str(uuid.uuid4()))
        request_id_var.set(request_id)
        # ...

This helps “connect the dots” between your trace ID and logs in external services later.
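
As a usage example (illustrative; the module, URL, and function are placeholders), an outbound call made through get_client automatically carries the current request ID:

# app/domain/profiles/service.py (illustrative)
from app.infra.http_client import get_client

async def fetch_profile(user_id: int) -> dict:
    client = await get_client()
    async with client:  # close connections when done
        resp = await client.get(f"https://profile.example.com/users/{user_id}")
        resp.raise_for_status()
        return resp.json()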


7. Log Levels and Operational Rules: How Much Should You Log?

More logs aren’t always better. Too many logs reduce readability and increase storage/log-platform costs.

7.1 A Practical Guide to Log Levels

  • DEBUG: detailed debugging info (dev/staging)
  • INFO: important normal events (user created, order confirmed, batch done)—usually on in prod
  • WARNING: unexpected but not immediately fatal (retry succeeded, fallback used)
  • ERROR: failures likely affecting users; good for alert triggers
  • CRITICAL: severe service-wide issues (missing config, cannot start)
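
For the ERROR level in particular, include the stack trace; logging’s logger.exception() logs at ERROR and appends the traceback automatically. A small illustrative sketch (the function and order_id are placeholders):

# Illustrative
import logging

logger = logging.getLogger(__name__)

def charge_order(order_id: int) -> None:
    try:
        ...  # call the payment provider
    except Exception:
        # Logged at ERROR level, with the full traceback attached
        logger.exception("Payment failed: order_id=%s", order_id)
        raise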

7.2 Security and Privacy Considerations

Avoid logging:

  • passwords, tokens, credit card numbers, and other secrets
  • unnecessary full personal data (address, phone number) at INFO level

If needed, consider anonymization, hashing, or masking for identifiers like emails/user IDs.
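
A minimal masking sketch (illustrative; the module and helper are assumptions): hash the email so related log lines stay correlatable without exposing the raw address.

# app/core/masking.py (illustrative)
import hashlib

def mask_email(email: str) -> str:
    """Replace an email with a short hash so logs stay searchable but not readable."""
    digest = hashlib.sha256(email.encode("utf-8")).hexdigest()[:8]
    return f"email:{digest}"

# Usage: logger.info("User created", extra={"email": mask_email(email)})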


8. Roadmap by Persona: Improve Gradually

Here’s a realistic path by reader type.

8.1 For Solo Developers / Learners

  1. Add standard logging and define loggers per module.
  2. Add RequestIDMiddleware and output access-style logs with X-Request-ID.
  3. Add /health to make uptime checks easier.
  4. If you have time, try prometheus-fastapi-instrumentator and expose /metrics.

8.2 For Small Team Backend Engineers

  1. Share and standardize logging config, formats, and logger naming rules.
  2. Switch to structured JSON logs + request IDs; improve searchability in ELK/Loki, etc.
  3. Add Prometheus and build dashboards for latency, error rate, and RPS.
  4. Add /ready with DB connectivity checks and wire it into orchestrator probes.

8.3 For SaaS Teams / Startups

  1. Define an observability policy across app + infra layers: who monitors what where.
  2. Consider OpenTelemetry; first establish trace propagation via request IDs/headers.
  3. Create alert rules (error rate, latency, etc.) and define “who gets notified at which threshold.”
  4. Build a unified dashboard combining logs, metrics, and traces—the shared screen used in releases and incidents.

9. Reference Links (For Going Deeper)

Note: these are general pointers—always check the latest official docs when adopting.

  • FastAPI official docs
    • Logging-related tips (Tips & Tricks / Advanced pages)
    • Middleware, Events, Dependencies are especially relevant to observability.
  • Prometheus ecosystem
    • Official prometheus_client docs
    • prometheus-fastapi-instrumentator GitHub repository
  • Structured logging
    • Examples of JSON logging with Python logging
    • Libraries like structlog and loguru (pick based on team preference)
  • Distributed tracing
    • OpenTelemetry Python SDK official docs
    • Backends like Jaeger / Tempo / Zipkin

Closing

FastAPI is lightweight and lets you ship an API quickly—which is exactly why it’s easy to stop at “it works” and never get around to building the observability layer.

But by improving logging, metrics, and health checks step by step, you gain a stronger operational posture:

  • staying calm and systematic when incidents happen
  • explaining “feels slow” with numbers
  • sharing operational instincts more easily with new teammates

You don’t need to make everything perfect from day one.
Starting with a single logging setup—or even just adding /health—is already a meaningful step toward a more operable FastAPI.

I’m quietly rooting for your API to become a stable, long-loved service.


By greeden
