
FastAPI Observability Practical Guide: Grow an API You Can “See” with Structured Logs, Metrics, Traces, and Error Monitoring


Summary (The big picture first)

  • In production FastAPI operations, what’s scarier than bugs is not knowing what’s happening. The key to avoiding that is observability—combining logs, metrics, traces, and error monitoring.
  • Output logs as structured logs (JSON) so you can track request IDs, users, and important business events.
  • Use Prometheus plus tools like prometheus-fastapi-instrumentator to visualize latency, error rate, and throughput as numbers.
  • Use OpenTelemetry plus opentelemetry-instrumentation-fastapi to visualize the execution path of a single request end-to-end, so bottlenecks and external calls are obvious.
  • Integrate Sentry (or similar) into FastAPI to automatically collect stack traces, request details, and user context when exceptions occur.

Who benefits from reading this (Concrete personas)

  • Learner A (solo dev / side projects)
    Running FastAPI on Heroku or a VPS, but struggling with “it sometimes crashes and I don’t know why” or “logs are scattered so I can’t trace anything.”
    → By adding structured logging + Sentry, you aim to at least reach a state where “when an error occurs, I’m notified and can investigate the root cause.”

  • Small team B (3–5 person dev shop)
    Building small-to-medium business systems with FastAPI, but incident investigations always take time.
    → With Prometheus metrics + OpenTelemetry traces, you can quickly identify “which endpoint got slow and when,” and “which external API is the bottleneck.”

  • SaaS dev C (startup)
    Shipping features with frequent deployments and wants a “release-without-fear” posture.
    → Design logs/metrics/traces/error monitoring together, and set up dashboards to detect performance regressions and rising errors in real time.


1. Organizing the three pillars of observability

First, align on “what to observe and how.”

1.1 The three pillars

  1. Logs

    • Text-based records: exceptions, business events, debug information.
    • Flexible in format, but unless you structure them (JSON), later aggregation/search becomes painful.
  2. Metrics

    • Time-series numeric data: RPS, latency, error rate, CPU, memory, etc.
    • Commonly exposed in Prometheus format and graphed in Grafana.
  3. Traces

    • Records of “a single request’s journey,” including which services/steps it passed through and how long each took.
    • Often built with OpenTelemetry + Jaeger / Tempo / Application Insights, etc.

Complementing these is error monitoring (e.g., Sentry):

  • Automatically collects stack traces, request headers, user context, and sends notifications when exceptions occur.

2. Logs: Structured logging and request IDs

2.1 Applying Python logging basics to FastAPI

FastAPI doesn’t come with its own logging system—it typically uses Python’s standard logging module combined with Uvicorn’s logging config.

Start by preparing a formatter that outputs JSON.

# app/core/logging.py
import json
import logging
import sys
from typing import Any

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload: dict[str, Any] = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if hasattr(record, "request_id"):
            payload["request_id"] = record.request_id
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)

def setup_logging(level: str = "INFO") -> None:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)

2.2 Middleware to attach a request ID

To make logs easier to follow, attach a request-scoped ID so you can correlate all logs from the same request.

# app/middleware/request_id.py
import uuid
from starlette.types import ASGIApp, Receive, Scope, Send

class RequestIDMiddleware:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)

        request_id = str(uuid.uuid4())
        # Store the ID in the ASGI scope's "state" so request.state.request_id
        # is available to dependencies and route handlers downstream.
        scope["state"] = scope.get("state", {})
        scope["state"]["request_id"] = request_id

        async def send_wrapper(message):
            # Echo the request ID back as a response header so clients and
            # proxies can correlate their logs with ours.
            if message["type"] == "http.response.start":
                headers = [(b"x-request-id", request_id.encode())]
                message.setdefault("headers", []).extend(headers)
            await send(message)

        await self.app(scope, receive, send_wrapper)

2.3 Propagating the request ID through the logger

Receive the Request object via a FastAPI dependency and prepare a helper that includes the request_id in the log output.

# app/deps/logging.py
import logging
from fastapi import Request

def get_logger(request: Request) -> logging.LoggerAdapter:
    base = logging.getLogger("app")
    request_id = getattr(request.state, "request_id", None)
    return logging.LoggerAdapter(base, extra={"request_id": request_id})
# app/main.py
from fastapi import FastAPI, Depends
from app.core.logging import setup_logging
from app.middleware.request_id import RequestIDMiddleware
from app.deps.logging import get_logger

setup_logging()

app = FastAPI(title="Observable API")
app.add_middleware(RequestIDMiddleware)

@app.get("/hello")
def hello(logger=Depends(get_logger)):
    logger.info("hello endpoint called")
    return {"message": "hello"}

Now your logs become JSON that includes request_id, and clients also receive an X-Request-ID response header. Once your log aggregation (Loki, Cloud Logging, etc.) can search by this ID, incident investigation becomes far easier.
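
For example, with the formatter and dependency above, a single call to /hello produces a log line roughly like this (the UUID and field order will of course differ):

{"level": "INFO", "logger": "app", "msg": "hello endpoint called", "request_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"}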


3. Metrics: Watch latency and error rates with Prometheus

3.1 Prometheus metrics basics

For metrics, these three (the “RED metrics”) are especially important:

  • Rate: request count (RPS)
  • Errors: error ratio (4xx/5xx)
  • Duration: latency (P50/P95/P99)

In FastAPI, prometheus-fastapi-instrumentator makes it easy to measure and export them.

3.2 Install and basic setup

pip install prometheus-fastapi-instrumentator prometheus-client
# app/metrics.py
from prometheus_fastapi_instrumentator import Instrumentator

def setup_metrics(app):
    Instrumentator().instrument(app).expose(app, endpoint="/metrics")
# app/main.py
from fastapi import FastAPI
from app.metrics import setup_metrics

app = FastAPI(title="Observable API")
setup_metrics(app)

This enables Prometheus-format metrics at /metrics.

Example:

  • http_request_duration_seconds_bucket{handler="/hello",method="GET",status="2xx",le="0.1"} 42
  • http_requests_total{handler="/hello",method="GET",status="2xx"} 100

(The exact metric and label names depend on the instrumentator version and its configuration.)
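
These HTTP-level series cover the RED metrics; business-specific numbers still have to be defined by hand with prometheus_client. Here is a minimal sketch (the counter name and the plan label are made up for illustration); a metric registered in the default registry normally appears on the same /metrics endpoint:

# app/metrics_business.py (hypothetical module)
from prometheus_client import Counter

# prometheus_client exposes counters with a "_total" suffix (orders_created_total).
# Keep label values low-cardinality (plan names, not user IDs).
ORDERS_CREATED = Counter(
    "orders_created",
    "Number of orders created, by plan",
    ["plan"],
)

def record_order_created(plan: str) -> None:
    ORDERS_CREATED.labels(plan=plan).inc()

Calling record_order_created("free") from the order-creation code path is enough for the counter to show up next to the HTTP metrics above.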

3.3 Multiprocess support with Gunicorn, etc.

With multiprocess setups like Gunicorn + UvicornWorker, you must enable the Prometheus client's multiprocess mode. prometheus-fastapi-instrumentator supports this when the PROMETHEUS_MULTIPROC_DIR environment variable (spelled prometheus_multiproc_dir in older prometheus-client releases) points to an existing, writable directory.

mkdir -p ./metrics
export PROMETHEUS_MULTIPROC_DIR=./metrics
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker

In real operations, Prometheus scrapes /metrics, and Grafana visualizes it via dashboards.


4. Tracing: Follow a request’s journey with OpenTelemetry

Metrics tell you “where it tends to be slow,” but tracing helps answer “why was this particular request slow?”

4.1 OpenTelemetry instrumentation for FastAPI

OpenTelemetry provides auto-instrumentation for FastAPI via opentelemetry-instrumentation-fastapi.

pip install \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-instrumentation-logging \
  opentelemetry-instrumentation-requests \
  opentelemetry-instrumentation-httpx

4.2 Minimal example (OTLP export)

# app/otel.py
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

def setup_tracing(service_name: str = "fastapi-app"):
    resource = Resource(attributes={
        SERVICE_NAME: service_name,
    })
    provider = TracerProvider(resource=resource)
    processor = BatchSpanProcessor(OTLPSpanExporter())
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)
# app/main.py
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor

from app.otel import setup_tracing

setup_tracing(service_name="observable-fastapi")

app = FastAPI(title="Observable API")

FastAPIInstrumentor.instrument_app(app)
LoggingInstrumentor().instrument()
RequestsInstrumentor().instrument()
HTTPXClientInstrumentor().instrument()

OTLP is a common protocol for sending traces to an OpenTelemetry Collector or to various APM backends (Grafana Tempo, Jaeger, Azure Monitor, Datadog, etc.); the exporter above speaks OTLP over HTTP, and its target can be switched with the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable.

4.3 What you’ll be able to see

  • Each request generates a root span, with tags like endpoint name and status code
  • DB queries and external HTTP calls (requests/httpx) appear as child spans
  • Dashboards show visually where each request spent time (in ms)

This makes it easy to spot issues like “this endpoint is slow only during certain periods,” or “timeouts are frequently happening against this external API.”
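
Auto-instrumentation only sees what it wraps (the request itself, requests/httpx calls, and so on); for anything else you care about, you can open a child span by hand. Here is a minimal sketch, assuming setup_tracing() from above has already run; the endpoint and attribute names are purely illustrative:

# app/routers/report.py (hypothetical)
from fastapi import APIRouter
from opentelemetry import trace

router = APIRouter()
tracer = trace.get_tracer(__name__)

@router.get("/report")
def build_report():
    # Appears as a child span under the request's root span in Jaeger/Tempo.
    with tracer.start_as_current_span("build_report.aggregate") as span:
        span.set_attribute("report.rows", 1234)  # any low-cardinality detail you want to see
        # ... the actual aggregation work would run here ...
    return {"status": "ok"}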


5. Error monitoring: Don’t miss exceptions with Sentry

5.1 Sentry SDK integration with FastAPI

Sentry provides an SDK and integration guides for Python/FastAPI.

Install:

pip install "sentry-sdk[fastapi]"

Basic initialization is simple:

# app/sentry_setup.py
import sentry_sdk

def setup_sentry(dsn: str, env: str = "dev"):
    sentry_sdk.init(
        dsn=dsn,
        enable_tracing=True,   # Enable performance monitoring
        traces_sample_rate=0.1, # Adjust sampling as needed
        environment=env,
    )
# app/main.py
from fastapi import FastAPI
from app.sentry_setup import setup_sentry

# Initialize Sentry before the app is created so its middleware is in place.
setup_sentry(dsn="https://examplePublicKey@o0.ingest.sentry.io/0", env="prod")

app = FastAPI(title="Observable API")

# The sentry-sdk FastAPI/Starlette integrations are enabled automatically by
# sentry_sdk.init(); no extra wiring is needed here (follow the official guide).

5.2 What information gets sent

  • Stack traces at exception time
  • Request URL, HTTP method, headers, query params, etc.
  • (If configured) context such as user ID and organization ID

In the dashboard, similar errors are grouped, and you can see “since when it increased” or “which version caused it.” If you also add Sentry to the frontend, you can trace which screen triggered a backend error.
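
The user/organization context mentioned above does not appear by itself; you attach it per request. Here is a minimal sketch using stable sentry-sdk calls (the helper names and fields are illustrative):

# app/deps/sentry_context.py (hypothetical module)
import sentry_sdk

def attach_sentry_context(user_id: str, org_id: str) -> None:
    # Events captured later in this request will carry the user and a custom tag.
    sentry_sdk.set_user({"id": user_id})
    sentry_sdk.set_tag("org_id", org_id)

def report_handled_error(exc: Exception) -> None:
    # For exceptions you catch yourself but still want to see in Sentry.
    sentry_sdk.capture_exception(exc)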


6. How to combine the four: Design guidelines

So far we’ve looked at logs, metrics, traces, and error monitoring separately. Here’s how to think about combining them.

6.1 The goal of observability design

  • When an incident happens, you can get a likely cause within 5–10 minutes
  • After a release, you can quickly detect latency/error-rate regressions
  • During performance tuning, you can objectively decide what to improve

Working backwards from that, define roles like:

  • Logs: detailed context (params, business events)
  • Metrics: overall health (RED metrics)
  • Traces: per-request journey and bottlenecks
  • Error monitoring: exception collection + notification + scope estimation

6.2 Observation points by use case

  • When you want to chase performance issues

    • Use metrics to identify when latency spikes
    • Inspect sample traces during that time to see which external service/query is slow
    • Add detailed logs around the suspicious code paths and reproduce/verify
  • When you want to chase rising errors

    • Check Sentry for newly increasing errors (compare with release times)
    • Inspect sample traces to compare “success vs failure patterns”
    • If needed, log business keys (e.g., order ID) to identify impact scope

When this workflow becomes routine, you move closer to “deploying without fear.”


7. Sample setup: A realistic example for a small team

7.1 Three environments: dev → staging → prod

  • Development (dev)

    • Logs: console output (JSON)
    • Metrics: local Prometheus (optional)
    • Traces: local Jaeger (optional)
    • Sentry: disabled, or separate project
  • Staging (stg)

    • Logs: shipped to centralized logging
    • Metrics: staging Prometheus + Grafana
    • Traces: staging trace backend
    • Sentry: enabled with environment=stg
  • Production (prod)

    • Logs: production log backend (Loki/Cloud Logging, etc.)
    • Metrics: production Prometheus + Grafana, dashboards and alerts
    • Traces: production trace backend (sampling rate tuned)
    • Sentry: environment=prod, alert integrations (Slack/email)

7.2 Common initialization on the FastAPI side

# app/bootstrap.py
from app.core.logging import setup_logging
from app.otel import setup_tracing
from app.sentry_setup import setup_sentry
from app.metrics import setup_metrics

from app.core.settings import get_settings

def bootstrap(app):
    settings = get_settings()
    setup_logging(settings.log_level)
    setup_tracing(service_name=settings.app_name)
    if settings.sentry_dsn:
        setup_sentry(dsn=settings.sentry_dsn, env=settings.env)
    setup_metrics(app)
# app/main.py
from fastapi import FastAPI
from app.bootstrap import bootstrap
from app.middleware.request_id import RequestIDMiddleware

app = FastAPI(title="Observable API")

bootstrap(app)
app.add_middleware(RequestIDMiddleware)

Configuration values (DSN, OTLP endpoint, environment name, etc.) can be injected via environment variables using tools like pydantic-settings, making it easy to switch per environment.
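
get_settings() itself isn't shown above; one possible shape with pydantic-settings follows (the field names simply mirror what bootstrap() reads, so treat this as a sketch rather than the one true layout):

# app/core/settings.py (one possible implementation)
from functools import lru_cache
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    app_name: str = "observable-fastapi"
    env: str = "dev"
    log_level: str = "INFO"
    sentry_dsn: str | None = None  # leave unset to keep Sentry disabled

    # Values come from environment variables (APP_NAME, ENV, ...) or a .env file.
    model_config = {"env_file": ".env"}

@lru_cache
def get_settings() -> Settings:
    return Settings()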


8. Common pitfalls and countermeasures

  • Symptom: Logs are messy and hard to search
    Cause: Plain text / inconsistent formats
    Fix: Structure logs as JSON and always output key fields (request_id, user_id)
  • Symptom: /metrics becomes heavy
    Cause: Too many labels / high cardinality
    Fix: Don't use highly unique values (e.g., user IDs) as label values
  • Symptom: Too much trace data drives up cost
    Cause: Sending every trace without sampling
    Fix: Tune traces_sample_rate / sampling policies per environment
  • Symptom: Too much noise in Sentry
    Cause: Sending even expected errors
    Fix: Handle "expected business errors" in the app and don't send them to Sentry (see the sketch after this list)
  • Symptom: Confusion because dev and prod show very different information
    Cause: Large configuration differences between environments
    Fix: Minimize the differences and tag every signal with its environment (e.g., environment=stg/prod)
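
For the Sentry-noise item, one concrete option is a before_send hook passed to sentry_sdk.init(), which lets you drop events for errors you consider expected. A minimal sketch, assuming a hypothetical OrderNotFoundError that your exception handlers already turn into a clean 404:

# app/sentry_setup.py (extended sketch)
import sentry_sdk

class OrderNotFoundError(Exception):
    """Hypothetical expected business error."""

def drop_expected_errors(event, hint):
    # Returning None tells the SDK to discard the event.
    exc_info = hint.get("exc_info")
    if exc_info and isinstance(exc_info[1], OrderNotFoundError):
        return None
    return event

def setup_sentry(dsn: str, env: str = "dev") -> None:
    sentry_sdk.init(
        dsn=dsn,
        environment=env,
        traces_sample_rate=0.1,
        before_send=drop_expected_errors,
    )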

9. Adoption roadmap (Step-by-step)

Finally, here’s a staged path for introducing observability in practice.

  1. Structured logs + request IDs
    • Add JSON logs + X-Request-ID, verify consistency locally and in staging.
  2. Error monitoring (Sentry, etc.)
    • Ensure exceptions trigger notifications, then address the most urgent errors first.
  3. Prometheus metrics + Grafana dashboards
    • Build dashboards centered on RED metrics and watch peak times and post-release behavior.
  4. OpenTelemetry traces
    • Focus on slow endpoints/external calls and use traces to pinpoint bottlenecks.
  5. Alerting and SLO design
    • Define thresholds and alert rules for metrics like P95 latency and error rate.
  6. Continuous improvement
    • With every new feature, build the habit of asking: “How will we observe this?”


Conclusion

  • In production FastAPI operations, it’s not just about speed—being able to see what’s happening matters. By combining logs, metrics, traces, and error monitoring, you can quickly detect incidents and regressions and trace them to root causes.
  • Structured logs + request IDs are a universally useful first step at any project scale. From there, layer on Prometheus metrics, OpenTelemetry tracing, and Sentry error monitoring gradually to raise observability without strain.
  • You don’t need to add everything at once. Start from the single most painful area in your current scale/phase, and extend observability one step at a time.

I’m quietly rooting for your FastAPI app to become “visible,” so you can keep improving it with confidence.


By greeden

日本語が含まれない投稿は無視されますのでご注意ください。(スパム対策)