Step into Kubernetes Production: A Practical FastAPI × Kubernetes Deployment Guide

Deployment / Service / Ingress / ConfigMap / Secret / HPA / Health Checks

Introduction: What You’ll Be Able to Do and Who This Is For

Once you can run your FastAPI app with Docker, the next step is usually “I want to run this on a cluster” or “I want it to scale.” In this article, we’ll deploy a FastAPI app on Kubernetes (K8s) and walk through a near-production setup that covers replicas, configuration management, health checks, and auto-scaling, using a concrete sample with detailed explanations.

Target Readers (Very Specific)

Individual developers (students / working learners)
You’ve used Docker Compose, but “Kubernetes looks scary and complicated.”
→ We’ll walk together through putting a single FastAPI service on a cluster as a first step.
Engineers in small teams (3–5 members)
You’re already running Docker images in production, but scaling and rolling updates are becoming painful.
→ Think of this as a “template” you can take home, with Deployment / Service / Ingress / ConfigMap / Secret / HPA all wired together.
Startup SaaS development teams
You’re considering future microservices where multiple FastAPI services will run on Kubernetes.
→ This article focuses on a single service, but we’ll also touch on manifest splitting per service, namespaces, and patterns for shared configuration.

Accessibility Notes (On Readability)

The structure follows an inverted triangle: “overall picture first → core resources (Deployment / Service / Ingress, etc.) → config management → scaling → operational tips → recap.”
Code and YAML are shown in fixed-width blocks, and very long examples are split by purpose.
Technical terms are briefly explained on first use, and then terminology is kept consistent to avoid confusion.
We use ample spacing and headings so that screen readers can easily follow the document structure.

Overall, this article aims for AA-equivalent readability for readers with some technical background.

1. Get the Big Picture of Kubernetes Deployment

First, let’s align our mental model of how a FastAPI app is broken down into resources and runs on Kubernetes.

1.1 Main Actors

Deployment
Defines the desired number of Pods (the smallest deployable unit, each wrapping one or more containers) and their template. It’s the core of rolling updates and self-healing.
Service
Because Pod IPs are short-lived, a Service provides a stable, named entry point to your app. Common types are ClusterIP (in-cluster only), NodePort, and LoadBalancer.
Ingress
Routes external HTTP/HTTPS requests to the appropriate Service. It’s the gateway between the cluster and the outside world.
ConfigMap / Secret
Resources for externalizing configuration and secrets such as environment variables, settings, passwords, and tokens.
Horizontal Pod Autoscaler (HPA)
A resource that automatically scales the number of Pods based on CPU or custom metrics.
Liveness / Readiness Probes
Health-check settings to determine whether the container is “alive” and “ready to receive traffic.”

1.2 Minimal Setup We’ll Build This Time

Use an existing FastAPI Docker image (e.g., my-fastapi:latest).
Prepare the following Kubernetes manifests:
- Deployment (FastAPI container + liveness/readiness probes)
- Service (ClusterIP / NodePort, etc.)
- Ingress (routing for /api)
- ConfigMap (non-secret configuration)
- Secret (DB passwords, etc.)
- HPA (CPU-based auto-scaling)
Run and verify the app on a local cluster (Minikube, kind) or a managed K8s service (EKS/GKE/AKS, etc.).

2. Confirming the Assumptions of the FastAPI Image

Before diving into the Kubernetes side, let’s confirm how the FastAPI container image is structured.

2.1 Typical Dockerfile (Review)

FROM python:3.11-slim AS base

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

COPY requirements.txt /app/
RUN pip install --upgrade pip && pip install -r requirements.txt

COPY app /app/app

# Run as non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

ENV HOST=0.0.0.0 \
    PORT=8000

EXPOSE 8000

CMD ["bash", "-lc", "exec uvicorn app.main:app --host ${HOST} --port ${PORT}"]

Here we’re launching with uvicorn directly, but you can also use Gunicorn + UvicornWorker (we won’t go into those details here).

2.2 Assumptions on the Kubernetes Side

The container starts with working directory /app.
HOST and PORT can be switched via environment variables.
Health-check endpoints such as /health/liveness and /health/readiness are already implemented (as introduced in earlier articles).

We’ll write the manifests based on these assumptions.
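If those endpoints are not in place yet, the logic behind them can be quite small. The following is a framework-agnostic sketch, not the implementation from the earlier articles: liveness_report, readiness_report, and the check names are hypothetical, and a FastAPI handler for /health/readiness would simply return the status code and body produced here. Kubernetes HTTP probes treat any status from 200 to 399 as success, so a 503 marks the Pod as not ready.

```python
from typing import Callable, Dict, Tuple


def liveness_report() -> Tuple[int, dict]:
    """Liveness only proves the process can answer; keep it dependency-free."""
    return 200, {"status": "alive"}


def readiness_report(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run every dependency check and report 200 only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A crashing check counts as "not ready" instead of bubbling up.
            results[name] = False
    ready = all(results.values())
    status_code = 200 if ready else 503
    body = {"status": "ready" if ready else "not ready", "checks": results}
    return status_code, body


if __name__ == "__main__":
    # Hypothetical checks; real ones would ping the database, a cache, etc.
    print(readiness_report({"database": lambda: True, "cache": lambda: False}))
```

Keeping liveness free of dependency checks is a deliberate choice in this sketch: if the database goes down, you usually want the Pod taken out of rotation (readiness), not restarted in a loop (liveness).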

3. Deployment: Placing FastAPI Containers on the Cluster

3.1 Basic Deployment Manifest

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
  labels:
    app: fastapi-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
        - name: fastapi
          image: my-fastapi:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
          env:
            - name: HOST
              value: "0.0.0.0"
            - name: PORT
              value: "8000"
          livenessProbe:
            httpGet:
              path: /health/liveness
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

Key Points

replicas: 2 maintains two Pods at all times (minimal redundancy).
selector.matchLabels and template.metadata.labels must match to link the Deployment to its Pods.
livenessProbe and readinessProbe perform health checks on FastAPI:
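- If liveness fails failureThreshold times in a row → the kubelet restarts the container.
- If readiness fails → this Pod is removed from the Service’s endpoints so it doesn’t receive new requests (it is not restarted).
resources defines requests and limits: requests drive node scheduling and are the baseline for the HPA’s CPU utilization percentage, while limits cap what the container may use.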
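To get a feel for what the probe numbers mean in practice, here is a small back-of-the-envelope sketch. The Probe class is a hypothetical helper, and the estimate is a rough upper bound that ignores probe timeouts: a failure can begin right after a successful probe, so it takes up to failureThreshold full periods before the kubelet has seen enough consecutive failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def earliest_first_check(self) -> int:
        """Seconds after container start before the first probe is sent."""
        return self.initial_delay_seconds

    def worst_case_detection(self) -> int:
        """Rough upper bound (seconds) from 'app breaks' to 'kubelet reacts'."""
        return self.period_seconds * self.failure_threshold


# Values copied from the Deployment manifest above.
liveness = Probe(initial_delay_seconds=10, period_seconds=10, failure_threshold=3)
readiness = Probe(initial_delay_seconds=5, period_seconds=5, failure_threshold=3)

if __name__ == "__main__":
    print("liveness: restart after up to", liveness.worst_case_detection(), "s")
    print("readiness: removed after up to", readiness.worst_case_detection(), "s")
```

With the values in the manifest, a hung container may keep running for roughly 30 seconds before it is restarted, and an unready Pod may keep receiving traffic for roughly 15 seconds. If that is too slow or too aggressive for your app, periodSeconds and failureThreshold are the knobs to adjust.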

4. Service: Creating a Stable Entry Point

A Deployment alone isn’t enough, because Pods can’t be accessed directly in a stable way. We need a Service.

4.1 ClusterIP Example (In-Cluster Access)

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
  labels:
    app: fastapi-app
spec:
  type: ClusterIP
  selector:
    app: fastapi-app
  ports:
    - name: http
      port: 80         # Service port
      targetPort: 8000 # Container port

selector.app: fastapi-app routes traffic to the Pods created by the Deployment.
Within the cluster, the Service is reachable via the DNS name fastapi-service (e.g., http://fastapi-service).
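The selector logic is simple enough to model in a few lines. This is only an illustration of equality-based label matching, not how Kubernetes is implemented, and the Pod names are made up. A Pod is selected when every key/value pair in the selector appears in its labels; extra labels on the Pod do not matter.

```python
from typing import Dict, List


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every selector key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return the names of the Pods a Service with this selector would target."""
    return [pod["name"] for pod in pods if selector_matches(selector, pod["labels"])]


if __name__ == "__main__":
    # Hypothetical Pods; the extra label on the first one is ignored.
    pods = [
        {"name": "fastapi-app-1", "labels": {"app": "fastapi-app", "tier": "api"}},
        {"name": "worker-1", "labels": {"app": "worker"}},
    ]
    print(select_pods({"app": "fastapi-app"}, pods))
```

This is also why a one-character typo in a label silently leaves a Service with no endpoints: nothing fails, the selector just matches zero Pods.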

4.2 Exposing the Service Externally (NodePort / LoadBalancer)

On a local Minikube, you’ll often use NodePort; on managed K8s in the cloud, LoadBalancer is common.

Example: LoadBalancer Service (assuming cloud environment)

apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
spec:
  type: LoadBalancer
  selector:
    app: fastapi-app
  ports:
    - port: 80
      targetPort: 8000

However, for flexible HTTP routing and TLS termination, using an Ingress controller is usually better, so we’ll look at an Ingress example next.

5. Ingress: Connecting External HTTP Traffic to FastAPI

Ingress defines rules for routing HTTP traffic to multiple Services based on URL paths and other conditions. Here, we’ll use a simple setup for a single FastAPI service.

5.1 Simple Ingress (Route `/api` to FastAPI)

A common Ingress controller is NGINX Ingress Controller. The example below assumes that.

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fastapi-ingress
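  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2   # $2 = everything after /api/
spec:
  ingressClassName: nginx   # Change according to your IngressClass
  rules:
    - host: example.local   # Use your actual FQDN in production
      http:
        paths:
          - path: /api(/|$)(.*)
            pathType: ImplementationSpecific   # regex paths are not valid Prefix paths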
            backend:
              service:
                name: fastapi-service
                port:
                  number: 80

host lets you route based on domain names.
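path sends all paths under /api to the FastAPI Service, and the rewrite-target annotation is there to strip the /api prefix before the request reaches FastAPI (so /api/meta is meant to arrive as /meta).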
In reality, you also need TLS settings (a Secret with the certificate), but we’re focusing on core concepts here.
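Regex paths with capture groups are easy to get subtly wrong, so it can help to try the pattern outside the cluster first. The sketch below replays the /api(/|$)(.*) pattern with Python’s re module; it only approximates what NGINX does, and rewrite is a hypothetical helper. The point to notice is that the first group captures just the separator, while the second group holds the rest of the path, which is why the rewrite target for this pattern is normally written with $2.

```python
import re
from typing import Optional

# Same pattern as the Ingress path, anchored at the start of the URI.
API_PATH = re.compile(r"^/api(/|$)(.*)")


def rewrite(path: str) -> Optional[str]:
    """Return the path the backend would see, or None if the rule does not match."""
    match = API_PATH.match(path)
    if match is None:
        return None
    # group(1) is only the separator ("/" or ""); group(2) is the remainder.
    return "/" + match.group(2)


if __name__ == "__main__":
    for candidate in ["/api/meta", "/api", "/api/", "/apix", "/other"]:
        print(candidate, "->", rewrite(candidate))
```

Note that /apix does not match: after /api the pattern requires either a slash or the end of the path, which keeps unrelated prefixes from being routed to the API.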

6. ConfigMap and Secret: Externalizing Config and Secrets

As in previous articles, using pydantic-settings to read config from environment variables is recommended for FastAPI. On Kubernetes, ConfigMap and Secret are the standard way to provide those environment variables.

6.1 ConfigMap (Non-Secret Settings)

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fastapi-config
data:
  APP_NAME: "My FastAPI on K8s"
  ENV: "prod"
  LOG_LEVEL: "info"
  CORS_ORIGINS: "https://app.example.com"

6.2 Secret (Sensitive Settings)

# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: fastapi-secret
type: Opaque
data:
  SECRET_KEY: cHJvZC1zZWNyZXQ=      # base64("prod-secret")
  DATABASE_URL: cG9zdGdyZXNxbCtwc3ljb3BnOi8vdXNlcjpwYXNzQGhvc3Q6NTQzMi9hcHA=  # base64("postgresql+psycopg://user:pass@host:5432/app")

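Note: Values under data must be base64-encoded strings (e.g., echo -n "prod-secret" | base64). Base64 is an encoding, not encryption, so keep real Secret manifests out of version control. If you prefer to write plain strings, the stringData field accepts them and Kubernetes encodes them for you.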
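If you would rather generate or double-check these values from Python than from the shell, the standard library is enough. The helper names below are hypothetical; the encoding itself is plain base64 over the UTF-8 bytes of the value.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a plain string the way a Secret's data field expects it."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value copied from a Secret manifest back to plain text."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(encode_secret_value("prod-secret"))
    print(decode_secret_value("cHJvZC1zZWNyZXQ="))
```

Decoding a value before applying it is a cheap way to catch typos, and it also shows why the -n in the echo command matters: a trailing newline becomes part of the secret and changes the encoded string.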

6.3 Using Them as Environment Variables in the Deployment

# k8s/deployment.yaml (excerpt from env section)
      containers:
        - name: fastapi
          image: my-fastapi:latest
          envFrom:
            - configMapRef:
                name: fastapi-config
            - secretRef:
                name: fastapi-secret
          # You can still define individual env variables as well
          env:
            - name: HOST
              value: "0.0.0.0"
            - name: PORT
              value: "8000"

envFrom injects all keys from the ConfigMap and Secret as environment variables. If the same key also appears under env, the env value takes precedence.
Instead of using an .env file with pydantic-settings, you read these Kubernetes-provided environment variables.
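To make the idea concrete without pulling in extra dependencies, here is a standard-library stand-in for what pydantic-settings does for you. It is a sketch, not the settings class from the earlier articles: the Settings fields mirror the keys in the ConfigMap and Secret above, while the default values and the load_settings helper are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    app_name: str
    env: str
    log_level: str
    cors_origins: List[str]
    secret_key: str
    database_url: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; fail fast on missing secrets."""
    missing = [key for key in ("SECRET_KEY", "DATABASE_URL") if key not in environ]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")
    raw_origins = environ.get("CORS_ORIGINS", "")
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]
    return Settings(
        app_name=environ.get("APP_NAME", "FastAPI"),  # assumed default
        env=environ.get("ENV", "dev"),                # assumed default
        log_level=environ.get("LOG_LEVEL", "info"),   # assumed default
        cors_origins=origins,
        secret_key=environ["SECRET_KEY"],
        database_url=environ["DATABASE_URL"],
    )
```

Failing at startup when a required variable is missing is intentional: the Pod crashes immediately with a clear message in kubectl logs, instead of starting up and failing on the first request that needs the database.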

7. Horizontal Pod Autoscaler (HPA) for Auto-Scaling

Kubernetes’ HPA adjusts the number of Pods up or down to keep a metric, such as average CPU utilization, near a target value.

7.1 Simple HPA Example

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

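averageUtilization: 60 means the HPA tries to keep average CPU usage across the target Pods at 60% of their CPU request (here 60m of the 100m request, not of the 500m limit). It adds Pods when the average is above that and removes them when it is clearly below.
To use HPA, your cluster must have metrics-server or an equivalent metrics pipeline. Some managed clusters ship it by default; on others, and on local clusters such as Minikube or kind, you need to enable or install it yourself.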
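The scaling decision follows a formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces just that core calculation with the min/max clamp; the function name is hypothetical, and the real controller additionally handles missing metrics, Pods that are not ready, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Core HPA calculation: scale proportionally, then clamp to the bounds."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two Pods averaging 90% of their CPU request, target 60%.
    print(desired_replicas(2, 90, 60, min_replicas=2, max_replicas=10))
```

With the manifest above, two Pods averaging 90% utilization lead to three Pods, while a brief reading of 63% changes nothing because it falls inside the tolerance band.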

7.2 Things to Keep in Mind on the FastAPI Side

Prefer async I/O over blocking calls inside async endpoints. A blocking call stalls the event loop, so requests queue up while CPU usage stays low; CPU then stops reflecting load, and CPU-based scaling becomes less effective.
If you expose metrics in Prometheus format, you can later control HPA based on custom metrics (beyond the scope of this article).

8. Example Deployment Flow on a Local Cluster (kubectl-Based)

We’ve looked at the resource YAMLs; now let’s briefly organize the actual steps to deploy them using kubectl.

8.1 Example Workflow

Build the Docker image
```
docker build -t my-fastapi:latest .
```
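Make the image available to the cluster
- Minikube: run eval $(minikube docker-env) before building, so the image is built inside Minikube’s Docker daemon
- kind: run kind load docker-image my-fastapi:latest after building
  or
- Push the image to a registry (ECR/GCR/Docker Hub, etc.) and reference that image name in the Deployment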

Apply the Kubernetes manifests

kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
kubectl apply -f k8s/hpa.yaml

Check the status

kubectl get pods
kubectl get svc
kubectl get ingress
kubectl get hpa

Inspect Pod logs
```
kubectl logs -f deploy/fastapi-app
```

Access via Ingress (Minikube example)

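minikube addons enable ingress   # once, if the NGINX Ingress controller is not enabled yet
minikube ip
# The Ingress rule matches host example.local, so send that Host header:
curl -H "Host: example.local" http://<minikube-ip>/api/meta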

Details vary slightly depending on your local environment (Minikube, kind, Docker Desktop, etc.).

9. Operational Essentials: Rolling Updates and Rollbacks

9.1 Rolling Updates

When you change the image tag or environment variables in the Deployment and kubectl apply the new manifest, a rolling update is triggered.

Behind the scenes, Kubernetes:

Starts a new Pod with the updated spec.
Waits for the readinessProbe to succeed.
Then terminates one old Pod.
Repeats the process, minimizing downtime.
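How many Pods are added or removed at each step depends on the Deployment’s rolling update strategy. The manifest in this article does not set one, so the Kubernetes defaults apply: maxSurge 25% (rounded up) and maxUnavailable 25% (rounded down). The sketch below just does that arithmetic; rollout_budget is a hypothetical helper, not part of any Kubernetes client.

```python
from typing import Dict


def rollout_budget(
    replicas: int,
    max_surge_pct: int = 25,
    max_unavailable_pct: int = 25,
) -> Dict[str, int]:
    """Translate percentage-based rollout limits into Pod counts."""
    max_surge = (replicas * max_surge_pct + 99) // 100      # rounds up
    max_unavailable = replicas * max_unavailable_pct // 100  # rounds down
    return {
        "max_surge": max_surge,
        "max_unavailable": max_unavailable,
        "max_total_pods": replicas + max_surge,
        "min_available_pods": replicas - max_unavailable,
    }


if __name__ == "__main__":
    print(rollout_budget(2))
    print(rollout_budget(10))
```

For replicas: 2 this gives one extra Pod and zero unavailable Pods, which matches the sequence described above: a new Pod must become ready before an old one is removed. Keep in mind that the node needs spare capacity for that extra Pod during the rollout.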

9.2 Rollbacks

If the new version causes issues, you can roll back to a previous revision:

kubectl rollout history deploy/fastapi-app
kubectl rollout undo deploy/fastapi-app

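On the FastAPI side, pay special attention to database migrations. kubectl rollout undo only reverts the Deployment, not your database. If your DB schema changes, decide on a safe order: for example, apply backward-compatible schema migrations first, then roll out the new app version, so that both old and new Pods keep working during the rollout and after a rollback.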

10. Handling Logs and Metrics (A Brief Overview)

10.1 Logs

As a rule of thumb, write structured logs (e.g., JSON) to stdout, and review them through kubectl logs or a log-aggregation platform (Cloud Logging, Elasticsearch, Loki, etc.).
Don’t rely on writing log files inside the container; instead, push logs out to Kubernetes and your logging infrastructure.
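As one possible starting point, JSON logs on stdout need nothing beyond the standard library. This is a minimal sketch: the field names in the payload and the logger name "app" are assumptions, so adapt them to whatever your log platform expects.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Send JSON logs to stdout, where kubectl logs and log agents pick them up."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(level.upper())
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_logging("info")
    log.info("service started on port %s", 8000)
```

One JSON object per line keeps each event intact when the container runtime and log collectors split the stream by line. The level argument could be fed from the LOG_LEVEL key in the ConfigMap.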

10.2 Metrics and Traces

With Prometheus or OpenTelemetry, you can run agents as sidecars or separate Pods.
As your architecture grows into multiple services, distributed tracing (Jaeger, Tempo, etc.) becomes important to identify which service is slow.

This area is deep and broad, so we’ll keep it to keywords here.

11. Common Pitfalls and How to Avoid Them

| Symptom | Likely Cause | Countermeasure |
| --- | --- | --- |
| Pod starts and exits immediately | Missing env vars or wrong Secret name | Check events via `kubectl describe pod`, inspect `env` and logs to verify keys |
| Can’t access the app via Service | Label selector mismatch / incorrect ports | Double-check `labels`/`selector` and `port`/`targetPort` in Deployment and Service |
| Ingress returns 404 | IngressClass or path mismatch | Verify `ingressClassName`, Ingress Controller settings, and path/regex configuration |
| HPA not working | metrics-server not installed | Set up metrics in the cluster (follow your cloud provider’s docs) |
| Brief downtime during rolling update | Readiness probe missing or too lax | Revisit your `/health/readiness` implementation and probe timings to give the app enough startup time |

12. Summary of Benefits by Reader Type

Individual developers
- Concrete templates for Deployment / Service / Ingress / HPA help make “Where do I even start with Kubernetes?” far less intimidating.
- Practicing on a local cluster (Minikube, etc.) makes it easier to imagine deploying to a real production cluster.
Small teams
- You get a “minimal setup checklist” when migrating from Docker Compose to Kubernetes.
- Combining rolling updates and HPA lets you reduce downtime while making your service more resilient to load spikes.
Startup SaaS teams
- With ConfigMap / Secret / HPA and friends, you lay the foundation for an architecture that can safely scale while switching configs per environment.
- You can start imagining a microservice future with separate Deployment / Service / Ingress per service.

13. Wrap-Up: Making FastAPI a “First-Class Citizen” in Kubernetes

We’ve walked through the basic resources and settings required to deploy a FastAPI app on Kubernetes:

Use a Deployment to manage the number and template of Pods, with liveness/readiness probes for health checks.
Use a Service as a stable entry point, and link it to the outside world via Ingress if necessary.
Use ConfigMap and Secret to externalize configuration and secrets, and read them as environment variables in FastAPI.
Use HPA to automatically scale Pods based on CPU utilization.
Use rolling updates and rollbacks to safely upgrade versions and troubleshoot issues.

At first, the amount of YAML might feel overwhelming, but once you understand each component’s role, Kubernetes starts to feel much less scary.

Use this template as a base, and gradually evolve it to fit your own projects.
I’ll be cheering you on as your FastAPI app thrives as a “first-class citizen” in the Kubernetes world.