Step into Kubernetes Production: A Practical FastAPI × Kubernetes Deployment Guide
Deployment / Service / Ingress / ConfigMap / Secret / HPA / Health Checks
Introduction: What You’ll Be Able to Do and Who This Is For
Once you can run your FastAPI app with Docker, the next thing you’ll probably want is “I want to run this on a cluster” or “I want it to scale.” In this article, we’ll deploy a FastAPI app on Kubernetes (K8s) and walk through a near-production setup, using a concrete sample to cover replicas, configuration management, health checks, and auto-scaling.
Target Readers (Very Specific)
- Individual developers (students / working learners)
  You’ve used Docker Compose, but “Kubernetes looks scary and complicated.”
  → We’ll walk together through putting a single FastAPI service on a cluster as a first step.
- Engineers in small teams (3–5 members)
  You’re already running Docker images in production, but scaling and rolling updates are becoming painful.
  → Think of this as a “template” you can take home, with Deployment / Service / Ingress / ConfigMap / Secret / HPA all wired together.
- Startup SaaS development teams
  You’re considering future microservices where multiple FastAPI services will run on Kubernetes.
  → This article focuses on a single service, but we’ll also touch on manifest splitting per service, namespaces, and patterns for shared configuration.
Accessibility Notes (On Readability)
- The structure follows an inverted triangle: “overall picture first → core resources (Deployment / Service / Ingress, etc.) → config management → scaling → operational tips → recap.”
- Code and YAML are shown in fixed-width blocks, and very long examples are split by purpose.
- Technical terms are briefly explained on first use, and then terminology is kept consistent to avoid confusion.
- We use ample spacing and headings so that screen readers can easily follow the document structure.
Overall, this article aims for AA-equivalent readability for readers with some technical background.
1. Get the Big Picture of Kubernetes Deployment
First, let’s align our mental model of how a FastAPI app is broken down into resources and runs on Kubernetes.
1.1 Main Actors
- Deployment
  Defines the desired number of Pods (each Pod wraps one or more containers) and their template. It’s the core of rolling updates and self-healing.
- Service
  Because Pod IPs are short-lived, a Service provides a stable, named entry point to your app. Its type can be ClusterIP, NodePort, LoadBalancer, etc.
- Ingress
  Routes external HTTP/HTTPS requests to the appropriate Service. It’s the gateway between the cluster and the outside world.
- ConfigMap / Secret
  Resources for externalizing configuration and secrets such as environment variables, settings, passwords, and tokens.
- Horizontal Pod Autoscaler (HPA)
  A resource that automatically scales the number of Pods based on CPU or custom metrics.
- Liveness / Readiness Probes
  Health-check settings to determine whether the container is “alive” and “ready to receive traffic.”
1.2 Minimal Setup We’ll Build This Time
- Use an existing FastAPI Docker image (e.g., my-fastapi:latest).
- Prepare the following Kubernetes manifests:
  - Deployment (FastAPI container + liveness/readiness probes)
  - Service (ClusterIP / NodePort, etc.)
  - Ingress (routing for /api)
  - ConfigMap (non-secret configuration)
  - Secret (DB passwords, etc.)
  - HPA (CPU-based auto-scaling)
- Run and verify the app on a local cluster (Minikube, kind) or a managed K8s service (EKS/GKE/AKS, etc.).
2. Confirming the Assumptions of the FastAPI Image
Before diving into the Kubernetes side, let’s confirm how the FastAPI container image is structured.
2.1 Typical Dockerfile (Review)
FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --upgrade pip && pip install -r requirements.txt
COPY app /app/app
# Run as non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
ENV HOST=0.0.0.0 \
PORT=8000
EXPOSE 8000
CMD ["bash", "-lc", "exec uvicorn app.main:app --host ${HOST} --port ${PORT}"]
Here we’re launching with uvicorn directly, but you can also use Gunicorn + UvicornWorker (we won’t go into those details here).
2.2 Assumptions on the Kubernetes Side
- The container starts with working directory /app.
- HOST and PORT can be switched via environment variables.
- Health-check endpoints such as /health/liveness and /health/readiness are already implemented (as introduced in earlier articles).
We’ll write the manifests based on these assumptions.
3. Deployment: Placing FastAPI Containers on the Cluster
3.1 Basic Deployment Manifest
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
  labels:
    app: fastapi-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
    spec:
      containers:
        - name: fastapi
          image: my-fastapi:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
          env:
            - name: HOST
              value: "0.0.0.0"
            - name: PORT
              value: "8000"
          livenessProbe:
            httpGet:
              path: /health/liveness
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
Key Points
- replicas: 2 maintains two Pods at all times (minimal redundancy).
- selector.matchLabels and template.metadata.labels must match to link the Deployment to its Pods.
- livenessProbe and readinessProbe perform health checks on FastAPI:
  - If liveness fails → the kubelet restarts the container.
  - If readiness fails → this Pod is removed from the Service’s endpoints so it doesn’t receive new requests.
- resources defines resource requests and limits. The scheduler uses the requests to place Pods on nodes, and the HPA measures CPU utilization relative to the CPU request.
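The label-matching rule is easy to get wrong, so here is a small sketch of how an equality-based selector such as matchLabels behaves. The function name `selector_matches` is hypothetical; it only models the rule and does not talk to a cluster.

```python
def selector_matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Return True when every selector key/value pair appears in the Pod's labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


deployment_selector = {"app": "fastapi-app"}

# Labels from template.metadata.labels in the manifest above.
print(selector_matches(deployment_selector, {"app": "fastapi-app"}))                  # True
# Extra labels on the Pod are fine; only the selector's pairs must be present.
print(selector_matches(deployment_selector, {"app": "fastapi-app", "tier": "api"}))  # True
# A typo in the label value means the Pod is not selected.
print(selector_matches(deployment_selector, {"app": "fastapi"}))                      # False
```

The same rule applies when a Service selects Pods, which is why a one-character label mismatch can leave a Service with no endpoints.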
4. Service: Creating a Stable Entry Point
A Deployment alone isn’t enough, because Pods can’t be accessed directly in a stable way. We need a Service.
4.1 ClusterIP Example (In-Cluster Access)
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
  labels:
    app: fastapi-app
spec:
  type: ClusterIP
  selector:
    app: fastapi-app
  ports:
    - name: http
      port: 80          # Service port
      targetPort: 8000  # Container port
- selector.app: fastapi-app routes traffic to the Pods created by the Deployment.
- Within the cluster, the Service is reachable via the DNS name fastapi-service (e.g., http://fastapi-service).
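The short name fastapi-service resolves from Pods in the same namespace. From another namespace you would typically use the fully qualified form, which the sketch below builds. The helper `service_url` is hypothetical, and the `default` namespace and `cluster.local` cluster domain are assumptions (they are common defaults, but your cluster may differ).

```python
def service_url(
    name: str,
    namespace: str = "default",           # assumption: Service lives in "default"
    port: int = 80,
    cluster_domain: str = "cluster.local",  # assumption: default cluster domain
) -> str:
    """Build the in-cluster URL <service>.<namespace>.svc.<cluster-domain>."""
    host = f"{name}.{namespace}.svc.{cluster_domain}"
    # Omit the port when it is the HTTP default, to keep URLs readable.
    return f"http://{host}" if port == 80 else f"http://{host}:{port}"


print(service_url("fastapi-service"))
# http://fastapi-service.default.svc.cluster.local
```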
4.2 Exposing the Service Externally (NodePort / LoadBalancer)
On a local Minikube, you’ll often use NodePort; on managed K8s in the cloud, LoadBalancer is common.
Example: LoadBalancer Service (assuming cloud environment)
apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
spec:
  type: LoadBalancer
  selector:
    app: fastapi-app
  ports:
    - port: 80
      targetPort: 8000
However, for flexible HTTP routing and TLS termination, using an Ingress controller is usually better, so we’ll look at an Ingress example next.
5. Ingress: Connecting External HTTP Traffic to FastAPI
Ingress defines rules for routing HTTP traffic to multiple Services based on URL paths and other conditions. Here, we’ll use a simple setup for a single FastAPI service.
5.1 Simple Ingress (Route /api to FastAPI)
A common Ingress controller is NGINX Ingress Controller. The example below assumes that.
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fastapi-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    # $2 is the second capture group in the path below (everything after /api/)
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx   # Change according to your IngressClass
  rules:
    - host: example.local   # Use your actual FQDN in production
      http:
        paths:
          - path: /api(/|$)(.*)
            pathType: ImplementationSpecific   # regex paths are not valid Prefix paths
            backend:
              service:
                name: fastapi-service
                port:
                  number: 80
- host lets you route based on domain names.
- path sends all paths under /api to the FastAPI Service, and the rewrite-target annotation strips the /api prefix before the request reaches the app.
- In reality, you also need TLS settings (a Secret with the certificate), but we’re focusing on core concepts here.
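To see what the path pattern captures, the sketch below replays it with Python's `re` module. It only approximates the controller's behavior (NGINX matching is case-insensitive, for instance), and `rewritten_path` is a hypothetical helper. Note that the first capture group holds only the separator (`/` or nothing), while the second holds the remainder of the path, so a rewrite that strips the prefix is built from the second group.

```python
import re
from typing import Optional

# Same pattern as the Ingress path, anchored at the start of the request path.
API_PATH = re.compile(r"^/api(/|$)(.*)")


def rewritten_path(request_path: str) -> Optional[str]:
    """Return the path the backend would see, or None if the rule does not match."""
    match = API_PATH.match(request_path)
    if match is None:
        return None
    # group(1) is "/" or "", group(2) is everything after "/api/".
    return "/" + match.group(2)


print(rewritten_path("/api/meta"))      # /meta
print(rewritten_path("/api"))           # /
print(rewritten_path("/api/v1/items"))  # /v1/items
print(rewritten_path("/apix"))          # None
```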
6. ConfigMap and Secret: Externalizing Config and Secrets
As in previous articles, using pydantic-settings to read config from environment variables is recommended for FastAPI. On Kubernetes, ConfigMap and Secret are the standard way to provide those environment variables.
6.1 ConfigMap (Non-Secret Settings)
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fastapi-config
data:
  APP_NAME: "My FastAPI on K8s"
  ENV: "prod"
  LOG_LEVEL: "info"
  CORS_ORIGINS: "https://app.example.com"
6.2 Secret (Sensitive Settings)
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: fastapi-secret
type: Opaque
data:
  SECRET_KEY: cHJvZC1zZWNyZXQ=  # base64("prod-secret")
  # base64("postgresql+psycopg://user:pass@host:5432/app")
  DATABASE_URL: cG9zdGdyZXNxbCtwc3ljb3BnOi8vdXNlcjpwYXNzQGhvc3Q6NTQzMi9hcHA=
Note: Values under data must be base64-encoded strings (e.g., echo -n "prod-secret" | base64). Base64 is an encoding, not encryption, so treat the manifest itself as sensitive.
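If you'd rather prepare or double-check the values from Python instead of the shell, a small sketch with the standard library could look like this. The helper names are hypothetical.

```python
import base64


def encode_secret_value(plain: str) -> str:
    """Encode a plain-text value the way the Secret's data field expects it."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value copied from a Secret manifest, e.g. to check for typos."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("prod-secret"))       # cHJvZC1zZWNyZXQ=
print(decode_secret_value("cHJvZC1zZWNyZXQ="))  # prod-secret
```

Decoding a value before applying the manifest is a cheap way to catch a mistyped connection string or a stray trailing newline.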
6.3 Using Them as Environment Variables in the Deployment
# k8s/deployment.yaml (excerpt from env section)
containers:
  - name: fastapi
    image: my-fastapi:latest
    envFrom:
      - configMapRef:
          name: fastapi-config
      - secretRef:
          name: fastapi-secret
    # You can still define individual env variables as well
    env:
      - name: HOST
        value: "0.0.0.0"
      - name: PORT
        value: "8000"
- envFrom injects all keys from the ConfigMap and Secret as environment variables.
- Instead of using an .env file with pydantic-settings, you read these Kubernetes-provided environment variables.
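To make the mapping concrete, here is a standard-library sketch of what the container sees. It is a stand-in, not the pydantic-settings approach recommended above: with pydantic-settings you would declare fields with matching names and let the library do the parsing. The `Settings` class, `load_settings` function, and the default values are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    app_name: str
    env: str
    log_level: str
    cors_origins: list[str]
    secret_key: str
    database_url: str


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read the keys injected by envFrom (ConfigMap + Secret) from the environment."""
    raw_origins = environ.get("CORS_ORIGINS", "")
    return Settings(
        app_name=environ.get("APP_NAME", "FastAPI"),
        env=environ.get("ENV", "dev"),
        log_level=environ.get("LOG_LEVEL", "info"),
        # Assumption: multiple origins are comma-separated in the ConfigMap value.
        cors_origins=[o.strip() for o in raw_origins.split(",") if o.strip()],
        # Required values: a KeyError here fails fast if the Secret is missing.
        secret_key=environ["SECRET_KEY"],
        database_url=environ["DATABASE_URL"],
    )
```

Failing fast on a missing secret makes a misconfigured Pod crash at startup with a clear error in the logs, instead of failing later on the first request.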
7. Horizontal Pod Autoscaler (HPA) for Auto-Scaling
Kubernetes’ HPA adjusts the number of Pods so that a metric such as average CPU utilization stays near a specified target.
7.1 Simple HPA Example
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
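- averageUtilization: 60 means “scale out if the average CPU usage across the Pods exceeds 60% of their CPU request” (with requests.cpu: 100m, that is roughly 60m per Pod).
- To use HPA, your cluster must have metrics-server or an equivalent metrics setup. Many managed K8s clusters ship one by default; on Minikube you can enable it with minikube addons enable metrics-server.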
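The scaling decision follows the formula described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below works through that arithmetic. It ignores stabilization windows, unready Pods, and missing metrics, and the function name is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,  # average CPU utilization in percent of request
    target_utilization: float,   # averageUtilization from the HPA spec
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,      # default tolerance per the Kubernetes docs
) -> int:
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: no scaling action.
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured in the HPA.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, 90, 60, 2, 10))   # 3  (scale out)
print(desired_replicas(2, 63, 60, 2, 10))   # 2  (within tolerance)
print(desired_replicas(4, 15, 60, 2, 10))   # 2  (scale in, clamped to minReplicas)
print(desired_replicas(8, 120, 60, 2, 10))  # 10 (clamped to maxReplicas)
```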
7.2 Things to Keep in Mind on the FastAPI Side
- Design your app to leverage async I/O rather than relying heavily on blocking I/O, so CPU usage reflects load more clearly and scaling becomes more effective.
- If you expose metrics in Prometheus format, you can later control HPA based on custom metrics (beyond the scope of this article).
8. Example Deployment Flow on a Local Cluster (kubectl-Based)
We’ve looked at the resource YAMLs; now let’s walk through the steps to deploy them with kubectl.
8.1 Example Workflow
- Build the Docker image
  docker build -t my-fastapi:latest .
- Make the image available to the cluster (for Minikube)
  - Run eval $(minikube docker-env) before building, or
  - Push the image to a registry (ECR/GCR/Docker Hub, etc.)
- Apply the Kubernetes manifests
  kubectl apply -f k8s/configmap.yaml
  kubectl apply -f k8s/secret.yaml
  kubectl apply -f k8s/deployment.yaml
  kubectl apply -f k8s/service.yaml
  kubectl apply -f k8s/ingress.yaml
  kubectl apply -f k8s/hpa.yaml
- Check the status
  kubectl get pods
  kubectl get svc
  kubectl get ingress
  kubectl get hpa
- Inspect Pod logs
  kubectl logs -f deploy/fastapi-app
- Access via Ingress (Minikube example)
  # Requires an Ingress controller (on Minikube: minikube addons enable ingress)
  minikube ip
  # The Ingress rule matches host example.local, so send that Host header:
  curl -H "Host: example.local" http://<minikube-ip>/api/meta
Details vary slightly depending on your local environment (Minikube, kind, Docker Desktop, etc.).
9. Operational Essentials: Rolling Updates and Rollbacks
9.1 Rolling Updates
When you change the image tag or environment variables in the Deployment and kubectl apply the new manifest, a rolling update is triggered.
Behind the scenes, Kubernetes:
- Starts a new Pod with the updated spec.
- Waits for the readinessProbe to succeed.
- Then terminates one old Pod.
- Repeats the process, minimizing downtime.
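How many Pods may be added or removed at once is governed by the Deployment's maxSurge and maxUnavailable settings. Assuming the manifest leaves them at their defaults (25% each, with maxSurge rounded up and maxUnavailable rounded down), the bounds can be sketched as follows. The function name `rollout_bounds` is hypothetical.

```python
import math


def rollout_bounds(
    replicas: int,
    max_surge_pct: int = 25,        # Deployment default
    max_unavailable_pct: int = 25,  # Deployment default
) -> dict:
    """Compute Pod-count bounds during a rolling update."""
    surge = math.ceil(replicas * max_surge_pct / 100)               # rounded up
    unavailable = math.floor(replicas * max_unavailable_pct / 100)  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(2))   # {'max_total_pods': 3, 'min_available_pods': 2}
print(rollout_bounds(10))  # {'max_total_pods': 13, 'min_available_pods': 8}
```

With replicas: 2 this gives one extra Pod at a time and no drop below two available Pods, which matches the one-by-one replacement described above.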
9.2 Rollbacks
If the new version causes issues, you can roll back to a previous revision:
kubectl rollout history deploy/fastapi-app
kubectl rollout undo deploy/fastapi-app
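On the FastAPI side, pay special attention to database migrations. During a rolling update, old and new Pods run side by side, and a rollback brings the old code back, so schema changes should stay compatible with both versions. Decide on a safe order: for example, apply backward-compatible migrations first, then roll out the new app version.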
10. Handling Logs and Metrics (A Brief Overview)
10.1 Logs
- As a rule of thumb, write structured logs (e.g., JSON) to stdout, and review them through kubectl logs or a log-aggregation platform (Cloud Logging, Elasticsearch, Loki, etc.).
- Don’t rely on writing log files inside the container; instead, let Kubernetes and your logging infrastructure collect what the app writes to stdout/stderr.
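As one possible way to do this with only the standard library, the sketch below formats each log record as a single JSON line on stdout. The field names and the `JsonFormatter` / `configure_logging` helpers are hypothetical choices; a dedicated structured-logging library would work just as well.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as a string so it stays on one JSON line.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def configure_logging(level: str = "INFO") -> None:
    """Send all logs to stdout so the container runtime can collect them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level.upper())


configure_logging()
logging.getLogger("app").info("service started")
```

The `level` argument could be fed from the LOG_LEVEL key of the ConfigMap, so log verbosity changes without rebuilding the image.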
10.2 Metrics and Traces
- With Prometheus or OpenTelemetry, you can run agents as sidecars or separate Pods.
- As your architecture grows into multiple services, distributed tracing (Jaeger, Tempo, etc.) becomes important to identify which service is slow.
This area is deep and broad, so we’ll keep it to keywords here.
11. Common Pitfalls and How to Avoid Them
| Symptom | Likely Cause | Countermeasure |
|---|---|---|
| Pod starts and exits immediately | Missing env vars or wrong Secret name | Check events via kubectl describe pod, inspect env and logs to verify keys |
| Can’t access the app via Service | Label selector mismatch / incorrect ports | Double-check labels/selector and port/targetPort in Deployment and Service |
| Ingress returns 404 | IngressClass or path mismatch | Verify ingressClassName, Ingress Controller settings, and path/regex configuration |
| HPA not working | metrics-server not installed | Set up metrics in the cluster (follow your cloud provider’s docs) |
| Brief downtime during rolling update | Readiness probe missing or too lax | Revisit your /health/readiness implementation and probe timings to give the app enough startup time |
12. Summary of Benefits by Reader Type
- Individual developers
  - Concrete templates for Deployment / Service / Ingress / HPA help make “Where do I even start with Kubernetes?” far less intimidating.
  - Practicing on a local cluster (Minikube, etc.) makes it easier to imagine deploying to a real production cluster.
- Small teams
  - You get a “minimal setup checklist” when migrating from Docker Compose to Kubernetes.
  - Combining rolling updates and HPA lets you reduce downtime while making your service more resilient to load spikes.
- Startup SaaS teams
  - With ConfigMap / Secret / HPA and friends, you lay the foundation for an architecture that can safely scale while switching configs per environment.
  - You can start imagining a microservice future with separate Deployment / Service / Ingress per service.
13. References (Mostly Official Docs)
- Kubernetes
- FastAPI
- Local K8s
14. Wrap-Up: Making FastAPI a “First-Class Citizen” in Kubernetes
We’ve walked through the basic resources and settings required to deploy a FastAPI app on Kubernetes:
- Use a Deployment to manage the number and template of Pods, with liveness/readiness probes for health checks.
- Use a Service as a stable entry point, and link it to the outside world via Ingress if necessary.
- Use ConfigMap and Secret to externalize configuration and secrets, and read them as environment variables in FastAPI.
- Use HPA to automatically scale Pods based on CPU utilization.
- Use rolling updates and rollbacks to safely upgrade versions and troubleshoot issues.
At first, the amount of YAML might feel overwhelming, but once you understand each component’s role, Kubernetes starts to feel much less scary.
Use this template as a base, and gradually evolve it to fit your own projects.
I’ll be cheering you on as your FastAPI app thrives as a “first-class citizen” in the Kubernetes world.
