Fail-Proof DB Migrations: Evolving Schemas with FastAPI × SQLAlchemy × Alembic — Autogenerate, Manual Tuning, Data Migrations, Transactions, Testing & CI
Summary (big picture first)
- Manage schema versions with Alembic and safely reflect FastAPI + SQLAlchemy changes.
- Combine autogenerate with manual edits; handle multi-environment switching, data-migration scripts, and transaction semantics.
- From local to production: checklists, how to wire into tests and CI, and common operational pitfalls.
Who benefits
- Learner A (solo dev): tweaked models and it runs locally, but breaks elsewhere. Wants to run migrations correctly.
- Small team B (3-person contract shop): frequent spec changes; tracking DB diffs by hand. Wants to standardize autogenerate + review.
- SaaS team C (startup): wants safe rollout to staging and prod, and instant rollback if something goes wrong.
Accessibility assessment
- Structured with short paragraphs and bullet points per section; code is monospaced with light comments.
- Brief explanations for first-use terms; numbered steps to make order easy to follow.
- Target level roughly AA.
1. Assumptions & goals
- ORM: SQLAlchemy 2.x style; schema migrations with Alembic.
- You can learn on SQLite, but production parity assumes an RDB like PostgreSQL.
- Goal: standardize model change → diff generation → review → apply → verify so anyone gets the same result.
Decision points
- Always use a transactional DB in production.
- Autogenerate is handy, but index names, default values, and server-side functions often need manual touch-ups.
2. Initialize: bring in Alembic and get it running
2.1 Dependencies
SQLAlchemy>=2.0
alembic
psycopg[binary] # Example when using PostgreSQL
2.2 Directory layout (example)
fastapi-app/
├─ app/
│ ├─ main.py
│ ├─ db.py
│ ├─ models.py
│ └─ settings.py
├─ alembic/
│ ├─ env.py
│ ├─ script.py.mako
│ └─ versions/
└─ alembic.ini
2.3 Init command
alembic init alembic
What gets generated
- `alembic.ini`: includes a DB URL template.
- `alembic/env.py`: the entrypoint during migrations. Load the SQLAlchemy `Base` here so autogenerate can “see” your target tables.
3. DB settings in the app and Base
3.1 Settings
# app/settings.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    database_url: str = "postgresql+psycopg://user:pass@localhost:5432/appdb"

    # pydantic-settings v2 style; on pydantic v1 use an inner `class Config` instead
    model_config = SettingsConfigDict(env_file=".env")

def get_settings() -> Settings:
    return Settings()
3.2 Session & Base
# app/db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, DeclarativeBase

class Base(DeclarativeBase):
    pass

def get_engine(url: str):
    return create_engine(url, pool_pre_ping=True)

def make_session_factory(url: str):
    return sessionmaker(bind=get_engine(url), autoflush=False, autocommit=False)
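In FastAPI, the session factory above is typically wrapped in a dependency that opens one session per request and always closes it. A minimal sketch, self-contained with an in-memory SQLite URL (in the app you would pass `get_settings().database_url` instead):

```python
# Sketch (assumption): a request-scoped session dependency built on the
# make_session_factory helper; recreated here so the snippet runs standalone.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

def make_session_factory(url: str):
    engine = create_engine(url, pool_pre_ping=True)
    return sessionmaker(bind=engine, autoflush=False, autocommit=False)

SessionLocal = make_session_factory("sqlite://")  # illustrative URL only

def get_db():
    """FastAPI dependency: yield one Session per request, always close it."""
    db: Session = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# In a route you would write: def list_articles(db: Session = Depends(get_db)): ...
```

The `try/finally` guarantees the connection returns to the pool even if the request handler raises.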
3.3 Model example
# app/models.py
from datetime import datetime

from sqlalchemy import String, Integer, DateTime, func
from sqlalchemy.orm import Mapped, mapped_column

from app.db import Base

class Article(Base):
    __tablename__ = "articles"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    title: Mapped[str] = mapped_column(String(200), index=True, nullable=False)
    body: Mapped[str] = mapped_column(String, nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), server_default=func.now(), nullable=False
    )
4. Wire env.py to the FastAPI project
Add code in `alembic/env.py` to import the app’s Base and DB URL.
# alembic/env.py (excerpt)
from logging.config import fileConfig

from sqlalchemy import engine_from_config, pool

from alembic import context

# 1) Make the app importable, then pull in its Base and settings
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
from app.db import Base
from app.settings import get_settings

config = context.config
# Override the placeholder URL from alembic.ini with the app's setting
if config.get_main_option("sqlalchemy.url") in (None, "", "sqlite:///"):
    config.set_main_option("sqlalchemy.url", get_settings().database_url)
target_metadata = Base.metadata
fileConfig(config.config_file_name)

def run_migrations_offline():
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        compare_type=True,  # detect type changes
        compare_server_default=True,  # detect server-default changes
    )
    with context.begin_transaction():
        context.run_migrations()

def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            compare_type=True,
            compare_server_default=True,
        )
        with context.begin_transaction():
            context.run_migrations()

if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
Key points
- Enable `compare_type=True` and `compare_server_default=True` to stabilize diff detection.
- Pull the URL from the environment or `.env`, overriding the `alembic.ini` default.
5. Create the initial migration
5.1 Create base
alembic revision --autogenerate -m "init"
# Review → apply
alembic upgrade head
`alembic/versions/<rev>_init.py` gets generated. Review tables and indexes before applying.
5.2 Common manual tweaks
- Mind DB specifics when changing string lengths (shrinking may lose data).
- Expressions like `server_default=func.now()` differ by dialect; hand-specify if needed.
- For composite indexes and unique constraints, set naming conventions and make them explicit.
6. Make a change and run diffs
As an example, add a `slug` column to Article with a unique constraint.
6.1 Model change
# app/models.py (addition)
from sqlalchemy import UniqueConstraint

class Article(Base):
    __tablename__ = "articles"
    # existing columns omitted

    # Don't also pass unique=True here, or you get a second, auto-named
    # unique constraint alongside the explicit one below.
    slug: Mapped[str] = mapped_column(String(200), nullable=False)

    __table_args__ = (UniqueConstraint("slug", name="uq_articles_slug"),)
6.2 Create diff → review → apply
alembic revision --autogenerate -m "add slug to articles"
alembic upgrade head
The script will contain `op.add_column` and `op.create_unique_constraint`. Check names, ordering, and the sanity of the downgrade step.
7. Data migrations (how to write them)
If a schema change needs initial values or transforms for existing rows, put data ops in the same revision.
# alembic/versions/xxxx_add_slug.py (excerpt)
from alembic import op
import sqlalchemy as sa

def upgrade():
    # Safe order: allow NULL → backfill existing rows → make NOT NULL
    op.add_column("articles", sa.Column("slug", sa.String(length=200), nullable=True))
    conn = op.get_bind()
    articles = conn.execute(sa.text("SELECT id, title FROM articles")).mappings().all()
    for row in articles:
        slug = row["title"].lower().replace(" ", "-")
        conn.execute(sa.text("UPDATE articles SET slug=:s WHERE id=:i"), {"s": slug, "i": row["id"]})
    op.alter_column("articles", "slug", nullable=False)
    op.create_unique_constraint("uq_articles_slug", "articles", ["slug"])

def downgrade():
    op.drop_constraint("uq_articles_slug", "articles", type_="unique")
    op.drop_column("articles", "slug")
Notes
- With existing data, apply in stages: add column → backfill → add constraints.
- Large updates can cause long locks; split into batches or schedule off-hours.
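The batching advice can be sketched as a helper that updates a bounded chunk per loop. The slugify rule and batch size are illustrative; inside a real revision you would pass `op.get_bind()` as the connection. Here an in-memory SQLite demo makes it runnable:

```python
# Sketch: batched backfill to avoid one long-running UPDATE and its locks.
import sqlalchemy as sa

def backfill_slugs(conn, batch_size: int = 1000) -> None:
    """Backfill NULL slugs a batch at a time until none remain."""
    while True:
        rows = conn.execute(
            sa.text("SELECT id, title FROM articles WHERE slug IS NULL LIMIT :n"),
            {"n": batch_size},
        ).mappings().all()
        if not rows:
            break
        for row in rows:
            slug = row["title"].lower().replace(" ", "-")  # illustrative rule
            conn.execute(
                sa.text("UPDATE articles SET slug = :s WHERE id = :i"),
                {"s": slug, "i": row["id"]},
            )

# Demo on an in-memory SQLite DB
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, slug TEXT)"))
    conn.execute(sa.text("INSERT INTO articles (title) VALUES ('Hello World'), ('Alembic Rocks')"))
    backfill_slugs(conn, batch_size=1)
    slugs = [r[0] for r in conn.execute(sa.text("SELECT slug FROM articles ORDER BY id"))]
```

In PostgreSQL you could also commit between batches (or run outside one big transaction) so locks are released as you go.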
8. Transactions & rollback
- In PostgreSQL and similar, each migration generally runs in a transaction and auto-rolls back on failure.
- For heavyweight ops (e.g., type changes on huge tables), investigate impact in advance.
- SQLite’s DDL transaction behavior is weaker; autogenerate results can vary with constraints. Validate on a production-parity RDB.
9. Operating across environments (dev/stg/prod)
- dev: freely create/apply migrations; validate with seed data.
- stg: apply tests on prod-equivalent DB; measure duration and lock time.
- prod: coordinate with maintenance windows or rolling deploys; watch monitors and alerts as you run.
Tips
- `alembic current` shows the current version; `alembic history` shows revision history.
- Emergency rollback via `alembic downgrade -1`, but avoid data-losing downgrades. Prefer forward fixes.
10. Handling branches & merges
When multiple development lines create revisions in parallel, the version graph can split into multiple heads.
- To avoid conflicts, merge to the main branch often; when heads do split, unify them with `alembic merge heads -m "merge heads"`.
- In the merge revision, ensure both diffs apply in a correct order.
- Tight naming conventions, careful review, and aligning generation timing among devs reduce trouble.
11. Should FastAPI auto-apply migrations at startup?
- For local development, running `upgrade head` on startup is convenient.
- In production, prefer human review + manual execution, or an explicit CI/CD pipeline step.
- Running `alembic upgrade head` in a container entrypoint and aborting startup on failure works well with IaC.
Sample (shell):
set -e
alembic upgrade head
exec uvicorn app.main:app --host 0.0.0.0 --port 8000
12. Integrate into testing
- For unit tests: create a test DB → `alembic upgrade head` → run tests → rollback/teardown to keep it clean each time.
- For migrations that touch existing data, prepare snapshot-like dummy data and verify `upgrade` success and post-state consistency.
- Confirm reentrancy: failing then rerunning shouldn’t corrupt results.
pytest sketch:
import os
import subprocess

def setup_module():
    # Point Alembic at a dedicated test DB, then bring it to head
    os.environ["DATABASE_URL"] = "postgresql+psycopg://user:pass@localhost:5432/testdb"
    subprocess.check_call(["alembic", "upgrade", "head"])

def teardown_module():
    # Return the test DB to a clean slate for the next run
    subprocess.check_call(["alembic", "downgrade", "base"])
13. CI/CD flow
- If a PR includes an `alembic revision --autogenerate` output, have CI run `alembic upgrade head` to validate it.
- After deploying to staging, record migration runtime and logs; relax alert thresholds before prod.
- Clarify the release order: DB migration → app rollout.
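A useful CI guard is failing the build when the revision history has split into multiple heads. In a real pipeline you would run `alembic heads` and fail on more than one line of output; the sketch below models the same check on a toy revision graph (`{revision: down_revision}`) so the logic is visible:

```python
# Sketch: detect the "multiple heads" condition that forces an `alembic merge`.
from typing import Dict, List, Optional

def find_heads(revisions: Dict[str, Optional[str]]) -> List[str]:
    """A head is any revision that no other revision points down to."""
    parents = {down for down in revisions.values() if down is not None}
    return sorted(r for r in revisions if r not in parents)

# Hypothetical revision ids for illustration
linear = {"a1": None, "b2": "a1", "c3": "b2"}  # one head: c3
split = {"a1": None, "b2": "a1", "c3": "a1"}   # two branches off a1 → two heads
```

Wiring `len(find_heads(...)) == 1` (or the equivalent `alembic heads` check) into CI catches split histories before they reach staging.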
14. Small operational tips
- Naming: make intent obvious (`create_<table>`, `add_<col>_to_<table>`, `drop_<col>_from_<table>`) so history reads well.
- Split big changes into multiple steps; don’t do everything at once.
- For heavy index builds, consider parallel/online options (PostgreSQL `CONCURRENTLY`), though Alembic will need manual SQL.
- Always measure on a staging DB close to production size.
15. Common pitfalls and remedies
| Symptom | Cause | Remedy |
|---|---|---|
| Autogenerate sees nothing | Base not imported; `target_metadata` unset | Ensure `Base.metadata` is passed in `env.py` |
| Shrinking string length fails | Existing data too long | Query for overflow rows; agree on trimming policy; migrate in stages |
| Server default keeps diffing | Dialect differences / function forms | Keep `compare_server_default`, but pin function names by hand |
| Long locks | Rewriting huge tables | Consider online methods, staged runs, off-hours, concurrent index builds |
| Downgrade breaks things | Destructive ops | Prefer forward fixes; if rollback is mandatory, take backups beforehand |
16. Concrete example: introducing a tag table and associations
We’ll do many-to-many between articles and tags in stages:
1. Create a `tags` table
2. Create the join table `article_tags` (composite PK)
3. Attach tags to existing articles (data migration)
4. Add required indexes
Model changes (excerpt):
# app/models.py (added example)
from sqlalchemy import Table, Column, ForeignKey
from sqlalchemy.orm import relationship

article_tags = Table(
    "article_tags",
    Base.metadata,
    Column("article_id", ForeignKey("articles.id", ondelete="CASCADE"), primary_key=True),
    Column("tag_id", ForeignKey("tags.id", ondelete="CASCADE"), primary_key=True),
)

class Tag(Base):
    __tablename__ = "tags"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), unique=True, nullable=False)

class Article(Base):
    __tablename__ = "articles"
    # existing columns omitted

    tags = relationship("Tag", secondary=article_tags, backref="articles")
Generate the diff → review join-table indexes and the `ondelete` policy → add data-load SQL to `upgrade` if needed.
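The data-migration step (attaching tags to existing articles) can be sketched as a helper you would call from `upgrade()` with `conn = op.get_bind()`. The "untagged" default policy is purely illustrative; the in-memory SQLite demo below only exists to make the snippet runnable:

```python
# Sketch: backfill the join table so every existing article has at least one tag.
import sqlalchemy as sa

def attach_default_tag(conn) -> None:
    """Create an 'untagged' tag and link every article that has no tags yet."""
    conn.execute(sa.text("INSERT INTO tags (name) VALUES ('untagged')"))
    tag_id = conn.execute(
        sa.text("SELECT id FROM tags WHERE name = 'untagged'")
    ).scalar_one()
    conn.execute(
        sa.text(
            "INSERT INTO article_tags (article_id, tag_id) "
            "SELECT id, :t FROM articles "
            "WHERE id NOT IN (SELECT article_id FROM article_tags)"
        ),
        {"t": tag_id},
    )

# Demo on an in-memory SQLite DB with a minimal version of the three tables
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)"))
    conn.execute(sa.text("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"))
    conn.execute(sa.text(
        "CREATE TABLE article_tags (article_id INTEGER, tag_id INTEGER, "
        "PRIMARY KEY (article_id, tag_id))"
    ))
    conn.execute(sa.text("INSERT INTO articles (title) VALUES ('a'), ('b')"))
    attach_default_tag(conn)
    links = conn.execute(sa.text("SELECT COUNT(*) FROM article_tags")).scalar_one()
```

Using plain SQL via the migration's connection (rather than ORM models) keeps the revision independent of future model changes.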
17. Roadmap
- Initialize Alembic and tidy up `env.py`.
- Create the first table set via autogenerate, review, and apply.
- Make model changes → generate diffs → hand-tune → apply as the team standard.
- Write data migrations in a safe order; measure time/locks in tests and staging.
- Integrate with CI/CD; document branch merges and emergency response steps.
References
- Alembic
- SQLAlchemy
- PostgreSQL
- FastAPI
Wrap-up
- Version-control your schema with Alembic; start with autogenerate and finish by hand.
- For existing data, design staged changes paired with data migrations; always measure transaction/lock impact.
- Make local, staging, and production run the same steps; wire checks into CI/CD.
- Keep branch merges and naming conventions explicit so history stays readable.

This way, model evolution directly maps to safe releases.