
Fail-Proof DB Migrations: Evolving Schemas with FastAPI × SQLAlchemy × Alembic — Autogenerate, Manual Tuning, Data Migrations, Transactions, Testing & CI


Summary (big picture first)

  • Manage schema versions with Alembic and safely reflect FastAPI + SQLAlchemy changes.
  • Combine autogenerate with manual edits; handle multi-environment switching, data-migration scripts, and transaction semantics.
  • From local to production: checklists, how to wire into tests and CI, and common operational pitfalls.

Who benefits

  • Learner A (solo dev)
    Tweaked models and it runs locally, but breaks elsewhere. Wants to run migrations correctly.
  • Small team B (3-person contract shop)
    Frequent spec changes; tracking DB diffs by hand. Wants to standardize autogenerate + review.
  • SaaS team C (startup)
    Wants safe rollout to staging and prod, and instant rollback if something goes wrong.

Accessibility assessment

  • Structured with short paragraphs and bullet points per section; code is monospaced with light comments.
  • Brief explanations for first-use terms; numbered steps to make order easy to follow.
  • Target level roughly AA.

1. Assumptions & goals

  • ORM: SQLAlchemy 2.x style; schema migrations with Alembic.
  • You can learn on SQLite, but production parity assumes an RDB like PostgreSQL.
  • Goal: standardize model change → diff generation → review → apply → verify so anyone gets the same result.

Decision points

  • Always use a transactional DB in production.
  • Autogenerate is handy, but index names, default values, and server-side functions often need manual touch-ups.

2. Initialize: bring in Alembic and get it running

2.1 Dependencies

SQLAlchemy>=2.0
alembic
psycopg[binary]  # Example when using PostgreSQL

2.2 Directory layout (example)

fastapi-app/
├─ app/
│  ├─ main.py
│  ├─ db.py
│  ├─ models.py
│  └─ settings.py
├─ alembic/
│  ├─ env.py
│  ├─ script.py.mako
│  └─ versions/
└─ alembic.ini

2.3 Init command

alembic init alembic

What gets generated

  • alembic.ini includes a DB URL template.
  • alembic/env.py is the entrypoint during migrations. Load SQLAlchemy Base here so autogenerate can “see” your target tables.

3. DB settings in the app and Base

3.1 Settings

# app/settings.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")
    database_url: str = "postgresql+psycopg://user:pass@localhost:5432/appdb"

def get_settings() -> Settings:
    return Settings()

3.2 Session & Base

# app/db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, DeclarativeBase

class Base(DeclarativeBase): pass

def get_engine(url: str):
    return create_engine(url, pool_pre_ping=True)

def make_session_factory(url: str):
    return sessionmaker(bind=get_engine(url), autoflush=False, autocommit=False)
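
For reference, here is a minimal sketch of how the app might consume this session factory as a FastAPI dependency. The get_db wiring and the /health route are illustrative assumptions, not part of the layout above.

# app/main.py (illustrative sketch; get_db and /health are assumptions)
from fastapi import Depends, FastAPI
from sqlalchemy.orm import Session

from app.db import make_session_factory
from app.settings import get_settings

app = FastAPI()
SessionLocal = make_session_factory(get_settings().database_url)

def get_db():
    # One session per request; always closed afterwards
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/health")
def health(db: Session = Depends(get_db)):
    return {"status": "ok"}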

3.3 Model example

# app/models.py
from datetime import datetime

from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy import String, Integer, DateTime, func
from app.db import Base

class Article(Base):
    __tablename__ = "articles"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    title: Mapped[str] = mapped_column(String(200), index=True, nullable=False)
    body: Mapped[str] = mapped_column(String, nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=func.now(), nullable=False)

4. Wire env.py to the FastAPI project

Add code in alembic/env.py to import the app’s Base and DB URL.

# alembic/env.py (excerpt)
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
from alembic import context

# 1) Import the app's Base and settings
import os, sys
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
from app.db import Base
from app.settings import get_settings

config = context.config
# Take the DB URL from the app settings (.env / environment) instead of alembic.ini
config.set_main_option("sqlalchemy.url", get_settings().database_url)

target_metadata = Base.metadata

fileConfig(config.config_file_name)

def run_migrations_offline():
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        compare_type=True,      # detect type changes
        compare_server_default=True  # detect server defaults
    )
    with context.begin_transaction():
        context.run_migrations()

def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section), prefix="sqlalchemy.", poolclass=pool.NullPool
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            compare_type=True,
            compare_server_default=True
        )
        with context.begin_transaction():
            context.run_migrations()

if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()

Key points

  • Enable compare_type=True and compare_server_default=True to stabilize diff detection.
  • Pull URL from env or .env, overriding alembic.ini defaults.

5. Create the initial migration

5.1 Create base

alembic revision --autogenerate -m "init"
# Review → apply
alembic upgrade head

alembic/versions/<rev>_init.py gets generated. Review tables and indexes before applying.

5.2 Common manual tweaks

  • Mind DB specifics when changing string lengths (shrinking may lose data).
  • Expressions like server_default=func.now() differ by dialect; hand-specify if needed.
  • For composite indexes and unique constraints, set naming conventions and make them explicit.
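
One way to make those names explicit is to attach a naming convention to the Base metadata so autogenerate emits predictable constraint and index names. The convention strings below are only one common choice, not a requirement.

# app/db.py (optional sketch: explicit naming conventions for constraints and indexes)
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase

NAMING_CONVENTION = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

class Base(DeclarativeBase):
    # Every table that inherits from Base picks up these naming rules
    metadata = MetaData(naming_convention=NAMING_CONVENTION)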

6. Make a change and run diffs

As an example, add a slug column to Article with a unique constraint.

6.1 Model change

# app/models.py (addition)
from sqlalchemy import UniqueConstraint
class Article(Base):
    __tablename__ = "articles"
    # existing columns omitted
    slug: Mapped[str] = mapped_column(String(200), nullable=False, unique=True)
    __table_args__ = (UniqueConstraint("slug", name="uq_articles_slug"),)

6.2 Create diff → review → apply

alembic revision --autogenerate -m "add slug to articles"
alembic upgrade head

The script will contain op.add_column and op.create_unique_constraint. Check names, ordering, and the sanity of the downgrade step.
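
For comparison, the autogenerated script typically looks roughly like the excerpt below (revision IDs and details will vary). Note that it adds the column as nullable=False in one shot, which fails on a table that already has rows; that is exactly why the next section reorders the steps by hand.

# alembic/versions/<rev>_add_slug_to_articles.py (roughly what autogenerate emits; details vary)
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column("articles", sa.Column("slug", sa.String(length=200), nullable=False))
    op.create_unique_constraint("uq_articles_slug", "articles", ["slug"])

def downgrade():
    op.drop_constraint("uq_articles_slug", "articles", type_="unique")
    op.drop_column("articles", "slug")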


7. Data migrations (how to write them)

If a schema change needs initial values or transforms for existing rows, put data ops in the same revision.

# alembic/versions/xxxx_add_slug.py (excerpt)
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column("articles", sa.Column("slug", sa.String(length=200), nullable=True))
    # Safe order: allow NULL → backfill existing rows → make NOT NULL
    conn = op.get_bind()
    articles = conn.execute(sa.text("SELECT id, title FROM articles")).mappings().all()
    for row in articles:
        slug = row["title"].lower().replace(" ", "-")
        conn.execute(sa.text("UPDATE articles SET slug=:s WHERE id=:i"), {"s": slug, "i": row["id"]})
    op.alter_column("articles", "slug", nullable=False)
    op.create_unique_constraint("uq_articles_slug", "articles", ["slug"])

def downgrade():
    op.drop_constraint("uq_articles_slug", "articles", type_="unique")
    op.drop_column("articles", "slug")

Notes

  • With existing data, apply in stages: add column → backfill → add constraints.
  • Large updates can cause long locks; split into batches or schedule off-hours.
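
If the table is large, the backfill loop above can be rewritten so each statement only touches a bounded number of rows. The sketch below assumes PostgreSQL and an arbitrary batch size of 1000.

# Batched backfill sketch (assumes PostgreSQL; batch size is arbitrary)
from alembic import op
import sqlalchemy as sa

def backfill_slugs_in_batches(batch_size: int = 1000) -> None:
    conn = op.get_bind()
    while True:
        result = conn.execute(
            sa.text(
                "UPDATE articles SET slug = lower(replace(title, ' ', '-')) "
                "WHERE id IN (SELECT id FROM articles WHERE slug IS NULL LIMIT :n)"
            ),
            {"n": batch_size},
        )
        if result.rowcount == 0:  # nothing left to update
            break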

8. Transactions & rollback

  • In PostgreSQL and similar, each migration generally runs in a transaction and auto-rolls back on failure.
  • For heavyweight ops (e.g., type changes on huge tables), investigate impact in advance.
  • SQLite’s DDL transaction behavior is weaker; autogenerate results can vary with constraints. Validate on a production-parity RDB.

9. Operating across environments (dev/stg/prod)

  • dev: freely create/apply migrations; validate with seed data.
  • stg: apply tests on prod-equivalent DB; measure duration and lock time.
  • prod: coordinate with maintenance windows or rolling deploys; watch monitors and alerts as you run.

Tips

  • alembic current for current version; alembic history for revision history.
  • Emergency rollback via alembic downgrade -1, but avoid data-losing downgrades. Prefer forward fixes.

10. Handling branches & merges

When multiple development lines create revisions in parallel, the version graph can split.

  • To avoid conflicts, merge to the main branch often; when multiple heads do appear, unify them with alembic merge heads -m "merge heads".
  • In the merge revision, ensure both diffs apply in a correct order.
  • Tight naming conventions, careful review, and aligning when developers generate revisions all reduce trouble.

11. Should FastAPI auto-apply migrations at startup?

  • For local development, “upgrade head on startup” is convenient.
  • In production, prefer human review + manual execution, or an explicit CI/CD pipeline step.
  • Running alembic upgrade head in a container entrypoint and aborting startup on failure works well with IaC.

Sample (shell):

set -e
alembic upgrade head
exec uvicorn app.main:app --host 0.0.0.0 --port 8000

12. Integrate into testing

  • For unit tests: create a test DB → alembic upgrade head → run tests → rollback/teardown to keep it clean each time.
  • For migrations that touch existing data, prepare snapshot-like dummy data and verify upgrade success and post-state consistency.
  • Confirm reentrancy: failing then rerunning shouldn’t corrupt results.

pytest sketch:

import subprocess, os

def setup_module():
    # Point Alembic (and the app) at a dedicated test DB, then migrate it to head
    os.environ["DATABASE_URL"] = "postgresql+psycopg://user:pass@localhost:5432/testdb"
    subprocess.check_call(["alembic", "upgrade", "head"])

def teardown_module():
    subprocess.check_call(["alembic", "downgrade", "base"])

13. CI/CD flow

  • If a PR adds a revision created with alembic revision --autogenerate, have CI run alembic upgrade head against a disposable DB to validate it.
  • After deploying to staging, record migration runtime and logs; relax alert thresholds before prod.
  • Clarify the release order: DB migration → app rollout.

14. Small operational tips

  • Naming: make intent obvious—create_<table>, add_<col>_to_<table>, drop_<col>_from_<table>—so history reads well.
  • Split big changes into multiple steps; don’t do everything at once.
  • For heavy index builds, consider parallel/online options (PostgreSQL CONCURRENTLY); Alembic needs manual SQL for this, as in the sketch after this list.
  • Always measure on a staging DB close to production size.
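
As an example of that manual SQL, a PostgreSQL-only concurrent index build could look like this. CREATE INDEX CONCURRENTLY cannot run inside a transaction, so the statement is wrapped in Alembic's autocommit_block; the index and column names are illustrative.

# Sketch: concurrent index build (PostgreSQL only; index/column names are illustrative)
from alembic import op

def upgrade():
    # CONCURRENTLY is not allowed inside a transaction block,
    # so switch this part of the migration to autocommit mode.
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_articles_created_at "
            "ON articles (created_at)"
        )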

15. Common pitfalls and remedies

  • Symptom: Autogenerate sees nothing. Cause: Base not imported / target_metadata unset. Remedy: ensure Base.metadata is passed in env.py.
  • Symptom: Shrinking a string length fails. Cause: existing data is too long. Remedy: query for overflow rows, agree on a trimming policy, migrate in stages.
  • Symptom: The server default keeps showing up as a diff. Cause: dialect differences and function spellings. Remedy: keep compare_server_default, but pin the function name by hand.
  • Symptom: Long locks. Cause: rewriting huge tables. Remedy: consider online methods, staged runs, off-hours windows, and concurrent index builds.
  • Symptom: Downgrade breaks things. Cause: destructive operations. Remedy: prefer forward fixes; if rollback is mandatory, take backups beforehand.

16. Concrete example: introducing a tag table and associations

We’ll do many-to-many between articles and tags in stages:

  1. Create tags table
  2. Create join table article_tags (composite PK)
  3. Attach tags to existing articles (data migration)
  4. Add required indexes

Model changes (excerpt):

# app/models.py (added example)
from sqlalchemy import Table, Column, ForeignKey
from sqlalchemy.orm import relationship

article_tags = Table(
    "article_tags",
    Base.metadata,
    Column("article_id", ForeignKey("articles.id", ondelete="CASCADE"), primary_key=True),
    Column("tag_id", ForeignKey("tags.id", ondelete="CASCADE"), primary_key=True),
)

class Tag(Base):
    __tablename__ = "tags"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), unique=True, nullable=False)

class Article(Base):
    __tablename__ = "articles"
    # existing columns omitted
    tags = relationship("Tag", secondary=article_tags, backref="articles")

Generate the diff → review join-table indexes and ondelete policy → add data-load SQL to upgrade if needed.
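
If step 3 should attach an initial tag to every existing article, the data portion of that revision might look roughly like this. The "uncategorized" tag name and the bulk INSERT ... SELECT are assumptions for illustration, and RETURNING assumes PostgreSQL.

# alembic/versions/<rev>_attach_default_tag.py (illustrative sketch; assumes PostgreSQL)
from alembic import op
import sqlalchemy as sa

def upgrade():
    conn = op.get_bind()
    # Create a default tag and keep its id
    tag_id = conn.execute(
        sa.text("INSERT INTO tags (name) VALUES (:n) RETURNING id"),
        {"n": "uncategorized"},
    ).scalar_one()
    # Link every existing article to it in a single statement
    conn.execute(
        sa.text("INSERT INTO article_tags (article_id, tag_id) SELECT id, :t FROM articles"),
        {"t": tag_id},
    )

def downgrade():
    conn = op.get_bind()
    conn.execute(sa.text("DELETE FROM article_tags WHERE tag_id = (SELECT id FROM tags WHERE name = :n)"), {"n": "uncategorized"})
    conn.execute(sa.text("DELETE FROM tags WHERE name = :n"), {"n": "uncategorized"})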


17. Roadmap

  1. Initialize Alembic and tidy up env.py.
  2. Create the first table set via autogenerate, review, and apply.
  3. Make model changes → generate diffs → hand-tune → apply as the team standard.
  4. Write data migrations in a safe order; measure time/locks in tests and staging.
  5. Integrate with CI/CD; document branch merges and emergency response steps.


Wrap-up

  • Version-control your schema with Alembic; start with autogenerate and finish by hand.
  • For existing data, design staged changes paired with data migrations; always measure transaction/lock impact.
  • Make local, staging, and production run the same steps; wire checks into CI/CD.
  • Keep branch merges and naming conventions explicit so history stays readable. This way, model evolution directly maps to safe releases.

By greeden
