94b5caa7e5
Defense-in-depth for the deploy pipeline. Today a backend image shipped ahead of an un-applied migration; the Tree model selected columns the DB didn't have yet, so every trees query 500'd with an opaque UndefinedColumnError and the UI showed no trees. The root cause (deploys not running migrations) is fixed separately; this makes the *symptom* impossible to miss.

- app/core/schema_version.py: compare the DB's stamped alembic head to the head(s) baked into the image's migration scripts. A DB with no alembic_version table (e.g. a create_all test DB) is treated as current, so this stays quiet outside real deployments. Uses to_regclass so a missing table never poisons the caller's transaction.
- /health/ready: returns 503 with an explicit "drift: db=… expected=…" message when the schema is behind, instead of reporting ready and serving 500s.
- Startup lifespan: logs CRITICAL on drift (advisory: never blocks startup).

Liveness (/health) is untouched, so a drifted container isn't killed into a crash-loop; it's loudly degraded and self-heals once migrations apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
60 lines
2.5 KiB
Python
"""Schema-drift detection: a safety net for the deploy pipeline.

If a deploy ships code whose models reference a column a migration hasn't added
yet (the code is ahead of the DB), every query against that table 500s with an
opaque ``UndefinedColumnError``. That is exactly the failure that took the tree
list down once: the backend image advanced but ``alembic upgrade head`` hadn't
run on the server.

The real prevention is auto-migrate on deploy (the entrypoint runs
``alembic upgrade head`` when ``RUN_MIGRATIONS=1``). This module is defense in
depth: it makes the drift *loud and explicit* (a readiness failure and a
CRITICAL startup log) instead of a silent storm of 500s, so a half-applied
deploy is obvious within seconds.
"""

from functools import lru_cache
from pathlib import Path

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncConnection

# app/core/schema_version.py -> backend/ (parents: core, app, backend)
_MIGRATIONS_DIR = Path(__file__).resolve().parents[2] / "migrations"


@lru_cache
def expected_heads() -> frozenset[str]:
    """Revision head(s) baked into this image's migration scripts. Static for a
    given build, so cache it."""
    from alembic.config import Config
    from alembic.script import ScriptDirectory

    cfg = Config()
    cfg.set_main_option("script_location", str(_MIGRATIONS_DIR))
    return frozenset(ScriptDirectory.from_config(cfg).get_heads())


async def db_heads(conn: AsyncConnection) -> frozenset[str] | None:
    """Revision(s) the database is stamped at, or ``None`` when the DB is not
    Alembic-managed (no ``alembic_version`` table, e.g. a test DB built straight
    from ``create_all``). ``to_regclass`` returns NULL rather than erroring when
    the table is absent, so this never poisons the caller's transaction."""
    if await conn.scalar(text("SELECT to_regclass('public.alembic_version')")) is None:
        return None
    result = await conn.execute(text("SELECT version_num FROM alembic_version"))
    return frozenset(row[0] for row in result)


async def schema_is_current(
    conn: AsyncConnection,
) -> tuple[bool, frozenset[str], frozenset[str]]:
    """``(ok, db, expected)``. ``ok`` is True when the DB is stamped at the
    code's head(s). A DB with no ``alembic_version`` table is treated as current
    (not Alembic-managed → nothing to compare), so this stays quiet in tests."""
    expected = expected_heads()
    current = await db_heads(conn)
    if current is None:
        return True, frozenset(), expected
    return current == expected, current, expected
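A minimal sketch of how a readiness check might turn the `(ok, db, expected)` tuple from `schema_is_current` into a status code and message. The helper name `readiness_response`, the sorted comma-joined rendering of the revision sets, and the `none` placeholder are assumptions made for illustration; the commit message only gives the general shape of the message ("drift: db=… expected=…") and the 503 status.

```python
from http import HTTPStatus


def readiness_response(
    ok: bool, db: frozenset[str], expected: frozenset[str]
) -> tuple[int, str]:
    """Hypothetical helper: map schema_is_current()'s result to (status, detail)."""
    if ok:
        return HTTPStatus.OK.value, "ready"
    # Sort so the message is stable when there are multiple heads.
    db_part = ",".join(sorted(db)) or "none"
    expected_part = ",".join(sorted(expected)) or "none"
    return (
        HTTPStatus.SERVICE_UNAVAILABLE.value,
        f"drift: db={db_part} expected={expected_part}",
    )
```

Keeping this mapping as a pure function would let the drift message be unit-tested without a database connection; the actual endpoint wiring is not shown here.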