initial: docs-mcp-template — build guide + scaffolded server

Template for building hosted MCP servers over a product's public documentation. Distilled from one production build; everything product-specific has been factored out. Contents: - PLAN.md — comprehensive build guide. 13 phases from project skeleton through weekly_digest. Includes the gotchas ("fetch-depth: 0 always", reranker per-pair token limit, Cloudflare body cap, dash-not-bash on Gitea runners), the decisions worth carrying forward, and a per-product customization checklist. - CLAUDE.md — guidance for Claude Code working in a clone of this template. Phase identification table, conventions (env-gating + operator confirmation for side-effecting tools, defensive fallback for retrieval components), common commands. - README.md — quick-start summary. Scaffolded code (all signature-stable, with NotImplementedError stubs where phase-specific work is required): docs_mcp/server.py FastMCP server, stateless_http=True, with search_docs / get_page / list_versions baseline tools and commented stubs for the rest of the phase set. docs_mcp/usage.py TimedCall telemetry, JSONL, daily rotation, 90-day retention. Reusable as-is. rag/embeddings.py Ollama embedder (nomic-embed-text default), load-balanced across N URLs. Reusable. rag/chunk.py Paragraph-aware chunker with synthetic chunk 0. Per-product tunable. rag/index.py Chroma + BM25 builder. --rebuild and --bm25-only flags. rag/bm25.py SQLite FTS5 lexical index. Reusable. scrape/changelog.py --cached / --ref / --json / --history-out. Reusable. scrape/README.md What you write per-product. eval/queries.jsonl.example Curate ~25 hand-labeled queries here. eval/retrievers.py Retriever protocol + stub classes. eval/run_eval.py MRR / Recall@K / nDCG@K harness skeleton. scripts/usage_report.py Standalone log analyzer; the FOLLOW-UP CHECKS pattern noted in the module docstring. scripts/registry_gc.py Gitea container registry cleanup. Reusable. Deployment + CI: Dockerfile Python 3.12-slim; COPY corpus + chroma + bm25 last for cache efficiency. deploy/docker-compose.yml MCP + reranker sidecar + Watchtower. Templated with <placeholders>. .gitea/workflows/refresh.yml Weekly cron + manual dispatch. fetch-depth: 0, retry-on-race, three-tag image scheme. .gitea/workflows/image-only.yml Code-only ship cycle, ~18min. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:18:17 -04:00
commit 9ba615c8ee
26 changed files with 3280 additions and 0 deletions
@@ -0,0 +1,277 @@
+"""SQLite FTS5-backed BM25 retrieval over the same chunks Chroma indexes.
+
+Hybrid retrieval (BM25 + dense + Reciprocal Rank Fusion) addresses a
+limit of single-tower dense embeddings: when a query has specific
+technical terms (filenames, language names, error codes, API paths),
+the dense embedding doesn't bridge from the query into a short
+code-focused chunk. The chunk loses to the much larger crowd of
+prose chunks that semantically match the query topic.
+
+BM25 handles this directly. Lexical overlap on rare terms ("python",
+"create_vpg.py", "PROTECTED_SITE_ID", "applyUpgrade") scores those
+chunks high. Fused with the dense ranking via RRF, the hybrid result
+is strictly better than either alone for the queries we've seen
+fail.
+
+Why SQLite FTS5:
+  - In the stdlib. Zero new deps.
+  - On-disk. Same persistence model as Chroma — Docker COPY the dir,
+    `rag.index --rebuild` regenerates from corpus.
+  - Built-in `bm25()` ranking function. No knobs to tune that matter
+    for our use case (k1=1.2, b=0.75 defaults are fine).
+  - Builds 70k+ chunks in seconds. Faster than the Chroma rebuild's
+    embedding step by 100×, so it adds basically nothing to the
+    full-rebuild cycle.
+
+Schema is two tables to keep filtering clean. FTS5 doesn't filter
+nicely on its own columns; the content_rowid pattern keeps an
+external metadata table joinable by rowid:
+
+    CREATE TABLE chunks_meta (
+        rowid INTEGER PRIMARY KEY AUTOINCREMENT,
+        id TEXT UNIQUE,
+        bundle_id TEXT, page_id TEXT, version TEXT,
+        platform TEXT, product TEXT, ordinal INTEGER
+    );
+    CREATE VIRTUAL TABLE chunks_fts USING fts5(
+        text,
+        tokenize = 'porter unicode61 remove_diacritics 2',
+        content = 'chunks_meta',
+        content_rowid = 'rowid'
+    );
+
+Queries:
+
+    SELECT m.id, bm25(chunks_fts) AS score
+    FROM chunks_meta m
+    JOIN chunks_fts  f ON m.rowid = f.rowid
+    WHERE f MATCH ?
+      AND m.version = ?            -- optional metadata filter
+    ORDER BY bm25(chunks_fts)      -- lower = better in FTS5
+    LIMIT ?;
+"""
+from __future__ import annotations
+
+import logging
+import re
+import sqlite3
+from pathlib import Path
+from typing import Any
+
+log = logging.getLogger(__name__)
+
+# Default location: bm25/<product>_docs.db at the repo root, next to chroma/.
+ROOT = Path(__file__).resolve().parent.parent
+DEFAULT_DB_DIR = ROOT / "bm25"
+DEFAULT_DB_NAME = "<product>_docs.db"
+
+# Columns we expose as filterable metadata. Mirrors what _build_where in
+# docs_mcp/server.py accepts so the same filter dicts work for both
+# Chroma and BM25 without per-retriever translation in the caller.
+FILTER_COLUMNS = ("bundle_id", "page_id", "version", "platform", "product", "ordinal")
+
+
+# Allowlist tokenizer for free-text queries. FTS5's parser chokes on lots
+# of punctuation we routinely see in user queries (".10.9", "?", "VPG's",
+# em-dash, etc.). Rather than blocklist every operator, just keep
+# alphanumerics + a few separators and replace everything else with a
+# space. This loses the ability to phrase-search ("exact match") but we
+# don't expose that to users anyway — they ask natural-language questions
+# and want the answer, not a Boolean DSL.
+_KEEP_RE = re.compile(r"[^A-Za-z0-9_\s]")
+# FTS5 reserves these Boolean operator KEYWORDS at the token level —
+# stripping them avoids accidental phrase-query behavior when a user
+# query happens to contain bare "AND", "OR", "NOT", "NEAR".
+_BOOLEAN_KW_RE = re.compile(r"(?<!\w)(AND|OR|NOT|NEAR)(?!\w)")
+
+
+def _sanitize_query(text: str) -> str:
+    """Reduce a natural-language query to an FTS5 OR-of-tokens query.
+
+    Two transformations:
+
+    1. Non-alphanumeric → space (drops punctuation; "10.9?" becomes
+       "10 9"). Lets us handle versions, parens, question marks, etc.
+       without inviting FTS5 parse errors.
+    2. Boolean keywords stripped (FTS5 reserves AND/OR/NOT/NEAR).
+    3. Tokens explicitly OR'd. FTS5's default is AND-of-tokens — for
+       any non-trivial natural-language query that means zero hits
+       (no chunk contains every word). OR semantics is what we want:
+       BM25 already weights documents containing more query terms
+       higher, so we don't lose precision, but we DO gain recall.
+    """
+    cleaned = _KEEP_RE.sub(" ", text)
+    cleaned = _BOOLEAN_KW_RE.sub(" ", cleaned)
+    tokens = cleaned.split()
+    if not tokens:
+        return ""
+    return " OR ".join(tokens)
+
+
+def _where_to_sql(where: dict | None) -> tuple[str, list[Any]]:
+    """Translate a Chroma-shaped filter dict into a SQL fragment + params.
+
+    Accepts the same shapes ``docs_mcp.server._build_where`` produces:
+
+        None                          → ("", [])
+        {"version": "10.9"}           → ("AND m.version = ?", ["10.9"])
+        {"$and": [{...}, {...}]}      → ("AND m.X = ? AND m.Y = ?", [...])
+
+    Unknown keys are silently dropped (defensive — better to over-match
+    than to crash on a filter we don't know).
+    """
+    if not where:
+        return "", []
+    parts: list[str] = []
+    params: list[Any] = []
+
+    def _emit_eq(cond: dict[str, Any]) -> None:
+        for k, v in cond.items():
+            if k in FILTER_COLUMNS:
+                parts.append(f"m.{k} = ?")
+                params.append(v)
+
+    if "$and" in where:
+        for sub in where["$and"]:
+            _emit_eq(sub)
+    else:
+        _emit_eq(where)
+    if not parts:
+        return "", []
+    return "AND " + " AND ".join(parts), params
+
+
+class BM25Index:
+    """Thin wrapper around an FTS5-backed sqlite db.
+
+    Single-writer model. Reads are connection-per-call (sqlite handles
+    concurrency through file locks; for our read-heavy workload that's
+    fine and avoids cross-thread connection sharing issues with the MCP
+    server's request handlers).
+    """
+
+    def __init__(self, db_path: Path | None = None):
+        self.db_path = Path(db_path) if db_path else (DEFAULT_DB_DIR / DEFAULT_DB_NAME)
+
+    # -- build ----------------------------------------------------------
+
+    def build(self, records: list[dict]) -> int:
+        """Rebuild the index from scratch from `records`.
+
+        `records` is the same list ``rag.index.page_records`` produces:
+        ``[{"id": ..., "text": ..., "metadata": {...}}, ...]``. Bulk
+        insert wrapped in a transaction — single-digit seconds for the
+        full 73k-chunk corpus.
+        """
+        self.db_path.parent.mkdir(parents=True, exist_ok=True)
+        # Drop and recreate. Idempotent rebuild.
+        if self.db_path.exists():
+            self.db_path.unlink()
+        with sqlite3.connect(self.db_path) as con:
+            con.executescript(self._schema_sql())
+            con.executemany(
+                "INSERT INTO chunks_meta (id, bundle_id, page_id, version, "
+                "platform, product, ordinal) VALUES (?, ?, ?, ?, ?, ?, ?)",
+                [
+                    (
+                        r["id"],
+                        r["metadata"].get("bundle_id") or "",
+                        r["metadata"].get("page_id") or "",
+                        r["metadata"].get("version") or "",
+                        r["metadata"].get("platform") or "",
+                        r["metadata"].get("product") or "",
+                        int(r["metadata"].get("ordinal") or 0),
+                    )
+                    for r in records
+                ],
+            )
+            # Populate the FTS5 contentless-ish table by rowid. We populated
+            # chunks_meta first; rowids align with insertion order.
+            con.executemany(
+                "INSERT INTO chunks_fts (rowid, text) VALUES (?, ?)",
+                [
+                    (i + 1, r["text"])
+                    for i, r in enumerate(records)
+                ],
+            )
+            con.commit()
+        log.info("bm25: indexed %d chunks → %s", len(records), self.db_path)
+        return len(records)
+
+    # -- query ----------------------------------------------------------
+
+    def query(
+        self,
+        text: str,
+        n: int = 200,
+        where: dict | None = None,
+    ) -> list[tuple[str, float]]:
+        """Return up to `n` (chunk_id, bm25_score) pairs, lowest score first.
+
+        FTS5's bm25() returns NEGATIVE numbers — more relevant docs have
+        smaller (more negative) scores. We order ASC so the first row is
+        the most relevant. Callers that need a "rank" should enumerate
+        the returned list.
+        """
+        sanitized = _sanitize_query(text)
+        if not sanitized:
+            return []
+        where_sql, params = _where_to_sql(where)
+        # FTS5 MATCH wants the unaliased table name on its left, so we use
+        # chunks_fts (no alias) and JOIN by rowid against chunks_meta.
+        sql = (
+            "SELECT m.id, bm25(chunks_fts) AS score "
+            "FROM chunks_fts "
+            "JOIN chunks_meta m ON m.rowid = chunks_fts.rowid "
+            f"WHERE chunks_fts MATCH ? {where_sql} "
+            "ORDER BY bm25(chunks_fts) "
+            "LIMIT ?"
+        )
+        try:
+            with sqlite3.connect(self.db_path) as con:
+                cur = con.execute(sql, [sanitized, *params, n])
+                return [(row[0], float(row[1])) for row in cur.fetchall()]
+        except sqlite3.OperationalError as e:
+            # FTS5 syntax error (rare after sanitization) or db missing.
+            # Caller decides whether to fall back to dense-only.
+            log.warning("bm25 query failed (%s); query=%r", e, sanitized[:80])
+            return []
+
+    def exists(self) -> bool:
+        """Cheap probe — does the index file exist on disk?"""
+        return self.db_path.exists()
+
+    def count(self) -> int:
+        """Number of chunks indexed. 0 if the db is missing or empty."""
+        if not self.exists():
+            return 0
+        try:
+            with sqlite3.connect(self.db_path) as con:
+                return con.execute("SELECT COUNT(*) FROM chunks_meta").fetchone()[0]
+        except sqlite3.OperationalError:
+            return 0
+
+    # -- schema ---------------------------------------------------------
+
+    @staticmethod
+    def _schema_sql() -> str:
+        return """
+        CREATE TABLE chunks_meta (
+            rowid     INTEGER PRIMARY KEY AUTOINCREMENT,
+            id        TEXT UNIQUE NOT NULL,
+            bundle_id TEXT,
+            page_id   TEXT,
+            version   TEXT,
+            platform  TEXT,
+            product   TEXT,
+            ordinal   INTEGER
+        );
+        CREATE INDEX idx_meta_version  ON chunks_meta(version);
+        CREATE INDEX idx_meta_platform ON chunks_meta(platform);
+        CREATE INDEX idx_meta_bundle   ON chunks_meta(bundle_id);
+
+        CREATE VIRTUAL TABLE chunks_fts USING fts5(
+            text,
+            tokenize = 'porter unicode61 remove_diacritics 2'
+        );
+        """
@@ -0,0 +1,126 @@
+"""Markdown chunker — paragraph-aware, ~400-600 token target.
+
+Adjust the chunking strategy per product if your page format differs
+significantly from prose. The output shape (id, text, metadata) is
+fixed by the downstream Chroma + BM25 indexing in rag/index.py — don't
+change that.
+
+The key knob you'll tune per product is chunk-0. Dense retrieval lands
+on chunk 0 first for most queries. Make it a synthetic chunk built
+from:
+
+  - the page title (as natural-language H1)
+  - a 1-sentence task description (you'll have to generate this — for
+    pages that already have a "## Overview" or "## Introduction" the
+    first sentence usually works)
+  - a keyword bag of important terms (filenames, API names, error
+    codes — the rare technical tokens that BM25 lights up on)
+
+Without a rich chunk 0, dense retrieval gets dominated by the much
+larger prose body, and short pages (script examples, reference cards)
+get buried.
+"""
+from __future__ import annotations
+
+import re
+from typing import Iterator
+
+
+# Approximate token estimate from char count. Tunable — set per
+# embedder if the default 4 chars/token is wrong.
+CHARS_PER_TOKEN = 4
+TARGET_TOKENS = 500
+TARGET_CHARS = TARGET_TOKENS * CHARS_PER_TOKEN
+
+
+def estimate_tokens(text: str) -> int:
+    return max(1, len(text) // CHARS_PER_TOKEN)
+
+
+def split_paragraphs(md: str) -> list[str]:
+    """Split markdown into paragraph-ish blocks.
+
+    Keeps fenced code blocks together (don't slice through ```).
+    Headings start new paragraphs.
+    """
+    blocks: list[str] = []
+    current: list[str] = []
+    in_fence = False
+    for line in md.splitlines(keepends=True):
+        stripped = line.strip()
+        if stripped.startswith("```"):
+            in_fence = not in_fence
+            current.append(line)
+            continue
+        if in_fence:
+            current.append(line)
+            continue
+        if stripped.startswith("#"):
+            if current:
+                blocks.append("".join(current).strip())
+                current = []
+            current.append(line)
+            continue
+        if not stripped and current and not "".join(current).strip().endswith("\n\n"):
+            current.append(line)
+            blocks.append("".join(current).strip())
+            current = []
+            continue
+        current.append(line)
+    if current:
+        blocks.append("".join(current).strip())
+    return [b for b in blocks if b]
+
+
+def chunks_from_page(
+    text: str,
+    page_id: str,
+    metadata: dict,
+) -> Iterator[dict]:
+    """Yield chunk dicts ready for index.py to upsert.
+
+    The synthetic chunk 0 is the per-product customization point. The
+    default below is a simple title + body-first-paragraph; rewrite
+    for richer retrieval signal (see module docstring).
+    """
+    paragraphs = split_paragraphs(text)
+    if not paragraphs:
+        return
+
+    # ----- Chunk 0: synthetic anchor for dense retrieval ---------
+    title = metadata.get("title") or page_id
+    first_para = next((p for p in paragraphs if not p.startswith("#")), "")
+    chunk0_body = (
+        f"# {title}\n\n"
+        f"{first_para[:300]}"
+        # TODO per product: append a keyword bag here (filenames,
+        # API names, error codes) for BM25 + dense joint coverage.
+    )
+    yield {
+        "id":       f"{metadata['bundle_id']}::{page_id}::0",
+        "text":     chunk0_body,
+        "metadata": {**metadata, "ordinal": 0},
+    }
+
+    # ----- Body chunks: pack paragraphs up to TARGET_CHARS -------
+    ordinal = 1
+    buf: list[str] = []
+    buf_chars = 0
+    for p in paragraphs:
+        if buf_chars + len(p) > TARGET_CHARS and buf:
+            yield {
+                "id":       f"{metadata['bundle_id']}::{page_id}::{ordinal}",
+                "text":     "\n\n".join(buf),
+                "metadata": {**metadata, "ordinal": ordinal},
+            }
+            ordinal += 1
+            buf = []
+            buf_chars = 0
+        buf.append(p)
+        buf_chars += len(p)
+    if buf:
+        yield {
+            "id":       f"{metadata['bundle_id']}::{page_id}::{ordinal}",
+            "text":     "\n\n".join(buf),
+            "metadata": {**metadata, "ordinal": ordinal},
+        }
@@ -0,0 +1,72 @@
+"""Embedding function for Chroma — Ollama-hosted nomic-embed-text by default.
+
+Swappable: implement the same `embedding_function()` interface returning
+a Chroma `EmbeddingFunction` and the rest of the pipeline doesn't care.
+
+Defaults (override via env):
+  OLLAMA_URL    one or more comma-separated URLs (load-balanced)
+  EMBED_MODEL   model name; default 'nomic-embed-text'
+  EMBED_DIM     expected embedding dim; default 768 (nomic-embed-text)
+"""
+from __future__ import annotations
+
+import os
+import logging
+from typing import Any
+
+import httpx
+from chromadb import EmbeddingFunction, Documents, Embeddings
+
+log = logging.getLogger(__name__)
+
+OLLAMA_URLS = [u.strip() for u in os.environ.get("OLLAMA_URL",
+               "http://localhost:11434").split(",") if u.strip()]
+EMBED_MODEL = os.environ.get("EMBED_MODEL", "nomic-embed-text")
+EMBED_DIM = int(os.environ.get("EMBED_DIM", "768"))
+
+
+class OllamaEmbeddings(EmbeddingFunction):
+    """Calls /api/embed across N Ollama endpoints, naive round-robin.
+
+    For indexing throughput on multiple GPUs, run one Ollama container
+    per GPU (pinned via NVIDIA_VISIBLE_DEVICES) and pass all their URLs
+    in OLLAMA_URL — the embedder picks the next endpoint per batch.
+    """
+
+    def __init__(self, urls: list[str] = OLLAMA_URLS, model: str = EMBED_MODEL):
+        self.urls = urls
+        self.model = model
+        self._next = 0
+
+    def __call__(self, input: Documents) -> Embeddings:
+        url = self.urls[self._next % len(self.urls)]
+        self._next += 1
+        with httpx.Client(timeout=300) as c:
+            r = c.post(f"{url}/api/embed",
+                       json={"model": self.model, "input": list(input)})
+            r.raise_for_status()
+            data = r.json()
+        return data.get("embeddings") or []
+
+    def name(self) -> str:                  # newer chromadb requires this
+        return f"ollama:{self.model}"
+
+    @staticmethod
+    def build_from_config(config: dict) -> "OllamaEmbeddings":  # newer chromadb
+        return OllamaEmbeddings(
+            urls=config.get("urls", OLLAMA_URLS),
+            model=config.get("model", EMBED_MODEL),
+        )
+
+    def get_config(self) -> dict:           # newer chromadb
+        return {"urls": self.urls, "model": self.model}
+
+    def default_space(self) -> str:
+        return "cosine"
+
+    def supported_spaces(self) -> list[str]:
+        return ["cosine", "l2", "ip"]
+
+
+def embedding_function() -> EmbeddingFunction:
+    return OllamaEmbeddings()
@@ -0,0 +1,134 @@
+"""Build Chroma (and optionally BM25) indexes from corpus on disk.
+
+Reads `corpus/<bundle>/<page>.{md,json}`, chunks each page, upserts
+into Chroma. With --rebuild, drops + recreates the collection (clean
+state). With --bm25-only, skips Chroma and rebuilds only the FTS5
+index — useful for fast iteration when chunking didn't change.
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+import time
+from pathlib import Path
+from typing import Iterator
+
+import chromadb
+from chromadb.config import Settings
+
+from .chunk import chunks_from_page
+from .embeddings import embedding_function
+
+log = logging.getLogger(__name__)
+logging.basicConfig(level=logging.INFO, format="%(asctime)s  %(message)s")
+
+ROOT = Path(__file__).resolve().parent.parent
+CORPUS = ROOT / "corpus"
+CHROMA_DIR = ROOT / "chroma"
+
+# Collection name — convention: <product>_docs. Override via env if needed.
+import os
+PRODUCT_NAME = os.environ.get("PRODUCT_NAME", "myproduct")
+COLLECTION = f"{PRODUCT_NAME}_docs"
+
+
+def page_records() -> Iterator[dict]:
+    """Walk corpus/, yield chunks for every page."""
+    if not CORPUS.exists():
+        log.error("corpus/ doesn't exist; run the scraper first")
+        return
+    for bundle_dir in sorted(CORPUS.iterdir()):
+        if not bundle_dir.is_dir() or bundle_dir.name.startswith("."):
+            continue
+        for md_path in sorted(bundle_dir.glob("*.md")):
+            page_id = md_path.stem
+            sidecar = md_path.with_suffix(".json")
+            if not sidecar.exists():
+                log.warning("skipping %s — no JSON sidecar", md_path)
+                continue
+            md = md_path.read_text()
+            meta = json.loads(sidecar.read_text())
+            # Surface common filter fields at the chunk-metadata level
+            # so Chroma's `where` filter can use them.
+            base_meta = {
+                "bundle_id": bundle_dir.name,
+                "page_id":   page_id,
+                "title":     meta.get("title") or "",
+                "version":   meta.get("version") or "",
+                "platform":  meta.get("platform") or "",
+                "product":   meta.get("product") or "",
+            }
+            yield from chunks_from_page(md, page_id, base_meta)
+
+
+def upsert_to_chroma(records: list[dict]) -> int:
+    client = chromadb.PersistentClient(
+        path=str(CHROMA_DIR),
+        settings=Settings(anonymized_telemetry=False),
+    )
+    # Drop + recreate for --rebuild semantics
+    try:
+        client.delete_collection(COLLECTION)
+    except Exception:
+        pass
+    col = client.create_collection(COLLECTION, embedding_function=embedding_function())
+
+    BATCH = 64
+    total = 0
+    for i in range(0, len(records), BATCH):
+        chunk = records[i:i + BATCH]
+        col.upsert(
+            ids=[r["id"] for r in chunk],
+            documents=[r["text"] for r in chunk],
+            metadatas=[r["metadata"] for r in chunk],
+        )
+        total += len(chunk)
+        log.info("upserted %d / %d chunks", total, len(records))
+    return total
+
+
+def main() -> int:
+    p = argparse.ArgumentParser()
+    p.add_argument("--rebuild", action="store_true",
+                   help="Drop and recreate the Chroma collection.")
+    p.add_argument("--bm25-only", action="store_true",
+                   help="Rebuild only the BM25 index, skip Chroma.")
+    p.add_argument("--bm25-db", type=Path,
+                   default=ROOT / "bm25" / f"{PRODUCT_NAME}_docs.db",
+                   help="Path to the BM25 sqlite db.")
+    args = p.parse_args()
+
+    log.info("reading corpus from %s", CORPUS)
+    t0 = time.time()
+    records = list(page_records())
+    log.info("loaded %d chunks in %.1fs", len(records), time.time() - t0)
+
+    if args.bm25_only:
+        from .bm25 import BM25Index
+        log.info("--bm25-only: building FTS5 only")
+        BM25Index(args.bm25_db).build(records)
+        return 0
+
+    if not args.rebuild:
+        log.info("no --rebuild; nothing to do. (Use --rebuild to upsert.)")
+        return 0
+
+    t_c = time.time()
+    n = upsert_to_chroma(records)
+    log.info("chroma: %d chunks in %.1fs", n, time.time() - t_c)
+
+    # Build BM25 too — see PLAN.md Phase 8. Safe to remove this block
+    # for products that don't need hybrid retrieval.
+    try:
+        from .bm25 import BM25Index
+        t_b = time.time()
+        BM25Index(args.bm25_db).build(records)
+        log.info("bm25 done in %.1fs", time.time() - t_b)
+    except ImportError:
+        log.info("rag.bm25 not available — skipping BM25 build")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())