hvm-docs

justin dd691b0111 rag: cap chunk size at 6KB to fit nomic-embed-text 2048-tok context

The chunker emits any single paragraph as a stand-alone chunk regardless
of size. One HVM page had a 14,858-char paragraph (a big config table),
and nomic-embed-text returned HTTP 400 for the entire embed batch because
the model's context is 2048 tokens. Added a hard-split fallback that
splits any oversized chunk on line boundaries into pieces of at most
MAX_CHARS=6000 (~1500 tokens, leaving headroom).

Also defaulted PRODUCT_NAME to "hvm" in rag/index.py to match server.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-22 13:06:35 -04:00
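The hard-split fallback this commit describes can be sketched roughly as below. This is not the code in chunk.py: the function name `hard_split`, its signature, and the choice to slice a single line longer than MAX_CHARS by characters (since it has no line boundary to split on) are assumptions for illustration. Only MAX_CHARS=6000 and the split-on-line-boundaries rule come from the commit message. The ~1500-token figure follows from a rough 4-characters-per-token heuristic (6000 / 4 = 1500), which is an approximation, not a property of the tokenizer.

```python
MAX_CHARS = 6000  # from the commit; ~1500 tokens at a rough 4 chars/token


def hard_split(chunk: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split an oversized chunk on line boundaries into pieces of <= max_chars.

    Hypothetical helper for illustration; the real chunk.py may differ.
    """
    if len(chunk) <= max_chars:
        return [chunk]  # small enough already, pass through unchanged

    pieces: list[str] = []
    current: list[str] = []  # lines accumulated for the piece being built
    current_len = 0  # length of "\n".join(current)

    for line in chunk.splitlines():
        # Assumption: a single line longer than max_chars cannot be split on a
        # line boundary, so it is sliced by characters instead.
        while len(line) > max_chars:
            if current:
                pieces.append("\n".join(current))
                current, current_len = [], 0
            pieces.append(line[:max_chars])
            line = line[max_chars:]  # remainder is non-empty here

        # +1 for the newline that rejoins this line to the previous one.
        added = len(line) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # Adding this line would overflow: flush the current piece.
            pieces.append("\n".join(current))
            current, current_len = [], 0
            added = len(line)
        current.append(line)
        current_len += added

    if current:
        pieces.append("\n".join(current))
    return pieces
```

Splitting on lines rather than at arbitrary character offsets keeps rows of a table like the 14,858-char config paragraph intact within each piece, so every embedded chunk still contains whole rows.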

__init__.py

initial: docs-mcp-template — build guide + scaffolded server

2026-05-22 09:18:17 -04:00

bm25.py

initial: docs-mcp-template — build guide + scaffolded server

2026-05-22 09:18:17 -04:00

chunk.py

rag: cap chunk size at 6KB to fit nomic-embed-text 2048-tok context

2026-05-22 13:06:35 -04:00

embeddings.py

initial: docs-mcp-template — build guide + scaffolded server

2026-05-22 09:18:17 -04:00

index.py

rag: cap chunk size at 6KB to fit nomic-embed-text 2048-tok context

2026-05-22 13:06:35 -04:00