rename: ppls-docs → crop-chem-docs
Repo/project rename to better reflect scope. PPLS is EPA's term for
its Pesticide Product Label System: accurate when the corpus was
EPA-only, but too narrow now that it also pulls from Bayer's own catalog
(and may expand to Syngenta/Corteva/BASF/FMC labels in the future).
crop-chem-docs covers the wider scope without an acronym to explain.
Renames:
- directory: ppls-docs → crop-chem-docs
- PRODUCT_NAME: ppls → crop_chem
- Chroma collection: ppls_docs → crop_chem_docs (in-place via .modify(), no re-embed)
- BM25 db: bm25/ppls_docs.db → bm25/crop_chem_docs.db
- MCP tool name: ppls_api_lessons → crop_chem_api_lessons
- FastMCP server name: ppls-docs → crop-chem-docs
- Env vars: PPLS_CORPUS_ROOT → CORPUS_ROOT
            PPLS_CHROMA_DIR  → CHROMA_DIR_OVERRIDE
- User-Agent: ppls-docs-scraper → crop-chem-docs-scraper
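The Chroma collection is renamed in place, but the BM25 store is a standalone SQLite file whose default path changed, so an existing checkout would no longer find its index at the default location. Rebuilding with --bm25-only is one option; moving the file is another. A minimal sketch of the latter, assuming the default repo-root layout, that BM25_DB is not set, that nothing inside the database (such as an FTS5 table name) encodes the old name, and that no process has the database open. `migrate_bm25_db` is a hypothetical helper, not part of this commit.

```python
from pathlib import Path

OLD_NAME = "ppls_docs.db"       # pre-rename default: bm25/ppls_docs.db
NEW_NAME = "crop_chem_docs.db"  # post-rename default: bm25/crop_chem_docs.db


def migrate_bm25_db(repo_root: Path) -> bool:
    """Hypothetical helper: move bm25/ppls_docs.db to bm25/crop_chem_docs.db.

    Returns True if a file was moved. It never overwrites an existing
    new-name database, so running it more than once is harmless.
    """
    bm25_dir = Path(repo_root) / "bm25"
    old_db = bm25_dir / OLD_NAME
    new_db = bm25_dir / NEW_NAME
    if not old_db.exists() or new_db.exists():
        return False
    # A rename within one directory is a single filesystem operation,
    # so the index is never left half-copied.
    old_db.rename(new_db)
    return True
```

It returns False and changes nothing when the old file is absent or the new one already exists, so it can be run unconditionally once per checkout.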
Preserved (intentional, correct):
- epa_ppls (source id) — refers specifically to EPA's PPLS database
- "EPA PPLS" mentions in regulatory text (lessons.md, server docstrings)
- PPLS_API_BASE / PPLS_PDF_BASE / PPLS_INDEX_URL_TEMPLATE in
  scrape/sources/epa_ppls.py, which point at EPA's actual endpoints
Memory entries get updated in a follow-up commit so the rename is
isolated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+4 -4
@@ -5,7 +5,7 @@ into Chroma. With --rebuild, drops + recreates the collection (clean
 state). With --bm25-only, skips Chroma and rebuilds only the FTS5
 index — useful for fast iteration when chunking didn't change.
 
-The corpus root honors PPLS_CORPUS_ROOT (matching the scrapers).
+The corpus root honors CORPUS_ROOT (matching the scrapers).
 The Chroma + BM25 stores stay at the repo root because both rely on
 filesystem locking semantics that vfat (typical USB drive) doesn't
 provide reliably.
@@ -30,11 +30,11 @@ log = logging.getLogger(__name__)
 logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
 
 REPO_ROOT = Path(__file__).resolve().parent.parent
-CORPUS_ROOT = Path(os.environ.get("PPLS_CORPUS_ROOT") or REPO_ROOT / "corpus")
-CHROMA_DIR = Path(os.environ.get("PPLS_CHROMA_DIR") or REPO_ROOT / "chroma")
+CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
+CHROMA_DIR = Path(os.environ.get("CHROMA_DIR_OVERRIDE") or REPO_ROOT / "chroma")
 
 # Collection name — convention: <product>_docs. Override via env.
-PRODUCT_NAME = os.environ.get("PRODUCT_NAME", "ppls")
+PRODUCT_NAME = os.environ.get("PRODUCT_NAME", "crop_chem")
 COLLECTION = f"{PRODUCT_NAME}_docs"
 
 
+3 -3
@@ -20,10 +20,10 @@ from typing import Iterable, Protocol
 log = logging.getLogger(__name__)
 
 REPO_ROOT = Path(__file__).resolve().parent.parent
-CHROMA_DIR = Path(os.environ.get("PPLS_CHROMA_DIR") or REPO_ROOT / "chroma")
+CHROMA_DIR = Path(os.environ.get("CHROMA_DIR_OVERRIDE") or REPO_ROOT / "chroma")
 BM25_DB = Path(os.environ.get("BM25_DB",
-                              str(REPO_ROOT / "bm25" / "ppls_docs.db")))
-COLLECTION = f"{os.environ.get('PRODUCT_NAME', 'ppls')}_docs"
+                              str(REPO_ROOT / "bm25" / "crop_chem_docs.db")))
+COLLECTION = f"{os.environ.get('PRODUCT_NAME', 'crop_chem')}_docs"
 
 
 class Retriever(Protocol):
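Both files resolve their directories with the same `os.environ.get(NAME) or default` idiom, so the renamed variables keep the old semantics: an unset or empty variable falls back to the repo-root default, because "" is falsy. The flip side of the rename is that a shell still exporting PPLS_CORPUS_ROOT or PPLS_CHROMA_DIR is now silently ignored. A minimal sketch of the same resolution as a pure function over an environment mapping (so it can be exercised without touching os.environ), with an optional warning for the old names; `resolve_dir`, `LEGACY_NAMES` and the warning are illustrative assumptions, not code from this commit.

```python
import os
import warnings
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical table: new env var name -> pre-rename name (per the commit message).
LEGACY_NAMES = {
    "CORPUS_ROOT": "PPLS_CORPUS_ROOT",
    "CHROMA_DIR_OVERRIDE": "PPLS_CHROMA_DIR",
}


def resolve_dir(name: str, default: Path,
                environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the same path as Path(os.environ.get(name) or default).

    Using `or` (rather than a .get() default) means an empty value such as
    CORPUS_ROOT="" also falls back to `default`.
    """
    env = os.environ if environ is None else environ
    legacy = LEGACY_NAMES.get(name)
    # Illustrative extra: flag a leftover pre-rename variable that the
    # current code no longer reads.
    if legacy and env.get(legacy) and not env.get(name):
        warnings.warn(f"{legacy} is no longer read; set {name} instead",
                      stacklevel=2)
    return Path(env.get(name) or default)
```

Called as `resolve_dir("CORPUS_ROOT", REPO_ROOT / "corpus")`, it yields the same path as the module-level expression in the diff. BM25_DB, by contrast, uses a `.get()` default, so an explicitly empty BM25_DB="" resolves to `Path("")` (the current directory) rather than the repo-root default.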