rename: ppls-docs → crop-chem-docs

Repo/project rename to better reflect scope. PPLS is EPA's term for
their Pesticide Product Label System — accurate when the corpus was
EPA-only, narrow now that it also pulls from Bayer's own catalog
(and may expand to Syngenta/Corteva/BASF/FMC labels in the future).
crop-chem-docs covers that wider scope and has no acronym to explain.

Renames:
- directory:           ppls-docs            → crop-chem-docs
- PRODUCT_NAME:        ppls                 → crop_chem
- Chroma collection:   ppls_docs            → crop_chem_docs  (in-place via .modify(), no re-embed)
- BM25 db:             bm25/ppls_docs.db    → bm25/crop_chem_docs.db
- MCP tool name:       ppls_api_lessons     → crop_chem_api_lessons
- FastMCP server name: ppls-docs            → crop-chem-docs
- Env vars:            PPLS_CORPUS_ROOT     → CORPUS_ROOT
                       PPLS_CHROMA_DIR      → CHROMA_DIR_OVERRIDE
- User-Agent:          ppls-docs-scraper    → crop-chem-docs-scraper
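
For anyone carrying an existing index across the rename, the two on-disk
steps above (Chroma collection and BM25 db) could be sketched roughly as
follows. This is an illustration, not the script used for this commit: the
helper names `rename_collection` and `rename_bm25_db` are hypothetical, and
`client` is assumed to be a chromadb client such as
`chromadb.PersistentClient(path=...)`. Only the collection name changes, so
the stored embeddings are left alone.

```python
from pathlib import Path

OLD_NAME = "ppls_docs"
NEW_NAME = "crop_chem_docs"


def rename_collection(client, old: str = OLD_NAME, new: str = NEW_NAME) -> bool:
    """Rename a Chroma collection in place. Returns True if a rename happened.

    Hypothetical helper: `client` is assumed to expose list_collections()
    and get_collection(), as chromadb clients do.
    """
    # list_collections() yields Collection objects or plain names,
    # depending on the chromadb version; normalise both to strings.
    names = {getattr(c, "name", c) for c in client.list_collections()}
    if old not in names:
        return False  # already migrated, or nothing to migrate
    if new in names:
        raise RuntimeError(f"both {old!r} and {new!r} exist; resolve manually")
    # Collection.modify(name=...) changes the name only; vectors stay put.
    client.get_collection(old).modify(name=new)
    return True


def rename_bm25_db(bm25_dir: Path, old: str = OLD_NAME, new: str = NEW_NAME) -> bool:
    """Rename bm25/<old>.db to bm25/<new>.db. Returns True if a rename happened."""
    src = bm25_dir / f"{old}.db"
    dst = bm25_dir / f"{new}.db"
    if not src.exists():
        return False  # already migrated, or nothing to migrate
    if dst.exists():
        raise FileExistsError(dst)  # refuse to overwrite an existing index
    src.rename(dst)
    return True
```

Both helpers return False on a second run, so the migration can be repeated
safely.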

Preserved (intentional, correct):
- epa_ppls (source id) — refers specifically to EPA's PPLS database
- "EPA PPLS" mentions in regulatory text (lessons.md, server docstrings)
- PPLS_API_BASE / PPLS_PDF_BASE / PPLS_INDEX_URL_TEMPLATE in
  scrape/sources/epa_ppls.py — these point at EPA's actual endpoints

Memory entries get updated in a follow-up commit so the rename is
isolated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 12:25:59 -04:00
parent 3c3178a6ad
commit 1a45280e45
9 changed files with 31 additions and 31 deletions
+3 -3
@@ -1,8 +1,8 @@
-# PPLS API Lessons
+# Crop-Chem API Lessons
Curated agronomy + label-handling knowledge that an LLM should know
*before* giving recommendations from the labels corpus. Surfaced by
-the `ppls_api_lessons` MCP tool.
+the `crop_chem_api_lessons` MCP tool.
Each top-level `## Topic: <slug>` block is independently retrievable.
The tool docstring tells the LLM to call this proactively before
@@ -12,7 +12,7 @@ answering any pesticide recommendation question.
## Topic: how-to-use-this-corpus
-The PPLS docs corpus is the source of truth for *what's on the label*.
+The crop-chem-docs label corpus is the source of truth for *what's on the label*.
You should:
1. **Run `search_docs` first** with the user's natural-language
+9 -9
@@ -1,4 +1,4 @@
-"""MCP server for the ppls-docs pesticide label corpus.
+"""MCP server for the crop-chem-docs pesticide label corpus.
Adapted from the docs-mcp-template (which targeted versioned software
docs) for the EPA pesticide-labels domain: ``bundle_id`` → ``source``,
@@ -34,7 +34,7 @@ log = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Product configuration.
# ---------------------------------------------------------------------------
-PRODUCT_NAME = os.environ.get("PRODUCT_NAME", "ppls")
+PRODUCT_NAME = os.environ.get("PRODUCT_NAME", "crop_chem")
PRODUCT_DOCS_URL = os.environ.get(
"PRODUCT_DOCS_URL",
"https://ordspub.epa.gov/ords/pesticides/f?p=PPLS:1",
@@ -43,8 +43,8 @@ COLLECTION = f"{PRODUCT_NAME}_docs"
# Paths — corpus on (possibly) external storage, indexes always at repo root.
REPO_ROOT = Path(__file__).resolve().parent.parent
-CORPUS_ROOT = Path(os.environ.get("PPLS_CORPUS_ROOT") or REPO_ROOT / "corpus")
-CHROMA_DIR = Path(os.environ.get("PPLS_CHROMA_DIR") or REPO_ROOT / "chroma")
+CORPUS_ROOT = Path(os.environ.get("CORPUS_ROOT") or REPO_ROOT / "corpus")
+CHROMA_DIR = Path(os.environ.get("CHROMA_DIR_OVERRIDE") or REPO_ROOT / "chroma")
BM25_DB = Path(os.environ.get("BM25_DB",
str(REPO_ROOT / "bm25" / f"{PRODUCT_NAME}_docs.db")))
SOURCES_JSON = REPO_ROOT / "sources.json"
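
Note on the hunk above: the new lookups read only `CORPUS_ROOT` and
`CHROMA_DIR_OVERRIDE`, so an environment that still exports the `PPLS_*`
names falls back to the repo-root defaults without any message. If a
transition period were wanted, one possible shape is sketched below. The
`env_path` helper is hypothetical and is not part of this commit.

```python
import os
import warnings
from pathlib import Path


def env_path(new_var: str, legacy_var: str, default: Path, env=None) -> Path:
    """Resolve a path from new_var, then legacy_var (with a warning), then default.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced by a plain dict in tests.
    """
    env = os.environ if env is None else env
    value = env.get(new_var)
    if not value and env.get(legacy_var):
        # Honour the old name for now, but tell the operator to rename it.
        warnings.warn(
            f"{legacy_var} is deprecated; set {new_var} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        value = env[legacy_var]
    # Empty or unset values fall through to the default, matching the
    # `os.environ.get(...) or default` pattern in the hunk above.
    return Path(value) if value else default
```

Usage would look like `CORPUS_ROOT = env_path("CORPUS_ROOT", "PPLS_CORPUS_ROOT", REPO_ROOT / "corpus")`.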
@@ -464,7 +464,7 @@ def list_versions() -> str:
cat = _sources()
# Source-level summary from sources.json
-lines: list[str] = ["# PPLS docs corpus"]
+lines: list[str] = ["# crop-chem-docs corpus"]
# Live counts from Chroma (best-effort; the server should still
# render a useful response if Chroma is unreachable)
@@ -628,7 +628,7 @@ def _load_lessons() -> tuple[str, list[tuple[str, str]]]:
@mcp.tool()
-def ppls_api_lessons(
+def crop_chem_api_lessons(
topic: Annotated[
str | None,
Field(description="OPTIONAL: topic slug or substring (e.g., "
@@ -654,7 +654,7 @@ def ppls_api_lessons(
warnings that make them actionable. Call this first; cite specific
lessons in your response.
"""
-with TimedCall("ppls_api_lessons", {"topic": topic}) as _call:
+with TimedCall("crop_chem_api_lessons", {"topic": topic}) as _call:
full, sections = _load_lessons()
if not sections:
_call.set(sections=0)
@@ -663,9 +663,9 @@ def ppls_api_lessons(
if not topic:
_call.set(sections=len(sections), returned="toc")
toc_lines = [
-"# PPLS API lessons — table of contents",
+"# Crop-Chem API lessons — table of contents",
"",
-f"Call `ppls_api_lessons(topic='<slug>')` to fetch a specific section.",
+f"Call `crop_chem_api_lessons(topic='<slug>')` to fetch a specific section.",
"",
]
for slug, body in sections: