crop-chem-docs/scrape/sources/epa_registrant_allowlist.json
justin 92a95d5e78 epa_ppls: add registrant allowlist pre-API filter
Cuts the PPIS-enumeration universe from 102K rows to ~11.5K rows by
dropping products from non-row-crop-ag registrants BEFORE the per-
product API call. This is the biggest cost lever we have on the EPA
scraper — full backfill drops from ~28 h to ~3.5 h.

scrape/sources/epa_registrant_allowlist.json holds the 34 verified
ag-chem company numbers (Syngenta, Bayer, BASF, Corteva, FMC, Nufarm,
ADAMA, UPL, Albaugh, Loveland, AMVAC, Helena, Drexel, Atticus, etc.).
Each entry was verified by querying the EPA PPLS API for the first
active product registered under that company number. Edit the JSON
freely; the scraper loads it at run time, so no code change is needed.
Bypass the filter with --no-registrant-filter when you suspect a
row-crop product is registered to a specialty company not on the list.
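
A minimal sketch of how such a pre-API filter could work, assuming the company number is the part of an EPA registration number before the first hyphen. The function names and the `enabled` parameter are hypothetical and only illustrate the idea; the scraper's actual `_iter_regnos` may be structured differently.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, Set


def load_allowed_companies(path: str) -> Set[str]:
    """Read the allowlist JSON and return the company numbers as strings."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {str(entry["number"]) for entry in data["companies"]}


def company_of(regno: str) -> str:
    """Return the company-number prefix of a registration number.

    Assumes a 'company-product[-distributor]' layout (hypothetical helper).
    """
    return regno.split("-", 1)[0].strip()


def filter_regnos(
    regnos: Iterable[str], allowed: Set[str], enabled: bool = True
) -> Iterator[str]:
    """Yield only registration numbers whose company is on the allowlist.

    enabled=False mimics --no-registrant-filter and passes everything through.
    """
    for regno in regnos:
        if not enabled or company_of(regno) in allowed:
            yield regno
```

With the allowlist loaded, `list(filter_regnos(["100-1234", "538-99"], allowed))` would keep only the first value (both product numbers are made up for illustration), since 100 is on the list and 538 (Scotts) is deliberately excluded.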

Why a curated allowlist rather than a blacklist of consumer brands:
~89% of the 102K PPIS rows are not ag-relevant, so an allowlist is
shorter to maintain and less likely to let non-ag products through.

Excluded with intent (not omissions): Bayer Environmental Science
(turf/ornamental), Scotts (consumer lawn & garden), Wellmark/Zoecon
(animal flea/tick), Control Solutions (structural pest), Cleary
(turf), PBI/Gordon (mostly turf), Buckman Labs (industrial water).

Smoke test --limit 100:
  - 1239 PPIS rows considered (in first slice of file)
  - 1139 skipped by registrant filter (no API call paid)
  - 100 hit API, 81 filtered by row-crop sites, 19 written
  - = ~92% API-call reduction over the prior version (1139 of 1239 skipped)
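
The percentages can be re-derived from the counts in this smoke test. A small sketch with the numbers copied from the list above (variable names are illustrative):

```python
# Counts copied from the smoke-test output in this commit message.
considered = 1239      # PPIS rows examined
skipped = 1139         # dropped by the registrant filter, no API call
api_calls = considered - skipped
written = 19           # rows that survived the row-crop site filter

api_reduction = skipped / considered   # share of API calls avoided
write_rate = written / api_calls       # share of API hits that were kept

print(f"API calls made: {api_calls}")
print(f"API-call reduction: {api_reduction:.1%}")
print(f"Written per API call: {write_rate:.0%}")
```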

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 23:55:38 -04:00


{
"_comment": "EPA company numbers known to register pesticides primarily for major US row crops (corn, soybeans, wheat) and the broader ag-chem industry. Used by the EPA PPLS scraper as a pre-API filter in _iter_regnos to skip products from non-ag registrants without paying for the per-product API call. Add/remove companies here without changing code. To bypass entirely use --no-registrant-filter.",
"_verified_on": "2026-05-23",
"_source": "Each entry's registrant name was verified by querying the EPA PPLS API for the first active product registered under that company number.",
"_excluded_examples": "Bayer Environmental Science (432) — turf/ornamental; Scotts (538) — consumer lawn & garden; Wellmark/Zoecon (2724) — animal flea/tick; Control Solutions (53883) — structural pest; Cleary (1001) — turf; PBI/Gordon (2217) — mostly turf; Buckman Labs (1448) — industrial water.",
"companies": [
{"number": "100", "name": "Syngenta Crop Protection", "ppis_count": 1041},
{"number": "228", "name": "Nufarm Americas", "ppis_count": 587},
{"number": "241", "name": "BASF Agricultural Solutions", "ppis_count": 247},
{"number": "264", "name": "Bayer CropScience (Aventis)", "ppis_count": 660},
{"number": "279", "name": "FMC Corporation", "ppis_count": 1165},
{"number": "352", "name": "Corteva Agriscience (DuPont)", "ppis_count": 377},
{"number": "524", "name": "Bayer CropScience (Monsanto)", "ppis_count": 339},
{"number": "829", "name": "Southern Agricultural Insecticides", "ppis_count": 171},
{"number": "1381", "name": "Winfield Solutions", "ppis_count": 211},
{"number": "1812", "name": "Griffin LLC", "ppis_count": 242},
{"number": "2935", "name": "Wilbur-Ellis", "ppis_count": 321},
{"number": "5481", "name": "AMVAC Chemical", "ppis_count": 525},
{"number": "5905", "name": "Helena Agri-Enterprises", "ppis_count": 566},
{"number": "7969", "name": "BASF Agricultural Solutions", "ppis_count": 347},
{"number": "8033", "name": "Nippon Soda", "ppis_count": 75},
{"number": "9779", "name": "Winfield Solutions", "ppis_count": 260},
{"number": "10182", "name": "Syngenta Crop Protection", "ppis_count": 142},
{"number": "19713", "name": "Drexel Chemical", "ppis_count": 498},
{"number": "33270", "name": "Winfield Solutions", "ppis_count": 22},
{"number": "34704", "name": "Loveland Products", "ppis_count": 1027},
{"number": "42750", "name": "Albaugh", "ppis_count": 260},
{"number": "51036", "name": "BASF Agricultural Solutions", "ppis_count": 166},
{"number": "55146", "name": "Nufarm Americas", "ppis_count": 147},
{"number": "62719", "name": "Corteva Agriscience (Dow)", "ppis_count": 547},
{"number": "66222", "name": "Makhteshim Agan / ADAMA", "ppis_count": 192},
{"number": "67760", "name": "Cheminova", "ppis_count": 36},
{"number": "70506", "name": "UPL NA", "ppis_count": 444},
{"number": "71368", "name": "Nufarm", "ppis_count": 132},
{"number": "71512", "name": "ISK Biosciences", "ppis_count": 46},
{"number": "71711", "name": "Nichino America", "ppis_count": 65},
{"number": "84229", "name": "Tide International USA", "ppis_count": 47},
{"number": "87290", "name": "Generic Crop Science", "ppis_count": 63},
{"number": "89167", "name": "Axion Ag Products", "ppis_count": 119},
{"number": "91234", "name": "Atticus", "ppis_count": 338}
]
}
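
Because the file is meant to be edited by hand, a quick sanity check can catch duplicate or malformed company numbers before a run. The sketch below is an assumption about what such a check might look like, not part of the scraper; `summarize_allowlist` is a hypothetical name. For the file as shown, the 34 entries' `ppis_count` values sum to 11,425, in line with the ~11.5K figure in the commit message.

```python
from collections import Counter
from typing import Any, Dict


def summarize_allowlist(data: Dict[str, Any]) -> Dict[str, int]:
    """Validate a parsed allowlist and return simple totals."""
    companies = data["companies"]
    numbers = [entry["number"] for entry in companies]

    # Company numbers are stored as digit strings in this file.
    bad = [n for n in numbers if not (isinstance(n, str) and n.isdigit())]
    if bad:
        raise ValueError(f"non-numeric company numbers: {bad}")

    # The same company number listed twice is most likely an editing slip.
    dupes = sorted(n for n, count in Counter(numbers).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate company numbers: {dupes}")

    return {
        "companies": len(companies),
        "ppis_rows": sum(entry["ppis_count"] for entry in companies),
    }
```

It takes the already-parsed JSON, so a caller would pass it the result of `json.load` on the allowlist file. Repeated company names are allowed, since several registrants appear under more than one company number.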