92a95d5e78
Cuts the PPIS-enumeration universe from 102K rows to ~11.5K rows by dropping products from non-row-crop-ag registrants BEFORE the per-product API call. This is the biggest cost lever we have on the EPA scraper: full backfill drops from ~28 h to ~3.5 h.

scrape/sources/epa_registrant_allowlist.json holds the 34 verified ag-chem company numbers (Syngenta, Bayer, BASF, Corteva, FMC, Nufarm, ADAMA, UPL, Albaugh, Loveland, AMVAC, Helena, Drexel, Atticus, etc.). Each entry was verified by querying the EPA PPLS API for the first active product registered under that company number. Edit the JSON freely; the scraper loads it at run time. Bypass with --no-registrant-filter when you suspect a row-crop product is registered to a specialty company not on the list.

Why a curated allowlist rather than a blacklist of consumer brands: the 102K PPIS rows are 89% non-ag-relevant; an allowlist is shorter to maintain and less prone to false positives.

Excluded with intent (not omissions): Bayer Environmental Science (turf/ornamental), Scotts (consumer lawn & garden), Wellmark/Zoecon (animal flea/tick), Control Solutions (structural pest), Cleary (turf), PBI/Gordon (mostly turf), Buckman Labs (industrial water).

Smoke test --limit 100:
- 1239 PPIS rows considered (in first slice of file)
- 1139 skipped by registrant filter (no API call paid)
- 100 hit API, 81 filtered by row-crop sites, 19 written
- = 91% API-call reduction over the prior version

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
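The pre-API filter described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the function names, the inline two-entry allowlist, and the sample registration numbers are hypothetical, and it assumes each PPIS row carries an EPA registration number of the form `company-product`, so the company number is the part before the first hyphen.

```python
import json

# Hypothetical stand-in for epa_registrant_allowlist.json (two entries only).
ALLOWLIST_JSON = """
{"companies": [
  {"number": "100", "name": "Syngenta Crop Protection", "ppis_count": 1041},
  {"number": "34704", "name": "Loveland Products", "ppis_count": 1027}
]}
"""

def load_allowlist(text):
    """Return the set of allowlisted company numbers as strings."""
    data = json.loads(text)
    return {str(entry["number"]) for entry in data["companies"]}

def company_number(reg_no):
    """Take the company prefix of a registration number such as '100-1000'."""
    return reg_no.strip().split("-", 1)[0]

def filter_regnos(reg_nos, allowed, use_filter=True):
    """Yield registration numbers from allowlisted companies.

    use_filter=False mimics --no-registrant-filter and lets every row through.
    """
    for reg_no in reg_nos:
        if use_filter and company_number(reg_no) not in allowed:
            continue  # skipped before any per-product API call is made
        yield reg_no

# Made-up registration numbers; 538 and 2724 are excluded companies in the file.
rows = ["100-1000", "538-26", "34704-5", "2724-100"]
allowed = load_allowlist(ALLOWLIST_JSON)
kept = list(filter_regnos(rows, allowed))
print(kept)
```

Comparing company numbers as strings matches how the JSON file stores them and avoids accidental integer and string mismatches.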
43 lines
3.8 KiB
JSON
{
"_comment": "EPA company numbers known to register pesticides primarily for major US row crops (corn, soybeans, wheat) and the broader ag-chem industry. Used by the EPA PPLS scraper as a pre-API filter in _iter_regnos to skip products from non-ag registrants without paying for the per-product API call. Add/remove companies here without changing code. To bypass entirely use --no-registrant-filter.",
"_verified_on": "2026-05-23",
"_source": "Each entry's registrant name was verified by querying the EPA PPLS API for the first active product registered under that company number.",
"_excluded_examples": "Bayer Environmental Science (432) — turf/ornamental; Scotts (538) — consumer lawn & garden; Wellmark/Zoecon (2724) — animal flea/tick; Control Solutions (53883) — structural pest; Cleary (1001) — turf; PBI/Gordon (2217) — mostly turf; Buckman Labs (1448) — industrial water.",
"companies": [
{"number": "100", "name": "Syngenta Crop Protection", "ppis_count": 1041},
{"number": "228", "name": "Nufarm Americas", "ppis_count": 587},
{"number": "241", "name": "BASF Agricultural Solutions", "ppis_count": 247},
{"number": "264", "name": "Bayer CropScience (Aventis)", "ppis_count": 660},
{"number": "279", "name": "FMC Corporation", "ppis_count": 1165},
{"number": "352", "name": "Corteva Agriscience (DuPont)", "ppis_count": 377},
{"number": "524", "name": "Bayer CropScience (Monsanto)", "ppis_count": 339},
{"number": "829", "name": "Southern Agricultural Insecticides", "ppis_count": 171},
{"number": "1381", "name": "Winfield Solutions", "ppis_count": 211},
{"number": "1812", "name": "Griffin LLC", "ppis_count": 242},
{"number": "2935", "name": "Wilbur-Ellis", "ppis_count": 321},
{"number": "5481", "name": "AMVAC Chemical", "ppis_count": 525},
{"number": "5905", "name": "Helena Agri-Enterprises", "ppis_count": 566},
{"number": "7969", "name": "BASF Agricultural Solutions", "ppis_count": 347},
{"number": "8033", "name": "Nippon Soda", "ppis_count": 75},
{"number": "9779", "name": "Winfield Solutions", "ppis_count": 260},
{"number": "10182", "name": "Syngenta Crop Protection", "ppis_count": 142},
{"number": "19713", "name": "Drexel Chemical", "ppis_count": 498},
{"number": "33270", "name": "Winfield Solutions", "ppis_count": 22},
{"number": "34704", "name": "Loveland Products", "ppis_count": 1027},
{"number": "42750", "name": "Albaugh", "ppis_count": 260},
{"number": "51036", "name": "BASF Agricultural Solutions", "ppis_count": 166},
{"number": "55146", "name": "Nufarm Americas", "ppis_count": 147},
{"number": "62719", "name": "Corteva Agriscience (Dow)", "ppis_count": 547},
{"number": "66222", "name": "Makhteshim Agan / ADAMA", "ppis_count": 192},
{"number": "67760", "name": "Cheminova", "ppis_count": 36},
{"number": "70506", "name": "UPL NA", "ppis_count": 444},
{"number": "71368", "name": "Nufarm", "ppis_count": 132},
{"number": "71512", "name": "ISK Biosciences", "ppis_count": 46},
{"number": "71711", "name": "Nichino America", "ppis_count": 65},
{"number": "84229", "name": "Tide International USA", "ppis_count": 47},
{"number": "87290", "name": "Generic Crop Science", "ppis_count": 63},
{"number": "89167", "name": "Axion Ag Products", "ppis_count": 119},
{"number": "91234", "name": "Atticus", "ppis_count": 338}
]
}
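As a sanity check on the file above, the entries can be tallied and compared with the figures quoted in the commit message (34 companies, roughly 11.5K of about 102K PPIS rows). The sketch below is illustrative only: the `(number, ppis_count)` pairs are copied by hand from the JSON, and the 102,000 denominator is the rounded figure from the commit message, not an exact row count.

```python
# (company number, ppis_count) pairs copied from the "companies" array.
COMPANIES = [
    ("100", 1041), ("228", 587), ("241", 247), ("264", 660), ("279", 1165),
    ("352", 377), ("524", 339), ("829", 171), ("1381", 211), ("1812", 242),
    ("2935", 321), ("5481", 525), ("5905", 566), ("7969", 347), ("8033", 75),
    ("9779", 260), ("10182", 142), ("19713", 498), ("33270", 22),
    ("34704", 1027), ("42750", 260), ("51036", 166), ("55146", 147),
    ("62719", 547), ("66222", 192), ("67760", 36), ("70506", 444),
    ("71368", 132), ("71512", 46), ("71711", 65), ("84229", 47),
    ("87290", 63), ("89167", 119), ("91234", 338),
]

PPIS_TOTAL_ROWS = 102_000  # assumption: the rounded "102K" from the commit message

numbers = [number for number, _ in COMPANIES]
entry_count = len(COMPANIES)
has_duplicates = len(set(numbers)) != entry_count
allowlisted_rows = sum(count for _, count in COMPANIES)
share_pct = round(100 * allowlisted_rows / PPIS_TOTAL_ROWS, 1)

print(f"entries: {entry_count}, duplicate numbers: {has_duplicates}")
print(f"allowlisted PPIS rows: {allowlisted_rows} ({share_pct}% of ~102K)")
```

The total comes to 11,425 rows, which agrees with the "~11.5K" estimate, and no company number appears twice even though several registrant names do.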