feat: Qualification Matrix + QuickSpecs bundles (live curl_cffi for HPE www) #5
Reference in New Issue
Block a user
Delete Branch "feat/qualmatrix-and-quickspecs"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Adds hvm_qualification_matrix (sd00006551en_us, 5 hardware/software tables via support.hpe.com API) and hvm_quickspecs (a50004260enw, 3 SKUs via curl_cffi Chrome120 against www.hpe.com).
Two new bundles: * hvm_qualification_matrix (sd00006551en_us) — the "Qualification Matrix for HVM Clusters Managed by HPE Morpheus Software". Single TOC bundle, 2 pages (parent + content). The content page is ~100 KB of HTML containing five tables: Server Hardware Support, Storage Hardware Support, Independent Software Vendor (ISV) Support, Hypervisor OS Compatibility and Interoperability Matrix, and Guest OS. Scraped via the same /hpesc/public/api/document/{docId}/render endpoint as every other bundle on support.hpe.com — the API returns server-rendered DITA HTML, so no JS/SPA shenanigans. * hvm_quickspecs (a50004260enw) — HPE Morpheus VM Essentials Software QuickSpecs, Version 4 (02-Feb-2026). SKUs: S5Q81AAE (1-yr per Socket E-LTU), S5Q82AAE (3-yr), S5Q83AAE (5-yr); each includes Tech Care Essentials. QuickSpecs lives at www.hpe.com (not support.hpe.com), which drops connections at the edge for non-browser TLS fingerprints — verified 2026-05-22 against curl, wget, urllib, and Anthropic's WebFetch (all = 0 bytes / connection timeout in headers). Bypassed here via curl_cffi impersonating Chrome 120's JA3/JA4 fingerprint. HTTP 200, 255 KB on first try, all four sections + all three SKUs cleanly parseable from the server-rendered HTML. New module scrape/quickspecs.py drives the live fetch + parse for any hvm_*_quickspecs bundle. CSS selectors taken from the captured DOM: .lr-right-rail hpe-highlights-container .collateral-content — one block per H3 section h3.txto-title — section title div.txto-description — section body uc-table.uc-table-polaris — SKU and version-history tables On any live failure the parser falls back to a committed HTML fixture at scrape/quickspecs/<doc_id>.html so the build never breaks on a transient edge hiccup. scrape/runner.py learned a new mode "html-file" that dispatches to scrape.quickspecs; bundles.py extended with an optional source_url on BundleSpec for cases where the page lives outside support.hpe.com. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>