docs: add product backlog (genealogy feature gap analysis) #45

Merged
justin merged 1 commits from add-product-backlog into main 2026-06-09 09:19:23 -04:00
+434
View File
@@ -0,0 +1,434 @@
<!-- Generated by the genealogy-feature-gap-backlog workflow on 2026-06-09. -->
<!-- Gap analysis vs commercial (Ancestry/MyHeritage/FamilySearch) and OSS
(GRAMPS/Gramps Web/webtrees) genealogy software, verified against the
codebase. Statuses reflect the repo at workflow launch (before the
tree-visibility phases 1-3 landed; some items are now closed). -->
# Provenance — Product Backlog
> Status legend: **Have** (shipped) · **Partial** (substrate exists, surface incomplete) · **Planned** (on roadmap, no code) · **Missing** (no code, off roadmap).
> Importance: Critical / High / Medium / Low. Effort: S / M / L / XL.
> Phase references map onto the existing 09 roadmap. "NN#" = non-negotiable invariant.
---
## 1. Executive summary
**Where Provenance is strong today.** The foundation is genuinely solid and, in several places, ahead of the OSS field:
- **Sources-first spine is real.** A reusable `Source` + per-fact `Citation` two-tier model with a `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury. (Caveat: citations are silently dropped on GEDCOM *export* — see below.)
- **Privacy architecture is the right shape.** A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served **through the backend rather than via raw S3 URLs**. The *shape* is correct; coverage is not yet complete (the media endpoint and several child resources don't yet apply `person_visibility` — see §2.4, §2.10).
- **Non-destructive by design.** Soft-delete with timed purge worker, immutable `AuditEntry` (before/after JSONB, `actor_type` ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import.
- **Modeling maturity.** Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys.
- **Standards core.** GEDCOM 5.5.1 import/export is **functional** (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip *fidelity* has four tracked gaps (citation links, custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
**Documentation-vs-code gaps to correct now (per "docs travel with code").** Three repo claims are not yet true and should be edited in the same spirit they were written:
- **ChangeProposal is documented as landed but does not exist.** CLAUDE.md states the core data model (ARCHITECTURE §5) landed / "Phase 0 complete," but `ChangeProposal` — part of §5 and the load-bearing AI invariant — has no model, migration, or schema. Either scope it out of the "landed" claim or build it; don't leave the docs asserting it.
- **pgvector is claimed as used; it is not.** Only `pg_trgm` is created. ARCHITECTURE references pgvector for match ranking.
- **i18n "from day one" is documented but unmet.** PRD §6 promises externalized strings; every label is a hardcoded literal.
These three doc edits are themselves trivial quick wins (see §3).
**The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch).** Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes:
- **No record hints, no "save to tree," no connector framework.** The entire SourceConnector layer (FamilySearch/Find A Grave/WikiTree) is unbuilt — this gates AI search, hints, and auto-citation.
- **No person merge outside GEDCOM import.** Merging duplicate people is fundamental hygiene and is currently impossible in-tree — the single highest-value near-term matching gap.
- **No maps at all.** No place autocomplete, no geocoding, no interactive/migration/birthplace maps — a glaring hole for an app whose thesis is *family **and** land*.
- **No report/print/PDF output.** Charts render on-screen only; there is no Ahnentafel, family group sheet, narrative report, or any PDF/SVG/HTML export. The whole "Charts, reports & printing" category is on-screen-viewing only.
- **DNA absent** (deliberately parked — treat as open question, not a gap).
**The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees).** These are where a privacy-first self-host product is expected to compete and currently trails:
- **Collaboration is plumbed but unreachable.** `TreeMembership` roles are enforced on every read/write, but there is **no API or UI to invite, grant, change, or revoke** a member — the tree is effectively single-user despite multi-user infrastructure. This also breaks the full-CRUD invariant (NN#8) and, because importance and the old Phase-6 schedule disagree, a minimal management slice is pulled forward (§2.9).
- **Living-person redaction is non-uniform.** Redaction is applied on person reads but **not** on the event/media/name/relationship/citation/source child-resource endpoints — a real PII leak on public/unlisted trees (NN#3, NN#2).
- **`site_members` visibility tier is silently broken** (defined, selectable in UI, never handled in `can_view_tree`).
- **No place as a usable first-class entity** (model exists, created by GEDCOM, but no read/edit/delete — a create-only entity, which is a bug per NN#8).
- **No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization** (the last is a documented day-one commitment that is currently unmet).
**Security-priority correctness fixes (do these first, regardless of phase).** Three current defects are user-harm or trust issues, not roadmap items:
1. **Media privacy leak (§2.4)**`list_media`/`get_media`/`media_content` gate on `can_view_tree` but never `person_visibility`; non-owners can download photos of redacted living people on public/unlisted trees.
2. **Child-resource redaction gap (§2.10)** — event/media/name/relationship/citation/source endpoints don't apply living-person redaction.
3. **Registration issues a live session before verification (§2.10)**`register` returns an authenticated session cookie + token (201) and `email_verified_at` is written but never read on any path; there is no env switch to gate self-registration. The *enforcement check* (read-side `email_verified_at`) is small; the approval-mode env switch is the larger piece.
**Strategic posture.** The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the **privacy/auth correctness** and **collaboration** gaps that the architecture already implies, (b) ship the **maps + reports + merge** table stakes, and (c) build the **connector/ModelProvider/ChangeProposal** spine that unlocks the entire back half of the roadmap.
---
## 2. Backlog by category
### 2.1 Tree & data model
Core CRUD, typed relationships, dates, soft-delete, and naming are **have**. Remaining work is about reusable sub-entities, shared/event-centric modeling, and research-grade conveniences.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Repository as first-class entity | Promote `Source.repository` string to a reusable `Repository` (name/address/call-numbers) with dedup. | Partial | Med | M | 12 | If promoted, full CRUD in API+UI (NN#8) — don't half-build. |
| Note as first-class entity (SNOTE) | Promote inline `notes` text fields to reusable shared `Note`/SNOTE records. | Missing | Low | M | 2 | Full CRUD; GEDCOM 7 round-trip parity. |
| Shared/event-centric model + witnesses | Remove the `subject_person_xor_relationship` XOR; add participant/role join so one event has many people (FAN/cluster research). | Missing | Med | M | later | Unlocks FAN club + richer sourcing; participants must redact via privacy engine. |
| Non-family associations (FAN) | Add associate/neighbor relationship types; best delivered with shared-event participants. | Missing | Low | M | later | — |
| Relationship-status enum | Add married/divorced/annulled status on partnership rather than inferring from events. | Partial | High | M | 12 | — |
| Family/couple unit (GEDCOM FAM) | Persist a true FAM entity (own ID/sources, childless couples) instead of rebuilding on export. | Partial | High | L | 2 | Improves GEDCOM fidelity. |
| Kinship / relationship calculator | "How is A related to B" path + cousinship. Graph edges already exist. | Missing | High | M | 12 | Self-contained; reads via privacy engine. |
| **Read-only audit-log viewer / activity feed** | Surface `AuditEntry` as a per-tree/per-person change feed. Smaller and higher-leverage than value-level undo; partially satisfies NN#8's "read" for AuditEntry and is the substrate for watch/follow + webhooks. | Missing | High | M | 2 | Privacy-filtered projections only — never raw before/after JSON to non-members (NN#2/#3). |
| Per-field revision history + restore-prior-value | Value-level history view + undo, built atop the audit feed above. | Partial | High | L | 6 | Audit-log *UI* is the feed item; this is the larger value-level-undo work (NN#8 correction ethos). |
| Color-coded tags & custom labels | Tag people for lineages/research-status/grouping. | Missing | Med | M | 2 | Full CRUD; tenant-scoped. |
| Person timeline / LifeStory | Sort the merged event list; add place/age enrichment + narrative presentation. | Partial | Med | M | 2 | Sort is trivial (`localeCompare` on `date_start`); narrative is the larger piece. |
| Multi-calendar normalization | Store + parse Julian/Hebrew/French Republican (only `calendar` tag stored today, only Gregorian normalized). | Partial | Low | M | 2 | See also Localization §2.17. |
| Evidence/persona vs conclusion model | GEDCOM-X persona layer separate from conclusion person. | Missing | Med | XL | later | Large modeling change; strengthens sourcing + hint matching. |
| Negative assertions | Boolean "event did not happen" on Event. | Missing | Low | S | 2 | Cheap interop nicety. |
| Custom groups / networks | Named manual or rules-based groupings. | Missing | Low | M | later | Lower priority than tags. |
| Raw GEDCOM record editor / configurable fact tabs | webtrees-style raw editor + fact-type registry. | Partial | Med | L | later | Open vocabularies give de-facto custom facts today. |
| Health/medical, historical-facts index, LDS ordinances | Niche entities. | Missing | Low | ML | later | LDS BAPL/ENDL/SLGS should map to distinct types if ever pursued; medical is special-category PII. |
---
### 2.2 Sources & citations
The two-tier model is **have** and production-grade on the backend. The gaps are almost all UI/CRUD-completeness and the connector-dependent "save to tree" flows.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Citation confidence selector in UI | Confidence enum is modeled + API-writable but the `citeControl` form never sets it — every UI citation is NULL confidence. | Partial | High | S | 1 | **Quick win.** Full CRUD in UI (NN#8); reinforces evidence-quality thesis. |
| Source edit UI + all 8 fields | Source UI is add/list/delete only and create exposes ~3 of 8 fields (no author/source_type/publication_info/quality_note/citation_text). | Partial | High | S | 1 | Update API exists but no edit form — violates NN#8. |
| `GET /{tree}/citations/{id}` | Citation API has list but no single-read endpoint. | Partial | Med | S | 1 | API symmetry (NN#8). |
| Transcription / abstract / extract fields | Add `transcription_text` + `abstract_text` to Source; don't conflate with `citation_text` (GEDCOM SOUR.TEXT currently dumped into citation_text). | Missing | Med | S | 12 | **Quick win.** Central to evidence analysis; full CRUD (NN#8). |
| Evidence-Explained guided citation builder | Structured fields → formatted citation (Chicago/MLA/APA) instead of hand-typed `citation_text`. | Missing | High | L | 2 | Signature provenance feature; citation_text should be generated, not typed. |
| Citations on OwnershipEvents | Add `ownership_event_id` to Citation + extend CHECK to 5 targets when property lands. | Partial | Critical | S | 3 | **Quick win once Property exists** — single FK + constraint edit (NN#5). |
| Record-to-source attachment ("save to tree") | Search a connector record and attach its facts. | Missing | High | XL | 4 | Gated on connector framework; assistant attach must emit ChangeProposal (NN#1); legal sources only (NN#6). |
| Source Linker (one record → many persons) | Bulk-attach a record's facts across people. | Missing | Med | L | 4 | Downstream of connectors; reads/writes via service layer. |
| Auto-citation on save/match | Generate citation when a hint/record is confirmed. | Missing | Med | L | 4/7 | Blocked on connectors + hints; ChangeProposal if assistant-driven. |
| Memories-as-sources (cite a photo directly) | Allow media to be a citation target, not only attachable to a Source. | Partial | Low | M | 2 | Reads stay on privacy-checked media endpoint (NN#2). |
| GPS / Proof-Standard reasoning artifact | Container linking sources/citations into a proof narrative reconciling conflicts. | Missing | Med | L | later | Serious-researcher differentiator; full CRUD (NN#8). |
| Proprietary record collections | 1921 census, UK sets, etc. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 / self-host. Do not pursue. |
---
### 2.3 Search & matching
Fuzzy trigram name search is **have**; everything that depends on connectors, embeddings, or multiple populated trees is planned/missing. The standout near-term gap is **in-tree person merge**.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Standalone duplicate detection | Lift the GEDCOM `_best_match` logic into a "find duplicates in my tree" scan. | Partial | High | M | 2 | Logic already written; results via privacy engine (NN#2). |
| Interactive two-person merge (side-by-side, field-select, undo) | General merge of duplicate persons with citation re-pointing — impossible outside import today. | Partial | High | L | 2 | **Highest-value matching gap.** Preserve + re-point Citations (NN#5); write-once is a bug (NN#8). |
| Advanced search (wildcards, boolean, date/place facets, sort) | Search exposes only `?q`. | Partial | High | M | 2 | Keep per-person privacy filter in the search loop (NN#2). |
| Phonetic matching (Soundex/Metaphone/DM) | Enable `fuzzystrmatch`; trigram is char-similarity, not phonetic. | Partial | High | M | 2 | Pure utility. |
| Semantic / vector search (pgvector) | **Docs claim pgvector is used; it is not** — only pg_trgm extension is created. Add `CREATE EXTENSION vector` + embedding columns (and correct the docs). | Missing | Med | L | 7 | Embedding provider is an open question (PRD §11) — don't pick silently; candidates via privacy engine. |
| Tree-to-tree matching (Smart Matches) | Cross-tree candidate generation + ranking. | Planned | High | XL | 7 | Anonymous until mutual consent (NN#4); living-person protection (NN#3). |
| Mutual-consent match notification | Anonymous notification, reveal only after both opt in. | Planned | High | L | 7 | **Mandated invariant**, not a toggle (NN#4, NN#3); rides the notification substrate (§2.9). |
| Match confirm/reject + "not a match" memory | Persistent rejected-match store (today scoring lives only inside import). | Partial | High | M | 7 | Prevents re-notifying once hints land. |
| External search deep-links | Pre-fill FamilySearch/Find A Grave/BLM-GLO search URLs from a person's name/dates/place. | Missing | Med | M | 24 | **High value, low risk** before full connectors; legal targets only (NN#6). |
| **Automated record hints** | Proactive per-person record suggestions from connectors — a marquee mainstream feature. | Missing | High | XL | 7 | Connector-gated (NN#6); surfaced anonymously where cross-tree (NN#4); attach via ChangeProposal (NN#1). |
| Jurisdiction-aware record-search hints | Map place/jurisdiction → relevant collections. Place hierarchy is a ready foundation. | Missing | Med | L | 8 | Suggested collections must be legal (NN#6). |
| Cross-language / transliteration matching | Cyrillic/Hebrew/CJK ↔ Latin. | Missing | Med | XL | later | See Localization. |
| Record Detective, newspaper matches, collection catalog, GQL query builder, OCR full-text search | Connector/record-layer dependent. | Missing/Planned | LowMed | LXL | 4/7/8 | All gated on the connector framework; any query path runs through privacy engine (NN#2). |
---
### 2.4 Media & documents
Universal media attachment is **have**, but with a **confirmed privacy leak** and no asset-processing pipeline.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Media privacy gating on serve paths** | `list_media`/`get_media`/`media_content` gate only on `can_view_tree`, never `person_visibility` — a non-owner can download photos of redacted living people on public/unlisted trees. | Have(leaky) | **Critical** | M | 1 | **Security-priority — fix first. Direct NN#3/NN#2 violation.** Check attached `person_id` visibility and redact/hide. |
| EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | **Security-priority**, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. |
| Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy — no bypass path. |
| Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. |
| Photo/face tagging (manual) | Multi-person tagging via single FK today. | Missing | Med | XL(ML)/M(manual) | later | Owner-only, in-deployment; face tags inherit redaction (NN#3); full CRUD. |
| Mobile photo scanning + auto-split | Shoebox digitization. | Missing | Med | L | later | Reuse privacy-gated upload + EXIF strip. |
| AI photo dating / colorize / restore / animate / narrate | Model-driven media features. | Missing | Low | LXL | 4+ | Must route through ModelProvider (NN#7), require approval (NN#1), preserve original; animating living faces raises consent issues. |
| British Library / paywalled archives, pay-per-view credits | Licensed content + metering. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 and the self-host model. |
---
### 2.5 DNA & genetic genealogy
DNA is an **explicit PRD non-goal / open question** — treat as parked, not a backlog to grind through. Across every DNA row the rule is uniform: **a user uploading their own export is permissible; vendor connectors/scrapers (23andMe / Ancestry / MyHeritage / GEDmatch) are barred (NN#6).** Kits and matches are living-tester PII and route through the privacy engine.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| DNA-confirmed relationship flag | Model DNA confirmation as a Source/Citation backing a Relationship (not free text). | Missing | Med | M | parked | Best sources-first fit (NN#5); full CRUD (NN#8). |
| Raw DNA upload (own file) | User uploads own export; no vendor scraping. | Missing | Med | L | parked | User's own file is fine; vendor connectors barred (NN#6); special-category PII via privacy engine. |
| Kit/Match entities linked to persons | Kit (tester) + Match tied to Person, tenant-scoped/audited. | Missing | Med | M | parked | Kits = living-tester PII (NN#2/#3); full CRUD (NN#8). |
| Autosomal match list, segments, chromosome browser, triangulation, ThruLines/AutoTree, ethnicity/admixture, haplogroups, GEDmatch, NPE detection | Full genetic-genealogy suite. | Missing | LowMed | LXL | parked | DNA scope is an unresolved open question — **surface the dependency, don't build speculatively.** Own-data only (NN#6); cross-user surfacing obeys NN#4. |
---
### 2.6 Maps, places & gazetteers
This category is almost entirely **missing** despite being half the product thesis. The Place model has the right bones (parent_id, lat/long, PlaceName with date ranges) but no API/UI and no maps.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Place as usable first-class entity** | Place rows are created by GEDCOM but have **no read/edit/delete** API or UI — a create-only entity. | Partial | High | M | 23 | **Violates NN#8** (create-but-not-edit = bug). Make Place citable too (NN#5). |
| Place autocomplete + picker in event editor | No `/places` router; the event form has no place input, so users can't attach a place at all. | Missing | High | M | 2 | Table stakes; lookup is low-risk. |
| Geocoding (manual coords + forward) | lat/long columns exist; no UI, no geocoder. | Partial | High | M | 3 | Provider via env (NN#7), ToS-compliant (NN#6). |
| Pluggable geocoding provider | Nominatim/GeoNames/Bing/Google swappable. | Missing | Med | L | 3 | Provider+keys via env (NN#7); legal providers only (NN#6). |
| Bulk/batch geocoding (worker) | Geocode hundreds of GEDCOM-imported places. | Missing | Med | M | 3 | Idempotent, rate-limited worker job; provider via env. |
| Place merge/split (dedup) | GEDCOM imports produce near-duplicate place strings. | Missing | High | M | 23 | Needs Place update/delete (NN#8); audited merges. |
| Place-name cleanup tools | Extend the existing preview→apply cleanup UX to places. | Missing | Med | M | 2 | Preview-first + audited like existing cleanup. |
| Standardized-name vs original text | Mirror the verbatim+normalized date pattern for places. | Missing | Med | M | 23 | GEDCOM fidelity. |
| Alternate/historical place names with date ranges | `PlaceName` model exists with valid_from/to but no CRUD and never populated. | Partial | Med | M | 23 | Stored entity with no CRUD surface (NN#8). |
| Interactive map of events & places | No map library in frontend. Core to family+land positioning. | Missing | High | L | 3 | Plot via `person_visibility` so non-owners never see living locations (NN#2/#3). |
| Migration trail / pedigree-birthplace maps | Per-person life path; ancestor birthplace map. | Missing | Med | L | 3 | Redact living subjects for non-owners (NN#3). |
| Bundled world gazetteer | Offline GeoNames-style authority. | Missing | Med | XL | later | GeoNames (CC-BY) verify AGPL-compat; env-configurable. |
| Historical boundary overlays, time slider, heatmaps, radius/nearby, tile-provider switch | Advanced geo. | Missing | LowMed | SXL | later | PostGIS is an open question (ARCH §14) — **surface dependency**, don't adopt silently; tiles legal (NN#6). |
---
### 2.7 Charts, reports & printing
On-screen pedigree/descendant/fan/hourglass charts are **have**. The entire **output/print/report** half is missing — this is the linchpin gap of the category.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Multi-format export (PDF / SVG / image / HTML)** | No export/print path, no `@media print`, no `window.print()`. Charts and reports can't leave the screen. | Missing | High | L | 2/6 | **Linchpin.** Generate from privacy-filtered data so living people redacted in shared output (NN#3). |
| Ahnentafel report | Numbered-ancestor report; all data exists. | Missing | High | M | 6 | — |
| Family group sheet / individual summary | Printable summary; data available, needs print layout. | Missing | High | M | 6 | — |
| Narrative descendant/ancestor reports | Multi-standard prose with inline sources. | Missing | High | L | 6 | Cite Sources inline (NN#5); redact living (NN#3). |
| Sentence-template narrative engine | Deterministic fact→prose underpinning reports. | Missing | Med | L | 6 | Keep template-based; report text never mutates tree (NN#1). |
| Photo boxes in charts | Pass privacy-checked media URLs to `setCardDisplay`; CSS already present. | Missing | High | M | 2 | Stream via privacy-checked /media (NN#2/#3). |
| Drag-to-edit / interactive chart canvas | Tree canvas renders but interactive node editing (drag to re-parent, inline edit on the chart) is only partly present. | Partial | Med | M | 2 | Edits go through service layer + audit (NN#1); honor redaction. |
| Statistics dashboard | Surname/place/date distributions + tree-health. | Missing | Med | M | 6 | Reads via privacy engine (NN#2). |
| Kinship/relationship diagram report | Needs path-finding (see §2.3 calculator) + renderer. | Missing | Med | M | 6 | — |
| List reports (sources/places/repos/media) | Printable indexes (current screens are management, not reports). | Missing | Med | M | 6 | — |
| Color-by-lineage, fan overlays, lifespan/timeline charts | Sex-coloring exists; lineage/overlay/timeline don't. | Partial/Missing | Med | M | later | Overlays respect privacy engine. |
| Book/multi-report compiler, wall-chart tiling, page-setup, customizable charts | Print-shop-grade output. | Missing | LowMed | LXL | later | Saved "book" entity = full CRUD (NN#8); honor living-person privacy. |
| Bowtie/couple-rooted/circular-sun/3D/network/calendar | Niche chart variants. | Missing/Partial | LowMed | ML | later | — |
| Print-shop products, XML template engine, blank forms | Commercial/template extras. | Missing | Low | SXL | later | Weak fit for self-host. |
---
### 2.8 Research workflow & automation
The preview→approve **bulk cleanup** tool is a genuine **have** and a differentiator. The missing pieces are the serious-researcher workflow entities.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Data-quality / consistency checker | Extend cleanup beyond name issues: child-before-parent, death-before-birth, implausible ages, orphans, dups; severity tiers. | Partial | High | L | 2 | New auto-fixes keep preview→apply (NN#1). |
| Research log | Searches, repositories visited, negative results, findings — distinct from the system audit log. | Missing | High | M | 6 | Reference reusable Sources (NN#5); tenant-scoped full CRUD (NN#8). |
| To-do / research task planner | Tasks on Person/Tree with status/priority/due/assignment. | Missing | High | M | 6 | Full CRUD in API+UI (NN#8). |
| Source-driven data entry | Start from a Source document and transcribe facts into the tree. | Missing | High | M | 2 | Natural sources-first differentiator (NN#5). |
| Task↔log linkage | FK + joined view once both entities exist. | Missing | Med | S | 6 | Cheap once predecessors land. |
| Family chronology / timeline | Sort merged events; family-wide chronology (parents' marriage, siblings' births). | Partial | Med | M | 2 | Sort is trivial; presentation over privacy-filtered data. |
| Navigation: active person / history / bookmarks | Large trees rely on browser back only. | Missing | Med | M | 2 | Per-user, tenant-scoped, full CRUD; don't expose redacted persons (NN#2/#3). |
| Saved-record shoebox / review queue | Stage candidate records before committing. | Missing | Med | M | 4/7 | Auto-attach via ChangeProposal (NN#1); legal sources (NN#6). |
| Guided research suggestions | Proactive "research next" engine (today only flags problems). | Partial | High | L | 4 | Advisory; writes via ChangeProposal (NN#1); cross-tree via privacy engine (NN#2). |
| Persona-adaptive onboarding | Family Keeper / Serious Researcher / Property Researcher selector (PRD US-002, documented but unbuilt). | Missing | LowMed | L | 2 | Pure presentation. |
| Dashboard widgets, scratchpad, research-link sidebar, blog/narrative authoring, research wiki, crowd indexing | Conveniences. | Missing | Low | SXL | later | Widgets/published narratives read via privacy engine (NN#2/#3). |
---
### 2.9 Collaboration & sharing
Authorization is enforced everywhere, but the **management surface is entirely absent** — the most consequential gap relative to the multi-user product promise. Because the Critical items below previously sat at Phase 6 while their labels said "breaks NN#8," a minimal management slice is pulled forward to Phase 2; the richer invite/email UX stays at Phase 6.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Membership PATCH/DELETE + role change (minimal slice)** | Add/adjust/revoke a collaborator and change `role` — the substrate (mutable `role`) exists; only the endpoints are missing. Resolves the create-only NN#8 break without the full invite flow. | Partial | **Critical** | SM | 2 | **Pulled forward** — a create-only entity shouldn't wait for Phase 6 (NN#8). Revocation routes through the single privacy point. |
| Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. |
| **Read-only public tree share** | Visibility model already redacts living persons for anonymous viewers, but every endpoint requires `CurrentUser` — no optional-auth dep, no public route, no public page. | Partial | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). |
| SEO public profile pages (server-rendered) | Intent declared (`public` = search-indexable) but zero implementation; no sitemap/robots/meta. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages — must go through privacy engine, no direct row queries. |
| **Notification / event-dispatch substrate** | Shared enabler seeded from `AuditEntry`: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | **Privacy-filtered projections only — never raw before/after JSON** (NN#2/#3). |
| Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. |
| In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. |
| Watch/follow + change notifications | `AuditEntry` is the natural event source; needs subscription entity + dispatch (substrate above). | Planned | Med | M | 6 | Notification builder reads via privacy engine, not raw rows. |
| **Optimistic concurrency / lost-update protection** | No version/etag/`updated_at` precondition checks; concurrent multi-user edits can silently clobber. | Missing | High | M | 6 | Full-CRUD + multi-user without this risks lost updates; concurrent paths still route through privacy engine. |
| Pending-changes moderation (human edits) | Queue contributor edits for owner approval — shares infra with the AI ChangeProposal queue. | Missing | Med | L | 6 | **Design together with ChangeProposal** (NN#1). |
| Field-by-field profile merge & approval | WikiTree-style merge center + unmerge with per-field provenance. | Missing | Med | XL | later | Conflicting facts each retain Source/Citation (NN#5). |
| Ownership transfer | `owner_id` is effectively write-once; needed for self-host longevity. A minimal reassignment endpoint is the NN#8 fix. | Missing | Med | M | 6 | **Violates write-once invariant** (NN#8) — importance/phase tension noted; ship the minimal slice when membership lands. |
| Narrative website / HTML export | Static narrated site (reuse public-page renderer). | Missing | Med | L | later | Redact living persons at build time (static bypasses runtime engine) (NN#3). |
| Two-way desktop↔online sync | Bidirectional sync with change journals. Audit log could seed a change feed. | Missing | Med | XL | later | No Ancestry TreeShare / paywalled sync (NN#6). |
| Curator roles, trusted-list ACLs, field locking, projects/workspaces, forum, honor code, free-space wiki, portal homepage | Community-platform features. | Missing | Low | SXL | later | New roles/ACLs/locks integrate with the **single** enforcement point, not parallel checks. |
| Real-time co-editing | Out of scope; only optimistic concurrency planned. | Planned | Med | XL | later | Concurrent paths must route through privacy engine. |
---
### 2.10 Privacy & access control
The architecture is correct (single engine, tenant mixin, audit, soft-delete + purge are **have**), but enforcement coverage and configurability have real holes — two of which are security-priority.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Uniform living-person redaction across child resources** | `_redact` runs on person reads but **not** on event/media/name/relationship/citation/source endpoints — non-members fetch a possibly-living person's events/photos/names directly. | Partial | **Critical** | M | 12 | **Security-priority. Core NN#3/NN#2 defect.** Apply `person_visibility` on every person-derived fact. |
| **Email-verification enforcement gate** | `email_verified_at` is written at `auth_service.py:154` but read on no path; `register` returns an authenticated session cookie + token (201) pre-verification. | Partial | **High** | S | 12 | **Security-priority near-quick-win** — add the read-side check (NN#7 trust path). The check is small; the registration-mode switch below is the larger piece. |
| Self-registration mode gating (approve / open / closed) | No env switch to choose open vs admin-approval vs closed registration. | Partial | High | M | 2/5 | Twelve-factor registration control (NN#7); pairs with the verification gate above. |
| **Fix `site_members` visibility tier** | Defined + selectable in UI but `can_view_tree` only handles public/unlisted — fails closed unintuitively. | Partial | Critical | S | 1 | **Quick win.** Least-surprise; honor the tier the UI offers. |
| Make `LIVING_RECENCY_YEARS` configurable | Hardcoded 100 at `privacy.py:23`. | Partial | High | S | 2 | **Quick win.** Twelve-factor (NN#7). |
| Privacy-stripped export (redact living) | GEDCOM + account export emit full tree; no "strip living" mode. | Missing | High | M | 2 | Reuse `person_visibility`/`_redact` (NN#3). Owner self-export is safe today; shareable variant is the gap. |
| Per-fact / per-field privacy + record flags | tentative/rejected/preferred/private flags on facts. | Missing | Med | L | later | If added, route through the single engine (NN#2). |
| Granular rules by record type & viewer relationship | webtrees-style "hide marriages from non-descendants". | Missing | Med | L | later | Single enforcement point. |
| OIDC / external IdP login | `AuthProvider` interface ready; only Local implemented. Authentik is the intended real auth. | Planned | High | L | 5 | Additive by design. |
| Two-factor auth (TOTP) | Bearer/cookie session auth is solid; no MFA. | Partial | High | L | 5 | — |
| DB-level audit immutability | Audit is insert-only by convention; no trigger/constraint. Verified as "adequate for self-host," so importance downgraded to match. | Have(soft) | Med | S | 9 | Adequate for self-host; upgrade to trigger only if true immutability is required. |
| Block/hide users, family-group private space, DNA opt-in controls | Depend on messaging/DNA. | Missing | LowMed | MXL | 6/parked | DNA parked (NN#6). |
---
### 2.11 Import/export & standards
GEDCOM 5.5.1 import/export and full data-portability export are **have**, but fidelity gaps directly undercut the provenance thesis — and one is outright data loss.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Citation links dropped on GEDCOM export** | Export never selects the Citation table — fact→source links, page, detail, confidence all dropped on export (they import fine). Re-importing your own export **destroys** the sources-first graph. | Partial | **Critical** | M | 2 | **Silent data loss on the product's signature data + destructive round-trip** (NN#5); breaks PRD US-013. |
| GEDCOM 7.0 import/export | Version hardcoded `5.5.1`; no v7 semantics, SCHMA, SUBM, or UID handling. | Partial | High | L | 2 | Stated differentiator (FamilySearch interop). |
| Custom/underscore tag preservation | `_MARNM` becomes `TYPE married`, other custom tags dropped — violates ≥99% round-trip goal. | Missing | High | L | 2 | Tension with provenance thesis (faithful record). |
| PLAC FORM hierarchy + MAP coordinate round-trip | Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. | Missing | High | M | 23 | Round-trip fidelity for the land/maps pillar. |
| Encoding detection (ANSEL/UTF-16) | UTF-8 round-trips; non-UTF-8 files silently mangled via `errors='replace'`; CHAR tag ignored. | Partial | High | S | 2 | **Near quick win.** Detect/honor CHAR; reject or transcode rather than corrupt. |
| HEAD completeness | HEAD at `gedcom.py:740` emits only `SOUR/GEDC/VERS/CHAR` — missing required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM`. | Partial | Med | S | 2 | **Quick win.** Pure conformance. |
| GEDCOM media (OBJE) round-trip | OBJE in skip-tags; media ignored on import, never emitted on export. | Partial | Med | M | 2 | Any media bundle keeps privacy gating. |
| GEDZIP (.gdz) bundle | Bundled-media packaging. | Missing | Med | M | 2 | Natural once v7 + OBJE land. |
| Selective / filtered export | Clippings-cart / branch subset. | Missing | Med | M | later | Maintain single-enforcement-point on export (NN#2). |
| Import conformance validation | Preview is a mapping report, not structural/cardinality validation; bad lines silently skipped. | Partial | Med | M | 2 | — |
| GEDCOM-X, Gramps XML, multi-format import, FHISO/ELF, PRF upload, KML export | Interop extras. | Missing | Low | L | later | PRF needs FamilySearch API (permitted, NN#6). |
---
### 2.12 Mobile & offline
Responsive web is **partial**; PWA and offline-first are absent. Native apps are an explicit deferral.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| PWA (manifest + icons + viewport + service worker) | No manifest, no SW, no `next-pwa`; responsive coverage exists but unaudited on heavy views (tree canvas fixed 74vh). | Partial | High | M | 2 | If SW caches API responses, never retain non-owner PII; cache only what the session is authorized to see (NN#3). |
| Responsive parity audit | 17 breakpoint usages; small-screen parity on tree/person views unverified. | Partial | High | M | 12 | Feature parity is an ARCH requirement. |
| Offline-first editing + reconnect sync | No SW, no local store, no mutation queue. Valuable for archive/courthouse field research. | Missing | High | XL | later | Replayed edits go through service layer + audit (NN#1); cached data respects living-person rule (NN#3). |
| Native mobile apps | Explicitly deferred (responsive web only). | Missing | Med | XL | later | If built, reads through one backend privacy engine (NN#2/#3/#4). |
| Companion app w/ cross-device sync | Largely redundant with server-backed web. | Missing | Low | XL | later | Sync boundary enforces privacy (NN#2); full CRUD parity (NN#8). |
| Relatives Around Me | Nearby-relatives discovery. | Missing | Low | L | later | Explicit opt-in; anonymous until mutual consent (NN#4). |
---
### 2.13 API & extensibility
Internal REST + OpenAPI + generated TS client are **have**. The externalized developer story and the connector/plugin spine are not built.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Public read-only API + scoped tokens (OAuth) | Bearer token is opaque session only; `TokenPurpose` lacks scopes; designed `public.py` never built. | Partial | High | L | 56 | Any scoped-token path routes through `person_visibility` + living-person redaction (NN#2/#3). |
| SourceConnector framework | Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. | Planned | Med | L | 4 | Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6). |
| Webhooks / change feeds | `AuditEntry` is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. | Missing | Med | L | 6 | Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3). |
| CLI / scripting surface | No `[project.scripts]`, no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. | Missing | Med | M | 9 | Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path. |
| Plugin/addon architecture | Connector framework only; no general UI/report/theme plugin system planned. | Planned | Med | L | later | Sandbox via service layer; no privacy/audit bypass, no writes outside ChangeProposal. |
| In-app query tooling (SuperTool) | Power-user expression engine. | Missing | Low | L | later | Execute through privacy engine — no row enumeration bypass (NN#2). |
| Certified partner program | Organizational, not software. | Missing | Low | XL | — | Out of scope until a hosted offering exists. |
---
### 2.14 Performance & scale
Postgres + S3, multi-tenant isolation are **have**. Queue, observability, backups, pagination, and scale validation are the gaps that gate Phases 4/7 — several are current functional limitations, not late-phase validation tasks.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Real job queue (Postgres/Redis-backed) | Worker is a fixed-interval purge loop; GEDCOM import and account export run **inline in the request**. | Partial | High | L | 4 (pre-req) | Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11). |
| **Pagination on list endpoints + server-side tree loading** | List endpoints (`persons.py:37`, events, relationships) take no `limit/offset/skip`; the tree view loads the whole graph client-side. A *current* limitation against the 50k-person target. | Planned | High | M | 12 | **Split out from scale validation** — this is a correctness/functional gap now, not a Phase 9 task. |
| Scale validation (50k+ trees, P95<2s, load test) | No benchmark or load test exists. | Planned | High | L | 9 | Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true. |
| **Operator backup: one-command `pg_dump` + MinIO sync** | Only a documented procedure + per-account ZIP exist; no scripted DB+object dump. For a self-host product this is day-one data-loss exposure. | Partial | Critical | M | 12 | **Pulled forward** — Critical importance contradicted the old Phase-9 slot. Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. |
| Scheduled / cloud automated backup + restore tooling | Cron-driven, off-host, verified-restore workflow. | Partial | High | L | 9 | Builds on the one-command slice above. |
| ARM64 build matrix | CI builds `linux/amd64` only; many self-hosters run ARM SBCs. | Partial | High | S | 1 | **Quick win.** Add arm64 + QEMU to buildx (NN#7 container-native). |
| Structured JSON logs + Prometheus metrics | Plain-text stdlib logging; no `/metrics`. | Partial | Med | M | 9 | Logs/metrics reference UUIDs, never names/PII (NN#3/#4). |
| pgvector enablement | Image has pgvector; app never creates the extension or adds embedding columns (docs claim otherwise). | Partial | Med | M | 7 | See §2.3 — embedding provider open question; candidates via privacy engine. |
| Database check-and-repair | No orphan/dangling-edge/cycle scanner (recent "harden tree render" commit shows bad graphs occur). | Missing | Med | M | 9 | Tenant-scoped + audited; auto-fix via ChangeProposal (NN#1). |
| Pluggable DB backend, billions-scale shared tree, weekly record releases | Different product models. | Missing | Low | XL | — | **Out of scope** — Postgres-only is consistent with the invariants; global shared tree conflicts with NN#2/#3/#4. |
---
### 2.15 Property / land chain-of-title — *headline differentiator*
The entire "land" half is **planned/missing** but fully specified. This is where Provenance has no real competitor.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Property/parcel first-class entity | No model/endpoint/service/migration. Foundation for the whole category. | Planned | High | L | 3 | Full CRUD in API+UI (NN#8); reads added carefully to the **single** privacy engine (NN#2). |
| Typed OwnershipEvents | grant/patent, purchase, sale, inheritance, gift, tax sale, foreclosure, eminent domain — with grantor/grantee Persons + Citation. | Planned | High | L | 3 | Each event carries a Citation (NN#5); grantor/grantee living-person links redacted (NN#3). |
| Chain-of-title timeline + gap flagging | Ordered OwnershipEvents first-grant→present, breaks flagged. | Planned | High | M | 3 | The genuinely differentiating analytical piece (PRD US-032). |
| Bidirectional owner↔person, parcel↔place | "Every property a person held" / "every parcel at a place." | Planned | High | M | 3 | Reverse traversals filtered through privacy engine (NN#2). |
| Citations on OwnershipEvents | Add `ownership_event_id` to Citation (5th target). | Partial | Critical | S | 3 | **Quick win once Property lands** — single FK + CHECK edit (NN#5). |
| Legal description verbatim storage | metes-and-bounds / PLSS township-range-section as-written. | Planned | Med | L | 3 | Part of the Property model; preserves the record faithfully. |
| Parcel/plat boundary geometry | Optional geometry; plain coords first. | Planned | Med | L | 3+ | PostGIS is an open question (ARCH §14) — surface dependency. |
| PLSS / metes-and-bounds parsing → geometry | Automated survey parsing. | Planned | Med | XL | later | Hard; gated on PostGIS. |
| BLM/GLO federal land-patent connector | Marquee US land source. | Planned | High | L | 8 | Permitted source (NN#6); patents surface as ChangeProposals (NN#1); read-only + rate-limited. |
| USGS map + public county-deed connectors | Per-jurisdiction grantor/grantee indexes. | Planned | Med | L | 8 | Each connector verifies a legally open source (NN#6). |
| Co-ownership roles / tenure types | joint tenants, TIC, life estate, heirs. | Planned | Low | M | later | Multiple parties likely free with OwnershipEvent; role typing is a refinement. |
| Tax/assessment rolls, UK Tithe, Lloyd George Domesday | Valuation + non-US collections. | Missing | Low | ML | — | US-focused v1; international formats out of scope (model is country-agnostic). |
---
### 2.16 AI assistant — *defining differentiator*
Entirely **planned** — and note the docs-vs-code gap: ARCHITECTURE §5 lists `ChangeProposal` as part of the "landed" core model, but no model/migration/schema exists. The audit substrate (`actor_type=assistant`, before/after JSONB) is the right foundation; the ChangeProposal model and ModelProvider abstraction are the two critical-path pieces.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **ChangeProposal (propose-then-confirm)** | The defining invariant. No `proposal.py`, no migration, no review UI yet — despite docs implying it landed. | Planned | **Critical** | L | 4 | **IS NN#1.** Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). Correct the docs to match reality. |
| Pluggable LLM + embedding provider | `ModelProvider` over Anthropic/OpenAI/xAI/Ollama; env placeholders exist, no interface code. | Planned | Critical | M | 4 | **Twelve-factor, no hard-coded keys/endpoints** (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. |
| AI research-assistant chatbot (RAG over tree) | Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. | Planned | High | XL | 4 | NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction. |
| Conversational / connector record search | Search legal sources via the assistant. | Planned | High | L | 4 | Legal sources (NN#6); findings = Source + Citation (NN#5). |
| Fact extraction from documents | Extracted facts map cleanly to ChangeProposal review. | Missing | Med | M | 4 | Canonical NN#1 use case; each fact carries a Citation (NN#5). |
| OCR/HTR transcription + document translation | Worker job via ModelProvider. | Missing | Med | L | 4+ | Output → Source/Citation (NN#5); via ModelProvider (NN#7); auto-extraction emits ChangeProposal (NN#1). |
| Next-step research guidance | Gap analysis → suggested next record. | Planned | Med | M | 4 | Reads via privacy engine; advisory unless it queues fetches. |
| AI biography / audio narration | Read-only generation grounded in tree data. | Missing | Low | ML | later | Must not leak living-person PII (NN#3); via ModelProvider (NN#7); stored biographies = full CRUD (NN#8). |
---
### 2.17 Localization & accessibility
A documented **day-one commitment** ("UI strings externalized from day one") that is currently **unmet** — every label is a hardcoded literal. Correct the PRD claim or close the gap.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **UI string externalization** | No i18n lib, no message catalogs; all copy hardcoded in TSX. Gating prerequisite; cheapest to do now while the surface is small. | Missing | High | L | 12 | PRD §6 promises this "from day one" — **docs-vs-code gap; edit the doc now.** |
| Multi-language UI (4060+ langs) | Translation pipeline after externalization (frontend + backend-generated messages). | Missing | High | XL | later | Table stakes across all competitors. |
| Accessibility / WCAG 2.2 AA | Some ARIA/focus styling; no CI a11y audit, no skip-links, SVG tree viz not keyboard/screen-reader navigable. | Partial | High | L | 2/9 | Stated PRD §6 target; add axe/pa11y in CI; accessible alternate to the chart. |
| Unicode-correct non-Latin names | Stores fine (UTF-8); no NFC normalization on write, no locale-aware collation, no romanized search. | Partial | High | M | 2 | Apply `unicodedata.normalize('NFC')` on input; add COLLATE; supports faithful-record goal. |
| Structured/compound surname components | Surname is a single field; no support for Spanish/Portuguese paternal+maternal, Arabic nasab, particles/prefixes. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8); preserves the name as recorded. |
| Non-Gregorian calendar dates | `calendar` column is a placeholder; GEDCOM calendar escapes never parsed/populated. | Partial | Med | L | 2 | Preserve original calendar as recorded (sources-first). |
| Language tags / romanized variants per name | No language_tag/script/romanized fields; GEDCOM ROMN/LANG unhandled. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8). |
| RTL support | `lang="en"` hardcoded, no `dir`, physical CSS properties throughout. | Missing | Med | M | later | Convert to logical CSS properties; cheaper once i18n exists. |
| Selectable themes | Light/dark/system works; brand palette intentionally single. | Partial | Med | M | later | Confirm whether additional themes are a deliberate non-goal (brand guide constrains palette). |
| Multi-language report/diagram output | Depends on i18n + reports, neither shipped. | Missing | Low | L | later | — |
---
## 3. Quick wins (high importance / low effort)
Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap.
1. **Fix `site_members` visibility tier** (Privacy, Critical/S) — defined and selectable in the UI but never handled in `can_view_tree`; fails closed unintuitively.
2. **Email-verification enforcement gate** (Privacy/Auth, High/S) — add the read-side `email_verified_at` check so a freshly registered, unverified user doesn't get a live authenticated session. Security-priority; the registration-mode env switch (open/approve/closed) is the larger follow-on, not part of this quick win.
3. **Citation confidence selector in the cite form** (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis.
4. **Source edit UI + expose all 8 fields** (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8.
5. **Make `LIVING_RECENCY_YEARS` env-configurable** (Privacy, High/S) — hardcoded 100 at `privacy.py:23`; twelve-factor (NN#7).
6. **Add `ownership_event_id` to Citation** (Property/Sources, Critical/S) — single FK + CHECK-constraint edit the moment Property lands; the spine is already built (NN#5).
7. **GEDCOM encoding detection** (Standards, High/S) — detect/honor the CHAR tag; reject or transcode ANSEL/UTF-16 rather than silently mangling with `errors='replace'`.
8. **GEDCOM HEAD completeness** (Standards, Med/S) — emit the required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM` at `gedcom.py:740`. Pure conformance.
9. **ARM64 CI build matrix** (Perf/Scale, High/S) — add `linux/arm64` + QEMU to buildx for both images; many self-hosters run ARM SBCs.
10. **`GET /{tree}/citations/{id}` endpoint** (Sources, Med/S) — API symmetry (NN#8).
11. **Transcription/abstract fields on Source** (Sources, Med/S) — add `transcription_text` + `abstract_text`, distinct from `citation_text`; core to evidence analysis.
12. **Sort the merged person timeline** (Research workflow, Med/S) — `shownEvents.sort()` on `date_start`; currently appended unsorted.
13. **Doc corrections (docs-vs-code)** (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim, the i18n "from day one" claim, and the ChangeProposal "landed" claim match reality. The repo convention requires docs to travel with code.
> **Ships-with, not standalone:** *Revocable / adjustable access (membership PATCH/DELETE + role change)* is security-critical and S-effort, but it is the minimal slice of the membership work (§2.9) and ships **with** those endpoints — it is not independently shippable on its own.
>
> **Higher priority than any quick win, but M-effort (not quick):** the **media privacy leak** (§2.4), the **child-resource redaction gap** (§2.10), and pulling the **one-command operator backup** (§2.14) forward. Treat these as **security-/data-loss-priority Phase 12 fixes** regardless of the quick-win list.
---
## 4. Strategic differentiators
Where to invest to make Provenance distinct rather than a webtrees clone. Each leans on a non-negotiable as a *feature*, not a constraint.
**1. Property chain-of-title (the "land" half).** No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by **legal** public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category.
**2. The ChangeProposal AI model.** "The assistant never writes autonomously" is a *trust* differentiator in a market where users fear AI corrupting their research. Build it structurally — assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together. The audit substrate is already in place; ChangeProposal + ModelProvider are the critical path — and the docs should stop asserting ChangeProposal has landed until it has.
**3. Anonymous, mutual-consent cross-tree hints.** The privacy model already redacts living people for anonymous viewers, so a hint system that reveals *nothing identifying* until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent.
**4. True self-hosting + data ownership.** Full account export/import, soft-delete recovery, GEDCOM round-trip, env-driven everything, and (to-build) operator-grade scheduled backup + ARM support make Provenance the genealogy app you actually own. Two correctness items gate the promise: GEDCOM export must stop dropping citations (a Provenance→Provenance round-trip currently destroys the sources graph), and operator backup must move from "documented procedure" to a one-command dump. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.
**5. Sources-first as a felt experience.** The two-tier model is built; the differentiator is making it *visible and low-friction*: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), per-fact confidence surfaced in the UI, and — critically — citations that **survive GEDCOM export**. These turn "every fact links to where it came from" from an architecture note into the product's personality.