provenance/docs/BACKLOG.md
justin a6179037c2 Close citation/source living-person leak; add on-demand tree purge
Two changes.

1. Privacy fix (NN#2/NN#3) — the citation and source list endpoints gated only
   on can_view_tree, so a non-member on a public/unlisted/site_members tree could
   enumerate citations and sources tied to a redacted living person, leaking that
   the person exists and has sourced facts (and possibly their name via a source
   title). #46 closed this for events/media/names/relationships but not
   citations/sources. Now citation_service.list_citations and
   source_service.{list_sources,get_source} delegate non-member reads to
   public_view_service, mirroring the #46 pattern:
   - citations: shown only when the cited fact resolves to FULL-visibility
     person(s) — covers the person_id, name_id, event_id (person or both-partner),
     and relationship_id (both-partner) target paths.
   - sources: shown only when they back at least one visible citation; a withheld
     source 404s (don't reveal it exists).
   Tests cover all four citation target types + source withholding + member-sees-all.

2. On-demand tree purge — owners can permanently delete a soft-deleted tree now
   instead of waiting out the 30-day auto-purge window. POST /trees/{id}/purge
   (owner-only): the tree must already be in the trash, and the caller retypes its
   name to confirm. Media objects are deleted from storage, then a single
   DELETE on trees cascades all tree-owned rows via the tree_id ON DELETE CASCADE;
   the audit entry survives (tree_id SET NULL). Frontend adds a "Delete forever"
   button to the Recently-deleted list. No migration.

Suite: 102 passing.
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-10 22:38:59 -04:00

<!-- Generated by the genealogy-feature-gap-backlog workflow on 2026-06-09. -->
<!-- Gap analysis vs commercial (Ancestry/MyHeritage/FamilySearch) and OSS
(GRAMPS/Gramps Web/webtrees) genealogy software, verified against the
codebase. Statuses reflect the repo at workflow launch (before the
tree-visibility phases 1-3 landed; some items are now closed). -->
# Provenance — Product Backlog
> Status legend: **Have** (shipped) · **Partial** (substrate exists, surface incomplete) · **Planned** (on roadmap, no code) · **Missing** (no code, off roadmap).
> Importance: Critical / High / Medium / Low. Effort: S / M / L / XL.
> Phase references map onto the existing 09 roadmap. "NN#" = non-negotiable invariant.
---
## 1. Executive summary
**Where Provenance is strong today.** The foundation is genuinely solid and, in several places, ahead of the OSS field:
- **Sources-first spine is real.** A reusable `Source` + per-fact `Citation` two-tier model with an `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury.
- **Privacy architecture is the right shape — and coverage is now broad.** A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served **through the backend rather than via raw S3 URLs**. Non-member reads of persons, events, media, names, and relationships all route through `person_visibility` (#46), and the `citation`/`source` list endpoints now apply the same check for non-members via `public_view_service`.
- **Non-destructive by design.** Soft-delete with timed purge worker, immutable `AuditEntry` (before/after JSONB, `actor_type` ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import.
- **Modeling maturity.** Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys.
- **Standards core.** GEDCOM 5.5.1 import/export is **functional** (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip *fidelity* has three tracked gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
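
The `exactly_one_target` rule mentioned in the strengths above can be illustrated in application code. This is a minimal sketch, not the project's implementation: the four target field names are assumed from the citation target types named in this document, and the helper simply mirrors what the database CHECK constraint is described as enforcing.

```python
from typing import Any, Dict

# Assumed target columns; the real schema may name or count them differently.
TARGET_FIELDS = ("person_id", "name_id", "event_id", "relationship_id")


def exactly_one_target(citation: Dict[str, Any]) -> bool:
    """Return True when exactly one target field is non-null."""
    populated = [field for field in TARGET_FIELDS if citation.get(field) is not None]
    return len(populated) == 1


# A citation pointing at one event is valid; none or two targets are not.
print(exactly_one_target({"event_id": "e1"}))
print(exactly_one_target({}))
print(exactly_one_target({"person_id": "p1", "event_id": "e1"}))
```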
**Documentation-vs-code gaps to correct now (per "docs travel with code").** Two repo claims are not yet true and should be edited in the same spirit they were written:
- **pgvector is claimed as used; it is not.** Only `pg_trgm` is created. ARCHITECTURE references pgvector for match ranking.
- **i18n "from day one" is documented but unmet.** PRD §6 promises externalized strings; every label is a hardcoded literal.
These two doc edits are themselves trivial quick wins (see §3).
**The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch).** Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes:
- **No record hints, no "save to tree," no connector framework.** The entire SourceConnector layer (FamilySearch/Find A Grave/WikiTree) is unbuilt — this gates AI search, hints, and auto-citation.
- **No person merge outside GEDCOM import.** Merging duplicate people is fundamental hygiene and is currently impossible in-tree — the single highest-value near-term matching gap.
- **No maps at all.** No place autocomplete, no geocoding, no interactive/migration/birthplace maps — a glaring hole for an app whose thesis is *family **and** land*.
- **No report/print/PDF output.** Charts render on-screen only; there is no Ahnentafel, family group sheet, narrative report, or any PDF/SVG/HTML export. The whole "Charts, reports & printing" category is on-screen-viewing only.
- **DNA absent** (deliberately parked — treat as open question, not a gap).
**The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees).** These are where a privacy-first self-host product is expected to compete and currently trails:
- **Collaboration management is now reachable, but minimal.** `TreeMembership` roles are enforced on every read/write, and a list/add/change-role/remove API + UI now ship (§2.9), satisfying the full-CRUD invariant (NN#8). The remaining gap is the richer **email invite/grant flow** (pending-invite state, resend/expire), still scheduled for Phase 6.
- **Living-person redaction is now uniform across non-member reads.** Non-member reads of persons, events, media, names, and relationships all redact possibly-living people (#46), and the `citation`/`source` list endpoints, previously the last hold-outs gated only on `can_view_tree`, now do the same via `public_view_service` (NN#3, NN#2).
- **No place as a usable first-class entity** (model exists, created by GEDCOM, but no read/edit/delete — a create-only entity, which is a bug per NN#8).
- **No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization** (the last is a documented day-one commitment that is currently unmet).
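
The living-person heuristic behind this redaction treats an unknown birth date as living. The sketch below shows that idea only: the function signature, the year-only inputs, and the 110-year cutoff are assumptions made for illustration, not values taken from `privacy.py`.

```python
from datetime import date
from typing import Optional

# Assumed cutoff: with no recorded death, anyone born within this many
# years is treated as possibly living. The real threshold is not stated here.
MAX_LIFESPAN_YEARS = 110


def is_possibly_living(
    birth_year: Optional[int],
    death_year: Optional[int],
    today: Optional[date] = None,
) -> bool:
    """Conservative check: when in doubt, treat the person as living."""
    if death_year is not None:
        return False  # a recorded death settles the question
    if birth_year is None:
        return True  # unknown birth is treated as living
    current_year = (today or date.today()).year
    return current_year - birth_year <= MAX_LIFESPAN_YEARS
```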
**Security-priority correctness fixes (do these first, regardless of phase).** The redaction defects all shipped — child resources (#46) and now citations/sources too — leaving one config switch:
1. **Self-registration approval-mode switch (§2.10)** — the read-side enforcement now exists: `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (#53). The remaining gap is the env switch to choose open vs admin-approval vs closed self-registration. *(The citation/source living-person leak is now closed — citation/source list endpoints apply `person_visibility` for non-members via `public_view_service`.)*
**Strategic posture.** The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the **privacy/auth correctness** and **collaboration** gaps that the architecture already implies, (b) ship the **maps + reports + merge** table stakes, and (c) finish the back-half spine — the **connector framework** plus wiring the now-landed **ChangeProposal/ModelProvider** into the assistant — that unlocks the entire back half of the roadmap.
---
## 2. Backlog by category
### 2.1 Tree & data model
Core CRUD, typed relationships, dates, soft-delete, and naming are **have**. Remaining work is about reusable sub-entities, shared/event-centric modeling, and research-grade conveniences.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Repository as first-class entity | Promote `Source.repository` string to a reusable `Repository` (name/address/call-numbers) with dedup. | Partial | Med | M | 1-2 | If promoted, full CRUD in API+UI (NN#8) — don't half-build. |
| Note as first-class entity (SNOTE) | Promote inline `notes` text fields to reusable shared `Note`/SNOTE records. | Missing | Low | M | 2 | Full CRUD; GEDCOM 7 round-trip parity. |
| Shared/event-centric model + witnesses | Remove the `subject_person_xor_relationship` XOR; add participant/role join so one event has many people (FAN/cluster research). | Missing | Med | M | later | Unlocks FAN club + richer sourcing; participants must redact via privacy engine. |
| Non-family associations (FAN) | Add associate/neighbor relationship types; best delivered with shared-event participants. | Missing | Low | M | later | — |
| Relationship-status enum | Add married/divorced/annulled status on partnership rather than inferring from events. | Partial | High | M | 1-2 | — |
| Family/couple unit (GEDCOM FAM) | Persist a true FAM entity (own ID/sources, childless couples) instead of rebuilding on export. | Partial | High | L | 2 | Improves GEDCOM fidelity. |
| Kinship / relationship calculator | "How is A related to B" path + cousinship. Graph edges already exist. | Missing | High | M | 1-2 | Self-contained; reads via privacy engine. |
| **Read-only audit-log viewer / activity feed** | Surface `AuditEntry` as a per-tree/per-person change feed. Smaller and higher-leverage than value-level undo; partially satisfies NN#8's "read" for AuditEntry and is the substrate for watch/follow + webhooks. | Missing | High | M | 2 | Privacy-filtered projections only — never raw before/after JSON to non-members (NN#2/#3). |
| Per-field revision history + restore-prior-value | Value-level history view + undo, built atop the audit feed above. | Partial | High | L | 6 | Audit-log *UI* is the feed item; this is the larger value-level-undo work (NN#8 correction ethos). |
| Color-coded tags & custom labels | Tag people for lineages/research-status/grouping. | Missing | Med | M | 2 | Full CRUD; tenant-scoped. |
| Person timeline / LifeStory | Sort the merged event list; add place/age enrichment + narrative presentation. | Partial | Med | M | 2 | Sort is trivial (`localeCompare` on `date_start`); narrative is the larger piece. |
| Multi-calendar normalization | Store + parse Julian/Hebrew/French Republican (only `calendar` tag stored today, only Gregorian normalized). | Partial | Low | M | 2 | See also Localization §2.17. |
| Evidence/persona vs conclusion model | GEDCOM-X persona layer separate from conclusion person. | Missing | Med | XL | later | Large modeling change; strengthens sourcing + hint matching. |
| Negative assertions | Boolean "event did not happen" on Event. | Missing | Low | S | 2 | Cheap interop nicety. |
| Custom groups / networks | Named manual or rules-based groupings. | Missing | Low | M | later | Lower priority than tags. |
| Raw GEDCOM record editor / configurable fact tabs | webtrees-style raw editor + fact-type registry. | Partial | Med | L | later | Open vocabularies give de-facto custom facts today. |
| Health/medical, historical-facts index, LDS ordinances | Niche entities. | Missing | Low | M-L | later | LDS BAPL/ENDL/SLGS should map to distinct types if ever pursued; medical is special-category PII. |
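
The kinship calculator row above reduces to path-finding over relationship edges. Here is a minimal sketch under stated assumptions: the adjacency map and the `kinship_path` name are hypothetical, edges are treated as undirected, and privacy filtering plus cousinship-degree naming are left out.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def kinship_path(edges: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of people from start to goal."""
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in sorted(edges.get(person, ())):
            if neighbour in previous:
                continue  # already reached by a path that is no longer
            previous[neighbour] = person
            if neighbour == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # no connection found


family = {
    "alice": {"mum"},
    "mum": {"alice", "gran"},
    "gran": {"mum", "uncle"},
    "uncle": {"gran", "cousin"},
    "cousin": {"uncle"},
}
print(kinship_path(family, "alice", "cousin"))
```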
---
### 2.2 Sources & citations
The two-tier model is **have** and production-grade on the backend. The gaps are almost all UI/CRUD-completeness and the connector-dependent "save to tree" flows.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Citation confidence selector in UI | Confidence enum is modeled + API-writable but the `citeControl` form never sets it — every UI citation is NULL confidence. | Partial | High | S | 1 | **Quick win.** Full CRUD in UI (NN#8); reinforces evidence-quality thesis. |
| Source edit UI + all 8 fields | Source UI is add/list/delete only and create exposes ~3 of 8 fields (no author/source_type/publication_info/quality_note/citation_text). | Partial | High | S | 1 | Update API exists but no edit form — violates NN#8. |
| `GET /{tree}/citations/{id}` | Citation API has list but no single-read endpoint. | Partial | Med | S | 1 | API symmetry (NN#8). |
| Transcription / abstract / extract fields | Add `transcription_text` + `abstract_text` to Source; don't conflate with `citation_text` (GEDCOM SOUR.TEXT currently dumped into citation_text). | Missing | Med | S | 1-2 | **Quick win.** Central to evidence analysis; full CRUD (NN#8). |
| Evidence-Explained guided citation builder | Structured fields → formatted citation (Chicago/MLA/APA) instead of hand-typed `citation_text`. | Missing | High | L | 2 | Signature provenance feature; citation_text should be generated, not typed. |
| Citations on OwnershipEvents | Add `ownership_event_id` to Citation + extend CHECK to 5 targets when property lands. | Partial | Critical | S | 3 | **Quick win once Property exists** — single FK + constraint edit (NN#5). |
| Record-to-source attachment ("save to tree") | Search a connector record and attach its facts. | Missing | High | XL | 4 | Gated on connector framework; assistant attach must emit ChangeProposal (NN#1); legal sources only (NN#6). |
| Source Linker (one record → many persons) | Bulk-attach a record's facts across people. | Missing | Med | L | 4 | Downstream of connectors; reads/writes via service layer. |
| Auto-citation on save/match | Generate citation when a hint/record is confirmed. | Missing | Med | L | 4/7 | Blocked on connectors + hints; ChangeProposal if assistant-driven. |
| Memories-as-sources (cite a photo directly) | Allow media to be a citation target, not only attachable to a Source. | Partial | Low | M | 2 | Reads stay on privacy-checked media endpoint (NN#2). |
| GPS / Proof-Standard reasoning artifact | Container linking sources/citations into a proof narrative reconciling conflicts. | Missing | Med | L | later | Serious-researcher differentiator; full CRUD (NN#8). |
| Proprietary record collections | 1921 census, UK sets, etc. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 / self-host. Do not pursue. |
---
### 2.3 Search & matching
Fuzzy trigram name search is **have**; everything that depends on connectors, embeddings, or multiple populated trees is planned/missing. The standout near-term gap is **in-tree person merge**.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Standalone duplicate detection | Lift the GEDCOM `_best_match` logic into a "find duplicates in my tree" scan. | Partial | High | M | 2 | Logic already written; results via privacy engine (NN#2). |
| Interactive two-person merge (side-by-side, field-select, undo) | General merge of duplicate persons with citation re-pointing — impossible outside import today. | Partial | High | L | 2 | **Highest-value matching gap.** Preserve + re-point Citations (NN#5); write-once is a bug (NN#8). |
| Advanced search (wildcards, boolean, date/place facets, sort) | Search exposes only `?q`. | Partial | High | M | 2 | Keep per-person privacy filter in the search loop (NN#2). |
| Phonetic matching (Soundex/Metaphone/DM) | Enable `fuzzystrmatch`; trigram is char-similarity, not phonetic. | Partial | High | M | 2 | Pure utility. |
| Semantic / vector search (pgvector) | **Docs claim pgvector is used; it is not** — only pg_trgm extension is created. Add `CREATE EXTENSION vector` + embedding columns (and correct the docs). | Missing | Med | L | 7 | Embedding provider is an open question (PRD §11) — don't pick silently; candidates via privacy engine. |
| Tree-to-tree matching (Smart Matches) | Cross-tree candidate generation + ranking. | Planned | High | XL | 7 | Anonymous until mutual consent (NN#4); living-person protection (NN#3). |
| Mutual-consent match notification | Anonymous notification, reveal only after both opt in. | Planned | High | L | 7 | **Mandated invariant**, not a toggle (NN#4, NN#3); rides the notification substrate (§2.9). |
| Match confirm/reject + "not a match" memory | Persistent rejected-match store (today scoring lives only inside import). | Partial | High | M | 7 | Prevents re-notifying once hints land. |
| External search deep-links | Pre-fill FamilySearch/Find A Grave/BLM-GLO search URLs from a person's name/dates/place. | Missing | Med | M | 2-4 | **High value, low risk** before full connectors; legal targets only (NN#6). |
| **Automated record hints** | Proactive per-person record suggestions from connectors — a marquee mainstream feature. | Missing | High | XL | 7 | Connector-gated (NN#6); surfaced anonymously where cross-tree (NN#4); attach via ChangeProposal (NN#1). |
| Jurisdiction-aware record-search hints | Map place/jurisdiction → relevant collections. Place hierarchy is a ready foundation. | Missing | Med | L | 8 | Suggested collections must be legal (NN#6). |
| Cross-language / transliteration matching | Cyrillic/Hebrew/CJK ↔ Latin. | Missing | Med | XL | later | See Localization. |
| Record Detective, newspaper matches, collection catalog, GQL query builder, OCR full-text search | Connector/record-layer dependent. | Missing/Planned | Low-Med | L-XL | 4/7/8 | All gated on the connector framework; any query path runs through privacy engine (NN#2). |
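
To show why the phonetic-matching row treats trigram similarity as insufficient, here is a small pure-Python rendering of American Soundex. It is an illustration only: in the database this would come from PostgreSQL's `fuzzystrmatch` extension, and this sketch drops any character outside A-Z, so it is not suitable for non-Latin names.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code, or '' for an empty name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    last_code = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets, so a repeat across it is coded again
    return (result + "000")[:4]


# Spelling variants that differ in characters share one phonetic code.
print(soundex("Robert"), soundex("Rupert"))
```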
---
### 2.4 Media & documents
Universal media attachment is **have**; the earlier privacy leak is now **closed** (#46), and the remaining gaps are the asset-processing pipeline (EXIF strip, thumbnails).
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Media privacy gating on serve paths** | `list_media`/`get_media`/`media_content` now apply `person_visibility` for non-members (#46): media is exposed only when linked to a FULL-visibility person (`list_public_media`/`can_view_media`), so living-person photos no longer leak on public/unlisted trees. | Have | **Critical** | M | 1 | **Resolved (NN#3/NN#2).** Serve paths check attached `person_id` visibility and 404 otherwise. |
| EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | **Security-priority**, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. |
| Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy — no bypass path. |
| Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. |
| Photo/face tagging (manual) | Multi-person tagging via single FK today. | Missing | Med | XL(ML)/M(manual) | later | Owner-only, in-deployment; face tags inherit redaction (NN#3); full CRUD. |
| Mobile photo scanning + auto-split | Shoebox digitization. | Missing | Med | L | later | Reuse privacy-gated upload + EXIF strip. |
| AI photo dating / colorize / restore / animate / narrate | Model-driven media features. | Missing | Low | L-XL | 4+ | Must route through ModelProvider (NN#7), require approval (NN#1), preserve original; animating living faces raises consent issues. |
| British Library / paywalled archives, pay-per-view credits | Licensed content + metering. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 and the self-host model. |
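
The media-gating row above describes a rule that is easy to state in code: non-members see media only when it is linked to a FULL-visibility person, and anything else returns a 404. The sketch below assumes a simplified shape for that check. `can_view_media` is named in this document, but its signature here, the `Visibility` enum, and the `NotFound` exception are hypothetical.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Visibility(Enum):
    FULL = "full"
    REDACTED = "redacted"


class NotFound(Exception):
    """Stand-in for an HTTP 404, so a withheld item looks nonexistent."""


def can_view_media(
    person_id: Optional[str],
    visibility_by_person: Dict[str, Visibility],
    is_member: bool,
) -> bool:
    if is_member:
        return True  # members see everything in their tree
    if person_id is None:
        return False  # unlinked media is not exposed to non-members
    return visibility_by_person.get(person_id) is Visibility.FULL


def serve_media(
    media: Dict[str, Any],
    visibility_by_person: Dict[str, Visibility],
    is_member: bool,
) -> bytes:
    if not can_view_media(media.get("person_id"), visibility_by_person, is_member):
        raise NotFound()
    return media["content"]
```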
---
### 2.5 DNA & genetic genealogy
DNA is an **explicit PRD non-goal / open question** — treat as parked, not a backlog to grind through. Across every DNA row the rule is uniform: **a user uploading their own export is permissible; vendor connectors/scrapers (23andMe / Ancestry / MyHeritage / GEDmatch) are barred (NN#6).** Kits and matches are living-tester PII and route through the privacy engine.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| DNA-confirmed relationship flag | Model DNA confirmation as a Source/Citation backing a Relationship (not free text). | Missing | Med | M | parked | Best sources-first fit (NN#5); full CRUD (NN#8). |
| Raw DNA upload (own file) | User uploads own export; no vendor scraping. | Missing | Med | L | parked | User's own file is fine; vendor connectors barred (NN#6); special-category PII via privacy engine. |
| Kit/Match entities linked to persons | Kit (tester) + Match tied to Person, tenant-scoped/audited. | Missing | Med | M | parked | Kits = living-tester PII (NN#2/#3); full CRUD (NN#8). |
| Autosomal match list, segments, chromosome browser, triangulation, ThruLines/AutoTree, ethnicity/admixture, haplogroups, GEDmatch, NPE detection | Full genetic-genealogy suite. | Missing | Low-Med | L-XL | parked | DNA scope is an unresolved open question — **surface the dependency, don't build speculatively.** Own-data only (NN#6); cross-user surfacing obeys NN#4. |
---
### 2.6 Maps, places & gazetteers
This category is almost entirely **missing** despite being half the product thesis. The Place model has the right bones (parent_id, lat/long, PlaceName with date ranges) but no API/UI and no maps.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Place as usable first-class entity** | Place rows are created by GEDCOM but have **no read/edit/delete** API or UI — a create-only entity. | Partial | High | M | 2-3 | **Violates NN#8** (create-but-not-edit = bug). Make Place citable too (NN#5). |
| Place autocomplete + picker in event editor | No `/places` router; the event form has no place input, so users can't attach a place at all. | Missing | High | M | 2 | Table stakes; lookup is low-risk. |
| Geocoding (manual coords + forward) | lat/long columns exist; no UI, no geocoder. | Partial | High | M | 3 | Provider via env (NN#7), ToS-compliant (NN#6). |
| Pluggable geocoding provider | Nominatim/GeoNames/Bing/Google swappable. | Missing | Med | L | 3 | Provider+keys via env (NN#7); legal providers only (NN#6). |
| Bulk/batch geocoding (worker) | Geocode hundreds of GEDCOM-imported places. | Missing | Med | M | 3 | Idempotent, rate-limited worker job; provider via env. |
| Place merge/split (dedup) | GEDCOM imports produce near-duplicate place strings. | Missing | High | M | 2-3 | Needs Place update/delete (NN#8); audited merges. |
| Place-name cleanup tools | Extend the existing preview→apply cleanup UX to places. | Missing | Med | M | 2 | Preview-first + audited like existing cleanup. |
| Standardized-name vs original text | Mirror the verbatim+normalized date pattern for places. | Missing | Med | M | 2-3 | GEDCOM fidelity. |
| Alternate/historical place names with date ranges | `PlaceName` model exists with valid_from/to but no CRUD and never populated. | Partial | Med | M | 2-3 | Stored entity with no CRUD surface (NN#8). |
| Interactive map of events & places | No map library in frontend. Core to family+land positioning. | Missing | High | L | 3 | Plot via `person_visibility` so non-owners never see living locations (NN#2/#3). |
| Migration trail / pedigree-birthplace maps | Per-person life path; ancestor birthplace map. | Missing | Med | L | 3 | Redact living subjects for non-owners (NN#3). |
| Bundled world gazetteer | Offline GeoNames-style authority. | Missing | Med | XL | later | GeoNames (CC-BY) verify AGPL-compat; env-configurable. |
| Historical boundary overlays, time slider, heatmaps, radius/nearby, tile-provider switch | Advanced geo. | Missing | Low-Med | S-XL | later | PostGIS is an open question (ARCH §14) — **surface dependency**, don't adopt silently; tiles legal (NN#6). |
---
### 2.7 Charts, reports & printing
On-screen pedigree/descendant/fan/hourglass charts are **have**. The entire **output/print/report** half is missing — this is the linchpin gap of the category.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Multi-format export (PDF / SVG / image / HTML)** | No export/print path, no `@media print`, no `window.print()`. Charts and reports can't leave the screen. | Missing | High | L | 2/6 | **Linchpin.** Generate from privacy-filtered data so living people redacted in shared output (NN#3). |
| Ahnentafel report | Numbered-ancestor report; all data exists. | Missing | High | M | 6 | — |
| Family group sheet / individual summary | Printable summary; data available, needs print layout. | Missing | High | M | 6 | — |
| Narrative descendant/ancestor reports | Multi-standard prose with inline sources. | Missing | High | L | 6 | Cite Sources inline (NN#5); redact living (NN#3). |
| Sentence-template narrative engine | Deterministic fact→prose underpinning reports. | Missing | Med | L | 6 | Keep template-based; report text never mutates tree (NN#1). |
| Photo boxes in charts | Pass privacy-checked media URLs to `setCardDisplay`; CSS already present. | Missing | High | M | 2 | Stream via privacy-checked /media (NN#2/#3). |
| Drag-to-edit / interactive chart canvas | Tree canvas renders but interactive node editing (drag to re-parent, inline edit on the chart) is only partly present. | Partial | Med | M | 2 | Edits go through service layer + audit (NN#1); honor redaction. |
| Statistics dashboard | Surname/place/date distributions + tree-health. | Missing | Med | M | 6 | Reads via privacy engine (NN#2). |
| Kinship/relationship diagram report | Needs path-finding (see §2.3 calculator) + renderer. | Missing | Med | M | 6 | — |
| List reports (sources/places/repos/media) | Printable indexes (current screens are management, not reports). | Missing | Med | M | 6 | — |
| Color-by-lineage, fan overlays, lifespan/timeline charts | Sex-coloring exists; lineage/overlay/timeline don't. | Partial/Missing | Med | M | later | Overlays respect privacy engine. |
| Book/multi-report compiler, wall-chart tiling, page-setup, customizable charts | Print-shop-grade output. | Missing | Low-Med | L-XL | later | Saved "book" entity = full CRUD (NN#8); honor living-person privacy. |
| Bowtie/couple-rooted/circular-sun/3D/network/calendar | Niche chart variants. | Missing/Partial | Low-Med | M-L | later | — |
| Print-shop products, XML template engine, blank forms | Commercial/template extras. | Missing | Low | S-XL | later | Weak fit for self-host. |
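
The Ahnentafel report in the table above rests on one numbering rule: the root person is 1, and for any person numbered n the father is 2n and the mother is 2n + 1. The sketch below applies that rule to a hypothetical child-to-parents map. It assumes acyclic data and ignores the typed parent qualifiers and the privacy filtering a real report would need.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input shape: child -> (father, mother), either may be None.
Parents = Dict[str, Tuple[Optional[str], Optional[str]]]


def ahnentafel(root: str, parents: Parents) -> Dict[int, str]:
    """Number ancestors so that father = 2n and mother = 2n + 1."""
    numbered: Dict[int, str] = {}
    stack = [(1, root)]
    while stack:
        number, person = stack.pop()
        numbered[number] = person
        father, mother = parents.get(person, (None, None))
        if father is not None:
            stack.append((2 * number, father))
        if mother is not None:
            stack.append((2 * number + 1, mother))
    return dict(sorted(numbered.items()))


tree = {
    "me": ("dad", "mum"),
    "dad": ("grandpa", None),
    "mum": (None, "nana"),
}
print(ahnentafel("me", tree))
```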
---
### 2.8 Research workflow & automation
The preview→approve **bulk cleanup** tool is a genuine **have** and a differentiator. The missing pieces are the serious-researcher workflow entities.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Data-quality / consistency checker | Extend cleanup beyond name issues: child-before-parent, death-before-birth, implausible ages, orphans, dups; severity tiers. | Partial | High | L | 2 | New auto-fixes keep preview→apply (NN#1). |
| Research log | Searches, repositories visited, negative results, findings — distinct from the system audit log. | Missing | High | M | 6 | Reference reusable Sources (NN#5); tenant-scoped full CRUD (NN#8). |
| To-do / research task planner | Tasks on Person/Tree with status/priority/due/assignment. | Missing | High | M | 6 | Full CRUD in API+UI (NN#8). |
| Source-driven data entry | Start from a Source document and transcribe facts into the tree. | Missing | High | M | 2 | Natural sources-first differentiator (NN#5). |
| Task↔log linkage | FK + joined view once both entities exist. | Missing | Med | S | 6 | Cheap once predecessors land. |
| Family chronology / timeline | Sort merged events; family-wide chronology (parents' marriage, siblings' births). | Partial | Med | M | 2 | Sort is trivial; presentation over privacy-filtered data. |
| Navigation: active person / history / bookmarks | Large trees rely on browser back only. | Missing | Med | M | 2 | Per-user, tenant-scoped, full CRUD; don't expose redacted persons (NN#2/#3). |
| Saved-record shoebox / review queue | Stage candidate records before committing. | Missing | Med | M | 4/7 | Auto-attach via ChangeProposal (NN#1); legal sources (NN#6). |
| Guided research suggestions | Proactive "research next" engine (today only flags problems). | Partial | High | L | 4 | Advisory; writes via ChangeProposal (NN#1); cross-tree via privacy engine (NN#2). |
| Persona-adaptive onboarding | Family Keeper / Serious Researcher / Property Researcher selector (PRD US-002, documented but unbuilt). | Missing | Low-Med | L | 2 | Pure presentation. |
| Dashboard widgets, scratchpad, research-link sidebar, blog/narrative authoring, research wiki, crowd indexing | Conveniences. | Missing | Low | S-XL | later | Widgets/published narratives read via privacy engine (NN#2/#3). |
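
The data-quality checker row lists rules such as death-before-birth, child-before-parent, and implausible ages, with severity tiers. A minimal sketch of such a check might look like the following. The dataclasses, severity labels, and the 120-year limit are assumptions for illustration. The function only reports issues and never edits data, in keeping with the preview-then-apply approach.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_AGE_YEARS = 120  # assumed plausibility limit


@dataclass
class PersonFacts:
    person_id: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    parent_birth_years: Tuple[int, ...] = ()


@dataclass
class Issue:
    person_id: str
    severity: str  # "error" or "warning" in this sketch
    message: str


def check_person(p: PersonFacts) -> List[Issue]:
    """Return consistency issues for one person without changing anything."""
    issues: List[Issue] = []
    if p.birth_year is not None and p.death_year is not None:
        if p.death_year < p.birth_year:
            issues.append(Issue(p.person_id, "error", "death before birth"))
        elif p.death_year - p.birth_year > MAX_AGE_YEARS:
            issues.append(Issue(p.person_id, "warning", "implausible age at death"))
    if p.birth_year is not None:
        for parent_year in p.parent_birth_years:
            if parent_year >= p.birth_year:
                issues.append(Issue(p.person_id, "error", "child born before parent"))
    return issues
```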
---
### 2.9 Collaboration & sharing
Authorization is enforced everywhere, and a **minimal management surface now ships** — list/add/change-role/remove via `api/v1/members.py` plus a members page (#233). The remaining gap is the richer email invite/grant flow. The minimal slice landed at Phase 2 as planned; the invite/email UX stays at Phase 6.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Membership PATCH/DELETE + role change (minimal slice)** | Add/adjust/revoke a collaborator and change `role` — GET/PATCH/DELETE on `/trees/{id}/members` (`api/v1/members.py`) plus a frontend members page now ship (#233). Resolves the create-only NN#8 break without the full invite flow. | Have | **Critical** | S-M | 2 | Resolves the create-only NN#8 break. Revocation routes through the single privacy point. |
| Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. |
| **Read-only public tree share** | Anonymous read surface shipped: optional-auth `CurrentUserOrNone` dep, `api/v1/public.py` + `public_view_service.py`, and server-rendered pages at `/p/[treeId]` (+ `/persons/[personId]`) and `/explore`. Living-safe by construction via `person_visibility`. | Have | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). |
| SEO public profile pages (server-rendered) | Server-rendered public pages (`/p/[treeId]`, `/explore`) and `robots.ts` now ship. Deferred follow-ups: a public-only `sitemap.ts` and per-tree `noindex,nofollow` meta for `unlisted`/`site_members` pages. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages — must go through privacy engine, no direct row queries. |
| **Notification / event-dispatch substrate** | Shared enabler seeded from `AuditEntry`: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | **Privacy-filtered projections only — never raw before/after JSON** (NN#2/#3). |
| Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. |
| In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. |
| Watch/follow + change notifications | `AuditEntry` is the natural event source; needs subscription entity + dispatch (substrate above). | Planned | Med | M | 6 | Notification builder reads via privacy engine, not raw rows. |
| **Optimistic concurrency / lost-update protection** | No version/etag/`updated_at` precondition checks; concurrent multi-user edits can silently clobber. | Missing | High | M | 6 | Full-CRUD + multi-user without this risks lost updates; concurrent paths still route through privacy engine. |
| Pending-changes moderation (human edits) | Queue contributor edits for owner approval — shares infra with the AI ChangeProposal queue. | Missing | Med | L | 6 | **Design together with ChangeProposal** (NN#1). |
| Field-by-field profile merge & approval | WikiTree-style merge center + unmerge with per-field provenance. | Missing | Med | XL | later | Conflicting facts each retain Source/Citation (NN#5). |
| Ownership transfer | `owner_id` is effectively write-once; needed for self-host longevity. A minimal reassignment endpoint is the NN#8 fix. | Missing | Med | M | 6 | **Violates write-once invariant** (NN#8) — importance/phase tension noted; ship the minimal slice when membership lands. |
| Narrative website / HTML export | Static narrated site (reuse public-page renderer). | Missing | Med | L | later | Redact living persons at build time (static bypasses runtime engine) (NN#3). |
| Two-way desktop↔online sync | Bidirectional sync with change journals. Audit log could seed a change feed. | Missing | Med | XL | later | No Ancestry TreeShare / paywalled sync (NN#6). |
| Curator roles, trusted-list ACLs, field locking, projects/workspaces, forum, honor code, free-space wiki, portal homepage | Community-platform features. | Missing | Low | S–XL | later | New roles/ACLs/locks integrate with the **single** enforcement point, not parallel checks. |
| Real-time co-editing | Out of scope; only optimistic concurrency planned. | Planned | Med | XL | later | Concurrent paths must route through privacy engine. |
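
The optimistic-concurrency row above describes a missing precondition check. A minimal sketch of what such a check could look like, assuming a hypothetical integer `version` column on the edited row (the `PersonRow` class, the `version` field, and `update_with_precondition` are illustrative names, not existing code):

```python
from dataclasses import dataclass


class StaleWriteError(Exception):
    """Raised when the caller's expected version no longer matches the stored row."""


@dataclass
class PersonRow:
    # Hypothetical stand-in for a tree-owned row; `version` is an assumed column.
    id: str
    display_name: str
    version: int = 1


def update_with_precondition(row: PersonRow, expected_version: int, **changes) -> PersonRow:
    """Apply `changes` only if the caller last read the current version of the row."""
    if row.version != expected_version:
        # Someone else wrote in between: refuse instead of silently clobbering.
        raise StaleWriteError(f"expected v{expected_version}, row is at v{row.version}")
    for field_name, value in changes.items():
        setattr(row, field_name, value)
    row.version += 1  # every successful write bumps the version
    return row
```

An API layer would typically surface `StaleWriteError` as an HTTP 409 or 412 and carry the version in an `ETag`/`If-Match` pair; that mapping is an assumption here, not something the backlog specifies.
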
---
### 2.10 Privacy & access control
The architecture is correct (single engine, tenant mixin, audit, soft-delete + purge are **have**), but enforcement coverage and configurability have real holes — two of which are security-priority.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Uniform living-person redaction across child resources** | `person_visibility` now runs for non-members on the event, media, name, and relationship endpoints (#46) and on the citation/source list endpoints, all delegating to `public_view_service`: a citation is shown only when its cited fact resolves to FULL-visibility person(s); a source is shown only when it backs at least one visible citation. | Have | High | S | 1–2 | **Resolved (NN#3/NN#2).** No child-resource path leaks a redacted living person's facts. |
| **Email-verification enforcement gate** | Read-side check now ships (#53): `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (`auth_service.py`). Opt-in (default off) so SMTP-less self-hosts still work. | Have | **High** | S | 1–2 | Read-side trust path now enforced (NN#7); the registration-mode switch below is the separate larger piece. |
| Self-registration mode gating (approve / open / closed) | No env switch to choose open vs admin-approval vs closed registration. | Partial | High | M | 2/5 | Twelve-factor registration control (NN#7); pairs with the verification gate above. |
| Instance owner / operator role | `OWNER_EMAIL`-declared operator (#240): `is_instance_owner` on `/users/me`, owner-only `GET /api/v1/admin/instance`, `/admin` UI. | Have | Med | S | 2/5 | Owner-only operational surface, twelve-factor via env (NN#7); reads stay through the service layer. |
| **Fix `site_members` visibility tier** | `can_view_tree` now handles `site_members` (`privacy.py:56`): any authenticated account gets a read view, anonymous is refused. | Have | Critical | S | 1 | Honors the tier the UI offers; reads still route through `person_visibility`. |
| Make `LIVING_RECENCY_YEARS` configurable | Hardcoded 100 at `privacy.py:23`. | Partial | High | S | 2 | **Quick win.** Twelve-factor (NN#7). |
| Privacy-stripped export (redact living) | GEDCOM + account export emit full tree; no "strip living" mode. | Missing | High | M | 2 | Reuse `person_visibility`/`_redact` (NN#3). Owner self-export is safe today; shareable variant is the gap. |
| Per-fact / per-field privacy + record flags | tentative/rejected/preferred/private flags on facts. | Missing | Med | L | later | If added, route through the single engine (NN#2). |
| Granular rules by record type & viewer relationship | webtrees-style "hide marriages from non-descendants". | Missing | Med | L | later | Single enforcement point. |
| OIDC / external IdP login | `AuthProvider` interface ready; only Local implemented. Authentik is the intended real auth. | Planned | High | L | 5 | Additive by design. |
| Two-factor auth (TOTP) | Bearer/cookie session auth is solid; no MFA. | Partial | High | L | 5 | — |
| DB-level audit immutability | Audit is insert-only by convention; no trigger/constraint. Verified as "adequate for self-host," so importance downgraded to match. | Have(soft) | Med | S | 9 | Adequate for self-host; upgrade to trigger only if true immutability is required. |
| Block/hide users, family-group private space, DNA opt-in controls | Depend on messaging/DNA. | Missing | Low–Med | M–XL | 6/parked | DNA parked (NN#6). |
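
The `LIVING_RECENCY_YEARS` row above asks for the hardcoded 100 to become env-driven. A minimal sketch of one way to read it, assuming the environment variable shares the constant's name; the helper `living_recency_years` and the default constant name are hypothetical, and how `privacy.py` actually consumes the value is not shown in this document:

```python
import os
from typing import Mapping, Optional

# The value the backlog says is hardcoded today.
DEFAULT_LIVING_RECENCY_YEARS = 100


def living_recency_years(env: Optional[Mapping[str, str]] = None) -> int:
    """Read LIVING_RECENCY_YEARS from the environment, falling back to the default.

    `env` is injectable so the function can be exercised without touching os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("LIVING_RECENCY_YEARS")
    if raw is None or raw.strip() == "":
        return DEFAULT_LIVING_RECENCY_YEARS
    try:
        years = int(raw)
    except ValueError:
        raise ValueError(f"LIVING_RECENCY_YEARS must be an integer, got {raw!r}") from None
    if years <= 0:
        raise ValueError("LIVING_RECENCY_YEARS must be positive")
    return years
```

The sketch fails loudly on a malformed value rather than falling back, on the reasoning that a silently ignored privacy setting is worse than a startup error. That is a design choice for illustration, not a stated project decision.
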
---
### 2.11 Import/export & standards
GEDCOM 5.5.1 import/export and full data-portability export are **have**; the remaining fidelity gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) still undercut the provenance thesis.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Citation links on GEDCOM export** | Export now selects Citations and emits `SOUR`/`PAGE` per fact (#232), so fact→source links survive a Provenance→Provenance round-trip. (Citation detail/confidence beyond page still to round-trip.) | Have | **Critical** | M | 2 | Closes the silent data-loss / destructive round-trip on the product's signature data (NN#5); satisfies PRD US-013. |
| GEDCOM 7.0 import/export | Version hardcoded `5.5.1`; no v7 semantics, SCHMA, SUBM, or UID handling. | Partial | High | L | 2 | Stated differentiator (FamilySearch interop). |
| Custom/underscore tag preservation | `_MARNM` becomes `TYPE married`, other custom tags dropped — violates ≥99% round-trip goal. | Missing | High | L | 2 | Tension with provenance thesis (faithful record). |
| PLAC FORM hierarchy + MAP coordinate round-trip | Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. | Missing | High | M | 2–3 | Round-trip fidelity for the land/maps pillar. |
| Encoding detection (ANSEL/UTF-16) | UTF-8 round-trips; non-UTF-8 files silently mangled via `errors='replace'`; CHAR tag ignored. | Partial | High | S | 2 | **Near quick win.** Detect/honor CHAR; reject or transcode rather than corrupt. |
| HEAD completeness | HEAD at `gedcom.py:740` emits only `SOUR/GEDC/VERS/CHAR` — missing required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM`. | Partial | Med | S | 2 | **Quick win.** Pure conformance. |
| GEDCOM media (OBJE) round-trip | OBJE in skip-tags; media ignored on import, never emitted on export. | Partial | Med | M | 2 | Any media bundle keeps privacy gating. |
| GEDZIP (.gdz) bundle | Bundled-media packaging. | Missing | Med | M | 2 | Natural once v7 + OBJE land. |
| Selective / filtered export | Clippings-cart / branch subset. | Missing | Med | M | later | Maintain single-enforcement-point on export (NN#2). |
| Import conformance validation | Preview is a mapping report, not structural/cardinality validation; bad lines silently skipped. | Partial | Med | M | 2 | — |
| GEDCOM-X, Gramps XML, multi-format import, FHISO/ELF, PRF upload, KML export | Interop extras. | Missing | Low | L | later | PRF needs FamilySearch API (permitted, NN#6). |
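
The encoding-detection row above calls for honoring the `CHAR` tag and rejecting or transcoding rather than corrupting. A minimal sketch of the "reject" half, assuming strict decoding is acceptable at import time. The function and exception names are hypothetical, and ANSEL is rejected here because Python's standard library ships no ANSEL codec:

```python
import re

# Matches the level-1 CHAR line in the GEDCOM header (ASCII-compatible files only).
_CHAR_LINE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)

# Declared CHAR values this sketch is willing to decode.
_CODECS = {b"UTF-8": "utf-8", b"ASCII": "ascii"}


class UnsupportedEncodingError(ValueError):
    """The file declares (or appears to use) an encoding we will not guess at."""


def decode_gedcom(data: bytes) -> str:
    """Decode GEDCOM bytes strictly; never substitute replacement characters."""
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        return data[3:].decode("utf-8")
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):  # UTF-16 byte-order mark
        return data.decode("utf-16")
    head = data[:4096]
    if b"\x00" in head:
        # Looks like UTF-16 without a BOM; refuse rather than guess endianness.
        raise UnsupportedEncodingError("NUL bytes found but no UTF-16 BOM")
    match = _CHAR_LINE.search(head)
    declared = match.group(1).upper() if match else b"UTF-8"
    codec = _CODECS.get(declared)
    if codec is None:
        raise UnsupportedEncodingError(f"unsupported CHAR value: {declared.decode('ascii', 'replace')}")
    return data.decode(codec)  # strict: raises UnicodeDecodeError on mislabeled bytes
```

A mislabeled file (for example Latin-1 bytes under `1 CHAR UTF-8`) raises `UnicodeDecodeError` instead of being mangled, which the import preview could report to the user. Transcoding ANSEL would need a third-party or hand-written mapping and is out of scope for this sketch.
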
---
### 2.12 Mobile & offline
Responsive web is **partial**; PWA and offline-first are absent. Native apps are an explicit deferral.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| PWA (manifest + icons + viewport + service worker) | No manifest, no SW, no `next-pwa`; responsive coverage exists but unaudited on heavy views (tree canvas fixed 74vh). | Partial | High | M | 2 | If SW caches API responses, never retain non-owner PII; cache only what the session is authorized to see (NN#3). |
| Responsive parity audit | 17 breakpoint usages; small-screen parity on tree/person views unverified. | Partial | High | M | 1–2 | Feature parity is an ARCH requirement. |
| Offline-first editing + reconnect sync | No SW, no local store, no mutation queue. Valuable for archive/courthouse field research. | Missing | High | XL | later | Replayed edits go through service layer + audit (NN#1); cached data respects living-person rule (NN#3). |
| Native mobile apps | Explicitly deferred (responsive web only). | Missing | Med | XL | later | If built, reads through one backend privacy engine (NN#2/#3/#4). |
| Companion app w/ cross-device sync | Largely redundant with server-backed web. | Missing | Low | XL | later | Sync boundary enforces privacy (NN#2); full CRUD parity (NN#8). |
| Relatives Around Me | Nearby-relatives discovery. | Missing | Low | L | later | Explicit opt-in; anonymous until mutual consent (NN#4). |
---
### 2.13 API & extensibility
Internal REST + OpenAPI + generated TS client are **have**. The externalized developer story and the connector/plugin spine are not built.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Public read-only API + scoped tokens (OAuth) | The unauthenticated public read surface (`public.py`) now ships (#41–#51), but for a *developer* API the bearer token is still opaque session only and `TokenPurpose` lacks scopes — no scoped/OAuth token path. | Partial | High | L | 5–6 | Any scoped-token path routes through `person_visibility` + living-person redaction (NN#2/#3). |
| SourceConnector framework | Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. | Planned | Med | L | 4 | Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6). |
| Webhooks / change feeds | `AuditEntry` is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. | Missing | Med | L | 6 | Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3). |
| CLI / scripting surface | No `[project.scripts]`, no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. | Missing | Med | M | 9 | Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path. |
| Plugin/addon architecture | Connector framework only; no general UI/report/theme plugin system planned. | Planned | Med | L | later | Sandbox via service layer; no privacy/audit bypass, no writes outside ChangeProposal. |
| In-app query tooling (SuperTool) | Power-user expression engine. | Missing | Low | L | later | Execute through privacy engine — no row enumeration bypass (NN#2). |
| Certified partner program | Organizational, not software. | Missing | Low | XL | — | Out of scope until a hosted offering exists. |
---
### 2.14 Performance & scale
Postgres + S3, multi-tenant isolation are **have**. Queue, observability, backups, pagination, and scale validation are the gaps that gate Phases 4/7 — several are current functional limitations, not late-phase validation tasks.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Real job queue (Postgres/Redis-backed) | Worker is a fixed-interval purge loop; GEDCOM import and account export run **inline in the request**. | Partial | High | L | 4 (pre-req) | Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11). |
| **Pagination on list endpoints + server-side tree loading** | List endpoints (`persons.py:37`, events, relationships) take no `limit/offset/skip`; the tree view loads the whole graph client-side. A *current* limitation against the 50k-person target. | Planned | High | M | 1–2 | **Split out from scale validation** — this is a correctness/functional gap now, not a Phase 9 task. |
| Scale validation (50k+ trees, P95<2s, load test) | No benchmark or load test exists. | Planned | High | L | 9 | Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true. |
| **Operator backup: one-command `pg_dump` + MinIO sync** | `deploy/backup.sh` + `deploy/BACKUP.md` now provide a scripted DB+object dump (#234). Remaining: scheduled/off-host/verified-restore tooling (row below). | Have | Critical | M | 1–2 | Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. |
| Scheduled / cloud automated backup + restore tooling | Cron-driven, off-host, verified-restore workflow. | Partial | High | L | 9 | Builds on the one-command slice above. |
| ARM64 build matrix | CI builds `linux/amd64` only; many self-hosters run ARM SBCs. | Partial | High | S | 1 | **Quick win.** Add arm64 + QEMU to buildx (NN#7 container-native). |
| Structured JSON logs + Prometheus metrics | Plain-text stdlib logging; no `/metrics`. | Partial | Med | M | 9 | Logs/metrics reference UUIDs, never names/PII (NN#3/#4). |
| pgvector enablement | Image has pgvector; app never creates the extension or adds embedding columns (docs claim otherwise). | Partial | Med | M | 7 | See §2.3 — embedding provider open question; candidates via privacy engine. |
| Database check-and-repair | No orphan/dangling-edge/cycle scanner (recent "harden tree render" commit shows bad graphs occur). | Missing | Med | M | 9 | Tenant-scoped + audited; auto-fix via ChangeProposal (NN#1). |
| Pluggable DB backend, billions-scale shared tree, weekly record releases | Different product models. | Missing | Low | XL | — | **Out of scope** — Postgres-only is consistent with the invariants; global shared tree conflicts with NN#2/#3/#4. |
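
The pagination row above notes that list endpoints take no `limit/offset`. A minimal sketch of the contract such an endpoint could expose, shown over an in-memory sequence. The `Page` shape, the `MAX_LIMIT` cap of 200, and the default of 50 are assumptions for illustration; a real service would push the slice into SQL `LIMIT`/`OFFSET` rather than loading every row first:

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")

MAX_LIMIT = 200  # assumed server-side cap so a client cannot request the whole tree


@dataclass
class Page(Generic[T]):
    items: List[T]
    total: int   # total rows available, so clients can render paging controls
    limit: int   # the limit actually applied after clamping
    offset: int


def paginate(rows: Sequence[T], limit: int = 50, offset: int = 0) -> Page[T]:
    """Clamp the requested window and return one page of `rows`."""
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return Page(items=list(rows[offset:offset + limit]), total=len(rows), limit=limit, offset=offset)
```

Returning the clamped `limit` alongside `total` lets a client detect that its request was capped without a separate metadata call.
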
---
### 2.15 Property / land chain-of-title — *headline differentiator*
The entire "land" half is **planned/missing** but fully specified. This is where Provenance has no real competitor.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Property/parcel first-class entity | No model/endpoint/service/migration. Foundation for the whole category. | Planned | High | L | 3 | Full CRUD in API+UI (NN#8); reads added carefully to the **single** privacy engine (NN#2). |
| Typed OwnershipEvents | grant/patent, purchase, sale, inheritance, gift, tax sale, foreclosure, eminent domain — with grantor/grantee Persons + Citation. | Planned | High | L | 3 | Each event carries a Citation (NN#5); grantor/grantee living-person links redacted (NN#3). |
| Chain-of-title timeline + gap flagging | Ordered OwnershipEvents first-grant→present, breaks flagged. | Planned | High | M | 3 | The genuinely differentiating analytical piece (PRD US-032). |
| Bidirectional owner↔person, parcel↔place | "Every property a person held" / "every parcel at a place." | Planned | High | M | 3 | Reverse traversals filtered through privacy engine (NN#2). |
| Citations on OwnershipEvents | Add `ownership_event_id` to Citation (5th target). | Partial | Critical | S | 3 | **Quick win once Property lands** — single FK + CHECK edit (NN#5). |
| Legal description verbatim storage | metes-and-bounds / PLSS township-range-section as-written. | Planned | Med | L | 3 | Part of the Property model; preserves the record faithfully. |
| Parcel/plat boundary geometry | Optional geometry; plain coords first. | Planned | Med | L | 3+ | PostGIS is an open question (ARCH §14) — surface dependency. |
| PLSS / metes-and-bounds parsing → geometry | Automated survey parsing. | Planned | Med | XL | later | Hard; gated on PostGIS. |
| BLM/GLO federal land-patent connector | Marquee US land source. | Planned | High | L | 8 | Permitted source (NN#6); patents surface as ChangeProposals (NN#1); read-only + rate-limited. |
| USGS map + public county-deed connectors | Per-jurisdiction grantor/grantee indexes. | Planned | Med | L | 8 | Each connector verifies a legally open source (NN#6). |
| Co-ownership roles / tenure types | joint tenants, TIC, life estate, heirs. | Planned | Low | M | later | Multiple parties likely free with OwnershipEvent; role typing is a refinement. |
| Tax/assessment rolls, UK Tithe, Lloyd George Domesday | Valuation + non-US collections. | Missing | Low | M–L | — | US-focused v1; international formats out of scope (model is country-agnostic). |
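
The chain-of-title row above describes ordering OwnershipEvents from first grant to present and flagging breaks. A minimal sketch of that gap check under strong simplifying assumptions: one grantor and one grantee per event, parties compared by an opaque identifier, and a full date on every event. The `OwnershipEvent` shape here is hypothetical, since the Property model does not exist yet:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class OwnershipEvent:
    # Hypothetical shape; the planned model also carries an event type and a Citation.
    on: date
    grantor: str
    grantee: str


def find_chain_gaps(events: Iterable[OwnershipEvent]) -> List[Tuple[OwnershipEvent, OwnershipEvent]]:
    """Return adjacent event pairs where the next grantor is not the previous grantee."""
    ordered = sorted(events, key=lambda e: e.on)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.grantee != nxt.grantor:
            # Title passed to prev.grantee, but someone else conveys it next:
            # an unrecorded transfer (inheritance, tax sale, ...) likely sits here.
            gaps.append((prev, nxt))
    return gaps
```

Co-ownership, partial interests, and imprecise dates would all need a richer comparison than the equality test used here.
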
---
### 2.16 AI assistant — *defining differentiator*
The spine has now **landed**: the `ChangeProposal` model/schema/service, its migration, the GET/POST API, and a review UI all ship, and the `LLMProvider`/`EmbeddingProvider` abstraction with null/Anthropic/OpenAI-compat (OpenAI/xAI/Ollama) providers + registry is in place. The audit substrate (`actor_type=assistant`, before/after JSONB) is the right foundation; the remaining work is wiring the assistant's tools to emit proposals and building the chatbot/RAG surface on top.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **ChangeProposal (propose-then-confirm)** | The defining invariant. Model/schema/service (`models/change_proposal.py`, `services/change_proposal_service.py`), migration `a1b2c3d4e5f6`, GET/POST `api/v1/proposals.py`, and a `/trees/[id]/proposals` review UI all ship. Remaining: wire assistant tools to emit proposals. | Have | **Critical** | L | 4 | **IS NN#1.** Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). |
| Pluggable LLM + embedding provider | `LLMProvider`/`EmbeddingProvider` ABCs (`integrations/models/base.py`) with null, Anthropic, and OpenAI-compat (OpenAI/xAI/Ollama) implementations + registry. | Have | Critical | M | 4 | **Twelve-factor, no hard-coded keys/endpoints** (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. |
| Per-tree AI model policy | Owner-only per-tree model selection (`Tree.ai_member_provider`/`ai_recommender_provider`, GET/PATCH `/trees/{id}/ai`, `/trees/[id]/ai` UI) (#238). | Have | Med | S | 4 | Owner-only; selects which configured provider a tree uses — keys stay in env, twelve-factor (NN#7). |
| AI research-assistant chatbot (RAG over tree) | Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. | Planned | High | XL | 4 | NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction. |
| Conversational / connector record search | Search legal sources via the assistant. | Planned | High | L | 4 | Legal sources (NN#6); findings = Source + Citation (NN#5). |
| Fact extraction from documents | Extracted facts map cleanly to ChangeProposal review. | Missing | Med | M | 4 | Canonical NN#1 use case; each fact carries a Citation (NN#5). |
| OCR/HTR transcription + document translation | Worker job via ModelProvider. | Missing | Med | L | 4+ | Output → Source/Citation (NN#5); via ModelProvider (NN#7); auto-extraction emits ChangeProposal (NN#1). |
| Next-step research guidance | Gap analysis → suggested next record. | Planned | Med | M | 4 | Reads via privacy engine; advisory unless it queues fetches. |
| AI biography / audio narration | Read-only generation grounded in tree data. | Missing | Low | M–L | later | Must not leak living-person PII (NN#3); via ModelProvider (NN#7); stored biographies = full CRUD (NN#8). |
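
The ChangeProposal row above states the invariant: assistant tools return proposals, only a user action applies one, and application flows through the normal service layer. A minimal in-memory sketch of that shape follows. `ProposalQueue`, `Proposal`, and the `apply_via_service` callback are illustrative names and do not reflect the actual `change_proposal_service.py` API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    PENDING = "pending"
    APPLIED = "applied"


@dataclass
class Proposal:
    id: int
    payload: dict
    status: Status = Status.PENDING


class ProposalQueue:
    def __init__(self, apply_via_service: Callable[[dict, str], None]) -> None:
        # The callback stands in for the normal service layer (privacy + audit).
        self._apply = apply_via_service
        self._items: Dict[int, Proposal] = {}

    def propose(self, payload: dict) -> Proposal:
        """The only operation an assistant tool gets: record intent, change nothing."""
        proposal = Proposal(id=len(self._items) + 1, payload=payload)
        self._items[proposal.id] = proposal
        return proposal

    def approve(self, proposal_id: int, user_id: str) -> Proposal:
        """Called from an explicit user action; this is where the write happens."""
        proposal = self._items[proposal_id]
        if proposal.status is not Status.PENDING:
            raise ValueError("proposal already resolved")
        self._apply(proposal.payload, user_id)
        proposal.status = Status.APPLIED
        return proposal
```

The structural point is that the assistant-facing surface exposes `propose` only, so there is no code path by which a tool call mutates the tree without `approve` running under a user identity.
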
---
### 2.17 Localization & accessibility
A documented **day-one commitment** ("UI strings externalized from day one") that is currently **unmet** — every label is a hardcoded literal. Correct the PRD claim or close the gap.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **UI string externalization** | No i18n lib, no message catalogs; all copy hardcoded in TSX. Gating prerequisite; cheapest to do now while the surface is small. | Missing | High | L | 1–2 | PRD §6 promises this "from day one" — **docs-vs-code gap; edit the doc now.** |
| Multi-language UI (40–60+ langs) | Translation pipeline after externalization (frontend + backend-generated messages). | Missing | High | XL | later | Table stakes across all competitors. |
| Accessibility / WCAG 2.2 AA | Some ARIA/focus styling; no CI a11y audit, no skip-links, SVG tree viz not keyboard/screen-reader navigable. | Partial | High | L | 2/9 | Stated PRD §6 target; add axe/pa11y in CI; accessible alternate to the chart. |
| Unicode-correct non-Latin names | Stores fine (UTF-8); no NFC normalization on write, no locale-aware collation, no romanized search. | Partial | High | M | 2 | Apply `unicodedata.normalize('NFC')` on input; add COLLATE; supports faithful-record goal. |
| Structured/compound surname components | Surname is a single field; no support for Spanish/Portuguese paternal+maternal, Arabic nasab, particles/prefixes. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8); preserves the name as recorded. |
| Non-Gregorian calendar dates | `calendar` column is a placeholder; GEDCOM calendar escapes never parsed/populated. | Partial | Med | L | 2 | Preserve original calendar as recorded (sources-first). |
| Language tags / romanized variants per name | No language_tag/script/romanized fields; GEDCOM ROMN/LANG unhandled. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8). |
| RTL support | `lang="en"` hardcoded, no `dir`, physical CSS properties throughout. | Missing | Med | M | later | Convert to logical CSS properties; cheaper once i18n exists. |
| Selectable themes | Light/dark/system works; brand palette intentionally single. | Partial | Med | M | later | Confirm whether additional themes are a deliberate non-goal (brand guide constrains palette). |
| Multi-language report/diagram output | Depends on i18n + reports, neither shipped. | Missing | Low | L | later | — |
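
The Unicode row above recommends applying `unicodedata.normalize('NFC')` on input. A minimal sketch of why that matters for names: the same visible name can arrive as different code-point sequences, and without normalization the two forms compare unequal. The helper name `normalize_name` is hypothetical:

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Return the NFC (canonically composed) form of a name as entered."""
    return unicodedata.normalize("NFC", raw)


# "José" typed with a combining acute accent vs. the precomposed letter.
decomposed = "Jose\u0301"
composed = "Jos\u00e9"
same_before = decomposed == composed
same_after = normalize_name(decomposed) == normalize_name(composed)
```

NFC only unifies canonically equivalent sequences; it does not fold case, strip accents, or romanize, so locale-aware collation and romanized search remain separate pieces of work.
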
---
## 3. Quick wins (high importance / low effort)
Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap.
1. **Fix `site_members` visibility tier** (Privacy, Critical/S) — **done:** `can_view_tree` now handles `site_members` (`privacy.py:56`), giving any authenticated account a read view while refusing anonymous.
2. **Email-verification enforcement gate** (Privacy/Auth, High/S) — **done (#53):** the read-side `email_verified_at` check now ships behind `REQUIRE_EMAIL_VERIFICATION`, so a freshly registered, unverified user doesn't get a live authenticated session. The registration-mode env switch (open/approve/closed) is the larger follow-on (§2.10, M-effort — not a quick win).
3. **Citation confidence selector in the cite form** (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis.
4. **Source edit UI + expose all 8 fields** (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8.
5. **Make `LIVING_RECENCY_YEARS` env-configurable** (Privacy, High/S) — hardcoded 100 at `privacy.py:23`; twelve-factor (NN#7).
6. **Add `ownership_event_id` to Citation** (Property/Sources, Critical/S) — single FK + CHECK-constraint edit the moment Property lands; the spine is already built (NN#5).
7. **GEDCOM encoding detection** (Standards, High/S) — detect/honor the CHAR tag; reject or transcode ANSEL/UTF-16 rather than silently mangling with `errors='replace'`.
8. **GEDCOM HEAD completeness** (Standards, Med/S) — emit the required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM` at `gedcom.py:740`. Pure conformance.
9. **ARM64 CI build matrix** (Perf/Scale, High/S) — add `linux/arm64` + QEMU to buildx for both images; many self-hosters run ARM SBCs.
10. **`GET /{tree}/citations/{id}` endpoint** (Sources, Med/S) — API symmetry (NN#8).
11. **Transcription/abstract fields on Source** (Sources, Med/S) — add `transcription_text` + `abstract_text`, distinct from `citation_text`; core to evidence analysis.
12. **Sort the merged person timeline** (Research workflow, Med/S) — `shownEvents.sort()` on `date_start`; currently appended unsorted.
13. **Doc corrections (docs-vs-code)** (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim and the i18n "from day one" claim match reality. The repo convention requires docs to travel with code.
> **Shipped this cycle:** the **media privacy leak** (§2.4) and the **child-resource redaction gap** (§2.10) are fully closed — person/event/media/name/relationship (#46) and citation/source endpoints all apply `person_visibility` for non-members. No residual living-person leak on the read surface.
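
For quick win 8, a minimal sketch of a header that includes the two lines the list says are missing. The function name `build_head`, the `source_id` parameter, and the `@SUBM1@` cross-reference are hypothetical; the current emitter at `gedcom.py:740` is not reproduced here:

```python
from typing import List


def build_head(source_id: str, submitter_xref: str = "@SUBM1@") -> List[str]:
    """Return GEDCOM 5.5.1 header lines including GEDC.FORM and SUBM."""
    return [
        "0 HEAD",
        f"1 SOUR {source_id}",
        f"1 SUBM {submitter_xref}",   # pointer to a submitter record
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",      # nested under GEDC, not at level 1
        "1 CHAR UTF-8",
    ]
```

The `1 SUBM` pointer only validates if the file also contains a matching `0 @SUBM1@ SUBM` record, so the export would need to emit that record as well.
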
---
## 4. Strategic differentiators
Where to invest to make Provenance distinct rather than a webtrees clone. Each leans on a non-negotiable as a *feature*, not a constraint.
**1. Property chain-of-title (the "land" half).** No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by **legal** public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category.
**2. The ChangeProposal AI model.** "The assistant never writes autonomously" is a *trust* differentiator in a market where users fear AI corrupting their research. The structural spine has **landed** — the `ChangeProposal` model/API/review UI and the pluggable `LLMProvider`/`EmbeddingProvider` abstraction both ship — so the remaining work is wiring the assistant's tools to emit proposals (never mutating directly). Assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together.
**3. Anonymous, mutual-consent cross-tree hints.** The privacy model already redacts living people for anonymous viewers, so a hint system that reveals *nothing identifying* until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent.
**4. True self-hosting + data ownership.** Full account export/import, soft-delete recovery (with owner-confirmed on-demand purge to delete a trashed tree immediately rather than waiting out the 30-day window), GEDCOM round-trip, env-driven everything, a one-command operator backup, and (to-build) scheduled off-host backup + ARM support make Provenance the genealogy app you actually own. The two correctness items that gated the promise have **landed**: GEDCOM export now preserves citations (the Provenance→Provenance round-trip keeps the sources graph), and operator backup moved from "documented procedure" to a one-command dump (`deploy/backup.sh`). What remains is scheduled/verified-restore tooling and ARM builds. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.
**5. Sources-first as a felt experience.** The two-tier model is built, and citations now **survive GEDCOM export** (#232); the remaining differentiator is making sourcing *visible and low-friction*: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), and per-fact confidence surfaced in the UI. These turn "every fact links to where it came from" from an architecture note into the product's personality.