# Provenance — Product Backlog > Status legend: **Have** (shipped) · **Partial** (substrate exists, surface incomplete) · **Planned** (on roadmap, no code) · **Missing** (no code, off roadmap). > Importance: Critical / High / Medium / Low. Effort: S / M / L / XL. > Phase references map onto the existing 0–9 roadmap. "NN#" = non-negotiable invariant. --- ## 1. Executive summary **Where Provenance is strong today.** The foundation is genuinely solid and, in several places, ahead of the OSS field: - **Sources-first spine is real.** A reusable `Source` + per-fact `Citation` two-tier model with a `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury. - **Privacy architecture is the right shape — and coverage is now broad.** A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served **through the backend rather than via raw S3 URLs**. Non-member reads of persons, events, media, names, and relationships all route through `person_visibility` (#46). The remaining gap is the `citation`/`source` list endpoints, which still gate only on `can_view_tree` — see §2.10. - **Non-destructive by design.** Soft-delete with timed purge worker, immutable `AuditEntry` (before/after JSONB, `actor_type` ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import. - **Modeling maturity.** Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys. - **Standards core.** GEDCOM 5.5.1 import/export is **functional** (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip *fidelity* has three tracked gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11. **Documentation-vs-code gaps to correct now (per "docs travel with code").** Two repo claims are not yet true and should be edited in the same spirit they were written: - **pgvector is claimed as used; it is not.** Only `pg_trgm` is created. ARCHITECTURE references pgvector for match ranking. - **i18n "from day one" is documented but unmet.** PRD §6 promises externalized strings; every label is a hardcoded literal. These two doc edits are themselves trivial quick wins (see §3). **The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch).** Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes: - **No record hints, no "save to tree," no connector framework.** The entire SourceConnector layer (FamilySearch/Find A Grave/WikiTree) is unbuilt — this gates AI search, hints, and auto-citation. - **No person merge outside GEDCOM import.** Merging duplicate people is fundamental hygiene and is currently impossible in-tree — the single highest-value near-term matching gap. - **No maps at all.** No place autocomplete, no geocoding, no interactive/migration/birthplace maps — a glaring hole for an app whose thesis is *family **and** land*. - **No report/print/PDF output.** Charts render on-screen only; there is no Ahnentafel, family group sheet, narrative report, or any PDF/SVG/HTML export. The whole "Charts, reports & printing" category is on-screen-viewing only. - **DNA absent** (deliberately parked — treat as open question, not a gap). **The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees).** These are where a privacy-first self-host product is expected to compete and currently trails: - **Collaboration management is now reachable, but minimal.** `TreeMembership` roles are enforced on every read/write, and a list/add/change-role/remove API + UI now ship (§2.9), satisfying the full-CRUD invariant (NN#8). The remaining gap is the richer **email invite/grant flow** (pending-invite state, resend/expire), still scheduled for Phase 6. - **Living-person redaction is now near-uniform.** Non-member reads of persons, events, media, names, and relationships all redact possibly-living people (#46); the `citation`/`source` list endpoints are the remaining hold-outs (they gate only on `can_view_tree`) — a narrowed PII gap on public/unlisted trees (NN#3, NN#2). - **No place as a usable first-class entity** (model exists, created by GEDCOM, but no read/edit/delete — a create-only entity, which is a bug per NN#8). - **No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization** (the last is a documented day-one commitment that is currently unmet). **Security-priority correctness fixes (do these first, regardless of phase).** Most of the original redaction defects shipped this cycle (#46); two items remain — one a narrowed PII gap, one a config switch: 1. **Citation/source redaction gap (§2.10)** — `list_media`/`get_media`/`media_content`, plus the event/name/relationship endpoints, now apply `person_visibility` for non-members (#46), closing the media leak. The `citation`/`source` list endpoints still gate only on `can_view_tree`, so a non-member on a public/unlisted tree can still enumerate citations/sources tied to redacted living people — the remaining living-person leak. 2. **Self-registration approval-mode switch (§2.10)** — the read-side enforcement now exists: `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (#53). The remaining gap is the env switch to choose open vs admin-approval vs closed self-registration. **Strategic posture.** The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the **privacy/auth correctness** and **collaboration** gaps that the architecture already implies, (b) ship the **maps + reports + merge** table stakes, and (c) finish the back-half spine — the **connector framework** plus wiring the now-landed **ChangeProposal/ModelProvider** into the assistant — that unlocks the entire back half of the roadmap. --- ## 2. Backlog by category ### 2.1 Tree & data model Core CRUD, typed relationships, dates, soft-delete, and naming are **have**. Remaining work is about reusable sub-entities, shared/event-centric modeling, and research-grade conveniences. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Repository as first-class entity | Promote `Source.repository` string to a reusable `Repository` (name/address/call-numbers) with dedup. | Partial | Med | M | 1–2 | If promoted, full CRUD in API+UI (NN#8) — don't half-build. | | Note as first-class entity (SNOTE) | Promote inline `notes` text fields to reusable shared `Note`/SNOTE records. | Missing | Low | M | 2 | Full CRUD; GEDCOM 7 round-trip parity. | | Shared/event-centric model + witnesses | Remove the `subject_person_xor_relationship` XOR; add participant/role join so one event has many people (FAN/cluster research). | Missing | Med | M | later | Unlocks FAN club + richer sourcing; participants must redact via privacy engine. | | Non-family associations (FAN) | Add associate/neighbor relationship types; best delivered with shared-event participants. | Missing | Low | M | later | — | | Relationship-status enum | Add married/divorced/annulled status on partnership rather than inferring from events. | Partial | High | M | 1–2 | — | | Family/couple unit (GEDCOM FAM) | Persist a true FAM entity (own ID/sources, childless couples) instead of rebuilding on export. | Partial | High | L | 2 | Improves GEDCOM fidelity. | | Kinship / relationship calculator | "How is A related to B" path + cousinship. Graph edges already exist. | Missing | High | M | 1–2 | Self-contained; reads via privacy engine. | | **Read-only audit-log viewer / activity feed** | Surface `AuditEntry` as a per-tree/per-person change feed. Smaller and higher-leverage than value-level undo; partially satisfies NN#8's "read" for AuditEntry and is the substrate for watch/follow + webhooks. | Missing | High | M | 2 | Privacy-filtered projections only — never raw before/after JSON to non-members (NN#2/#3). | | Per-field revision history + restore-prior-value | Value-level history view + undo, built atop the audit feed above. | Partial | High | L | 6 | Audit-log *UI* is the feed item; this is the larger value-level-undo work (NN#8 correction ethos). | | Color-coded tags & custom labels | Tag people for lineages/research-status/grouping. | Missing | Med | M | 2 | Full CRUD; tenant-scoped. | | Person timeline / LifeStory | Sort the merged event list; add place/age enrichment + narrative presentation. | Partial | Med | M | 2 | Sort is trivial (`localeCompare` on `date_start`); narrative is the larger piece. | | Multi-calendar normalization | Store + parse Julian/Hebrew/French Republican (only `calendar` tag stored today, only Gregorian normalized). | Partial | Low | M | 2 | See also Localization §2.17. | | Evidence/persona vs conclusion model | GEDCOM-X persona layer separate from conclusion person. | Missing | Med | XL | later | Large modeling change; strengthens sourcing + hint matching. | | Negative assertions | Boolean "event did not happen" on Event. | Missing | Low | S | 2 | Cheap interop nicety. | | Custom groups / networks | Named manual or rules-based groupings. | Missing | Low | M | later | Lower priority than tags. | | Raw GEDCOM record editor / configurable fact tabs | webtrees-style raw editor + fact-type registry. | Partial | Med | L | later | Open vocabularies give de-facto custom facts today. | | Health/medical, historical-facts index, LDS ordinances | Niche entities. | Missing | Low | M–L | later | LDS BAPL/ENDL/SLGS should map to distinct types if ever pursued; medical is special-category PII. | --- ### 2.2 Sources & citations The two-tier model is **have** and production-grade on the backend. The gaps are almost all UI/CRUD-completeness and the connector-dependent "save to tree" flows. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Citation confidence selector in UI | Confidence enum is modeled + API-writable but the `citeControl` form never sets it — every UI citation is NULL confidence. | Partial | High | S | 1 | **Quick win.** Full CRUD in UI (NN#8); reinforces evidence-quality thesis. | | Source edit UI + all 8 fields | Source UI is add/list/delete only and create exposes ~3 of 8 fields (no author/source_type/publication_info/quality_note/citation_text). | Partial | High | S | 1 | Update API exists but no edit form — violates NN#8. | | `GET /{tree}/citations/{id}` | Citation API has list but no single-read endpoint. | Partial | Med | S | 1 | API symmetry (NN#8). | | Transcription / abstract / extract fields | Add `transcription_text` + `abstract_text` to Source; don't conflate with `citation_text` (GEDCOM SOUR.TEXT currently dumped into citation_text). | Missing | Med | S | 1–2 | **Quick win.** Central to evidence analysis; full CRUD (NN#8). | | Evidence-Explained guided citation builder | Structured fields → formatted citation (Chicago/MLA/APA) instead of hand-typed `citation_text`. | Missing | High | L | 2 | Signature provenance feature; citation_text should be generated, not typed. | | Citations on OwnershipEvents | Add `ownership_event_id` to Citation + extend CHECK to 5 targets when property lands. | Partial | Critical | S | 3 | **Quick win once Property exists** — single FK + constraint edit (NN#5). | | Record-to-source attachment ("save to tree") | Search a connector record and attach its facts. | Missing | High | XL | 4 | Gated on connector framework; assistant attach must emit ChangeProposal (NN#1); legal sources only (NN#6). | | Source Linker (one record → many persons) | Bulk-attach a record's facts across people. | Missing | Med | L | 4 | Downstream of connectors; reads/writes via service layer. | | Auto-citation on save/match | Generate citation when a hint/record is confirmed. | Missing | Med | L | 4/7 | Blocked on connectors + hints; ChangeProposal if assistant-driven. | | Memories-as-sources (cite a photo directly) | Allow media to be a citation target, not only attachable to a Source. | Partial | Low | M | 2 | Reads stay on privacy-checked media endpoint (NN#2). | | GPS / Proof-Standard reasoning artifact | Container linking sources/citations into a proof narrative reconciling conflicts. | Missing | Med | L | later | Serious-researcher differentiator; full CRUD (NN#8). | | Proprietary record collections | 1921 census, UK sets, etc. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 / self-host. Do not pursue. | --- ### 2.3 Search & matching Fuzzy trigram name search is **have**; everything that depends on connectors, embeddings, or multiple populated trees is planned/missing. The standout near-term gap is **in-tree person merge**. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Standalone duplicate detection | Lift the GEDCOM `_best_match` logic into a "find duplicates in my tree" scan. | Partial | High | M | 2 | Logic already written; results via privacy engine (NN#2). | | Interactive two-person merge (side-by-side, field-select, undo) | General merge of duplicate persons with citation re-pointing — impossible outside import today. | Partial | High | L | 2 | **Highest-value matching gap.** Preserve + re-point Citations (NN#5); write-once is a bug (NN#8). | | Advanced search (wildcards, boolean, date/place facets, sort) | Search exposes only `?q`. | Partial | High | M | 2 | Keep per-person privacy filter in the search loop (NN#2). | | Phonetic matching (Soundex/Metaphone/DM) | Enable `fuzzystrmatch`; trigram is char-similarity, not phonetic. | Partial | High | M | 2 | Pure utility. | | Semantic / vector search (pgvector) | **Docs claim pgvector is used; it is not** — only pg_trgm extension is created. Add `CREATE EXTENSION vector` + embedding columns (and correct the docs). | Missing | Med | L | 7 | Embedding provider is an open question (PRD §11) — don't pick silently; candidates via privacy engine. | | Tree-to-tree matching (Smart Matches) | Cross-tree candidate generation + ranking. | Planned | High | XL | 7 | Anonymous until mutual consent (NN#4); living-person protection (NN#3). | | Mutual-consent match notification | Anonymous notification, reveal only after both opt in. | Planned | High | L | 7 | **Mandated invariant**, not a toggle (NN#4, NN#3); rides the notification substrate (§2.9). | | Match confirm/reject + "not a match" memory | Persistent rejected-match store (today scoring lives only inside import). | Partial | High | M | 7 | Prevents re-notifying once hints land. | | External search deep-links | Pre-fill FamilySearch/Find A Grave/BLM-GLO search URLs from a person's name/dates/place. | Missing | Med | M | 2–4 | **High value, low risk** before full connectors; legal targets only (NN#6). | | **Automated record hints** | Proactive per-person record suggestions from connectors — a marquee mainstream feature. | Missing | High | XL | 7 | Connector-gated (NN#6); surfaced anonymously where cross-tree (NN#4); attach via ChangeProposal (NN#1). | | Jurisdiction-aware record-search hints | Map place/jurisdiction → relevant collections. Place hierarchy is a ready foundation. | Missing | Med | L | 8 | Suggested collections must be legal (NN#6). | | Cross-language / transliteration matching | Cyrillic/Hebrew/CJK ↔ Latin. | Missing | Med | XL | later | See Localization. | | Record Detective, newspaper matches, collection catalog, GQL query builder, OCR full-text search | Connector/record-layer dependent. | Missing/Planned | Low–Med | L–XL | 4/7/8 | All gated on the connector framework; any query path runs through privacy engine (NN#2). | --- ### 2.4 Media & documents Universal media attachment is **have**; the earlier privacy leak is now **closed** (#46), and the remaining gaps are the asset-processing pipeline (EXIF strip, thumbnails). | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Media privacy gating on serve paths** | `list_media`/`get_media`/`media_content` now apply `person_visibility` for non-members (#46): media is exposed only when linked to a FULL-visibility person (`list_public_media`/`can_view_media`), so living-person photos no longer leak on public/unlisted trees. | Have | **Critical** | M | 1 | **Resolved (NN#3/NN#2).** Serve paths check attached `person_id` visibility and 404 otherwise. | | EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | **Security-priority**, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. | | Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy — no bypass path. | | Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. | | Photo/face tagging (manual) | Multi-person tagging via single FK today. | Missing | Med | XL(ML)/M(manual) | later | Owner-only, in-deployment; face tags inherit redaction (NN#3); full CRUD. | | Mobile photo scanning + auto-split | Shoebox digitization. | Missing | Med | L | later | Reuse privacy-gated upload + EXIF strip. | | AI photo dating / colorize / restore / animate / narrate | Model-driven media features. | Missing | Low | L–XL | 4+ | Must route through ModelProvider (NN#7), require approval (NN#1), preserve original; animating living faces raises consent issues. | | British Library / paywalled archives, pay-per-view credits | Licensed content + metering. | Missing | Low | XL | — | **Out of scope** — conflicts with NN#6 and the self-host model. | --- ### 2.5 DNA & genetic genealogy DNA is an **explicit PRD non-goal / open question** — treat as parked, not a backlog to grind through. Across every DNA row the rule is uniform: **a user uploading their own export is permissible; vendor connectors/scrapers (23andMe / Ancestry / MyHeritage / GEDmatch) are barred (NN#6).** Kits and matches are living-tester PII and route through the privacy engine. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | DNA-confirmed relationship flag | Model DNA confirmation as a Source/Citation backing a Relationship (not free text). | Missing | Med | M | parked | Best sources-first fit (NN#5); full CRUD (NN#8). | | Raw DNA upload (own file) | User uploads own export; no vendor scraping. | Missing | Med | L | parked | User's own file is fine; vendor connectors barred (NN#6); special-category PII via privacy engine. | | Kit/Match entities linked to persons | Kit (tester) + Match tied to Person, tenant-scoped/audited. | Missing | Med | M | parked | Kits = living-tester PII (NN#2/#3); full CRUD (NN#8). | | Autosomal match list, segments, chromosome browser, triangulation, ThruLines/AutoTree, ethnicity/admixture, haplogroups, GEDmatch, NPE detection | Full genetic-genealogy suite. | Missing | Low–Med | L–XL | parked | DNA scope is an unresolved open question — **surface the dependency, don't build speculatively.** Own-data only (NN#6); cross-user surfacing obeys NN#4. | --- ### 2.6 Maps, places & gazetteers This category is almost entirely **missing** despite being half the product thesis. The Place model has the right bones (parent_id, lat/long, PlaceName with date ranges) but no API/UI and no maps. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Place as usable first-class entity** | Place rows are created by GEDCOM but have **no read/edit/delete** API or UI — a create-only entity. | Partial | High | M | 2–3 | **Violates NN#8** (create-but-not-edit = bug). Make Place citable too (NN#5). | | Place autocomplete + picker in event editor | No `/places` router; the event form has no place input, so users can't attach a place at all. | Missing | High | M | 2 | Table stakes; lookup is low-risk. | | Geocoding (manual coords + forward) | lat/long columns exist; no UI, no geocoder. | Partial | High | M | 3 | Provider via env (NN#7), ToS-compliant (NN#6). | | Pluggable geocoding provider | Nominatim/GeoNames/Bing/Google swappable. | Missing | Med | L | 3 | Provider+keys via env (NN#7); legal providers only (NN#6). | | Bulk/batch geocoding (worker) | Geocode hundreds of GEDCOM-imported places. | Missing | Med | M | 3 | Idempotent, rate-limited worker job; provider via env. | | Place merge/split (dedup) | GEDCOM imports produce near-duplicate place strings. | Missing | High | M | 2–3 | Needs Place update/delete (NN#8); audited merges. | | Place-name cleanup tools | Extend the existing preview→apply cleanup UX to places. | Missing | Med | M | 2 | Preview-first + audited like existing cleanup. | | Standardized-name vs original text | Mirror the verbatim+normalized date pattern for places. | Missing | Med | M | 2–3 | GEDCOM fidelity. | | Alternate/historical place names with date ranges | `PlaceName` model exists with valid_from/to but no CRUD and never populated. | Partial | Med | M | 2–3 | Stored entity with no CRUD surface (NN#8). | | Interactive map of events & places | No map library in frontend. Core to family+land positioning. | Missing | High | L | 3 | Plot via `person_visibility` so non-owners never see living locations (NN#2/#3). | | Migration trail / pedigree-birthplace maps | Per-person life path; ancestor birthplace map. | Missing | Med | L | 3 | Redact living subjects for non-owners (NN#3). | | Bundled world gazetteer | Offline GeoNames-style authority. | Missing | Med | XL | later | GeoNames (CC-BY) verify AGPL-compat; env-configurable. | | Historical boundary overlays, time slider, heatmaps, radius/nearby, tile-provider switch | Advanced geo. | Missing | Low–Med | S–XL | later | PostGIS is an open question (ARCH §14) — **surface dependency**, don't adopt silently; tiles legal (NN#6). | --- ### 2.7 Charts, reports & printing On-screen pedigree/descendant/fan/hourglass charts are **have**. The entire **output/print/report** half is missing — this is the linchpin gap of the category. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Multi-format export (PDF / SVG / image / HTML)** | No export/print path, no `@media print`, no `window.print()`. Charts and reports can't leave the screen. | Missing | High | L | 2/6 | **Linchpin.** Generate from privacy-filtered data so living people redacted in shared output (NN#3). | | Ahnentafel report | Numbered-ancestor report; all data exists. | Missing | High | M | 6 | — | | Family group sheet / individual summary | Printable summary; data available, needs print layout. | Missing | High | M | 6 | — | | Narrative descendant/ancestor reports | Multi-standard prose with inline sources. | Missing | High | L | 6 | Cite Sources inline (NN#5); redact living (NN#3). | | Sentence-template narrative engine | Deterministic fact→prose underpinning reports. | Missing | Med | L | 6 | Keep template-based; report text never mutates tree (NN#1). | | Photo boxes in charts | Pass privacy-checked media URLs to `setCardDisplay`; CSS already present. | Missing | High | M | 2 | Stream via privacy-checked /media (NN#2/#3). | | Drag-to-edit / interactive chart canvas | Tree canvas renders but interactive node editing (drag to re-parent, inline edit on the chart) is only partly present. | Partial | Med | M | 2 | Edits go through service layer + audit (NN#1); honor redaction. | | Statistics dashboard | Surname/place/date distributions + tree-health. | Missing | Med | M | 6 | Reads via privacy engine (NN#2). | | Kinship/relationship diagram report | Needs path-finding (see §2.3 calculator) + renderer. | Missing | Med | M | 6 | — | | List reports (sources/places/repos/media) | Printable indexes (current screens are management, not reports). | Missing | Med | M | 6 | — | | Color-by-lineage, fan overlays, lifespan/timeline charts | Sex-coloring exists; lineage/overlay/timeline don't. | Partial/Missing | Med | M | later | Overlays respect privacy engine. | | Book/multi-report compiler, wall-chart tiling, page-setup, customizable charts | Print-shop-grade output. | Missing | Low–Med | L–XL | later | Saved "book" entity = full CRUD (NN#8); honor living-person privacy. | | Bowtie/couple-rooted/circular-sun/3D/network/calendar | Niche chart variants. | Missing/Partial | Low–Med | M–L | later | — | | Print-shop products, XML template engine, blank forms | Commercial/template extras. | Missing | Low | S–XL | later | Weak fit for self-host. | --- ### 2.8 Research workflow & automation The preview→approve **bulk cleanup** tool is a genuine **have** and a differentiator. The missing pieces are the serious-researcher workflow entities. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Data-quality / consistency checker | Extend cleanup beyond name issues: child-before-parent, death-before-birth, implausible ages, orphans, dups; severity tiers. | Partial | High | L | 2 | New auto-fixes keep preview→apply (NN#1). | | Research log | Searches, repositories visited, negative results, findings — distinct from the system audit log. | Missing | High | M | 6 | Reference reusable Sources (NN#5); tenant-scoped full CRUD (NN#8). | | To-do / research task planner | Tasks on Person/Tree with status/priority/due/assignment. | Missing | High | M | 6 | Full CRUD in API+UI (NN#8). | | Source-driven data entry | Start from a Source document and transcribe facts into the tree. | Missing | High | M | 2 | Natural sources-first differentiator (NN#5). | | Task↔log linkage | FK + joined view once both entities exist. | Missing | Med | S | 6 | Cheap once predecessors land. | | Family chronology / timeline | Sort merged events; family-wide chronology (parents' marriage, siblings' births). | Partial | Med | M | 2 | Sort is trivial; presentation over privacy-filtered data. | | Navigation: active person / history / bookmarks | Large trees rely on browser back only. | Missing | Med | M | 2 | Per-user, tenant-scoped, full CRUD; don't expose redacted persons (NN#2/#3). | | Saved-record shoebox / review queue | Stage candidate records before committing. | Missing | Med | M | 4/7 | Auto-attach via ChangeProposal (NN#1); legal sources (NN#6). | | Guided research suggestions | Proactive "research next" engine (today only flags problems). | Partial | High | L | 4 | Advisory; writes via ChangeProposal (NN#1); cross-tree via privacy engine (NN#2). | | Persona-adaptive onboarding | Family Keeper / Serious Researcher / Property Researcher selector (PRD US-002, documented but unbuilt). | Missing | Low–Med | L | 2 | Pure presentation. | | Dashboard widgets, scratchpad, research-link sidebar, blog/narrative authoring, research wiki, crowd indexing | Conveniences. | Missing | Low | S–XL | later | Widgets/published narratives read via privacy engine (NN#2/#3). | --- ### 2.9 Collaboration & sharing Authorization is enforced everywhere, and a **minimal management surface now ships** — list/add/change-role/remove via `api/v1/members.py` plus a members page (#233). The remaining gap is the richer email invite/grant flow. The minimal slice landed at Phase 2 as planned; the invite/email UX stays at Phase 6. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Membership PATCH/DELETE + role change (minimal slice)** | Add/adjust/revoke a collaborator and change `role` — GET/PATCH/DELETE on `/trees/{id}/members` (`api/v1/members.py`) plus a frontend members page now ship (#233). Resolves the create-only NN#8 break without the full invite flow. | Have | **Critical** | S–M | 2 | Resolves the create-only NN#8 break. Revocation routes through the single privacy point. | | Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. | | **Read-only public tree share** | Anonymous read surface shipped: optional-auth `CurrentUserOrNone` dep, `api/v1/public.py` + `public_view_service.py`, and server-rendered pages at `/p/[treeId]` (+ `/persons/[personId]`) and `/explore`. Living-safe by construction via `person_visibility`. | Have | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). | | SEO public profile pages (server-rendered) | Server-rendered public pages (`/p/[treeId]`, `/explore`) and `robots.ts` now ship. Deferred follow-ups: a public-only `sitemap.ts` and per-tree `noindex,nofollow` meta for `unlisted`/`site_members` pages. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages — must go through privacy engine, no direct row queries. | | **Notification / event-dispatch substrate** | Shared enabler seeded from `AuditEntry`: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | **Privacy-filtered projections only — never raw before/after JSON** (NN#2/#3). | | Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. | | In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. | | Watch/follow + change notifications | `AuditEntry` is the natural event source; needs subscription entity + dispatch (substrate above). | Planned | Med | M | 6 | Notification builder reads via privacy engine, not raw rows. | | **Optimistic concurrency / lost-update protection** | No version/etag/`updated_at` precondition checks; concurrent multi-user edits can silently clobber. | Missing | High | M | 6 | Full-CRUD + multi-user without this risks lost updates; concurrent paths still route through privacy engine. | | Pending-changes moderation (human edits) | Queue contributor edits for owner approval — shares infra with the AI ChangeProposal queue. | Missing | Med | L | 6 | **Design together with ChangeProposal** (NN#1). | | Field-by-field profile merge & approval | WikiTree-style merge center + unmerge with per-field provenance. | Missing | Med | XL | later | Conflicting facts each retain Source/Citation (NN#5). | | Ownership transfer | `owner_id` is effectively write-once; needed for self-host longevity. A minimal reassignment endpoint is the NN#8 fix. | Missing | Med | M | 6 | **Violates write-once invariant** (NN#8) — importance/phase tension noted; ship the minimal slice when membership lands. | | Narrative website / HTML export | Static narrated site (reuse public-page renderer). | Missing | Med | L | later | Redact living persons at build time (static bypasses runtime engine) (NN#3). | | Two-way desktop↔online sync | Bidirectional sync with change journals. Audit log could seed a change feed. | Missing | Med | XL | later | No Ancestry TreeShare / paywalled sync (NN#6). | | Curator roles, trusted-list ACLs, field locking, projects/workspaces, forum, honor code, free-space wiki, portal homepage | Community-platform features. | Missing | Low | S–XL | later | New roles/ACLs/locks integrate with the **single** enforcement point, not parallel checks. | | Real-time co-editing | Out of scope; only optimistic concurrency planned. | Planned | Med | XL | later | Concurrent paths must route through privacy engine. | --- ### 2.10 Privacy & access control The architecture is correct (single engine, tenant mixin, audit, soft-delete + purge are **have**), but enforcement coverage and configurability have real holes — two of which are security-priority. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Uniform living-person redaction across child resources** | `person_visibility` now runs for non-members on the event, media, name, and relationship endpoints (#46), which delegate to `public_view_service`. Remaining: the `citation`/`source` list endpoints still gate only on `can_view_tree`, so citations tied to a redacted living person are still enumerable. | Partial | High | S | 1–2 | **Mostly resolved (NN#3/NN#2).** Apply `person_visibility` to the citation/source list paths to close the residual leak. | | **Email-verification enforcement gate** | Read-side check now ships (#53): `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (`auth_service.py`). Opt-in (default off) so SMTP-less self-hosts still work. | Have | **High** | S | 1–2 | Read-side trust path now enforced (NN#7); the registration-mode switch below is the separate larger piece. | | Self-registration mode gating (approve / open / closed) | No env switch to choose open vs admin-approval vs closed registration. | Partial | High | M | 2/5 | Twelve-factor registration control (NN#7); pairs with the verification gate above. | | Instance owner / operator role | `OWNER_EMAIL`-declared operator (#240): `is_instance_owner` on `/users/me`, owner-only `GET /api/v1/admin/instance`, `/admin` UI. | Have | Med | S | 2/5 | Owner-only operational surface, twelve-factor via env (NN#7); reads stay through the service layer. | | **Fix `site_members` visibility tier** | `can_view_tree` now handles `site_members` (`privacy.py:56`): any authenticated account gets a read view, anonymous is refused. | Have | Critical | S | 1 | Honors the tier the UI offers; reads still route through `person_visibility`. | | Make `LIVING_RECENCY_YEARS` configurable | Hardcoded 100 at `privacy.py:23`. | Partial | High | S | 2 | **Quick win.** Twelve-factor (NN#7). | | Privacy-stripped export (redact living) | GEDCOM + account export emit full tree; no "strip living" mode. | Missing | High | M | 2 | Reuse `person_visibility`/`_redact` (NN#3). Owner self-export is safe today; shareable variant is the gap. | | Per-fact / per-field privacy + record flags | tentative/rejected/preferred/private flags on facts. | Missing | Med | L | later | If added, route through the single engine (NN#2). | | Granular rules by record type & viewer relationship | webtrees-style "hide marriages from non-descendants". | Missing | Med | L | later | Single enforcement point. | | OIDC / external IdP login | `AuthProvider` interface ready; only Local implemented. Authentik is the intended real auth. | Planned | High | L | 5 | Additive by design. | | Two-factor auth (TOTP) | Bearer/cookie session auth is solid; no MFA. | Partial | High | L | 5 | — | | DB-level audit immutability | Audit is insert-only by convention; no trigger/constraint. Verified as "adequate for self-host," so importance downgraded to match. | Have(soft) | Med | S | 9 | Adequate for self-host; upgrade to trigger only if true immutability is required. | | Block/hide users, family-group private space, DNA opt-in controls | Depend on messaging/DNA. | Missing | Low–Med | M–XL | 6/parked | DNA parked (NN#6). | --- ### 2.11 Import/export & standards GEDCOM 5.5.1 import/export and full data-portability export are **have**; the remaining fidelity gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) still undercut the provenance thesis. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **Citation links on GEDCOM export** | Export now selects Citations and emits `SOUR`/`PAGE` per fact (#232), so fact→source links survive a Provenance→Provenance round-trip. (Citation detail/confidence beyond page still to round-trip.) | Have | **Critical** | M | 2 | Closes the silent data-loss / destructive round-trip on the product's signature data (NN#5); satisfies PRD US-013. | | GEDCOM 7.0 import/export | Version hardcoded `5.5.1`; no v7 semantics, SCHMA, SUBM, or UID handling. | Partial | High | L | 2 | Stated differentiator (FamilySearch interop). | | Custom/underscore tag preservation | `_MARNM` becomes `TYPE married`, other custom tags dropped — violates ≥99% round-trip goal. | Missing | High | L | 2 | Tension with provenance thesis (faithful record). | | PLAC FORM hierarchy + MAP coordinate round-trip | Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. | Missing | High | M | 2–3 | Round-trip fidelity for the land/maps pillar. | | Encoding detection (ANSEL/UTF-16) | UTF-8 round-trips; non-UTF-8 files silently mangled via `errors='replace'`; CHAR tag ignored. | Partial | High | S | 2 | **Near quick win.** Detect/honor CHAR; reject or transcode rather than corrupt. | | HEAD completeness | HEAD at `gedcom.py:740` emits only `SOUR/GEDC/VERS/CHAR` — missing required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM`. | Partial | Med | S | 2 | **Quick win.** Pure conformance. | | GEDCOM media (OBJE) round-trip | OBJE in skip-tags; media ignored on import, never emitted on export. | Partial | Med | M | 2 | Any media bundle keeps privacy gating. | | GEDZIP (.gdz) bundle | Bundled-media packaging. | Missing | Med | M | 2 | Natural once v7 + OBJE land. | | Selective / filtered export | Clippings-cart / branch subset. | Missing | Med | M | later | Maintain single-enforcement-point on export (NN#2). | | Import conformance validation | Preview is a mapping report, not structural/cardinality validation; bad lines silently skipped. | Partial | Med | M | 2 | — | | GEDCOM-X, Gramps XML, multi-format import, FHISO/ELF, PRF upload, KML export | Interop extras. | Missing | Low | L | later | PRF needs FamilySearch API (permitted, NN#6). | --- ### 2.12 Mobile & offline Responsive web is **partial**; PWA and offline-first are absent. Native apps are an explicit deferral. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | PWA (manifest + icons + viewport + service worker) | No manifest, no SW, no `next-pwa`; responsive coverage exists but unaudited on heavy views (tree canvas fixed 74vh). | Partial | High | M | 2 | If SW caches API responses, never retain non-owner PII; cache only what the session is authorized to see (NN#3). | | Responsive parity audit | 17 breakpoint usages; small-screen parity on tree/person views unverified. | Partial | High | M | 1–2 | Feature parity is an ARCH requirement. | | Offline-first editing + reconnect sync | No SW, no local store, no mutation queue. Valuable for archive/courthouse field research. | Missing | High | XL | later | Replayed edits go through service layer + audit (NN#1); cached data respects living-person rule (NN#3). | | Native mobile apps | Explicitly deferred (responsive web only). | Missing | Med | XL | later | If built, reads through one backend privacy engine (NN#2/#3/#4). | | Companion app w/ cross-device sync | Largely redundant with server-backed web. | Missing | Low | XL | later | Sync boundary enforces privacy (NN#2); full CRUD parity (NN#8). | | Relatives Around Me | Nearby-relatives discovery. | Missing | Low | L | later | Explicit opt-in; anonymous until mutual consent (NN#4). | --- ### 2.13 API & extensibility Internal REST + OpenAPI + generated TS client are **have**. The externalized developer story and the connector/plugin spine are not built. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Public read-only API + scoped tokens (OAuth) | The unauthenticated public read surface (`public.py`) now ships (#41–#51), but for a *developer* API the bearer token is still opaque session only and `TokenPurpose` lacks scopes — no scoped/OAuth token path. | Partial | High | L | 5–6 | Any scoped-token path routes through `person_visibility` + living-person redaction (NN#2/#3). | | SourceConnector framework | Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. | Planned | Med | L | 4 | Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6). | | Webhooks / change feeds | `AuditEntry` is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. | Missing | Med | L | 6 | Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3). | | CLI / scripting surface | No `[project.scripts]`, no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. | Missing | Med | M | 9 | Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path. | | Plugin/addon architecture | Connector framework only; no general UI/report/theme plugin system planned. | Planned | Med | L | later | Sandbox via service layer; no privacy/audit bypass, no writes outside ChangeProposal. | | In-app query tooling (SuperTool) | Power-user expression engine. | Missing | Low | L | later | Execute through privacy engine — no row enumeration bypass (NN#2). | | Certified partner program | Organizational, not software. | Missing | Low | XL | — | Out of scope until a hosted offering exists. | --- ### 2.14 Performance & scale Postgres + S3, multi-tenant isolation are **have**. Queue, observability, backups, pagination, and scale validation are the gaps that gate Phases 4/7 — several are current functional limitations, not late-phase validation tasks. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Real job queue (Postgres/Redis-backed) | Worker is a fixed-interval purge loop; GEDCOM import and account export run **inline in the request**. | Partial | High | L | 4 (pre-req) | Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11). | | **Pagination on list endpoints + server-side tree loading** | List endpoints (`persons.py:37`, events, relationships) take no `limit/offset/skip`; the tree view loads the whole graph client-side. A *current* limitation against the 50k-person target. | Planned | High | M | 1–2 | **Split out from scale validation** — this is a correctness/functional gap now, not a Phase 9 task. | | Scale validation (50k+ trees, P95<2s, load test) | No benchmark or load test exists. | Planned | High | L | 9 | Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true. | | **Operator backup: one-command `pg_dump` + MinIO sync** | `deploy/backup.sh` + `deploy/BACKUP.md` now provide a scripted DB+object dump (#234). Remaining: scheduled/off-host/verified-restore tooling (row below). | Have | Critical | M | 1–2 | Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. | | Scheduled / cloud automated backup + restore tooling | Cron-driven, off-host, verified-restore workflow. | Partial | High | L | 9 | Builds on the one-command slice above. | | ARM64 build matrix | CI builds `linux/amd64` only; many self-hosters run ARM SBCs. | Partial | High | S | 1 | **Quick win.** Add arm64 + QEMU to buildx (NN#7 container-native). | | Structured JSON logs + Prometheus metrics | Plain-text stdlib logging; no `/metrics`. | Partial | Med | M | 9 | Logs/metrics reference UUIDs, never names/PII (NN#3/#4). | | pgvector enablement | Image has pgvector; app never creates the extension or adds embedding columns (docs claim otherwise). | Partial | Med | M | 7 | See §2.3 — embedding provider open question; candidates via privacy engine. | | Database check-and-repair | No orphan/dangling-edge/cycle scanner (recent "harden tree render" commit shows bad graphs occur). | Missing | Med | M | 9 | Tenant-scoped + audited; auto-fix via ChangeProposal (NN#1). | | Pluggable DB backend, billions-scale shared tree, weekly record releases | Different product models. | Missing | Low | XL | — | **Out of scope** — Postgres-only is consistent with the invariants; global shared tree conflicts with NN#2/#3/#4. | --- ### 2.15 Property / land chain-of-title — *headline differentiator* The entire "land" half is **planned/missing** but fully specified. This is where Provenance has no real competitor. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | Property/parcel first-class entity | No model/endpoint/service/migration. Foundation for the whole category. | Planned | High | L | 3 | Full CRUD in API+UI (NN#8); reads added carefully to the **single** privacy engine (NN#2). | | Typed OwnershipEvents | grant/patent, purchase, sale, inheritance, gift, tax sale, foreclosure, eminent domain — with grantor/grantee Persons + Citation. | Planned | High | L | 3 | Each event carries a Citation (NN#5); grantor/grantee living-person links redacted (NN#3). | | Chain-of-title timeline + gap flagging | Ordered OwnershipEvents first-grant→present, breaks flagged. | Planned | High | M | 3 | The genuinely differentiating analytical piece (PRD US-032). | | Bidirectional owner↔person, parcel↔place | "Every property a person held" / "every parcel at a place." | Planned | High | M | 3 | Reverse traversals filtered through privacy engine (NN#2). | | Citations on OwnershipEvents | Add `ownership_event_id` to Citation (5th target). | Partial | Critical | S | 3 | **Quick win once Property lands** — single FK + CHECK edit (NN#5). | | Legal description verbatim storage | metes-and-bounds / PLSS township-range-section as-written. | Planned | Med | L | 3 | Part of the Property model; preserves the record faithfully. | | Parcel/plat boundary geometry | Optional geometry; plain coords first. | Planned | Med | L | 3+ | PostGIS is an open question (ARCH §14) — surface dependency. | | PLSS / metes-and-bounds parsing → geometry | Automated survey parsing. | Planned | Med | XL | later | Hard; gated on PostGIS. | | BLM/GLO federal land-patent connector | Marquee US land source. | Planned | High | L | 8 | Permitted source (NN#6); patents surface as ChangeProposals (NN#1); read-only + rate-limited. | | USGS map + public county-deed connectors | Per-jurisdiction grantor/grantee indexes. | Planned | Med | L | 8 | Each connector verifies a legally open source (NN#6). | | Co-ownership roles / tenure types | joint tenants, TIC, life estate, heirs. | Planned | Low | M | later | Multiple parties likely free with OwnershipEvent; role typing is a refinement. | | Tax/assessment rolls, UK Tithe, Lloyd George Domesday | Valuation + non-US collections. | Missing | Low | M–L | — | US-focused v1; international formats out of scope (model is country-agnostic). | --- ### 2.16 AI assistant — *defining differentiator* The spine has now **landed**: the `ChangeProposal` model/schema/service, its migration, the GET/POST API, and a review UI all ship, and the `LLMProvider`/`EmbeddingProvider` abstraction with null/Anthropic/OpenAI-compat (OpenAI/xAI/Ollama) providers + registry is in place. The audit substrate (`actor_type=assistant`, before/after JSONB) is the right foundation; the remaining work is wiring the assistant's tools to emit proposals and building the chatbot/RAG surface on top. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **ChangeProposal (propose-then-confirm)** | The defining invariant. Model/schema/service (`models/change_proposal.py`, `services/change_proposal_service.py`), migration `a1b2c3d4e5f6`, GET/POST `api/v1/proposals.py`, and a `/trees/[id]/proposals` review UI all ship. Remaining: wire assistant tools to emit proposals. | Have | **Critical** | L | 4 | **IS NN#1.** Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). | | Pluggable LLM + embedding provider | `LLMProvider`/`EmbeddingProvider` ABCs (`integrations/models/base.py`) with null, Anthropic, and OpenAI-compat (OpenAI/xAI/Ollama) implementations + registry. | Have | Critical | M | 4 | **Twelve-factor, no hard-coded keys/endpoints** (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. | | Per-tree AI model policy | Owner-only per-tree model selection (`Tree.ai_member_provider`/`ai_recommender_provider`, GET/PATCH `/trees/{id}/ai`, `/trees/[id]/ai` UI) (#238). | Have | Med | S | 4 | Owner-only; selects which configured provider a tree uses — keys stay in env, twelve-factor (NN#7). | | AI research-assistant chatbot (RAG over tree) | Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. | Planned | High | XL | 4 | NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction. | | Conversational / connector record search | Search legal sources via the assistant. | Planned | High | L | 4 | Legal sources (NN#6); findings = Source + Citation (NN#5). | | Fact extraction from documents | Extracted facts map cleanly to ChangeProposal review. | Missing | Med | M | 4 | Canonical NN#1 use case; each fact carries a Citation (NN#5). | | OCR/HTR transcription + document translation | Worker job via ModelProvider. | Missing | Med | L | 4+ | Output → Source/Citation (NN#5); via ModelProvider (NN#7); auto-extraction emits ChangeProposal (NN#1). | | Next-step research guidance | Gap analysis → suggested next record. | Planned | Med | M | 4 | Reads via privacy engine; advisory unless it queues fetches. | | AI biography / audio narration | Read-only generation grounded in tree data. | Missing | Low | M–L | later | Must not leak living-person PII (NN#3); via ModelProvider (NN#7); stored biographies = full CRUD (NN#8). | --- ### 2.17 Localization & accessibility A documented **day-one commitment** ("UI strings externalized from day one") that is currently **unmet** — every label is a hardcoded literal. Correct the PRD claim or close the gap. | Item | Description | Status | Imp | Eff | Phase | Non-negotiable | |---|---|---|---|---|---|---| | **UI string externalization** | No i18n lib, no message catalogs; all copy hardcoded in TSX. Gating prerequisite; cheapest to do now while the surface is small. | Missing | High | L | 1–2 | PRD §6 promises this "from day one" — **docs-vs-code gap; edit the doc now.** | | Multi-language UI (40–60+ langs) | Translation pipeline after externalization (frontend + backend-generated messages). | Missing | High | XL | later | Table stakes across all competitors. | | Accessibility / WCAG 2.2 AA | Some ARIA/focus styling; no CI a11y audit, no skip-links, SVG tree viz not keyboard/screen-reader navigable. | Partial | High | L | 2/9 | Stated PRD §6 target; add axe/pa11y in CI; accessible alternate to the chart. | | Unicode-correct non-Latin names | Stores fine (UTF-8); no NFC normalization on write, no locale-aware collation, no romanized search. | Partial | High | M | 2 | Apply `unicodedata.normalize('NFC')` on input; add COLLATE; supports faithful-record goal. | | Structured/compound surname components | Surname is a single field; no support for Spanish/Portuguese paternal+maternal, Arabic nasab, particles/prefixes. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8); preserves the name as recorded. | | Non-Gregorian calendar dates | `calendar` column is a placeholder; GEDCOM calendar escapes never parsed/populated. | Partial | Med | L | 2 | Preserve original calendar as recorded (sources-first). | | Language tags / romanized variants per name | No language_tag/script/romanized fields; GEDCOM ROMN/LANG unhandled. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8). | | RTL support | `lang="en"` hardcoded, no `dir`, physical CSS properties throughout. | Missing | Med | M | later | Convert to logical CSS properties; cheaper once i18n exists. | | Selectable themes | Light/dark/system works; brand palette intentionally single. | Partial | Med | M | later | Confirm whether additional themes are a deliberate non-goal (brand guide constrains palette). | | Multi-language report/diagram output | Depends on i18n + reports, neither shipped. | Missing | Low | L | later | — | --- ## 3. Quick wins (high importance / low effort) Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap. 1. **Fix `site_members` visibility tier** (Privacy, Critical/S) — **done:** `can_view_tree` now handles `site_members` (`privacy.py:56`), giving any authenticated account a read view while refusing anonymous. 2. **Email-verification enforcement gate** (Privacy/Auth, High/S) — **done (#53):** the read-side `email_verified_at` check now ships behind `REQUIRE_EMAIL_VERIFICATION`, so a freshly registered, unverified user doesn't get a live authenticated session. The registration-mode env switch (open/approve/closed) is the larger follow-on (§2.10, M-effort — not a quick win). 3. **Citation confidence selector in the cite form** (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis. 4. **Source edit UI + expose all 8 fields** (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8. 5. **Make `LIVING_RECENCY_YEARS` env-configurable** (Privacy, High/S) — hardcoded 100 at `privacy.py:23`; twelve-factor (NN#7). 6. **Add `ownership_event_id` to Citation** (Property/Sources, Critical/S) — single FK + CHECK-constraint edit the moment Property lands; the spine is already built (NN#5). 7. **GEDCOM encoding detection** (Standards, High/S) — detect/honor the CHAR tag; reject or transcode ANSEL/UTF-16 rather than silently mangling with `errors='replace'`. 8. **GEDCOM HEAD completeness** (Standards, Med/S) — emit the required `2 FORM LINEAGE-LINKED` (under GEDC) and `1 SUBM` at `gedcom.py:740`. Pure conformance. 9. **ARM64 CI build matrix** (Perf/Scale, High/S) — add `linux/arm64` + QEMU to buildx for both images; many self-hosters run ARM SBCs. 10. **`GET /{tree}/citations/{id}` endpoint** (Sources, Med/S) — API symmetry (NN#8). 11. **Transcription/abstract fields on Source** (Sources, Med/S) — add `transcription_text` + `abstract_text`, distinct from `citation_text`; core to evidence analysis. 12. **Sort the merged person timeline** (Research workflow, Med/S) — `shownEvents.sort()` on `date_start`; currently appended unsorted. 13. **Doc corrections (docs-vs-code)** (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim and the i18n "from day one" claim match reality. The repo convention requires docs to travel with code. > **Mostly shipped this cycle (#46):** the **media privacy leak** (§2.4) and the broad **child-resource redaction gap** (§2.10) are now closed for the person/event/media/name/relationship endpoints. The narrowed remainder — applying `person_visibility` to the **citation/source list endpoints** — is an S-effort follow-up; treat it as a security-priority Phase 1–2 fix regardless of the quick-win list. --- ## 4. Strategic differentiators Where to invest to make Provenance distinct rather than a webtrees clone. Each leans on a non-negotiable as a *feature*, not a constraint. **1. Property chain-of-title (the "land" half).** No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by **legal** public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category. **2. The ChangeProposal AI model.** "The assistant never writes autonomously" is a *trust* differentiator in a market where users fear AI corrupting their research. The structural spine has **landed** — the `ChangeProposal` model/API/review UI and the pluggable `LLMProvider`/`EmbeddingProvider` abstraction both ship — so the remaining work is wiring the assistant's tools to emit proposals (never mutating directly). Assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together. **3. Anonymous, mutual-consent cross-tree hints.** The privacy model already redacts living people for anonymous viewers, so a hint system that reveals *nothing identifying* until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent. **4. True self-hosting + data ownership.** Full account export/import, soft-delete recovery, GEDCOM round-trip, env-driven everything, a one-command operator backup, and (to-build) scheduled off-host backup + ARM support make Provenance the genealogy app you actually own. The two correctness items that gated the promise have **landed**: GEDCOM export now preserves citations (the Provenance→Provenance round-trip keeps the sources graph), and operator backup moved from "documented procedure" to a one-command dump (`deploy/backup.sh`). What remains is scheduled/verified-restore tooling and ARM builds. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make. **5. Sources-first as a felt experience.** The two-tier model is built, and citations now **survive GEDCOM export** (#232); the remaining differentiator is making sourcing *visible and low-friction*: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), and per-fact confidence surfaced in the UI. These turn "every fact links to where it came from" from an architecture note into the product's personality.