provenance/docs/BACKLOG.md
justin a6179037c2 Close citation/source living-person leak; add on-demand tree purge
Two changes.

1. Privacy fix (NN#2/NN#3) — the citation and source list endpoints gated only
   on can_view_tree, so a non-member on a public/unlisted/site_members tree could
   enumerate citations and sources tied to a redacted living person, leaking that
   the person exists and has sourced facts (and possibly their name via a source
   title). #46 closed this for events/media/names/relationships but not
   citations/sources. Now citation_service.list_citations and
   source_service.{list_sources,get_source} delegate non-member reads to
   public_view_service, mirroring the #46 pattern:
   - citations: shown only when the cited fact resolves to FULL-visibility
     person(s) — covers the person_id, name_id, event_id (person or both-partner),
     and relationship_id (both-partner) target paths.
   - sources: shown only when they back at least one visible citation; a withheld
     source 404s (don't reveal it exists).
   Tests cover all four citation target types + source withholding + member-sees-all.

2. On-demand tree purge — owners can permanently delete a soft-deleted tree now
   instead of waiting out the 30-day auto-purge window. POST /trees/{id}/purge
   (owner-only): the tree must already be in the trash, and the caller retypes its
   name to confirm. Media objects are deleted from storage, then a single
   DELETE on trees cascades all tree-owned rows via the tree_id ON DELETE CASCADE;
   the audit entry survives (tree_id SET NULL). Frontend adds a "Delete forever"
   button to the Recently-deleted list. No migration.

Suite: 102 passing.
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-10 22:38:59 -04:00


Provenance — Product Backlog

Status legend: Have (shipped) · Partial (substrate exists, surface incomplete) · Planned (on roadmap, no code) · Missing (no code, off roadmap). Importance: Critical / High / Medium / Low. Effort: S / M / L / XL. Phase references map onto the existing 0-9 roadmap. "NN#" = non-negotiable invariant.


1. Executive summary

Where Provenance is strong today. The foundation is genuinely solid and, in several places, ahead of the OSS field:

  • Sources-first spine is real. A reusable Source + per-fact Citation two-tier model with an exactly_one_target CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury.
  • Privacy architecture is the right shape, and coverage is now broad. A single privacy.py engine, TenantScoped mixin on every row, living-person heuristic (is_possibly_living, unknown-birth-treated-as-living), and media served through the backend rather than via raw S3 URLs. Non-member reads of persons, events, media, names, and relationships all route through person_visibility (#46), and the citation/source list endpoints now do the same via public_view_service (see §2.10).
  • Non-destructive by design. Soft-delete with timed purge worker, immutable AuditEntry (before/after JSONB, actor_type ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import.
  • Modeling maturity. Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys.
  • Standards core. GEDCOM 5.5.1 import/export is functional (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip fidelity has three tracked gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
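
The living-person heuristic named in the privacy bullet can be illustrated with a short sketch. This is not the code in privacy.py: the signature, the year-only inputs, and the rule that a recorded death ends protection are assumptions made for illustration. The only details taken from this document are the function name, the unknown-birth-treated-as-living rule, and the 100-year window (which §2.10 proposes making configurable).

```python
import os
from datetime import date
from typing import Optional

# Assumption: the window is read once from the environment; 100 matches the
# hardcoded value this backlog mentions.
LIVING_RECENCY_YEARS = int(os.environ.get("LIVING_RECENCY_YEARS", "100"))


def is_possibly_living(
    birth_year: Optional[int],
    death_year: Optional[int],
    today: Optional[date] = None,
    recency_years: int = LIVING_RECENCY_YEARS,
) -> bool:
    """Conservative guess: treat a person as living unless the data rules it out."""
    if death_year is not None:
        return False  # assumption: a recorded death always ends protection
    if birth_year is None:
        return True  # unknown birth is treated as living
    current_year = (today or date.today()).year
    return current_year - birth_year < recency_years
```

The heuristic fails closed: missing data keeps a person protected, so an incomplete import cannot expose someone by accident.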

Documentation-vs-code gaps to correct now (per "docs travel with code"). Two repo claims are not yet true and should be edited in the same spirit they were written:

  • pgvector is claimed as used; it is not. Only pg_trgm is created. ARCHITECTURE references pgvector for match ranking.
  • i18n "from day one" is documented but unmet. PRD §6 promises externalized strings; every label is a hardcoded literal.

These two doc edits are themselves trivial quick wins (see §3).

The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch). Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes:

  • No record hints, no "save to tree," no connector framework. The entire SourceConnector layer (FamilySearch/Find A Grave/WikiTree) is unbuilt — this gates AI search, hints, and auto-citation.
  • No person merge outside GEDCOM import. Merging duplicate people is fundamental hygiene and is currently impossible in-tree — the single highest-value near-term matching gap.
  • No maps at all. No place autocomplete, no geocoding, no interactive/migration/birthplace maps — a glaring hole for an app whose thesis is family and land.
  • No report/print/PDF output. Charts render on-screen only; there is no Ahnentafel, family group sheet, narrative report, or any PDF/SVG/HTML export. The whole "Charts, reports & printing" category is on-screen-viewing only.
  • DNA absent (deliberately parked — treat as open question, not a gap).

The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees). These are where a privacy-first self-host product is expected to compete and currently trails:

  • Collaboration management is now reachable, but minimal. TreeMembership roles are enforced on every read/write, and a list/add/change-role/remove API + UI now ship (§2.9), satisfying the full-CRUD invariant (NN#8). The remaining gap is the richer email invite/grant flow (pending-invite state, resend/expire), still scheduled for Phase 6.
  • Living-person redaction is now uniform across child resources. Non-member reads of persons, events, media, names, and relationships all redact possibly-living people (#46), and the citation/source list endpoints now apply the same check via public_view_service, closing the remaining PII gap on public/unlisted trees (NN#3, NN#2).
  • No place as a usable first-class entity (model exists, created by GEDCOM, but no read/edit/delete — a create-only entity, which is a bug per NN#8).
  • No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization (the last is a documented day-one commitment that is currently unmet).

Security-priority correctness fixes (do these first, regardless of phase). The redaction fixes have all shipped, first for child resources (#46) and now for citations/sources too, leaving one config switch:

  1. Self-registration approval-mode switch (§2.10) — the read-side enforcement now exists: REQUIRE_EMAIL_VERIFICATION gates login/session on email_verified_at (#53). The remaining gap is the env switch to choose open vs admin-approval vs closed self-registration. (The citation/source living-person leak is now closed — citation/source list endpoints apply person_visibility for non-members via public_view_service.)

Strategic posture. The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the privacy/auth correctness and collaboration gaps that the architecture already implies, (b) ship the maps + reports + merge table stakes, and (c) finish the back-half spine — the connector framework plus wiring the now-landed ChangeProposal/ModelProvider into the assistant — that unlocks the entire back half of the roadmap.


2. Backlog by category

2.1 Tree & data model

Core CRUD, typed relationships, dates, soft-delete, and naming are have. Remaining work is about reusable sub-entities, shared/event-centric modeling, and research-grade conveniences.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Repository as first-class entity | Promote Source.repository string to a reusable Repository (name/address/call-numbers) with dedup. | Partial | Med | M | 1-2 | If promoted, full CRUD in API+UI (NN#8); don't half-build. |
| Note as first-class entity (SNOTE) | Promote inline notes text fields to reusable shared Note/SNOTE records. | Missing | Low | M | 2 | Full CRUD; GEDCOM 7 round-trip parity. |
| Shared/event-centric model + witnesses | Remove the subject_person_xor_relationship XOR; add participant/role join so one event has many people (FAN/cluster research). | Missing | Med | M | later | Unlocks FAN club + richer sourcing; participants must redact via privacy engine. |
| Non-family associations (FAN) | Add associate/neighbor relationship types; best delivered with shared-event participants. | Missing | Low | M | later | |
| Relationship-status enum | Add married/divorced/annulled status on partnership rather than inferring from events. | Partial | High | M | 1-2 | |
| Family/couple unit (GEDCOM FAM) | Persist a true FAM entity (own ID/sources, childless couples) instead of rebuilding on export. | Partial | High | L | 2 | Improves GEDCOM fidelity. |
| Kinship / relationship calculator | "How is A related to B" path + cousinship. Graph edges already exist. | Missing | High | M | 1-2 | Self-contained; reads via privacy engine. |
| Read-only audit-log viewer / activity feed | Surface AuditEntry as a per-tree/per-person change feed. Smaller and higher-leverage than value-level undo; partially satisfies NN#8's "read" for AuditEntry and is the substrate for watch/follow + webhooks. | Missing | High | M | 2 | Privacy-filtered projections only; never raw before/after JSON to non-members (NN#2/#3). |
| Per-field revision history + restore-prior-value | Value-level history view + undo, built atop the audit feed above. | Partial | High | L | 6 | Audit-log UI is the feed item; this is the larger value-level-undo work (NN#8 correction ethos). |
| Color-coded tags & custom labels | Tag people for lineages/research-status/grouping. | Missing | Med | M | 2 | Full CRUD; tenant-scoped. |
| Person timeline / LifeStory | Sort the merged event list; add place/age enrichment + narrative presentation. | Partial | Med | M | 2 | Sort is trivial (localeCompare on date_start); narrative is the larger piece. |
| Multi-calendar normalization | Store + parse Julian/Hebrew/French Republican (only calendar tag stored today, only Gregorian normalized). | Partial | Low | M | 2 | See also Localization §2.17. |
| Evidence/persona vs conclusion model | GEDCOM-X persona layer separate from conclusion person. | Missing | Med | XL | later | Large modeling change; strengthens sourcing + hint matching. |
| Negative assertions | Boolean "event did not happen" on Event. | Missing | Low | S | 2 | Cheap interop nicety. |
| Custom groups / networks | Named manual or rules-based groupings. | Missing | Low | M | later | Lower priority than tags. |
| Raw GEDCOM record editor / configurable fact tabs | webtrees-style raw editor + fact-type registry. | Partial | Med | L | later | Open vocabularies give de-facto custom facts today. |
| Health/medical, historical-facts index, LDS ordinances | Niche entities. | Missing | Low | M-L | later | LDS BAPL/ENDL/SLGS should map to distinct types if ever pursued; medical is special-category PII. |
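
Because the kinship calculator row notes that graph edges already exist, the core of it reduces to a nearest-common-ancestor search. The following is a minimal sketch under stated assumptions: the child-to-parents mapping, the function names, and the output wording are all hypothetical, and a real implementation would read edges through the privacy engine rather than from a plain dict.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Parents = Dict[str, List[str]]  # hypothetical shape: child id -> parent ids


def ancestor_depths(person: str, parents: Parents) -> Dict[str, int]:
    """Walk upward breadth-first; map each ancestor to its generation distance (self = 0)."""
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in depths:  # first BFS visit is the shortest path
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def nearest_common_ancestor(a: str, b: str, parents: Parents) -> Optional[Tuple[str, int, int]]:
    """Return (ancestor id, generations up from a, generations up from b), or None."""
    da, db = ancestor_depths(a, parents), ancestor_depths(b, parents)
    shared = set(da) & set(db)
    if not shared:
        return None
    best = min(shared, key=lambda p: (da[p] + db[p], p))  # id breaks ties deterministically
    return best, da[best], db[best]


def describe(up_a: int, up_b: int) -> str:
    """Turn the two generation distances into a coarse relationship label."""
    if up_a == 0 and up_b == 0:
        return "same person"
    if up_a == 0 or up_b == 0:
        return "direct line"
    if up_a == 1 and up_b == 1:
        return "siblings"  # includes half-siblings; one shared parent is enough here
    if min(up_a, up_b) == 1:
        return "aunt/uncle or niece/nephew line"
    return f"cousins, degree {min(up_a, up_b) - 1}, removed {abs(up_a - up_b)}"
```

For two grandchildren of the same person, `nearest_common_ancestor` reports two generations up on each side and `describe(2, 2)` labels them first cousins (degree 1, removed 0).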

2.2 Sources & citations

The two-tier model is have and production-grade on the backend. The gaps are almost all UI/CRUD-completeness and the connector-dependent "save to tree" flows.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Citation confidence selector in UI | Confidence enum is modeled + API-writable but the citeControl form never sets it, so every UI citation is NULL confidence. | Partial | High | S | 1 | Quick win. Full CRUD in UI (NN#8); reinforces evidence-quality thesis. |
| Source edit UI + all 8 fields | Source UI is add/list/delete only and create exposes ~3 of 8 fields (no author/source_type/publication_info/quality_note/citation_text). | Partial | High | S | 1 | Update API exists but no edit form; violates NN#8. |
| GET /{tree}/citations/{id} | Citation API has list but no single-read endpoint. | Partial | Med | S | 1 | API symmetry (NN#8). |
| Transcription / abstract / extract fields | Add transcription_text + abstract_text to Source; don't conflate with citation_text (GEDCOM SOUR.TEXT currently dumped into citation_text). | Missing | Med | S | 1-2 | Quick win. Central to evidence analysis; full CRUD (NN#8). |
| Evidence-Explained guided citation builder | Structured fields → formatted citation (Chicago/MLA/APA) instead of hand-typed citation_text. | Missing | High | L | 2 | Signature provenance feature; citation_text should be generated, not typed. |
| Citations on OwnershipEvents | Add ownership_event_id to Citation + extend CHECK to 5 targets when property lands. | Partial | Critical | S | 3 | Quick win once Property exists: single FK + constraint edit (NN#5). |
| Record-to-source attachment ("save to tree") | Search a connector record and attach its facts. | Missing | High | XL | 4 | Gated on connector framework; assistant attach must emit ChangeProposal (NN#1); legal sources only (NN#6). |
| Source Linker (one record → many persons) | Bulk-attach a record's facts across people. | Missing | Med | L | 4 | Downstream of connectors; reads/writes via service layer. |
| Auto-citation on save/match | Generate citation when a hint/record is confirmed. | Missing | Med | L | 4/7 | Blocked on connectors + hints; ChangeProposal if assistant-driven. |
| Memories-as-sources (cite a photo directly) | Allow media to be a citation target, not only attachable to a Source. | Partial | Low | M | 2 | Reads stay on privacy-checked media endpoint (NN#2). |
| GPS / Proof-Standard reasoning artifact | Container linking sources/citations into a proof narrative reconciling conflicts. | Missing | Med | L | later | Serious-researcher differentiator; full CRUD (NN#8). |
| Proprietary record collections | 1921 census, UK sets, etc. | Missing | Low | XL | | Out of scope: conflicts with NN#6 / self-host. Do not pursue. |
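
The exactly-one-target rule that the OwnershipEvents row proposes extending to five targets can be mirrored at the application layer. This is a sketch only: the database CHECK constraint remains the real enforcement, the function name is hypothetical, and the four existing column names are taken from the target paths named in the commit message above (person_id, name_id, event_id, relationship_id), with ownership_event_id as the planned fifth.

```python
from typing import Mapping, Optional

# Four existing targets plus the planned ownership_event_id column.
CITATION_TARGETS = (
    "person_id",
    "name_id",
    "event_id",
    "relationship_id",
    "ownership_event_id",
)


def validate_exactly_one_target(citation: Mapping[str, Optional[str]]) -> str:
    """Return the single populated target column, or raise if zero or several are set."""
    populated = [name for name in CITATION_TARGETS if citation.get(name) is not None]
    if len(populated) != 1:
        raise ValueError(
            f"citation must reference exactly one target, got {len(populated)}: {populated}"
        )
    return populated[0]
```

Validating before the insert gives the API a readable error instead of a raw constraint violation, while the CHECK still guards writes that bypass the service layer.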

2.3 Search & matching

Fuzzy trigram name search is have; everything that depends on connectors, embeddings, or multiple populated trees is planned/missing. The standout near-term gap is in-tree person merge.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Standalone duplicate detection | Lift the GEDCOM _best_match logic into a "find duplicates in my tree" scan. | Partial | High | M | 2 | Logic already written; results via privacy engine (NN#2). |
| Interactive two-person merge (side-by-side, field-select, undo) | General merge of duplicate persons with citation re-pointing; impossible outside import today. | Partial | High | L | 2 | Highest-value matching gap. Preserve + re-point Citations (NN#5); write-once is a bug (NN#8). |
| Advanced search (wildcards, boolean, date/place facets, sort) | Search exposes only ?q. | Partial | High | M | 2 | Keep per-person privacy filter in the search loop (NN#2). |
| Phonetic matching (Soundex/Metaphone/DM) | Enable fuzzystrmatch; trigram is char-similarity, not phonetic. | Partial | High | M | 2 | Pure utility. |
| Semantic / vector search (pgvector) | Docs claim pgvector is used; it is not (only the pg_trgm extension is created). Add CREATE EXTENSION vector + embedding columns (and correct the docs). | Missing | Med | L | 7 | Embedding provider is an open question (PRD §11), so don't pick silently; candidates via privacy engine. |
| Tree-to-tree matching (Smart Matches) | Cross-tree candidate generation + ranking. | Planned | High | XL | 7 | Anonymous until mutual consent (NN#4); living-person protection (NN#3). |
| Mutual-consent match notification | Anonymous notification, reveal only after both opt in. | Planned | High | L | 7 | Mandated invariant, not a toggle (NN#4, NN#3); rides the notification substrate (§2.9). |
| Match confirm/reject + "not a match" memory | Persistent rejected-match store (today scoring lives only inside import). | Partial | High | M | 7 | Prevents re-notifying once hints land. |
| External search deep-links | Pre-fill FamilySearch/Find A Grave/BLM-GLO search URLs from a person's name/dates/place. | Missing | Med | M | 2-4 | High value, low risk before full connectors; legal targets only (NN#6). |
| Automated record hints | Proactive per-person record suggestions from connectors; a marquee mainstream feature. | Missing | High | XL | 7 | Connector-gated (NN#6); surfaced anonymously where cross-tree (NN#4); attach via ChangeProposal (NN#1). |
| Jurisdiction-aware record-search hints | Map place/jurisdiction → relevant collections. Place hierarchy is a ready foundation. | Missing | Med | L | 8 | Suggested collections must be legal (NN#6). |
| Cross-language / transliteration matching | Cyrillic/Hebrew/CJK ↔ Latin. | Missing | Med | XL | later | See Localization. |
| Record Detective, newspaper matches, collection catalog, GQL query builder, OCR full-text search | Connector/record-layer dependent. | Missing/Planned | Low-Med | L-XL | 4/7/8 | All gated on the connector framework; any query path runs through privacy engine (NN#2). |
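
The phonetic-matching row distinguishes character similarity from sound similarity. A small pure-Python version of American Soundex shows the difference: names that share few trigrams can still collapse to the same code. This is an illustrative sketch, not the PostgreSQL fuzzystrmatch implementation (whose edge-case handling may differ), and it ignores non-ASCII letters, which a genealogy dataset would need to transliterate first.

```python
# Digit classes of American Soundex; vowels, H, W, and Y carry no digit.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or '' if there are no ASCII letters."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = [letters[0]]
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate two letters with the same digit
        digit = _CODES.get(letter, "")
        if digit and digit != previous:
            code.append(digit)
        previous = digit  # a vowel resets the run, so repeated digits can recur
    return ("".join(code) + "000")[:4]
```

"Robert" and "Rupert" both encode to R163, so a phonetic index would surface them as candidates even where a trigram threshold might not.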

2.4 Media & documents

Universal media attachment is have; the earlier privacy leak is now closed (#46), and the remaining gaps are the asset-processing pipeline (EXIF strip, thumbnails).

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Media privacy gating on serve paths | list_media/get_media/media_content now apply person_visibility for non-members (#46): media is exposed only when linked to a FULL-visibility person (list_public_media/can_view_media), so living-person photos no longer leak on public/unlisted trees. | Have | Critical | M | 1 | Resolved (NN#3/NN#2). Serve paths check attached person_id visibility and 404 otherwise. |
| EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | Security-priority, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. |
| Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy; no bypass path. |
| Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. |
| Photo/face tagging (manual) | Multi-person tagging via single FK today. | Missing | Med | XL(ML)/M(manual) | later | Owner-only, in-deployment; face tags inherit redaction (NN#3); full CRUD. |
| Mobile photo scanning + auto-split | Shoebox digitization. | Missing | Med | L | later | Reuse privacy-gated upload + EXIF strip. |
| AI photo dating / colorize / restore / animate / narrate | Model-driven media features. | Missing | Low | L-XL | 4+ | Must route through ModelProvider (NN#7), require approval (NN#1), preserve original; animating living faces raises consent issues. |
| British Library / paywalled archives, pay-per-view credits | Licensed content + metering. | Missing | Low | XL | | Out of scope: conflicts with NN#6 and the self-host model. |

2.5 DNA & genetic genealogy

DNA is an explicit PRD non-goal / open question — treat as parked, not a backlog to grind through. Across every DNA row the rule is uniform: a user uploading their own export is permissible; vendor connectors/scrapers (23andMe / Ancestry / MyHeritage / GEDmatch) are barred (NN#6). Kits and matches are living-tester PII and route through the privacy engine.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| DNA-confirmed relationship flag | Model DNA confirmation as a Source/Citation backing a Relationship (not free text). | Missing | Med | M | parked | Best sources-first fit (NN#5); full CRUD (NN#8). |
| Raw DNA upload (own file) | User uploads own export; no vendor scraping. | Missing | Med | L | parked | User's own file is fine; vendor connectors barred (NN#6); special-category PII via privacy engine. |
| Kit/Match entities linked to persons | Kit (tester) + Match tied to Person, tenant-scoped/audited. | Missing | Med | M | parked | Kits = living-tester PII (NN#2/#3); full CRUD (NN#8). |
| Autosomal match list, segments, chromosome browser, triangulation, ThruLines/AutoTree, ethnicity/admixture, haplogroups, GEDmatch, NPE detection | Full genetic-genealogy suite. | Missing | Low-Med | L-XL | parked | DNA scope is an unresolved open question: surface the dependency, don't build speculatively. Own-data only (NN#6); cross-user surfacing obeys NN#4. |

2.6 Maps, places & gazetteers

This category is almost entirely missing despite being half the product thesis. The Place model has the right bones (parent_id, lat/long, PlaceName with date ranges) but no API/UI and no maps.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Place as usable first-class entity | Place rows are created by GEDCOM but have no read/edit/delete API or UI, making Place a create-only entity. | Partial | High | M | 2-3 | Violates NN#8 (create-but-not-edit = bug). Make Place citable too (NN#5). |
| Place autocomplete + picker in event editor | No /places router; the event form has no place input, so users can't attach a place at all. | Missing | High | M | 2 | Table stakes; lookup is low-risk. |
| Geocoding (manual coords + forward) | lat/long columns exist; no UI, no geocoder. | Partial | High | M | 3 | Provider via env (NN#7), ToS-compliant (NN#6). |
| Pluggable geocoding provider | Nominatim/GeoNames/Bing/Google swappable. | Missing | Med | L | 3 | Provider+keys via env (NN#7); legal providers only (NN#6). |
| Bulk/batch geocoding (worker) | Geocode hundreds of GEDCOM-imported places. | Missing | Med | M | 3 | Idempotent, rate-limited worker job; provider via env. |
| Place merge/split (dedup) | GEDCOM imports produce near-duplicate place strings. | Missing | High | M | 2-3 | Needs Place update/delete (NN#8); audited merges. |
| Place-name cleanup tools | Extend the existing preview→apply cleanup UX to places. | Missing | Med | M | 2 | Preview-first + audited like existing cleanup. |
| Standardized-name vs original text | Mirror the verbatim+normalized date pattern for places. | Missing | Med | M | 2-3 | GEDCOM fidelity. |
| Alternate/historical place names with date ranges | PlaceName model exists with valid_from/to but no CRUD and never populated. | Partial | Med | M | 2-3 | Stored entity with no CRUD surface (NN#8). |
| Interactive map of events & places | No map library in frontend. Core to family+land positioning. | Missing | High | L | 3 | Plot via person_visibility so non-owners never see living locations (NN#2/#3). |
| Migration trail / pedigree-birthplace maps | Per-person life path; ancestor birthplace map. | Missing | Med | L | 3 | Redact living subjects for non-owners (NN#3). |
| Bundled world gazetteer | Offline GeoNames-style authority. | Missing | Med | XL | later | GeoNames (CC-BY): verify AGPL-compat; env-configurable. |
| Historical boundary overlays, time slider, heatmaps, radius/nearby, tile-provider switch | Advanced geo. | Missing | Low-Med | S-XL | later | PostGIS is an open question (ARCH §14): surface dependency, don't adopt silently; tiles legal (NN#6). |
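
For the place merge/split row, the near-duplicate strings that GEDCOM imports produce mostly differ in case, spacing, and accents, so a normalization key is enough to generate merge candidates. The sketch below is an assumption-laden illustration: both function names and the normalization rules are hypothetical, and it only proposes groups. Any actual merge would still go through a preview step and be audited, as the table requires.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def place_key(raw: str) -> str:
    """Build a comparison key: strip accents, collapse spaces, casefold each segment."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    segments = [
        re.sub(r"\s+", " ", segment).strip().casefold()
        for segment in without_accents.split(",")
    ]
    # Assumption: empty jurisdiction levels are dropped; a stricter key could keep them.
    return ",".join(segment for segment in segments if segment)


def duplicate_groups(places: Iterable[str]) -> List[List[str]]:
    """Group the original strings that share a key; return only groups of two or more."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in places:
        groups[place_key(raw)].append(raw)
    return [members for members in groups.values() if len(members) > 1]
```

The verbatim strings are kept in the groups on purpose, which matches the verbatim-plus-normalized pattern the table proposes mirroring from dates.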

2.7 Charts, reports & printing

On-screen pedigree/descendant/fan/hourglass charts are have. The entire output/print/report half is missing — this is the linchpin gap of the category.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Multi-format export (PDF / SVG / image / HTML) | No export/print path, no @media print, no window.print(). Charts and reports can't leave the screen. | Missing | High | L | 2/6 | Linchpin. Generate from privacy-filtered data so living people are redacted in shared output (NN#3). |
| Ahnentafel report | Numbered-ancestor report; all data exists. | Missing | High | M | 6 | |
| Family group sheet / individual summary | Printable summary; data available, needs print layout. | Missing | High | M | 6 | |
| Narrative descendant/ancestor reports | Multi-standard prose with inline sources. | Missing | High | L | 6 | Cite Sources inline (NN#5); redact living (NN#3). |
| Sentence-template narrative engine | Deterministic fact→prose underpinning reports. | Missing | Med | L | 6 | Keep template-based; report text never mutates tree (NN#1). |
| Photo boxes in charts | Pass privacy-checked media URLs to setCardDisplay; CSS already present. | Missing | High | M | 2 | Stream via privacy-checked /media (NN#2/#3). |
| Drag-to-edit / interactive chart canvas | Tree canvas renders but interactive node editing (drag to re-parent, inline edit on the chart) is only partly present. | Partial | Med | M | 2 | Edits go through service layer + audit (NN#1); honor redaction. |
| Statistics dashboard | Surname/place/date distributions + tree-health. | Missing | Med | M | 6 | Reads via privacy engine (NN#2). |
| Kinship/relationship diagram report | Needs path-finding (see the kinship calculator in §2.1) + renderer. | Missing | Med | M | 6 | |
| List reports (sources/places/repos/media) | Printable indexes (current screens are management, not reports). | Missing | Med | M | 6 | |
| Color-by-lineage, fan overlays, lifespan/timeline charts | Sex-coloring exists; lineage/overlay/timeline don't. | Partial/Missing | Med | M | later | Overlays respect privacy engine. |
| Book/multi-report compiler, wall-chart tiling, page-setup, customizable charts | Print-shop-grade output. | Missing | Low-Med | L-XL | later | Saved "book" entity = full CRUD (NN#8); honor living-person privacy. |
| Bowtie/couple-rooted/circular-sun/3D/network/calendar | Niche chart variants. | Missing/Partial | Low-Med | M-L | later | |
| Print-shop products, XML template engine, blank forms | Commercial/template extras. | Missing | Low | S-XL | later | Weak fit for self-host. |
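
The Ahnentafel report relies on one numbering rule: the subject is 1, and for any person numbered n the father is 2n and the mother is 2n + 1. A minimal sketch of that numbering follows. The input shape (person id mapped to a father/mother pair) and the function name are hypothetical, and the father/mother split is an assumption of the classic format rather than of Provenance's typed parent qualifiers, which a real report would have to map onto it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: person id -> (father id or None, mother id or None)
ParentMap = Dict[str, Tuple[Optional[str], Optional[str]]]


def ahnentafel(root: str, parents: ParentMap, max_generations: int = 10) -> Dict[int, str]:
    """Assign Ahnentafel numbers to the known ancestors of root."""
    numbered = {1: root}
    frontier = [1]
    # The generation cap also stops the walk if the data contains a cycle.
    for _ in range(max_generations):
        next_frontier = []
        for n in frontier:
            father, mother = parents.get(numbered[n], (None, None))
            if father is not None:
                numbered[2 * n] = father
                next_frontier.append(2 * n)
            if mother is not None:
                numbered[2 * n + 1] = mother
                next_frontier.append(2 * n + 1)
        frontier = next_frontier
    return dict(sorted(numbered.items()))
```

Gaps in the numbering are meaningful: a missing number marks an unknown ancestor, so the report should print the numbers rather than renumber.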

2.8 Research workflow & automation

The preview→approve bulk cleanup tool is a genuine have and a differentiator. The missing pieces are the serious-researcher workflow entities.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Data-quality / consistency checker | Extend cleanup beyond name issues: child-before-parent, death-before-birth, implausible ages, orphans, dups; severity tiers. | Partial | High | L | 2 | New auto-fixes keep preview→apply (NN#1). |
| Research log | Searches, repositories visited, negative results, findings; distinct from the system audit log. | Missing | High | M | 6 | Reference reusable Sources (NN#5); tenant-scoped full CRUD (NN#8). |
| To-do / research task planner | Tasks on Person/Tree with status/priority/due/assignment. | Missing | High | M | 6 | Full CRUD in API+UI (NN#8). |
| Source-driven data entry | Start from a Source document and transcribe facts into the tree. | Missing | High | M | 2 | Natural sources-first differentiator (NN#5). |
| Task↔log linkage | FK + joined view once both entities exist. | Missing | Med | S | 6 | Cheap once predecessors land. |
| Family chronology / timeline | Sort merged events; family-wide chronology (parents' marriage, siblings' births). | Partial | Med | M | 2 | Sort is trivial; presentation over privacy-filtered data. |
| Navigation: active person / history / bookmarks | Large trees rely on browser back only. | Missing | Med | M | 2 | Per-user, tenant-scoped, full CRUD; don't expose redacted persons (NN#2/#3). |
| Saved-record shoebox / review queue | Stage candidate records before committing. | Missing | Med | M | 4/7 | Auto-attach via ChangeProposal (NN#1); legal sources (NN#6). |
| Guided research suggestions | Proactive "research next" engine (today only flags problems). | Partial | High | L | 4 | Advisory; writes via ChangeProposal (NN#1); cross-tree via privacy engine (NN#2). |
| Persona-adaptive onboarding | Family Keeper / Serious Researcher / Property Researcher selector (PRD US-002, documented but unbuilt). | Missing | Low-Med | L | 2 | Pure presentation. |
| Dashboard widgets, scratchpad, research-link sidebar, blog/narrative authoring, research wiki, crowd indexing | Conveniences. | Missing | Low | S-XL | later | Widgets/published narratives read via privacy engine (NN#2/#3). |
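
Three of the consistency rules named in the data-quality checker row (death-before-birth, implausible ages, child-before-parent) are simple enough to sketch. Everything here is illustrative: the PersonFacts and Finding shapes, the year-only dates, the rule identifiers, and the 120-year threshold are assumptions, not the existing cleanup tool's model. The function only reports findings; any fix would still flow through preview→apply.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PersonFacts:
    # Hypothetical, year-only view of a person for rule checking.
    id: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    parent_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Finding:
    person_id: str
    rule: str
    severity: str  # "error" or "warning"


def check_tree(people: Dict[str, PersonFacts], max_age: int = 120) -> List[Finding]:
    """Run read-only consistency rules and return findings in input order."""
    findings: List[Finding] = []
    for person in people.values():
        born, died = person.birth_year, person.death_year
        if born is not None and died is not None:
            if died < born:
                findings.append(Finding(person.id, "death_before_birth", "error"))
            elif died - born > max_age:
                findings.append(Finding(person.id, "implausible_age", "warning"))
        for parent_id in person.parent_ids:
            parent = people.get(parent_id)
            if parent is None or parent.birth_year is None or born is None:
                continue  # not enough data to compare
            if born <= parent.birth_year:
                findings.append(Finding(person.id, "child_before_parent", "error"))
    return findings
```

Keeping severity on each finding lets the UI sort errors ahead of warnings without the rules knowing anything about presentation.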

2.9 Collaboration & sharing

Authorization is enforced everywhere, and a minimal management surface now ships — list/add/change-role/remove via api/v1/members.py plus a members page (#233). The remaining gap is the richer email invite/grant flow. The minimal slice landed at Phase 2 as planned; the invite/email UX stays at Phase 6.

| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
| --- | --- | --- | --- | --- | --- | --- |
| Membership PATCH/DELETE + role change (minimal slice) | Add/adjust/revoke a collaborator and change role: GET/PATCH/DELETE on /trees/{id}/members (api/v1/members.py) plus a frontend members page now ship (#233). Resolves the create-only NN#8 break without the full invite flow. | Have | Critical | S-M | 2 | Resolves the create-only NN#8 break. Revocation routes through the single privacy point. |
| Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. |
| Read-only public tree share | Anonymous read surface shipped: optional-auth CurrentUserOrNone dep, api/v1/public.py + public_view_service.py, and server-rendered pages at /p/[treeId] (+ /persons/[personId]) and /explore. Living-safe by construction via person_visibility. | Have | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via person_visibility (NN#2/#3). |
| SEO public profile pages (server-rendered) | Server-rendered public pages (/p/[treeId], /explore) and robots.ts now ship. Deferred follow-ups: a public-only sitemap.ts and per-tree noindex,nofollow meta for unlisted/site_members pages. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages: they must go through privacy engine, no direct row queries. |
| Notification / event-dispatch substrate | Shared enabler seeded from AuditEntry: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | Privacy-filtered projections only; never raw before/after JSON (NN#2/#3). |
| Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. |
| In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. |
| Watch/follow + change notifications | AuditEntry is the natural event source; needs subscription entity + dispatch (substrate above). | Planned | Med | M | 6 | Notification builder reads via privacy engine, not raw rows. |
| Optimistic concurrency / lost-update protection | No version/etag/updated_at precondition checks; concurrent multi-user edits can silently clobber. | Missing | High | M | 6 | Full-CRUD + multi-user without this risks lost updates; concurrent paths still route through privacy engine. |
| Pending-changes moderation (human edits) | Queue contributor edits for owner approval; shares infra with the AI ChangeProposal queue. | Missing | Med | L | 6 | Design together with ChangeProposal (NN#1). |
| Field-by-field profile merge & approval | WikiTree-style merge center + unmerge with per-field provenance. | Missing | Med | XL | later | Conflicting facts each retain Source/Citation (NN#5). |
| Ownership transfer | owner_id is effectively write-once; needed for self-host longevity. A minimal reassignment endpoint is the NN#8 fix. | Missing | Med | M | 6 | A write-once owner_id violates NN#8; importance/phase tension noted; ship the minimal slice when membership lands. |
| Narrative website / HTML export | Static narrated site (reuse public-page renderer). | Missing | Med | L | later | Redact living persons at build time (static bypasses runtime engine) (NN#3). |
| Two-way desktop↔online sync | Bidirectional sync with change journals. Audit log could seed a change feed. | Missing | Med | XL | later | No Ancestry TreeShare / paywalled sync (NN#6). |
| Curator roles, trusted-list ACLs, field locking, projects/workspaces, forum, honor code, free-space wiki, portal homepage | Community-platform features. | Missing | Low | S-XL | later | New roles/ACLs/locks integrate with the single enforcement point, not parallel checks. |
| Real-time co-editing | Out of scope; only optimistic concurrency planned. | Planned | Med | XL | later | Concurrent paths must route through privacy engine. |
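
The lost-update problem in the optimistic-concurrency row is usually solved with a version precondition: a write states which version it was based on and is refused if the row has moved on. The in-memory sketch below shows the idea only. Store, Record, and StaleWriteError are hypothetical names, and a real implementation would more likely use a conditional SQL UPDATE on a version column, or an ETag with If-Match at the HTTP layer.

```python
from dataclasses import dataclass
from typing import Dict


class StaleWriteError(Exception):
    """The caller's expected version no longer matches the stored row."""


@dataclass
class Record:
    data: Dict[str, str]
    version: int = 1


class Store:
    """Toy in-memory store that enforces a version precondition on update."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def create(self, key: str, data: Dict[str, str]) -> int:
        self._rows[key] = Record(dict(data))
        return self._rows[key].version

    def read(self, key: str) -> Record:
        row = self._rows[key]
        return Record(dict(row.data), row.version)  # copy, so callers cannot mutate the store

    def update(self, key: str, changes: Dict[str, str], expected_version: int) -> int:
        row = self._rows[key]
        if row.version != expected_version:
            raise StaleWriteError(
                f"{key}: expected version {expected_version}, found {row.version}"
            )
        row.data.update(changes)
        row.version += 1
        return row.version
```

When two editors both read version 1, the first update succeeds and bumps the row to version 2; the second is rejected with StaleWriteError instead of silently overwriting the first, and the client can reload and retry.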

2.10 Privacy & access control

The architecture is correct (single engine, tenant mixin, audit, soft-delete + purge are have), but configurability still has real holes, one of which (the self-registration mode switch) is security-priority.

Item Description Status Imp Eff Phase Non-negotiable
Uniform living-person redaction across child resources person_visibility now runs for non-members on the event, media, name, relationship endpoints (#46) and the citation/source list endpoints, all delegating to public_view_service: citations resolve to FULL-visibility person(s); sources show only when they back a visible citation. Have High S 12 Resolved (NN#3/NN#2). No child-resource path leaks a redacted living person's facts.
Email-verification enforcement gate Read-side check now ships (#53): REQUIRE_EMAIL_VERIFICATION gates login/session on email_verified_at (auth_service.py). Opt-in (default off) so SMTP-less self-hosts still work. Have High S 12 Read-side trust path now enforced (NN#7); the registration-mode switch below is the separate larger piece.
Self-registration mode gating (approve / open / closed) No env switch to choose open vs admin-approval vs closed registration. Partial High M 2/5 Twelve-factor registration control (NN#7); pairs with the verification gate above.
Instance owner / operator role OWNER_EMAIL-declared operator (#240): is_instance_owner on /users/me, owner-only GET /api/v1/admin/instance, /admin UI. Have Med S 2/5 Owner-only operational surface, twelve-factor via env (NN#7); reads stay through the service layer.
Fix site_members visibility tier can_view_tree now handles site_members (privacy.py:56): any authenticated account gets a read view, anonymous is refused. Have Critical S 1 Honors the tier the UI offers; reads still route through person_visibility.
Make LIVING_RECENCY_YEARS configurable Hardcoded 100 at privacy.py:23. Partial High S 2 Quick win. Twelve-factor (NN#7).
Privacy-stripped export (redact living) GEDCOM + account export emit full tree; no "strip living" mode. Missing High M 2 Reuse person_visibility/_redact (NN#3). Owner self-export is safe today; shareable variant is the gap.
Per-fact / per-field privacy + record flags tentative/rejected/preferred/private flags on facts. Missing Med L later If added, route through the single engine (NN#2).
Granular rules by record type & viewer relationship webtrees-style "hide marriages from non-descendants". Missing Med L later Single enforcement point.
OIDC / external IdP login AuthProvider interface ready; only Local implemented. Authentik is the intended real auth. Planned High L 5 Additive by design.
Two-factor auth (TOTP) Bearer/cookie session auth is solid; no MFA. Partial High L 5
DB-level audit immutability Audit is insert-only by convention; no trigger/constraint. Verified as "adequate for self-host," so importance downgraded to match. Have(soft) Med S 9 Adequate for self-host; upgrade to trigger only if true immutability is required.
Block/hide users, family-group private space, DNA opt-in controls Depend on messaging/DNA. Missing Low-Med M-XL 6/parked DNA parked (NN#6).
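
The LIVING_RECENCY_YEARS row above asks for an env-driven value in place of the hardcoded 100. A minimal sketch of one way to read it, assuming the environment variable shares the constant's name; the helper name `living_recency_years` and the validation rules are hypothetical and not taken from privacy.py:

```python
import os

# The value this backlog reports as hardcoded today.
DEFAULT_LIVING_RECENCY_YEARS = 100


def living_recency_years(env=None) -> int:
    """Return the living-person recency window in years.

    Hypothetical helper: reads LIVING_RECENCY_YEARS from the environment
    (or from a mapping passed in for testing) and falls back to 100.
    """
    env = os.environ if env is None else env
    raw = env.get("LIVING_RECENCY_YEARS")
    if raw is None or raw.strip() == "":
        return DEFAULT_LIVING_RECENCY_YEARS
    try:
        years = int(raw)
    except ValueError as exc:
        raise ValueError(
            f"LIVING_RECENCY_YEARS must be an integer, got {raw!r}"
        ) from exc
    if years <= 0:
        raise ValueError("LIVING_RECENCY_YEARS must be positive")
    return years
```

Failing loudly on a malformed value is a deliberate choice in this sketch: silently falling back to the default could shrink or widen the redaction window without the operator noticing.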

2.11 Import/export & standards

GEDCOM 5.5.1 import/export and full data-portability export are have; the remaining fidelity gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) still undercut the provenance thesis.

Item Description Status Imp Eff Phase Non-negotiable
Citation links on GEDCOM export Export now selects Citations and emits SOUR/PAGE per fact (#232), so fact→source links survive a Provenance→Provenance round-trip. (Citation detail/confidence beyond page still to round-trip.) Have Critical M 2 Closes the silent data-loss / destructive round-trip on the product's signature data (NN#5); satisfies PRD US-013.
GEDCOM 7.0 import/export Version hardcoded 5.5.1; no v7 semantics, SCHMA, SUBM, or UID handling. Partial High L 2 Stated differentiator (FamilySearch interop).
Custom/underscore tag preservation _MARNM becomes TYPE married, other custom tags dropped — violates ≥99% round-trip goal. Missing High L 2 Tension with provenance thesis (faithful record).
PLAC FORM hierarchy + MAP coordinate round-trip Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. Missing High M 2-3 Round-trip fidelity for the land/maps pillar.
Encoding detection (ANSEL/UTF-16) UTF-8 round-trips; non-UTF-8 files silently mangled via errors='replace'; CHAR tag ignored. Partial High S 2 Near quick win. Detect/honor CHAR; reject or transcode rather than corrupt.
HEAD completeness HEAD at gedcom.py:740 emits only SOUR/GEDC/VERS/CHAR — missing required 2 FORM LINEAGE-LINKED (under GEDC) and 1 SUBM. Partial Med S 2 Quick win. Pure conformance.
GEDCOM media (OBJE) round-trip OBJE in skip-tags; media ignored on import, never emitted on export. Partial Med M 2 Any media bundle keeps privacy gating.
GEDZIP (.gdz) bundle Bundled-media packaging. Missing Med M 2 Natural once v7 + OBJE land.
Selective / filtered export Clippings-cart / branch subset. Missing Med M later Maintain single-enforcement-point on export (NN#2).
Import conformance validation Preview is a mapping report, not structural/cardinality validation; bad lines silently skipped. Partial Med M 2
GEDCOM-X, Gramps XML, multi-format import, FHISO/ELF, PRF upload, KML export Interop extras. Missing Low L later PRF needs FamilySearch API (permitted, NN#6).
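
The encoding-detection row above calls for honoring CHAR and rejecting or transcoding rather than corrupting. A minimal sketch of the "reject" half, assuming the importer receives the raw upload as bytes; `decode_gedcom` and `UnsupportedEncoding` are hypothetical names, and ANSEL is rejected here because the Python standard library ships no ANSEL codec:

```python
import codecs
import re

# Matches the level-1 CHAR line in the header, e.g. b"1 CHAR UTF-8".
_CHAR_RE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)


class UnsupportedEncoding(ValueError):
    """Raised when a file declares an encoding this sketch cannot decode."""


def decode_gedcom(data: bytes) -> str:
    # A byte-order mark is the strongest signal, so check it first.
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the codec consumes the BOM

    # Otherwise honor the declared CHAR value; default to UTF-8 if absent.
    match = _CHAR_RE.search(data[:4096])
    declared = match.group(1).decode("ascii", "replace").upper() if match else "UTF-8"

    if declared in ("UTF-8", "UTF8"):
        return data.decode("utf-8")  # strict: raises instead of mangling
    if declared == "ASCII":
        return data.decode("ascii")
    # ANSEL, BOM-less UNICODE, and anything unknown are refused outright.
    raise UnsupportedEncoding(declared)
```

Strict decoding means an invalid byte surfaces as `UnicodeDecodeError` at import time, which the caller can turn into a user-facing rejection instead of storing replacement characters.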

2.12 Mobile & offline

Responsive web is partial; PWA and offline-first are absent. Native apps are an explicit deferral.

Item Description Status Imp Eff Phase Non-negotiable
PWA (manifest + icons + viewport + service worker) No manifest, no SW, no next-pwa; responsive coverage exists but unaudited on heavy views (tree canvas fixed 74vh). Partial High M 2 If SW caches API responses, never retain non-owner PII; cache only what the session is authorized to see (NN#3).
Responsive parity audit 17 breakpoint usages; small-screen parity on tree/person views unverified. Partial High M 1-2 Feature parity is an ARCH requirement.
Offline-first editing + reconnect sync No SW, no local store, no mutation queue. Valuable for archive/courthouse field research. Missing High XL later Replayed edits go through service layer + audit (NN#1); cached data respects living-person rule (NN#3).
Native mobile apps Explicitly deferred (responsive web only). Missing Med XL later If built, reads through one backend privacy engine (NN#2/#3/#4).
Companion app w/ cross-device sync Largely redundant with server-backed web. Missing Low XL later Sync boundary enforces privacy (NN#2); full CRUD parity (NN#8).
Relatives Around Me Nearby-relatives discovery. Missing Low L later Explicit opt-in; anonymous until mutual consent (NN#4).

2.13 API & extensibility

Internal REST + OpenAPI + generated TS client are have. The externalized developer story and the connector/plugin spine are not built.

Item Description Status Imp Eff Phase Non-negotiable
Public read-only API + scoped tokens (OAuth) The unauthenticated public read surface (public.py) now ships (#41-#51), but for a developer API the bearer token is still an opaque session token only and TokenPurpose lacks scopes, so there is no scoped/OAuth token path. Partial High L 5-6 Any scoped-token path routes through person_visibility + living-person redaction (NN#2/#3).
SourceConnector framework Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. Planned Med L 4 Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6).
Webhooks / change feeds AuditEntry is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. Missing Med L 6 Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3).
CLI / scripting surface No [project.scripts], no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. Missing Med M 9 Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path.
Plugin/addon architecture Connector framework only; no general UI/report/theme plugin system planned. Planned Med L later Sandbox via service layer; no privacy/audit bypass, no writes outside ChangeProposal.
In-app query tooling (SuperTool) Power-user expression engine. Missing Low L later Execute through privacy engine — no row enumeration bypass (NN#2).
Certified partner program Organizational, not software. Missing Low XL Out of scope until a hosted offering exists.

2.14 Performance & scale

Postgres + S3, multi-tenant isolation are have. Queue, observability, backups, pagination, and scale validation are the gaps that gate Phases 4/7 — several are current functional limitations, not late-phase validation tasks.

Item Description Status Imp Eff Phase Non-negotiable
Real job queue (Postgres/Redis-backed) Worker is a fixed-interval purge loop; GEDCOM import and account export run inline in the request. Partial High L 4 (pre-req) Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11).
Pagination on list endpoints + server-side tree loading List endpoints (persons.py:37, events, relationships) take no limit/offset/skip; the tree view loads the whole graph client-side. A current limitation against the 50k-person target. Planned High M 1-2 Split out from scale validation: this is a correctness/functional gap now, not a Phase 9 task.
Scale validation (50k+ trees, P95<2s, load test) No benchmark or load test exists. Planned High L 9 Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true.
Operator backup: one-command pg_dump + MinIO sync deploy/backup.sh + deploy/BACKUP.md now provide a scripted DB+object dump (#234). Remaining: scheduled/off-host/verified-restore tooling (row below). Have Critical M 1-2 Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8.
Scheduled / cloud automated backup + restore tooling Cron-driven, off-host, verified-restore workflow. Partial High L 9 Builds on the one-command slice above.
ARM64 build matrix CI builds linux/amd64 only; many self-hosters run ARM SBCs. Partial High S 1 Quick win. Add arm64 + QEMU to buildx (NN#7 container-native).
Structured JSON logs + Prometheus metrics Plain-text stdlib logging; no /metrics. Partial Med M 9 Logs/metrics reference UUIDs, never names/PII (NN#3/#4).
pgvector enablement Image has pgvector; app never creates the extension or adds embedding columns (docs claim otherwise). Partial Med M 7 See §2.3 — embedding provider open question; candidates via privacy engine.
Database check-and-repair No orphan/dangling-edge/cycle scanner (recent "harden tree render" commit shows bad graphs occur). Missing Med M 9 Tenant-scoped + audited; auto-fix via ChangeProposal (NN#1).
Pluggable DB backend, billions-scale shared tree, weekly record releases Different product models. Missing Low XL Out of scope — Postgres-only is consistent with the invariants; global shared tree conflicts with NN#2/#3/#4.
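
The pagination row above notes that list endpoints accept no limit/offset. A minimal sketch of the clamping logic such an endpoint might share, shown over an in-memory sequence for clarity; `Page`, `paginate`, and the 50/200 bounds are assumptions, and a real endpoint would presumably push limit and offset into the SQL query rather than slice in Python:

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")

MAX_LIMIT = 200  # hypothetical server-side cap


@dataclass
class Page(Generic[T]):
    items: List[T]
    total: int   # size of the full result set, for client-side paging UI
    limit: int   # the effective limit after clamping
    offset: int


def paginate(rows: Sequence[T], limit: int = 50, offset: int = 0) -> Page[T]:
    # Clamp so a client cannot request the whole table in one call.
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return Page(
        items=list(rows[offset:offset + limit]),
        total=len(rows),
        limit=limit,
        offset=offset,
    )
```

Returning the effective limit lets a client detect that its request was clamped and adjust its paging loop.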

2.15 Property / land chain-of-title — headline differentiator

The entire "land" half is planned/missing but fully specified. This is where Provenance has no real competitor.

Item Description Status Imp Eff Phase Non-negotiable
Property/parcel first-class entity No model/endpoint/service/migration. Foundation for the whole category. Planned High L 3 Full CRUD in API+UI (NN#8); reads added carefully to the single privacy engine (NN#2).
Typed OwnershipEvents grant/patent, purchase, sale, inheritance, gift, tax sale, foreclosure, eminent domain — with grantor/grantee Persons + Citation. Planned High L 3 Each event carries a Citation (NN#5); grantor/grantee living-person links redacted (NN#3).
Chain-of-title timeline + gap flagging Ordered OwnershipEvents first-grant→present, breaks flagged. Planned High M 3 The genuinely differentiating analytical piece (PRD US-032).
Bidirectional owner↔person, parcel↔place "Every property a person held" / "every parcel at a place." Planned High M 3 Reverse traversals filtered through privacy engine (NN#2).
Citations on OwnershipEvents Add ownership_event_id to Citation (5th target). Partial Critical S 3 Quick win once Property lands — single FK + CHECK edit (NN#5).
Legal description verbatim storage metes-and-bounds / PLSS township-range-section as-written. Planned Med L 3 Part of the Property model; preserves the record faithfully.
Parcel/plat boundary geometry Optional geometry; plain coords first. Planned Med L 3+ PostGIS is an open question (ARCH §14) — surface dependency.
PLSS / metes-and-bounds parsing → geometry Automated survey parsing. Planned Med XL later Hard; gated on PostGIS.
BLM/GLO federal land-patent connector Marquee US land source. Planned High L 8 Permitted source (NN#6); patents surface as ChangeProposals (NN#1); read-only + rate-limited.
USGS map + public county-deed connectors Per-jurisdiction grantor/grantee indexes. Planned Med L 8 Each connector verifies a legally open source (NN#6).
Co-ownership roles / tenure types joint tenants, TIC, life estate, heirs. Planned Low M later Multiple parties likely free with OwnershipEvent; role typing is a refinement.
Tax/assessment rolls, UK Tithe, Lloyd George Domesday Valuation + non-US collections. Missing Low M-L US-focused v1; international formats out of scope (model is country-agnostic).
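
The chain-of-title row above describes ordering OwnershipEvents from first grant to present and flagging breaks. A minimal sketch of that check, assuming a break means a transfer whose grantor is not the previous grantee; the `OwnershipEvent` fields here are hypothetical, and parties are plain strings where the real model would link Persons:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OwnershipEvent:
    event_date: date
    grantor: Optional[str]  # None for an original grant/patent
    grantee: str


def find_chain_gaps(
    events: List[OwnershipEvent],
) -> List[Tuple[OwnershipEvent, OwnershipEvent]]:
    """Return adjacent event pairs where the chain of title breaks."""
    ordered = sorted(events, key=lambda e: e.event_date)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        # A continuous chain has each grantor equal to the prior grantee.
        if curr.grantor != prev.grantee:
            gaps.append((prev, curr))
    return gaps
```

This sketch handles a single owner per transfer only; co-ownership and inheritance by multiple heirs would need set-based comparison of parties.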

2.16 AI assistant — defining differentiator

The spine has now landed: the ChangeProposal model/schema/service, its migration, the GET/POST API, and a review UI all ship, and the LLMProvider/EmbeddingProvider abstraction with null/Anthropic/OpenAI-compat (OpenAI/xAI/Ollama) providers + registry is in place. The audit substrate (actor_type=assistant, before/after JSONB) is the right foundation; the remaining work is wiring the assistant's tools to emit proposals and building the chatbot/RAG surface on top.

Item Description Status Imp Eff Phase Non-negotiable
ChangeProposal (propose-then-confirm) The defining invariant. Model/schema/service (models/change_proposal.py, services/change_proposal_service.py), migration a1b2c3d4e5f6, GET/POST api/v1/proposals.py, and a /trees/[id]/proposals review UI all ship. Remaining: wire assistant tools to emit proposals. Have Critical L 4 IS NN#1. Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8).
Pluggable LLM + embedding provider LLMProvider/EmbeddingProvider ABCs (integrations/models/base.py) with null, Anthropic, and OpenAI-compat (OpenAI/xAI/Ollama) implementations + registry. Have Critical M 4 Twelve-factor, no hard-coded keys/endpoints (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real.
Per-tree AI model policy Owner-only per-tree model selection (Tree.ai_member_provider/ai_recommender_provider, GET/PATCH /trees/{id}/ai, /trees/[id]/ai UI) (#238). Have Med S 4 Owner-only; selects which configured provider a tree uses — keys stay in env, twelve-factor (NN#7).
AI research-assistant chatbot (RAG over tree) Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. Planned High XL 4 NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction.
Conversational / connector record search Search legal sources via the assistant. Planned High L 4 Legal sources (NN#6); findings = Source + Citation (NN#5).
Fact extraction from documents Extracted facts map cleanly to ChangeProposal review. Missing Med M 4 Canonical NN#1 use case; each fact carries a Citation (NN#5).
OCR/HTR transcription + document translation Worker job via ModelProvider. Missing Med L 4+ Output → Source/Citation (NN#5); via ModelProvider (NN#7); auto-extraction emits ChangeProposal (NN#1).
Next-step research guidance Gap analysis → suggested next record. Planned Med M 4 Reads via privacy engine; advisory unless it queues fetches.
AI biography / audio narration Read-only generation grounded in tree data. Missing Low M-L later Must not leak living-person PII (NN#3); via ModelProvider (NN#7); stored biographies = full CRUD (NN#8).
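
The propose-then-confirm invariant in the table above (assistant tools return proposals; only a user action applies one) can be illustrated structurally. This is a toy sketch, not the shipped model in models/change_proposal.py: every name below is hypothetical, and a dict stands in for the service layer that would enforce privacy and audit:

```python
from dataclasses import dataclass
from enum import Enum


class ProposalStatus(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class ChangeProposal:
    target_id: str
    changes: dict
    status: ProposalStatus = ProposalStatus.PENDING


def assistant_suggest_birth_year(person_id: str, year: int) -> ChangeProposal:
    """Assistant tool: builds a proposal and touches no stored data."""
    return ChangeProposal(target_id=person_id, changes={"birth_year": year})


def apply_proposal(proposal: ChangeProposal, store: dict, actor_type: str) -> None:
    """Apply a pending proposal; only a human actor is allowed to do so."""
    if actor_type != "user":
        raise PermissionError("only a user action may apply a proposal")
    if proposal.status is not ProposalStatus.PENDING:
        raise ValueError("proposal already resolved")
    # Stand-in for the normal service-layer write (privacy + audit).
    store[proposal.target_id].update(proposal.changes)
    proposal.status = ProposalStatus.APPLIED
```

The point of the shape is that the assistant-facing function has no reference to the store at all, so it cannot write even by mistake.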

2.17 Localization & accessibility

A documented day-one commitment ("UI strings externalized from day one") that is currently unmet — every label is a hardcoded literal. Correct the PRD claim or close the gap.

Item Description Status Imp Eff Phase Non-negotiable
UI string externalization No i18n lib, no message catalogs; all copy hardcoded in TSX. Gating prerequisite; cheapest to do now while the surface is small. Missing High L 1-2 PRD §6 promises this "from day one": a docs-vs-code gap; edit the doc now.
Multi-language UI (40-60+ langs) Translation pipeline after externalization (frontend + backend-generated messages). Missing High XL later Table stakes across all competitors.
Accessibility / WCAG 2.2 AA Some ARIA/focus styling; no CI a11y audit, no skip-links, SVG tree viz not keyboard/screen-reader navigable. Partial High L 2/9 Stated PRD §6 target; add axe/pa11y in CI; accessible alternate to the chart.
Unicode-correct non-Latin names Stores fine (UTF-8); no NFC normalization on write, no locale-aware collation, no romanized search. Partial High M 2 Apply unicodedata.normalize('NFC') on input; add COLLATE; supports faithful-record goal.
Structured/compound surname components Surname is a single field; no support for Spanish/Portuguese paternal+maternal, Arabic nasab, particles/prefixes. Missing Med M 2 New Name sub-fields ship with full CRUD (NN#8); preserves the name as recorded.
Non-Gregorian calendar dates calendar column is a placeholder; GEDCOM calendar escapes never parsed/populated. Partial Med L 2 Preserve original calendar as recorded (sources-first).
Language tags / romanized variants per name No language_tag/script/romanized fields; GEDCOM ROMN/LANG unhandled. Missing Med M 2 New Name sub-fields ship with full CRUD (NN#8).
RTL support lang="en" hardcoded, no dir, physical CSS properties throughout. Missing Med M later Convert to logical CSS properties; cheaper once i18n exists.
Selectable themes Light/dark/system works; brand palette intentionally single. Partial Med M later Confirm whether additional themes are a deliberate non-goal (brand guide constrains palette).
Multi-language report/diagram output Depends on i18n + reports, neither shipped. Missing Low L later
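
The Unicode row above recommends applying unicodedata.normalize('NFC') on input. A minimal sketch of what that buys, assuming normalization happens once at the write boundary; the helper name `normalize_name` is hypothetical:

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Trim surrounding whitespace and compose characters to NFC."""
    return unicodedata.normalize("NFC", raw.strip())


# "é" typed as "e" + combining acute accent (two code points)...
decomposed = "Jose\u0301"
# ...versus the single precomposed code point.
composed = "Jos\u00e9"

# Without normalization the two spellings compare unequal, so equality
# lookups and unique constraints treat them as different names.
assert decomposed != composed
assert normalize_name(decomposed) == normalize_name(composed)
```

NFC only unifies canonically equivalent sequences; it does not fold case or strip accents, so the name is still stored as recorded.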

3. Quick wins (high importance / low effort)

Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap.

  1. Fix site_members visibility tier (Privacy, Critical/S) — done: can_view_tree now handles site_members (privacy.py:56), giving any authenticated account a read view while refusing anonymous.
  2. Email-verification enforcement gate (Privacy/Auth, High/S) — done (#53): the read-side email_verified_at check now ships behind REQUIRE_EMAIL_VERIFICATION, so a freshly registered, unverified user doesn't get a live authenticated session. The registration-mode env switch (open/approve/closed) is the larger follow-on (§2.10, M-effort — not a quick win).
  3. Citation confidence selector in the cite form (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis.
  4. Source edit UI + expose all 8 fields (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8.
  5. Make LIVING_RECENCY_YEARS env-configurable (Privacy, High/S) — hardcoded 100 at privacy.py:23; twelve-factor (NN#7).
  6. Add ownership_event_id to Citation (Property/Sources, Critical/S) — single FK + CHECK-constraint edit the moment Property lands; the spine is already built (NN#5).
  7. GEDCOM encoding detection (Standards, High/S) — detect/honor the CHAR tag; reject or transcode ANSEL/UTF-16 rather than silently mangling with errors='replace'.
  8. GEDCOM HEAD completeness (Standards, Med/S) — emit the required 2 FORM LINEAGE-LINKED (under GEDC) and 1 SUBM at gedcom.py:740. Pure conformance.
  9. ARM64 CI build matrix (Perf/Scale, High/S) — add linux/arm64 + QEMU to buildx for both images; many self-hosters run ARM SBCs.
  10. GET /{tree}/citations/{id} endpoint (Sources, Med/S) — API symmetry (NN#8).
  11. Transcription/abstract fields on Source (Sources, Med/S) — add transcription_text + abstract_text, distinct from citation_text; core to evidence analysis.
  12. Sort the merged person timeline (Research workflow, Med/S) — shownEvents.sort() on date_start; currently appended unsorted.
  13. Doc corrections (docs-vs-code) (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim and the i18n "from day one" claim match reality. The repo convention requires docs to travel with code.
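
For item 8 above, a minimal sketch of a GEDCOM 5.5.1 header that includes the FORM line under GEDC and a SUBM pointer with its matching submitter record. The function names, the `PROVENANCE` source id, and the `@SUBM1@` xref are illustrative assumptions and do not reflect the current code at gedcom.py:740:

```python
from typing import List


def build_head(source_id: str, submitter_xref: str = "@SUBM1@") -> List[str]:
    """Return the header lines, including the two the backlog says are missing."""
    return [
        "0 HEAD",
        f"1 SOUR {source_id}",
        f"1 SUBM {submitter_xref}",   # pointer to the submitter record
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",      # nested under GEDC
        "1 CHAR UTF-8",
    ]


def build_submitter(name: str, submitter_xref: str = "@SUBM1@") -> List[str]:
    """Return the level-0 SUBM record that the header pointer must resolve to."""
    return [f"0 {submitter_xref} SUBM", f"1 NAME {name}"]


lines = build_head("PROVENANCE") + build_submitter("Unknown") + ["0 TRLR"]
print("\n".join(lines))
```

Emitting the pointer without the record it targets would trade one conformance error for another, which is why the sketch pairs the two.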

Shipped this cycle: the media privacy leak (§2.4) and the child-resource redaction gap (§2.10) are fully closed — person/event/media/name/relationship (#46) and citation/source endpoints all apply person_visibility for non-members. No residual living-person leak on the read surface.


4. Strategic differentiators

Where to invest to make Provenance distinct rather than a webtrees clone. Each leans on a non-negotiable as a feature, not a constraint.

1. Property chain-of-title (the "land" half). No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by legal public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category.

2. The ChangeProposal AI model. "The assistant never writes autonomously" is a trust differentiator in a market where users fear AI corrupting their research. The structural spine has landed — the ChangeProposal model/API/review UI and the pluggable LLMProvider/EmbeddingProvider abstraction both ship — so the remaining work is wiring the assistant's tools to emit proposals (never mutating directly). Assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together.

3. Anonymous, mutual-consent cross-tree hints. The privacy model already redacts living people for anonymous viewers, so a hint system that reveals nothing identifying until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent.

4. True self-hosting + data ownership. Full account export/import, soft-delete recovery (with owner-confirmed on-demand purge to delete a trashed tree immediately rather than waiting out the 30-day window), GEDCOM round-trip, env-driven everything, a one-command operator backup, and (to-build) scheduled off-host backup + ARM support make Provenance the genealogy app you actually own. The two correctness items that gated the promise have landed: GEDCOM export now preserves citations (the Provenance→Provenance round-trip keeps the sources graph), and operator backup moved from "documented procedure" to a one-command dump (deploy/backup.sh). What remains is scheduled/verified-restore tooling and ARM builds. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.

5. Sources-first as a felt experience. The two-tier model is built, and citations now survive GEDCOM export (#232); the remaining differentiator is making sourcing visible and low-friction: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), and per-fact confidence surfaced in the UI. These turn "every fact links to where it came from" from an architecture note into the product's personality.