Two changes.

1. Privacy fix (NN#2/NN#3): the citation and source list endpoints gated only on can_view_tree, so a non-member on a public/unlisted/site_members tree could enumerate citations and sources tied to a redacted living person, leaking that the person exists and has sourced facts (and possibly their name via a source title). #46 closed this for events/media/names/relationships but not citations/sources. Now citation_service.list_citations and source_service.{list_sources,get_source} delegate non-member reads to public_view_service, mirroring the #46 pattern:
   - citations: shown only when the cited fact resolves to FULL-visibility person(s); covers the person_id, name_id, event_id (person or both-partner), and relationship_id (both-partner) target paths.
   - sources: shown only when they back at least one visible citation; a withheld source 404s (don't reveal it exists).
   Tests cover all four citation target types + source withholding + member-sees-all.

2. On-demand tree purge: owners can permanently delete a soft-deleted tree now instead of waiting out the 30-day auto-purge window. POST /trees/{id}/purge (owner-only): the tree must already be in the trash, and the caller retypes its name to confirm. Media objects are deleted from storage, then a single DELETE on trees cascades all tree-owned rows via the tree_id ON DELETE CASCADE; the audit entry survives (tree_id SET NULL). Frontend adds a "Delete forever" button to the Recently-deleted list.

No migration. Suite: 102 passing.

Signed-off-by: Justin Paul <justin@jpaul.me>
Provenance — Product Backlog
Status legend: Have (shipped) · Partial (substrate exists, surface incomplete) · Planned (on roadmap, no code) · Missing (no code, off roadmap). Importance: Critical / High / Medium / Low. Effort: S / M / L / XL. Phase references map onto the existing 0–9 roadmap. "NN#" = non-negotiable invariant.
1. Executive summary
Where Provenance is strong today. The foundation is genuinely solid and, in several places, ahead of the OSS field:
- Sources-first spine is real. A reusable `Source` + per-fact `Citation` two-tier model with an `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury.
- Privacy architecture is the right shape, and coverage is now broad. A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served through the backend rather than via raw S3 URLs. Non-member reads of persons, events, media, names, and relationships all route through `person_visibility` (#46), and the `citation`/`source` list endpoints now apply the same check via `public_view_service`; see §2.10.
- Non-destructive by design. Soft-delete with timed purge worker, immutable `AuditEntry` (before/after JSONB, `actor_type` ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import.
- Modeling maturity. Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys.
- Standards core. GEDCOM 5.5.1 import/export is functional (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip fidelity has three tracked gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
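
The living-person heuristic named above (`is_possibly_living`, with unknown birth treated as living) can be illustrated with a minimal sketch. The signature, the `death_recorded` flag, and the 110-year cutoff are assumptions made for illustration; the real function in `privacy.py` may differ.

```python
from datetime import date
from typing import Optional

# Assumed cutoff: past this age a person is no longer presumed alive.
MAX_PRESUMED_LIFESPAN_YEARS = 110


def is_possibly_living(
    birth_year: Optional[int],
    death_recorded: bool,
    today: Optional[date] = None,
) -> bool:
    """Conservative check: presume living unless the data says otherwise."""
    if death_recorded:
        return False
    if birth_year is None:
        # Unknown birth is treated as living, the privacy-safe default.
        return True
    today = today or date.today()
    return today.year - birth_year < MAX_PRESUMED_LIFESPAN_YEARS
```

The design choice worth noting is the direction of the default: missing data leads to redaction rather than exposure.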
Documentation-vs-code gaps to correct now (per "docs travel with code"). Two repo claims are not yet true and should be edited in the same spirit they were written:
- pgvector is claimed as used; it is not. Only `pg_trgm` is created. ARCHITECTURE references pgvector for match ranking.
- i18n "from day one" is documented but unmet. PRD §6 promises externalized strings; every label is a hardcoded literal.
These two doc edits are themselves trivial quick wins (see §3).
The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch). Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes:
- No record hints, no "save to tree," no connector framework. The entire SourceConnector layer (FamilySearch/Find A Grave/WikiTree) is unbuilt — this gates AI search, hints, and auto-citation.
- No person merge outside GEDCOM import. Merging duplicate people is fundamental hygiene and is currently impossible in-tree — the single highest-value near-term matching gap.
- No maps at all. No place autocomplete, no geocoding, no interactive/migration/birthplace maps — a glaring hole for an app whose thesis is family and land.
- No report/print/PDF output. Charts render on-screen only; there is no Ahnentafel, family group sheet, narrative report, or any PDF/SVG/HTML export. The whole "Charts, reports & printing" category is on-screen-viewing only.
- DNA absent (deliberately parked — treat as open question, not a gap).
The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees). These are where a privacy-first self-host product is expected to compete and currently trails:
- Collaboration management is now reachable, but minimal. `TreeMembership` roles are enforced on every read/write, and a list/add/change-role/remove API + UI now ship (§2.9), satisfying the full-CRUD invariant (NN#8). The remaining gap is the richer email invite/grant flow (pending-invite state, resend/expire), still scheduled for Phase 6.
- Living-person redaction is now uniform across child resources. Non-member reads of persons, events, media, names, and relationships all redact possibly-living people (#46), and the `citation`/`source` list endpoints now apply the same check via `public_view_service`, closing the earlier PII gap on public/unlisted trees (NN#3, NN#2).
- No place as a usable first-class entity (model exists, created by GEDCOM, but no read/edit/delete: a create-only entity, which is a bug per NN#8).
- No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization (the last is a documented day-one commitment that is currently unmet).
Security-priority correctness fixes (do these first, regardless of phase). The redaction defects all shipped — child resources (#46) and now citations/sources too — leaving one config switch:
- Self-registration approval-mode switch (§2.10). The read-side enforcement now exists: `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (#53). The remaining gap is the env switch to choose open vs admin-approval vs closed self-registration. (The citation/source living-person leak is now closed: citation/source list endpoints apply `person_visibility` for non-members via `public_view_service`.)
Strategic posture. The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the privacy/auth correctness and collaboration gaps that the architecture already implies, (b) ship the maps + reports + merge table stakes, and (c) finish the back-half spine — the connector framework plus wiring the now-landed ChangeProposal/ModelProvider into the assistant — that unlocks the entire back half of the roadmap.
2. Backlog by category
2.1 Tree & data model
Core CRUD, typed relationships, dates, soft-delete, and naming are have. Remaining work is about reusable sub-entities, shared/event-centric modeling, and research-grade conveniences.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Repository as first-class entity | Promote `Source.repository` string to a reusable Repository (name/address/call-numbers) with dedup. | Partial | Med | M | 1–2 | If promoted, full CRUD in API+UI (NN#8); don't half-build. |
| Note as first-class entity (SNOTE) | Promote inline `notes` text fields to reusable shared Note/SNOTE records. | Missing | Low | M | 2 | Full CRUD; GEDCOM 7 round-trip parity. |
| Shared/event-centric model + witnesses | Remove the `subject_person_xor_relationship` XOR; add participant/role join so one event has many people (FAN/cluster research). | Missing | Med | M | later | Unlocks FAN club + richer sourcing; participants must redact via privacy engine. |
| Non-family associations (FAN) | Add associate/neighbor relationship types; best delivered with shared-event participants. | Missing | Low | M | later | — |
| Relationship-status enum | Add married/divorced/annulled status on partnership rather than inferring from events. | Partial | High | M | 1–2 | — |
| Family/couple unit (GEDCOM FAM) | Persist a true FAM entity (own ID/sources, childless couples) instead of rebuilding on export. | Partial | High | L | 2 | Improves GEDCOM fidelity. |
| Kinship / relationship calculator | "How is A related to B" path + cousinship. Graph edges already exist. | Missing | High | M | 1–2 | Self-contained; reads via privacy engine. |
| Read-only audit-log viewer / activity feed | Surface `AuditEntry` as a per-tree/per-person change feed. Smaller and higher-leverage than value-level undo; partially satisfies NN#8's "read" for AuditEntry and is the substrate for watch/follow + webhooks. | Missing | High | M | 2 | Privacy-filtered projections only; never raw before/after JSON to non-members (NN#2/#3). |
| Per-field revision history + restore-prior-value | Value-level history view + undo, built atop the audit feed above. | Partial | High | L | 6 | Audit-log UI is the feed item; this is the larger value-level-undo work (NN#8 correction ethos). |
| Color-coded tags & custom labels | Tag people for lineages/research-status/grouping. | Missing | Med | M | 2 | Full CRUD; tenant-scoped. |
| Person timeline / LifeStory | Sort the merged event list; add place/age enrichment + narrative presentation. | Partial | Med | M | 2 | Sort is trivial (localeCompare on date_start); narrative is the larger piece. |
| Multi-calendar normalization | Store + parse Julian/Hebrew/French Republican (only `calendar` tag stored today, only Gregorian normalized). | Partial | Low | M | 2 | See also Localization §2.17. |
| Evidence/persona vs conclusion model | GEDCOM-X persona layer separate from conclusion person. | Missing | Med | XL | later | Large modeling change; strengthens sourcing + hint matching. |
| Negative assertions | Boolean "event did not happen" on Event. | Missing | Low | S | 2 | Cheap interop nicety. |
| Custom groups / networks | Named manual or rules-based groupings. | Missing | Low | M | later | Lower priority than tags. |
| Raw GEDCOM record editor / configurable fact tabs | webtrees-style raw editor + fact-type registry. | Partial | Med | L | later | Open vocabularies give de-facto custom facts today. |
| Health/medical, historical-facts index, LDS ordinances | Niche entities. | Missing | Low | M–L | later | LDS BAPL/ENDL/SLGS should map to distinct types if ever pursued; medical is special-category PII. |
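
The kinship calculator row notes that the graph edges already exist, so the path half of "how is A related to B" reduces to a shortest-path search. The sketch below assumes relationships can be read as undirected `(person_a, person_b)` pairs; that input shape and the `kinship_path` name are hypothetical, and a real version would read edges through the privacy engine and label each hop (parent, partner, and so on) to derive cousinship.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def kinship_path(
    edges: Iterable[Tuple[str, str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of people from start to goal."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted only to make traversal order deterministic.
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no known connection
```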
2.2 Sources & citations
The two-tier model is have and production-grade on the backend. The gaps are almost all UI/CRUD-completeness and the connector-dependent "save to tree" flows.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Citation confidence selector in UI | Confidence enum is modeled + API-writable but the `citeControl` form never sets it, so every UI citation is NULL confidence. | Partial | High | S | 1 | Quick win. Full CRUD in UI (NN#8); reinforces evidence-quality thesis. |
| Source edit UI + all 8 fields | Source UI is add/list/delete only and create exposes ~3 of 8 fields (no author/source_type/publication_info/quality_note/citation_text). | Partial | High | S | 1 | Update API exists but no edit form — violates NN#8. |
| `GET /{tree}/citations/{id}` | Citation API has list but no single-read endpoint. | Partial | Med | S | 1 | API symmetry (NN#8). |
| Transcription / abstract / extract fields | Add `transcription_text` + `abstract_text` to Source; don't conflate with `citation_text` (GEDCOM SOUR.TEXT currently dumped into `citation_text`). | Missing | Med | S | 1–2 | Quick win. Central to evidence analysis; full CRUD (NN#8). |
| Evidence-Explained guided citation builder | Structured fields → formatted citation (Chicago/MLA/APA) instead of hand-typed `citation_text`. | Missing | High | L | 2 | Signature provenance feature; `citation_text` should be generated, not typed. |
| Citations on OwnershipEvents | Add `ownership_event_id` to Citation + extend CHECK to 5 targets when property lands. | Partial | Critical | S | 3 | Quick win once Property exists: single FK + constraint edit (NN#5). |
| Record-to-source attachment ("save to tree") | Search a connector record and attach its facts. | Missing | High | XL | 4 | Gated on connector framework; assistant attach must emit ChangeProposal (NN#1); legal sources only (NN#6). |
| Source Linker (one record → many persons) | Bulk-attach a record's facts across people. | Missing | Med | L | 4 | Downstream of connectors; reads/writes via service layer. |
| Auto-citation on save/match | Generate citation when a hint/record is confirmed. | Missing | Med | L | 4/7 | Blocked on connectors + hints; ChangeProposal if assistant-driven. |
| Memories-as-sources (cite a photo directly) | Allow media to be a citation target, not only attachable to a Source. | Partial | Low | M | 2 | Reads stay on privacy-checked media endpoint (NN#2). |
| GPS / Proof-Standard reasoning artifact | Container linking sources/citations into a proof narrative reconciling conflicts. | Missing | Med | L | later | Serious-researcher differentiator; full CRUD (NN#8). |
| Proprietary record collections | 1921 census, UK sets, etc. | Missing | Low | XL | — | Out of scope — conflicts with NN#6 / self-host. Do not pursue. |
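
The "extend CHECK to 5 targets" row is easier to reason about with the rule written out. This is an application-level mirror of an exactly-one-target check, not the database constraint itself; the four existing column names come from the citation target paths listed in the commit message, and `ownership_event_id` is the proposed fifth from the row above. The function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence

# The four target columns a citation can point at today.
CURRENT_TARGETS = ("person_id", "name_id", "event_id", "relationship_id")
# Proposed extension once Property lands (not in the schema yet).
EXTENDED_TARGETS = CURRENT_TARGETS + ("ownership_event_id",)


def has_exactly_one_target(
    citation: Mapping[str, Optional[str]],
    columns: Sequence[str] = CURRENT_TARGETS,
) -> bool:
    """True when exactly one of the target columns is non-null."""
    return sum(citation.get(col) is not None for col in columns) == 1
```

Passing `EXTENDED_TARGETS` shows why the change is a single FK plus a constraint edit: the rule itself is unchanged, only the column list grows.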
2.3 Search & matching
Fuzzy trigram name search is have; everything that depends on connectors, embeddings, or multiple populated trees is planned/missing. The standout near-term gap is in-tree person merge.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Standalone duplicate detection | Lift the GEDCOM `_best_match` logic into a "find duplicates in my tree" scan. | Partial | High | M | 2 | Logic already written; results via privacy engine (NN#2). |
| Interactive two-person merge (side-by-side, field-select, undo) | General merge of duplicate persons with citation re-pointing — impossible outside import today. | Partial | High | L | 2 | Highest-value matching gap. Preserve + re-point Citations (NN#5); write-once is a bug (NN#8). |
| Advanced search (wildcards, boolean, date/place facets, sort) | Search exposes only `?q`. | Partial | High | M | 2 | Keep per-person privacy filter in the search loop (NN#2). |
| Phonetic matching (Soundex/Metaphone/DM) | Enable `fuzzystrmatch`; trigram is char-similarity, not phonetic. | Partial | High | M | 2 | Pure utility. |
| Semantic / vector search (pgvector) | Docs claim pgvector is used; it is not: only the `pg_trgm` extension is created. Add `CREATE EXTENSION vector` + embedding columns (and correct the docs). | Missing | Med | L | 7 | Embedding provider is an open question (PRD §11): don't pick silently; candidates via privacy engine. |
| Tree-to-tree matching (Smart Matches) | Cross-tree candidate generation + ranking. | Planned | High | XL | 7 | Anonymous until mutual consent (NN#4); living-person protection (NN#3). |
| Mutual-consent match notification | Anonymous notification, reveal only after both opt in. | Planned | High | L | 7 | Mandated invariant, not a toggle (NN#4, NN#3); rides the notification substrate (§2.9). |
| Match confirm/reject + "not a match" memory | Persistent rejected-match store (today scoring lives only inside import). | Partial | High | M | 7 | Prevents re-notifying once hints land. |
| External search deep-links | Pre-fill FamilySearch/Find A Grave/BLM-GLO search URLs from a person's name/dates/place. | Missing | Med | M | 2–4 | High value, low risk before full connectors; legal targets only (NN#6). |
| Automated record hints | Proactive per-person record suggestions from connectors — a marquee mainstream feature. | Missing | High | XL | 7 | Connector-gated (NN#6); surfaced anonymously where cross-tree (NN#4); attach via ChangeProposal (NN#1). |
| Jurisdiction-aware record-search hints | Map place/jurisdiction → relevant collections. Place hierarchy is a ready foundation. | Missing | Med | L | 8 | Suggested collections must be legal (NN#6). |
| Cross-language / transliteration matching | Cyrillic/Hebrew/CJK ↔ Latin. | Missing | Med | XL | later | See Localization. |
| Record Detective, newspaper matches, collection catalog, GQL query builder, OCR full-text search | Connector/record-layer dependent. | Missing/Planned | Low–Med | L–XL | 4/7/8 | All gated on the connector framework; any query path runs through privacy engine (NN#2). |
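
To make the phonetic-matching row concrete, here is a small pure-Python version of classic American Soundex. It is an illustration of the technique only, not the backlog's plan (which is to enable PostgreSQL's `fuzzystrmatch`), and edge-case behavior may differ from that extension.

```python
# Consonant groups that share a Soundex digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or '' for empty input."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]
```

Spelling variants such as "Smith" and "Smyth" map to the same code, which is the kind of match that character-level trigram similarity only approximates.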
2.4 Media & documents
Universal media attachment is have; the earlier privacy leak is now closed (#46), and the remaining gaps are the asset-processing pipeline (EXIF strip, thumbnails).
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Media privacy gating on serve paths | `list_media`/`get_media`/`media_content` now apply `person_visibility` for non-members (#46): media is exposed only when linked to a FULL-visibility person (`list_public_media`/`can_view_media`), so living-person photos no longer leak on public/unlisted trees. | Have | Critical | M | 1 | Resolved (NN#3/NN#2). Serve paths check attached `person_id` visibility and 404 otherwise. |
| EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | Security-priority, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. |
| Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy — no bypass path. |
| Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. |
| Photo/face tagging (manual) | Multi-person tagging via single FK today. | Missing | Med | XL(ML)/M(manual) | later | Owner-only, in-deployment; face tags inherit redaction (NN#3); full CRUD. |
| Mobile photo scanning + auto-split | Shoebox digitization. | Missing | Med | L | later | Reuse privacy-gated upload + EXIF strip. |
| AI photo dating / colorize / restore / animate / narrate | Model-driven media features. | Missing | Low | L–XL | 4+ | Must route through ModelProvider (NN#7), require approval (NN#1), preserve original; animating living faces raises consent issues. |
| British Library / paywalled archives, pay-per-view credits | Licensed content + metering. | Missing | Low | XL | — | Out of scope — conflicts with NN#6 and the self-host model. |
2.5 DNA & genetic genealogy
DNA is an explicit PRD non-goal / open question — treat as parked, not a backlog to grind through. Across every DNA row the rule is uniform: a user uploading their own export is permissible; vendor connectors/scrapers (23andMe / Ancestry / MyHeritage / GEDmatch) are barred (NN#6). Kits and matches are living-tester PII and route through the privacy engine.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| DNA-confirmed relationship flag | Model DNA confirmation as a Source/Citation backing a Relationship (not free text). | Missing | Med | M | parked | Best sources-first fit (NN#5); full CRUD (NN#8). |
| Raw DNA upload (own file) | User uploads own export; no vendor scraping. | Missing | Med | L | parked | User's own file is fine; vendor connectors barred (NN#6); special-category PII via privacy engine. |
| Kit/Match entities linked to persons | Kit (tester) + Match tied to Person, tenant-scoped/audited. | Missing | Med | M | parked | Kits = living-tester PII (NN#2/#3); full CRUD (NN#8). |
| Autosomal match list, segments, chromosome browser, triangulation, ThruLines/AutoTree, ethnicity/admixture, haplogroups, GEDmatch, NPE detection | Full genetic-genealogy suite. | Missing | Low–Med | L–XL | parked | DNA scope is an unresolved open question — surface the dependency, don't build speculatively. Own-data only (NN#6); cross-user surfacing obeys NN#4. |
2.6 Maps, places & gazetteers
This category is almost entirely missing despite being half the product thesis. The Place model has the right bones (parent_id, lat/long, PlaceName with date ranges) but no API/UI and no maps.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Place as usable first-class entity | Place rows are created by GEDCOM but have no read/edit/delete API or UI — a create-only entity. | Partial | High | M | 2–3 | Violates NN#8 (create-but-not-edit = bug). Make Place citable too (NN#5). |
| Place autocomplete + picker in event editor | No `/places` router; the event form has no place input, so users can't attach a place at all. | Missing | High | M | 2 | Table stakes; lookup is low-risk. |
| Geocoding (manual coords + forward) | lat/long columns exist; no UI, no geocoder. | Partial | High | M | 3 | Provider via env (NN#7), ToS-compliant (NN#6). |
| Pluggable geocoding provider | Nominatim/GeoNames/Bing/Google swappable. | Missing | Med | L | 3 | Provider+keys via env (NN#7); legal providers only (NN#6). |
| Bulk/batch geocoding (worker) | Geocode hundreds of GEDCOM-imported places. | Missing | Med | M | 3 | Idempotent, rate-limited worker job; provider via env. |
| Place merge/split (dedup) | GEDCOM imports produce near-duplicate place strings. | Missing | High | M | 2–3 | Needs Place update/delete (NN#8); audited merges. |
| Place-name cleanup tools | Extend the existing preview→apply cleanup UX to places. | Missing | Med | M | 2 | Preview-first + audited like existing cleanup. |
| Standardized-name vs original text | Mirror the verbatim+normalized date pattern for places. | Missing | Med | M | 2–3 | GEDCOM fidelity. |
| Alternate/historical place names with date ranges | `PlaceName` model exists with `valid_from/to` but no CRUD and never populated. | Partial | Med | M | 2–3 | Stored entity with no CRUD surface (NN#8). |
| Interactive map of events & places | No map library in frontend. Core to family+land positioning. | Missing | High | L | 3 | Plot via person_visibility so non-owners never see living locations (NN#2/#3). |
| Migration trail / pedigree-birthplace maps | Per-person life path; ancestor birthplace map. | Missing | Med | L | 3 | Redact living subjects for non-owners (NN#3). |
| Bundled world gazetteer | Offline GeoNames-style authority. | Missing | Med | XL | later | GeoNames (CC-BY) verify AGPL-compat; env-configurable. |
| Historical boundary overlays, time slider, heatmaps, radius/nearby, tile-provider switch | Advanced geo. | Missing | Low–Med | S–XL | later | PostGIS is an open question (ARCH §14) — surface dependency, don't adopt silently; tiles legal (NN#6). |
2.7 Charts, reports & printing
On-screen pedigree/descendant/fan/hourglass charts are have. The entire output/print/report half is missing — this is the linchpin gap of the category.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Multi-format export (PDF / SVG / image / HTML) | No export/print path, no `@media print`, no `window.print()`. Charts and reports can't leave the screen. | Missing | High | L | 2/6 | Linchpin. Generate from privacy-filtered data so living people are redacted in shared output (NN#3). |
| Ahnentafel report | Numbered-ancestor report; all data exists. | Missing | High | M | 6 | — |
| Family group sheet / individual summary | Printable summary; data available, needs print layout. | Missing | High | M | 6 | — |
| Narrative descendant/ancestor reports | Multi-standard prose with inline sources. | Missing | High | L | 6 | Cite Sources inline (NN#5); redact living (NN#3). |
| Sentence-template narrative engine | Deterministic fact→prose underpinning reports. | Missing | Med | L | 6 | Keep template-based; report text never mutates tree (NN#1). |
| Photo boxes in charts | Pass privacy-checked media URLs to `setCardDisplay`; CSS already present. | Missing | High | M | 2 | Stream via privacy-checked `/media` (NN#2/#3). |
| Drag-to-edit / interactive chart canvas | Tree canvas renders but interactive node editing (drag to re-parent, inline edit on the chart) is only partly present. | Partial | Med | M | 2 | Edits go through service layer + audit (NN#1); honor redaction. |
| Statistics dashboard | Surname/place/date distributions + tree-health. | Missing | Med | M | 6 | Reads via privacy engine (NN#2). |
| Kinship/relationship diagram report | Needs path-finding (see §2.3 calculator) + renderer. | Missing | Med | M | 6 | — |
| List reports (sources/places/repos/media) | Printable indexes (current screens are management, not reports). | Missing | Med | M | 6 | — |
| Color-by-lineage, fan overlays, lifespan/timeline charts | Sex-coloring exists; lineage/overlay/timeline don't. | Partial/Missing | Med | M | later | Overlays respect privacy engine. |
| Book/multi-report compiler, wall-chart tiling, page-setup, customizable charts | Print-shop-grade output. | Missing | Low–Med | L–XL | later | Saved "book" entity = full CRUD (NN#8); honor living-person privacy. |
| Bowtie/couple-rooted/circular-sun/3D/network/calendar | Niche chart variants. | Missing/Partial | Low–Med | M–L | later | — |
| Print-shop products, XML template engine, blank forms | Commercial/template extras. | Missing | Low | S–XL | later | Weak fit for self-host. |
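
The Ahnentafel row says all the data exists; what the report adds is the numbering: the subject is 1, the father of person n is 2n, and the mother is 2n+1. The sketch below assumes a simple `person -> (father, mother)` mapping, which is a hypothetical input shape; the real model has typed parent qualifiers, so choosing which parent link feeds the report is a separate decision.

```python
from typing import Dict, Optional, Tuple

ParentMap = Dict[str, Tuple[Optional[str], Optional[str]]]


def ahnentafel(root: str, parents: ParentMap, max_generations: int = 10) -> Dict[int, str]:
    """Number ancestors: subject = 1, father of n = 2n, mother of n = 2n + 1."""
    numbered: Dict[int, str] = {}
    stack = [(1, root)]
    limit = 2 ** max_generations  # caps depth so cyclic bad data cannot loop forever
    while stack:
        number, person = stack.pop()
        if person is None or number >= limit:
            continue
        numbered[number] = person
        father, mother = parents.get(person, (None, None))
        stack.append((2 * number, father))
        stack.append((2 * number + 1, mother))
    return dict(sorted(numbered.items()))
```

Gaps in the numbering (for example 4 present but 5 and 6 missing) are expected and show exactly which ancestors are unknown.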
2.8 Research workflow & automation
The preview→approve bulk cleanup tool is a genuine have and a differentiator. The missing pieces are the serious-researcher workflow entities.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Data-quality / consistency checker | Extend cleanup beyond name issues: child-before-parent, death-before-birth, implausible ages, orphans, dups; severity tiers. | Partial | High | L | 2 | New auto-fixes keep preview→apply (NN#1). |
| Research log | Searches, repositories visited, negative results, findings — distinct from the system audit log. | Missing | High | M | 6 | Reference reusable Sources (NN#5); tenant-scoped full CRUD (NN#8). |
| To-do / research task planner | Tasks on Person/Tree with status/priority/due/assignment. | Missing | High | M | 6 | Full CRUD in API+UI (NN#8). |
| Source-driven data entry | Start from a Source document and transcribe facts into the tree. | Missing | High | M | 2 | Natural sources-first differentiator (NN#5). |
| Task↔log linkage | FK + joined view once both entities exist. | Missing | Med | S | 6 | Cheap once predecessors land. |
| Family chronology / timeline | Sort merged events; family-wide chronology (parents' marriage, siblings' births). | Partial | Med | M | 2 | Sort is trivial; presentation over privacy-filtered data. |
| Navigation: active person / history / bookmarks | Large trees rely on browser back only. | Missing | Med | M | 2 | Per-user, tenant-scoped, full CRUD; don't expose redacted persons (NN#2/#3). |
| Saved-record shoebox / review queue | Stage candidate records before committing. | Missing | Med | M | 4/7 | Auto-attach via ChangeProposal (NN#1); legal sources (NN#6). |
| Guided research suggestions | Proactive "research next" engine (today only flags problems). | Partial | High | L | 4 | Advisory; writes via ChangeProposal (NN#1); cross-tree via privacy engine (NN#2). |
| Persona-adaptive onboarding | Family Keeper / Serious Researcher / Property Researcher selector (PRD US-002, documented but unbuilt). | Missing | Low–Med | L | 2 | Pure presentation. |
| Dashboard widgets, scratchpad, research-link sidebar, blog/narrative authoring, research wiki, crowd indexing | Conveniences. | Missing | Low | S–XL | later | Widgets/published narratives read via privacy engine (NN#2/#3). |
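
The data-quality checker row lists rules such as death-before-birth, implausible ages, and child-before-parent, with severity tiers. A minimal sketch of three of those rules follows. The `PersonFacts` shape, the year-only granularity, the severity labels, and the 120-year threshold are all assumptions for illustration; findings like these would feed the existing preview→apply flow rather than auto-fix anything.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_PLAUSIBLE_AGE = 120  # assumed threshold for the implausible-age warning

Issue = Tuple[str, str, str]  # (severity, rule, person id)


@dataclass
class PersonFacts:
    id: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def check_person(p: PersonFacts) -> List[Issue]:
    """Single-person rules; unknown dates produce no findings."""
    issues: List[Issue] = []
    if p.birth_year is not None and p.death_year is not None:
        if p.death_year < p.birth_year:
            issues.append(("error", "death-before-birth", p.id))
        elif p.death_year - p.birth_year > MAX_PLAUSIBLE_AGE:
            issues.append(("warning", "implausible-age", p.id))
    return issues


def check_parent_child(parent: PersonFacts, child: PersonFacts) -> List[Issue]:
    """Flag a child born in or before the parent's birth year."""
    if parent.birth_year is None or child.birth_year is None:
        return []
    if child.birth_year <= parent.birth_year:
        return [("error", "child-before-parent", child.id)]
    return []
```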
2.9 Collaboration & sharing
Authorization is enforced everywhere, and a minimal management surface now ships — list/add/change-role/remove via api/v1/members.py plus a members page (#233). The remaining gap is the richer email invite/grant flow. The minimal slice landed at Phase 2 as planned; the invite/email UX stays at Phase 6.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Membership PATCH/DELETE + role change (minimal slice) | Add/adjust/revoke a collaborator and change role: GET/PATCH/DELETE on `/trees/{id}/members` (`api/v1/members.py`) plus a frontend members page now ship (#233). Resolves the create-only NN#8 break without the full invite flow. | Have | Critical | S–M | 2 | Resolves the create-only NN#8 break. Revocation routes through the single privacy point. |
| Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. |
| Read-only public tree share | Anonymous read surface shipped: optional-auth `CurrentUserOrNone` dep, `api/v1/public.py` + `public_view_service.py`, and server-rendered pages at `/p/[treeId]` (+ `/persons/[personId]`) and `/explore`. Living-safe by construction via `person_visibility`. | Have | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). |
| SEO public profile pages (server-rendered) | Server-rendered public pages (`/p/[treeId]`, `/explore`) and `robots.ts` now ship. Deferred follow-ups: a public-only `sitemap.ts` and per-tree `noindex,nofollow` meta for unlisted/site_members pages. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages: they must go through the privacy engine, no direct row queries. |
| Notification / event-dispatch substrate | Shared enabler seeded from `AuditEntry`: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | Privacy-filtered projections only; never raw before/after JSON (NN#2/#3). |
| Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. |
| In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. |
| Watch/follow + change notifications | `AuditEntry` is the natural event source; needs subscription entity + dispatch (substrate above). | Planned | Med | M | 6 | Notification builder reads via privacy engine, not raw rows. |
| Optimistic concurrency / lost-update protection | No version/etag/`updated_at` precondition checks; concurrent multi-user edits can silently clobber. | Missing | High | M | 6 | Full-CRUD + multi-user without this risks lost updates; concurrent paths still route through privacy engine. |
| Pending-changes moderation (human edits) | Queue contributor edits for owner approval — shares infra with the AI ChangeProposal queue. | Missing | Med | L | 6 | Design together with ChangeProposal (NN#1). |
| Field-by-field profile merge & approval | WikiTree-style merge center + unmerge with per-field provenance. | Missing | Med | XL | later | Conflicting facts each retain Source/Citation (NN#5). |
| Ownership transfer | `owner_id` is effectively write-once; needed for self-host longevity. A minimal reassignment endpoint is the NN#8 fix. | Missing | Med | M | 6 | Violates write-once invariant (NN#8); importance/phase tension noted. Ship the minimal slice when membership lands. |
| Narrative website / HTML export | Static narrated site (reuse public-page renderer). | Missing | Med | L | later | Redact living persons at build time (static bypasses runtime engine) (NN#3). |
| Two-way desktop↔online sync | Bidirectional sync with change journals. Audit log could seed a change feed. | Missing | Med | XL | later | No Ancestry TreeShare / paywalled sync (NN#6). |
| Curator roles, trusted-list ACLs, field locking, projects/workspaces, forum, honor code, free-space wiki, portal homepage | Community-platform features. | Missing | Low | S–XL | later | New roles/ACLs/locks integrate with the single enforcement point, not parallel checks. |
| Real-time co-editing | Out of scope; only optimistic concurrency planned. | Planned | Med | XL | later | Concurrent paths must route through privacy engine. |
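
The optimistic-concurrency row above could be closed with a version precondition on every update. The following is a minimal sketch, assuming a hypothetical integer `version` column and an in-memory dict standing in for the database; `PersonRecord`, `StaleWriteError`, and `update_person` are illustrative names, not the project's.

```python
from dataclasses import dataclass, replace


class StaleWriteError(Exception):
    """Raised when the caller's expected version no longer matches the stored row."""


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical shape; a real model would carry many more fields.
    id: str
    display_name: str
    version: int = 1


def update_person(store: dict, person_id: str,
                  expected_version: int, display_name: str) -> PersonRecord:
    """Apply an edit only if nobody else changed the row since it was read."""
    current = store[person_id]
    if current.version != expected_version:
        # An HTTP layer would typically map this to 409 or 412.
        raise StaleWriteError(
            f"expected version {expected_version}, found {current.version}"
        )
    updated = replace(current, display_name=display_name,
                      version=current.version + 1)
    store[person_id] = updated
    return updated
```

Two editors who both read version 1 cannot both succeed: the second write raises instead of silently overwriting the first. In a SQL-backed service the same check is usually folded into the `UPDATE ... WHERE version = :expected` statement so it is atomic.
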
2.10 Privacy & access control
The architecture is correct (single engine, tenant mixin, audit, soft-delete + purge are have), but enforcement coverage and configurability have real holes — two of which are security-priority.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Uniform living-person redaction across child resources | person_visibility now runs for non-members on the event, media, name, relationship endpoints (#46) and the citation/source list endpoints, all delegating to public_view_service: citations resolve to FULL-visibility person(s); sources show only when they back a visible citation. | Have | High | S | 1–2 | Resolved (NN#3/NN#2). No child-resource path leaks a redacted living person's facts. |
| Email-verification enforcement gate | Read-side check now ships (#53): REQUIRE_EMAIL_VERIFICATION gates login/session on email_verified_at (auth_service.py). Opt-in (default off) so SMTP-less self-hosts still work. | Have | High | S | 1–2 | Read-side trust path now enforced (NN#7); the registration-mode switch below is the separate larger piece. |
| Self-registration mode gating (approve / open / closed) | No env switch to choose open vs admin-approval vs closed registration. | Partial | High | M | 2/5 | Twelve-factor registration control (NN#7); pairs with the verification gate above. |
| Instance owner / operator role | OWNER_EMAIL-declared operator (#240): is_instance_owner on /users/me, owner-only GET /api/v1/admin/instance, /admin UI. | Have | Med | S | 2/5 | Owner-only operational surface, twelve-factor via env (NN#7); reads stay through the service layer. |
| Fix site_members visibility tier | can_view_tree now handles site_members (privacy.py:56): any authenticated account gets a read view, anonymous is refused. | Have | Critical | S | 1 | Honors the tier the UI offers; reads still route through person_visibility. |
| Make LIVING_RECENCY_YEARS configurable | Hardcoded 100 at privacy.py:23. | Partial | High | S | 2 | Quick win. Twelve-factor (NN#7). |
| Privacy-stripped export (redact living) | GEDCOM + account export emit full tree; no "strip living" mode. | Missing | High | M | 2 | Reuse person_visibility/_redact (NN#3). Owner self-export is safe today; shareable variant is the gap. |
| Per-fact / per-field privacy + record flags | tentative/rejected/preferred/private flags on facts. | Missing | Med | L | later | If added, route through the single engine (NN#2). |
| Granular rules by record type & viewer relationship | webtrees-style "hide marriages from non-descendants". | Missing | Med | L | later | Single enforcement point. |
| OIDC / external IdP login | AuthProvider interface ready; only Local implemented. Authentik is the intended real auth. | Planned | High | L | 5 | Additive by design. |
| Two-factor auth (TOTP) | Bearer/cookie session auth is solid; no MFA. | Partial | High | L | 5 | — |
| DB-level audit immutability | Audit is insert-only by convention; no trigger/constraint. Verified as "adequate for self-host," so importance downgraded to match. | Have(soft) | Med | S | 9 | Adequate for self-host; upgrade to trigger only if true immutability is required. |
| Block/hide users, family-group private space, DNA opt-in controls | Depend on messaging/DNA. | Missing | Low–Med | M–XL | 6/parked | DNA parked (NN#6). |
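
The LIVING_RECENCY_YEARS row above asks for an env-driven value in place of the hardcoded 100. A minimal sketch of what that could look like follows; the `presumed_living` rule is an assumption made for illustration (the actual logic in privacy.py is not shown in this backlog), and both function names are hypothetical.

```python
import os
from datetime import date

DEFAULT_LIVING_RECENCY_YEARS = 100  # the value the backlog says is hardcoded today


def living_recency_years(env=None) -> int:
    """Read LIVING_RECENCY_YEARS from the environment, falling back to 100."""
    env = os.environ if env is None else env
    raw = env.get("LIVING_RECENCY_YEARS")
    if raw is None or raw.strip() == "":
        return DEFAULT_LIVING_RECENCY_YEARS
    years = int(raw)  # a non-numeric value raises ValueError, failing fast
    if years <= 0:
        raise ValueError("LIVING_RECENCY_YEARS must be a positive integer")
    return years


def presumed_living(birth_year, death_recorded, today=None, env=None) -> bool:
    """Assumed rule: no recorded death and born within the recency window."""
    if death_recorded:
        return False
    if birth_year is None:
        return True  # assumption: an unknown birth year is treated conservatively
    today = today or date.today()
    return today.year - birth_year < living_recency_years(env)
```

Passing `env` explicitly keeps the function testable without mutating the process environment; production code would simply omit it.
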
2.11 Import/export & standards
GEDCOM 5.5.1 import/export and full data-portability export are have; the remaining fidelity gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) still undercut the provenance thesis.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Citation links on GEDCOM export | Export now selects Citations and emits SOUR/PAGE per fact (#232), so fact→source links survive a Provenance→Provenance round-trip. (Citation detail/confidence beyond page still to round-trip.) | Have | Critical | M | 2 | Closes the silent data-loss / destructive round-trip on the product's signature data (NN#5); satisfies PRD US-013. |
| GEDCOM 7.0 import/export | Version hardcoded 5.5.1; no v7 semantics, SCHMA, SUBM, or UID handling. | Partial | High | L | 2 | Stated differentiator (FamilySearch interop). |
| Custom/underscore tag preservation | _MARNM becomes TYPE married, other custom tags dropped — violates ≥99% round-trip goal. | Missing | High | L | 2 | Tension with provenance thesis (faithful record). |
| PLAC FORM hierarchy + MAP coordinate round-trip | Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. | Missing | High | M | 2–3 | Round-trip fidelity for the land/maps pillar. |
| Encoding detection (ANSEL/UTF-16) | UTF-8 round-trips; non-UTF-8 files silently mangled via errors='replace'; CHAR tag ignored. | Partial | High | S | 2 | Near quick win. Detect/honor CHAR; reject or transcode rather than corrupt. |
| HEAD completeness | HEAD at gedcom.py:740 emits only SOUR/GEDC/VERS/CHAR — missing required 2 FORM LINEAGE-LINKED (under GEDC) and 1 SUBM. | Partial | Med | S | 2 | Quick win. Pure conformance. |
| GEDCOM media (OBJE) round-trip | OBJE in skip-tags; media ignored on import, never emitted on export. | Partial | Med | M | 2 | Any media bundle keeps privacy gating. |
| GEDZIP (.gdz) bundle | Bundled-media packaging. | Missing | Med | M | 2 | Natural once v7 + OBJE land. |
| Selective / filtered export | Clippings-cart / branch subset. | Missing | Med | M | later | Maintain single-enforcement-point on export (NN#2). |
| Import conformance validation | Preview is a mapping report, not structural/cardinality validation; bad lines silently skipped. | Partial | Med | M | 2 | — |
| GEDCOM-X, Gramps XML, multi-format import, FHISO/ELF, PRF upload, KML export | Interop extras. | Missing | Low | L | later | PRF needs FamilySearch API (permitted, NN#6). |
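
The encoding-detection row above calls for honoring the CHAR tag and rejecting what cannot be decoded faithfully. Here is a minimal sketch of that idea, assuming a hypothetical `decode_gedcom` helper; it is not the importer's actual code. Python's standard library has no ANSEL codec, so this sketch rejects ANSEL outright (transcoding would need a third-party or hand-written table).

```python
import codecs
import re

# GEDCOM CHAR values this sketch can decode strictly with stdlib codecs.
_CHAR_TO_CODEC = {"UTF-8": "utf-8", "ASCII": "ascii"}
_CHAR_LINE = re.compile(rb"^\s*1\s+CHAR\s+(\S+)", re.MULTILINE)


class UnsupportedEncodingError(ValueError):
    """The file declares an encoding this sketch will not guess at."""


def decode_gedcom(data: bytes) -> str:
    # A byte-order mark is the strongest signal, so check it first.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the utf-16 codec consumes the BOM
    # Otherwise trust the declared CHAR value in the header, defaulting to UTF-8.
    match = _CHAR_LINE.search(data[:4096])
    declared = match.group(1).decode("ascii", "replace").upper() if match else "UTF-8"
    codec = _CHAR_TO_CODEC.get(declared)
    if codec is None:
        raise UnsupportedEncodingError(f"unsupported CHAR value: {declared}")
    # Strict decoding raises on bad bytes instead of substituting characters.
    return data.decode(codec)
```

The design choice is to fail loudly: a rejected upload can be retried after conversion, whereas characters replaced during import are lost for good.
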
2.12 Mobile & offline
Responsive web is partial; PWA and offline-first are absent. Native apps are an explicit deferral.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| PWA (manifest + icons + viewport + service worker) | No manifest, no SW, no next-pwa; responsive coverage exists but unaudited on heavy views (tree canvas fixed 74vh). | Partial | High | M | 2 | If SW caches API responses, never retain non-owner PII; cache only what the session is authorized to see (NN#3). |
| Responsive parity audit | 17 breakpoint usages; small-screen parity on tree/person views unverified. | Partial | High | M | 1–2 | Feature parity is an ARCH requirement. |
| Offline-first editing + reconnect sync | No SW, no local store, no mutation queue. Valuable for archive/courthouse field research. | Missing | High | XL | later | Replayed edits go through service layer + audit (NN#1); cached data respects living-person rule (NN#3). |
| Native mobile apps | Explicitly deferred (responsive web only). | Missing | Med | XL | later | If built, reads through one backend privacy engine (NN#2/#3/#4). |
| Companion app w/ cross-device sync | Largely redundant with server-backed web. | Missing | Low | XL | later | Sync boundary enforces privacy (NN#2); full CRUD parity (NN#8). |
| Relatives Around Me | Nearby-relatives discovery. | Missing | Low | L | later | Explicit opt-in; anonymous until mutual consent (NN#4). |
2.13 API & extensibility
Internal REST + OpenAPI + generated TS client are have. The externalized developer story and the connector/plugin spine are not built.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Public read-only API + scoped tokens (OAuth) | The unauthenticated public read surface (public.py) now ships (#41–#51), but for a developer API the bearer token is still opaque session only and TokenPurpose lacks scopes — no scoped/OAuth token path. | Partial | High | L | 5–6 | Any scoped-token path routes through person_visibility + living-person redaction (NN#2/#3). |
| SourceConnector framework | Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. | Planned | Med | L | 4 | Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6). |
| Webhooks / change feeds | AuditEntry is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. | Missing | Med | L | 6 | Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3). |
| CLI / scripting surface | No [project.scripts], no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. | Missing | Med | M | 9 | Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path. |
| Plugin/addon architecture | Connector framework only; no general UI/report/theme plugin system planned. | Planned | Med | L | later | Sandbox via service layer; no privacy/audit bypass, no writes outside ChangeProposal. |
| In-app query tooling (SuperTool) | Power-user expression engine. | Missing | Low | L | later | Execute through privacy engine — no row enumeration bypass (NN#2). |
| Certified partner program | Organizational, not software. | Missing | Low | XL | — | Out of scope until a hosted offering exists. |
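
The webhook/change-feed row above requires privacy-filtered projections and never raw before/after JSON. One way to read that requirement is sketched below, with a hypothetical `AuditRow` stand-in for the audit record and a `subject_is_visible` flag that is assumed to come from the privacy engine for the specific recipient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRow:
    # Hypothetical subset of an audit record; field names are illustrative.
    tree_id: str
    entity_type: str
    entity_id: str
    action: str
    before: Optional[dict]
    after: Optional[dict]


def changed_fields(before: Optional[dict], after: Optional[dict]) -> list:
    """Names of fields whose values differ; the values themselves are never returned."""
    before, after = before or {}, after or {}
    return sorted(k for k in before.keys() | after.keys()
                  if before.get(k) != after.get(k))


def project_for_delivery(row: AuditRow, subject_is_visible: bool) -> Optional[dict]:
    """Build the outbound payload, or None when the recipient may not see the subject."""
    if not subject_is_visible:
        return None  # emit nothing rather than a redacted stub
    return {
        "tree_id": row.tree_id,
        "entity_type": row.entity_type,
        "entity_id": row.entity_id,
        "action": row.action,
        "changed_fields": changed_fields(row.before, row.after),
    }
```

The payload tells a subscriber that something changed and where to look; the subscriber then fetches the current state through the normal read path, which applies redaction for that viewer.
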
2.14 Performance & scale
Postgres + S3, multi-tenant isolation are have. Queue, observability, backups, pagination, and scale validation are the gaps that gate Phases 4/7 — several are current functional limitations, not late-phase validation tasks.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Real job queue (Postgres/Redis-backed) | Worker is a fixed-interval purge loop; GEDCOM import and account export run inline in the request. | Partial | High | L | 4 (pre-req) | Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11). |
| Pagination on list endpoints + server-side tree loading | List endpoints (persons.py:37, events, relationships) take no limit/offset/skip; the tree view loads the whole graph client-side. A current limitation against the 50k-person target. | Planned | High | M | 1–2 | Split out from scale validation — this is a correctness/functional gap now, not a Phase 9 task. |
| Scale validation (50k+ trees, P95<2s, load test) | No benchmark or load test exists. | Planned | High | L | 9 | Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true. |
| Operator backup: one-command pg_dump + MinIO sync | deploy/backup.sh + deploy/BACKUP.md now provide a scripted DB+object dump (#234). Remaining: scheduled/off-host/verified-restore tooling (row below). | Have | Critical | M | 1–2 | Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. |
| Scheduled / cloud automated backup + restore tooling | Cron-driven, off-host, verified-restore workflow. | Partial | High | L | 9 | Builds on the one-command slice above. |
| ARM64 build matrix | CI builds linux/amd64 only; many self-hosters run ARM SBCs. | Partial | High | S | 1 | Quick win. Add arm64 + QEMU to buildx (NN#7 container-native). |
| Structured JSON logs + Prometheus metrics | Plain-text stdlib logging; no /metrics. | Partial | Med | M | 9 | Logs/metrics reference UUIDs, never names/PII (NN#3/#4). |
| pgvector enablement | Image has pgvector; app never creates the extension or adds embedding columns (docs claim otherwise). | Partial | Med | M | 7 | See §2.3 — embedding provider open question; candidates via privacy engine. |
| Database check-and-repair | No orphan/dangling-edge/cycle scanner (recent "harden tree render" commit shows bad graphs occur). | Missing | Med | M | 9 | Tenant-scoped + audited; auto-fix via ChangeProposal (NN#1). |
| Pluggable DB backend, billions-scale shared tree, weekly record releases | Different product models. | Missing | Low | XL | — | Out of scope — Postgres-only is consistent with the invariants; global shared tree conflicts with NN#2/#3/#4. |
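
The pagination row above notes that list endpoints take no limit/offset. The shape of a bounded page response can be sketched as follows; the `Page` type, the `MAX_LIMIT` cap of 200, and the default of 50 are assumptions for illustration, and the in-memory slice stands in for what would be a SQL `LIMIT`/`OFFSET` in a database-backed service.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_LIMIT = 200  # assumed server-side cap so one request cannot pull a whole tree


@dataclass(frozen=True)
class Page:
    items: list
    total: int
    limit: int
    offset: int


def paginate(rows: Sequence, limit: int = 50, offset: int = 0) -> Page:
    """Return one bounded slice of rows plus the metadata a client needs to continue."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    limit = min(limit, MAX_LIMIT)  # clamp rather than reject oversized requests
    return Page(
        items=list(rows[offset:offset + limit]),
        total=len(rows),
        limit=limit,
        offset=offset,
    )
```

Returning the effective `limit` lets a client notice that its request was clamped. For very large trees, keyset (cursor) pagination tends to scale better than offsets, which is a design question this sketch leaves open.
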
2.15 Property / land chain-of-title — headline differentiator
The entire "land" half is planned/missing but fully specified. This is where Provenance has no real competitor.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Property/parcel first-class entity | No model/endpoint/service/migration. Foundation for the whole category. | Planned | High | L | 3 | Full CRUD in API+UI (NN#8); reads added carefully to the single privacy engine (NN#2). |
| Typed OwnershipEvents | grant/patent, purchase, sale, inheritance, gift, tax sale, foreclosure, eminent domain — with grantor/grantee Persons + Citation. | Planned | High | L | 3 | Each event carries a Citation (NN#5); grantor/grantee living-person links redacted (NN#3). |
| Chain-of-title timeline + gap flagging | Ordered OwnershipEvents first-grant→present, breaks flagged. | Planned | High | M | 3 | The genuinely differentiating analytical piece (PRD US-032). |
| Bidirectional owner↔person, parcel↔place | "Every property a person held" / "every parcel at a place." | Planned | High | M | 3 | Reverse traversals filtered through privacy engine (NN#2). |
| Citations on OwnershipEvents | Add ownership_event_id to Citation (5th target). | Partial | Critical | S | 3 | Quick win once Property lands — single FK + CHECK edit (NN#5). |
| Legal description verbatim storage | metes-and-bounds / PLSS township-range-section as-written. | Planned | Med | L | 3 | Part of the Property model; preserves the record faithfully. |
| Parcel/plat boundary geometry | Optional geometry; plain coords first. | Planned | Med | L | 3+ | PostGIS is an open question (ARCH §14) — surface dependency. |
| PLSS / metes-and-bounds parsing → geometry | Automated survey parsing. | Planned | Med | XL | later | Hard; gated on PostGIS. |
| BLM/GLO federal land-patent connector | Marquee US land source. | Planned | High | L | 8 | Permitted source (NN#6); patents surface as ChangeProposals (NN#1); read-only + rate-limited. |
| USGS map + public county-deed connectors | Per-jurisdiction grantor/grantee indexes. | Planned | Med | L | 8 | Each connector verifies a legally open source (NN#6). |
| Co-ownership roles / tenure types | joint tenants, TIC, life estate, heirs. | Planned | Low | M | later | Multiple parties likely free with OwnershipEvent; role typing is a refinement. |
| Tax/assessment rolls, UK Tithe, Lloyd George Domesday | Valuation + non-US collections. | Missing | Low | M–L | — | US-focused v1; international formats out of scope (model is country-agnostic). |
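
The chain-of-title row above describes ordering OwnershipEvents from first grant to present and flagging breaks. A minimal sketch of that check follows, under a loud simplification: each event has exactly one grantor and one grantee, so co-ownership and partial interests are ignored. The `OwnershipEvent` and `ChainGap` shapes here are hypothetical, since no Property model exists yet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OwnershipEvent:
    # Hypothetical shape; the planned model would also carry a Citation.
    event_date: date
    kind: str                  # e.g. "patent", "sale", "inheritance"
    grantor_id: Optional[str]  # None for an original grant from the government
    grantee_id: str


@dataclass(frozen=True)
class ChainGap:
    after_event: OwnershipEvent
    before_event: OwnershipEvent
    reason: str


def find_chain_gaps(events) -> list:
    """Flag consecutive events where the conveying party is not the prior grantee."""
    ordered = sorted(events, key=lambda e: e.event_date)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.grantor_id != prev.grantee_id:
            gaps.append(ChainGap(
                after_event=prev,
                before_event=nxt,
                reason=f"{prev.grantee_id} acquired title, "
                       f"but {nxt.grantor_id} conveyed it next",
            ))
    return gaps
```

A flagged gap is a research prompt rather than an error: it usually means an unrecorded inheritance, a missing deed, or a party recorded under a different Person.
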
2.16 AI assistant — defining differentiator
The spine has now landed: the ChangeProposal model/schema/service, its migration, the GET/POST API, and a review UI all ship, and the LLMProvider/EmbeddingProvider abstraction with null/Anthropic/OpenAI-compat (OpenAI/xAI/Ollama) providers + registry is in place. The audit substrate (actor_type=assistant, before/after JSONB) is the right foundation; the remaining work is wiring the assistant's tools to emit proposals and building the chatbot/RAG surface on top.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| ChangeProposal (propose-then-confirm) | The defining invariant. Model/schema/service (models/change_proposal.py, services/change_proposal_service.py), migration a1b2c3d4e5f6, GET/POST api/v1/proposals.py, and a /trees/[id]/proposals review UI all ship. Remaining: wire assistant tools to emit proposals. | Have | Critical | L | 4 | IS NN#1. Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). |
| Pluggable LLM + embedding provider | LLMProvider/EmbeddingProvider ABCs (integrations/models/base.py) with null, Anthropic, and OpenAI-compat (OpenAI/xAI/Ollama) implementations + registry. | Have | Critical | M | 4 | Twelve-factor, no hard-coded keys/endpoints (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. |
| Per-tree AI model policy | Owner-only per-tree model selection (Tree.ai_member_provider/ai_recommender_provider, GET/PATCH /trees/{id}/ai, /trees/[id]/ai UI) (#238). | Have | Med | S | 4 | Owner-only; selects which configured provider a tree uses — keys stay in env, twelve-factor (NN#7). |
| AI research-assistant chatbot (RAG over tree) | Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. | Planned | High | XL | 4 | NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction. |
| Conversational / connector record search | Search legal sources via the assistant. | Planned | High | L | 4 | Legal sources (NN#6); findings = Source + Citation (NN#5). |
| Fact extraction from documents | Extracted facts map cleanly to ChangeProposal review. | Missing | Med | M | 4 | Canonical NN#1 use case; each fact carries a Citation (NN#5). |
| OCR/HTR transcription + document translation | Worker job via ModelProvider. | Missing | Med | L | 4+ | Output → Source/Citation (NN#5); via ModelProvider (NN#7); auto-extraction emits ChangeProposal (NN#1). |
| Next-step research guidance | Gap analysis → suggested next record. | Planned | Med | M | 4 | Reads via privacy engine; advisory unless it queues fetches. |
| AI biography / audio narration | Read-only generation grounded in tree data. | Missing | Low | M–L | later | Must not leak living-person PII (NN#3); via ModelProvider (NN#7); stored biographies = full CRUD (NN#8). |
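
The ChangeProposal row above says the invariant should be enforced structurally: assistant tools return proposals, and only a user action applies one through the normal service layer. The sketch below illustrates that split with hypothetical names throughout; this dataclass is not the shipped models/change_proposal.py, and `apply_via_service` is a placeholder for whatever service call performs the real write with privacy checks and audit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ProposalStatus(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class ChangeProposal:
    # Illustrative fields only.
    tree_id: str
    target_type: str
    payload: dict
    proposed_by: str = "assistant"
    status: ProposalStatus = ProposalStatus.PENDING


def assistant_suggest_birth_year(tree_id: str, person_id: str, year: int) -> ChangeProposal:
    """Assistant tool: builds a proposal and touches no tree data."""
    return ChangeProposal(tree_id, "person",
                          {"person_id": person_id, "birth_year": year})


def apply_proposal(proposal: ChangeProposal, reviewer_id: str,
                   apply_via_service: Callable[[dict, str], None]) -> ChangeProposal:
    """Human-triggered step: the write goes through the supplied service callable."""
    if proposal.status is not ProposalStatus.PENDING:
        raise ValueError("proposal already resolved")
    apply_via_service(proposal.payload, reviewer_id)  # reviewer is the acting user
    proposal.status = ProposalStatus.APPLIED
    return proposal
```

Because the assistant-facing function has no access to a writer, there is no code path by which it can mutate the tree; the only writer is handed in at review time, attributed to the reviewing user.
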
2.17 Localization & accessibility
A documented day-one commitment ("UI strings externalized from day one") that is currently unmet — every label is a hardcoded literal. Correct the PRD claim or close the gap.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| UI string externalization | No i18n lib, no message catalogs; all copy hardcoded in TSX. Gating prerequisite; cheapest to do now while the surface is small. | Missing | High | L | 1–2 | PRD §6 promises this "from day one" — docs-vs-code gap; edit the doc now. |
| Multi-language UI (40–60+ langs) | Translation pipeline after externalization (frontend + backend-generated messages). | Missing | High | XL | later | Table stakes across all competitors. |
| Accessibility / WCAG 2.2 AA | Some ARIA/focus styling; no CI a11y audit, no skip-links, SVG tree viz not keyboard/screen-reader navigable. | Partial | High | L | 2/9 | Stated PRD §6 target; add axe/pa11y in CI; accessible alternate to the chart. |
| Unicode-correct non-Latin names | Stores fine (UTF-8); no NFC normalization on write, no locale-aware collation, no romanized search. | Partial | High | M | 2 | Apply unicodedata.normalize('NFC') on input; add COLLATE; supports faithful-record goal. |
| Structured/compound surname components | Surname is a single field; no support for Spanish/Portuguese paternal+maternal, Arabic nasab, particles/prefixes. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8); preserves the name as recorded. |
| Non-Gregorian calendar dates | calendar column is a placeholder; GEDCOM calendar escapes never parsed/populated. | Partial | Med | L | 2 | Preserve original calendar as recorded (sources-first). |
| Language tags / romanized variants per name | No language_tag/script/romanized fields; GEDCOM ROMN/LANG unhandled. | Missing | Med | M | 2 | New Name sub-fields ship with full CRUD (NN#8). |
| RTL support | lang="en" hardcoded, no dir, physical CSS properties throughout. | Missing | Med | M | later | Convert to logical CSS properties; cheaper once i18n exists. |
| Selectable themes | Light/dark/system works; brand palette intentionally single. | Partial | Med | M | later | Confirm whether additional themes are a deliberate non-goal (brand guide constrains palette). |
| Multi-language report/diagram output | Depends on i18n + reports, neither shipped. | Missing | Low | L | later | — |
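
The Unicode row above recommends applying `unicodedata.normalize('NFC')` on input. A minimal sketch of that write-path step follows, together with an accent-insensitive lookup key; the key is an added assumption about how search might use it, and both function names are hypothetical.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Store names in NFC so visually identical strings compare equal."""
    return unicodedata.normalize("NFC", raw)


def search_key(name: str) -> str:
    """Assumed helper: case- and accent-insensitive key for lookups only.

    The stored name is left untouched; this key is derived from it.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

NFC changes only the byte representation, not what the name looks like, so it is compatible with preserving the name as recorded. The search key is lossy by design and should never be written back over the original.
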
3. Quick wins (high importance / low effort)
Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap.
- Fix site_members visibility tier (Privacy, Critical/S) — done: can_view_tree now handles site_members (privacy.py:56), giving any authenticated account a read view while refusing anonymous.
- Email-verification enforcement gate (Privacy/Auth, High/S) — done (#53): the read-side email_verified_at check now ships behind REQUIRE_EMAIL_VERIFICATION, so a freshly registered, unverified user doesn't get a live authenticated session. The registration-mode env switch (open/approve/closed) is the larger follow-on (§2.10, M-effort — not a quick win).
- Citation confidence selector in the cite form (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis.
- Source edit UI + expose all 8 fields (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8.
- Make LIVING_RECENCY_YEARS env-configurable (Privacy, High/S) — hardcoded 100 at privacy.py:23; twelve-factor (NN#7).
- Add ownership_event_id to Citation (Property/Sources, Critical/S) — single FK + CHECK-constraint edit the moment Property lands; the spine is already built (NN#5).
- GEDCOM encoding detection (Standards, High/S) — detect/honor the CHAR tag; reject or transcode ANSEL/UTF-16 rather than silently mangling with errors='replace'.
- GEDCOM HEAD completeness (Standards, Med/S) — emit the required 2 FORM LINEAGE-LINKED (under GEDC) and 1 SUBM at gedcom.py:740. Pure conformance.
- ARM64 CI build matrix (Perf/Scale, High/S) — add linux/arm64 + QEMU to buildx for both images; many self-hosters run ARM SBCs.
- GET /{tree}/citations/{id} endpoint (Sources, Med/S) — API symmetry (NN#8).
- Transcription/abstract fields on Source (Sources, Med/S) — add transcription_text + abstract_text, distinct from citation_text; core to evidence analysis.
- Sort the merged person timeline (Research workflow, Med/S) — shownEvents.sort() on date_start; currently appended unsorted.
- Doc corrections (docs-vs-code) (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim and the i18n "from day one" claim match reality. The repo convention requires docs to travel with code.

Shipped this cycle: the media privacy leak (§2.4) and the child-resource redaction gap (§2.10) are fully closed — person/event/media/name/relationship (#46) and citation/source endpoints all apply person_visibility for non-members. No residual living-person leak on the read surface.
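
The non-member rule for citations and sources described above (a citation is shown only when its cited fact resolves to FULL-visibility person(s); a source is shown only when it backs a visible citation, and otherwise looks like it does not exist) can be sketched as below. This is an illustrative model, not the code in public_view_service: the `Citation` shape, the pre-resolved `person_ids` tuple, and all function names are assumptions.

```python
from dataclasses import dataclass


class SourceNotFound(LookupError):
    """Raised for both missing and withheld sources, so the two are indistinguishable."""


@dataclass(frozen=True)
class Citation:
    id: str
    source_id: str
    # Persons the cited fact resolves to: one person, or both partners
    # for a relationship or partner event. Assumed to be resolved upstream.
    person_ids: tuple


def visible_citations(citations, full_visibility_ids: set) -> list:
    """Keep a citation only if every person behind the fact is FULL-visibility."""
    return [
        c for c in citations
        if c.person_ids and all(pid in full_visibility_ids for pid in c.person_ids)
    ]


def visible_source_ids(citations, full_visibility_ids: set) -> set:
    """A source is visible only when it backs at least one visible citation."""
    return {c.source_id for c in visible_citations(citations, full_visibility_ids)}


def get_source_for_non_member(source_id: str, citations, full_visibility_ids: set) -> str:
    if source_id not in visible_source_ids(citations, full_visibility_ids):
        raise SourceNotFound(source_id)  # an API layer would map this to 404
    return source_id
```

Requiring all linked persons to be visible, rather than any, is what keeps a two-partner fact from exposing a living spouse through a deceased one.
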
4. Strategic differentiators
Where to invest to make Provenance distinct rather than a webtrees clone. Each leans on a non-negotiable as a feature, not a constraint.
1. Property chain-of-title (the "land" half). No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by legal public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category.
2. The ChangeProposal AI model. "The assistant never writes autonomously" is a trust differentiator in a market where users fear AI corrupting their research. The structural spine has landed — the ChangeProposal model/API/review UI and the pluggable LLMProvider/EmbeddingProvider abstraction both ship — so the remaining work is wiring the assistant's tools to emit proposals (never mutating directly). Assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together.
3. Anonymous, mutual-consent cross-tree hints. The privacy model already redacts living people for anonymous viewers, so a hint system that reveals nothing identifying until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent.
4. True self-hosting + data ownership. Full account export/import, soft-delete recovery (with owner-confirmed on-demand purge to delete a trashed tree immediately rather than waiting out the 30-day window), GEDCOM round-trip, env-driven everything, a one-command operator backup, and (to-build) scheduled off-host backup + ARM support make Provenance the genealogy app you actually own. The two correctness items that gated the promise have landed: GEDCOM export now preserves citations (the Provenance→Provenance round-trip keeps the sources graph), and operator backup moved from "documented procedure" to a one-command dump (deploy/backup.sh). What remains is scheduled/verified-restore tooling and ARM builds. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.
5. Sources-first as a felt experience. The two-tier model is built, and citations now survive GEDCOM export (#232); the remaining differentiator is making sourcing visible and low-friction: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), and per-fact confidence surfaced in the UI. These turn "every fact links to where it came from" from an architecture note into the product's personality.