docs: bring all documentation current with shipped work

A multi-agent audit of every doc against the code surfaced ~50 stale/missing
items (the roadmap/status docs and the backlog had fallen behind the code).
This catches them up:

- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 +
  chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider
  tech-stack entry; updated repo-layout (integrations objectstore/models,
  deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels — corrected to
  the shipped 4 (adds site_members); documented per-tree AI policy on Tree,
  LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/
  operations, verified-email session gate, instance-owner role, schema-drift
  guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11),
  per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header
  date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common
  subset; noted backups + instance-owner admin; moved property/land to an
  explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider
  abstraction, GEDCOM citation export, membership management, operator backup,
  email-verification gate, per-tree AI policy, instance owner, the whole
  visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and
  reconciled the executive summary, "current defects" list, quick wins, and
  differentiators. Left genuinely-open items (citation/source redaction, sitemap,
  per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker
  purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected
  the redaction approach (reuses member schemas, not a separate PublicPersonRead)
  and the apply() rollback claim (v1 is not cross-op transactional), and marked
  rate-limiting/sitemap/noindex as deferred.

No code changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-10 21:05:29 -04:00
parent 0388b9b99f
commit 447daf7fa8
8 changed files with 135 additions and 96 deletions
+15 -9
@@ -1,8 +1,8 @@
# Provenance — Product Requirements Document
**Status:** Draft v0.1
**Status:** Draft v0.1 — now describes a partially-implemented system: Phase 0 complete, Phase 1 done, with early slices of later phases shipped.
**Owner:** Justin Paul
**Last updated:** 2026-06-06
**Last updated:** 2026-06-10
---
@@ -94,7 +94,7 @@ Acceptance criteria (AC) are written to be testable.
- **US-033** I view every property a person held, and every parcel ever recorded at a place. *AC:* both reverse lookups return correct sets.
### Privacy & sharing
- **US-040** I set a tree to public, unlisted, or private. *AC:* visibility enforced for anonymous and non-owner users.
- **US-040** I set a tree to one of four visibility levels — private, unlisted, site_members, or public. *AC:* visibility enforced for anonymous and non-owner users; at the **site_members** level the tree is visible to any authenticated instance user (signed in but not a member of the tree) and hidden from anonymous visitors.
- **US-041** I mark any individual private even within a public tree. *AC:* that person's details hidden from non-owners regardless of tree setting.
- **US-042** Living people are hidden from non-owners by default. *AC:* a person with no death fact and a plausibly-living birth date shows only minimal/no PII to non-owners; owner can override per person.
- **US-043** I add a co-owner to a tree. *AC:* co-owner can edit per role; action attributed to them in the audit log.
@@ -132,6 +132,7 @@ Acceptance criteria (AC) are written to be testable.
### 5.1 Identity & access
- Pluggable authentication: local password (with email verification and reset), social sign-in (Google, Apple, Facebook), and generic **OIDC** (validated against Authentik; should work with Keycloak, Authentik, Auth0, etc.). Operators enable any subset.
- Roles per tree: **owner**, **co-owner/editor**, **viewer**. Public/unlisted trees also have an implicit anonymous viewer.
- **Instance owner/operator:** an env-declared operator role (via `OWNER_EMAIL`, requiring a verified email), distinct from the per-tree roles. It is an operations/config role only and is **not** a privacy bypass — it grants no access to others' tree data or PII.
- The AI assistant acts as a distinct, scoped principal bound to the user it is helping — it can never exceed that user's rights, and its actions are separately attributable.
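The instance-owner rule above can be sketched as a single predicate. This is a minimal illustration, not the shipped implementation: only the `OWNER_EMAIL` variable and the verified-email requirement come from this document. The `User` record, the `is_instance_owner` name, and the case-insensitive comparison are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical user record; field names are assumptions."""
    email: str
    email_verified: bool


def is_instance_owner(user: User, owner_email: Optional[str] = None) -> bool:
    """Return True only for the env-declared operator with a verified email.

    `owner_email` may be passed explicitly (handy in tests); otherwise the
    value is read from the OWNER_EMAIL environment variable.
    """
    configured = owner_email if owner_email is not None else os.environ.get("OWNER_EMAIL", "")
    configured = configured.strip().lower()
    if not configured:
        # No operator declared: nobody holds the role.
        return False
    # Assumption: addresses are compared case-insensitively.
    return user.email_verified and user.email.strip().lower() == configured
```

Because the predicate only answers "is this the operator?", it would gate configuration and status surfaces and would never be consulted when deciding who may see tree data or PII.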
### 5.2 Data model (core entities)
@@ -155,6 +156,7 @@ Acceptance criteria (AC) are written to be testable.
### 5.5 Privacy engine
- Effective visibility = function(tree visibility, person override, living status, viewer role).
- Tree visibility has four levels: **private** (members only; default), **unlisted** (anyone with the link, not listed/indexed), **site_members** (any authenticated instance user), and **public** (anonymous + listed/indexable).
- Living-person rule: absent a death fact and within a configurable recency window (default ~100 years from birth, or unknown birth treated as possibly-living), non-owners see minimal or no PII.
- Public/link views must render through the same privacy engine — no bypass path.
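The four visibility levels and the living-person rule above can be sketched as small pure functions. This is an illustrative sketch under stated assumptions, not the project's actual privacy engine: the level names and the default window of about 100 years come from this section, while `Viewer`, `can_view_tree`, `is_listed`, and `possibly_living` are hypothetical names. The per-person override and the owner override are left out.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class TreeVisibility(str, Enum):
    PRIVATE = "private"
    UNLISTED = "unlisted"
    SITE_MEMBERS = "site_members"
    PUBLIC = "public"


@dataclass
class Viewer:
    """Hypothetical viewer context; anonymous by default."""
    authenticated: bool = False
    is_tree_member: bool = False


def can_view_tree(visibility: TreeVisibility, viewer: Viewer) -> bool:
    """Tree-level check only; person-level rules still apply afterwards."""
    if viewer.is_tree_member:
        return True
    if visibility is TreeVisibility.SITE_MEMBERS:
        return viewer.authenticated
    # Unlisted trees are viewable by anyone who reaches the link;
    # public trees are viewable by anyone. Private trees are not.
    return visibility in (TreeVisibility.UNLISTED, TreeVisibility.PUBLIC)


def is_listed(visibility: TreeVisibility) -> bool:
    """Only public trees are listed or indexable."""
    return visibility is TreeVisibility.PUBLIC


def possibly_living(birth: Optional[date], death_recorded: bool,
                    today: date, window_years: int = 100) -> bool:
    """Living-person rule: no death fact and born within the window."""
    if death_recorded:
        return False
    if birth is None:
        # Unknown birth is treated as possibly living.
        return True
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    return age < window_years
```

Keeping these checks as pure functions of (visibility, viewer, person facts) is one way to let public and link views call exactly the same code path as member views.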
@@ -168,6 +170,7 @@ Acceptance criteria (AC) are written to be testable.
### 5.8 AI research assistant
- Provider-agnostic abstraction over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models (e.g., an OpenAI-compatible endpoint or Ollama).
- Operators register one or more model providers (env / registry); a tree owner then selects the active provider(s) for that tree via an owner-only AI settings surface.
- Tool-mediated access to the same CRUD operations a user has, scoped to that user, via a server with explicitly scoped capabilities (an MCP-style tool boundary).
- **Propose-then-confirm is mandatory.** The assistant drafts changes as diffs; nothing persists without explicit user approval.
- Source connectors are a **plugin framework**; the project ships only legal sources (e.g., FamilySearch API, Find A Grave, WikiTree, BLM/GLO land patents, USGS maps, public-domain newspapers, public county records). Operator-supplied scrapers can be added later.
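The propose-then-confirm rule above can be illustrated with a toy state machine. This is a sketch under stated assumptions and deliberately does not mirror the real `ChangeProposal` model: `DraftProposal`, `ProposalStatus`, `approve`, and `apply_proposal` are hypothetical names, and the "store" is a plain dictionary standing in for persistence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class ProposalStatus(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DraftProposal:
    """Hypothetical draft: a list of field-level operations awaiting review."""
    operations: List[Dict[str, Any]]
    status: ProposalStatus = ProposalStatus.PENDING


def approve(proposal: DraftProposal) -> None:
    """Record the user's explicit approval of a pending draft."""
    if proposal.status is not ProposalStatus.PENDING:
        raise ValueError("only a pending proposal can be approved")
    proposal.status = ProposalStatus.APPROVED


def apply_proposal(proposal: DraftProposal, store: Dict[str, Any]) -> None:
    """Persist the operations, but only after approval."""
    if proposal.status is not ProposalStatus.APPROVED:
        raise PermissionError("proposal has not been approved by the user")
    for op in proposal.operations:
        # Operations are applied one by one; this sketch has no rollback.
        store[op["field"]] = op["value"]
```

The point of the shape is that the write path checks the proposal's status itself, so an assistant that drafts a change has no way to persist it without the approval step having run.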
@@ -181,6 +184,7 @@ Acceptance criteria (AC) are written to be testable.
### 5.11 Administration & operations
- All integration points (auth, SMTP, object storage, database, model providers, scrapers) are environment/config-driven.
- Health endpoints; structured logs; a documented backup/restore procedure; safe upgrade via image pull + migration.
- Owner-only operator surface: instance status and configuration (`GET /api/v1/admin/instance` and the `/admin` UI), scoped to the instance owner and exposing no tree contents or PII.
## 6. Non-functional requirements
@@ -206,17 +210,19 @@ Acceptance criteria (AC) are written to be testable.
Provenance ships continuously and is stood up in a live lab as it goes; there is no hard MVP/v2 line, but features land in dependency order so each tranche is usable.
- **Phase 0 — Foundation:** backend + DB schema; local auth + email verify; frontend scaffold; container images; CI/CD (Gitea Actions → Gitea registry → server pull); one-command compose deploy.
- **Phase 1 — Core tree:** people, relationships, events; sources & citations; media uploads; soft delete + recovery; tree-level privacy.
- **Phase 2 — Standards & polish:** GEDCOM 7 import/export; search with fuzzy names; living-person protection; person-level privacy override; onboarding + persona selector.
- **Phase 0 — Foundation:** *(shipped)* backend + DB schema; local auth + email verify; frontend scaffold; container images; CI/CD (Gitea Actions → Gitea registry → server pull); one-command compose deploy.
- **Phase 1 — Core tree:** *(shipped)* people, relationships, events; sources & citations; media uploads; soft delete + recovery; tree-level privacy (now four levels: private/unlisted/site_members/public).
- **Phase 2 — Standards & polish:** *(partly shipped — GEDCOM 7 import/export #232; fuzzy/trigram search)* GEDCOM 7 import/export; search with fuzzy names; living-person protection; person-level privacy override; onboarding + persona selector.
- **Phase 3 — Property:** property entity; ownership events; chain-of-title view; property-aware sources.
- **Phase 4 — AI assistant:** provider abstraction (hosted + local); scraper plugin framework; first connectors (FamilySearch, Find A Grave); propose-diff approval flow; assistant actions in audit log.
- **Phase 5 — Federated auth:** OIDC (Authentik), then Google/Apple/Facebook sign-in.
- **Phase 6 — Collaboration:** tree co-owners; audit-log UI; direct messaging; notifications.
- **Phase 4 — AI assistant:** *(partly shipped early — provider abstraction + multi-provider registry #235/#237; ChangeProposal propose-then-confirm #236)* provider abstraction (hosted + local); scraper plugin framework; first connectors (FamilySearch, Find A Grave); propose-diff approval flow; assistant actions in audit log.
- **Phase 5 — Federated auth:** *(not shipped — only the `AuthProvider` ABC exists)* OIDC (Authentik), then Google/Apple/Facebook sign-in.
- **Phase 6 — Collaboration:** *(tree membership #233 landed early)* tree co-owners; audit-log UI; direct messaging; notifications.
- **Phase 7 — Cross-tree hints:** async matching engine (embeddings-assisted); anonymous match notifications; mutual-consent reveal.
- **Phase 8 — Land sources:** BLM/GLO patents; USGS map integration; additional county-deed connectors (merge existing scrapers).
- **Phase 9 — Hardening & dogfooding** toward a possible hosted offering.
**Shipped ahead of sequence (operations & platform):** instance-owner/operator role (#240); operator backup tooling (#234); a schema-drift guard (#239). These landed early because the live lab deployment needed them. Although these items carry later issue numbers, **Phase 5 federated auth/OIDC is not yet shipped**; only the `AuthProvider` ABC is in place.
Rationale: enabling work (schema, auth, deploy, sources) precedes everything; GEDCOM lands before the assistant so AI writes target a stable model; property follows a well-tested people graph; hints come late because they require multiple populated trees.
## 9. Technical direction (summary)