docs: bring all documentation current with shipped work #244
@@ -30,6 +30,7 @@ These are product invariants, not preferences. Do not violate them, and flag any
|
||||
- **Object storage:** S3-compatible (MinIO for self-host).
|
||||
- **Edge:** Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
|
||||
- **Email:** operator-configured SMTP.
|
||||
- **Model providers:** pluggable `LLMProvider` + `EmbeddingProvider` abstraction (ABCs) with Null / Anthropic / OpenAI-compatible (OpenAI, xAI, Ollama) implementations; an operator configures one or more via env and they're selectable by name through a registry (per-tree AI policy + `default_llm_provider`/`default_embedding_provider`).
|
||||
- **CI/CD:** Gitea Actions build per-component images. **Push** to the LAN registry `192.168.0.2:1234` (plain HTTP, bypasses Cloudflare's body limit); **pull** via the public `git.jpaul.io` FQDN. Servers pull to deploy — no host build. Mirrors the drawbar setup; see [[gitea-lan-push-fqdn-pull]].
|
||||
|
||||
Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.
|
||||
@@ -39,17 +40,24 @@ Pick libraries consistent with this stack. If you introduce a significant depend
|
||||
```
|
||||
/ # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
|
||||
/docs # PRD.md, ARCHITECTURE.md
|
||||
/backend # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth/mailer), core}; migrations/ = Alembic
|
||||
/deploy # docker-compose.yml, Caddyfile, .env.example — the self-host stack
|
||||
/backend # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth, mailer, objectstore, models = pluggable LLM/embedding providers), core}; migrations/ = Alembic
|
||||
/deploy # docker-compose.yml (+ docker-compose.dev.yml), Caddyfile, .env.example, backup.sh + BACKUP.md (one-command pg_dump + MinIO backup) — the self-host stack
|
||||
/.gitea/workflows # Gitea Actions CI (build images → Gitea registry)
|
||||
/frontend # Next.js (App Router, TS, Tailwind, shadcn-style UI). app/ pages, lib/api generated OpenAPI client, components/ui
|
||||
```
|
||||
|
||||
Phase 0 is landing **deploy-first**: the compose stack (Postgres + MinIO + Caddy + a minimal FastAPI backend exposing `/health` and `/health/ready`) and CI come before the real data model and the frontend. Backend dependencies are managed with **uv**; migrations use **Alembic**. The core data model (ARCHITECTURE §5), **local auth** (Argon2 passwords, backend-issued sessions, email verify/reset behind the `AuthProvider` interface; API auth via Bearer header or HttpOnly cookie), and the **Next.js frontend scaffold** (Tailwind + shadcn-style UI, generated OpenAPI client, auth + tree/person views) have all landed — **Phase 0 is complete and running on the live deployment.** Phase 1 (core tree features — media, soft-delete recovery, richer CRUD) is next; OIDC/social auth is Phase 5. Keep this section current as the tree grows.
|
||||
Phase 0 landed **deploy-first**: the compose stack (Postgres + MinIO + Caddy + FastAPI backend) and CI before the data model and frontend. Backend deps use **uv**; migrations use **Alembic**. Status (keep current as the tree grows):
|
||||
|
||||
- **Phase 0 — Foundation: complete** and running live (core data model, local auth behind `AuthProvider`, Next.js frontend).
|
||||
- **Phase 1 — Core tree: complete.** Media (upload/serve), soft-delete + recovery UI, full CRUD across entities, and the 4-level tree visibility/privacy model (#41–#51).
|
||||
- **Phase 2 — substantially landed.** GEDCOM import (preview→apply, duplicate-aware) and export (citation-preserving, #232); fuzzy name search (pg_trgm) + the public `/explore` directory. Living-person protection is still hardening.
|
||||
- **Phase 4 — AI assistant foundations landed.** Pluggable `LLMProvider`/`EmbeddingProvider` abstraction + multi-provider registry (Anthropic/OpenAI/xAI/Ollama, #235/#237), the **ChangeProposal** propose-then-confirm flow (#236), and per-tree AI model policy (#238). The assistant's *tool surface that emits proposals* is the remaining piece.
|
||||
- Also shipped: tree membership management (#233), an **instance owner/operator** role (`OWNER_EMAIL`, #240), a schema-drift readiness guard (#239), and a one-command operator backup (#234).
|
||||
- **Not built yet:** Phase 3 (Property — parcels/deeds/chain-of-title; no property models exist), Phase 5 (OIDC/social auth — only the `AuthProvider` ABC exists), and cross-tree hints (last; needs multiple populated trees + the embedding provider).
|
||||
|
||||
## Where to start
|
||||
|
||||
The roadmap is phased in PRD §8. Build in dependency order. **Phase 0 — Foundation is complete** and running on the live deployment; **Phase 1 (core tree features) is the current target.** For reference, Phase 0 covered:
|
||||
The roadmap is phased in PRD §8. Build in dependency order. **Phases 0 and 1 are complete**, Phase 2 is substantially done, and Phase 4's AI foundations have shipped (see the status list above). The biggest unbuilt areas are **Phase 3 (Property)** and **Phase 5 (OIDC/social auth)** — likely current targets. For reference, Phase 0 covered:
|
||||
|
||||
1. Backend skeleton (FastAPI, async, layered) + Postgres + migrations
|
||||
2. Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
|
||||
@@ -58,7 +66,7 @@ The roadmap is phased in PRD §8. Build in dependency order. **Phase 0 — Found
|
||||
5. The deploy stack: `compose` for app + postgres + objectstore, Caddy config, env-driven settings
|
||||
6. CI/CD: Gitea Actions building images to the registry
|
||||
|
||||
Don't get ahead of the phases. GEDCOM lands before the assistant (so AI writes target a stable model); property follows a tested people graph; hints come last because they need multiple populated trees. If you think the order is wrong, raise it rather than reordering silently.
|
||||
Don't get ahead of the phases. GEDCOM and the assistant's propose-diff foundation (provider abstraction + ChangeProposal approval flow) have shipped; the remaining dependency-ordered work is **Property** (Phase 3, on top of the tested people graph), then richer collaboration/audit UI, with **cross-tree hints last** (they need multiple populated trees and the embedding provider). If you think the order is wrong, raise it rather than reordering silently.
|
||||
|
||||
## Conventions
|
||||
|
||||
|
||||
@@ -19,13 +19,14 @@ Every fact links to its source. Every claim can be traced. Nothing is just asser
|
||||
## What it does
|
||||
|
||||
- **Build a tree that holds up.** People, relationships, events, and places — with every fact linked to the document, photo, or record it came from.
|
||||
- **Trace the land, not just the family.** Properties are first-class. Record ownership events (grants, deeds, inheritances, sales), reconstruct chain-of-title, and tie parcels to the people who held them.
|
||||
- **Bring your own archive.** Scans, PDFs, photos, audio recordings — first-class citizens, not afterthoughts.
|
||||
- **A research assistant that proposes, never overwrites.** The built-in AI assistant searches legal sources, lays out what it found, and waits for your approval before anything touches your data. You can point it at the major model providers or a self-hosted model — your keys, your choice.
|
||||
- **Standards over silos.** Full GEDCOM 7 import and export. Migrate in, migrate out.
|
||||
- **Privacy you control.** Public, unlisted, or private per tree; any individual can be hidden; living people are protected by default.
|
||||
- **Standards over silos.** GEDCOM import and export (5.5.1 / 7 common subset) — duplicate-aware import, citation-preserving export. Migrate in, migrate out.
|
||||
- **Privacy you control.** Public, members-only (any signed-in user on your instance), unlisted, or private per tree; any individual can be hidden; living people are protected by default.
|
||||
- **Find your people.** When another user's tree overlaps with yours, Provenance can surface an anonymous "possible match" — and only connects you if you both say yes.
|
||||
- **Run it your way.** Container-native. Self-host behind Caddy and, if you like, a Cloudflare Tunnel. Multi-tenant, so your whole extended family — or a whole community of strangers — can coexist on one deployment.
|
||||
- **Run it your way.** Container-native. Self-host behind Caddy and, if you like, a Cloudflare Tunnel. Multi-tenant, so your whole extended family — or a whole community of strangers — can coexist on one deployment. One-command backups (Postgres + object storage) and an instance-owner admin role keep operations in your hands.
|
||||
|
||||
**Where it's headed — trace the land, not just the family.** The same source-backed treatment for *property*: parcels, deeds, and ownership events, reconstructing chain-of-title and tying land to the people who held it. The people side ships today; the land half is on the roadmap, not yet built — but it's why Provenance exists, not an afterthought.
|
||||
|
||||
## Who it's for
|
||||
|
||||
|
||||
+12
-2
@@ -35,6 +35,8 @@ S3_BUCKET=provenance
|
||||
S3_ACCESS_KEY=provenance
|
||||
S3_SECRET_KEY=change-me-too
|
||||
S3_REGION=us-east-1
|
||||
# Presigned media URL lifetime in seconds.
|
||||
S3_PRESIGN_TTL=3600
|
||||
|
||||
# --- Edge (Caddy) ---
|
||||
# Local: ':80' (http://localhost). Production: 'provenance.example.com' for auto-HTTPS.
|
||||
@@ -52,6 +54,8 @@ COMPOSE_PROFILES=
|
||||
# --- Auth / sessions ---
|
||||
SESSION_TTL_DAYS=30
|
||||
TOKEN_TTL_HOURS=24
|
||||
# Name of the session cookie.
|
||||
COOKIE_NAME=provenance_session
|
||||
# Set false for local http; true (default) behind TLS.
|
||||
COOKIE_SECURE=false
|
||||
# Base URL used to build links in outbound email.
|
||||
@@ -62,13 +66,20 @@ MAILER=console
|
||||
# until SMTP works and existing accounts are verified, or you will lock users out.
|
||||
REQUIRE_EMAIL_VERIFICATION=false
|
||||
|
||||
# --- Email (SMTP) — wired in a later phase ---
|
||||
# --- Email (SMTP) ---
|
||||
# Active when MAILER=smtp (above) and SMTP_HOST is set.
|
||||
SMTP_HOST=
|
||||
SMTP_PORT=587
|
||||
SMTP_USERNAME=
|
||||
SMTP_PASSWORD=
|
||||
SMTP_FROM=
|
||||
|
||||
# --- Worker (soft-delete purge) ---
|
||||
# How often the purge job runs, and how old a soft-deleted row must be before it
|
||||
# is permanently removed (and its media objects cleaned up).
|
||||
PURGE_INTERVAL_SECONDS=3600
|
||||
PURGE_AFTER_DAYS=30
|
||||
|
||||
# --- Model providers (AI assistant + embeddings) -----------------------------
|
||||
# Configure as many as you like — each turns on when its key is set. The
|
||||
# default_* vars pick which one is used by default; the app can also select any
|
||||
@@ -99,4 +110,3 @@ OLLAMA_ENABLED=false
|
||||
OLLAMA_BASE_URL=http://localhost:11434/v1
|
||||
OLLAMA_MODEL=llama3.1
|
||||
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
||||
# XAI_API_KEY=
|
||||
|
||||
+15
-13
@@ -69,7 +69,7 @@ Layered, dependency pointing inward:
|
||||
- **Service layer** — all domain logic and the only place writes happen. Enforces invariants (e.g., "a write must carry an actor for the audit log"). The privacy engine is invoked here on every read.
|
||||
- **Repository layer** — data access over SQLAlchemy; no business rules.
|
||||
- **Domain models** — the entities in §5.
|
||||
- **Integrations** — adapters behind interfaces: `AuthProvider`, `ObjectStore`, `Mailer`, `ModelProvider`, `SourceConnector`, `Queue`. Swapping an implementation is a config change, not a code change.
|
||||
- **Integrations** — adapters behind interfaces: `AuthProvider`, `ObjectStore`, `Mailer`, `LLMProvider` / `EmbeddingProvider` (two separate model abstractions), `SourceConnector`, `Queue`. Swapping an implementation is a config change, not a code change.
|
||||
|
||||
Async throughout (FastAPI + async SQLAlchemy). Anything that can be slow or can fail externally (model calls, scraping, large imports) goes to the worker, never inline in a request.
|
||||
|
||||
@@ -87,7 +87,7 @@ Core entities and the important relationships. (Illustrative, not final DDL.)
|
||||
|
||||
### Tenancy & identity
|
||||
- **User** — a person with login. Auth method(s) are attached but identity is internal, so one user can link multiple providers.
|
||||
- **Tree** — the top-level tenant boundary for genealogical data. Owned by a User; may have additional members.
|
||||
- **Tree** — the top-level tenant boundary for genealogical data. Owned by a User; may have additional members. Carries a per-tree **AI model policy** (owner-configured): `ai_member_provider` and `ai_recommender_provider` name configured providers from the model-provider registry (null = no model for that role); the owner may use any configured provider, while these cap what members and the recommender may use. Set via the owner-only `GET`/`PATCH /trees/{id}/ai`.
|
||||
- **TreeMembership** — (User, Tree, role) where role ∈ {owner, editor, viewer}. The basis for authorization *within a tree*.
|
||||
- **Instance owner / operator** — orthogonal to tree roles. The account(s) whose email is named in the `OWNER_EMAIL` env var **and whose email is verified** are the instance's operator(s), with access to the owner-only `/api/v1/admin` surface (operational status, instance-wide config). Derived from the env at request time — no DB column, no migration, can't drift, survives DB resets. The verified-email requirement is deliberate: registration is open, so without it whoever registers the owner address first would seize the role — verification ties ownership to proven control of the inbox. Crucially this is **not** a privacy bypass: an instance owner gets operational/config rights, **not** read access to other users' private trees or living-person PII — those still resolve only through the privacy engine. (`is_instance_owner` in `api/deps.py`.)
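The instance-owner rule described above can be sketched as a pure function. This is only an illustration under stated assumptions: the `User` shape is hypothetical, the comma-separated parsing of `OWNER_EMAIL` is a guess (the text only says "account(s)"), and the real `is_instance_owner` in `api/deps.py` may look different.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass(frozen=True)
class User:
    # Hypothetical minimal user shape, for illustration only.
    email: str
    email_verified_at: Optional[datetime] = None


def owner_emails(env: Mapping[str, str]) -> frozenset[str]:
    # Assumption: OWNER_EMAIL may list several addresses separated by commas.
    raw = env.get("OWNER_EMAIL", "")
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def is_instance_owner(user: User, env: Mapping[str, str]) -> bool:
    # Derived from the environment on every call: nothing is stored in the DB.
    if user.email_verified_at is None:
        return False  # an unverified address never holds the role
    return user.email.strip().lower() in owner_emails(env)
```

Because the check reads only the environment and the verified flag, registering the owner address without proving control of the inbox grants nothing.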
|
||||
|
||||
@@ -109,7 +109,7 @@ Core entities and the important relationships. (Illustrative, not final DDL.)
|
||||
### Cross-cutting
|
||||
- **AuditEntry** — append-only: actor (User *or* the assistant principal acting for a User), action, entity, before/after snapshot, timestamp. Immutable.
|
||||
- **SoftDelete** — entities carry `deleted_at`; a scheduled worker purges rows older than 30 days. Recovery = clearing `deleted_at` within the window.
|
||||
- **ChangeProposal** — a pending set of writes generated by the assistant (or potentially a collaborator suggestion later): a structured diff the user approves, edits, or rejects. Approved proposals are applied through the normal service layer (so they hit the privacy engine and the audit log like any other write).
|
||||
- **ChangeProposal** — a pending set of writes: records an `origin` (`assistant` | `contributor` — collaborator suggestions are encoded today, not just a future idea), a `status` (pending/applied/rejected), a structured `operations` diff (JSONB list of `{op, entity_type, entity_id?, payload}`), a summary/rationale, and review/apply-error metadata. The user approves, edits, or rejects; approved proposals are applied through the normal service layer (so they hit the privacy engine and audit log like any other write). *Note: v1 apply is not cross-op transactional — see `docs/design/change-proposal.md`.*
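To make the `operations` shape and the "not cross-op transactional" note concrete, here is a minimal in-memory sketch. The field names follow the description above; the `write` callback (a stand-in for the service layer), the example `op` values, and the choice to leave a failed proposal in `pending` are assumptions, not the documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Operation:
    op: str                      # e.g. "create" / "update" (assumed values)
    entity_type: str
    payload: dict[str, Any]
    entity_id: Optional[str] = None


@dataclass
class ChangeProposal:
    origin: str                  # "assistant" | "contributor"
    operations: list[Operation]
    summary: str = ""
    status: str = "pending"      # pending | applied | rejected
    apply_error: Optional[str] = None


def apply_proposal(proposal: ChangeProposal, write: Callable[[Operation], None]) -> int:
    """Apply each operation through `write`; return how many were applied.

    Not cross-op transactional: operations applied before a failure stay applied.
    """
    if proposal.status != "pending":
        raise ValueError(f"cannot apply a {proposal.status} proposal")
    applied = 0
    for operation in proposal.operations:
        try:
            write(operation)
        except Exception as exc:
            # Record which operation failed (0-based index) for the reviewer.
            proposal.apply_error = f"op {applied}: {exc}"
            return applied
        applied += 1
    proposal.status = "applied"
    return applied
```

A caller would pass a function that routes each operation to the matching service-layer write, so applied proposals still go through the privacy engine and the audit log.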
|
||||
|
||||
## 6. Privacy engine
|
||||
|
||||
@@ -119,11 +119,12 @@ A single function conceptually:
|
||||
visible(viewer, entity) -> { full | redacted | hidden }
|
||||
```
|
||||
|
||||
Inputs: viewer's role on the entity's Tree (including "anonymous"), the Tree's visibility (public/unlisted/private), per-Person privacy override, and living-person status.
|
||||
Inputs: viewer's role on the entity's Tree (including "anonymous"), the Tree's visibility (public / site_members / unlisted / private), per-Person privacy override, and living-person status.
|
||||
|
||||
Rules:
|
||||
- **Tree private** → only members see anything.
|
||||
- **Tree public/unlisted** → non-members get a read view, *but* every Person is run through the living-person check and per-person override first.
|
||||
- **Tree site_members** → any authenticated account on this instance gets a read view (anonymous viewers get nothing), still per-person living/override filtered.
|
||||
- **Tree unlisted / public** → non-members *including anonymous viewers* get a read view, *but* every Person is run through the living-person check and per-person override first. Unlisted is gated only by knowing the link (never listed or search-indexed); public is listed in `/explore` and indexable.
|
||||
- **Living-person rule** — a Person with no death fact, whose birth is within a configurable recency window (default ~100 years; unknown birth treated as possibly-living), is redacted (name minimized, vitals/events/media hidden) for non-owners. Owners may override per Person.
|
||||
- The engine is invoked in the **service layer**, so it covers API, server-rendered public pages, search results, and any data the assistant can read. There is intentionally no path that returns rows without passing through it.
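The rules above can be read as a small decision function. The sketch below is one possible reading, not the contents of `privacy.py`: the `Viewer` and `Person` shapes are hypothetical, the per-Person override is modeled as two assumed flags (`hidden` and `living_override`), and it follows the "redacted for non-owners" wording literally.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    FULL = "full"
    REDACTED = "redacted"
    HIDDEN = "hidden"


@dataclass(frozen=True)
class Viewer:
    authenticated: bool = False
    role: Optional[str] = None  # "owner" | "editor" | "viewer"; None = non-member


@dataclass(frozen=True)
class Person:
    birth_year: Optional[int] = None
    has_death_fact: bool = False
    hidden: bool = False           # assumed per-Person "hide" override
    living_override: bool = False  # assumed owner override of the living-person rule


def is_possibly_living(person: Person, today: date, window_years: int = 100) -> bool:
    if person.has_death_fact:
        return False
    if person.birth_year is None:
        return True  # unknown birth is treated as possibly living
    return today.year - person.birth_year < window_years


def can_view_tree(viewer: Viewer, tree_visibility: str) -> bool:
    if viewer.role is not None:
        return True  # members always see their own tree
    if tree_visibility in ("public", "unlisted"):
        return True  # includes anonymous viewers
    if tree_visibility == "site_members":
        return viewer.authenticated
    return False  # private


def visible(viewer: Viewer, tree_visibility: str, person: Person, today: date) -> Visibility:
    if not can_view_tree(viewer, tree_visibility):
        return Visibility.HIDDEN
    if viewer.role == "owner":
        return Visibility.FULL
    if person.hidden:
        return Visibility.HIDDEN
    if is_possibly_living(person, today) and not person.living_override:
        return Visibility.REDACTED
    return Visibility.FULL
```

Keeping the tree-level gate and the per-person check in one call path is what lets every read surface (API, public pages, search, assistant) share the same answer.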
|
||||
|
||||
@@ -131,7 +132,7 @@ Rules:
|
||||
|
||||
Three parts, deliberately separated:
|
||||
|
||||
1. **Model provider abstraction** (`ModelProvider`) — one interface over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models via an OpenAI-compatible endpoint or Ollama. Configurable per deployment; keys supplied by the operator (this deployment) or by the user (BYO-key deployments).
|
||||
1. **Model provider abstraction** — two separate interfaces, `LLMProvider` and `EmbeddingProvider` (configured independently — e.g. Anthropic has no embeddings endpoint), over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models via an OpenAI-compatible endpoint or Ollama. An operator can configure **several providers at once** through a registry (`build_llm_providers()`/`configured_llm_providers()`), each selectable by name — the basis for the per-tree AI policy and the `default_llm_provider`/`default_embedding_provider` settings. Keys supplied by the operator (this deployment) or by the user (BYO-key deployments).
|
||||
2. **Scoped tool surface** — the assistant can only act through a constrained set of tools that map to service-layer operations, **scoped to the user it is helping.** It is its own principal: it cannot exceed that user's rights, and every action is attributed to "assistant (on behalf of User X)" in the audit log. This is the MCP-style boundary referenced in the PRD — the assistant gets capabilities, not raw database access.
|
||||
3. **Source connectors** (`SourceConnector`) — a plugin framework for *reading* external data: FamilySearch API, Find A Grave, WikiTree, BLM/GLO land patents, USGS maps, public-domain newspapers, public county records. Only legally permissible sources ship with the project; operators can add their own. Connectors are read-only and rate-limited, and run in the worker.
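As a rough illustration of the registry idea in item 1, the sketch below builds a name-to-provider map from the environment and selects one by name. Only the names `LLMProvider`, `build_llm_providers`, `XAI_API_KEY`, and `OLLAMA_ENABLED` come from the docs; the `complete` method, the `StubLLMProvider` stand-in, the `select_llm_provider` helper, and all signatures are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class LLMProvider(ABC):
    """Text-generation interface (the method name `complete` is an assumption)."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class NullLLMProvider(LLMProvider):
    """Fallback used when no provider is configured or selected."""

    def complete(self, prompt: str) -> str:
        return ""


class StubLLMProvider(LLMProvider):
    """Hypothetical stand-in for a real hosted or local client."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def build_llm_providers(env: Mapping[str, str]) -> dict[str, LLMProvider]:
    # Each provider turns on when its setting is present in the environment.
    providers: dict[str, LLMProvider] = {}
    if env.get("XAI_API_KEY"):
        providers["xai"] = StubLLMProvider("xai")
    if env.get("OLLAMA_ENABLED", "false").strip().lower() == "true":
        providers["ollama"] = StubLLMProvider("ollama")
    return providers


def select_llm_provider(
    providers: Mapping[str, LLMProvider],
    name: Optional[str],
    default: Optional[str] = None,
) -> LLMProvider:
    # An explicit name wins; otherwise fall back to the configured default.
    chosen = name or default
    if chosen is None:
        return NullLLMProvider()
    return providers.get(chosen, NullLLMProvider())
```

An embedding registry would presumably mirror this with `EmbeddingProvider`, kept separate because a provider may offer one capability without the other.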
|
||||
|
||||
@@ -149,7 +150,8 @@ Three parts, deliberately separated:
|
||||
- `AuthProvider` interface with implementations for **local** (password + email verification/reset), **OIDC** (validated against Authentik; expected to work with Keycloak, Auth0, etc.), and **social** (Google, Apple, Facebook).
|
||||
- Operators enable any subset via config. This deployment will use Authentik (`auth.jpaul.io`) plus selected social providers; a bare self-hoster can run local-only.
|
||||
- Sessions are backend-issued; the assistant principal is minted per-session and scoped to the acting user.
|
||||
- *Status:* **local auth has landed** — Argon2id password hashing, opaque backend-issued sessions (only the token hash is stored; presented as a Bearer token or HttpOnly cookie), and email verification + password reset via the `Mailer` interface (console in dev, SMTP for operators). OIDC and social providers are Phase 5. Every write records an attributable actor in the audit log.
|
||||
- *Status:* **local auth has landed** — Argon2id password hashing, opaque backend-issued sessions (only the token hash is stored; presented as a Bearer token or HttpOnly cookie), and email verification + password reset via the `Mailer` interface (console in dev, SMTP for operators). An opt-in gate (`REQUIRE_EMAIL_VERIFICATION`, default off so SMTP-less self-hosts and pre-existing accounts aren't locked out) refuses sessions for accounts without a verified email — login is denied and existing sessions stop resolving until the address is verified. OIDC and social providers are Phase 5. Every write records an attributable actor in the audit log.
|
||||
- **Instance owner / operator** (orthogonal to the per-tree roles): the account(s) whose email is in `OWNER_EMAIL` *and* is verified are the instance operator(s), with the owner-only `/api/v1/admin` surface (operational status, instance-wide config). Derived from the env at request time — no DB column. It is an operator/config role, **not** a privacy bypass: it grants no read access to other users' private trees or living-person PII. (`is_instance_owner` in `api/deps.py`.)
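The opt-in verification gate in the status note can be sketched as a check applied both at login and whenever an existing session is resolved. The `Account` shape, the helper names, and the set of accepted truthy strings are assumptions for illustration; only the `REQUIRE_EMAIL_VERIFICATION` variable and its default-off behavior come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass(frozen=True)
class Account:
    # Hypothetical minimal account shape.
    email: str
    email_verified_at: Optional[datetime] = None


def verification_required(env: Mapping[str, str]) -> bool:
    # Default off, so SMTP-less self-hosts and older accounts are not locked out.
    value = env.get("REQUIRE_EMAIL_VERIFICATION", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def session_allowed(account: Account, env: Mapping[str, str]) -> bool:
    # Used for new logins and for resolving existing sessions alike.
    if not verification_required(env):
        return True
    return account.email_verified_at is not None
```

Evaluating the gate on every session resolution, not only at login, is what makes existing sessions stop working once the switch is turned on.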
|
||||
|
||||
## 10. Search
|
||||
|
||||
@@ -176,20 +178,20 @@ Jobs are idempotent and retryable; an external failure degrades gracefully rathe
|
||||
- Tag scheme: `test-main` (current main), `test-sha-<long>` (rollback pins), the component version, and `latest` on `v*` tags.
|
||||
- Servers **pull** new images to deploy — no build on the host. The deploy compose references `git.jpaul.io/justin/provenance-{backend,frontend}:${IMAGE_TAG:-test-main}`; `docker-compose.dev.yml` is a local-build override.
|
||||
- **Caddy** terminates TLS and reverse-proxies frontend + backend. **Cloudflare Tunnel** is the preferred ingress (no open inbound ports) but is never required; a plain Caddy-on-a-public-host deployment is equally supported.
|
||||
- **Configuration** is entirely environment-driven (twelve-factor). One `.env` plus the compose file is enough to stand up a deployment.
|
||||
- **Migrations** run on backend start (or via an explicit job) so an image pull + restart is a complete upgrade.
|
||||
- **Backups:** documented procedure for Postgres dump + object-store sync; restore is the inverse.
|
||||
- **Configuration** is entirely environment-driven (twelve-factor). One `.env` plus the compose file is enough to stand up a deployment; the backend/worker/migrate services read it via `env_file`, so every setting in `app/core/config.py` is configurable without a compose edit.
|
||||
- **Migrations** run on backend start (`RUN_MIGRATIONS=1`) and via a one-shot `migrate` compose service, so an image pull + restart is a complete upgrade. A **schema-drift guard** (defense in depth) makes a half-applied deploy loud rather than a silent storm of 500s: `/health/ready` returns 503 and startup logs a CRITICAL `SCHEMA DRIFT` line when the DB's `alembic_version` is behind the heads baked into the image (`app/core/schema_version.py`).
|
||||
- **Backups:** a one-command operator script (`deploy/backup.sh` — `pg_dump` + MinIO object sync, see `deploy/BACKUP.md`) plus a per-account ZIP export; restore is the inverse.
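The schema-drift guard from the migrations bullet can be approximated as a comparison between the revisions recorded in the database and the heads shipped in the image. This sketch is deliberately simplified and is not the code in `app/core/schema_version.py`: it requires an exact match (so a database that is ahead of the image is also reported), and the function names and response payloads are assumptions.

```python
from typing import Iterable


def schema_is_current(db_versions: Iterable[str], image_heads: Iterable[str]) -> bool:
    # Simplification: require the DB revisions to equal the image's heads exactly.
    heads = set(image_heads)
    return bool(heads) and set(db_versions) == heads


def readiness(
    db_reachable: bool,
    db_versions: Iterable[str],
    image_heads: Iterable[str],
) -> tuple[int, dict[str, str]]:
    # Returns an HTTP status code and a small JSON-style body.
    if not db_reachable:
        return 503, {"status": "unavailable", "reason": "database unreachable"}
    if not schema_is_current(db_versions, image_heads):
        return 503, {"status": "unavailable", "reason": "schema drift"}
    return 200, {"status": "ready"}
```

In a real service the two inputs would presumably come from Alembic (the `alembic_version` table on one side, the script directory's heads on the other); here they are plain lists so the logic can be checked in isolation.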
|
||||
|
||||
**Repository layout (as scaffolded):**
|
||||
|
||||
```
|
||||
/backend # FastAPI, uv-managed. app/{api/v1, services (+privacy), repositories, models, schemas, integrations (auth/mailer), core}; migrations/ = Alembic
|
||||
/deploy # docker-compose.yml, Caddyfile, .env.example
|
||||
/backend # FastAPI, uv-managed. app/{api/v1, services (+privacy), repositories, models, schemas, integrations (auth, mailer, objectstore, models = LLM/embedding providers), core}; migrations/ = Alembic
|
||||
/deploy # docker-compose.yml (+ docker-compose.dev.yml), Caddyfile, .env.example, backup.sh + BACKUP.md
|
||||
/.gitea/workflows # Gitea Actions: build images → Gitea registry
|
||||
/frontend # Next.js (App Router, TS, Tailwind). app/ pages, lib/api (openapi-typescript client), components/ui, Dockerfile (standalone)
|
||||
```
|
||||
|
||||
The compose stack runs `postgres` (pgvector image — includes `pgvector`; `pg_trgm` ships in contrib), `minio`, `backend`, and `caddy`. The **worker** container (same image as backend, worker mode) joins once queue-driven jobs exist. Phase 0 ships a minimal backend with `/health` (liveness) and `/health/ready` (Postgres reachability) to validate the deploy wiring before the data model lands.
|
||||
The compose stack runs `postgres` (pgvector image — includes `pgvector`; `pg_trgm` ships in contrib), `minio`, a one-shot `migrate` job, `backend`, the **worker** (same image as backend, worker mode — runs the scheduled soft-delete purge), `caddy`, and an optional `cloudflared` tunnel. The backend exposes `/health` (liveness) and `/health/ready` (Postgres reachability + schema-drift check).
|
||||
|
||||
## 13. Observability
|
||||
|
||||
|
||||
+36
-39
@@ -16,19 +16,18 @@
|
||||
|
||||
**Where Provenance is strong today.** The foundation is genuinely solid and, in several places, ahead of the OSS field:
|
||||
|
||||
- **Sources-first spine is real.** A reusable `Source` + per-fact `Citation` two-tier model with a `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury. (Caveat: citations are silently dropped on GEDCOM *export* — see below.)
|
||||
- **Privacy architecture is the right shape.** A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served **through the backend rather than via raw S3 URLs**. The *shape* is correct; coverage is not yet complete (the media endpoint and several child resources don't yet apply `person_visibility` — see §2.4, §2.10).
|
||||
- **Sources-first spine is real.** A reusable `Source` + per-fact `Citation` two-tier model with an `exactly_one_target` CHECK constraint, confidence enum, and full backend CRUD. This is the architectural thing webtrees/Gramps get right and most commercial tools bury.
|
||||
- **Privacy architecture is the right shape — and coverage is now broad.** A single `privacy.py` engine, `TenantScoped` mixin on every row, living-person heuristic (`is_possibly_living`, unknown-birth-treated-as-living), and media served **through the backend rather than via raw S3 URLs**. Non-member reads of persons, events, media, names, and relationships all route through `person_visibility` (#46). The remaining gap is the `citation`/`source` list endpoints, which still gate only on `can_view_tree` — see §2.10.
|
||||
- **Non-destructive by design.** Soft-delete with timed purge worker, immutable `AuditEntry` (before/after JSONB, `actor_type` ready for the assistant), GEDCOM merge that copies rather than overwrites, full account export/import.
|
||||
- **Modeling maturity.** Typed parent/child qualifiers (biological/adoptive/step/foster/donor/guardian), typed alternate names with one-primary invariant, dual verbatim+normalized dates, duplicate-relationship guards, UUID surrogate keys.
|
||||
- **Standards core.** GEDCOM 5.5.1 import/export is **functional** (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip *fidelity* has four tracked gaps (citation links, custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
|
||||
- **Standards core.** GEDCOM 5.5.1 import/export is **functional** (with preview/merge-vs-create resolution UI), pg_trgm fuzzy name search, multi-tenant tree hosting with visibility tiers. Round-trip *fidelity* has three tracked gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) — see §2.11.
|
||||
|
||||
**Documentation-vs-code gaps to correct now (per "docs travel with code").** Three repo claims are not yet true and should be edited in the same spirit they were written:
|
||||
**Documentation-vs-code gaps to correct now (per "docs travel with code").** Two repo claims are not yet true and should be edited in the same spirit they were written:
|
||||
|
||||
- **ChangeProposal is documented as landed but does not exist.** CLAUDE.md states the core data model (ARCHITECTURE §5) landed / "Phase 0 complete," but `ChangeProposal` — part of §5 and the load-bearing AI invariant — has no model, migration, or schema. Either scope it out of the "landed" claim or build it; don't leave the docs asserting it.
|
||||
- **pgvector is claimed as used; it is not.** Only `pg_trgm` is created. ARCHITECTURE references pgvector for match ranking.
|
||||
- **i18n "from day one" is documented but unmet.** PRD §6 promises externalized strings; every label is a hardcoded literal.
|
||||
|
||||
These three doc edits are themselves trivial quick wins (see §3).
|
||||
These two doc edits are themselves trivial quick wins (see §3).
|
||||
|
||||
**The biggest gaps vs commercial (Ancestry / MyHeritage / FamilySearch).** Provenance is not trying to be a record provider, and correctly so — but it is missing several things mainstream users treat as table stakes:
|
||||
|
||||
@@ -40,19 +39,17 @@ These three doc edits are themselves trivial quick wins (see §3).
|
||||
|
||||
**The biggest gaps vs OSS (GRAMPS / Gramps Web / webtrees).** These are where a privacy-first self-host product is expected to compete and currently trails:
|
||||
|
||||
- **Collaboration is plumbed but unreachable.** `TreeMembership` roles are enforced on every read/write, but there is **no API or UI to invite, grant, change, or revoke** a member — the tree is effectively single-user despite multi-user infrastructure. This also breaks the full-CRUD invariant (NN#8) and, because importance and the old Phase-6 schedule disagree, a minimal management slice is pulled forward (§2.9).
|
||||
- **Living-person redaction is non-uniform.** Redaction is applied on person reads but **not** on the event/media/name/relationship/citation/source child-resource endpoints — a real PII leak on public/unlisted trees (NN#3, NN#2).
|
||||
- **`site_members` visibility tier is silently broken** (defined, selectable in UI, never handled in `can_view_tree`).
|
||||
- **Collaboration management is now reachable, but minimal.** `TreeMembership` roles are enforced on every read/write, and a list/add/change-role/remove API + UI now ship (§2.9), satisfying the full-CRUD invariant (NN#8). The remaining gap is the richer **email invite/grant flow** (pending-invite state, resend/expire), still scheduled for Phase 6.
|
||||
- **Living-person redaction is now near-uniform.** Non-member reads of persons, events, media, names, and relationships all redact possibly-living people (#46); the `citation`/`source` list endpoints are the remaining hold-outs (they gate only on `can_view_tree`) — a narrowed PII gap on public/unlisted trees (NN#3, NN#2).
|
||||
- **No place as a usable first-class entity** (model exists, created by GEDCOM, but no read/edit/delete — a create-only entity, which is a bug per NN#8).
|
||||
- **No research log, to-do/task planner, kinship calculator, data-quality checker, or i18n/string externalization** (the last is a documented day-one commitment that is currently unmet).
|
||||
|
||||
**Security-priority correctness fixes (do these first, regardless of phase).** Three current defects are user-harm or trust issues, not roadmap items:
|
||||
**Security-priority correctness fixes (do these first, regardless of phase).** Fixes for most of the original redaction defects shipped this cycle (#46); two items remain, a narrowed PII gap and a config switch:
|
||||
|
||||
1. **Media privacy leak (§2.4)** — `list_media`/`get_media`/`media_content` gate on `can_view_tree` but never `person_visibility`; non-owners can download photos of redacted living people on public/unlisted trees.
2. **Child-resource redaction gap (§2.10)** — event/media/name/relationship/citation/source endpoints don't apply living-person redaction.
3. **Registration issues a live session before verification (§2.10)** — `register` returns an authenticated session cookie + token (201) and `email_verified_at` is written but never read on any path; there is no env switch to gate self-registration. The *enforcement check* (read-side `email_verified_at`) is small; the approval-mode env switch is the larger piece.
1. **Citation/source redaction gap (§2.10)** — `list_media`/`get_media`/`media_content`, plus the event/name/relationship endpoints, now apply `person_visibility` for non-members (#46), closing the media leak. The `citation`/`source` list endpoints still gate only on `can_view_tree`, so a non-member on a public/unlisted tree can still enumerate citations/sources tied to redacted living people — the remaining living-person leak.
2. **Self-registration approval-mode switch (§2.10)** — the read-side enforcement now exists: `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (#53). The remaining gap is the env switch to choose open vs admin-approval vs closed self-registration.
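The residual citation/source gap described in item 1 amounts to filtering the list result by the visibility of the person each citation hangs off. The sketch below is only an illustration of that idea, not the project's code: `Citation`, `Visibility`, and `visible_citations` are hypothetical names, and the real `person_visibility` signature may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List
from uuid import UUID, uuid4


class Visibility(Enum):
    """Hypothetical outcome of a person_visibility-style check."""
    FULL = "full"
    REDACTED = "redacted"


@dataclass(frozen=True)
class Citation:
    """Hypothetical, simplified citation row tied to one person."""
    id: UUID
    source_id: UUID
    person_id: UUID


def visible_citations(
    citations: Iterable[Citation],
    visibility_of: Callable[[UUID], Visibility],
) -> List[Citation]:
    """Keep only citations whose person is fully visible to the viewer.

    `visibility_of` stands in for the privacy engine; anything that is
    not FULL is dropped, so the list cannot be used to enumerate
    citations tied to redacted living people.
    """
    return [c for c in citations if visibility_of(c.person_id) is Visibility.FULL]
```

A non-member list path would call the filter after the existing `can_view_tree` gate, so the tree-level check and the person-level check both apply.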
**Strategic posture.** The differentiators worth pressing — property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership — are mostly still ahead on the roadmap. The near-term job is (a) close the **privacy/auth correctness** and **collaboration** gaps that the architecture already implies, (b) ship the **maps + reports + merge** table stakes, and (c) build the **connector/ModelProvider/ChangeProposal** spine that unlocks the entire back half of the roadmap.
**Strategic posture.** The differentiators worth pressing (property chain-of-title, the ChangeProposal AI model, the anonymous mutual-consent hint system, and true self-host data ownership) are mostly still ahead on the roadmap. The near-term job is (a) close the **privacy/auth correctness** and **collaboration** gaps that the architecture already implies, (b) ship the **maps + reports + merge** table stakes, and (c) finish the spine that unlocks the back half of the roadmap: the **connector framework**, plus wiring the now-landed **ChangeProposal/ModelProvider** into the assistant.
---
@@ -129,11 +126,11 @@ Fuzzy trigram name search is **have**; everything that depends on connectors, em
### 2.4 Media & documents
Universal media attachment is **have**, but with a **confirmed privacy leak** and no asset-processing pipeline.
Universal media attachment is **have**; the earlier privacy leak is now **closed** (#46), and the remaining gaps are the asset-processing pipeline (EXIF strip, thumbnails).
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Media privacy gating on serve paths** | `list_media`/`get_media`/`media_content` gate only on `can_view_tree`, never `person_visibility` — a non-owner can download photos of redacted living people on public/unlisted trees. | Have(leaky) | **Critical** | M | 1 | **Security-priority — fix first. Direct NN#3/NN#2 violation.** Check attached `person_id` visibility and redact/hide. |
| **Media privacy gating on serve paths** | `list_media`/`get_media`/`media_content` now apply `person_visibility` for non-members (#46): media is exposed only when linked to a FULL-visibility person (`list_public_media`/`can_view_media`), so living-person photos no longer leak on public/unlisted trees. | Have | **Critical** | M | 1 | **Resolved (NN#3/NN#2).** Serve paths check attached `person_id` visibility and 404 otherwise. |
| EXIF / GPS stripping on upload | Raw bytes stored verbatim; family photos leak GPS/home addresses/timestamps. | Planned | High | M | 1 | **Security-priority**, not cosmetic. Parse EXIF on ingest, strip/quarantine by default, allow override. |
| Thumbnail / preview generation | No image pipeline (no Pillow). Async, idempotent worker job. | Planned | High | L | 1 | Derived thumbnail must inherit parent privacy — no bypass path. |
| Image reference regions | Mark the rectangle of a census image that supports a Citation. | Missing | Med | M | later | Tenant-scoped, full CRUD; region→Citation preferred over region→Person. |
@@ -224,14 +221,14 @@ The preview→approve **bulk cleanup** tool is a genuine **have** and a differen
### 2.9 Collaboration & sharing
Authorization is enforced everywhere, but the **management surface is entirely absent** — the most consequential gap relative to the multi-user product promise. Because the Critical items below previously sat at Phase 6 while their labels said "breaks NN#8," a minimal management slice is pulled forward to Phase 2; the richer invite/email UX stays at Phase 6.
Authorization is enforced everywhere, and a **minimal management surface now ships** — list/add/change-role/remove via `api/v1/members.py` plus a members page (#233). The remaining gap is the richer email invite/grant flow. The minimal slice landed at Phase 2 as planned; the invite/email UX stays at Phase 6.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Membership PATCH/DELETE + role change (minimal slice)** | Add/adjust/revoke a collaborator and change `role` — the substrate (mutable `role`) exists; only the endpoints are missing. Resolves the create-only NN#8 break without the full invite flow. | Partial | **Critical** | S–M | 2 | **Pulled forward** — a create-only entity shouldn't wait for Phase 6 (NN#8). Revocation routes through the single privacy point. |
| **Membership PATCH/DELETE + role change (minimal slice)** | Add/adjust/revoke a collaborator and change `role` — GET/PATCH/DELETE on `/trees/{id}/members` (`api/v1/members.py`) plus a frontend members page now ship (#233). Resolves the create-only NN#8 break without the full invite flow. | Have | **Critical** | S–M | 2 | Resolves the create-only NN#8 break. Revocation routes through the single privacy point. |
| Full invite/grant flow (email + UI) | Email-based invitations, pending-invite state, role-grant UI, resend/expire. Builds on the minimal slice. | Partial | High | L | 6 | Invitation email via configured SMTP (NN#7); membership changes through the one enforcement point. |
| **Read-only public tree share** | Visibility model already redacts living persons for anonymous viewers, but every endpoint requires `CurrentUser` — no optional-auth dep, no public route, no public page. | Partial | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). |
| SEO public profile pages (server-rendered) | Intent declared (`public` = search-indexable) but zero implementation; no sitemap/robots/meta. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages — must go through privacy engine, no direct row queries. |
| **Read-only public tree share** | Anonymous read surface shipped: optional-auth `CurrentUserOrNone` dep, `api/v1/public.py` + `public_view_service.py`, and server-rendered pages at `/p/[treeId]` (+ `/persons/[personId]`) and `/explore`. Living-safe by construction via `person_visibility`. | Have | High | M | 2 | Highest-leverage near-term sharing feature; living-safe by construction via `person_visibility` (NN#2/#3). |
| SEO public profile pages (server-rendered) | Server-rendered public pages (`/p/[treeId]`, `/explore`) and `robots.ts` now ship. Deferred follow-ups: a public-only `sitemap.ts` and per-tree `noindex,nofollow` meta for `unlisted`/`site_members` pages. | Partial | Med | L | 2 | NN#2 explicitly names server-rendered public pages — must go through privacy engine, no direct row queries. |
| **Notification / event-dispatch substrate** | Shared enabler seeded from `AuditEntry`: subscription + dispatch layer emitting privacy-filtered projections. Underpins watch/follow, mutual-consent match notices, comments, moderation, and in-app messaging. | Missing | High | L | 6 | **Privacy-filtered projections only — never raw before/after JSON** (NN#2/#3). |
| Comments / discussion threads | Per-profile discussion (target = person/event/source), threaded. | Missing | High | M | 6 | Comments on living persons redacted for non-members (NN#2/#3); rides the dispatch substrate. |
| In-app messaging (contact details hidden) | SMTP exists; no Message/Thread model. | Planned | High | L | 6 | Hide contact details; opens after mutual consent (NN#4); redact living-person content; rides dispatch substrate. |
@@ -253,10 +250,11 @@ The architecture is correct (single engine, tenant mixin, audit, soft-delete + p
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Uniform living-person redaction across child resources** | `_redact` runs on person reads but **not** on event/media/name/relationship/citation/source endpoints — non-members fetch a possibly-living person's events/photos/names directly. | Partial | **Critical** | M | 1–2 | **Security-priority. Core NN#3/NN#2 defect.** Apply `person_visibility` on every person-derived fact. |
| **Email-verification enforcement gate** | `email_verified_at` is written at `auth_service.py:154` but read on no path; `register` returns an authenticated session cookie + token (201) pre-verification. | Partial | **High** | S | 1–2 | **Security-priority near-quick-win** — add the read-side check (NN#7 trust path). The check is small; the registration-mode switch below is the larger piece. |
| **Uniform living-person redaction across child resources** | `person_visibility` now runs for non-members on the event, media, name, and relationship endpoints (#46), which delegate to `public_view_service`. Remaining: the `citation`/`source` list endpoints still gate only on `can_view_tree`, so citations tied to a redacted living person are still enumerable. | Partial | High | S | 1–2 | **Mostly resolved (NN#3/NN#2).** Apply `person_visibility` to the citation/source list paths to close the residual leak. |
| **Email-verification enforcement gate** | Read-side check now ships (#53): `REQUIRE_EMAIL_VERIFICATION` gates login/session on `email_verified_at` (`auth_service.py`). Opt-in (default off) so SMTP-less self-hosts still work. | Have | **High** | S | 1–2 | Read-side trust path now enforced (NN#7); the registration-mode switch below is the separate larger piece. |
| Self-registration mode gating (approve / open / closed) | No env switch to choose open vs admin-approval vs closed registration. | Partial | High | M | 2/5 | Twelve-factor registration control (NN#7); pairs with the verification gate above. |
| **Fix `site_members` visibility tier** | Defined + selectable in UI but `can_view_tree` only handles public/unlisted — fails closed unintuitively. | Partial | Critical | S | 1 | **Quick win.** Least-surprise; honor the tier the UI offers. |
| Instance owner / operator role | `OWNER_EMAIL`-declared operator (#240): `is_instance_owner` on `/users/me`, owner-only `GET /api/v1/admin/instance`, `/admin` UI. | Have | Med | S | 2/5 | Owner-only operational surface, twelve-factor via env (NN#7); reads stay through the service layer. |
| **Fix `site_members` visibility tier** | `can_view_tree` now handles `site_members` (`privacy.py:56`): any authenticated account gets a read view, anonymous is refused. | Have | Critical | S | 1 | Honors the tier the UI offers; reads still route through `person_visibility`. |
| Make `LIVING_RECENCY_YEARS` configurable | Hardcoded 100 at `privacy.py:23`. | Partial | High | S | 2 | **Quick win.** Twelve-factor (NN#7). |
| Privacy-stripped export (redact living) | GEDCOM + account export emit full tree; no "strip living" mode. | Missing | High | M | 2 | Reuse `person_visibility`/`_redact` (NN#3). Owner self-export is safe today; shareable variant is the gap. |
| Per-fact / per-field privacy + record flags | tentative/rejected/preferred/private flags on facts. | Missing | Med | L | later | If added, route through the single engine (NN#2). |
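The `LIVING_RECENCY_YEARS` row above asks for the hardcoded 100 to become an environment setting. A minimal sketch of that change follows, under stated assumptions: the helper names, the fail-closed handling of an unknown birth year, and the strict `<` comparison are illustrative choices, not the behavior of the real `privacy.py`.

```python
import os
from datetime import date
from typing import Mapping, Optional

DEFAULT_LIVING_RECENCY_YEARS = 100  # the currently hardcoded value


def living_recency_years(env: Mapping[str, str] = os.environ) -> int:
    """Read the window from the environment, falling back to the default."""
    raw = env.get("LIVING_RECENCY_YEARS", "").strip()
    if not raw:
        return DEFAULT_LIVING_RECENCY_YEARS
    years = int(raw)  # a non-numeric value raises ValueError at startup
    if years <= 0:
        raise ValueError("LIVING_RECENCY_YEARS must be a positive integer")
    return years


def possibly_living(
    birth_year: Optional[int],
    has_death_fact: bool,
    today: date,
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Heuristic: no death fact and a birth year inside the recency window."""
    if has_death_fact:
        return False
    if birth_year is None:
        # Assumption: with no birth year, fail closed and treat as living.
        return True
    return today.year - birth_year < living_recency_years(env)
```

Passing the environment as a mapping keeps the check easy to test without mutating `os.environ`.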
@@ -270,11 +268,11 @@ The architecture is correct (single engine, tenant mixin, audit, soft-delete + p
### 2.11 Import/export & standards
GEDCOM 5.5.1 import/export and full data-portability export are **have**, but fidelity gaps directly undercut the provenance thesis — and one is outright data loss.
GEDCOM 5.5.1 import/export and full data-portability export are **have**; the remaining fidelity gaps (custom tags, PLAC coords/hierarchy, non-UTF-8 encoding) still undercut the provenance thesis.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **Citation links dropped on GEDCOM export** | Export never selects the Citation table — fact→source links, page, detail, confidence all dropped on export (they import fine). Re-importing your own export **destroys** the sources-first graph. | Partial | **Critical** | M | 2 | **Silent data loss on the product's signature data + destructive round-trip** (NN#5); breaks PRD US-013. |
| **Citation links on GEDCOM export** | Export now selects Citations and emits `SOUR`/`PAGE` per fact (#232), so fact→source links survive a Provenance→Provenance round-trip. (Citation detail/confidence beyond page still to round-trip.) | Have | **Critical** | M | 2 | Closes the silent data-loss / destructive round-trip on the product's signature data (NN#5); satisfies PRD US-013. |
| GEDCOM 7.0 import/export | Version hardcoded `5.5.1`; no v7 semantics, SCHMA, SUBM, or UID handling. | Partial | High | L | 2 | Stated differentiator (FamilySearch interop). |
| Custom/underscore tag preservation | `_MARNM` becomes `TYPE married`, other custom tags dropped — violates ≥99% round-trip goal. | Missing | High | L | 2 | Tension with provenance thesis (faithful record). |
| PLAC FORM hierarchy + MAP coordinate round-trip | Import reads only PLAC text; export emits flat PLAC. lat/long + hierarchy lost on round-trip. | Missing | High | M | 2–3 | Round-trip fidelity for the land/maps pillar. |
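To make the `SOUR`/`PAGE` round-trip concrete, here is a small sketch of how a fact and its citations could be serialized as GEDCOM 5.5.1 lines. The `Fact` and `Cite` classes and `fact_lines` are hypothetical stand-ins, not the exporter's actual types; only the line layout (a `SOUR` pointer one level below the event, `PAGE` one level below that) follows the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cite:
    """Hypothetical citation: a pointer to a source record plus a page."""
    source_xref: str            # e.g. "@S1@"
    page: Optional[str] = None  # where-within-source text


@dataclass
class Fact:
    """Hypothetical event/fact with its attached citations."""
    tag: str                    # e.g. "BIRT"
    date: Optional[str] = None  # already in GEDCOM date form
    citations: List[Cite] = field(default_factory=list)


def fact_lines(fact: Fact, level: int = 1) -> List[str]:
    """Serialize one fact, nesting each citation beneath it."""
    lines = [f"{level} {fact.tag}"]
    if fact.date:
        lines.append(f"{level + 1} DATE {fact.date}")
    for cite in fact.citations:
        lines.append(f"{level + 1} SOUR {cite.source_xref}")
        if cite.page:
            lines.append(f"{level + 2} PAGE {cite.page}")
    return lines
```

Citation detail and confidence, which the row notes still need to round-trip, would need additional subordinate lines that this sketch does not attempt.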
@@ -309,7 +307,7 @@ Internal REST + OpenAPI + generated TS client are **have**. The externalized dev
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| Public read-only API + scoped tokens (OAuth) | Bearer token is opaque session only; `TokenPurpose` lacks scopes; designed `public.py` never built. | Partial | High | L | 5–6 | Any scoped-token path routes through `person_visibility` + living-person redaction (NN#2/#3). |
| Public read-only API + scoped tokens (OAuth) | The unauthenticated public read surface (`public.py`) now ships (#41–#51), but for a *developer* API the bearer token is still opaque session only and `TokenPurpose` lacks scopes — no scoped/OAuth token path. | Partial | High | L | 5–6 | Any scoped-token path routes through `person_visibility` + living-person redaction (NN#2/#3). |
| SourceConnector framework | Only AuthProvider/ObjectStore/Mailer base classes exist; no connector base/loader/registry. Gates AI, hints, property connectors. | Planned | Med | L | 4 | Read-only, rate-limited; findings via ChangeProposal (NN#1); legal sources only (NN#6). |
| Webhooks / change feeds | `AuditEntry` is the natural substrate (shares the notification dispatch layer, §2.9); no feed/webhook layer. | Missing | Med | L | 6 | Emit privacy-filtered, tenant-scoped projections — never raw before/after JSON (NN#2/#3). |
| CLI / scripting surface | No `[project.scripts]`, no Typer/Click; worker is a purge loop only. Self-hosters want bulk admin. | Missing | Med | M | 9 | Funnel reads through privacy.py, writes through audit; admin-scoped, no assistant-write path. |
@@ -328,7 +326,7 @@ Postgres + S3, multi-tenant isolation are **have**. Queue, observability, backup
| Real job queue (Postgres/Redis-backed) | Worker is a fixed-interval purge loop; GEDCOM import and account export run **inline in the request**. | Partial | High | L | 4 (pre-req) | Blocks NN#1 (assistant in worker) and NN#4 (hint matching in worker). Queue backend is an open question (PRD §11). |
| **Pagination on list endpoints + server-side tree loading** | List endpoints (`persons.py:37`, events, relationships) take no `limit/offset/skip`; the tree view loads the whole graph client-side. A *current* limitation against the 50k-person target. | Planned | High | M | 1–2 | **Split out from scale validation** — this is a correctness/functional gap now, not a Phase 9 task. |
| Scale validation (50k+ trees, P95<2s, load test) | No benchmark or load test exists. | Planned | High | L | 9 | Inline heavy ops risk partial writes — moving to the queue is what makes "failures never corrupt state" true. |
| **Operator backup: one-command `pg_dump` + MinIO sync** | Only a documented procedure + per-account ZIP exist; no scripted DB+object dump. For a self-host product this is day-one data-loss exposure. | Partial | Critical | M | 1–2 | **Pulled forward** — Critical importance contradicted the old Phase-9 slot. Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. |
| **Operator backup: one-command `pg_dump` + MinIO sync** | `deploy/backup.sh` + `deploy/BACKUP.md` now provide a scripted DB+object dump (#234). Remaining: scheduled/off-host/verified-restore tooling (row below). | Have | Critical | M | 1–2 | Restore must re-apply privacy state faithfully (NN#3); safety net for NN#8. |
| Scheduled / cloud automated backup + restore tooling | Cron-driven, off-host, verified-restore workflow. | Partial | High | L | 9 | Builds on the one-command slice above. |
| ARM64 build matrix | CI builds `linux/amd64` only; many self-hosters run ARM SBCs. | Partial | High | S | 1 | **Quick win.** Add arm64 + QEMU to buildx (NN#7 container-native). |
| Structured JSON logs + Prometheus metrics | Plain-text stdlib logging; no `/metrics`. | Partial | Med | M | 9 | Logs/metrics reference UUIDs, never names/PII (NN#3/#4). |
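The pagination row above notes that list endpoints take no `limit`/`offset`. The shape of such a contract might look like the following sketch. Everything here is an assumption for illustration: `Page`, `paginate`, and the `MAX_LIMIT` cap are hypothetical, and a real endpoint would push `LIMIT`/`OFFSET` into the SQL query rather than slice an in-memory sequence.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")

MAX_LIMIT = 200  # hypothetical server-side cap on page size


@dataclass(frozen=True)
class Page(Generic[T]):
    """One page of results plus the metadata a client needs to continue."""
    items: List[T]
    total: int
    limit: int
    offset: int


def paginate(rows: Sequence[T], limit: int = 50, offset: int = 0) -> Page[T]:
    """Return one window of `rows`, clamping `limit` to MAX_LIMIT."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    limit = min(limit, MAX_LIMIT)
    return Page(
        items=list(rows[offset:offset + limit]),
        total=len(rows),
        limit=limit,
        offset=offset,
    )
```

Returning `total` alongside the items lets a tree view request further pages on demand instead of loading the whole graph client-side.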
@@ -361,12 +359,13 @@ The entire "land" half is **planned/missing** but fully specified. This is where
### 2.16 AI assistant — *defining differentiator*
Entirely **planned** — and note the docs-vs-code gap: ARCHITECTURE §5 lists `ChangeProposal` as part of the "landed" core model, but no model/migration/schema exists. The audit substrate (`actor_type=assistant`, before/after JSONB) is the right foundation; the ChangeProposal model and ModelProvider abstraction are the two critical-path pieces.
The spine has now **landed**: the `ChangeProposal` model/schema/service, its migration, the GET/POST API, and a review UI all ship, and the `LLMProvider`/`EmbeddingProvider` abstraction with null/Anthropic/OpenAI-compat (OpenAI/xAI/Ollama) providers + registry is in place. The audit substrate (`actor_type=assistant`, before/after JSONB) is the right foundation; the remaining work is wiring the assistant's tools to emit proposals and building the chatbot/RAG surface on top.
| Item | Description | Status | Imp | Eff | Phase | Non-negotiable |
|---|---|---|---|---|---|---|
| **ChangeProposal (propose-then-confirm)** | The defining invariant. No `proposal.py`, no migration, no review UI yet — despite docs implying it landed. | Planned | **Critical** | L | 4 | **IS NN#1.** Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). Correct the docs to match reality. |
| Pluggable LLM + embedding provider | `ModelProvider` over Anthropic/OpenAI/xAI/Ollama; env placeholders exist, no interface code. | Planned | Critical | M | 4 | **Twelve-factor, no hard-coded keys/endpoints** (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. |
| **ChangeProposal (propose-then-confirm)** | The defining invariant. Model/schema/service (`models/change_proposal.py`, `services/change_proposal_service.py`), migration `a1b2c3d4e5f6`, GET/POST `api/v1/proposals.py`, and a `/trees/[id]/proposals` review UI all ship. Remaining: wire assistant tools to emit proposals. | Have | **Critical** | L | 4 | **IS NN#1.** Enforce structurally: assistant tools return proposals; only user action applies one; application flows through the normal service layer (privacy + audit). ChangeProposal itself needs full CRUD (NN#8). |
| Pluggable LLM + embedding provider | `LLMProvider`/`EmbeddingProvider` ABCs (`integrations/models/base.py`) with null, Anthropic, and OpenAI-compat (OpenAI/xAI/Ollama) implementations + registry. | Have | Critical | M | 4 | **Twelve-factor, no hard-coded keys/endpoints** (NN#7); the Ollama/self-hosted path is what makes the privacy-first promise real. |
| Per-tree AI model policy | Owner-only per-tree model selection (`Tree.ai_member_provider`/`ai_recommender_provider`, GET/PATCH `/trees/{id}/ai`, `/trees/[id]/ai` UI) (#238). | Have | Med | S | 4 | Owner-only; selects which configured provider a tree uses — keys stay in env, twelve-factor (NN#7). |
| AI research-assistant chatbot (RAG over tree) | Marquee feature; needs ModelProvider + connector + retrieval through privacy engine. | Planned | High | XL | 4 | NN#1 propose-only, NN#2 privacy retrieval, NN#3 redaction. |
| Conversational / connector record search | Search legal sources via the assistant. | Planned | High | L | 4 | Legal sources (NN#6); findings = Source + Citation (NN#5). |
| Fact extraction from documents | Extracted facts map cleanly to ChangeProposal review. | Missing | Med | M | 4 | Canonical NN#1 use case; each fact carries a Citation (NN#5). |
@@ -399,8 +398,8 @@ A documented **day-one commitment** ("UI strings externalized from day one") tha
Ordered by leverage. All are S-effort or a thin slice of a larger item, and most close a stated invariant gap.
1. **Fix `site_members` visibility tier** (Privacy, Critical/S) — defined and selectable in the UI but never handled in `can_view_tree`; fails closed unintuitively.
2. **Email-verification enforcement gate** (Privacy/Auth, High/S) — add the read-side `email_verified_at` check so a freshly registered, unverified user doesn't get a live authenticated session. Security-priority; the registration-mode env switch (open/approve/closed) is the larger follow-on, not part of this quick win.
1. **Fix `site_members` visibility tier** (Privacy, Critical/S) — **done:** `can_view_tree` now handles `site_members` (`privacy.py:56`), giving any authenticated account a read view while refusing anonymous.
2. **Email-verification enforcement gate** (Privacy/Auth, High/S) — **done (#53):** the read-side `email_verified_at` check now ships behind `REQUIRE_EMAIL_VERIFICATION`, so a freshly registered, unverified user doesn't get a live authenticated session. The registration-mode env switch (open/approve/closed) is the larger follow-on (§2.10, M-effort — not a quick win).
3. **Citation confidence selector in the cite form** (Sources, High/S) — confidence is modeled and API-writable but unreachable in the UI; every UI citation is currently NULL. Honors NN#8 and the evidence-quality thesis.
4. **Source edit UI + expose all 8 fields** (Sources, High/S) — update API exists but there is no edit form and create exposes ~3 fields; a create-but-not-edit entity violates NN#8.
5. **Make `LIVING_RECENCY_YEARS` env-configurable** (Privacy, High/S) — hardcoded 100 at `privacy.py:23`; twelve-factor (NN#7).
@@ -411,11 +410,9 @@ Ordered by leverage. All are S-effort or a thin slice of a larger item, and most
10. **`GET /{tree}/citations/{id}` endpoint** (Sources, Med/S) — API symmetry (NN#8).
11. **Transcription/abstract fields on Source** (Sources, Med/S) — add `transcription_text` + `abstract_text`, distinct from `citation_text`; core to evidence analysis.
12. **Sort the merged person timeline** (Research workflow, Med/S) — `shownEvents.sort()` on `date_start`; currently appended unsorted.
13. **Doc corrections (docs-vs-code)** (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim, the i18n "from day one" claim, and the ChangeProposal "landed" claim match reality. The repo convention requires docs to travel with code.
13. **Doc corrections (docs-vs-code)** (Meta, trivial/S) — edit CLAUDE.md / ARCHITECTURE so the pgvector "used" claim and the i18n "from day one" claim match reality. The repo convention requires docs to travel with code.
> **Ships-with, not standalone:** *Revocable / adjustable access (membership PATCH/DELETE + role change)* is security-critical and S-effort, but it is the minimal slice of the membership work (§2.9) and ships **with** those endpoints — it is not independently shippable on its own.
>
> **Higher priority than any quick win, but M-effort (not quick):** the **media privacy leak** (§2.4), the **child-resource redaction gap** (§2.10), and pulling the **one-command operator backup** (§2.14) forward. Treat these as **security-/data-loss-priority Phase 1–2 fixes** regardless of the quick-win list.
> **Mostly shipped this cycle (#46):** the **media privacy leak** (§2.4) and the broad **child-resource redaction gap** (§2.10) are now closed for the person/event/media/name/relationship endpoints. The narrowed remainder — applying `person_visibility` to the **citation/source list endpoints** — is an S-effort follow-up; treat it as a security-priority Phase 1–2 fix regardless of the quick-win list.
---
@@ -425,10 +422,10 @@ Where to invest to make Provenance distinct rather than a webtrees clone. Each l
**1. Property chain-of-title (the "land" half).** No surveyed competitor models ownership as a typed, cited event chain tying parties across time, with gap-flagging and bidirectional owner↔person / parcel↔place traversal, fed by **legal** public sources (BLM/GLO patents, USGS, public county deeds). This is the single clearest "no one else does this" capability. Sequence: Property + OwnershipEvent + Citation-target (Phase 3) → chain-of-title view → BLM/GLO connector (Phase 8). The Citation extension is a quick win; the entity is the prerequisite for everything else in the category.
**2. The ChangeProposal AI model.** "The assistant never writes autonomously" is a *trust* differentiator in a market where users fear AI corrupting their research. Build it structurally — assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together. The audit substrate is already in place; ChangeProposal + ModelProvider are the critical path — and the docs should stop asserting ChangeProposal has landed until it has.
**2. The ChangeProposal AI model.** "The assistant never writes autonomously" is a *trust* differentiator in a market where users fear AI corrupting their research. The structural spine has **landed** — the `ChangeProposal` model/API/review UI and the pluggable `LLMProvider`/`EmbeddingProvider` abstraction both ship — so the remaining work is wiring the assistant's tools to emit proposals (never mutating directly). Assistant tools return proposals; only an explicit human action applies one; application flows through the normal service layer so it always hits the privacy engine and audit log. The same approval queue moderates untrusted human-contributor edits (Collaboration §2.9), so design them together.
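The propose-then-confirm rule in the paragraph above can be enforced in code shape rather than by convention. The sketch below is one hedged way to picture it. The field names, the `ProposalStatus` values, and the `write` callback (standing in for the normal service layer with its privacy and audit hooks) are all assumptions; the shipped `ChangeProposal` model may look quite different.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict
from uuid import UUID, uuid4


class ActorType(Enum):
    USER = "user"
    ASSISTANT = "assistant"


class ProposalStatus(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class ChangeProposal:
    """Hypothetical, simplified proposal record."""
    tree_id: UUID
    payload: Dict[str, Any]
    proposed_by: ActorType
    id: UUID = field(default_factory=uuid4)
    status: ProposalStatus = ProposalStatus.PENDING


def propose(tree_id: UUID, payload: Dict[str, Any]) -> ChangeProposal:
    """Assistant tool entry point: records intent and mutates nothing."""
    return ChangeProposal(tree_id, payload, ActorType.ASSISTANT)


def apply_proposal(
    proposal: ChangeProposal,
    actor: ActorType,
    write: Callable[[Dict[str, Any]], None],
) -> None:
    """Apply a pending proposal, but only on an explicit user action."""
    if actor is not ActorType.USER:
        raise PermissionError("only an explicit user action may apply a proposal")
    if proposal.status is not ProposalStatus.PENDING:
        raise ValueError("proposal is already resolved")
    write(proposal.payload)  # stand-in for the service layer (privacy + audit)
    proposal.status = ProposalStatus.APPLIED
```

Because the assistant only ever holds `propose`, there is no code path by which it can reach `write` directly; the same `apply_proposal` gate could serve the human-contributor moderation queue mentioned above.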
**3. Anonymous, mutual-consent cross-tree hints.** The privacy model already redacts living people for anonymous viewers, so a hint system that reveals *nothing identifying* until both sides opt in is achievable by construction — and is a categorically more trustworthy version of MyHeritage Smart Matches / Ancestry hints. Requires the matching engine (pgvector enablement + candidate generation, Phase 7), the notification/event-dispatch substrate (§2.9), and the messaging channel that opens only post-consent.
**4. True self-hosting + data ownership.** Full account export/import, soft-delete recovery, GEDCOM round-trip, env-driven everything, and (to-build) operator-grade scheduled backup + ARM support make Provenance the genealogy app you actually own. Two correctness items gate the promise: GEDCOM export must stop dropping citations (a Provenance→Provenance round-trip currently destroys the sources graph), and operator backup must move from "documented procedure" to a one-command dump. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.
**4. True self-hosting + data ownership.** Full account export/import, soft-delete recovery, GEDCOM round-trip, env-driven everything, a one-command operator backup, and (to-build) scheduled off-host backup + ARM support make Provenance the genealogy app you actually own. The two correctness items that gated the promise have **landed**: GEDCOM export now preserves citations (the Provenance→Provenance round-trip keeps the sources graph), and operator backup moved from "documented procedure" to a one-command dump (`deploy/backup.sh`). What remains is scheduled/verified-restore tooling and ARM builds. The Ollama/self-hosted ModelProvider path means even the AI assistant runs without tree data leaving the deployment — a promise no commercial competitor can make.
**5. Sources-first as a felt experience.** The two-tier model is built; the differentiator is making it *visible and low-friction*: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), per-fact confidence surfaced in the UI, and — critically — citations that **survive GEDCOM export**. These turn "every fact links to where it came from" from an architecture note into the product's personality.
**5. Sources-first as a felt experience.** The two-tier model is built, and citations now **survive GEDCOM export** (#232); the remaining differentiator is making sourcing *visible and low-friction*: a guided Evidence-Explained citation builder, transcription/abstract fields, source-driven data entry (transcribe a document into the tree), and per-fact confidence surfaced in the UI. These turn "every fact links to where it came from" from an architecture note into the product's personality.
+15
-9
@@ -1,8 +1,8 @@
# Provenance — Product Requirements Document
**Status:** Draft v0.1
**Status:** Draft v0.1. Now describes a partially implemented system: Phases 0 and 1 are complete, and early slices of later phases have shipped.
**Owner:** Justin Paul
**Last updated:** 2026-06-06
**Last updated:** 2026-06-10
---
@@ -94,7 +94,7 @@ Acceptance criteria (AC) are written to be testable.
- **US-033** I view every property a person held, and every parcel ever recorded at a place. *AC:* both reverse lookups return correct sets.
### Privacy & sharing
- **US-040** I set a tree to public, unlisted, or private. *AC:* visibility enforced for anonymous and non-owner users.
|
||||
- **US-040** I set a tree to one of four visibility levels — private, unlisted, site_members, or public. *AC:* visibility enforced for anonymous and non-owner users; at the **site_members** level the tree is visible to any authenticated instance user (signed in but not a member of the tree) and hidden from anonymous visitors.
|
||||
- **US-041** I mark any individual private even within a public tree. *AC:* that person's details hidden from non-owners regardless of tree setting.
|
||||
- **US-042** Living people are hidden from non-owners by default. *AC:* a person with no death fact and a plausibly-living birth date shows only minimal/no PII to non-owners; owner can override per person.
|
||||
- **US-043** I add a co-owner to a tree. *AC:* co-owner can edit per role; action attributed to them in the audit log.
|
||||
@@ -132,6 +132,7 @@ Acceptance criteria (AC) are written to be testable.
|
||||
### 5.1 Identity & access
|
||||
- Pluggable authentication: local password (with email verification and reset), social sign-in (Google, Apple, Facebook), and generic **OIDC** (validated against Authentik; should also work with Keycloak, Auth0, and other standards-compliant providers). Operators enable any subset.
|
||||
- Roles per tree: **owner**, **co-owner/editor**, **viewer**. Public/unlisted trees also have an implicit anonymous viewer.
|
||||
- **Instance owner/operator:** an env-declared operator role (via `OWNER_EMAIL`, requiring a verified email), distinct from the per-tree roles. It is an operations/config role only and is **not** a privacy bypass — it grants no access to others' tree data or PII.
|
||||
- The AI assistant acts as a distinct, scoped principal bound to the user it is helping — it can never exceed that user's rights, and its actions are separately attributable.
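A minimal sketch of the instance-owner check described above, assuming a hypothetical `User` record and a hypothetical `is_instance_owner` helper (neither name is taken from the codebase). It only illustrates the stated rule that the operator role requires a verified email matching `OWNER_EMAIL`; the case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    # Hypothetical minimal user record; field names are illustrative only.
    email: str
    email_verified: bool


def is_instance_owner(user: Optional[User], owner_email: Optional[str]) -> bool:
    """Return True only for a signed-in user whose verified email matches OWNER_EMAIL."""
    if user is None or not owner_email:
        return False  # anonymous caller, or no operator configured
    if not user.email_verified:
        return False  # an unverified address never grants the operator role
    # Assumption: addresses are compared case-insensitively.
    return user.email.strip().lower() == owner_email.strip().lower()
```

Tree-level access checks would not consult this helper at all, which is one way to keep the operator role from becoming a privacy bypass.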
|
||||
|
||||
### 5.2 Data model (core entities)
|
||||
@@ -155,6 +156,7 @@ Acceptance criteria (AC) are written to be testable.
|
||||
|
||||
### 5.5 Privacy engine
|
||||
- Effective visibility = function(tree visibility, person override, living status, viewer role).
|
||||
- Tree visibility has four levels: **private** (members only; default), **unlisted** (anyone with the link, not listed/indexed), **site_members** (any authenticated instance user), and **public** (anonymous + listed/indexable).
|
||||
- Living-person rule: absent a death fact and within a configurable recency window (default ~100 years from birth, or unknown birth treated as possibly-living), non-owners see minimal or no PII.
|
||||
- Public/link views must render through the same privacy engine — no bypass path.
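The rules above can be sketched as a pure function. This is an illustration under stated assumptions, not the shipped privacy engine: the enum names, `can_view_tree` signature, and the three-way `Visibility` result are hypothetical, only the tree owner is treated as exempt (following the "non-owners" wording), and the owner's per-person override that reveals a living person is left out.

```python
from datetime import date
from enum import Enum
from typing import Optional


class TreeVisibility(Enum):
    PRIVATE = "private"
    UNLISTED = "unlisted"
    SITE_MEMBERS = "site_members"
    PUBLIC = "public"


class Viewer(Enum):
    ANONYMOUS = "anonymous"
    AUTHENTICATED = "authenticated"  # signed in, but not a member of the tree
    MEMBER = "member"                # non-owner role on the tree
    OWNER = "owner"


class Visibility(Enum):
    FULL = "full"
    REDACTED = "redacted"
    HIDDEN = "hidden"


def can_view_tree(tree: TreeVisibility, viewer: Viewer) -> bool:
    if viewer in (Viewer.OWNER, Viewer.MEMBER):
        return True
    if tree is TreeVisibility.PRIVATE:
        return False
    if tree is TreeVisibility.SITE_MEMBERS:
        return viewer is not Viewer.ANONYMOUS
    # UNLISTED (knowing the link is the gate) and PUBLIC are both readable.
    return True


def possibly_living(birth: Optional[date], has_death_fact: bool,
                    today: date, window_years: int = 100) -> bool:
    if has_death_fact:
        return False
    if birth is None:
        return True  # unknown birth is treated as possibly living
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    return age < window_years


def effective_visibility(tree: TreeVisibility, viewer: Viewer,
                         person_private: bool, living: bool) -> Visibility:
    if not can_view_tree(tree, viewer):
        return Visibility.HIDDEN
    if viewer is Viewer.OWNER:
        return Visibility.FULL
    if person_private:
        return Visibility.HIDDEN  # person-level override wins over tree setting
    if living:
        return Visibility.REDACTED
    return Visibility.FULL
```

Keeping the function free of I/O makes the full tree/viewer/person matrix cheap to test exhaustively.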
|
||||
|
||||
@@ -168,6 +170,7 @@ Acceptance criteria (AC) are written to be testable.
|
||||
|
||||
### 5.8 AI research assistant
|
||||
- Provider-agnostic abstraction over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models (e.g., an OpenAI-compatible endpoint or Ollama).
|
||||
- Operators register one or more model providers (env / registry); a tree owner then selects the active provider(s) for that tree via an owner-only AI settings surface.
|
||||
- Tool-mediated access to the same CRUD operations a user has, scoped to that user, via a server with explicitly scoped capabilities (an MCP-style tool boundary).
|
||||
- **Propose-then-confirm is mandatory.** The assistant drafts changes as diffs; nothing persists without explicit user approval.
|
||||
- Source connectors are a **plugin framework**; the project ships only legal sources (e.g., FamilySearch API, Find A Grave, WikiTree, BLM/GLO land patents, USGS maps, public-domain newspapers, public county records). Operator-supplied scrapers can be added later.
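As a rough illustration of the provider abstraction and registry described above, the sketch below resolves a provider by name, falling back to an instance default when a tree has made no choice. The `complete` method, `EchoProvider`, and `ProviderRegistry` are hypothetical names chosen for the example; raising on an unknown name (rather than silently falling back) is also an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's reply to a prompt (hypothetical method name)."""


class NullProvider(LLMProvider):
    # Stands in when no model is configured; never calls out anywhere.
    def complete(self, prompt: str) -> str:
        return ""


class EchoProvider(LLMProvider):
    # Placeholder for a real hosted or local provider implementation.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ProviderRegistry:
    def __init__(self, default_name: str) -> None:
        self._providers: Dict[str, LLMProvider] = {}
        self._default_name = default_name

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def resolve(self, tree_choice: Optional[str] = None) -> LLMProvider:
        # A tree's own selection wins; otherwise use the instance default.
        name = tree_choice or self._default_name
        try:
            return self._providers[name]
        except KeyError:
            raise LookupError(f"no model provider registered as {name!r}") from None
```

With this shape, an operator registers providers once at startup and a tree's AI settings only ever store a provider name.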
|
||||
@@ -181,6 +184,7 @@ Acceptance criteria (AC) are written to be testable.
|
||||
### 5.11 Administration & operations
|
||||
- All integration points (auth, SMTP, object storage, database, model providers, scrapers) are environment/config-driven.
|
||||
- Health endpoints; structured logs; a documented backup/restore procedure; safe upgrade via image pull + migration.
|
||||
- Owner-only operator surface: instance status and configuration (`GET /api/v1/admin/instance` and the `/admin` UI), scoped to the instance owner and exposing no tree contents or PII.
|
||||
|
||||
## 6. Non-functional requirements
|
||||
|
||||
@@ -206,17 +210,19 @@ Acceptance criteria (AC) are written to be testable.
|
||||
|
||||
Provenance ships continuously and is stood up in a live lab as it goes; there is no hard MVP/v2 line, but features land in dependency order so each tranche is usable.
|
||||
|
||||
- **Phase 0 — Foundation:** backend + DB schema; local auth + email verify; frontend scaffold; container images; CI/CD (Gitea Actions → Gitea registry → server pull); one-command compose deploy.
|
||||
- **Phase 1 — Core tree:** people, relationships, events; sources & citations; media uploads; soft delete + recovery; tree-level privacy.
|
||||
- **Phase 2 — Standards & polish:** GEDCOM 7 import/export; search with fuzzy names; living-person protection; person-level privacy override; onboarding + persona selector.
|
||||
- **Phase 0 — Foundation:** *(shipped)* backend + DB schema; local auth + email verify; frontend scaffold; container images; CI/CD (Gitea Actions → Gitea registry → server pull); one-command compose deploy.
|
||||
- **Phase 1 — Core tree:** *(shipped)* people, relationships, events; sources & citations; media uploads; soft delete + recovery; tree-level privacy (now four levels: private/unlisted/site_members/public).
|
||||
- **Phase 2 — Standards & polish:** *(partly shipped — GEDCOM 7 import/export #232; fuzzy/trigram search)* GEDCOM 7 import/export; search with fuzzy names; living-person protection; person-level privacy override; onboarding + persona selector.
|
||||
- **Phase 3 — Property:** property entity; ownership events; chain-of-title view; property-aware sources.
|
||||
- **Phase 4 — AI assistant:** provider abstraction (hosted + local); scraper plugin framework; first connectors (FamilySearch, Find A Grave); propose-diff approval flow; assistant actions in audit log.
|
||||
- **Phase 5 — Federated auth:** OIDC (Authentik), then Google/Apple/Facebook sign-in.
|
||||
- **Phase 6 — Collaboration:** tree co-owners; audit-log UI; direct messaging; notifications.
|
||||
- **Phase 4 — AI assistant:** *(partly shipped early — provider abstraction + multi-provider registry #235/#237; ChangeProposal propose-then-confirm #236)* provider abstraction (hosted + local); scraper plugin framework; first connectors (FamilySearch, Find A Grave); propose-diff approval flow; assistant actions in audit log.
|
||||
- **Phase 5 — Federated auth:** *(not shipped — only the `AuthProvider` ABC exists)* OIDC (Authentik), then Google/Apple/Facebook sign-in.
|
||||
- **Phase 6 — Collaboration:** *(tree membership #233 landed early)* tree co-owners; audit-log UI; direct messaging; notifications.
|
||||
- **Phase 7 — Cross-tree hints:** async matching engine (embeddings-assisted); anonymous match notifications; mutual-consent reveal.
|
||||
- **Phase 8 — Land sources:** BLM/GLO patents; USGS map integration; additional county-deed connectors (merge existing scrapers).
|
||||
- **Phase 9 — Hardening & dogfooding** toward a possible hosted offering.
|
||||
|
||||
**Shipped ahead of sequence (operations & platform):** instance-owner/operator role (#240); operator backup tooling (#234); a schema-drift guard (#239). These landed early because the live lab deployment needed them. Note that despite their later issue numbers, **Phase 5 federated auth/OIDC is not yet shipped** — only the `AuthProvider` ABC is in place.
|
||||
|
||||
Rationale: enabling work (schema, auth, deploy, sources) precedes everything; GEDCOM lands before the assistant so AI writes target a stable model; property follows a well-tested people graph; hints come late because they require multiple populated trees.
|
||||
|
||||
## 9. Technical direction (summary)
|
||||
|
||||
@@ -1,6 +1,8 @@
|
||||
# Design note: ChangeProposal (propose-then-confirm)
|
||||
|
||||
Status: **in progress**. Implements non-negotiable #1 (CLAUDE.md): *the AI
|
||||
Status: **Shipped (#214/#236)** — model, service, API, and review UI landed; the
|
||||
assistant producer and cross-op transactional apply remain as follow-ups (see
|
||||
Out of scope). Implements non-negotiable #1 (CLAUDE.md): *the AI
|
||||
assistant never writes autonomously.* Every assistant "write" emits a
|
||||
**ChangeProposal** — a structured diff a human approves, edits, or rejects.
|
||||
|
||||
@@ -63,7 +65,9 @@ is a follow-up (it needs the services to accept a no-commit mode).
|
||||
- `apply(session, *, actor, tree, proposal_id, edited_operations=None) -> ChangeProposal`
|
||||
— editor-only. Optional `edited_operations` lets the reviewer tweak the diff
|
||||
before applying ("edit" in approve/edit/reject). Dispatches each op through the
|
||||
editing services; on any failure, rolls back and records `apply_error`.
|
||||
editing services; on failure it records `apply_error` and leaves the proposal
|
||||
pending — it does **not** roll back ops already committed by earlier dispatches
|
||||
(v1 is not cross-op transactional; see Data model).
|
||||
- `reject(session, *, actor, tree, proposal_id, note=None)` — editor-only.
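The v1 failure behaviour of `apply` can be illustrated with a simplified stand-in. This is not the real service function (which takes `session`, `actor`, `tree`, and `proposal_id`); `apply_sketch`, the `dispatch` callable, and the `"applied"` status value are assumptions made for the example. It shows only that a failing operation records `apply_error`, leaves the proposal pending, and does not undo operations that earlier dispatches already committed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Operation = Dict[str, object]


@dataclass
class ChangeProposal:
    operations: List[Operation]
    status: str = "pending"
    apply_error: Optional[str] = None


def apply_sketch(
    proposal: ChangeProposal,
    dispatch: Callable[[Operation], None],
    edited_operations: Optional[List[Operation]] = None,
) -> ChangeProposal:
    # The reviewer's edited diff, when supplied, replaces the proposed operations.
    ops = edited_operations if edited_operations is not None else proposal.operations
    proposal.apply_error = None
    for index, op in enumerate(ops):
        try:
            dispatch(op)  # each dispatch commits independently in this sketch
        except Exception as exc:
            # No cross-op rollback: ops before `index` stay committed.
            proposal.apply_error = f"operation {index} failed: {exc}"
            return proposal  # status remains "pending"
    proposal.status = "applied"
    return proposal
```

A caller that sees `apply_error` set should therefore assume a partially applied diff until cross-op transactional apply lands.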
|
||||
|
||||
## API
|
||||
|
||||
@@ -1,11 +1,11 @@
|
||||
# Design note: tree visibility & the public viewing surface
|
||||
|
||||
Status: **proposed** (design only — no code yet). Owner: Justin. Created 2026-06-09.
|
||||
Status: **Shipped (#41-#51)**. Owner: Justin. Created 2026-06-09.
|
||||
|
||||
This is a privacy-critical change (it creates the first anonymous read surface in
|
||||
Provenance). Per CLAUDE.md, design before code. Implementation should land in
|
||||
small, individually-reviewable PRs, with tests on the privacy engine and the
|
||||
public read path before any anonymous endpoint is exposed.
|
||||
This is a privacy-critical change (it created the first anonymous read surface in
|
||||
Provenance). Per CLAUDE.md, it was designed before code and shipped in small,
|
||||
individually-reviewable PRs, with tests on the privacy engine and the public read
|
||||
path landing before any anonymous endpoint was exposed.
|
||||
|
||||
## 1. The model
|
||||
|
||||
@@ -74,13 +74,12 @@ logged-in non-member; `private` denies both.
|
||||
|
||||
## 4. The anonymous read path (the careful part)
|
||||
|
||||
**Recommendation: a dedicated read-only public API namespace**, not optional-auth
|
||||
on the existing endpoints. Rationale: it is far easier to audit a small,
|
||||
purpose-built surface that *always* funnels through `person_visibility` than to
|
||||
weaken the membership checks on the authenticated endpoints and hope every branch
|
||||
is covered.
|
||||
**Shipped: a dedicated read-only public API namespace**, not optional-auth on the
|
||||
existing endpoints. Rationale: it is far easier to audit a small, purpose-built
|
||||
surface that *always* funnels through `person_visibility` than to weaken the
|
||||
membership checks on the authenticated endpoints and hope every branch is covered.
|
||||
|
||||
- New router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
|
||||
- Router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
|
||||
**optional-auth** dependency `CurrentUserOrNone` (returns `User | None`; never
|
||||
401s). Contrast with `CurrentUser` (`deps.py:30-36`), which hard-401s.
|
||||
- Endpoints (read-only; no create/update/delete):
|
||||
@@ -88,14 +87,20 @@ is covered.
|
||||
lists `site_members` when the caller is authenticated. Paginated, search via
|
||||
existing `pg_trgm`. Never lists `unlisted`/`private`.
|
||||
- `GET /public/trees/{id}` — tree metadata if `can_view_tree(user_or_none)`.
|
||||
- `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/relationships`,
|
||||
`/events`, `/media`, … — each filtered through `person_visibility`, returning
|
||||
redacted projections (a `PublicPersonRead` that omits PII for redacted people:
|
||||
no exact dates, no living-person names beyond "Living", etc.).
|
||||
- **A redacted response schema**, distinct from the member `PersonRead`, so the
|
||||
serializer physically cannot emit fields a non-member shouldn't see. Redaction
|
||||
happens in the service, not the route.
|
||||
- **Rate limiting** on the public namespace (per-IP) to blunt scraping/enumeration.
|
||||
- `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/persons/{pid}/names`,
|
||||
`/relationships`, `/events` — each filtered through `person_visibility`.
|
||||
(Media is not exposed on the public surface yet — deferred.)
|
||||
- **Redaction happens in the service, before serialization** — this is the safety
|
||||
guarantee. It did **not** ship as a separate `PublicPersonRead` schema (that
|
||||
recommendation was not adopted): the public router **reuses the member read
|
||||
schemas** (`PersonRead`, `RelationshipRead`, `EventRead`, `NameRead`), and only
|
||||
the tree projection (`PublicTreeRead`) is distinct. Safety comes from
|
||||
`public_view_service` resolving `person_visibility` and then **dropping hidden
|
||||
rows and redacting possibly-living people** (`person_service._redact` rewrites
|
||||
the name to "Living person", etc.) *before* a row is ever validated into a
|
||||
schema. No route hands a raw row to the serializer.
|
||||
- **Rate limiting** on the public namespace (per-IP) is **deferred** — it is not
|
||||
implemented in the app and may be handled at the Caddy edge if needed.
|
||||
- **Audit**: count public reads; do not log PII.
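The "redact in the service, before serialization" guarantee can be sketched as follows. This is a simplified stand-in, not `public_view_service` itself: `PersonRow`, `redact`, and `public_persons` are hypothetical names, the `visibility` field stands for whatever `person_visibility` resolved, and building a plain dict stands in for validating a row into a read schema.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PersonRow:
    id: str
    name: str
    birth_date: Optional[str]
    visibility: str  # "full", "redacted", or "hidden", as resolved upstream


def redact(row: PersonRow) -> PersonRow:
    # Stand-in for the service-level redaction: rewrite the name, drop dates.
    return replace(row, name="Living person", birth_date=None)


def public_persons(rows: Iterable[PersonRow]) -> List[dict]:
    out: List[dict] = []
    for row in rows:
        if row.visibility == "hidden":
            continue  # hidden rows never reach the serializer
        if row.visibility == "redacted":
            row = redact(row)
        # Serialization happens only after filtering and redaction.
        out.append({"id": row.id, "name": row.name, "birth_date": row.birth_date})
    return out
```

Because the serializer only ever receives rows that have already been filtered and rewritten, reusing the member read schemas does not by itself widen what a non-member can see.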
|
||||
|
||||
## 5. Frontend public pages
|
||||
@@ -103,8 +108,12 @@ is covered.
|
||||
- New **server-rendered** routes outside the authed app shell, e.g.
|
||||
`/p/[treeId]` (tree), `/p/[treeId]/[personId]` (person), `/explore` (directory).
|
||||
Server components fetch the `/api/v1/public/*` endpoints; no login redirect.
|
||||
- `robots`: allow + sitemap for `public`; `noindex, nofollow` meta for `unlisted`
|
||||
and `site_members`. Sitemap lists only `public` trees/persons.
|
||||
- `robots`: ships a coarse `allow: ["/", "/p/"]` rule (`frontend/app/robots.ts`)
|
||||
that keeps the authed app out of the index. Per-tree `noindex, nofollow` meta
|
||||
for `unlisted`/`site_members` and a `public`-only **sitemap** did **not** ship —
|
||||
both are **deferred** follow-ups (per-tree noindex needs server rendering;
|
||||
meanwhile `unlisted`/`site_members` trees aren't linked or listed, so they
|
||||
aren't crawl-discoverable).
|
||||
- The directory `/explore` is anonymous for `public`; shows `site_members` trees
|
||||
only to logged-in users.
|
||||
- Reuse the tree/person view components where possible, fed by the redacted
|
||||
@@ -131,7 +140,9 @@ anyone on the web. Living people stay hidden.") is worthwhile given the stakes.
|
||||
output. No raw repository reads in the public router.
|
||||
- Living-person protection holds regardless of tree visibility.
|
||||
- Unlisted relies on UUID unguessability; never expose a sequential public id.
|
||||
- `noindex` everything except `public`; sitemap is `public`-only.
|
||||
- Per-tree `noindex` (everything except `public`) and a `public`-only sitemap are
|
||||
**deferred** (see §5); today `robots.ts` keeps the authed app out of the index
|
||||
and `unlisted`/`site_members` trees aren't linked or listed.
|
||||
- Tests gate the merge: privacy-engine matrix + an integration test that hits the
|
||||
public endpoints anonymously and asserts no living-person PII leaks.
|
||||
|
||||
|
||||