docs: bring all documentation current with shipped work

A multi-agent audit of every doc against the code surfaced ~50 stale/missing items (the roadmap/status docs and the backlog had fallen behind the code). This catches them up:

- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 + chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider tech-stack entry; updated repo-layout (integrations objectstore/models, deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels; corrected to the shipped 4 (adds site_members). Documented per-tree AI policy on Tree, LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/operations, verified-email session gate, instance-owner role, schema-drift guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11), per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common subset; noted backups + instance-owner admin; moved property/land to an explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider abstraction, GEDCOM citation export, membership management, operator backup, email-verification gate, per-tree AI policy, instance owner, the whole visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and reconciled the executive summary, "current defects" list, quick wins, and differentiators. Left genuinely-open items (citation/source redaction, sitemap, per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected the redaction approach (reuses member schemas, not a separate PublicPersonRead) and the apply() rollback claim (v1 is not cross-op transactional), and marked rate-limiting/sitemap/noindex as deferred.

No code changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-10 21:05:29 -04:00
parent 0388b9b99f
commit 447daf7fa8
8 changed files with 135 additions and 96 deletions
@@ -69,7 +69,7 @@ Layered, dependency pointing inward:
 - **Service layer** — all domain logic and the only place writes happen. Enforces invariants (e.g., "a write must carry an actor for the audit log"). The privacy engine is invoked here on every read.
 - **Repository layer** — data access over SQLAlchemy; no business rules.
 - **Domain models** — the entities in §5.
- **Integrations** — adapters behind interfaces: `AuthProvider`, `ObjectStore`, `Mailer`, `ModelProvider`, `SourceConnector`, `Queue`. Swapping an implementation is a config change, not a code change.
+- **Integrations** — adapters behind interfaces: `AuthProvider`, `ObjectStore`, `Mailer`, `LLMProvider` / `EmbeddingProvider` (two separate model abstractions), `SourceConnector`, `Queue`. Swapping an implementation is a config change, not a code change.

 Async throughout (FastAPI + async SQLAlchemy). Anything that can be slow or can fail externally (model calls, scraping, large imports) goes to the worker, never inline in a request.

@@ -87,7 +87,7 @@ Core entities and the important relationships. (Illustrative, not final DDL.)

 ### Tenancy & identity
 - **User** — a person with login. Auth method(s) are attached but identity is internal, so one user can link multiple providers.
- **Tree** — the top-level tenant boundary for genealogical data. Owned by a User; may have additional members.
+- **Tree** — the top-level tenant boundary for genealogical data. Owned by a User; may have additional members. Carries a per-tree **AI model policy** (owner-configured): `ai_member_provider` and `ai_recommender_provider` name configured providers from the model-provider registry (null = no model for that role); the owner may use any configured provider, while these cap what members and the recommender may use. Set via the owner-only `GET`/`PATCH /trees/{id}/ai`.
 - **TreeMembership** — (User, Tree, role) where role ∈ {owner, editor, viewer}. The basis for authorization *within a tree*.
 - **Instance owner / operator** — orthogonal to tree roles. The account(s) whose email is named in the `OWNER_EMAIL` env var **and whose email is verified** are the instance's operator(s), with access to the owner-only `/api/v1/admin` surface (operational status, instance-wide config). Derived from the env at request time — no DB column, no migration, can't drift, survives DB resets. The verified-email requirement is deliberate: registration is open, so without it whoever registers the owner address first would seize the role — verification ties ownership to proven control of the inbox. Crucially this is **not** a privacy bypass: an instance owner gets operational/config rights, **not** read access to other users' private trees or living-person PII — those still resolve only through the privacy engine. (`is_instance_owner` in `api/deps.py`.)
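
Review note: the env-derived owner check described in the bullet above could be sketched roughly as follows. Only `OWNER_EMAIL` and the name `is_instance_owner` come from the doc; the `User` dataclass, the `owner_emails` helper, the comma-separated parsing, and the case-insensitive comparison are assumptions made for illustration, not the code in `api/deps.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical stand-in for the real User model."""
    email: str
    email_verified: bool


def owner_emails(env=None) -> set[str]:
    # Assumption: OWNER_EMAIL holds one address or a comma-separated list.
    env = os.environ if env is None else env
    raw = env.get("OWNER_EMAIL", "")
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_instance_owner(user: User, env=None) -> bool:
    # Evaluated from the environment on every call: no DB column to drift.
    # Both conditions must hold: the address matches AND it is verified,
    # so registering the owner address first is not enough to seize the role.
    return user.email_verified and user.email.lower() in owner_emails(env)
```

Passing `env` explicitly keeps the check testable without touching the process environment. Note that the function answers only "is this an operator?"; it deliberately has no bearing on what the privacy engine returns.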

@@ -109,7 +109,7 @@ Core entities and the important relationships. (Illustrative, not final DDL.)
 ### Cross-cutting
 - **AuditEntry** — append-only: actor (User *or* the assistant principal acting for a User), action, entity, before/after snapshot, timestamp. Immutable.
 - **SoftDelete** — entities carry `deleted_at`; a scheduled worker purges rows older than 30 days. Recovery = clearing `deleted_at` within the window.
- **ChangeProposal** — a pending set of writes generated by the assistant (or potentially a collaborator suggestion later): a structured diff the user approves, edits, or rejects. Approved proposals are applied through the normal service layer (so they hit the privacy engine and the audit log like any other write).
+- **ChangeProposal** — a pending set of writes: records an `origin` (`assistant` | `contributor` — collaborator suggestions are encoded today, not just a future idea), a `status` (pending/applied/rejected), a structured `operations` diff (JSONB list of `{op, entity_type, entity_id?, payload}`), a summary/rationale, and review/apply-error metadata. The user approves, edits, or rejects; approved proposals are applied through the normal service layer (so they hit the privacy engine and audit log like any other write). *Note: v1 apply is not cross-op transactional — see `docs/design/change-proposal.md`.*
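
Review note: to make the "not cross-op transactional" caveat concrete, here is a minimal sketch of the proposal shape and a v1-style apply loop. The field names `origin`, `status`, and `operations` and the `{op, entity_type, entity_id?, payload}` shape come from the added line; the dataclasses, the `apply_error` attribute name, the `apply_proposal` function, and the choice to leave a failed proposal `pending` are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Operation:
    op: str                          # verb names are not specified in the doc
    entity_type: str
    payload: dict[str, Any]
    entity_id: Optional[str] = None  # optional, as in `entity_id?`


@dataclass
class ChangeProposal:
    origin: str                      # "assistant" | "contributor"
    operations: list[Operation]
    status: str = "pending"          # pending / applied / rejected
    apply_error: Optional[str] = None  # hypothetical name for the error metadata


def apply_proposal(proposal: ChangeProposal,
                   apply_op: Callable[[Operation], None]) -> int:
    """Apply operations in order and return how many succeeded.

    Not cross-op transactional: if operation N fails, operations 0..N-1
    stay applied and the error is recorded on the proposal.
    """
    if proposal.status != "pending":
        raise ValueError("only pending proposals can be applied")
    applied = 0
    for operation in proposal.operations:
        try:
            apply_op(operation)  # stands in for a normal service-layer write
        except Exception as exc:
            proposal.apply_error = f"operation {applied} failed: {exc}"
            return applied
        applied += 1
    proposal.status = "applied"
    return applied
```

The point of the sketch is the failure path: a reviewer reading `apply_error` has to assume some earlier operations already landed.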

 ## 6. Privacy engine

@@ -119,11 +119,12 @@ A single function conceptually:
 visible(viewer, entity) -> { full | redacted | hidden }
 ```

-Inputs: viewer's role on the entity's Tree (including "anonymous"), the Tree's visibility (public/unlisted/private), per-Person privacy override, and living-person status.
+Inputs: viewer's role on the entity's Tree (including "anonymous"), the Tree's visibility (public / site_members / unlisted / private), per-Person privacy override, and living-person status.

 Rules:
 - **Tree private** → only members see anything.
- **Tree public/unlisted** → non-members get a read view, *but* every Person is run through the living-person check and per-person override first.
+- **Tree site_members** → any authenticated account on this instance gets a read view (anonymous viewers get nothing), still per-person living/override filtered.
+- **Tree unlisted / public** → non-members *including anonymous viewers* get a read view, *but* every Person is run through the living-person check and per-person override first. Unlisted is gated only by knowing the link (never listed or search-indexed); public is listed in `/explore` and indexable.
 - **Living-person rule** — a Person with no death fact, whose birth is within a configurable recency window (default ~100 years; unknown birth treated as possibly-living), is redacted (name minimized, vitals/events/media hidden) for non-owners. Owners may override per Person.
 - The engine is invoked in the **service layer**, so it covers API, server-rendered public pages, search results, and any data the assistant can read. There is intentionally no path that returns rows without passing through it.
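
Review note: the four-level rules above can be read as a small decision function. This is a sketch under stated assumptions, not the shipped engine: the `Viewer` and `Person` shapes, the `possibly_living` helper, the representation of the per-person override as an optional verdict, and the order in which override and living-person checks run are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(str, Enum):
    PUBLIC = "public"
    SITE_MEMBERS = "site_members"
    UNLISTED = "unlisted"
    PRIVATE = "private"


class Verdict(str, Enum):
    FULL = "full"
    REDACTED = "redacted"
    HIDDEN = "hidden"


@dataclass
class Viewer:
    authenticated: bool
    role: Optional[str] = None  # "owner" | "editor" | "viewer"; None = non-member


@dataclass
class Person:
    birth_year: Optional[int]
    has_death_fact: bool
    override: Optional[Verdict] = None  # assumed shape of the owner's override


def possibly_living(person: Person, current_year: int, window_years: int = 100) -> bool:
    if person.has_death_fact:
        return False
    if person.birth_year is None:
        return True  # unknown birth is treated as possibly living
    return current_year - person.birth_year <= window_years


def visible(viewer: Viewer, tree: Visibility, person: Person, current_year: int) -> Verdict:
    # Tree-level gate for non-members.
    if viewer.role is None:
        if tree is Visibility.PRIVATE:
            return Verdict.HIDDEN
        if tree is Visibility.SITE_MEMBERS and not viewer.authenticated:
            return Verdict.HIDDEN
    # The living-person rule applies to non-owners only.
    if viewer.role == "owner":
        return Verdict.FULL
    if person.override is not None:
        return person.override
    if possibly_living(person, current_year):
        return Verdict.REDACTED
    return Verdict.FULL
```

In this sketch unlisted and public behave identically, because the difference the doc describes (listing in `/explore` and indexability) is about discovery rather than about what a viewer holding the link may read.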

@@ -131,7 +132,7 @@ Rules:

 Three parts, deliberately separated:

-1. **Model provider abstraction** (`ModelProvider`) — one interface over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models via an OpenAI-compatible endpoint or Ollama. Configurable per deployment; keys supplied by the operator (this deployment) or by the user (BYO-key deployments).
+1. **Model provider abstraction** — two separate interfaces, `LLMProvider` and `EmbeddingProvider` (configured independently — e.g. Anthropic has no embeddings endpoint), over hosted models (Anthropic, OpenAI, xAI) and self-hosted/local models via an OpenAI-compatible endpoint or Ollama. An operator can configure **several providers at once** through a registry (`build_llm_providers()`/`configured_llm_providers()`), each selectable by name — the basis for the per-tree AI policy and the `default_llm_provider`/`default_embedding_provider` settings. Keys supplied by the operator (this deployment) or by the user (BYO-key deployments).
 2. **Scoped tool surface** — the assistant can only act through a constrained set of tools that map to service-layer operations, **scoped to the user it is helping.** It is its own principal: it cannot exceed that user's rights, and every action is attributed to "assistant (on behalf of User X)" in the audit log. This is the MCP-style boundary referenced in the PRD — the assistant gets capabilities, not raw database access.
 3. **Source connectors** (`SourceConnector`) — a plugin framework for *reading* external data: FamilySearch API, Find A Grave, WikiTree, BLM/GLO land patents, USGS maps, public-domain newspapers, public county records. Only legally permissible sources ship with the project; operators can add their own. Connectors are read-only and rate-limited, and run in the worker.
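
Review note: for item 1, a rough sketch of how a name-keyed registry and the per-tree AI policy might combine. `LLMProvider`, `ai_member_provider`, and `ai_recommender_provider` are names from the doc; `EchoProvider`, `TreeAIPolicy`, `resolve_provider`, the `complete` method, and the role strings are hypothetical, and the real `build_llm_providers()` signature is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class LLMProvider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Toy provider used only to make the sketch runnable."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class TreeAIPolicy:
    ai_member_provider: Optional[str] = None       # None = no model for members
    ai_recommender_provider: Optional[str] = None  # None = no model for recommender


def resolve_provider(
    registry: dict[str, LLMProvider],
    policy: TreeAIPolicy,
    role: str,
    requested: Optional[str] = None,
    default: Optional[str] = None,
) -> Optional[LLMProvider]:
    if role == "owner":
        name = requested or default   # owner may pick any configured provider
    elif role == "member":
        name = policy.ai_member_provider       # policy wins over any request
    elif role == "recommender":
        name = policy.ai_recommender_provider
    else:
        raise ValueError(f"unknown role: {role}")
    if name is None:
        return None
    return registry.get(name)  # None if the named provider is not configured
```

Returning `None` rather than raising lets the caller degrade to "AI unavailable for this role" when the owner has not opted in.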

@@ -149,7 +150,8 @@ Three parts, deliberately separated:
 - `AuthProvider` interface with implementations for **local** (password + email verification/reset), **OIDC** (validated against Authentik; expected to work with Keycloak, Auth0, etc.), and **social** (Google, Apple, Facebook).
 - Operators enable any subset via config. This deployment will use Authentik (`auth.jpaul.io`) plus selected social providers; a bare self-hoster can run local-only.
 - Sessions are backend-issued; the assistant principal is minted per-session and scoped to the acting user.
- *Status:* **local auth has landed** — Argon2id password hashing, opaque backend-issued sessions (only the token hash is stored; presented as a Bearer token or HttpOnly cookie), and email verification + password reset via the `Mailer` interface (console in dev, SMTP for operators). OIDC and social providers are Phase 5. Every write records an attributable actor in the audit log.
+- *Status:* **local auth has landed** — Argon2id password hashing, opaque backend-issued sessions (only the token hash is stored; presented as a Bearer token or HttpOnly cookie), and email verification + password reset via the `Mailer` interface (console in dev, SMTP for operators). An opt-in gate (`REQUIRE_EMAIL_VERIFICATION`, default off so SMTP-less self-hosts and pre-existing accounts aren't locked out) refuses sessions for accounts without a verified email — login is denied and existing sessions stop resolving until the address is verified. OIDC and social providers are Phase 5. Every write records an attributable actor in the audit log.
+- **Instance owner / operator** (orthogonal to the per-tree roles): the account(s) whose email is in `OWNER_EMAIL` *and* is verified are the instance operator(s), with the owner-only `/api/v1/admin` surface (operational status, instance-wide config). Derived from the env at request time — no DB column. It is an operator/config role, **not** a privacy bypass: it grants no read access to other users' private trees or living-person PII. (`is_instance_owner` in `api/deps.py`.)

 ## 10. Search

@@ -176,20 +178,20 @@ Jobs are idempotent and retryable; an external failure degrades gracefully rathe
 - Tag scheme: `test-main` (current main), `test-sha-<long>` (rollback pins), the component version, and `latest` on `v*` tags.
 - Servers **pull** new images to deploy — no build on the host. The deploy compose references `git.jpaul.io/justin/provenance-{backend,frontend}:${IMAGE_TAG:-test-main}`; `docker-compose.dev.yml` is a local-build override.
 - **Caddy** terminates TLS and reverse-proxies frontend + backend. **Cloudflare Tunnel** is the preferred ingress (no open inbound ports) but is never required; a plain Caddy-on-a-public-host deployment is equally supported.
- **Configuration** is entirely environment-driven (twelve-factor). One `.env` plus the compose file is enough to stand up a deployment.
- **Migrations** run on backend start (or via an explicit job) so an image pull + restart is a complete upgrade.
- **Backups:** documented procedure for Postgres dump + object-store sync; restore is the inverse.
+- **Configuration** is entirely environment-driven (twelve-factor). One `.env` plus the compose file is enough to stand up a deployment; the backend/worker/migrate services read it via `env_file`, so every setting in `app/core/config.py` is configurable without a compose edit.
+- **Migrations** run on backend start (`RUN_MIGRATIONS=1`) and via a one-shot `migrate` compose service, so an image pull + restart is a complete upgrade. A **schema-drift guard** (defense in depth) makes a half-applied deploy loud rather than a silent storm of 500s: `/health/ready` returns 503 and startup logs a CRITICAL `SCHEMA DRIFT` line when the DB's `alembic_version` is behind the heads baked into the image (`app/core/schema_version.py`).
+- **Backups:** a one-command operator script (`deploy/backup.sh` — `pg_dump` + MinIO object sync, see `deploy/BACKUP.md`) plus a per-account ZIP export; restore is the inverse.

 **Repository layout (as scaffolded):**

 ```
-/backend           # FastAPI, uv-managed. app/{api/v1, services (+privacy), repositories, models, schemas, integrations (auth/mailer), core}; migrations/ = Alembic
-/deploy            # docker-compose.yml, Caddyfile, .env.example
+/backend           # FastAPI, uv-managed. app/{api/v1, services (+privacy), repositories, models, schemas, integrations (auth, mailer, objectstore, models = LLM/embedding providers), core}; migrations/ = Alembic
+/deploy            # docker-compose.yml (+ docker-compose.dev.yml), Caddyfile, .env.example, backup.sh + BACKUP.md
 /.gitea/workflows  # Gitea Actions: build images → Gitea registry
 /frontend          # Next.js (App Router, TS, Tailwind). app/ pages, lib/api (openapi-typescript client), components/ui, Dockerfile (standalone)
 ```

-The compose stack runs `postgres` (pgvector image — includes `pgvector`; `pg_trgm` ships in contrib), `minio`, `backend`, and `caddy`. The **worker** container (same image as backend, worker mode) joins once queue-driven jobs exist. Phase 0 ships a minimal backend with `/health` (liveness) and `/health/ready` (Postgres reachability) to validate the deploy wiring before the data model lands.
+The compose stack runs `postgres` (pgvector image — includes `pgvector`; `pg_trgm` ships in contrib), `minio`, a one-shot `migrate` job, `backend`, the **worker** (same image as backend, worker mode — runs the scheduled soft-delete purge), `caddy`, and an optional `cloudflared` tunnel. The backend exposes `/health` (liveness) and `/health/ready` (Postgres reachability + schema-drift check).
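
Review note: the readiness behaviour described for `/health/ready` might reduce to a check like the one below. This is a simplified sketch: the `readiness` function, its inputs, and the response bodies are hypothetical, and comparing revision sets flags any mismatch (it cannot distinguish a DB that is behind from one that is ahead without walking the Alembic revision graph, which the real guard in `app/core/schema_version.py` may do).

```python
def readiness(
    db_reachable: bool,
    db_versions: set[str],
    image_heads: set[str],
) -> tuple[int, dict]:
    """Return an (HTTP status, body) pair for a readiness probe.

    db_versions: revision ids currently stored in the alembic_version table.
    image_heads: head revision ids baked into the running image.
    """
    if not db_reachable:
        return 503, {"status": "db_unreachable"}
    missing = image_heads - db_versions
    if missing:
        # A half-applied deploy fails the probe loudly instead of serving 500s.
        return 503, {"status": "schema_drift", "missing": sorted(missing)}
    return 200, {"status": "ready"}
```

Keeping the comparison a pure function of two sets makes it easy to exercise in tests without a database.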

 ## 13. Observability