docs: bring all documentation current with shipped work

A multi-agent audit of every doc against the code surfaced ~50 stale/missing
items (the roadmap/status docs and the backlog had fallen behind the code).
This catches them up:

- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 +
  chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider
  tech-stack entry; updated repo-layout (integrations objectstore/models,
  deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels — corrected to
  the shipped 4 (adds site_members); documented per-tree AI policy on Tree,
  LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/
  operations, verified-email session gate, instance-owner role, schema-drift
  guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11),
  per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header
  date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common
  subset; noted backups + instance-owner admin; moved property/land to an
  explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider
  abstraction, GEDCOM citation export, membership management, operator backup,
  email-verification gate, per-tree AI policy, instance owner, the whole
  visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and
  reconciled the executive summary, "current defects" list, quick wins, and
  differentiators. Left genuinely-open items (citation/source redaction, sitemap,
  per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker
  purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected
  the redaction approach (reuses member schemas, not a separate PublicPersonRead)
  and the apply() rollback claim (v1 is not cross-op transactional), and marked
  rate-limiting/sitemap/noindex as deferred.

No code changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-10 21:05:29 -04:00
parent 0388b9b99f
commit 447daf7fa8
8 changed files with 135 additions and 96 deletions
+33 -22
@@ -1,11 +1,11 @@
# Design note: tree visibility & the public viewing surface
Status: **proposed** (design only — no code yet). Owner: Justin. Created 2026-06-09.
Status: **Shipped (#41-#51)**. Owner: Justin. Created 2026-06-09.
This is a privacy-critical change (it creates the first anonymous read surface in
Provenance). Per CLAUDE.md, design before code. Implementation should land in
small, individually-reviewable PRs, with tests on the privacy engine and the
public read path before any anonymous endpoint is exposed.
This is a privacy-critical change (it created the first anonymous read surface in
Provenance). Per CLAUDE.md, it was designed before code and shipped in small,
individually-reviewable PRs, with tests on the privacy engine and the public read
path landing before any anonymous endpoint was exposed.
## 1. The model
@@ -74,13 +74,12 @@ logged-in non-member; `private` denies both.
## 4. The anonymous read path (the careful part)
**Recommendation: a dedicated read-only public API namespace**, not optional-auth
on the existing endpoints. Rationale: it is far easier to audit a small,
purpose-built surface that *always* funnels through `person_visibility` than to
weaken the membership checks on the authenticated endpoints and hope every branch
is covered.
**Shipped: a dedicated read-only public API namespace**, not optional-auth on the
existing endpoints. Rationale: it is far easier to audit a small, purpose-built
surface that *always* funnels through `person_visibility` than to weaken the
membership checks on the authenticated endpoints and hope every branch is covered.
- New router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
- Router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
**optional-auth** dependency `CurrentUserOrNone` (returns `User | None`; never
401s). Contrast with `CurrentUser` (`deps.py:30-36`) which hard-401s.
- Endpoints (read-only; no create/update/delete):
@@ -88,14 +87,20 @@ is covered.
lists `site_members` when the caller is authenticated. Paginated, search via
existing `pg_trgm`. Never lists `unlisted`/`private`.
- `GET /public/trees/{id}` — tree metadata if `can_view_tree(user_or_none)`.
- `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/relationships`,
`/events`, `/media`, … — each filtered through `person_visibility`, returning
redacted projections (a `PublicPersonRead` that omits PII for redacted people:
no exact dates, no living-person names beyond "Living", etc.).
- **A redacted response schema**, distinct from the member `PersonRead`, so the
serializer physically cannot emit fields a non-member shouldn't see. Redaction
happens in the service, not the route.
- **Rate limiting** on the public namespace (per-IP) to blunt scraping/enumeration.
- `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/persons/{pid}/names`,
`/relationships`, `/events` — each filtered through `person_visibility`.
(Media is not exposed on the public surface yet — deferred.)
- **Redaction happens in the service, before serialization** — this is the safety
guarantee. It did **not** ship as a separate `PublicPersonRead` schema (that
recommendation was not adopted): the public router **reuses the member read
schemas** (`PersonRead`, `RelationshipRead`, `EventRead`, `NameRead`), and only
the tree projection (`PublicTreeRead`) is distinct. Safety comes from
`public_view_service` resolving `person_visibility` and then **dropping hidden
rows and redacting possibly-living people** (`person_service._redact` rewrites
the name to "Living person", etc.) *before* a row is ever validated into a
schema. No route hands a raw row to the serializer.
- **Rate limiting** on the public namespace (per-IP) is **deferred** — it is not
implemented in the app and may be handled at the Caddy edge if needed.
- **Audit**: count public reads; do not log PII.
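As a rough illustration of the "redact in the service, before serialization" guarantee described above, the sketch below drops hidden rows and rewrites possibly-living people before anything reaches a read schema. It is not the shipped `public_view_service` or `person_service._redact`; `PersonRow`, `Visibility`, the field names, and the simplified `person_visibility` rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Visibility(Enum):
    # Hypothetical outcomes of a person_visibility-style check.
    VISIBLE = "visible"
    REDACTED = "redacted"  # possibly living: placeholder only
    HIDDEN = "hidden"      # row must never leave the service


@dataclass(frozen=True)
class PersonRow:
    # Stand-in for a database row; field names are illustrative.
    id: str
    name: str
    birth_date: Optional[str]
    possibly_living: bool
    hidden: bool = False


def person_visibility(row: PersonRow) -> Visibility:
    # Simplified assumption: the real engine weighs more inputs than this.
    if row.hidden:
        return Visibility.HIDDEN
    if row.possibly_living:
        return Visibility.REDACTED
    return Visibility.VISIBLE


def redact(row: PersonRow) -> PersonRow:
    # Rewrite the name and strip exact dates; the id is kept so links resolve.
    return replace(row, name="Living person", birth_date=None)


def public_persons(rows: list[PersonRow]) -> list[PersonRow]:
    # Resolve visibility first, then drop or redact. Only the rows returned
    # here would be validated into a read schema by the caller.
    result = []
    for row in rows:
        visibility = person_visibility(row)
        if visibility is Visibility.HIDDEN:
            continue
        result.append(redact(row) if visibility is Visibility.REDACTED else row)
    return result
```

Because the route only ever receives the output of `public_persons`, reusing a member read schema does not widen what an anonymous caller can see in this sketch; the schema never sees an unredacted row.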
## 5. Frontend public pages
@@ -103,8 +108,12 @@ is covered.
- New **server-rendered** routes outside the authed app shell, e.g.
`/p/[treeId]` (tree), `/p/[treeId]/[personId]` (person), `/explore` (directory).
Server components fetch the `/api/v1/public/*` endpoints; no login redirect.
- `robots`: allow + sitemap for `public`; `noindex, nofollow` meta for `unlisted`
and `site_members`. Sitemap lists only `public` trees/persons.
- `robots`: ships a coarse `allow: ["/", "/p/"]` rule (`frontend/app/robots.ts`)
that keeps the authed app out of the index. Per-tree `noindex, nofollow` meta
for `unlisted`/`site_members` and a `public`-only **sitemap** did **not** ship —
both are **deferred** follow-ups (per-tree noindex needs server rendering;
meanwhile `unlisted`/`site_members` trees aren't linked or listed, so they
aren't crawl-discoverable).
- The directory `/explore` is anonymous for `public`; shows `site_members` trees
only to logged-in users.
- Reuse the tree/person view components where possible, fed by the redacted
@@ -131,7 +140,9 @@ anyone on the web. Living people stay hidden.") is worthwhile given the stakes.
output. No raw repository reads in the public router.
- Living-person protection holds regardless of tree visibility.
- Unlisted relies on UUID unguessability; never expose a sequential public id.
- `noindex` everything except `public`; sitemap is `public`-only.
- Per-tree `noindex` (everything except `public`) and a `public`-only sitemap are
**deferred** (see §5); today `robots.ts` keeps the authed app out of the index
and `unlisted`/`site_members` trees aren't linked or listed.
- Tests gate the merge: privacy-engine matrix + an integration test that hits the
public endpoints anonymously and asserts no living-person PII leaks.
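A privacy-engine matrix of the kind mentioned above could be sketched as follows. This is an assumption-laden illustration, not the shipped `can_view_tree`: the signature is hypothetical, and the rule that `unlisted` trees are viewable by anyone holding the link is inferred from the note that unlisted relies on UUID unguessability. The directory rule follows the text: `public` is listed for everyone, `site_members` only for authenticated callers, and `unlisted`/`private` never.

```python
from itertools import product

LEVELS = ("public", "unlisted", "site_members", "private")


def can_view_tree(visibility: str, *, authenticated: bool, is_member: bool) -> bool:
    # Hypothetical signature. Members (who are always authenticated in
    # practice) can view their own tree at every level.
    if is_member:
        return True
    if visibility in ("public", "unlisted"):
        return True  # unlisted: assumed viewable by anyone with the link
    if visibility == "site_members":
        return authenticated
    return False  # private denies anonymous and logged-in non-members


def listed_in_directory(visibility: str, *, authenticated: bool) -> bool:
    # Directory listing is stricter than viewing: unlisted is never listed.
    if visibility == "public":
        return True
    if visibility == "site_members":
        return authenticated
    return False


def non_member_matrix() -> dict[tuple[str, bool], tuple[bool, bool]]:
    # (visibility, authenticated) -> (can view, appears in directory)
    return {
        (level, auth): (
            can_view_tree(level, authenticated=auth, is_member=False),
            listed_in_directory(level, authenticated=auth),
        )
        for level, auth in product(LEVELS, (False, True))
    }
```

Enumerating every cell this way makes it harder for a new visibility level to be added without someone deciding explicitly what anonymous and logged-in non-members may see.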