447daf7fa8
A multi-agent audit of every doc against the code surfaced ~50 stale/missing
items (the roadmap/status docs and the backlog had fallen behind the code).
This catches them up:
- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 +
chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider
tech-stack entry; updated repo-layout (integrations objectstore/models,
deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels — corrected to
the shipped 4 (adds site_members); documented per-tree AI policy on Tree,
LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/
operations, verified-email session gate, instance-owner role, schema-drift
guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11),
per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header
date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common
subset; noted backups + instance-owner admin; moved property/land to an
explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider
abstraction, GEDCOM citation export, membership management, operator backup,
email-verification gate, per-tree AI policy, instance owner, the whole
visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and
reconciled the executive summary, "current defects" list, quick wins, and
differentiators. Left genuinely-open items (citation/source redaction, sitemap,
per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker
purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected
the redaction approach (reuses member schemas, not a separate PublicPersonRead)
and the apply() rollback claim (v1 is not cross-op transactional), and marked
rate-limiting/sitemap/noindex as deferred.
No code changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
169 lines
8.4 KiB
Markdown
# Design note: tree visibility & the public viewing surface

Status: **Shipped (#41-#51)**. Owner: Justin. Created 2026-06-09.

This is a privacy-critical change (it created the first anonymous read surface in
Provenance). Per CLAUDE.md, it was designed before code and shipped in small,
individually-reviewable PRs, with tests on the privacy engine and the public read
path landing before any anonymous endpoint was exposed.

## 1. The model

Visibility flattens **two axes**, *who may read* and *how discoverable*, into one
ordered enum for the UI:

| Level | Anonymous (no login) | Any logged-in user | Tree members | In-app directory | Search-indexed² |
|---|---|---|---|---|---|
| `public`: anyone on the web | ✅ view¹ | ✅ view¹ | ✅ full | ✅ listed to everyone | ✅ sitemap + indexable |
| `site_members`: Public – Site Members | ❌ | ✅ view¹ | ✅ full | ✅ listed to logged-in users | ❌ (`noindex`) |
| `unlisted`: anyone with the link | ✅ via direct link¹ | ✅ via link¹ | ✅ full | ❌ never listed | ❌ (`noindex`) |
| `private` | ❌ | ❌ | ✅ full | ❌ | ❌ |

¹ **Every non-member view passes through the privacy engine.** Living people are
redacted and per-person overrides apply (`private` hides, `public` reveals), exactly as
`person_visibility()` already does (`backend/app/services/privacy.py:100-110`).
This is the single enforcement point: no public code path may issue a raw query.

² This column is the design intent. The per-tree `noindex` meta and the
`public`-only sitemap are deferred (see §5).

Decisions captured (2026-06-09):

- **Unlisted** = anyone with the link, no account required. The link must be
  **unguessable** (the tree UUID is already non-enumerable; do not add a public
  integer id). Unlisted trees are excluded from the directory and sitemap and
  served `noindex`.
- **Public** discovery for v1 includes **an in-app public browse/search**, not
  just search-engine indexing.
- **Public – Site Members** = *any* registered account on this instance (not an
  invite list; that is already tree membership / `private`).

## 2. Data model

`TreeVisibility` enum (`backend/app/models/enums.py`) gains a value:

```
public        # anyone on the web
site_members  # any authenticated user of this instance   <-- NEW
unlisted      # anyone with the link
private       # members only (default)
```

- Alembic migration to `ALTER TYPE tree_visibility ADD VALUE 'site_members'`.
  PostgreSQL before 12 rejects enum `ADD VALUE` inside a transaction block; 12 and
  later allow it, but the new value cannot be used until that transaction commits.
  Either way, give it its own migration and run it via `op.execute` in autocommit
  mode (for example inside Alembic's `autocommit_block()`).
- Default stays `private`. Existing rows unchanged.
- `TreeRead`/`TreeUpdate`/`TreeCreate` schemas already carry the enum; they pick
  up the new value automatically. The OpenAPI client regen (`gen:api`) exposes it
  to the frontend.

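
A minimal sketch of the four-level enum described above, to make the value set concrete. The `str` mix-in, the lowercase member names, and the `parse_visibility` helper are assumptions for illustration; the real definition lives in `backend/app/models/enums.py` and may differ.

```python
from enum import Enum


class TreeVisibility(str, Enum):
    """Illustrative stand-in for the enum in backend/app/models/enums.py."""

    public = "public"              # anyone on the web
    site_members = "site_members"  # any authenticated user of this instance
    unlisted = "unlisted"          # anyone with the link
    private = "private"            # members only (default)


# Assumed constant mirroring "Default stays `private`".
DEFAULT_VISIBILITY = TreeVisibility.private


def parse_visibility(raw: str) -> TreeVisibility:
    """Hypothetical helper: map a stored or submitted string to the enum.

    Enum lookup by value raises ValueError for anything outside the four levels.
    """
    return TreeVisibility(raw)
```

Because the members carry their string values, a schema that already serializes the enum would expose `site_members` without further changes, which is consistent with the note that the schemas pick the value up automatically.
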
## 3. Privacy engine

`can_view_tree()` today treats `public` and `unlisted` identically and ignores
whether the viewer is anonymous vs authenticated (`privacy.py:44-49`). Replace the
final line with explicit branching on viewer auth state:

```
if membership:            return True          # members always
match tree.visibility:
  public, unlisted:       return True          # anonymous OK (unlisted gated only by knowing the link)
  site_members:           return user_id is not None   # any logged-in account
  private:                return False
```

`person_visibility()` is unchanged: it already redacts living/private people for
non-members. Add focused unit tests: anonymous + each visibility; living person
redacted on public/unlisted; `site_members` denies anonymous but allows a
logged-in non-member; `private` denies both.

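
The branching and the test matrix above can be sketched as plain Python. This is an illustration under stated assumptions, not the shipped code: the `Tree` stand-in, the `can_view_tree(tree, user_id, is_member)` signature, and the string-valued visibility are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tree:
    """Hypothetical stand-in for the Tree model; only visibility matters here."""

    visibility: str  # "public" | "site_members" | "unlisted" | "private"


def can_view_tree(tree: Tree, user_id: Optional[str], is_member: bool) -> bool:
    """Assumed signature; the real function lives in privacy.py."""
    if is_member:
        return True  # members always
    if tree.visibility in ("public", "unlisted"):
        return True  # anonymous OK (unlisted is gated only by knowing the link)
    if tree.visibility == "site_members":
        return user_id is not None  # any logged-in account
    return False  # private, and any unrecognised value, fails closed


# Expected outcome for NON-members: (anonymous viewer, logged-in viewer).
EXPECTED_FOR_NON_MEMBERS = {
    "public": (True, True),
    "unlisted": (True, True),
    "site_members": (False, True),
    "private": (False, False),
}


def check_matrix() -> None:
    """Walk the whole matrix, the way a parametrised unit test would."""
    for level, (anon_ok, logged_in_ok) in EXPECTED_FOR_NON_MEMBERS.items():
        assert can_view_tree(Tree(level), None, False) is anon_ok
        assert can_view_tree(Tree(level), "user-1", False) is logged_in_ok
        assert can_view_tree(Tree(level), "user-1", True) is True
```

Falling through to `False` for unknown values is a design choice in this sketch: a future fifth level would be denied to non-members until someone adds an explicit branch for it.
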
## 4. The anonymous read path (the careful part)

**Shipped: a dedicated read-only public API namespace**, not optional-auth on the
existing endpoints. Rationale: it is far easier to audit a small, purpose-built
surface that *always* funnels through `person_visibility` than to weaken the
membership checks on the authenticated endpoints and hope every branch is covered.

- Router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
  **optional-auth** dependency `CurrentUserOrNone` (returns `User | None`; never
  401s). Contrast with `CurrentUser` (`deps.py:30-36`) which hard-401s.
- Endpoints (read-only; no create/update/delete):
  - `GET /public/trees`: the directory. Lists `public` to everyone; additionally
    lists `site_members` when the caller is authenticated. Paginated, search via
    existing `pg_trgm`. Never lists `unlisted`/`private`.
  - `GET /public/trees/{id}`: tree metadata if `can_view_tree(user_or_none)`.
  - `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/persons/{pid}/names`,
    `/relationships`, `/events`: each filtered through `person_visibility`.
    (Media is not exposed on the public surface yet; deferred.)
- **Redaction happens in the service, before serialization**; this is the safety
  guarantee. It did **not** ship as a separate `PublicPersonRead` schema (that
  recommendation was not adopted): the public router **reuses the member read
  schemas** (`PersonRead`, `RelationshipRead`, `EventRead`, `NameRead`), and only
  the tree projection (`PublicTreeRead`) is distinct. Safety comes from
  `public_view_service` resolving `person_visibility` and then **dropping hidden
  rows and redacting possibly-living people** (`person_service._redact` rewrites
  the name to "Living person", etc.) *before* a row is ever validated into a
  schema. No route hands a raw row to the serializer.
- **Rate limiting** on the public namespace (per-IP) is **deferred**: it is not
  implemented in the app and may be handled at the Caddy edge if needed.
- **Audit**: count public reads; do not log PII.

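
The "drop hidden rows, redact possibly-living people, then serialize" ordering can be sketched as follows. Everything here is a simplified stand-in: `PersonRow`, the three-way `Visibility` outcome, and this `person_visibility` are hypothetical reductions of the real engine, and only the "Living person" replacement name comes from the text above.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Iterable, List, Optional


class Visibility(Enum):
    """Hypothetical outcome of the privacy engine for a non-member viewer."""

    FULL = "full"
    REDACTED = "redacted"
    HIDDEN = "hidden"


@dataclass(frozen=True)
class PersonRow:
    """Illustrative row; the real model has many more fields."""

    id: str
    name: str
    birth_date: Optional[str]
    possibly_living: bool
    privacy: Optional[str] = None  # per-person override: "private", "public", or None


def person_visibility(row: PersonRow) -> Visibility:
    """Simplified non-member rule: overrides first, then living-person redaction."""
    if row.privacy == "private":
        return Visibility.HIDDEN
    if row.privacy == "public":
        return Visibility.FULL
    return Visibility.REDACTED if row.possibly_living else Visibility.FULL


def redact(row: PersonRow) -> PersonRow:
    """Return a copy with identifying fields blanked (assumed field set)."""
    return replace(row, name="Living person", birth_date=None)


def public_persons(rows: Iterable[PersonRow]) -> List[PersonRow]:
    """Filter and redact BEFORE anything is handed to a response schema."""
    visible: List[PersonRow] = []
    for row in rows:
        outcome = person_visibility(row)
        if outcome is Visibility.HIDDEN:
            continue  # hidden rows never leave the service
        visible.append(redact(row) if outcome is Visibility.REDACTED else row)
    return visible
```

Because the route only ever sees the output of `public_persons`, reusing the member read schemas stays safe in this sketch: the schema never receives an unredacted row to serialize.
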
## 5. Frontend public pages

- New **server-rendered** routes outside the authed app shell, e.g.
  `/p/[treeId]` (tree), `/p/[treeId]/[personId]` (person), `/explore` (directory).
  Server components fetch the `/api/v1/public/*` endpoints; no login redirect.
- `robots`: ships a coarse `allow: ["/", "/p/"]` rule (`frontend/app/robots.ts`)
  that keeps the authed app out of the index. Per-tree `noindex, nofollow` meta
  for `unlisted`/`site_members` and a `public`-only **sitemap** did **not** ship;
  both are **deferred** follow-ups (per-tree noindex needs server rendering;
  meanwhile `unlisted`/`site_members` trees aren't linked or listed, so they
  aren't crawl-discoverable).
- The directory `/explore` is anonymous for `public`; shows `site_members` trees
  only to logged-in users.
- Reuse the tree/person view components where possible, fed by the already-redacted
  public API responses (the member read schemas, see §4).

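
The listing rule that `/explore` depends on is enforced by the directory endpoint, and it is small enough to sketch directly. The function names and the dict-shaped trees below are hypothetical; the real endpoint would apply the same filter in its database query, not in Python.

```python
from typing import Dict, Iterable, List, Tuple


def directory_levels(is_authenticated: bool) -> Tuple[str, ...]:
    """Visibility levels the directory may list for this caller."""
    if is_authenticated:
        return ("public", "site_members")
    return ("public",)


def list_directory(trees: Iterable[Dict[str, str]], is_authenticated: bool) -> List[Dict[str, str]]:
    """Keep only listable trees; unlisted and private are never returned."""
    allowed = directory_levels(is_authenticated)
    return [tree for tree in trees if tree["visibility"] in allowed]
```

Expressing the rule as an allow-list means a newly added visibility level stays out of the directory until it is listed explicitly.
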
## 6. UI control

Update the visibility dropdown (`frontend/app/trees/page.tsx`, shipped in PR #41)
from 3 to 4 options with helper text:

```
Private:           only you and people you invite
Public – Members:  any signed-in user on this site
Unlisted:          anyone with the link (not listed or indexed)
Public:            anyone on the web; listed and search-indexable
```

A short confirmation when switching *to* `public` ("This makes \<tree\> visible to
anyone on the web. Living people stay hidden.") is worthwhile given the stakes.

## 7. Guardrails / invariants

- One enforcement point: every public response is built from `person_visibility`
  output. No raw repository reads in the public router.
- Living-person protection holds regardless of tree visibility.
- Unlisted relies on UUID unguessability; never expose a sequential public id.
- Per-tree `noindex` (everything except `public`) and a `public`-only sitemap are
  **deferred** (see §5); today `robots.ts` keeps the authed app out of the index
  and `unlisted`/`site_members` trees aren't linked or listed.
- Tests gate the merge: privacy-engine matrix + an integration test that hits the
  public endpoints anonymously and asserts no living-person PII leaks.

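
One way the "no living-person PII leaks" assertion could be written is as a helper that scans a decoded JSON response for strings that must never appear. This is a sketch only: `find_leaks` and the sample payloads are hypothetical, and a real integration test would obtain the payload by calling the public endpoints anonymously.

```python
from typing import Any, Iterable, Iterator, List


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string (keys included) found anywhere in a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def find_leaks(payload: Any, forbidden: Iterable[str]) -> List[str]:
    """Return the forbidden strings that occur, case-insensitively, in the payload."""
    haystack = [text.casefold() for text in iter_strings(payload)]
    return [
        secret
        for secret in forbidden
        if any(secret.casefold() in text for text in haystack)
    ]
```

In a test, the forbidden list would hold the seeded living person's real name, birth date, and similar values, and the assertion would be that `find_leaks(response_json, forbidden)` is empty for every public endpoint.
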
## 8. Suggested phasing (small PRs)

1. Enum value + migration + regen client (+ dropdown → 4 options). No behavior
   change yet for non-members.
2. Privacy-engine branching + unit tests.
3. Public read API namespace (optional-auth, redacted schema, rate limit) + tests.
4. Public frontend pages (`/p/...`) + robots/sitemap.
5. In-app `/explore` directory + search.

Steps 2–3 are the privacy-critical core and should be reviewed hardest. As shipped,
step 3 reuses the member read schemas instead of a separate redacted schema, and
rate limiting and the sitemap are deferred (see §4 and §5).

## 9. Open questions

- Caching: public pages are cacheable for SEO, but cache keys must not blur the
  redacted-vs-member rendering. Likely: cache only the anonymous projection at the
  edge; never cache member responses.
- Do `site_members` trees appear in the sitemap for logged-in crawling? (Default:
  no; they are `noindex`.)
- Per-tree opt-out of the directory even when `public`? (Probably unnecessary;
  `unlisted` already covers "reachable but not listed.")

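
For the caching question, one possible shape of the "cache only the anonymous projection" idea is to choose the `Cache-Control` header from the viewer's auth state. This is a sketch of an undecided design, not a decision: the function name and the 300-second lifetime are assumptions.

```python
from typing import Optional


def cache_control_for(user_id: Optional[str], max_age: int = 300) -> str:
    """Pick a Cache-Control value for a public-namespace response.

    Anonymous responses are identical for every anonymous viewer, so a shared
    cache may store them. Anything rendered for a logged-in user could include
    member-only detail and must not be stored anywhere.
    """
    if user_id is None:
        return f"public, max-age={max_age}"
    return "private, no-store"
```

Keying the decision on auth state alone keeps redacted and member renderings from sharing a cache entry, as long as the edge never serves a stored anonymous response in place of an authenticated request's own response.
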