provenance/docs/design/tree-visibility.md
justin 4a3fe983fa Visibility phase 1: add site_members value + 4-option dropdown
First step of the public-viewing feature (design: docs/design/tree-visibility.md).
No non-member behavior change yet — this only widens the vocabulary and UI.

- TreeVisibility gains `site_members` (any authenticated user of the instance),
  giving the four-level model: public / site_members / unlisted / private.
- Alembic migration adds the enum value via an autocommit block (ALTER TYPE
  ADD VALUE can't run in a transaction on older Postgres); downgrade is a no-op
  since PG can't drop an enum value.
- Regenerated openapi.json + frontend TS client.
- Trees-list dropdown now offers Private / Public – Members / Unlisted / Public
  with an explanatory tooltip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-09 08:54:45 -04:00

# Design note: tree visibility & the public viewing surface
Status: **proposed** (design only — no code yet). Owner: Justin. Created 2026-06-09.
This is a privacy-critical change (it creates the first anonymous read surface in
Provenance). Per CLAUDE.md, design before code. Implementation should land in
small, individually-reviewable PRs, with tests on the privacy engine and the
public read path before any anonymous endpoint is exposed.
## 1. The model
Visibility flattens **two axes**, *who may read* and *how discoverable*, into one
ordered enum for the UI:
| Level | Anonymous (no login) | Any logged-in user | Tree members | In-app directory | Search-indexed |
|---|---|---|---|---|---|
| `public` — anyone on the web | ✅ view¹ | ✅ view¹ | ✅ full | ✅ listed to everyone | ✅ sitemap + indexable |
| `site_members` — Public, Site Members | ❌ | ✅ view¹ | ✅ full | ✅ listed to logged-in users | ❌ (`noindex`) |
| `unlisted` — anyone with the link | ✅ via direct link¹ | ✅ via link¹ | ✅ full | ❌ never listed | ❌ (`noindex`) |
| `private` | ❌ | ❌ | ✅ full | ❌ | ❌ |
¹ **Every non-member view passes through the privacy engine.** Living people are
redacted, and per-person `private` hides / `public` reveals, exactly as
`person_visibility()` already does (`backend/app/services/privacy.py:100-110`).
This is the single enforcement point — no public code path may issue a raw query.
Decisions captured (2026-06-09):
- **Unlisted** = anyone with the link, no account required. The link must be
**unguessable** (the tree UUID is already non-enumerable; do not add a public
integer id). Unlisted trees are excluded from the directory and sitemap and
served `noindex`.
- **Public** discovery for v1 includes **an in-app public browse/search**, not
just search-engine indexing.
- **Public – Site Members** (`site_members`) = *any* registered account on this
instance. It is not an invite list; inviting specific people is already tree
membership on a `private` tree.
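The table above can be read as data: each level fixes a value on both axes. The following is a minimal sketch, assuming a Python enum whose values match the strings in this note. `Policy`, `POLICY`, and `listed_in_directory` are illustrative names, not existing code, and the enum member names are assumed.

```python
from dataclasses import dataclass
from enum import Enum


class TreeVisibility(str, Enum):
    # Values mirror the four levels in the table; member names are assumed.
    PUBLIC = "public"
    SITE_MEMBERS = "site_members"
    UNLISTED = "unlisted"
    PRIVATE = "private"


@dataclass(frozen=True)
class Policy:
    anonymous_read: bool  # readable with no login (always redacted)
    logged_in_read: bool  # readable by an authenticated non-member (redacted)
    listed_to: str        # directory audience: "everyone", "logged_in", "nobody"
    indexable: bool       # sitemap + search-engine indexing


# One row per level, in the same column order as the table.
POLICY = {
    TreeVisibility.PUBLIC: Policy(True, True, "everyone", True),
    TreeVisibility.SITE_MEMBERS: Policy(False, True, "logged_in", False),
    TreeVisibility.UNLISTED: Policy(True, True, "nobody", False),
    TreeVisibility.PRIVATE: Policy(False, False, "nobody", False),
}


def listed_in_directory(visibility: TreeVisibility, authenticated: bool) -> bool:
    """Whether a tree at this level appears in the in-app directory for the caller."""
    audience = POLICY[visibility].listed_to
    return audience == "everyone" or (audience == "logged_in" and authenticated)
```

Tree members are left out of the mapping because they get full access at every level. Keeping the axes in one lookup makes it harder for the directory, the sitemap, and the read check to drift apart.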
## 2. Data model
`TreeVisibility` enum (`backend/app/models/enums.py`) gains a value:
```
public # anyone on the web
site_members # any authenticated user of this instance <-- NEW
unlisted # anyone with the link
private # members only (default)
```
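- Alembic migration to `ALTER TYPE tree_visibility ADD VALUE 'site_members'`.
Before PostgreSQL 12 this statement cannot run inside a transaction block at all;
from 12 onward it can, but the new value is unusable until that transaction
commits. Either way, put it in its own migration and run it via `op.execute`
inside Alembic's `autocommit_block()`. The downgrade is a no-op, because Postgres
cannot drop an enum value.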
- Default stays `private`. Existing rows unchanged.
- `TreeRead`/`TreeUpdate`/`TreeCreate` schemas already carry the enum; they pick
up the new value automatically. The OpenAPI client regen (`gen:api`) exposes it
to the frontend.
## 3. Privacy engine
`can_view_tree()` today treats `public` and `unlisted` identically and ignores
whether the viewer is anonymous vs authenticated (`privacy.py:44-49`). Replace the
final line with explicit branching on viewer auth state:
```
if membership: return True # members always
match tree.visibility:
public, unlisted: return True # anonymous OK (unlisted gated only by knowing the link)
site_members: return user_id is not None # any logged-in account
private: return False
```
`person_visibility()` is unchanged — it already redacts living/private people for
non-members. Add focused unit tests: anonymous + each visibility; living person
redacted on public/unlisted; `site_members` denies anonymous but allows a
logged-in non-member; `private` denies both.
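The branching and its test matrix can be sketched in isolation. This is a self-contained illustration, not the real `privacy.py`: the `Tree` dataclass, the `is_member` flag, and the function signature are assumptions standing in for the actual models and membership lookup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class TreeVisibility(str, Enum):
    # Values mirror this note; member names are assumed.
    PUBLIC = "public"
    SITE_MEMBERS = "site_members"
    UNLISTED = "unlisted"
    PRIVATE = "private"


@dataclass
class Tree:
    # Stand-in for the real ORM model (assumption).
    id: UUID
    visibility: TreeVisibility


def can_view_tree(tree: Tree, user_id: Optional[UUID], is_member: bool) -> bool:
    """Decide tree-level read access; person-level redaction happens elsewhere."""
    if is_member:
        return True  # members can always view
    if tree.visibility in (TreeVisibility.PUBLIC, TreeVisibility.UNLISTED):
        return True  # anonymous OK; unlisted is gated only by knowing the link
    if tree.visibility is TreeVisibility.SITE_MEMBERS:
        return user_id is not None  # any logged-in account
    return False  # private, and any future value, is denied by default


def access_matrix() -> dict:
    """Non-member results keyed by (visibility value, viewer kind)."""
    stranger = uuid4()
    return {
        (v.value, kind): can_view_tree(Tree(uuid4(), v), uid, is_member=False)
        for v in TreeVisibility
        for kind, uid in (("anonymous", None), ("logged_in", stranger))
    }
```

Ending on a bare `return False` means a visibility value added later is denied until someone handles it explicitly, which is the safer failure mode for a privacy check.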
## 4. The anonymous read path (the careful part)
**Recommendation: a dedicated read-only public API namespace**, not optional-auth
on the existing endpoints. Rationale: it is far easier to audit a small,
purpose-built surface that *always* funnels through `person_visibility` than to
weaken the membership checks on the authenticated endpoints and hope every branch
is covered.
- New router `app/api/v1/public.py`, mounted at `/api/v1/public`, with an
**optional-auth** dependency `CurrentUserOrNone` (returns `User | None`; never
401s). Contrast with `CurrentUser` (`deps.py:30-36`) which hard-401s.
- Endpoints (read-only; no create/update/delete):
- `GET /public/trees` — directory: lists `public` to everyone; additionally
lists `site_members` when the caller is authenticated. Paginated, search via
existing `pg_trgm`. Never lists `unlisted`/`private`.
- `GET /public/trees/{id}` — tree metadata if `can_view_tree(user_or_none)`.
- `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/relationships`,
`/events`, `/media`, … — each filtered through `person_visibility`, returning
redacted projections (a `PublicPersonRead` that omits PII for redacted people:
no exact dates, no living-person names beyond "Living", etc.).
- **A redacted response schema**, distinct from the member `PersonRead`, so the
serializer physically cannot emit fields a non-member shouldn't see. Redaction
happens in the service, not the route.
- **Rate limiting** on the public namespace (per-IP) to blunt scraping/enumeration.
- **Audit**: count public reads; do not log PII.
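To illustrate the redacted-schema bullet above, here is a rough sketch of a projection that cannot carry member-only fields. Everything in it is hypothetical: the `PersonVisibility` result type, the fields on `Person`, and the choice of `notes` as the member-only field are assumptions, since this note does not specify what `person_visibility()` returns or what `PersonRead` contains.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class PersonVisibility(Enum):
    # Hypothetical outcome of person_visibility(); the real type may differ.
    FULL = "full"
    REDACTED = "redacted"
    HIDDEN = "hidden"


@dataclass
class Person:
    # Stand-in for the member-facing record (assumed fields).
    id: str
    name: str
    birth_date: Optional[date]
    notes: str


@dataclass(frozen=True)
class PublicPersonRead:
    # No `notes` field exists here, so a serializer has nothing to leak.
    id: str
    display_name: str
    birth_date: Optional[date]
    redacted: bool


def to_public_person(
    person: Person, visibility: PersonVisibility
) -> Optional[PublicPersonRead]:
    """Project a person for non-members; return None when the person is hidden."""
    if visibility is PersonVisibility.HIDDEN:
        return None  # omit the person from the response entirely
    if visibility is PersonVisibility.REDACTED:
        # Living person: placeholder name, no exact dates.
        return PublicPersonRead(person.id, "Living", None, True)
    return PublicPersonRead(person.id, person.name, person.birth_date, False)
```

Because the service returns `PublicPersonRead` rather than a filtered `PersonRead`, a new member-only field added later stays out of public responses unless someone deliberately adds it to the public schema.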
## 5. Frontend public pages
- New **server-rendered** routes outside the authed app shell, e.g.
`/p/[treeId]` (tree), `/p/[treeId]/[personId]` (person), `/explore` (directory).
Server components fetch the `/api/v1/public/*` endpoints; no login redirect.
- `robots`: allow + sitemap for `public`; `noindex, nofollow` meta for `unlisted`
and `site_members`. Sitemap lists only `public` trees/persons.
- The directory `/explore` is anonymous for `public`; shows `site_members` trees
only to logged-in users.
- Reuse the tree/person view components where possible, fed by the redacted
schema.
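The indexing rule in the bullets above reduces to a small predicate. The sketch below is in Python for consistency with the backend examples; the real pages live in the frontend, so treat it as a statement of the rule rather than the implementation. `robots_meta` and `sitemap_paths` are hypothetical names, and the `/p/<treeId>` path follows the example route above.

```python
from typing import Iterable, List, Tuple

# Only fully public trees may be indexed or listed in the sitemap.
INDEXABLE = frozenset({"public"})


def robots_meta(visibility: str) -> str:
    """Content of the robots meta tag for a page of a tree at this level."""
    return "index, follow" if visibility in INDEXABLE else "noindex, nofollow"


def sitemap_paths(trees: Iterable[Tuple[str, str]]) -> List[str]:
    """Given (tree_id, visibility) pairs, return sitemap paths for public trees only."""
    return [
        f"/p/{tree_id}"
        for tree_id, visibility in trees
        if visibility in INDEXABLE
    ]
```

Deriving both the meta tag and the sitemap from one allow-list means a new visibility level defaults to `noindex` until it is explicitly added.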
## 6. UI control
Update the visibility dropdown (`frontend/app/trees/page.tsx`, shipped in PR #41)
from 3 to 4 options with helper text:
```
Private — only you and people you invite
Public Members — any signed-in user on this site
Unlisted — anyone with the link (not listed or indexed)
Public — anyone on the web; listed and search-indexable
```
A short confirmation when switching *to* `public` ("This makes <tree> visible to
anyone on the web. Living people stay hidden.") is worthwhile given the stakes.
## 7. Guardrails / invariants
- One enforcement point: every public response is built from `person_visibility`
output. No raw repository reads in the public router.
- Living-person protection holds regardless of tree visibility.
- Unlisted relies on UUID unguessability; never expose a sequential public id.
- `noindex` everything except `public`; sitemap is `public`-only.
- Tests gate the merge: privacy-engine matrix + an integration test that hits the
public endpoints anonymously and asserts no living-person PII leaks.
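For the integration test in the last bullet, one simple approach is to seed a living person with distinctive values and scan each anonymous response for them. The helper below is a coarse sketch under that assumption; `find_leaks` is a hypothetical name, and a substring scan only catches values the test knows to look for.

```python
import json
from typing import Any, Iterable, List


def find_leaks(payload: Any, forbidden: Iterable[str]) -> List[str]:
    """Return each forbidden string found anywhere in the serialised payload.

    The match is case-insensitive and covers keys and values at any depth,
    because the whole payload is flattened to one JSON string first.
    """
    haystack = json.dumps(payload, ensure_ascii=False, default=str).casefold()
    return [s for s in forbidden if s.casefold() in haystack]
```

In a test this would look like `assert find_leaks(response_json, ["Ada Example", "1990-05-01"]) == []` for every public endpoint, where the two strings are the seeded living person's name and birth date.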
## 8. Suggested phasing (small PRs)
1. Enum value + migration + regen client (+ dropdown → 4 options). No behavior
change yet for non-members.
2. Privacy-engine branching + unit tests.
3. Public read API namespace (optional-auth, redacted schema, rate limit) + tests.
4. Public frontend pages (`/p/...`) + robots/sitemap.
5. In-app `/explore` directory + search.
Steps 2 and 3 are the privacy-critical core and should be reviewed hardest.
## 9. Open questions
- Caching: public pages are cacheable for SEO, but cache keys must not blur the
redacted-vs-member rendering. Likely: cache only the anonymous projection at the
edge; never cache member responses.
- Do `site_members` trees appear in the sitemap for logged-in crawling? (Default:
no — `noindex`.)
- Per-tree opt-out of the directory even when `public`? (Probably unnecessary;
`unlisted` already covers "reachable but not listed.")