A multi-agent audit of every doc against the code surfaced ~50 stale/missing
items (the roadmap/status docs and the backlog had fallen behind the code).
This catches them up:
- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 +
chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider
tech-stack entry; updated repo-layout (integrations objectstore/models,
deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels — corrected to
the shipped 4 (adds site_members); documented per-tree AI policy on Tree,
LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/
operations, verified-email session gate, instance-owner role, schema-drift
guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11),
per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header
date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common
subset; noted backups + instance-owner admin; moved property/land to an
explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider
abstraction, GEDCOM citation export, membership management, operator backup,
email-verification gate, per-tree AI policy, instance owner, the whole
visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and
reconciled the executive summary, "current defects" list, quick wins, and
differentiators. Left genuinely-open items (citation/source redaction, sitemap,
per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker
purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected
the redaction approach (reuses member schemas, not a separate PublicPersonRead)
and the apply() rollback claim (v1 is not cross-op transactional), and marked
rate-limiting/sitemap/noindex as deferred.
No code changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
Design note: tree visibility & the public viewing surface
Status: Shipped (#41-#51). Owner: Justin. Created 2026-06-09.
This is a privacy-critical change (it created the first anonymous read surface in Provenance). Per CLAUDE.md, it was designed before any code was written and shipped in small, individually reviewable PRs, with tests on the privacy engine and the public read path landing before any anonymous endpoint was exposed.
1. The model
Visibility flattens two axes (who may read a tree, and how discoverable it is) into one ordered enum for the UI:
| Level | Anonymous (no login) | Any logged-in user | Tree members | In-app directory | Search-indexed |
|---|---|---|---|---|---|
| `public` (anyone on the web) | ✅ view¹ | ✅ view¹ | ✅ full | ✅ listed to everyone | ✅ sitemap + indexable |
| `site_members` (Public, Site Members) | ❌ | ✅ view¹ | ✅ full | ✅ listed to logged-in users | ❌ (noindex) |
| `unlisted` (anyone with the link) | ✅ via direct link¹ | ✅ via link¹ | ✅ full | ❌ never listed | ❌ (noindex) |
| `private` | ❌ | ❌ | ✅ full | ❌ | ❌ |
¹ Every non-member view passes through the privacy engine. This is exactly what `person_visibility()` already does (`backend/app/services/privacy.py:100-110`):

- living people are redacted;
- a per-person private setting hides that person;
- a per-person public setting reveals them.

This is the single enforcement point: no public code path may issue a raw query.
Decisions captured (2026-06-09):
- Unlisted = anyone with the link, no account required. The link must be unguessable (the tree UUID is already non-enumerable; do not add a public integer id). Unlisted trees are excluded from the directory and sitemap and served `noindex`.
- Public discovery for v1 includes an in-app public browse/search, not just search-engine indexing.
- Public – Site Members = any registered account on this instance (not an invite list; that is already tree membership / `private`).
2. Data model
The `TreeVisibility` enum (`backend/app/models/enums.py`) gains a value:

```
public        # anyone on the web
site_members  # any authenticated user of this instance   <-- NEW
unlisted      # anyone with the link
private       # members only (default)
```
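- Alembic migration to `ALTER TYPE tree_visibility ADD VALUE 'site_members'`. Postgres restricts enum `ADD VALUE` inside transactions:
  - before PostgreSQL 12 it cannot run in a transaction block at all;
  - from 12 on, the new value cannot be used until the adding transaction commits.

  So put it in its own migration and run it with `op.execute` inside an autocommit block (Alembic's `op.get_context().autocommit_block()`).
- Default stays `private`; existing rows are unchanged. The `TreeRead`/`TreeUpdate`/`TreeCreate` schemas already carry the enum, so they pick up the new value automatically. The OpenAPI client regen (`gen:api`) exposes it to the frontend.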
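As a sketch of keeping the application enum and the database type in sync, assume the Python enum's values mirror the Postgres labels. The hypothetical `add_value_statements` helper compares the enum against labels read from `pg_enum`. It then emits the `ALTER TYPE ... ADD VALUE` statements a migration would need. This illustrates the idea; it is not the project's migration or schema-drift guard.

```python
from enum import Enum


class TreeVisibility(str, Enum):
    """Application-side enum; values are assumed to mirror the Postgres labels."""

    public = "public"              # anyone on the web
    site_members = "site_members"  # any authenticated user (new)
    unlisted = "unlisted"          # anyone with the link
    private = "private"            # members only


DEFAULT_VISIBILITY = TreeVisibility.private  # the default does not change


def add_value_statements(db_labels: list[str], type_name: str = "tree_visibility") -> list[str]:
    """Return ALTER TYPE statements for enum values the database lacks.

    Hypothetical helper. db_labels would come from a catalog query such as:
        SELECT e.enumlabel FROM pg_enum e
        JOIN pg_type t ON t.oid = e.enumtypid
        WHERE t.typname = 'tree_visibility';
    Without BEFORE/AFTER, Postgres appends each new label to the end of the
    enum's sort order, so DB order can differ from the declaration order here.
    """
    present = set(db_labels)
    return [
        f"ALTER TYPE {type_name} ADD VALUE '{member.value}'"
        for member in TreeVisibility
        if member.value not in present
    ]
```

The sort-order note only matters if something orders or compares rows by this enum; equality checks are unaffected.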
3. Privacy engine
Before this change, `can_view_tree()` treated `public` and `unlisted` identically and ignored whether the viewer was anonymous or authenticated (`privacy.py:44-49`). The change replaces the final line with explicit branching on the viewer's auth state:
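```python
# Tail of can_view_tree(); membership and user_id are resolved earlier.
if membership:
    return True  # members always
match tree.visibility:
    case TreeVisibility.public | TreeVisibility.unlisted:
        return True  # anonymous OK (unlisted gated only by knowing the link)
    case TreeVisibility.site_members:
        return user_id is not None  # any logged-in account
    case TreeVisibility.private:
        return False
```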
person_visibility() is unchanged — it already redacts living/private people for
non-members. Add focused unit tests: anonymous + each visibility; living person
redacted on public/unlisted; site_members denies anonymous but allows a
logged-in non-member; private denies both.
4. The anonymous read path (the careful part)
Shipped: a dedicated read-only public API namespace, not optional-auth on the
existing endpoints. Rationale: it is far easier to audit a small, purpose-built
surface that always funnels through person_visibility than to weaken the
membership checks on the authenticated endpoints and hope every branch is covered.
- Router: `app/api/v1/public.py`, mounted at `/api/v1/public`, with an optional-auth dependency `CurrentUserOrNone` (returns `User | None`; never 401s). Contrast with `CurrentUser` (`deps.py:30-36`), which hard-401s.
- Endpoints (read-only; no create/update/delete):
  - `GET /public/trees`: the directory.
    - Lists `public` trees to everyone, and additionally `site_members` trees when the caller is authenticated.
    - Paginated, with search via the existing `pg_trgm`.
    - Never lists `unlisted`/`private`.
  - `GET /public/trees/{id}`: tree metadata if `can_view_tree(user_or_none)`.
  - `GET /public/trees/{id}/persons`, `/persons/{pid}`, `/persons/{pid}/names`, `/relationships`, `/events`: each filtered through `person_visibility`. (Media is not exposed on the public surface yet; deferred.)
- Redaction happens in the service, before serialization; this is the safety guarantee.
  - It did not ship as a separate `PublicPersonRead` schema (that recommendation was not adopted). The public router reuses the member read schemas (`PersonRead`, `RelationshipRead`, `EventRead`, `NameRead`). Only the tree projection (`PublicTreeRead`) is distinct.
  - Safety comes from `public_view_service`. It resolves `person_visibility`, drops hidden rows and redacts possibly-living people (`person_service._redact` rewrites the name to "Living person", etc.), all before a row is ever validated into a schema.
  - No route hands a raw row to the serializer.
- Rate limiting on the public namespace (per-IP) is deferred: it is not implemented in the app and may be handled at the Caddy edge if needed.
- Audit: count public reads; do not log PII.
5. Frontend public pages
- New server-rendered routes outside the authed app shell, e.g. `/p/[treeId]` (tree), `/p/[treeId]/[personId]` (person), `/explore` (directory). Server components fetch the `/api/v1/public/*` endpoints; no login redirect.
- robots: ships a coarse `allow: ["/", "/p/"]` rule (`frontend/app/robots.ts`) that keeps the authed app out of the index.
- Deferred follow-ups (did not ship):
  - per-tree `noindex, nofollow` meta for `unlisted`/`site_members`, which needs server rendering;
  - a `public`-only sitemap.

  Meanwhile, `unlisted`/`site_members` trees aren't linked or listed, so they aren't crawl-discoverable.
- The directory `/explore` is anonymous for `public`; it shows `site_members` trees only to logged-in users.
- Reuse the tree/person view components where possible, fed by the redacted public responses.
6. UI control
Update the visibility dropdown (`frontend/app/trees/page.tsx`, shipped in PR #41) from 3 to 4 options with helper text:

- Private: only you and people you invite
- Public – Members: any signed-in user on this site
- Unlisted: anyone with the link (not listed or indexed)
- Public: anyone on the web; listed and search-indexable
A short confirmation when switching to public ("This makes visible to
anyone on the web. Living people stay hidden.") is worthwhile given the stakes.
7. Guardrails / invariants
- One enforcement point: every public response is built from `person_visibility` output. No raw repository reads in the public router.
- Living-person protection holds regardless of tree visibility.
- Unlisted relies on UUID unguessability; never expose a sequential public id.
- Per-tree `noindex` (everything except `public`) and a `public`-only sitemap are deferred (see §5). Today `robots.ts` keeps the authed app out of the index, and `unlisted`/`site_members` trees aren't linked or listed.
- Tests gate the merge: the privacy-engine matrix plus an integration test that hits the public endpoints anonymously and asserts no living-person PII leaks.
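For the integration test's "no living-person PII leaks" assertion, one simple approach works like this:

1. Seed the fixture with known-sensitive strings (full names, birth dates).
2. Serialize each anonymous response.
3. Search the serialized text for those strings.

`find_leaks` is a hypothetical helper. Substring matching is deliberately blunt, so needles should be specific enough to avoid false positives.

```python
import json
from typing import Any


def find_leaks(payload: Any, needles: set[str]) -> list[str]:
    """Return the needles that appear anywhere in a JSON-serializable payload.

    The payload is serialized once and searched case-insensitively, so a
    needle embedded in a longer field (or in a key) is still reported.
    """
    haystack = json.dumps(payload, ensure_ascii=False).lower()
    return sorted(needle for needle in needles if needle.lower() in haystack)
```

In a test, with a hypothetical `resp` from an anonymous client, the check would read `assert find_leaks(resp.json(), {"Ada Example", "1990-04-01"}) == []`. `ensure_ascii=False` keeps non-ASCII names unescaped, so they can still match.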
8. Suggested phasing (small PRs)
1. Enum value + migration + regen client (+ dropdown → 4 options). No behavior change yet for non-members.
2. Privacy-engine branching + unit tests.
3. Public read API namespace (optional-auth, redacted schema, rate limit) + tests.
4. Public frontend pages (`/p/...`) + robots/sitemap.
5. In-app `/explore` directory + search.

Steps 2–3 are the privacy-critical core and should be reviewed hardest.
9. Open questions
- Caching: public pages are cacheable for SEO, but cache keys must not blur the redacted-vs-member rendering. Likely: cache only the anonymous projection at the edge; never cache member responses.
- Do `site_members` trees appear in the sitemap for logged-in crawling? (Default: no; `noindex`.)
- Per-tree opt-out of the directory even when `public`? (Probably unnecessary; `unlisted` already covers "reachable but not listed.")