CLAUDE.md
Operating guide for Claude Code (and any AI assistant) working in this repository. Read this first, then docs/PRD.md and docs/ARCHITECTURE.md.
What this project is
Provenance is self-hostable, source-available software for tracing where you come from — family and land. It combines a genealogy application (people, relationships, events, sources, media) with property chain-of-title tracking (parcels, deeds, ownership events), a privacy model, an AI research assistant, and a cross-tree hint system. It is multi-tenant and container-native.
The name is the thesis: provenance means a documented chain of custody. Every fact should link to where it came from.
Non-negotiable rules
These are product invariants, not preferences. Do not violate them, and flag any task that seems to require it:
- The AI assistant never writes autonomously. Assistant "write" operations emit a ChangeProposal (a structured diff) that a human approves, edits, or rejects in the UI. There must be no code path where a model response mutates tree data directly. This is structural: enforce it in the type system / service boundaries, not just by convention.
- Privacy has a single enforcement point. All reads (API, server-rendered public pages, search, hints, assistant) resolve visibility through one privacy engine in the service layer. Never add a query path that returns rows without passing through it.
- Living people are protected by default. Non-owners do not see PII for a person who is (or may be) living. See the living-person rule in ARCHITECTURE §6.
- Hint matching is anonymous until mutual consent. A match notification must reveal nothing identifying about the other user or any living person. Identities exchange only after both sides opt in.
- Sources are first-class. Don't model citations as free-text afterthoughts. A Source is a reusable entity; a Citation links it to a specific fact.
- Only legal data sources. Ship scrapers/connectors only for permissible sources (FamilySearch API, Find A Grave, WikiTree, BLM/GLO, USGS, public-domain newspapers, public county records). Never add connectors for paywalled/terms-prohibited sites (Ancestry, MyHeritage, 23andMe).
- Everything is configurable via environment. Auth, mail, object storage, database, model providers, scrapers — all twelve-factor. No hard-coded endpoints or keys.
- Full CRUD on every object. Every stored entity (person, name, event, relationship, source, citation, media, tree, …) must support create, read, update, and delete — in the API and the UI. Historical research is constant correction and new information, so nothing is write-once. Any new feature or data type ships with all four operations; an entity you can create but not edit is a bug.
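The first rule above (the assistant proposes, a human applies) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FieldChange`, `ProposalStatus`, `AssistantTools`, `TreeStore`, and every method name are hypothetical, and only the `ChangeProposal` name comes from this guide. The point is that the object handed to the model has no write method at all, and the write path refuses anything a human has not approved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class ProposalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class FieldChange:
    """One field-level edit inside a proposal (hypothetical shape)."""
    entity_id: str
    field_name: str
    old_value: Optional[str]
    new_value: Optional[str]


@dataclass
class ChangeProposal:
    """A structured diff awaiting human review; it never applies itself."""
    changes: Tuple[FieldChange, ...]
    status: ProposalStatus = ProposalStatus.PENDING


class AssistantTools:
    """The only surface given to the model: it can propose, never write."""

    def propose(self, changes: List[FieldChange]) -> ChangeProposal:
        return ChangeProposal(changes=tuple(changes))


class TreeStore:
    """Hypothetical in-memory stand-in for the repository layer."""

    def __init__(self) -> None:
        self.fields: Dict[Tuple[str, str], Optional[str]] = {}

    def apply(self, proposal: ChangeProposal, approved_by: str) -> None:
        # Writes need a named human approver and an approved proposal.
        if not approved_by:
            raise PermissionError("a human approver is required")
        if proposal.status is not ProposalStatus.APPROVED:
            raise PermissionError("proposal has not been approved")
        for change in proposal.changes:
            self.fields[(change.entity_id, change.field_name)] = change.new_value
```

In this sketch the approval step is a plain status change made by the review UI. A real implementation would also record who approved and when, which is outside what this example shows.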
Tech stack
- Frontend: Next.js (App Router) + React + TypeScript + Tailwind + shadcn/ui. Mobile-first, server components for public/SEO pages, generated TS client from the backend OpenAPI spec.
- Backend: Python + FastAPI, async, layered (API → service → repository → domain). SQLAlchemy. OpenAPI is the contract.
- Worker: same image as backend in worker mode; queue-driven async jobs.
- Database: PostgreSQL with pg_trgm (fuzzy search) and pgvector (match ranking).
- Object storage: S3-compatible (MinIO for self-host).
- Edge: Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
- Email: operator-configured SMTP.
- Model providers: pluggable LLMProvider + EmbeddingProvider abstraction (ABCs) with Null / Anthropic / OpenAI-compatible (OpenAI, xAI, Ollama) implementations; an operator configures one or more via env and they're selectable by name through a registry (per-tree AI policy + default_llm_provider / default_embedding_provider).
- CI/CD: Gitea Actions build per-component images. Push to the LAN registry 192.168.0.2:1234 (plain HTTP, bypasses Cloudflare's body limit); pull via the public git.jpaul.io FQDN. Servers pull to deploy; no host build. Mirrors the drawbar setup; see gitea-lan-push-fqdn-pull.
Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.
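As a rough illustration of the pluggable-provider idea in the stack list, here is a minimal registry sketch. Only the `LLMProvider` name and the `default_llm_provider` setting come from this guide. The `complete` method, the `EchoLLMProvider` class, the `ProviderRegistry` API, and the `DEFAULT_LLM_PROVIDER` environment variable name are assumptions made for the example and may not match the real backend.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional


class LLMProvider(ABC):
    """Interface each chat-model backend implements (method name is illustrative)."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class NullLLMProvider(LLMProvider):
    """No-op provider for instances that run with AI disabled."""

    def complete(self, prompt: str) -> str:
        return ""


class EchoLLMProvider(LLMProvider):
    """Hypothetical stand-in for a real Anthropic or OpenAI-compatible client."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ProviderRegistry:
    """Maps operator-chosen names to configured provider instances."""

    def __init__(self) -> None:
        self._providers: Dict[str, LLMProvider] = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def get(self, name: Optional[str] = None) -> LLMProvider:
        # An explicit name (e.g. from a per-tree policy) wins; otherwise fall
        # back to the env default. The variable name here is an assumption.
        chosen = name or os.environ.get("DEFAULT_LLM_PROVIDER", "null")
        if chosen not in self._providers:
            raise LookupError(f"no provider registered as {chosen!r}")
        return self._providers[chosen]
```

Callers depend only on the abstract interface, so swapping vendors is a configuration change rather than a code change, which is what the twelve-factor rule asks for.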
Repository layout
/ # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
/docs # PRD.md, ARCHITECTURE.md
/backend # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth, mailer, objectstore, models = pluggable LLM/embedding providers), core}; migrations/ = Alembic
/deploy # docker-compose.yml (+ docker-compose.dev.yml), Caddyfile, .env.example, backup.sh + BACKUP.md (one-command pg_dump + MinIO backup) — the self-host stack
/.gitea/workflows # Gitea Actions CI (build images → Gitea registry)
/frontend # Next.js (App Router, TS, Tailwind, shadcn-style UI). app/ pages, lib/api generated OpenAPI client, components/ui
Phase 0 landed deploy-first: the compose stack (Postgres + MinIO + Caddy + FastAPI backend) and CI before the data model and frontend. Backend deps use uv; migrations use Alembic. Status (keep current as the tree grows):
- Phase 0 (Foundation): complete and running live (core data model, local auth behind AuthProvider, Next.js frontend).
- Phase 1 (Core tree): complete. Media (upload/serve), soft-delete + recovery UI, full CRUD across entities, and the 4-level tree visibility/privacy model (#41–#51).
- Phase 2: substantially landed. GEDCOM import (preview→apply, duplicate-aware) and export (citation-preserving, #232); fuzzy name search (pg_trgm) + the public /explore directory. Living-person protection is still hardening.
- Phase 4: AI assistant foundations landed. Pluggable LLMProvider/EmbeddingProvider abstraction + multi-provider registry (Anthropic/OpenAI/xAI/Ollama, #235/#237), the ChangeProposal propose-then-confirm flow (#236), and per-tree AI model policy (#238). The assistant's tool surface that emits proposals is the remaining piece.
- Also shipped: tree membership management (#233), an instance owner/operator role (OWNER_EMAIL, #240), a schema-drift readiness guard (#239), and a one-command operator backup (#234).
- Not built yet: Phase 3 (Property: parcels/deeds/chain-of-title; no property models exist), Phase 5 (OIDC/social auth; only the AuthProvider ABC exists), and cross-tree hints (last; needs multiple populated trees + the embedding provider).
Where to start
The roadmap is phased in PRD §8. Build in dependency order. Phases 0 and 1 are complete, Phase 2 is substantially done, and Phase 4's AI foundations have shipped (see the status list above). The biggest unbuilt areas are Phase 3 (Property) and Phase 5 (OIDC/social auth) — likely current targets. For reference, Phase 0 covered:
- Backend skeleton (FastAPI, async, layered) + Postgres + migrations
- Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
- Local auth (password + email verification) behind the AuthProvider interface
- Frontend scaffold (Next.js) wired to the API via the generated client
- The deploy stack: compose for app + postgres + objectstore, Caddy config, env-driven settings
- CI/CD: Gitea Actions building images to the registry
Don't get ahead of the phases. GEDCOM and the assistant's propose-diff foundation (provider abstraction + ChangeProposal approval flow) have shipped; the remaining dependency-ordered work is Property (Phase 3, on top of the tested people graph), then richer collaboration/audit UI, with cross-tree hints last (they need multiple populated trees and the embedding provider). If you think the order is wrong, raise it rather than reordering silently.
Conventions
- Sign off every commit with the DCO. Use git commit -s. Commits without a Signed-off-by line cannot be merged. See CONTRIBUTING.md.
- Commit messages: concise summary line; body explaining why when it isn't obvious. One logical change per commit where practical.
- Tests accompany new behavior once a test surface exists.
- Docs travel with code: update PRD/ARCHITECTURE in the same change when scope or design shifts.
- Privacy/assistant/hint code gets extra care — these are the areas where bugs do real harm. Prefer a design note before a large change.
- No secrets in the repo. Config via env; provide .env.example with placeholders.
Patched dependencies (family-chart)
The tree view uses family-chart (d3-based). Two adjustments live in the repo:
- CSS is vendored at frontend/app/trees/[id]/tree/chart.css: the package blocks its CSS subpath export, so we copy it in.
- The library is patched via patch-package (frontend/patches/family-chart+0.9.0.patch, applied by the postinstall hook; the backend/frontend Dockerfiles COPY patches before install). Both hunks touch dist/family-chart.js and dist/family-chart.esm.js (the app loads the esm build). Current fixes:
  - Spouse-centering layout (setupSpouses / sortChildrenWithSpouses): center a person between two spouses with children under the correct pair.
  - cardToMiddle vertical centering: the lib scaled datum.x by the zoom factor k but not datum.y, so "fly to a node" drifted vertically at any zoom ≠ 1; we add the missing * k.
To change a patch: edit the file(s) under node_modules/family-chart/dist/, then cd frontend && npx patch-package family-chart to regenerate, and verify with npx patch-package --error-on-fail.
Upstream these. Both are general library bugfixes, not app-specific. The cardToMiddle fix is submitted — donatso/family-chart#103 (issue #102). The spouse-layout fix still needs upstreaming; do it when there's time. When a fixed release ships, drop the corresponding patch hunk and remove any in-app compensation (e.g. the cardToMiddle caller in tree/page.tsx passes raw y precisely because the patch fixes it — pre-scaling there too would double-correct).
License & contribution terms
Provenance is source-available under BUSL-1.1 (see LICENSE): free for personal/family/non-commercial use, no third-party commercial hosting, and each release converts to AGPL-3.0 four years after it ships. The DCO sign-off keeps the licensing chain clean so the maintainer can manage that conversion and a possible future hosted offering. Don't add code under an incompatible license, and don't vendor dependencies whose licenses conflict with eventual AGPL distribution.
Brand
Visual identity lives in docs/brand/ (see its README for full guidance). Use these as the frontend's design tokens:
- Ink (primary text/marks):
#1A1A17light /#F2EEE6dark - Bronze (accent, constant):
#A06A42 - Paper (knockout on bronze, constant):
#F7F3EC - Muted (secondary text):
#6B6862light /#9A968Edark
Wordmark is a serif (heritage register); UI body/secondary text is a humanist sans. Logo lockup: docs/brand/provenance-logo.svg; app icon/favicon: docs/brand/provenance-icon.svg and favicon.svg. Don't recolor outside the palette or add gradients/shadows — the look is flat and warm.
Owner & contact
Maintainer: Justin Paul (justin@jpaul.io). This deployment targets a home lab: Authentik at auth.jpaul.io for auth, mail.jpaul.io for SMTP, behind Caddy + Cloudflare Tunnel.
Open questions (don't assume answers)
Parked in PRD §11 and ARCHITECTURE §14: telemetry (opt-in anonymous vs none), embeddings provider for matching, DNA as future-phase vs permanent non-goal, native mobile timing, hosted-SaaS model, queue backend default (Postgres vs Redis), and PostGIS adoption. If a task depends on one of these, surface the dependency instead of picking silently.