Files
justin fe8349819f docs: note the spouse-layout fix is upstreamed (donatso/family-chart#105)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-11 09:33:19 -04:00

119 lines
12 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md
Operating guide for Claude Code (and any AI assistant) working in this repository. Read this first, then [docs/PRD.md](docs/PRD.md) and [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
## What this project is
**Provenance** is self-hostable, source-available software for tracing where you come from — **family *and* land**. It combines a genealogy application (people, relationships, events, sources, media) with **property chain-of-title** tracking (parcels, deeds, ownership events), a privacy model, an AI research assistant, and a cross-tree hint system. It is multi-tenant and container-native.
The name is the thesis: *provenance* means a documented chain of custody. Every fact should link to where it came from.
## Non-negotiable rules
These are product invariants, not preferences. Do not violate them, and flag any task that seems to require it:
1. **The AI assistant never writes autonomously.** Assistant "write" operations emit a `ChangeProposal` (a structured diff) that a human approves, edits, or rejects in the UI. There must be no code path where a model response mutates tree data directly. This is structural — enforce it in the type system / service boundaries, not just by convention.
2. **Privacy has a single enforcement point.** All reads — API, server-rendered public pages, search, hints, assistant — resolve visibility through one privacy engine in the service layer. Never add a query path that returns rows without passing through it.
3. **Living people are protected by default.** Non-owners do not see PII for a person who is (or may be) living. See the living-person rule in ARCHITECTURE §6.
4. **Hint matching is anonymous until mutual consent.** A match notification must reveal nothing identifying about the other user or any living person. Identities exchange only after both sides opt in.
5. **Sources are first-class.** Don't model citations as free-text afterthoughts. A `Source` is a reusable entity; a `Citation` links it to a specific fact.
6. **Only legal data sources.** Ship scrapers/connectors only for permissible sources (FamilySearch API, Find A Grave, WikiTree, BLM/GLO, USGS, public-domain newspapers, public county records). Never add connectors for paywalled/terms-prohibited sites (Ancestry, MyHeritage, 23andMe).
7. **Everything is configurable via environment.** Auth, mail, object storage, database, model providers, scrapers — all twelve-factor. No hard-coded endpoints or keys.
8. **Full CRUD on every object.** Every stored entity (person, name, event, relationship, source, citation, media, tree, …) must support create, read, **update**, and delete — in the API *and* the UI. Historical research is constant correction and new information, so nothing is write-once. Any new feature or data type ships with all four operations; an entity you can create but not edit is a bug.
## Tech stack
- **Frontend:** Next.js (App Router) + React + TypeScript + Tailwind + shadcn/ui. Mobile-first, server components for public/SEO pages, generated TS client from the backend OpenAPI spec.
- **Backend:** Python + FastAPI, async, layered (API → service → repository → domain). SQLAlchemy. OpenAPI is the contract.
- **Worker:** same image as backend in worker mode; queue-driven async jobs.
- **Database:** PostgreSQL with `pg_trgm` (fuzzy search) and `pgvector` (match ranking).
- **Object storage:** S3-compatible (MinIO for self-host).
- **Edge:** Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
- **Email:** operator-configured SMTP.
- **Model providers:** pluggable `LLMProvider` + `EmbeddingProvider` abstraction (ABCs) with Null / Anthropic / OpenAI-compatible (OpenAI, xAI, Ollama) implementations; an operator configures one or more via env and they're selectable by name through a registry (per-tree AI policy + `default_llm_provider`/`default_embedding_provider`).
- **CI/CD:** Gitea Actions build per-component images. **Push** to the LAN registry `192.168.0.2:1234` (plain HTTP, bypasses Cloudflare's body limit); **pull** via the public `git.jpaul.io` FQDN. Servers pull to deploy — no host build. Mirrors the drawbar setup; see [[gitea-lan-push-fqdn-pull]].
Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.
## Repository layout
```
/ # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
/docs # PRD.md, ARCHITECTURE.md
/backend # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth, mailer, objectstore, models = pluggable LLM/embedding providers), core}; migrations/ = Alembic
/deploy # docker-compose.yml (+ docker-compose.dev.yml), Caddyfile, .env.example, backup.sh + BACKUP.md (one-command pg_dump + MinIO backup) — the self-host stack
/.gitea/workflows # Gitea Actions CI (build images → Gitea registry)
/frontend # Next.js (App Router, TS, Tailwind, shadcn-style UI). app/ pages, lib/api generated OpenAPI client, components/ui
```
Phase 0 landed **deploy-first**: the compose stack (Postgres + MinIO + Caddy + FastAPI backend) and CI before the data model and frontend. Backend deps use **uv**; migrations use **Alembic**. Status (keep current as the tree grows):
- **Phase 0 — Foundation: complete** and running live (core data model, local auth behind `AuthProvider`, Next.js frontend).
- **Phase 1 — Core tree: complete.** Media (upload/serve), soft-delete + recovery UI, full CRUD across entities, and the 4-level tree visibility/privacy model (#41#51).
- **Phase 2 — substantially landed.** GEDCOM import (preview→apply, duplicate-aware) and export (citation-preserving, #232); fuzzy name search (pg_trgm) + the public `/explore` directory. Living-person protection is still hardening.
- **Phase 4 — AI assistant foundations landed.** Pluggable `LLMProvider`/`EmbeddingProvider` abstraction + multi-provider registry (Anthropic/OpenAI/xAI/Ollama, #235/#237), the **ChangeProposal** propose-then-confirm flow (#236), and per-tree AI model policy (#238). The assistant's *tool surface that emits proposals* is the remaining piece.
- Also shipped: tree membership management (#233), an **instance owner/operator** role (`OWNER_EMAIL`, #240), a schema-drift readiness guard (#239), and a one-command operator backup (#234).
- **Not built yet:** Phase 3 (Property — parcels/deeds/chain-of-title; no property models exist), Phase 5 (OIDC/social auth — only the `AuthProvider` ABC exists), and cross-tree hints (last; needs multiple populated trees + the embedding provider).
## Where to start
The roadmap is phased in PRD §8. Build in dependency order. **Phases 0 and 1 are complete**, Phase 2 is substantially done, and Phase 4's AI foundations have shipped (see the status list above). The biggest unbuilt areas are **Phase 3 (Property)** and **Phase 5 (OIDC/social auth)** — likely current targets. For reference, Phase 0 covered:
1. Backend skeleton (FastAPI, async, layered) + Postgres + migrations
2. Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
3. Local auth (password + email verification) behind the `AuthProvider` interface
4. Frontend scaffold (Next.js) wired to the API via the generated client
5. The deploy stack: `compose` for app + postgres + objectstore, Caddy config, env-driven settings
6. CI/CD: Gitea Actions building images to the registry
Don't get ahead of the phases. GEDCOM and the assistant's propose-diff foundation (provider abstraction + ChangeProposal approval flow) have shipped; the remaining dependency-ordered work is **Property** (Phase 3, on top of the tested people graph), then richer collaboration/audit UI, with **cross-tree hints last** (they need multiple populated trees and the embedding provider). If you think the order is wrong, raise it rather than reordering silently.
## Conventions
- **Sign off every commit with the DCO.** Use `git commit -s`. Commits without a `Signed-off-by` line cannot be merged. See [CONTRIBUTING.md](CONTRIBUTING.md).
- **Commit messages:** concise summary line; body explaining *why* when it isn't obvious. One logical change per commit where practical.
- **Tests** accompany new behavior once a test surface exists.
- **Docs travel with code:** update PRD/ARCHITECTURE in the same change when scope or design shifts.
- **Privacy/assistant/hint code gets extra care** — these are the areas where bugs do real harm. Prefer a design note before a large change.
- **No secrets in the repo.** Config via env; provide `.env.example` with placeholders.
## Patched dependencies (family-chart)
The tree view uses **family-chart** (d3-based). Two adjustments live in the repo:
- **CSS is vendored** at `frontend/app/trees/[id]/tree/chart.css` — the package blocks its CSS subpath export, so we copy it in.
- **The library is patched** via `patch-package` (`frontend/patches/family-chart+0.9.0.patch`, applied by the `postinstall` hook; the backend/frontend Dockerfiles `COPY patches` before install). Both hunks touch `dist/family-chart.js` **and** `dist/family-chart.esm.js` (the app loads the `esm` build). Current fixes:
1. **Spouse-centering layout** (`setupSpouses` / `sortChildrenWithSpouses`) — center a person between two spouses with children under the correct pair.
2. **`cardToMiddle` vertical centering** — the lib scaled `datum.x` by the zoom factor `k` but not `datum.y`, so "fly to a node" drifted vertically at any zoom ≠ 1; we add the missing `* k`.
To change a patch: edit the file(s) under `node_modules/family-chart/dist/`, then `cd frontend && npx patch-package family-chart` to regenerate, and verify with `npx patch-package --error-on-fail`.
**Upstreamed.** Both are general library bugfixes, not app-specific, and are submitted upstream:
- `cardToMiddle` vertical centering — **donatso/family-chart#103** (issue **#102**).
- Multi-spouse centered layout — **donatso/family-chart#105** (issue **#104**).
If either is merged + released, bump `family-chart`, drop the corresponding patch hunk, **and** remove any in-app compensation (e.g. the `cardToMiddle` caller in `tree/page.tsx` passes raw `y` precisely because the patch fixes it — pre-scaling there too would double-correct). Until then, keep the patch.
## License & contribution terms
Provenance is **source-available** under **BUSL-1.1** (see [LICENSE](LICENSE)): free for personal/family/non-commercial use, no third-party commercial hosting, and each release converts to **AGPL-3.0** four years after it ships. The DCO sign-off keeps the licensing chain clean so the maintainer can manage that conversion and a possible future hosted offering. Don't add code under an incompatible license, and don't vendor dependencies whose licenses conflict with eventual AGPL distribution.
## Brand
Visual identity lives in [docs/brand/](docs/brand/) (see its README for full guidance). Use these as the frontend's design tokens:
- **Ink** (primary text/marks): `#1A1A17` light / `#F2EEE6` dark
- **Bronze** (accent, constant): `#A06A42`
- **Paper** (knockout on bronze, constant): `#F7F3EC`
- **Muted** (secondary text): `#6B6862` light / `#9A968E` dark
Wordmark is a serif (heritage register); UI body/secondary text is a humanist sans. Logo lockup: `docs/brand/provenance-logo.svg`; app icon/favicon: `docs/brand/provenance-icon.svg` and `favicon.svg`. Don't recolor outside the palette or add gradients/shadows — the look is flat and warm.
## Owner & contact
Maintainer: **Justin Paul** (`justin@jpaul.io`). This deployment targets a home lab: Authentik at `auth.jpaul.io` for auth, `mail.jpaul.io` for SMTP, behind Caddy + Cloudflare Tunnel.
## Open questions (don't assume answers)
Parked in PRD §11 and ARCHITECTURE §14: telemetry (opt-in anonymous vs none), embeddings provider for matching, DNA as future-phase vs permanent non-goal, native mobile timing, hosted-SaaS model, queue backend default (Postgres vs Redis), and PostGIS adoption. If a task depends on one of these, surface the dependency instead of picking silently.