justin 76b7f453c1 Add update (CRUD) for events and people; record the full-CRUD invariant
Events and people are now editable, not write-once: PATCH /events/{id} (type, structured date, place, notes) and PATCH /persons/{id} (vitals, privacy, and the primary name's given/surname). CLAUDE.md gains rule #8: every stored object must support full CRUD in API and UI — historical research is constant correction. Tests cover both updates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
2026-06-07 09:35:55 -04:00

CLAUDE.md

Operating guide for Claude Code (and any AI assistant) working in this repository. Read this first, then docs/PRD.md and docs/ARCHITECTURE.md.

What this project is

Provenance is self-hostable, source-available software for tracing where you come from — family and land. It combines a genealogy application (people, relationships, events, sources, media) with property chain-of-title tracking (parcels, deeds, ownership events), a privacy model, an AI research assistant, and a cross-tree hint system. It is multi-tenant and container-native.

The name is the thesis: provenance means a documented chain of custody. Every fact should link to where it came from.

Non-negotiable rules

These are product invariants, not preferences. Do not violate them, and flag any task that seems to require doing so:

  1. The AI assistant never writes autonomously. Assistant "write" operations emit a ChangeProposal (a structured diff) that a human approves, edits, or rejects in the UI. There must be no code path where a model response mutates tree data directly. This is structural — enforce it in the type system / service boundaries, not just by convention.
  2. Privacy has a single enforcement point. All reads — API, server-rendered public pages, search, hints, assistant — resolve visibility through one privacy engine in the service layer. Never add a query path that returns rows without passing through it.
  3. Living people are protected by default. Non-owners do not see PII for a person who is (or may be) living. See the living-person rule in ARCHITECTURE §6.
  4. Hint matching is anonymous until mutual consent. A match notification must reveal nothing identifying about the other user or any living person. Identities exchange only after both sides opt in.
  5. Sources are first-class. Don't model citations as free-text afterthoughts. A Source is a reusable entity; a Citation links it to a specific fact.
  6. Only legal data sources. Ship scrapers/connectors only for permissible sources (FamilySearch API, Find A Grave, WikiTree, BLM/GLO, USGS, public-domain newspapers, public county records). Never add connectors for paywalled/terms-prohibited sites (Ancestry, MyHeritage, 23andMe).
  7. Everything is configurable via the environment. Auth, mail, object storage, database, model providers, and scrapers all follow twelve-factor config, with no hard-coded endpoints or keys.
  8. Full CRUD on every object. Every stored entity (person, name, event, relationship, source, citation, media, tree, …) must support create, read, update, and delete — in the API and the UI. Historical research is constant correction and new information, so nothing is write-once. Any new feature or data type ships with all four operations; an entity you can create but not edit is a bug.
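Rule 1 asks for structural enforcement rather than convention. A minimal sketch of one way to get that in Python: make the write path accept only a distinct approved type that can be built only with a human reviewer's id. The names (`FieldChange`, `ApprovedProposal`, `approve`, `apply_proposal`) are hypothetical, and the dict stands in for the repository layer; this is not the project's actual ChangeProposal schema.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FieldChange:
    """One field-level edit in a proposed diff (hypothetical shape)."""
    entity: str      # e.g. "person"
    entity_id: str
    attribute: str   # e.g. "surname"
    old: Any
    new: Any


@dataclass(frozen=True)
class ChangeProposal:
    """All that assistant code may return for a "write": a diff, not a mutation."""
    changes: tuple[FieldChange, ...]
    rationale: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class ApprovedProposal:
    """The only type the write path accepts; records which human approved it."""
    proposal: ChangeProposal
    approved_by: str


def approve(proposal: ChangeProposal, reviewer_id: str) -> ApprovedProposal:
    """Meant to be called from the human approval flow, never from assistant code."""
    if not reviewer_id:
        raise PermissionError("a human reviewer is required")
    return ApprovedProposal(proposal, reviewer_id)


def apply_proposal(store: dict[tuple[str, str], dict[str, Any]], approved: ApprovedProposal) -> None:
    # Runtime guard on top of the static type: a raw ChangeProposal is rejected.
    if not isinstance(approved, ApprovedProposal):
        raise TypeError("only approved proposals can be applied")
    for change in approved.proposal.changes:
        store[(change.entity, change.entity_id)][change.attribute] = change.new
```

A type checker flags passing a bare `ChangeProposal` to `apply_proposal`, and the `isinstance` guard catches it at runtime. A human "edit" in this model produces a new `ChangeProposal` that is then approved.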
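Rules 2 and 3 combine naturally: repositories return raw rows, and every read path hands them to one engine that redacts before anything leaves the service layer. A minimal sketch, assuming a hypothetical 110-year cutoff for "may be living" and a simple tree-owner check; the real living-person rule is defined in ARCHITECTURE §6, and tree-level visibility (private trees, memberships) is left out here.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from typing import Iterable, Optional

# HYPOTHETICAL threshold; the real living-person rule lives in ARCHITECTURE §6.
LIVING_CUTOFF_YEARS = 110


@dataclass(frozen=True)
class Person:
    id: str
    tree_id: str
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    is_deceased: bool = False  # explicit flag a researcher can set


@dataclass(frozen=True)
class Viewer:
    user_id: Optional[str]          # None for anonymous public-page visitors
    owned_tree_ids: frozenset[str]


def may_be_living(p: Person, today: date) -> bool:
    if p.is_deceased or p.death_year is not None:
        return False
    if p.birth_year is None:
        return True  # unknown birth: err on the side of protection
    return today.year - p.birth_year < LIVING_CUTOFF_YEARS


class PrivacyEngine:
    """The single place read paths (API, public pages, search, hints, assistant) resolve visibility."""

    def __init__(self, today: date) -> None:
        self.today = today

    def visible(self, viewer: Viewer, people: Iterable[Person]) -> list[Person]:
        out: list[Person] = []
        for p in people:
            if p.tree_id in viewer.owned_tree_ids or not may_be_living(p, self.today):
                out.append(p)
            else:
                # Non-owner looking at a possibly living person: strip PII.
                out.append(replace(p, name="Living", birth_year=None))
        return out
```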
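For rule 4, a match record can hold both sides internally while exposing only anonymous views until both parties consent. A minimal sketch, assuming a hypothetical `HintMatch` keyed by user and person ids: the notification carries only an opaque match id and the recipient's own person, and anything fetched after the reveal would still pass through the privacy engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HintMatch:
    match_id: str      # opaque id, safe to show either side
    user_a: str
    user_b: str
    person_a: str      # matched person in user_a's tree
    person_b: str      # matched person in user_b's tree
    consents: set[str] = field(default_factory=set)

    def _check_party(self, user_id: str) -> None:
        if user_id not in (self.user_a, self.user_b):
            raise PermissionError("not a party to this match")

    def notification_for(self, user_id: str) -> dict[str, str]:
        """Anonymous payload: nothing about the other user or their tree."""
        self._check_party(user_id)
        own = self.person_a if user_id == self.user_a else self.person_b
        return {"match_id": self.match_id, "your_person_id": own}

    def opt_in(self, user_id: str) -> None:
        self._check_party(user_id)
        self.consents.add(user_id)

    def reveal_to(self, user_id: str) -> dict[str, str]:
        """Identities are exchanged only once both sides have opted in."""
        self._check_party(user_id)
        if self.consents != {self.user_a, self.user_b}:
            raise PermissionError("waiting for mutual consent")
        if user_id == self.user_a:
            return {"other_user": self.user_b, "other_person_id": self.person_b}
        return {"other_user": self.user_a, "other_person_id": self.person_a}
```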
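For rule 8, update is the operation easiest to get subtly wrong: a PATCH should change only the fields the client actually sent, and an explicit null should clear a field rather than be ignored. A minimal sketch, assuming a simplified, hypothetical `Event` (structured date reduced to a string) and a plain dict as the request body. With Pydantic v2 in FastAPI, the "only what was sent" set typically comes from `model_dump(exclude_unset=True)`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Event:
    id: str
    type: str
    date: Optional[str] = None      # structured date, simplified to a string here
    place_id: Optional[str] = None
    notes: Optional[str] = None


# HYPOTHETICAL editable set, mirroring "type, structured date, place, notes".
EDITABLE = {"type", "date", "place_id", "notes"}


def patch_event(event: Event, body: Mapping[str, Any]) -> Event:
    """PATCH semantics: keys absent from body are untouched; an explicit None clears."""
    unknown = set(body) - EDITABLE
    if unknown:
        raise ValueError(f"not editable: {sorted(unknown)}")
    if "type" in body and not body["type"]:
        raise ValueError("type cannot be cleared")
    return replace(event, **body)  # returns a new Event; the original is unchanged
```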

Tech stack

  • Frontend: Next.js (App Router) + React + TypeScript + Tailwind + shadcn/ui. Mobile-first, server components for public/SEO pages, generated TS client from the backend OpenAPI spec.
  • Backend: Python + FastAPI, async, layered (API → service → repository → domain). SQLAlchemy. OpenAPI is the contract.
  • Worker: same image as backend in worker mode; queue-driven async jobs.
  • Database: PostgreSQL with pg_trgm (fuzzy search) and pgvector (match ranking).
  • Object storage: S3-compatible (MinIO for self-host).
  • Edge: Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
  • Email: operator-configured SMTP.
  • CI/CD: Gitea Actions builds per-component images and pushes them to the LAN registry at 192.168.0.2:1234 (plain HTTP, which bypasses Cloudflare's request-body limit); servers pull them via the public git.jpaul.io FQDN. Deploys are pull-only, with no builds on the host. This mirrors the drawbar setup; see gitea-lan-push-fqdn-pull.
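The stack above and rule 7 meet in one settings object read from the environment at startup. A minimal stdlib sketch: the variable names (`DATABASE_URL`, `S3_ENDPOINT`, `S3_BUCKET`, `SMTP_HOST`, `SMTP_PORT`, `PUBLIC_BASE_URL`) are hypothetical placeholders and may not match deploy/.env.example, and a real backend might use pydantic-settings instead.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    pass


@dataclass(frozen=True)
class Settings:
    database_url: str                       # PostgreSQL
    s3_endpoint: str                        # S3-compatible store, e.g. MinIO
    s3_bucket: str
    smtp_host: str                          # operator-configured SMTP
    smtp_port: int
    public_base_url: Optional[str] = None   # optional; ingress may vary per deployment


def _require(env: Mapping[str, str], key: str) -> str:
    value = env.get(key)
    if not value:
        raise ConfigError(f"missing required environment variable {key}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail fast so a misconfigured container never starts half-working.
    return Settings(
        database_url=_require(env, "DATABASE_URL"),
        s3_endpoint=_require(env, "S3_ENDPOINT"),
        s3_bucket=_require(env, "S3_BUCKET"),
        smtp_host=_require(env, "SMTP_HOST"),
        smtp_port=int(env.get("SMTP_PORT", "587")),  # hypothetical default (submission port)
        public_base_url=env.get("PUBLIC_BASE_URL") or None,
    )
```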

Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.

Repository layout

/                  # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
/docs              # PRD.md, ARCHITECTURE.md
/backend           # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth/mailer), core}; migrations/ = Alembic
/deploy            # docker-compose.yml, Caddyfile, .env.example — the self-host stack
/.gitea/workflows  # Gitea Actions CI (build images → Gitea registry)
/frontend          # Next.js (App Router, TS, Tailwind, shadcn-style UI). app/ pages, lib/api generated OpenAPI client, components/ui

Phase 0 landed deploy-first: the compose stack (Postgres + MinIO + Caddy + a minimal FastAPI backend exposing /health and /health/ready) and CI came before the real data model and the frontend. Backend dependencies are managed with uv; migrations use Alembic. The core data model (ARCHITECTURE §5), local auth (Argon2 passwords, backend-issued sessions, and email verify/reset behind the AuthProvider interface; API auth via a Bearer header or an HttpOnly cookie), and the Next.js frontend scaffold (Tailwind + shadcn-style UI, generated OpenAPI client, auth + tree/person views) have all landed. Phase 0 is complete and running on the live deployment. Phase 1 (core tree features: media, soft-delete recovery, richer CRUD) is next; OIDC/social auth is Phase 5. Keep this section current as the repository grows.
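Because the API accepts either a Bearer header or an HttpOnly cookie, one function should decide which credential a request carries. A minimal stdlib sketch, assuming a hypothetical cookie name `session` and giving the Authorization header precedence when both are present; validating the token against backend-issued sessions through the AuthProvider is not shown.

```python
from __future__ import annotations

from http.cookies import CookieError, SimpleCookie
from typing import Mapping, Optional

SESSION_COOKIE = "session"  # HYPOTHETICAL cookie name


def session_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Bearer token if present, else the session cookie value, else None."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    raw_cookie = lowered.get("cookie")
    if raw_cookie:
        jar = SimpleCookie()
        try:
            jar.load(raw_cookie)
        except CookieError:
            return None
        morsel = jar.get(SESSION_COOKIE)
        if morsel is not None and morsel.value:
            return morsel.value
    return None
```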

Where to start

The roadmap is phased in PRD §8. Build in dependency order. Phase 0 — Foundation is complete and running on the live deployment; Phase 1 (core tree features) is the current target. For reference, Phase 0 covered:

  1. Backend skeleton (FastAPI, async, layered) + Postgres + migrations
  2. Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
  3. Local auth (password + email verification) behind the AuthProvider interface
  4. Frontend scaffold (Next.js) wired to the API via the generated client
  5. The deploy stack: compose for app + postgres + objectstore, Caddy config, env-driven settings
  6. CI/CD: Gitea Actions building images to the registry
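Two items from that list fit together: rule 5's split between a reusable Source and a Citation that pins it to one fact, and soft-delete support. A minimal sketch with hypothetical field names and an in-memory repository in place of SQLAlchemy: one Source is cited by two different events, and a soft-deleted citation drops out of normal reads but can be restored.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Source:
    """A reusable entity, e.g. one census, cited from many facts."""
    id: str
    title: str
    deleted_at: Optional[datetime] = None


@dataclass
class Citation:
    """Links one Source to one specific fact."""
    id: str
    source_id: str
    fact_type: str              # e.g. "event", "name"
    fact_id: str
    detail: str = ""            # page, line, image number
    deleted_at: Optional[datetime] = None


class CitationRepo:
    def __init__(self) -> None:
        self._rows: dict[str, Citation] = {}

    def add(self, c: Citation) -> None:
        self._rows[c.id] = c

    def for_fact(self, fact_type: str, fact_id: str) -> list[Citation]:
        """Live citations only; soft-deleted rows are hidden, not gone."""
        return [c for c in self._rows.values()
                if c.fact_type == fact_type and c.fact_id == fact_id and c.deleted_at is None]

    def soft_delete(self, citation_id: str) -> None:
        self._rows[citation_id].deleted_at = datetime.now(timezone.utc)

    def restore(self, citation_id: str) -> None:
        self._rows[citation_id].deleted_at = None
```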

Don't get ahead of the phases. GEDCOM lands before the assistant (so assistant-proposed changes target a stable model); property follows a tested people graph; hints come last because they need multiple populated trees. If you think the order is wrong, raise it rather than reordering silently.

Conventions

  • Sign off every commit under the DCO (Developer Certificate of Origin) with git commit -s. Commits without a Signed-off-by line cannot be merged. See CONTRIBUTING.md.
  • Commit messages: concise summary line; body explaining why when it isn't obvious. One logical change per commit where practical.
  • Tests accompany new behavior once a test surface exists.
  • Docs travel with code: update PRD/ARCHITECTURE in the same change when scope or design shifts.
  • Privacy/assistant/hint code gets extra care — these are the areas where bugs do real harm. Prefer a design note before a large change.
  • No secrets in the repo. Config via env; provide .env.example with placeholders.
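Since unsigned commits cannot be merged, a CI job can check for the sign-off before review. A minimal sketch that looks for the `Signed-off-by: Name <email>` trailer that `git commit -s` appends; the function names are hypothetical, and feeding it messages (for example from `git log --format=%B`) is left to the workflow.

```python
from __future__ import annotations

import re

# One trailer per line: "Signed-off-by: Name <email>".
SIGNOFF = re.compile(
    r"^Signed-off-by: (?P<name>[^<>\n]+?) <(?P<email>[^<>@\s]+@[^<>\s]+)>[ \t]*$",
    re.MULTILINE,
)


def signoffs(message: str) -> list[tuple[str, str]]:
    """All (name, email) sign-offs found in a commit message."""
    return [(m["name"], m["email"]) for m in SIGNOFF.finditer(message)]


def has_dco(message: str) -> bool:
    return bool(signoffs(message))
```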

License & contribution terms

Provenance is source-available under BUSL-1.1 (see LICENSE): free for personal/family/non-commercial use, no third-party commercial hosting, and each release converts to AGPL-3.0 four years after it ships. The DCO sign-off keeps the licensing chain clean so the maintainer can manage that conversion and a possible future hosted offering. Don't add code under an incompatible license, and don't vendor dependencies whose licenses conflict with eventual AGPL distribution.

Brand

Visual identity lives in docs/brand/ (see its README for full guidance). Use these as the frontend's design tokens:

  • Ink (primary text/marks): #1A1A17 light / #F2EEE6 dark
  • Bronze (accent, constant): #A06A42
  • Paper (knockout on bronze, constant): #F7F3EC
  • Muted (secondary text): #6B6862 light / #9A968E dark

Wordmark is a serif (heritage register); UI body/secondary text is a humanist sans. Logo lockup: docs/brand/provenance-logo.svg; app icon/favicon: docs/brand/provenance-icon.svg and favicon.svg. Don't recolor outside the palette or add gradients/shadows — the look is flat and warm.

Owner & contact

Maintainer: Justin Paul (justin@jpaul.io). This deployment targets a home lab: Authentik at auth.jpaul.io for auth, mail.jpaul.io for SMTP, behind Caddy + Cloudflare Tunnel.

Open questions (don't assume answers)

Parked in PRD §11 and ARCHITECTURE §14: telemetry (opt-in anonymous vs none), embeddings provider for matching, DNA as future-phase vs permanent non-goal, native mobile timing, hosted-SaaS model, queue backend default (Postgres vs Redis), and PostGIS adoption. If a task depends on one of these, surface the dependency instead of picking silently.