diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..3e07527
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,77 @@
+# CLAUDE.md
+
+Operating guide for Claude Code (and any AI assistant) working in this repository. Read this first, then [docs/PRD.md](docs/PRD.md) and [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
+
+## What this project is
+
+**Provenance** is self-hostable, source-available software for tracing where you come from — **family *and* land**. It combines a genealogy application (people, relationships, events, sources, media) with **property chain-of-title** tracking (parcels, deeds, ownership events), a privacy model, an AI research assistant, and a cross-tree hint system. It is multi-tenant and container-native.
+
+The name is the thesis: *provenance* means a documented chain of custody. Every fact should link to where it came from.
+
+## Non-negotiable rules
+
+These are product invariants, not preferences. Do not violate them, and flag any task that seems to require it:
+
+1. **The AI assistant never writes autonomously.** Assistant "write" operations emit a `ChangeProposal` (a structured diff) that a human approves, edits, or rejects in the UI. There must be no code path where a model response mutates tree data directly. This is structural — enforce it in the type system / service boundaries, not just by convention.
+2. **Privacy has a single enforcement point.** All reads — API, server-rendered public pages, search, hints, assistant — resolve visibility through one privacy engine in the service layer. Never add a query path that returns rows without passing through it.
+3. **Living people are protected by default.** Non-owners do not see PII for a person who is (or may be) living. See the living-person rule in ARCHITECTURE §6.
+4. **Hint matching is anonymous until mutual consent.** A match notification must reveal nothing identifying about the other user or any living person. Identities exchange only after both sides opt in.
+5. **Sources are first-class.** Don't model citations as free-text afterthoughts. A `Source` is a reusable entity; a `Citation` links it to a specific fact.
+6. **Only legal data sources.** Ship scrapers/connectors only for permissible sources (FamilySearch API, Find A Grave, WikiTree, BLM/GLO, USGS, public-domain newspapers, public county records). Never add connectors for paywalled/terms-prohibited sites (Ancestry, MyHeritage, 23andMe).
+7. **Everything is configurable via environment.** Auth, mail, object storage, database, model providers, scrapers — all twelve-factor. No hard-coded endpoints or keys.
+
+## Tech stack
+
+- **Frontend:** Next.js (App Router) + React + TypeScript + Tailwind + shadcn/ui. Mobile-first, server components for public/SEO pages, generated TS client from the backend OpenAPI spec.
+- **Backend:** Python + FastAPI, async, layered (API → service → repository → domain). SQLAlchemy. OpenAPI is the contract.
+- **Worker:** same image as backend in worker mode; queue-driven async jobs.
+- **Database:** PostgreSQL with `pg_trgm` (fuzzy search) and `pgvector` (match ranking).
+- **Object storage:** S3-compatible (MinIO for self-host).
+- **Edge:** Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
+- **Email:** operator-configured SMTP.
+- **CI/CD:** Gitea Actions on `git.jpaul.io` build container images to the Gitea registry; servers pull to deploy.
+
+Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.
+
+## Repository layout
+
+```
+/        # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
+/docs    # PRD.md, ARCHITECTURE.md
+```
+
+Code does not exist yet — Phase 0 has not landed. When you scaffold it, propose a layout (e.g. `/backend`, `/frontend`, `/deploy` for compose/Caddy) and record it here and in ARCHITECTURE.md. Keep this section current as the tree grows.
+
+## Where to start
+
+The roadmap is phased in PRD §8. Build in dependency order. **Phase 0 — Foundation** is the current target:
+
+1. Backend skeleton (FastAPI, async, layered) + Postgres + migrations
+2. Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
+3. Local auth (password + email verification) behind the `AuthProvider` interface
+4. Frontend scaffold (Next.js) wired to the API via the generated client
+5. The deploy stack: `compose` for app + postgres + objectstore, Caddy config, env-driven settings
+6. CI/CD: Gitea Actions building images to the registry
+
+Don't get ahead of the phases. GEDCOM lands before the assistant (so AI writes target a stable model); property follows a tested people graph; hints come last because they need multiple populated trees. If you think the order is wrong, raise it rather than reordering silently.
+
+## Conventions
+
+- **Sign off every commit with the DCO.** Use `git commit -s`. Commits without a `Signed-off-by` line cannot be merged. See [CONTRIBUTING.md](CONTRIBUTING.md).
+- **Commit messages:** concise summary line; body explaining *why* when it isn't obvious. One logical change per commit where practical.
+- **Tests** accompany new behavior once a test surface exists.
+- **Docs travel with code:** update PRD/ARCHITECTURE in the same change when scope or design shifts.
+- **Privacy/assistant/hint code gets extra care** — these are the areas where bugs do real harm. Prefer a design note before a large change.
+- **No secrets in the repo.** Config via env; provide `.env.example` with placeholders.
+
+## License & contribution terms
+
+Provenance is **source-available** under **BUSL-1.1** (see [LICENSE](LICENSE)): free for personal/family/non-commercial use, no third-party commercial hosting, and each release converts to **AGPL-3.0** four years after it ships. The DCO sign-off keeps the licensing chain clean so the maintainer can manage that conversion and a possible future hosted offering. Don't add code under an incompatible license, and don't vendor dependencies whose licenses conflict with eventual AGPL distribution.
+
+## Owner & contact
+
+Maintainer: **Justin Paul** (`justin@jpaul.io`). This deployment targets a home lab: Authentik at `auth.jpaul.io` for auth, `mail.jpaul.io` for SMTP, behind Caddy + Cloudflare Tunnel.
+
+## Open questions (don't assume answers)
+
+Parked in PRD §11 and ARCHITECTURE §14: telemetry (opt-in anonymous vs none), embeddings provider for matching, DNA as future-phase vs permanent non-goal, native mobile timing, hosted-SaaS model, queue backend default (Postgres vs Redis), and PostGIS adoption. If a task depends on one of these, surface the dependency instead of picking silently.
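
Rule 1 under "Non-negotiable rules" asks that assistant writes be enforced structurally rather than by convention. One way that might look in the Python backend is sketched below. Only the name `ChangeProposal` comes from the guide; `FieldChange`, `ProposalStatus`, `TreeStore`, `approve`, and `reject` are hypothetical names chosen for illustration, and the in-memory store is a stand-in for the real repository layer, which does not exist yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class ProposalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class FieldChange:
    """One field-level edit inside a proposal (hypothetical shape)."""
    entity_id: str
    field_name: str
    old_value: Optional[str]
    new_value: Optional[str]


@dataclass
class ChangeProposal:
    """Structured diff emitted by the assistant; inert until a human reviews it."""
    proposed_by: str
    changes: Tuple[FieldChange, ...]
    status: ProposalStatus = ProposalStatus.PENDING
    reviewed_by: Optional[str] = None


class TreeStore:
    """Toy in-memory stand-in for tree data: entity_id -> {field: value}."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Optional[str]]] = {}

    def read(self, entity_id: str, field_name: str) -> Optional[str]:
        return self._rows.get(entity_id, {}).get(field_name)

    def _apply(self, change: FieldChange) -> None:
        # Private by convention; in this sketch only approve() calls it.
        self._rows.setdefault(change.entity_id, {})[change.field_name] = change.new_value


def _close(proposal: ChangeProposal, reviewer: str, status: ProposalStatus) -> None:
    # A proposal can be reviewed exactly once.
    if proposal.status is not ProposalStatus.PENDING:
        raise ValueError("proposal has already been reviewed")
    proposal.status = status
    proposal.reviewed_by = reviewer


def approve(store: TreeStore, proposal: ChangeProposal, reviewer: str) -> None:
    """The only function in this sketch that mutates the store."""
    _close(proposal, reviewer, ProposalStatus.APPROVED)
    for change in proposal.changes:
        store._apply(change)


def reject(proposal: ChangeProposal, reviewer: str) -> None:
    """Record the human decision without touching tree data."""
    _close(proposal, reviewer, ProposalStatus.REJECTED)
```

In this sketch the assistant side would only ever construct a `ChangeProposal`; the single mutation path runs through `approve`, which requires a named reviewer. Python cannot make `_apply` truly private, so a real codebase would presumably back this up with the service boundaries the rule mentions, for example by never handing the write-capable store to assistant code at all.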