# CLAUDE.md
Operating guide for Claude Code (and any AI assistant) working in this repository. Read this first, then [docs/PRD.md](docs/PRD.md) and [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
## What this project is
**Provenance** is self-hostable, source-available software for tracing where you come from — **family *and* land**. It combines a genealogy application (people, relationships, events, sources, media) with **property chain-of-title** tracking (parcels, deeds, ownership events), a privacy model, an AI research assistant, and a cross-tree hint system. It is multi-tenant and container-native.
The name is the thesis: *provenance* means a documented chain of custody. Every fact should link to where it came from.
## Non-negotiable rules
These are product invariants, not preferences. Do not violate them; if a task seems to require violating one, flag it instead of proceeding:
1. **The AI assistant never writes autonomously.** Assistant "write" operations emit a `ChangeProposal` (a structured diff) that a human approves, edits, or rejects in the UI. There must be no code path where a model response mutates tree data directly. This is structural — enforce it in the type system / service boundaries, not just by convention.
2. **Privacy has a single enforcement point.** All reads — API, server-rendered public pages, search, hints, assistant — resolve visibility through one privacy engine in the service layer. Never add a query path that returns rows without passing through it.
3. **Living people are protected by default.** Non-owners do not see PII for a person who is (or may be) living. See the living-person rule in ARCHITECTURE §6.
4. **Hint matching is anonymous until mutual consent.** A match notification must reveal nothing identifying about the other user or any living person. Identities are exchanged only after both sides opt in.
5. **Sources are first-class.** Don't model citations as free-text afterthoughts. A `Source` is a reusable entity; a `Citation` links it to a specific fact.
6. **Only legal data sources.** Ship scrapers/connectors only for permissible sources (FamilySearch API, Find A Grave, WikiTree, BLM/GLO, USGS, public-domain newspapers, public county records). Never add connectors for paywalled/terms-prohibited sites (Ancestry, MyHeritage, 23andMe).
7. **Everything is configurable via environment.** Auth, mail, object storage, database, model providers, scrapers — all twelve-factor. No hard-coded endpoints or keys.
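Rule 1 asks for structural enforcement, so a rough illustration may help. The sketch below is one possible shape, not the project's actual model: `FieldChange`, `AssistantTools`, `TreeWriteService`, and every field name are hypothetical. The idea it shows is that the assistant's only write surface returns a `ChangeProposal`, and the service that mutates tree data refuses anything a human has not approved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ProposalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class FieldChange:
    """One proposed field-level edit (hypothetical shape)."""
    entity_id: str
    field_name: str
    old_value: Any
    new_value: Any


@dataclass
class ChangeProposal:
    """A structured diff awaiting human review (field names are assumptions)."""
    proposed_by: str
    changes: tuple
    status: ProposalStatus = ProposalStatus.PENDING
    reviewed_by: Optional[str] = None


class AssistantTools:
    """The only write surface handed to the assistant: it builds proposals."""

    def propose(self, changes: list) -> ChangeProposal:
        return ChangeProposal(proposed_by="assistant", changes=tuple(changes))


class TreeWriteService:
    """Mutates tree data, but only from proposals a human has approved."""

    def __init__(self) -> None:
        self.tree: dict = {}

    def approve(self, proposal: ChangeProposal, reviewer: str) -> None:
        proposal.status = ProposalStatus.APPROVED
        proposal.reviewed_by = reviewer

    def apply(self, proposal: ChangeProposal) -> None:
        approved = proposal.status is ProposalStatus.APPROVED
        if not approved or proposal.reviewed_by is None:
            raise PermissionError("proposal has not been approved by a human")
        for change in proposal.changes:
            self.tree.setdefault(change.entity_id, {})[change.field_name] = change.new_value
```

In this sketch the assistant never receives a `TreeWriteService`, so a model response cannot reach `apply` on its own. A real implementation would presumably also persist proposals and record the approval in the audit log.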
## Tech stack
- **Frontend:** Next.js (App Router) + React + TypeScript + Tailwind + shadcn/ui. Mobile-first, server components for public/SEO pages, generated TS client from the backend OpenAPI spec.
- **Backend:** Python + FastAPI, async, layered (API → service → repository → domain). SQLAlchemy. OpenAPI is the contract.
- **Worker:** same image as backend in worker mode; queue-driven async jobs.
- **Database:** PostgreSQL with `pg_trgm` (fuzzy search) and `pgvector` (match ranking).
- **Object storage:** S3-compatible (MinIO for self-host).
- **Edge:** Caddy reverse proxy; optional Cloudflare Tunnel (preferred ingress, never required).
- **Email:** operator-configured SMTP.
- **CI/CD:** Gitea Actions build per-component images. **Push** to the LAN registry `192.168.0.2:1234` (plain HTTP, bypasses Cloudflare's body limit); **pull** via the public `git.jpaul.io` FQDN. Servers pull to deploy — no host build. Mirrors the drawbar setup; see [[gitea-lan-push-fqdn-pull]].
Pick libraries consistent with this stack. If you introduce a significant dependency or a new service, note it in ARCHITECTURE.md in the same change.
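To make the layering concrete (API → service → repository → domain, with the privacy engine in the service layer), here is a minimal, framework-free sketch. All class and field names are hypothetical, and the redaction rule is a placeholder: the real living-person rule is defined in ARCHITECTURE §6, not here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical row shape; the real model lives in ARCHITECTURE §5."""
    id: str
    owner_id: str
    display_name: str
    birth_date: Optional[str]
    maybe_living: bool


class PersonRepository:
    """Repository layer: returns raw rows and knows nothing about visibility."""

    def __init__(self, rows: list) -> None:
        self._rows = {row.id: row for row in rows}

    def get(self, person_id: str) -> Optional[PersonRecord]:
        return self._rows.get(person_id)


class PrivacyEngine:
    """The one place visibility is decided (placeholder rule, an assumption)."""

    def redact(self, viewer_id: Optional[str], record: PersonRecord) -> PersonRecord:
        if viewer_id == record.owner_id or not record.maybe_living:
            return record
        # Non-owner viewing a possibly living person: strip PII.
        return replace(record, display_name="Private", birth_date=None)


class PersonService:
    """Service layer: every read goes repository -> privacy engine -> caller."""

    def __init__(self, repo: PersonRepository, privacy: PrivacyEngine) -> None:
        self._repo = repo
        self._privacy = privacy

    def get_person(self, viewer_id: Optional[str], person_id: str) -> Optional[PersonRecord]:
        record = self._repo.get(person_id)
        if record is None:
            return None
        return self._privacy.redact(viewer_id, record)
```

API handlers, search, hints, and the assistant would all call `PersonService` rather than the repository, which is what keeps the enforcement point single.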
## Repository layout
```
/ # docs and project meta (this file, README, LICENSE, COC, CONTRIBUTING)
/docs # PRD.md, ARCHITECTURE.md
/backend # FastAPI service (uv-managed). app/{api/v1, services (+ privacy engine), repositories, models, schemas, integrations (auth/mailer), core}; migrations/ = Alembic
/deploy # docker-compose.yml, Caddyfile, .env.example — the self-host stack
/.gitea/workflows # Gitea Actions CI (build images → Gitea registry)
/frontend # Next.js (App Router, TS, Tailwind, shadcn-style UI). app/ pages, lib/api generated OpenAPI client, components/ui
```
Phase 0 landed **deploy-first**: the compose stack (Postgres + MinIO + Caddy + a minimal FastAPI backend exposing `/health` and `/health/ready`) and CI came before the real data model and the frontend. Backend dependencies are managed with **uv**; migrations use **Alembic**. The core data model (ARCHITECTURE §5), **local auth** (Argon2 passwords, backend-issued sessions, email verify/reset behind the `AuthProvider` interface; API auth via Bearer header or HttpOnly cookie), and the **Next.js frontend scaffold** (Tailwind + shadcn-style UI, generated OpenAPI client, auth + tree/person views) have all landed since, so **Phase 0 is complete and running on the live deployment.** Phase 1 (core tree features: media, soft-delete recovery, richer CRUD) is next; OIDC/social auth is Phase 5. Keep this section current as the repository tree grows.
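As an illustration of "API auth via Bearer header or HttpOnly cookie", the helper below sketches how a request's session token could be located. The cookie name `provenance_session`, the function name, and the header-before-cookie precedence are all assumptions for the example, not taken from the backend.

```python
from typing import Mapping, Optional

SESSION_COOKIE = "provenance_session"  # hypothetical cookie name


def extract_session_token(
    headers: Mapping[str, str], cookies: Mapping[str, str]
) -> Optional[str]:
    """Return the token from a Bearer header, else from the session cookie."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    scheme, _, credentials = normalised.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    # Fall back to the cookie (assumed precedence: header first).
    token = cookies.get(SESSION_COOKIE, "").strip()
    return token or None
```

The returned token would still need to be validated against the backend-issued session store; this sketch covers only where the token is read from.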
## Where to start
The roadmap is phased in PRD §8. Build in dependency order. **Phase 0 — Foundation is complete** and running on the live deployment; **Phase 1 (core tree features) is the current target.** For reference, Phase 0 covered:
1. Backend skeleton (FastAPI, async, layered) + Postgres + migrations
2. Core data model from ARCHITECTURE §5 — start with User, Tree, TreeMembership, Person, Name, Relationship, Event, Place, Source, Citation, AuditEntry, soft-delete support
3. Local auth (password + email verification) behind the `AuthProvider` interface
4. Frontend scaffold (Next.js) wired to the API via the generated client
5. The deploy stack: `compose` for app + postgres + objectstore, Caddy config, env-driven settings
6. CI/CD: Gitea Actions building images to the registry
Don't get ahead of the phases. GEDCOM lands before the assistant (so AI writes target a stable model); property follows a tested people graph; hints come last because they need multiple populated trees. If you think the order is wrong, raise it rather than reordering silently.
## Conventions
- **Sign off every commit with the DCO.** Use `git commit -s`. Commits without a `Signed-off-by` line cannot be merged. See [CONTRIBUTING.md](CONTRIBUTING.md).
- **Commit messages:** concise summary line; body explaining *why* when it isn't obvious. One logical change per commit where practical.
- **Tests** accompany new behavior once a test surface exists.
- **Docs travel with code:** update PRD/ARCHITECTURE in the same change when scope or design shifts.
- **Privacy/assistant/hint code gets extra care** — these are the areas where bugs do real harm. Prefer a design note before a large change.
- **No secrets in the repo.** Config via env; provide `.env.example` with placeholders.
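A minimal sketch of env-driven configuration, using only the standard library. The variable names (`PROVENANCE_DATABASE_URL` and so on) and the `Settings` fields are hypothetical; the point is that required values fail loudly when missing instead of falling back to a hard-coded endpoint or key.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    smtp_host: str
    smtp_port: int
    s3_endpoint: str


def _require(env: Mapping[str, str], name: str) -> str:
    value = env.get(name, "").strip()
    if not value:
        raise MissingSettingError(f"{name} must be set")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment (variable names are assumptions)."""
    return Settings(
        database_url=_require(env, "PROVENANCE_DATABASE_URL"),
        smtp_host=_require(env, "PROVENANCE_SMTP_HOST"),
        # 587 is the standard mail submission port; operators can override it.
        smtp_port=int(env.get("PROVENANCE_SMTP_PORT", "587")),
        s3_endpoint=_require(env, "PROVENANCE_S3_ENDPOINT"),
    )
```

A matching `.env.example` would list the same variable names with placeholder values only.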
## License & contribution terms
Provenance is **source-available** under **BUSL-1.1** (see [LICENSE](LICENSE)): free for personal/family/non-commercial use, no third-party commercial hosting, and each release converts to **AGPL-3.0** four years after it ships. The DCO sign-off keeps the licensing chain clean so the maintainer can manage that conversion and a possible future hosted offering. Don't add code under an incompatible license, and don't vendor dependencies whose licenses conflict with eventual AGPL distribution.
## Brand
Visual identity lives in [docs/brand/](docs/brand/) (see its README for full guidance). Use these as the frontend's design tokens:
- **Ink** (primary text/marks): `#1A1A17` light / `#F2EEE6` dark
- **Bronze** (accent, constant): `#A06A42`
- **Paper** (knockout on bronze, constant): `#F7F3EC`
- **Muted** (secondary text): `#6B6862` light / `#9A968E` dark
Wordmark is a serif (heritage register); UI body/secondary text is a humanist sans. Logo lockup: `docs/brand/provenance-logo.svg`; app icon/favicon: `docs/brand/provenance-icon.svg` and `favicon.svg`. Don't recolor outside the palette or add gradients/shadows — the look is flat and warm.
## Owner & contact
Maintainer: **Justin Paul** (`justin@jpaul.io`). This deployment targets a home lab: Authentik at `auth.jpaul.io` for auth, `mail.jpaul.io` for SMTP, behind Caddy + Cloudflare Tunnel.
## Open questions (don't assume answers)
Parked in PRD §11 and ARCHITECTURE §14: telemetry (opt-in anonymous vs none), embeddings provider for matching, DNA as future-phase vs permanent non-goal, native mobile timing, hosted-SaaS model, queue backend default (Postgres vs Redis), and PostGIS adoption. If a task depends on one of these, surface the dependency instead of picking silently.