# Provenance
**Where it came from matters.**

Provenance is self-hostable software for tracing where you come from — your family *and* your land. Build a family tree, document every claim with real sources, reconstruct the chain of ownership behind a piece of property, and keep all of it in a format you control, on infrastructure you run.

Your history shouldn't live behind a subscription. Your data shouldn't be someone else's product. The story of where you came from belongs to you — and to whoever comes after.

---
## Why "Provenance"
Museums and collectors use the word for the chain of custody behind an object: where it came from, who held it, how it got here. A painting without provenance is just a painting. A painting *with* provenance is a story.

People and land work the same way. A name on a tree is just a name. A name with sources, photos, letters, and the small details of a life — that's a person. A parcel of farmland traced from its original federal patent through every deed and heir to the present day — that's a story too. Provenance treats both as facets of the same thing.

Every fact links to its source. Every claim can be traced. Nothing is just asserted; everything is shown.
## What it does
- **Build a tree that holds up.** People, relationships, events, and places — with every fact linked to the document, photo, or record it came from.
- **Bring your own archive.** Scans, PDFs, photos, audio recordings — first-class citizens, not afterthoughts.
- **A research assistant that proposes, never overwrites.** The built-in AI assistant searches legal sources, lays out what it found, and waits for your approval before anything touches your data. You can point it at the major model providers or a self-hosted model — your keys, your choice.
- **Standards over silos.** GEDCOM import and export covering the common subset of versions 5.5.1 and 7, with duplicate-aware import and citation-preserving export. Migrate in, migrate out.
- **Privacy you control.** Public, members-only (any signed-in user on your instance), unlisted, or private per tree; any individual can be hidden; living people are protected by default.
- **Find your people.** When another user's tree overlaps with yours, Provenance can surface an anonymous "possible match" — and only connects you if you both say yes.
- **Run it your way.** Container-native. Self-host behind Caddy and, if you like, a Cloudflare Tunnel. Multi-tenant, so your whole extended family — or a whole community of strangers — can coexist on one deployment. One-command backups (Postgres + object storage) and an instance-owner admin role keep operations in your hands.

**Where it's headed — trace the land, not just the family.** The same source-backed treatment for *property*: parcels, deeds, and ownership events, reconstructing chain-of-title and tying land to the people who held it. The people side ships today; the land half is on the roadmap, not yet built — but it's why Provenance exists, not an afterthought.
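
To make the four per-tree privacy levels listed above concrete, here is a minimal Python sketch of how such a check could look. It is an illustration only, not Provenance's actual code: the names `Visibility` and `can_view_tree` are hypothetical, and the reading of "unlisted" as "reachable by direct link but not listed anywhere" is an assumption.

```python
from enum import Enum


class Visibility(Enum):
    """The four per-tree levels named in this README (identifiers are hypothetical)."""
    PUBLIC = "public"
    MEMBERS_ONLY = "members_only"  # any signed-in user on the instance
    UNLISTED = "unlisted"          # assumption: viewable only via a direct link
    PRIVATE = "private"


def can_view_tree(visibility, *, signed_in, is_tree_member, has_direct_link=False):
    """Return True if a visitor may open the tree at all.

    Hiding individual people and protecting living people would be
    separate, finer-grained checks and are not modeled here.
    """
    if is_tree_member:
        return True  # people invited to the tree can always see it
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.MEMBERS_ONLY:
        return signed_in
    if visibility is Visibility.UNLISTED:
        return has_direct_link
    return False  # PRIVATE: tree members only
```

For example, `can_view_tree(Visibility.MEMBERS_ONLY, signed_in=False, is_tree_member=False)` is `False`, while the same call with `signed_in=True` is `True`.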
## Who it's for
- The person who became the keeper of the photos after a parent passed
- Farm and rural families tracing land back to the original patent
- Researchers who want their citations to actually mean something
- Adoptees and donor-conceived people piecing together a fuller picture
- Anyone who looked at the big genealogy subscriptions and thought *I don't want my family history to be someone else's recurring revenue*
## Principles
- **Your data is yours.** Open formats. Export anytime. Self-host anywhere.
- **Sources or it didn't happen.** Every fact can carry citations. The record holds what you know *and* how you know it.
- **The assistant serves you.** AI proposes; you decide. No autonomous writes, ever.
- **Honest about hard things.** Adoption, estrangement, complicated parentage, name changes, people who don't want to be on a tree — treated as normal, not edge cases.
- **No dark patterns.** No paywalled hints. No surprise upsells. No "you have new ancestors waiting" emails.
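
The "AI proposes; you decide" principle can be pictured as a small propose-then-approve workflow. The Python sketch below is a hedged illustration under stated assumptions, not the project's real data model: `Proposal`, `Tree`, and every field name are hypothetical, and the example source string is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A suggested change plus the source that supports it (hypothetical shape)."""
    person_id: str
    field_name: str
    new_value: str
    source: str
    status: str = "pending"  # "pending" -> "approved" or "rejected"


@dataclass
class Tree:
    people: dict = field(default_factory=dict)
    proposals: list = field(default_factory=list)

    def propose(self, proposal):
        """Record a suggestion. Tree data is deliberately left untouched."""
        self.proposals.append(proposal)
        return proposal

    def approve(self, proposal):
        """Only an explicit approval writes the change into the tree."""
        if proposal.status != "pending":
            raise ValueError("proposal has already been decided")
        person = self.people.setdefault(proposal.person_id, {})
        person[proposal.field_name] = proposal.new_value
        proposal.status = "approved"

    def reject(self, proposal):
        """Rejecting closes the proposal without changing any data."""
        if proposal.status != "pending":
            raise ValueError("proposal has already been decided")
        proposal.status = "rejected"


tree = Tree(people={"p1": {"name": "Ada Example"}})
suggestion = tree.propose(
    Proposal("p1", "birth_year", "1901", source="hypothetical census record")
)
unchanged = dict(tree.people["p1"])  # still only the name at this point
tree.approve(suggestion)             # the write happens here, not before
```

The design point is that `propose` has no code path that modifies `people`; the only write lives in `approve`, which a person has to call.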
## Licensing
Provenance is **source-available**, not open source (yet). It is licensed under the [Business Source License 1.1](LICENSE):
- **Free forever for personal, family, and non-commercial use** — self-host all you like.
- **Commercial hosting for a fee is not permitted** without a separate license from the author.
- **Each release converts to AGPL-3.0** (a true open-source license) four years after it ships.

In plain terms: run it for yourself, your family, or your community at no cost, forever. You just can't take this code and sell it as a hosted service — that's reserved for a possible future first-party offering. See [LICENSE](LICENSE) for the exact terms.
## Status
Early and moving fast. The product is being built in the open, commit by commit, and stood up in a live home lab as it goes. See [docs/PRD.md](docs/PRD.md) for the product requirements and roadmap.

If the principles above resonate, watch the repo, open an issue with your use case, or pitch in. See [CONTRIBUTING.md](CONTRIBUTING.md).

---
***Provenance.*** *Where it came from matters.*