A multi-agent audit of every doc against the code surfaced ~50 stale/missing
items (the roadmap/status docs and the backlog had fallen behind the code).
This catches them up:
- CLAUDE.md: phase status was ~3 phases stale ("Phase 1 is next" while Phase 1 +
chunks of 2 & 4 shipped). Rewrote the status list; added a model-provider
tech-stack entry; updated repo-layout (integrations objectstore/models,
deploy backup.sh/dev compose).
- ARCHITECTURE.md: §6 privacy engine described 3 visibility levels — corrected to
the shipped 4 (adds site_members); documented per-tree AI policy on Tree,
LLMProvider/EmbeddingProvider split + registry, ChangeProposal origin/status/
operations, verified-email session gate, instance-owner role, schema-drift
guard, and the env_file config model.
- PRD.md: 4-level visibility in US-040/§5.5, instance-owner role (§5.1/§5.11),
per-tree AI policy (§5.8), §8 sequencing annotated with shipped status, header
date/status bumped.
- README.md: 4-level privacy; softened "Full GEDCOM 7" to the 5.5.1/7 common
subset; noted backups + instance-owner admin; moved property/land to an
explicit "where it's headed" (no property models exist yet).
- BACKLOG.md: flipped ~15 shipped-but-open rows to Have (ChangeProposal, provider
abstraction, GEDCOM citation export, membership management, operator backup,
email-verification gate, per-tree AI policy, instance owner, the whole
visibility/public-viewing/child-resource-redaction cluster #41-#51/#46), and
reconciled the executive summary, "current defects" list, quick wins, and
differentiators. Left genuinely-open items (citation/source redaction, sitemap,
per-tree noindex, scoped-token API) accurately open.
- .env.example: dropped "SMTP wired in a later phase"; documented the worker
purge knobs, S3_PRESIGN_TTL, COOKIE_NAME; removed a stray duplicate line.
- design/: tree-visibility.md and change-proposal.md marked Shipped; corrected
the redaction approach (reuses member schemas, not a separate PublicPersonRead)
and the apply() rollback claim (v1 is not cross-op transactional), and marked
rate-limiting/sitemap/noindex as deferred.
No code changes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Justin Paul <justin@jpaul.me>
Provenance
Where it came from matters.
Provenance is self-hostable software for tracing where you come from — your family and your land. Build a family tree, document every claim with real sources, reconstruct the chain of ownership behind a piece of property, and keep all of it in a format you control, on infrastructure you run.
Your history shouldn't live behind a subscription. Your data shouldn't be someone else's product. The story of where you came from belongs to you — and to whoever comes after.
Why "Provenance"
Museums and collectors use the word for the chain of custody behind an object: where it came from, who held it, how it got here. A painting without provenance is just a painting. A painting with provenance is a story.
People and land work the same way. A name on a tree is just a name. A name with sources, photos, letters, and the small details of a life — that's a person. A parcel of farmland traced from its original federal patent through every deed and heir to the present day — that's a story too. Provenance treats both as facets of the same thing.
Every fact links to its source. Every claim can be traced. Nothing is just asserted; everything is shown.
What it does
- Build a tree that holds up. People, relationships, events, and places — with every fact linked to the document, photo, or record it came from.
- Bring your own archive. Scans, PDFs, photos, audio recordings — first-class citizens, not afterthoughts.
- A research assistant that proposes, never overwrites. The built-in AI assistant searches legal sources, lays out what it found, and waits for your approval before anything touches your data. You can point it at the major model providers or a self-hosted model — your keys, your choice.
- Standards over silos. GEDCOM import and export (5.5.1 / 7 common subset) — duplicate-aware import, citation-preserving export. Migrate in, migrate out.
- Privacy you control. Public, members-only (any signed-in user on your instance), unlisted, or private per tree; any individual can be hidden; living people are protected by default.
- Find your people. When another user's tree overlaps with yours, Provenance can surface an anonymous "possible match" — and only connects you if you both say yes.
- Run it your way. Container-native. Self-host behind Caddy and, if you like, a Cloudflare Tunnel. Multi-tenant, so your whole extended family — or a whole community of strangers — can coexist on one deployment. One-command backups (Postgres + object storage) and an instance-owner admin role keep operations in your hands.
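The four tree-visibility levels in the list above could be modeled roughly as follows. This is a minimal sketch, not Provenance's actual code: the `Visibility`, `Viewer`, and `can_view_tree` names are hypothetical, and the rule that an unlisted tree is viewable only through a direct link is an assumption about what "unlisted" means here. Per-person hiding and living-person protection are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PUBLIC = "public"
    SITE_MEMBERS = "site_members"  # any signed-in user on the instance
    UNLISTED = "unlisted"
    PRIVATE = "private"


@dataclass(frozen=True)
class Viewer:
    user_id: Optional[int]        # None means an anonymous visitor
    is_tree_member: bool = False  # hypothetical flag: belongs to this tree


def can_view_tree(
    visibility: Visibility, viewer: Viewer, via_direct_link: bool = False
) -> bool:
    """Return True if the viewer may see the tree at all."""
    if viewer.is_tree_member:
        return True
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.SITE_MEMBERS:
        return viewer.user_id is not None
    if visibility is Visibility.UNLISTED:
        # Assumption: unlisted trees are reachable by link but never listed.
        return via_direct_link
    return False  # PRIVATE: tree members only
```

With this sketch, an anonymous visitor can open a public tree, a signed-in non-member can open a members-only tree, and a private tree stays closed to everyone except its members.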
Where it's headed — trace the land, not just the family. The same source-backed treatment for property: parcels, deeds, and ownership events, reconstructing chain-of-title and tying land to the people who held it. The people side ships today; the land half is on the roadmap, not yet built — but it's why Provenance exists, not an afterthought.
Who it's for
- The person who became the keeper of the photos after a parent passed
- Farm and rural families tracing land back to the original patent
- Researchers who want their citations to actually mean something
- Adoptees and donor-conceived people piecing together a fuller picture
- Anyone who looked at the big genealogy subscriptions and thought, "I don't want my family history to be someone else's recurring revenue"
Principles
- Your data is yours. Open formats. Export anytime. Self-host anywhere.
- Sources or it didn't happen. Every fact can carry citations. The record holds what you know and how you know it.
- The assistant serves you. AI proposes; you decide. No autonomous writes, ever.
- Honest about hard things. Adoption, estrangement, complicated parentage, name changes, people who don't want to be on a tree — treated as normal, not edge cases.
- No dark patterns. No paywalled hints. No surprise upsells. No "you have new ancestors waiting" emails.
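The "AI proposes; you decide" principle can be illustrated with a small approval gate. This is a sketch under stated assumptions, not the project's real model: `Proposal`, `Status`, and `apply_proposal` are hypothetical names, and the operation shape (a field name plus a value) is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    origin: str             # who suggested the change, e.g. "assistant"
    operations: list[dict]  # hypothetical shape: {"field": ..., "value": ...}
    status: Status = Status.PENDING


def apply_proposal(record: dict, proposal: Proposal) -> dict:
    """Return an updated copy of the record, but only for approved proposals."""
    if proposal.status is not Status.APPROVED:
        raise PermissionError("proposal has not been approved")
    updated = dict(record)  # the original record is never mutated
    for op in proposal.operations:
        updated[op["field"]] = op["value"]
    return updated
```

The point of the design is that nothing reaches the record until a person flips the status to approved; a pending or rejected proposal raises instead of writing.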
Licensing
Provenance is source-available, not open source (yet). It is licensed under the Business Source License 1.1:
- Free forever for personal, family, and non-commercial use — self-host all you like.
- Commercial hosting for a fee is not permitted without a separate license from the author.
- Each release converts to AGPL-3.0 (a true open-source license) four years after it ships.
In plain terms: run it for yourself, your family, or your community at no cost, forever. You just can't take this code and sell it as a hosted service — that's reserved for a possible future first-party offering. See LICENSE for the exact terms.
Status
Early and moving fast. The product is being built in the open, commit by commit, and stood up in a live home lab as it goes. See docs/PRD.md for the product requirements and roadmap.
If the principles above resonate, watch the repo, open an issue with your use case, or pitch in. See CONTRIBUTING.md.
Provenance. Where it came from matters.