provenance

Author	SHA1	Message	Date
justin	c5631d3eab	Add an instance owner/operator role (env-declared via OWNER_EMAIL) Provenance had no system-level owner: ownership was only per-tree (TreeMembership), so a self-hosted instance had no operator account and no instance-admin surface. This adds one, declared by environment per the project's twelve-factor rule. - OWNER_EMAIL (comma-separated): the account(s) named here are instance owners. Derived at request time — no DB column, no migration, can't drift from the env, survives DB resets. is_instance_owner()/InstanceOwner dependency in api/deps.py. - Ownership requires a VERIFIED email (independent of REQUIRE_EMAIL_VERIFICATION). Registration is open, so without this an attacker could seize the role by registering the owner address first; verification ties it to inbox control. - GET /api/v1/admin/instance (owner-only): operational status — version, env, user/tree counts, configured AI providers. Deliberately exposes no tree data or PII: instance ownership is an operator role, NOT a privacy-engine bypass. - /users/me reports is_instance_owner; frontend gains an owner-only /admin page and a conditional sidebar link (server-enforced, not just client-hidden). Found-and-fixed by an adversarial security review before merge: the verified-email land-grab (above) and a frontend null-deref where the admin page crashed on 401/5xx instead of failing closed. Docs: .env.example + ARCHITECTURE (notes the not-a-privacy-bypass boundary and the verified-email requirement). Tests: owner matching, the land-grab guard, /users/me, and owner-only /admin. Suite 96 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 23:16:45 -04:00
justin	94b5caa7e5	Guard against schema drift: fail readiness + log loudly when DB is behind code Defense-in-depth for the deploy pipeline. Today a backend image shipped ahead of an un-applied migration; the Tree model selected columns the DB didn't have yet, so every trees query 500'd with an opaque UndefinedColumnError and the UI showed no trees. The root cause (deploys not running migrations) is fixed separately; this makes the symptom impossible to miss. - app/core/schema_version.py: compare the DB's stamped alembic head to the head(s) baked into the image's migration scripts. A DB with no alembic_version table (e.g. a create_all test DB) is treated as current, so this stays quiet outside real deployments. Uses to_regclass so a missing table never poisons the caller's transaction. - /health/ready: returns 503 with an explicit "drift: db=… expected=…" message when the schema is behind, instead of reporting ready and serving 500s. - Startup lifespan: logs CRITICAL on drift (advisory — never blocks startup). Liveness (/health) is untouched, so a drifted container isn't killed into a crash-loop — it's loudly degraded and self-heals once migrations apply. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 21:55:21 -04:00
justin	de50f2c803	Model providers: OpenAI/xAI/Ollama + run several at once (registry) Extends the #215 abstraction: - OpenAICompatibleLLMProvider / OpenAICompatibleEmbeddingProvider — one impl (via the official openai SDK) covers OpenAI, xAI (api.x.ai/v1), Ollama (…:11434/v1), OpenRouter, etc.; they differ only by base_url, key, and model. - Registry factory: build_llm_providers() / build_embedding_providers() return every provider whose credentials are configured, so you can run several concurrently. get_llm_provider(name)/get_embedding_provider(name) select by name, falling back to default__provider, then Null. - Per-provider env config (ANTHROPIC_, OPENAI_, XAI_, OLLAMA_*) + DEFAULT_LLM_PROVIDER / DEFAULT_EMBEDDING_PROVIDER; documented in .env.example. Defaults keep AI off (empty registry). Embeddings now have real backends (OpenAI/Ollama), still separate from the LLM since Anthropic offers no embeddings endpoint. Tests cover multi-provider selection, default resolution, disabled-without-credentials, and null fail-loud. Full suite 87 passed. Relates to #215. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 18:39:19 -04:00
justin	330543f9ce	Fix #215 : pluggable LLM + embedding provider abstraction Adds the vendor-agnostic seam the AI assistant + match-ranking plug into: - LLMProvider / EmbeddingProvider ABCs (base.py). LLM and embeddings are SEPARATE abstractions — Anthropic has no embeddings endpoint, so each is configured independently and either can be off. - NullLLMProvider / NullEmbeddingProvider — the default; fail loud with a clear "not configured" error so AI-off deployments don't silently no-op. - AnthropicLLMProvider — first concrete LLM impl, via the official anthropic SDK (default model claude-opus-4-8). A local provider (e.g. Ollama) would be another subclass of the same interface. - Factory in deps.py (get_llm_provider / get_embedding_provider) selects by env (MODEL_PROVIDER / EMBEDDING_PROVIDER); documented in .env.example. Providers are read-only text/vector producers — they never touch the DB, so the "AI never writes autonomously" invariant (CLAUDE.md #1) holds; writes will go through ChangeProposal (#214). Tests: provider selection (null default, anthropic when keyed, fallback without key) + null providers raise. 81 passed. Closes #215 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 12:51:01 -04:00
justin	660fe7b37f	Security: gate sessions on verified email (opt-in) Backlog §2.10: registration issued a live session and email_verified_at was written but never read, so an unverified user had full access and there was no switch to require verification. Add REQUIRE_EMAIL_VERIFICATION (default false). When true: - resolve_session_user returns None for a user whose email_verified_at is null — the single read-side gate covering every authenticated request, incl. the session minted at registration. - login raises 403 ("email not verified") instead of issuing a useless token. Default false on purpose: self-hosts without SMTP, and accounts created before this gate existed (email_verified_at null), must not be locked out. Operators enable it once mail works and accounts are verified. Documented in .env.example. Tests: default-off keeps unverified accounts working; on → register's session won't resolve (401), login is 403, and after verify-email both work. 75 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 11:22:54 -04:00
justin	34d30e3134	Add media (object storage) and the background worker (Phase 1) Media model + migration; an ObjectStore interface with an S3/MinIO (boto3) implementation behind the service layer. Upload (multipart) stores bytes in object storage + a metadata row (checksum, size, content-type, optional attach to person/event/source); list returns presigned URLs; delete is soft. Editor-gated, privacy-filtered, audited. 24 tests pass (object store faked). Introduces the worker container (same image, 'python -m app.worker'): its first job is the scheduled 30-day soft-delete purge across tables + media object cleanup. Compose gains worker + S3 env on backend/worker; dev override builds the worker too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 21:46:09 -04:00
justin	5123c85397	Add auth foundation: sessions/tokens schema, Argon2 hashing, config Two tables (sessions, user_tokens) + migration; only token hashes are stored, so a DB leak yields no usable credential. Argon2id password hashing and token primitives in app/core/security. Config and .env.example gain session/cookie/token TTLs, app base URL, and SMTP settings (twelve-factor). Migration verified reversible (drops the token_purpose enum) and matches the models. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 10:51:51 -04:00
justin	dffd05d303	Add layered service/API for tenancy and people with the privacy seam Wires the data model through repository -> service -> API/v1. The privacy engine (app/services/privacy.py) is the single enforcement point: every read resolves visibility there (tree role, tree visibility, per-person override; living-person redaction is a marked Phase 2 TODO). All writes record an attributable AuditEntry. Endpoints: POST /users (open dev bootstrap until auth), GET /users/me, POST/GET /trees, GET /trees/{id}, and POST/GET /trees/{id}/persons. Authn is a temporary X-User-Id header shim; authz is membership-based (owner/editor/viewer). Domain errors map to 401/403/404/409. Verified on the deploy target: private tree -> 403 for non-members, missing actor -> 401, audit log populated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 10:40:19 -04:00
justin	03aa9a3ca7	Scaffold FastAPI backend skeleton with health probes Phase 0 foundation. uv-managed FastAPI app (package=false, runs from source via uv run). Layered seams in place: app/api for routers, app/core for config (pydantic-settings, fully env-driven) and the async SQLAlchemy engine; service/repository/domain layers land with the data model. Exposes /health (liveness) and /health/ready (Postgres reachability via SELECT 1, 503 on failure) so the deploy wiring is verifiable before any data model exists. Includes a liveness test and the resolved uv.lock. Ignore pytest/ruff/mypy caches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 10:16:58 -04:00

9 Commits