provenance

Author	SHA1	Message	Date
justin	88beb9650f	compose: forward OWNER_EMAIL to the backend container The instance-owner feature reads OWNER_EMAIL, but the backend service's environment block is an explicit allow-list that didn't include it — so setting it in .env never reached the app (is_instance_owner always saw "" → no owner). Add the passthrough. NOTE: the same allow-list omits the AI provider keys (ANTHROPIC_API_KEY, OPENAI_*, XAI_*, OLLAMA_*) and SMTP settings, so those documented env vars also don't currently reach the backend on this deployment. Worth a follow-up (forward them explicitly, or switch the service to env_file) so .env actually drives all configuration per the twelve-factor rule. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 23:22:48 -04:00
justin	c5631d3eab	Add an instance owner/operator role (env-declared via OWNER_EMAIL) Provenance had no system-level owner: ownership was only per-tree (TreeMembership), so a self-hosted instance had no operator account and no instance-admin surface. This adds one, declared by environment per the project's twelve-factor rule. - OWNER_EMAIL (comma-separated): the account(s) named here are instance owners. Derived at request time — no DB column, no migration, can't drift from the env, survives DB resets. is_instance_owner()/InstanceOwner dependency in api/deps.py. - Ownership requires a VERIFIED email (independent of REQUIRE_EMAIL_VERIFICATION). Registration is open, so without this an attacker could seize the role by registering the owner address first; verification ties it to inbox control. - GET /api/v1/admin/instance (owner-only): operational status — version, env, user/tree counts, configured AI providers. Deliberately exposes no tree data or PII: instance ownership is an operator role, NOT a privacy-engine bypass. - /users/me reports is_instance_owner; frontend gains an owner-only /admin page and a conditional sidebar link (server-enforced, not just client-hidden). Found-and-fixed by an adversarial security review before merge: the verified-email land-grab (above) and a frontend null-deref where the admin page crashed on 401/5xx instead of failing closed. Docs: .env.example + ARCHITECTURE (notes the not-a-privacy-bypass boundary and the verified-email requirement). Tests: owner matching, the land-grab guard, /users/me, and owner-only /admin. Suite 96 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 23:16:45 -04:00
justin	de50f2c803	Model providers: OpenAI/xAI/Ollama + run several at once (registry) Extends the #215 abstraction: - OpenAICompatibleLLMProvider / OpenAICompatibleEmbeddingProvider — one impl (via the official openai SDK) covers OpenAI, xAI (api.x.ai/v1), Ollama (…:11434/v1), OpenRouter, etc.; they differ only by base_url, key, and model. - Registry factory: build_llm_providers() / build_embedding_providers() return every provider whose credentials are configured, so you can run several concurrently. get_llm_provider(name)/get_embedding_provider(name) select by name, falling back to default_*_provider, then Null. - Per-provider env config (ANTHROPIC_*, OPENAI_*, XAI_*, OLLAMA_*) + DEFAULT_LLM_PROVIDER / DEFAULT_EMBEDDING_PROVIDER; documented in .env.example. Defaults keep AI off (empty registry). Embeddings now have real backends (OpenAI/Ollama), still separate from the LLM since Anthropic offers no embeddings endpoint. Tests cover multi-provider selection, default resolution, disabled-without-credentials, and null fail-loud. Full suite 87 passed. Relates to #215. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 18:39:19 -04:00
justin	330543f9ce	Fix #215 : pluggable LLM + embedding provider abstraction Adds the vendor-agnostic seam the AI assistant + match-ranking plug into: - LLMProvider / EmbeddingProvider ABCs (base.py). LLM and embeddings are SEPARATE abstractions — Anthropic has no embeddings endpoint, so each is configured independently and either can be off. - NullLLMProvider / NullEmbeddingProvider — the default; fail loud with a clear "not configured" error so AI-off deployments don't silently no-op. - AnthropicLLMProvider — first concrete LLM impl, via the official anthropic SDK (default model claude-opus-4-8). A local provider (e.g. Ollama) would be another subclass of the same interface. - Factory in deps.py (get_llm_provider / get_embedding_provider) selects by env (MODEL_PROVIDER / EMBEDDING_PROVIDER); documented in .env.example. Providers are read-only text/vector producers — they never touch the DB, so the "AI never writes autonomously" invariant (CLAUDE.md #1) holds; writes will go through ChangeProposal (#214). Tests: provider selection (null default, anthropic when keyed, fallback without key) + null providers raise. 81 passed. Closes #215 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 12:51:01 -04:00
justin	8652425413	Fix #196 : one-command operator backup (pg_dump + MinIO) Move backup from a documented procedure to `deploy/backup.sh`: dumps Postgres (pg_dump --clean --if-exists, gzipped) and archives the MinIO /data directory into a single timestamped bundle under backups/. Reads config from the compose .env with the same defaults the stack uses; optional BACKUP_RETAIN_DAYS prunes old bundles (cron-friendly). BACKUP.md documents usage + the restore procedure (kept manual/documented rather than an untested destructive script). Closes #196 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 12:45:33 -04:00
justin	660fe7b37f	Security: gate sessions on verified email (opt-in) Backlog §2.10: registration issued a live session and email_verified_at was written but never read, so an unverified user had full access and there was no switch to require verification. Add REQUIRE_EMAIL_VERIFICATION (default false). When true: - resolve_session_user returns None for a user whose email_verified_at is null — the single read-side gate covering every authenticated request, incl. the session minted at registration. - login raises 403 ("email not verified") instead of issuing a useless token. Default false on purpose: self-hosts without SMTP, and accounts created before this gate existed (email_verified_at null), must not be locked out. Operators enable it once mail works and accounts are verified. Documented in .env.example. Tests: default-off keeps unverified accounts working; on → register's session won't resolve (401), login is 403, and after verify-email both work. 75 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-09 11:22:54 -04:00
justin	7f640649b9	Auto-apply migrations on deploy (entrypoint + one-shot service) So a deploy never needs a manual `alembic upgrade head`: - Backend image gains an entrypoint that runs `alembic upgrade head` before uvicorn when RUN_MIGRATIONS=1 (set on the backend service). This self-migrates even on a Watchtower in-place image swap, which doesn't re-run one-shot jobs. - A one-shot `migrate` service covers the `docker compose up` path; backend and worker depend on it completing, which also serializes it with the backend entrypoint so alembic never runs concurrently. `upgrade head` is idempotent. Activating this needs the updated compose on the host once (Watchtower only swaps images, not the compose file / env). After that, migrations are automatic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 10:50:28 -04:00
justin	34d30e3134	Add media (object storage) and the background worker (Phase 1) Media model + migration; an ObjectStore interface with an S3/MinIO (boto3) implementation behind the service layer. Upload (multipart) stores bytes in object storage + a metadata row (checksum, size, content-type, optional attach to person/event/source); list returns presigned URLs; delete is soft. Editor-gated, privacy-filtered, audited. 24 tests pass (object store faked). Introduces the worker container (same image, 'python -m app.worker'): its first job is the scheduled 30-day soft-delete purge across tables + media object cleanup. Compose gains worker + S3 env on backend/worker; dev override builds the worker too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 21:46:09 -04:00
justin	0b9d72c878	Drop bundled Watchtower; rely on the host's global Watchtower ripper already runs a single global nickfedor/watchtower (label-enabled) that watches every stack; the bundled containrrr/watchtower was redundant and crash-looped (its Docker API client is too old for Docker 29). Keep the watchtower.enable labels on backend/frontend so the host instance auto-deploys them; remove the per-stack service and profile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 11:58:49 -04:00
justin	768d1b23d4	Add Watchtower auto-deploy for app images (2-minute poll) Watchtower (profile-gated) watches only the label-enabled backend/frontend containers and recreates them when a new :test-main digest lands in the registry, polling every 120s. Scoped by label so it never touches Postgres/MinIO/Caddy/cloudflared. Reads registry creds from the host docker config. Lab host runs COMPOSE_PROFILES=tunnel,watchtower. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 11:55:38 -04:00
justin	828445a6b3	Add Cloudflare Tunnel connector (profile-gated) to the deploy stack A cloudflared service (opt-in via the 'tunnel' compose profile, token from CLOUDFLARE_TUNNEL_TOKEN) connects the lab to Cloudflare. One public hostname -> http://caddy:80 is sufficient because Caddy does the internal path routing. Mirrors the drawbar tunnel setup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 11:32:15 -04:00
justin	4921ce0776	Mirror drawbar CI/CD: push to LAN registry, pull via public FQDN Split the registry endpoints like the drawbar containers. Per-component Gitea Actions workflows (build-backend, build-frontend; runs-on docker, path-filtered) push images to the LAN endpoint 192.168.0.2:1234 over plain HTTP (buildx insecure/http) to bypass Cloudflare's request-body limit, then link each package to the repo via the Gitea API. Auth via the REGISTRY_TOKEN Actions secret (the same token drawbar uses). Tag scheme: test-main / test-sha-<long> / version / latest (v* tags). The deploy compose now PULLS git.jpaul.io/justin/provenance-{backend,frontend}:${IMAGE_TAG:-test-main} (no host build); docker-compose.dev.yml is a local-build override for dev / pre-CI. Replaces the previous single build.yml. Docs + memory updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 11:19:26 -04:00
justin	fccc81a6cc	Wire the frontend into the deploy stack and CI Compose gains a frontend service; Caddy now routes / to frontend:3000 (keeping /api/* and /health* on the backend). CI builds and pushes a frontend image alongside the backend. Verified end-to-end on the deploy target: / serves the app, /api and /health still resolve through Caddy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 11:03:07 -04:00
justin	5123c85397	Add auth foundation: sessions/tokens schema, Argon2 hashing, config Two tables (sessions, user_tokens) + migration; only token hashes are stored, so a DB leak yields no usable credential. Argon2id password hashing and token primitives in app/core/security. Config and .env.example gain session/cookie/token TTLs, app base URL, and SMTP settings (twelve-factor). Migration verified reversible (drops the token_purpose enum) and matches the models. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 10:51:51 -04:00
justin	0b5c3b260a	Add self-host compose stack (Postgres, MinIO, backend, Caddy) One env-driven compose stack stands up the whole system per ARCHITECTURE §2/§12. Postgres uses the pgvector image (pgvector + pg_trgm in contrib); MinIO is the S3-compatible store; Caddy reverse-proxies /api/* and /health* to the backend with an env-driven site address (':80' local, a domain for auto-HTTPS, or plain HTTP behind a Cloudflare Tunnel). Healthchecks and depends_on gate startup order. .env.example documents twelve-factor config (DB, S3, SMTP, Caddy, model keys) with placeholders; no secrets in the repo. Verified end-to-end on the deploy target: all services healthy, /health/ready green against real Postgres. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Justin Paul <justin@jpaul.me>	2026-06-06 10:17:12 -04:00
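
Illustration for c5631d3eab (instance owner role). The commit says ownership is derived at request time from the comma-separated OWNER_EMAIL and requires a verified email. A minimal sketch of that rule follows; it is not the code in api/deps.py. The User dataclass, the explicit env argument, and the case-insensitive comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class User:
    # Hypothetical stand-in for the real user model.
    email: str
    email_verified_at: Optional[datetime] = None


def owner_emails(env: Mapping[str, str]) -> set[str]:
    """Parse the comma-separated OWNER_EMAIL value into a normalized set."""
    raw = env.get("OWNER_EMAIL", "")
    # Lower-casing is an assumption; the log does not say how matching is done.
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_instance_owner(user: User, env: Mapping[str, str]) -> bool:
    """True only if the email is listed AND verified (the land-grab guard)."""
    if user.email_verified_at is None:
        # Unverified accounts never qualify, even if the address matches.
        return False
    return user.email.strip().lower() in owner_emails(env)
```

Because nothing is stored, an empty or missing OWNER_EMAIL yields no owner at all, which matches the symptom described in 88beb9650f when the variable was not forwarded to the container.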

15 Commits
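
Illustration for de50f2c803 (provider registry). The commit describes a registry that contains every provider whose credentials are configured, with lookup by name falling back to the configured default and then to a Null provider that fails loudly. A rough sketch of that resolution order is below. Only ANTHROPIC_API_KEY and DEFAULT_LLM_PROVIDER are named in the log; the other variable names, the StubLLMProvider class, and passing env as an argument are hypothetical.

```python
from typing import Dict, Mapping, Optional

# Hypothetical mapping of provider name -> env var whose presence enables it.
# Only ANTHROPIC_API_KEY appears in the log; the rest are assumed names.
CREDENTIAL_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "ollama": "OLLAMA_BASE_URL",
}


class NullLLMProvider:
    """Default provider: raises instead of silently doing nothing."""
    name = "null"

    def complete(self, prompt: str) -> str:
        raise RuntimeError("LLM provider not configured")


class StubLLMProvider:
    """Placeholder for a real SDK-backed provider (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def build_llm_providers(env: Mapping[str, str]) -> Dict[str, StubLLMProvider]:
    """Return one provider per credential that is set and non-empty."""
    return {
        name: StubLLMProvider(name)
        for name, var in CREDENTIAL_VARS.items()
        if env.get(var)
    }


def get_llm_provider(env: Mapping[str, str], name: Optional[str] = None):
    """Resolve by explicit name, then DEFAULT_LLM_PROVIDER, then Null."""
    registry = build_llm_providers(env)
    for candidate in (name, env.get("DEFAULT_LLM_PROVIDER")):
        if candidate and candidate in registry:
            return registry[candidate]
    return NullLLMProvider()
```

With no credentials set the registry is empty, so every lookup returns the Null provider and any call to complete() raises, which is the AI-off default the commit describes.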