Demo Platform: invite-only demo experiences

System architecture

How this platform is built and deployed.

Three bounded AI experiences run on a single-node k3s cluster. The frontend is public; the backend is internal-only.

System topology

[Diagram: system topology. Components shown:]

  • Internet
  • Traefik (ingress)
  • Next.js BFF (port 3000, public)
  • FastAPI backend (port 8000, internal)
  • Oracle DB (production)
  • text-tools (Rust, internal)
  • LLM APIs (Claude, OpenAI)
  • Voice APIs (xAI, OpenAI)
  • Twilio (inbound voice, Media Streams)

Diagram notes: cluster-internal DNS, backend not exposed through ingress. Oracle Autonomous DB in production, Postgres locally.

Legend: solid line = inbound call (Twilio → backend); dashed line = Media Streams (bidirectional).

Deployment

Registry-free deployment pipeline for demo-service

There is no image registry in the deployment path. Images are built locally, saved as tar files, copied to the Oracle Cloud VM with scp, imported directly into k3s, then deployed with Helm.

  • Image tags use the current short git SHA, so the deployed state traces back to an exact commit.
  • The registry-free path is deliberate: it avoids a paid registry dependency and keeps each deploy as a self-contained artifact.
  • Rollback uses helm rollback together with a previously shipped image tar.
  • The Rust text-tools sidecar follows the same pipeline with its own Helm chart, so backend and sidecar deploys stay independent.
  • Runtime target: single-node k3s on an Oracle Cloud VM, with Traefik exposing the frontend and the backend remaining internal-only through cluster DNS.
  • Public URL: demo.lebedev.ai
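
The pipeline described above can be sketched as an ordered list of commands. This is a minimal illustration, not the project's actual deploy script: the use of docker as the build tool, the chart path, the remote directory, the host name, and the image.tag Helm value are all assumptions, and running Helm from the local machine is only one possible arrangement. The short SHA would typically come from git rev-parse --short HEAD.

```python
import shlex


def deploy_commands(service, sha, host, chart_dir, remote_dir="/tmp"):
    """Return the ordered commands for one registry-free deploy.

    Nothing is executed here; the caller decides how to run each step.
    All names and paths are hypothetical placeholders.
    """
    image = f"{service}:{sha}"          # tag = short git SHA
    tar = f"{service}-{sha}.tar"        # self-contained deploy artifact
    remote_tar = f"{remote_dir}/{tar}"
    return [
        # 1. Build the image locally (docker is an assumed build tool).
        ["docker", "build", "-t", image, "."],
        # 2. Save it as a tar file instead of pushing to a registry.
        ["docker", "save", "-o", tar, image],
        # 3. Copy the tar to the VM.
        ["scp", tar, f"{host}:{remote_tar}"],
        # 4. Import it into k3s's containerd image store.
        ["ssh", host, f"sudo k3s ctr images import {shlex.quote(remote_tar)}"],
        # 5. Deploy with Helm; `image.tag` is an assumed chart value name.
        ["helm", "upgrade", "--install", service, chart_dir,
         "--set", f"image.tag={sha}"],
    ]


if __name__ == "__main__":
    steps = deploy_commands(
        "demo-service", "abc1234", "deploy@vm.example", "./charts/demo-service"
    )
    for cmd in steps:
        print(shlex.join(cmd))
```

Keeping the tar named after the SHA means an older artifact can be re-imported with step 4 before a helm rollback if the node no longer holds that image.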

Tech stack

  • Frontend/BFF: Next.js 15 App Router, TypeScript.
  • Backend API: FastAPI (Python 3.14), SQLAlchemy 2.x, Alembic migrations, Pydantic 2.
  • Database: Oracle Autonomous Database in production, Postgres locally.
  • AI: LLM APIs (Claude, OpenAI) for workflow and retrieval; xAI and OpenAI realtime for voice. Workflows are model-agnostic — providers are swappable via config.
  • Local learning path: Messy Notes can run through the Ollama model messy-brief-local, a LoRA-tuned Qwen2.5 1.5B adapter packaged as GGUF for local demonstration.
  • Text processing: a small Rust service (axum) handles deterministic chunking and normalization for RAG ingest. It runs as an internal sidecar — it exists as hands-on Rust practice, not because Python couldn't do it.
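
To make "deterministic chunking and normalization" concrete, here is a small sketch of the idea. It is written in Python for consistency with the other examples on this page, whereas the real service is Rust, and its actual rules are not described here. The normalization steps, chunk size, and overlap below are assumptions chosen for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize text so identical input always yields identical output."""
    text = unicodedata.normalize("NFC", text)              # canonical Unicode form
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse three or more newlines into a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows with a fixed overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    raw = "Quarterly  notes\r\n\r\n\r\n\r\nBudget is   tight. "
    for piece in chunk(normalize(raw), size=16, overlap=4):
        print(repr(piece))
```

Because neither function depends on a model, a clock, or randomness, re-ingesting the same document produces the same chunks, which keeps chunk boundaries and any stored offsets stable.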

Context Engine

  • Shared backend capability for domain-pack-driven context infrastructure across experiences and future domains.
  • Core owns generic artifact ingestion, chunking, provenance, extraction orchestration, persisted signals, generated views, actionable items, and domain registration.
  • Domain packs own artifact types, extractors, perspective builders, task generators, views, prompts, and interpretation rules.
  • Source links preserve where derived context came from so experiences can show evidence, linked artifacts, and explicit inference labels instead of unsupported claims.
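
The split between core and domain packs could look roughly like the sketch below. Every class, field, and function name here (SourceLink, Signal, DomainPack, ContextCore, todo_extractor) is hypothetical and invented for illustration; the real Context Engine interfaces are not shown on this page. The point is only that the core handles registration and orchestration while a pack supplies artifact types and extractors, and that each derived signal carries a link back to its source span.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SourceLink:
    """Where a derived signal came from: artifact plus character span."""
    artifact_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Signal:
    kind: str
    value: str
    source: SourceLink
    inferred: bool = False  # explicit label for inference vs. direct evidence


# An extractor maps (artifact_id, text) to a list of signals.
Extractor = Callable[[str, str], list]


@dataclass
class DomainPack:
    name: str
    artifact_types: set
    extractors: list = field(default_factory=list)


class ContextCore:
    """Generic side: registration and extraction orchestration only."""

    def __init__(self) -> None:
        self._packs = {}

    def register(self, pack: DomainPack) -> None:
        if pack.name in self._packs:
            raise ValueError(f"domain pack already registered: {pack.name}")
        self._packs[pack.name] = pack

    def ingest(self, pack_name: str, artifact_type: str,
               artifact_id: str, text: str) -> list:
        pack = self._packs[pack_name]
        if artifact_type not in pack.artifact_types:
            raise ValueError(f"{pack_name} does not handle {artifact_type}")
        signals = []
        for extract in pack.extractors:
            signals.extend(extract(artifact_id, text))
        return signals


def todo_extractor(artifact_id: str, text: str) -> list:
    """Pack-owned rule: lines starting with 'TODO:' become action items."""
    signals = []
    offset = 0
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("TODO:"):
            start = offset + line.index("TODO:")
            end = offset + len(line.rstrip())
            signals.append(Signal("action_item", stripped[5:].strip(),
                                  SourceLink(artifact_id, start, end)))
        offset += len(line)
    return signals


if __name__ == "__main__":
    core = ContextCore()
    core.register(DomainPack("notes", {"meeting_note"}, [todo_extractor]))
    note = "Met with Ana.\nTODO: send brief\n"
    for s in core.ingest("notes", "meeting_note", "note-1", note):
        print(s.kind, s.value, note[s.source.start:s.source.end])
```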

Access model

  • User enters an invitation code on the Access Hub.
  • Backend validates and issues a signed access token scoped to the experience.
  • Frontend stores the token in localStorage, keyed per experience.
  • Protected routes and API calls require a valid token.
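
A signed, experience-scoped token can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the backend's actual token format: the payload fields, the HMAC-SHA256 signature, the one-hour lifetime, and the function names issue_token and verify_token are all hypothetical. A production backend might well use an established JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, experience: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Issue a token bound to one experience, signed with HMAC-SHA256."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"exp": int(issued) + ttl_seconds, "experience": experience},
        separators=(",", ":"), sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(secret: bytes, token: str, experience: str,
                 now: Optional[float] = None) -> bool:
    """Check signature first, then scope and expiry."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = _unb64(payload_b64)
        signature = _unb64(signature_b64)
    except ValueError:  # wrong shape or invalid base64
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims.get("experience") == experience and current < claims.get("exp", 0)


if __name__ == "__main__":
    secret = b"example-secret-not-for-production"
    token = issue_token(secret, "rag-demo")
    print(verify_token(secret, token, "rag-demo"))    # scope matches
    print(verify_token(secret, token, "voice-demo"))  # scope does not match
```

Binding the experience name into the signed payload is what makes a token for one experience useless on another, even though both are checked with the same secret.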

Experiences

  • Messy Notes — bounded multi-agent workflow that turns raw notes into a structured brief. Local development can switch to a simpler Ollama-backed LoRA SLM workflow for learning and demonstration.
  • RAG Demo — persona-scoped document retrieval with grounded answers and citations.
  • Voice Demo — browser and phone access to a persona-configured voice advisor via xAI or OpenAI realtime.
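
As a rough illustration of what "persona-scoped retrieval with citations" means for the RAG Demo, the sketch below filters chunks by persona before ranking and returns document ids that an answer could cite. The Chunk fields and the retrieve function are hypothetical, and the keyword-overlap score is a deliberately simple stand-in; the real demo's retrieval method is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    persona: str
    text: str


def retrieve(chunks, persona, question, k=2):
    """Return up to k chunks visible to this persona, best match first."""
    terms = set(question.lower().split())
    scored = []
    for c in chunks:
        if c.persona != persona:
            continue  # scope filter runs before any ranking
        overlap = len(terms & set(c.text.lower().split()))
        if overlap:
            scored.append((overlap, c))
    # Sort by score descending, then doc_id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].doc_id))
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    corpus = [
        Chunk("a1", "advisor", "retirement savings plan options"),
        Chunk("e1", "engineer", "savings on cloud compute costs"),
        Chunk("a2", "advisor", "office holiday schedule"),
    ]
    hits = retrieve(corpus, "advisor", "what savings plan fits me")
    print([c.doc_id for c in hits])  # document ids usable as citations
```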