System topology
System architecture
How this platform is built and deployed.
Three bounded AI experiences running on a single-node k3s cluster. The frontend is public; the backend is internal-only.
Repositories
- Frontend / BFF: github.com/plebedev/demo-web-app
  Next.js 15 App Router · TypeScript · standalone container
- Backend API: github.com/plebedev/demo-service
  FastAPI · Python 3.14 · SQLAlchemy 2 · Alembic
- Local SLM training: github.com/plebedev/messy-brief-slm
  MLX-LM LoRA dataset, adapter training, GGUF export, and Ollama packaging for the local Messy Notes model
Deployment
There is no image registry in the deployment path. Images are built locally, saved as tar files, copied to the Oracle Cloud VM with scp, imported directly into k3s, then deployed with Helm.
- Image tags use the current short git SHA, so the deployed state traces back to an exact commit.
- The registry-free path is deliberate: it avoids a paid registry dependency and keeps each deploy as a self-contained artifact.
- Rollback is `helm rollback` plus a previously shipped image tar.
- The Rust `text-tools` sidecar follows the same pipeline with its own Helm chart, so backend and sidecar deploys stay independent.
- Runtime target: single-node k3s on an Oracle Cloud VM, with Traefik exposing the frontend and the backend remaining internal-only through cluster DNS.
- Public URL: demo.lebedev.ai
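
The registry-free path above can be sketched as an ordered list of commands. This is an illustration under assumptions, not the project's actual deploy script: it assumes Docker as the local build tool, Helm run from the local machine against the cluster, and a chart that reads the tag from an `image.tag` value. The image name, SSH host, release name, and chart path are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployTarget:
    # Every field is a hypothetical placeholder for illustration.
    image: str    # local image name
    host: str     # SSH target for the Oracle Cloud VM
    release: str  # Helm release name
    chart: str    # path to the Helm chart


def deploy_commands(target: DeployTarget, sha: str) -> list[list[str]]:
    """Return the commands for one registry-free deploy, in order.

    `sha` is the short git SHA, e.g. the output of `git rev-parse --short HEAD`.
    """
    ref = f"{target.image}:{sha}"
    tar = f"{target.image}-{sha}.tar"
    return [
        # 1. Build locally, tagged with the commit.
        ["docker", "build", "-t", ref, "."],
        # 2. Save the image as a self-contained tar file.
        ["docker", "save", "-o", tar, ref],
        # 3. Copy the tar to the VM.
        ["scp", tar, f"{target.host}:/tmp/{tar}"],
        # 4. Import it straight into the k3s containerd image store.
        ["ssh", target.host, "sudo", "k3s", "ctr", "images", "import", f"/tmp/{tar}"],
        # 5. Deploy with Helm; `image.tag` is an assumed chart value name.
        ["helm", "upgrade", "--install", target.release, target.chart,
         "--set", f"image.tag={sha}"],
    ]


if __name__ == "__main__":
    target = DeployTarget("demo-service", "vm-host", "demo-service", "./chart")
    for cmd in deploy_commands(target, "abc1234"):
        print(" ".join(cmd))
```

Returning the commands as data rather than executing them keeps the sketch easy to inspect; each list could be passed to `subprocess.run(cmd, check=True)` to actually perform the deploy.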
Tech stack
- Frontend/BFF: Next.js 15 App Router, TypeScript.
- Backend API: FastAPI (Python 3.14), SQLAlchemy 2.x, Alembic migrations, Pydantic 2.
- Database: Oracle Autonomous Database in production, Postgres locally.
- AI: LLM APIs (Claude, OpenAI) for workflow and retrieval; xAI and OpenAI realtime for voice. Workflows are model-agnostic — providers are swappable via config.
- Local learning path: Messy Notes can run through the Ollama model `messy-brief-local`, a LoRA-tuned Qwen2.5 1.5B adapter packaged as GGUF for local demonstration.
- Text processing: a small Rust service (axum) handles deterministic chunking and normalization for RAG ingest. It runs as an internal sidecar; it exists as hands-on Rust practice, not because Python couldn't do it.
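
To make "deterministic chunking and normalization" concrete, here is a minimal sketch in Python. The real service is written in Rust, and the source does not describe its algorithm, so everything below is an assumption: NFC normalization, whitespace collapsing, and fixed-size character windows with overlap. The function names and default sizes are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC and collapse every whitespace run to one space."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split normalized text into fixed-size character windows.

    Consecutive windows share `overlap` characters. The same input always
    yields the same chunks, which is what makes re-ingestion repeatable.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    clean = normalize(text)
    if not clean:
        return []
    step = size - overlap
    # Stop early enough that the last window is not just leftover overlap.
    last_start = max(len(clean) - overlap, 1)
    return [clean[i:i + size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    for piece in chunk("Deterministic   chunking\nfor RAG ingest.", size=16, overlap=4):
        print(repr(piece))
```

Character windows are the simplest deterministic strategy; a production chunker might split on sentence or token boundaries instead.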
Context Engine
- Shared backend capability for domain-pack-driven context infrastructure across experiences and future domains.
- Core owns generic artifact ingestion, chunking, provenance, extraction orchestration, persisted signals, generated views, actionable items, and domain registration.
- Domain packs own artifact types, extractors, perspective builders, task generators, views, prompts, and interpretation rules.
- Source links preserve where derived context came from so experiences can show evidence, linked artifacts, and explicit inference labels instead of unsupported claims.
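
The split between core and domain packs can be illustrated with a small registry. This is a hypothetical sketch, not the actual backend code: the class names, fields, and the TODO-line extractor are invented for illustration. It shows only the shape of the idea: core registers packs and orchestrates extraction, packs supply the extractors, and every derived signal carries its source artifact and an inference flag.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Signal:
    kind: str
    value: str
    source_artifact: str  # provenance: the artifact this signal came from
    inferred: bool        # True when the value is an inference, not a quote


# An extractor maps (artifact_id, text) to signals. Hypothetical signature.
Extractor = Callable[[str, str], list[Signal]]


@dataclass
class DomainPack:
    name: str
    extractors: dict[str, Extractor]  # keyed by artifact type


@dataclass
class Core:
    packs: dict[str, DomainPack] = field(default_factory=dict)

    def register(self, pack: DomainPack) -> None:
        """Domain registration: core knows packs only by name."""
        if pack.name in self.packs:
            raise ValueError(f"domain pack already registered: {pack.name}")
        self.packs[pack.name] = pack

    def ingest(self, domain: str, artifact_type: str,
               artifact_id: str, text: str) -> list[Signal]:
        """Generic orchestration: look up the pack's extractor and run it."""
        extractor = self.packs[domain].extractors[artifact_type]
        return extractor(artifact_id, text)


def todo_extractor(artifact_id: str, text: str) -> list[Signal]:
    """Example pack-owned rule: each 'TODO:' line becomes an action item."""
    return [
        Signal("action_item", line.removeprefix("TODO:").strip(), artifact_id, False)
        for line in text.splitlines()
        if line.startswith("TODO:")
    ]


if __name__ == "__main__":
    core = Core()
    core.register(DomainPack("notes", {"meeting_note": todo_extractor}))
    print(core.ingest("notes", "meeting_note", "artifact-1", "TODO: ship it\nsmall talk"))
```

Because each `Signal` keeps `source_artifact` and `inferred`, an experience can link a claim back to its evidence or label it as an inference.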
Access model
- User enters an invitation code on the Access Hub.
- Backend validates and issues a signed access token scoped to the experience.
- Frontend stores the token in localStorage per-experience.
- Protected routes and API calls require a valid token.
Experiences
- Messy Notes — bounded multi-agent workflow that turns raw notes into a structured brief. Local development can switch to a simpler Ollama-backed LoRA SLM workflow for learning and demonstration.
- RAG Demo — persona-scoped document retrieval with grounded answers and citations.
- Voice Demo — browser and phone access to a persona-configured voice advisor via xAI or OpenAI realtime.