Lumineltek is a personal project in active development. Not yet publicly available. This case study documents architecture and decisions from the build in progress.
Lumineltek
RAG-as-a-Service, from scratch.
A multi-tenant SaaS platform for AI-powered document processing and semantic search. Built end-to-end: Node.js + Postgres + pgvector on the back, Next.js 15 on the front, a public API for external consumers, and billing, permissions, and grounding verification baked in as infrastructure, not features.
- Node.js
- Postgres / pgvector
- Elasticsearch
- pg-boss
- OpenAI / Anthropic
- Chandra (Datalab)
- Next.js 15
- TypeScript
- Lemon Squeezy
01 / the thesis
Most "chat with your docs" tools are demos wearing a production badge. The retrieval is a black box. The answer isn't attributable to a specific passage. Permissions are bolted on after the first enterprise customer asks for them. Billing is duct tape over Stripe.
Lumineltek is an attempt at the serious version: a RAG platform where answers are traceable, the search backend is swappable, tenants are isolated from the root of the schema, and the public API is a first-class surface. If a customer asks "which passage did this sentence come from?" the answer is always available. If they ask "who saw this document yesterday?" there's an audit trail.
02 / ingestion pipeline
Documents flow through four distinct phases. Each phase is a pg-boss worker. Each phase writes its output to disk before the next starts.
upload ──►  Phase 0          Phase 1          Phase 3         Phase 4
            conversion ────► extraction ────► chunking ─────► indexing
                │
                ▼
        Chandra (Datalab) primary.
        OpenAI Vision as deprecated fallback.

One endpoint handles PDF, DOCX, PPTX, XLSX, HTML, EPUB, and images. Chandra is called with include_markdown_in_chunks=true, so the response is pre-flattened blocks with both HTML and markdown. No tree walking, no HTML stripping on our side.
Each phase checkpoints to disk before handing off. A worker crash during extraction resumes from the last checkpoint instead of re-running expensive LLM calls. Felt like overkill in week one. By month three it had paid for itself a dozen times over.
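A minimal sketch of that checkpointing pattern, assuming one JSON checkpoint file per (document, phase); the names here (`runPhase`, `CHECKPOINT_DIR`) are illustrative, not the actual worker code:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import * as os from "node:os";

// Hypothetical checkpoint location; the real pipeline would use durable storage.
const CHECKPOINT_DIR = path.join(os.tmpdir(), "pipeline-checkpoints");

function checkpointPath(docId: string, phase: string): string {
  return path.join(CHECKPOINT_DIR, `${docId}.${phase}.json`);
}

// Run a phase only if no checkpoint exists; otherwise reuse the saved output.
// A crash mid-pipeline therefore resumes instead of re-running expensive calls.
async function runPhase<T>(
  docId: string,
  phase: string,
  work: () => Promise<T>,
): Promise<T> {
  const file = checkpointPath(docId, phase);
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8")) as T; // resume from checkpoint
  }
  const result = await work();
  fs.mkdirSync(CHECKPOINT_DIR, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result)); // checkpoint before handing off
  return result;
}
```

The second invocation for the same (document, phase) never touches the LLM again, which is exactly what makes crashes during extraction cheap.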
Chandra returns bbox, polygon, images, and section hierarchy. We store all of it, even when the UI doesn't render it yet. Re-processing later (to build, say, an interactive page overlay that highlights source regions) is far more expensive than the storage.
03 / retrieval and search
The semantic search pipeline has six stages:
- Query parsing: intent classification, entity extraction (AI, via a curated prompt library).
- Hybrid retrieval: vector + BM25 fused via reciprocal rank fusion.
- Re-ranking: cross-encoder over the top-K candidates.
- Source alignment: every retrieved chunk is mapped back to its document, page, and position.
- Generation: response produced with citations inline, not appended.
- Grounding verification: a post-hoc LLM validator checks that every claim in the answer maps to a retrieved passage. Unmapped claims get flagged.
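The fusion step in stage two can be sketched as plain reciprocal rank fusion over the two ranked lists (k = 60 is the conventional damping constant; the function name is illustrative, not the codebase's):

```typescript
// Reciprocal rank fusion: merge rankings from incomparable scorers
// (cosine distance vs. BM25) using only rank positions, no score calibration.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // rank is 0-based here, so rank + 1 is the 1-based position
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A chunk that ranks well in both the vector list and the BM25 list accumulates two reciprocal-rank contributions and floats to the top, which is the whole point of hybrid retrieval.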
swappable search provider
A provider abstraction selects the backend at runtime:
SEARCH_PROVIDER=pgvector # or: elasticsearch
Both are fully implemented behind a shared interface. pgvector keeps operations simple (one database). Elasticsearch scales past what pgvector comfortably handles. Having both working means we can start simple and scale horizontally without a platform rewrite.
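A sketch of what such an abstraction can look like; the interface shape and member names are assumptions, not the platform's actual contract:

```typescript
// Shared contract both backends implement (illustrative shape).
interface SearchProvider {
  name: string;
  search(query: string, topK: number): Promise<{ chunkId: string; score: number }[]>;
}

const pgvectorProvider: SearchProvider = {
  name: "pgvector",
  async search(query, topK) {
    // would run e.g. `ORDER BY embedding <=> $1 LIMIT $2` against Postgres
    return [];
  },
};

const elasticsearchProvider: SearchProvider = {
  name: "elasticsearch",
  async search(query, topK) {
    // would run a kNN + BM25 query against Elasticsearch
    return [];
  },
};

// Selected once at startup from the environment; callers only see the interface.
function selectProvider(
  env: Record<string, string | undefined> = process.env,
): SearchProvider {
  return env.SEARCH_PROVIDER === "elasticsearch"
    ? elasticsearchProvider
    : pgvectorProvider;
}
```

Because every caller depends on `SearchProvider` rather than a concrete client, switching backends is a config change, not a refactor.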
04 / the "as-a-service" surface
multi-tenancy from the root
tenant
└── workspace
└── domain
├── roles (40+ permission domains, fine-grained RBAC)
├── documents
└── API keys
40+ permission domains is a lot. Getting the taxonomy right up front saved enormous retrofit cost later.
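With a taxonomy like that, an authorization check reduces to set membership on "domain:action" grants. A minimal sketch; the domain and action strings below are examples, not the real taxonomy:

```typescript
// A role carries a set of grants like "documents:read" (illustrative format).
type Role = { name: string; grants: Set<string> };

// Fine-grained RBAC check: does any of the caller's roles grant this
// action in this permission domain?
function can(roles: Role[], domain: string, action: string): boolean {
  const needed = `${domain}:${action}`;
  return roles.some((r) => r.grants.has(needed));
}
```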
public API v1
- POST /api/v1/documents: ingest; returns a job ID. Status polls against the job.
- GET /api/v1/documents, GET /api/v1/domains: list, filtered by tenant/workspace.
- GET /api/v1/usage: metered consumption (tokens, extractions, storage).
- /api/v1/public: unauthenticated endpoints for demos and docs.
API keys are first-class. Each key scoped to a tenant, rate-limited, usage-metered. The platform bills on real consumption (tokens, extractions, storage), not "10 docs/month" heuristics. Payments flow through Lemon Squeezy (plans, top-ups, subscriptions, webhooks).
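A hypothetical external consumer of that surface: submit a document, then poll until the job settles. The document endpoint path comes from the API description above; the job-status path, response shapes, and auth header format are assumptions:

```typescript
type Job = { id: string; status: "pending" | "processing" | "done" | "failed" };

async function ingestAndWait(
  baseUrl: string,
  apiKey: string,
  file: Blob,
  fetchFn: typeof fetch = fetch, // injectable so the flow can be tested offline
  pollMs = 1000,
): Promise<Job> {
  const form = new FormData();
  form.append("file", file);

  // POST /api/v1/documents returns a job ID
  const res = await fetchFn(`${baseUrl}/api/v1/documents`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  let job: Job = await res.json();

  // Poll until the job settles; the /jobs/:id path is a guess, not documented above.
  while (job.status === "pending" || job.status === "processing") {
    await new Promise((r) => setTimeout(r, pollMs));
    const poll = await fetchFn(`${baseUrl}/api/v1/jobs/${job.id}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    job = await poll.json();
  }
  return job;
}
```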
integrations
Confluence and Jira pipelines pull content into the ingestion pipeline automatically. A generic DMS integration handles arbitrary document sources. Same 4-phase pipeline applies regardless of origin.
05 / quality infrastructure
Most RAG demos ship without any of the following. A platform that customers pay for needs all of them.
- LLM validator: post-hoc grounding check on every answer.
- Confidence scoring: low-confidence answers surface for review before reaching the end user.
- PDF quality gate: runs before extraction, catches PDFs that would produce garbage (scanned pages with bad OCR heuristics) instead of silently producing bad chunks. Accompanied by a quality report generator so failures are diagnosable.
- Human-in-the-loop: flagged outputs route to reviewers with source-passage highlighting; corrections feed back into the training data.
- Search logging & stats: every query, result set, and click tracked for retrieval-quality regression testing.
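Stripped of the LLM call itself, the validator's flagging step is a set check. A sketch assuming the validator has already produced a claim-to-passage mapping (the types are illustrative):

```typescript
// A claim as emitted by a post-hoc LLM validator: its text plus the ID of the
// retrieved passage it claims to be grounded in (null = no passage found).
type Claim = { text: string; passageId: string | null };

// Flag any claim with no supporting passage, or one that maps to a passage
// that was never actually retrieved for this answer.
function flagUngrounded(claims: Claim[], retrievedIds: Set<string>): Claim[] {
  return claims.filter(
    (c) => c.passageId === null || !retrievedIds.has(c.passageId),
  );
}
```

Flagged claims are what route into the human-in-the-loop queue; everything else ships with its citation intact.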
In compliance-heavy contexts, "the answer looks right" isn't enough. Reviewers need to see which passage supports each claim, confidence on each, and an audit trail of who accessed what. Lumineltek treats all of that as infrastructure, not a feature.
06 / what I learned
- The hard part of RAG isn't the model call. It's the plumbing: ingestion quality, grounded generation, hallucination handling, and making all of it auditable. The wrapper decides whether the product is trustworthy.
- Provider abstractions pay off. Swapping pgvector and Elasticsearch by config meant starting simple without painting into a corner.
- Checkpointing every pipeline stage is not overkill. Crashes during expensive phases happen more than you'd expect.
- Multi-tenant permission taxonomy is load-bearing. Fine-grained from day one, or retrofit pain forever.
- Production AI is less "model" and more "verification loop." Shipping grounding-verification on day one changed how every downstream feature was designed.
The serious version of RAG is almost entirely about the stuff around the model: ingestion quality, attribution, permissions, billing, verification. Build that layer right and the product earns trust. Skip it and no amount of prompt tuning will save you.