Lumineltek is a personal project in active development. Not yet publicly available. This case study documents architecture and decisions from the build in progress.
Lumineltek
RAG-as-a-Service, from scratch.
A multi-tenant SaaS platform for AI-powered document processing and semantic search. Built end-to-end: Node.js + Postgres + pgvector on the back, Next.js 15 on the front, a public API for external consumers, and billing, permissions, and grounding verification baked in as infrastructure, not features.
- Node.js
- Postgres / pgvector
- Elasticsearch
- pg-boss
- OpenAI / Anthropic
- Chandra (Datalab)
- Next.js 15
- TypeScript
- Lemon Squeezy
01 / the thesis
Most "chat with your docs" tools are demos wearing a production badge. The retrieval is a black box. The answer isn't attributable to a specific passage. Permissions are bolted on after the first enterprise customer asks for them. Billing is duct tape over Stripe.
Lumineltek is an attempt at the serious version: a RAG platform where answers are traceable, the search backend is swappable, tenants are isolated from the root of the schema, and the public API is a first-class surface. If a customer asks "which passage did this sentence come from?" the answer is always available. If they ask "who saw this document yesterday?" there's an audit trail.
02 / ingestion pipeline
Documents flow through four distinct phases. Each phase is a pg-boss worker. Each phase writes its output to disk before the next starts.
upload ──►  Phase 0          Phase 1          Phase 3         Phase 4
            conversion ────► extraction ────► chunking ─────► indexing
                │
                ▼
        Chandra (Datalab) primary.
        OpenAI Vision as deprecated fallback.

One endpoint handles PDF, DOCX, PPTX, XLSX, HTML, EPUB, and images. Chandra is called with include_markdown_in_chunks=true, so the response is pre-flattened blocks with both HTML and markdown. No tree walking, no HTML stripping on our side.
Each phase checkpoints to disk before handing off. A worker crash during extraction resumes from the last checkpoint instead of re-running expensive LLM calls. Felt like overkill in week one. By month three it had paid for itself a dozen times over.
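A minimal sketch of that checkpointing pattern, assuming one JSON checkpoint file per (document, phase); the names here (`runPhase`, `CHECKPOINT_DIR`) are illustrative, not the actual worker code:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import * as os from "node:os";

// Hypothetical checkpoint location; the real pipeline would use durable storage.
const CHECKPOINT_DIR = path.join(os.tmpdir(), "pipeline-checkpoints");

function checkpointPath(docId: string, phase: string): string {
  return path.join(CHECKPOINT_DIR, `${docId}.${phase}.json`);
}

// Run a phase only if no checkpoint exists; otherwise reuse the saved output.
// A crash mid-pipeline therefore resumes instead of re-running expensive calls.
async function runPhase<T>(
  docId: string,
  phase: string,
  work: () => Promise<T>,
): Promise<T> {
  const file = checkpointPath(docId, phase);
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8")) as T; // resume from checkpoint
  }
  const result = await work();
  fs.mkdirSync(CHECKPOINT_DIR, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result)); // checkpoint before handing off
  return result;
}
```

The second invocation for the same (document, phase) never touches the LLM again, which is exactly what makes crashes during extraction cheap.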
Chandra returns bbox, polygon, images, and section hierarchy. We store all of it, even when the UI doesn't render it yet. Re-processing later (to build, say, an interactive page overlay that highlights source regions) is far more expensive than the storage.
03 / retrieval and search
The semantic search pipeline has six stages:
- Query parsing: intent classification, entity extraction (AI, via a curated prompt library).
- Hybrid retrieval: vector + BM25 fused via reciprocal rank fusion.
- Re-ranking: cross-encoder over the top-K candidates.
- Source alignment: every retrieved chunk is mapped back to its document, page, and position.
- Generation: response produced with citations inline, not appended.
- Grounding verification: a post-hoc LLM validator checks that every claim in the answer maps to a retrieved passage. Unmapped claims get flagged.
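The fusion step in stage two can be sketched as plain reciprocal rank fusion over the two ranked lists (k = 60 is the conventional damping constant; the function name is illustrative, not the codebase's):

```typescript
// Reciprocal rank fusion: merge rankings from incomparable scorers
// (cosine distance vs. BM25) using only rank positions, no score calibration.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // rank is 0-based here, so rank + 1 is the 1-based position
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A chunk that ranks well in both the vector list and the BM25 list accumulates two reciprocal-rank contributions and floats to the top, which is the whole point of hybrid retrieval.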
swappable search provider
A provider abstraction selects the backend at runtime:
SEARCH_PROVIDER=pgvector # or: elasticsearch
Both are fully implemented behind a shared interface. pgvector keeps operations simple (one database). Elasticsearch scales past what pgvector comfortably handles. Having both working means we can start simple and scale horizontally without a platform rewrite.
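A sketch of what such an abstraction can look like; the interface shape and member names are assumptions, not the platform's actual contract:

```typescript
// Shared contract both backends implement (illustrative shape).
interface SearchProvider {
  name: string;
  search(query: string, topK: number): Promise<{ chunkId: string; score: number }[]>;
}

const pgvectorProvider: SearchProvider = {
  name: "pgvector",
  async search(query, topK) {
    // would run e.g. `ORDER BY embedding <=> $1 LIMIT $2` against Postgres
    return [];
  },
};

const elasticsearchProvider: SearchProvider = {
  name: "elasticsearch",
  async search(query, topK) {
    // would run a kNN + BM25 query against Elasticsearch
    return [];
  },
};

// Selected once at startup from the environment; callers only see the interface.
function selectProvider(
  env: Record<string, string | undefined> = process.env,
): SearchProvider {
  return env.SEARCH_PROVIDER === "elasticsearch"
    ? elasticsearchProvider
    : pgvectorProvider;
}
```

Because every caller depends on `SearchProvider` rather than a concrete client, switching backends is a config change, not a refactor.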
04 / the "as-a-service" surface
multi-tenancy from the root
tenant
└── workspace
└── domain
├── roles (40+ permission domains, fine-grained RBAC)
├── documents
└── API keys
40+ permission domains is a lot. Getting the taxonomy right up front saved enormous retrofit cost later.
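With a taxonomy like that, an authorization check reduces to set membership on "domain:action" grants. A minimal sketch; the domain and action strings below are examples, not the real taxonomy:

```typescript
// A role carries a set of grants like "documents:read" (illustrative format).
type Role = { name: string; grants: Set<string> };

// Fine-grained RBAC check: does any of the caller's roles grant this
// action in this permission domain?
function can(roles: Role[], domain: string, action: string): boolean {
  const needed = `${domain}:${action}`;
  return roles.some((r) => r.grants.has(needed));
}
```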
public API v1
- POST /api/v1/documents: ingest; returns a job ID. Status polls against the job.
- GET /api/v1/documents, GET /api/v1/domains: list, filtered by tenant/workspace.
- GET /api/v1/usage: metered consumption (tokens, extractions, storage).
- /api/v1/public: unauthenticated endpoints for demos and docs.
API keys are first-class. Each key scoped to a tenant, rate-limited, usage-metered. The platform bills on real consumption (tokens, extractions, storage), not "10 docs/month" heuristics. Payments flow through Lemon Squeezy (plans, top-ups, subscriptions, webhooks).
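A hypothetical external consumer of that surface: submit a document, then poll until the job settles. The document endpoint path comes from the API description above; the job-status path, response shapes, and auth header format are assumptions:

```typescript
type Job = { id: string; status: "pending" | "processing" | "done" | "failed" };

async function ingestAndWait(
  baseUrl: string,
  apiKey: string,
  file: Blob,
  fetchFn: typeof fetch = fetch, // injectable so the flow can be tested offline
  pollMs = 1000,
): Promise<Job> {
  const form = new FormData();
  form.append("file", file);

  // POST /api/v1/documents returns a job ID
  const res = await fetchFn(`${baseUrl}/api/v1/documents`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  let job: Job = await res.json();

  // Poll until the job settles; the /jobs/:id path is a guess, not documented above.
  while (job.status === "pending" || job.status === "processing") {
    await new Promise((r) => setTimeout(r, pollMs));
    const poll = await fetchFn(`${baseUrl}/api/v1/jobs/${job.id}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    job = await poll.json();
  }
  return job;
}
```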
integrations
Confluence and Jira pipelines pull content into the ingestion pipeline automatically. A generic DMS integration handles arbitrary document sources. Same 4-phase pipeline applies regardless of origin.
05 / quality infrastructure
Most RAG demos ship without any of the following. A platform that customers pay for needs all of them.
- LLM validator: post-hoc grounding check on every answer.
- Confidence scoring: low-confidence answers surface for review before reaching the end user.
- PDF quality gate: runs before extraction, catches PDFs that would produce garbage (scanned pages with bad OCR heuristics) instead of silently producing bad chunks. Accompanied by a quality report generator so failures are diagnosable.
- Human-in-the-loop: flagged outputs route to reviewers with source-passage highlighting; corrections feed back into the training data.
- Search logging & stats: every query, result set, and click tracked for retrieval-quality regression testing.
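Stripped of the LLM call itself, the validator's flagging step is a set check. A sketch assuming the validator has already produced a claim-to-passage mapping (the types are illustrative):

```typescript
// A claim as emitted by a post-hoc LLM validator: its text plus the ID of the
// retrieved passage it claims to be grounded in (null = no passage found).
type Claim = { text: string; passageId: string | null };

// Flag any claim with no supporting passage, or one that maps to a passage
// that was never actually retrieved for this answer.
function flagUngrounded(claims: Claim[], retrievedIds: Set<string>): Claim[] {
  return claims.filter(
    (c) => c.passageId === null || !retrievedIds.has(c.passageId),
  );
}
```

Flagged claims are what route into the human-in-the-loop queue; everything else ships with its citation intact.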
In compliance-heavy contexts, "the answer looks right" isn't enough. Reviewers need to see which passage supports each claim, confidence on each, and an audit trail of who accessed what. Lumineltek treats all of that as infrastructure, not a feature.
06 / what I learned
- The hard part of RAG isn't the model call. It's the plumbing: ingestion quality, grounded generation, hallucination handling, and making all of it auditable. The wrapper decides whether the product is trustworthy.
- Provider abstractions pay off. Swapping pgvector and Elasticsearch by config meant starting simple without painting into a corner.
- Checkpointing every pipeline stage is not overkill. Crashes during expensive phases happen more than you'd expect.
- Multi-tenant permission taxonomy is load-bearing. Fine-grained from day one, or retrofit pain forever.
- Production AI is less "model" and more "verification loop." Shipping grounding-verification on day one changed how every downstream feature was designed.
The serious version of RAG is almost entirely about the stuff around the model: ingestion quality, attribution, permissions, billing, verification. Build that layer right and the product earns trust. Skip it and no amount of prompt tuning will save you.