RAG Pipeline Documentation

This document describes the end-to-end flow for uploaded documents: storage, chunking, embedding into pgvector, per-question retrieval, and model calls with the retrieved context attached.

Overview

Users can:

Upload PDF, TXT, or Markdown
Ask questions scoped to their library
Get answers grounded in chunks retrieved from their documents

Architecture

Document Upload → Storage → Vectorization → Embeddings DB
                                                ↓
User Question → Embed Query → Similarity Search → Context Injection
                                                ↓
                                          AI Response

Upload Flow

1. File Upload (`/api/upload`)

// Accepts: PDF, TXT, MD files (max 10MB)
// Stores in Supabase Storage: uploads/{user_id}/{filename}
// Creates document record with status: 'processing'
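The checks described above could look roughly like the sketch below. The helper name `checkUpload`, the `UploadCandidate` shape, and the choice to validate by file extension are assumptions made for illustration; the actual route may validate MIME types or use different names.

```typescript
// Hypothetical helper: names and shapes are illustrative, not the project's actual code.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB limit from the upload comment
const ALLOWED_EXTENSIONS: readonly string[] = ["pdf", "txt", "md"];

interface UploadCandidate {
  filename: string;
  sizeBytes: number;
}

type UploadCheck =
  | { ok: true; storagePath: string }
  | { ok: false; reason: string };

function checkUpload(userId: string, file: UploadCandidate): UploadCheck {
  // Assumption: the file type is inferred from the extension.
  const dot = file.filename.lastIndexOf(".");
  const ext = dot >= 0 ? file.filename.slice(dot + 1).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: `Unsupported file type: .${ext}` };
  }
  if (file.sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File exceeds the 10MB limit" };
  }
  // Path layout from the comment above: uploads/{user_id}/{filename}
  return { ok: true, storagePath: `uploads/${userId}/${file.filename}` };
}
```

A caller would reject the request when `ok` is false and otherwise write the file to `storagePath` before creating the document record.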

2. Vectorization (`/api/rag`)

Triggered after upload:

// 1. Download file from Storage
// 2. Extract text (PDF uses pdf2json)
// 3. Split into chunks (1000 chars, 200 overlap)
// 4. Generate embeddings via OpenAI
// 5. Store in document_embeddings table
// 6. Update document status: 'completed'

Configuration

All RAG settings are in src/config/rag.ts:

Chunking Settings

chunking: {
  chunkSize: 1000,      // Characters per chunk
  chunkOverlap: 200,    // Overlap between chunks
  maxChunks: 10000,     // Safety limit
  minChunkSize: 100,    // Minimum chunk size
}
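To make the chunking settings concrete, here is a minimal sketch of a fixed-size splitter with overlap. The function name `splitIntoChunks` and the way `minChunkSize` and `maxChunks` are applied are assumptions; the real splitter may respect sentence or heading boundaries.

```typescript
interface ChunkingConfig {
  chunkSize: number;
  chunkOverlap: number;
  maxChunks: number;
  minChunkSize: number;
}

// Values copied from the chunking settings above.
const chunking: ChunkingConfig = {
  chunkSize: 1000,
  chunkOverlap: 200,
  maxChunks: 10000,
  minChunkSize: 100,
};

// Hypothetical splitter: each chunk starts (chunkSize - chunkOverlap) characters after the previous one.
function splitIntoChunks(text: string, cfg: ChunkingConfig = chunking): string[] {
  const step = cfg.chunkSize - cfg.chunkOverlap;
  if (step <= 0) throw new Error("chunkOverlap must be smaller than chunkSize");

  const chunks: string[] = [];
  for (let start = 0; start < text.length && chunks.length < cfg.maxChunks; start += step) {
    const chunk = text.slice(start, start + cfg.chunkSize);
    // Assumption: a trailing fragment shorter than minChunkSize is dropped,
    // but a short document still yields one chunk.
    if (chunk.length < cfg.minChunkSize && chunks.length > 0) break;
    chunks.push(chunk);
    if (start + cfg.chunkSize >= text.length) break; // this chunk reached the end of the text
  }
  return chunks;
}
```

With the default values, a 2000-character document produces three chunks starting at offsets 0, 800, and 1600, and the last 200 characters of each chunk repeat at the start of the next.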

Embedding Settings

embedding: {
  model: "text-embedding-3-small",
  dimensions: 1536,
  batchSize: 20,
}
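Assuming `batchSize` is the number of chunks sent per embedding request, the grouping step could be sketched as follows. The helper name `toBatches` is hypothetical, and the call to the embedding API itself is deliberately left out.

```typescript
// Values copied from the embedding settings above.
const embedding = {
  model: "text-embedding-3-small",
  dimensions: 1536,
  batchSize: 20,
};

// Hypothetical helper: groups items into consecutive batches of at most batchSize.
function toBatches<T>(items: readonly T[], batchSize: number = embedding.batchSize): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Under this assumption, a document that yields 45 chunks needs three embedding requests carrying 20, 20, and 5 chunks.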

Retrieval Settings

retrieval: {
  defaultTopK: 5,           // Chunks to return
  matchThreshold: 0.0,      // Minimum similarity
  listQueryMultiplier: 4,   // More chunks for list queries
}
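One way these three settings might combine at query time is sketched below. The names `effectiveTopK`, `selectChunks`, and `ScoredChunk` are illustrative assumptions, as is the choice to apply the threshold before truncating to the top K.

```typescript
// Values copied from the retrieval settings above.
const retrieval = {
  defaultTopK: 5,
  matchThreshold: 0.0,
  listQueryMultiplier: 4,
};

interface ScoredChunk {
  content: string;
  similarity: number;
}

// List queries request defaultTopK × listQueryMultiplier chunks; other queries request defaultTopK.
function effectiveTopK(isListQuery: boolean, cfg = retrieval): number {
  return isListQuery ? cfg.defaultTopK * cfg.listQueryMultiplier : cfg.defaultTopK;
}

// Hypothetical selection step: drop chunks below the threshold, then keep the best topK.
function selectChunks(scored: readonly ScoredChunk[], topK: number, cfg = retrieval): ScoredChunk[] {
  return scored
    .filter((c) => c.similarity >= cfg.matchThreshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

With the defaults, a list query asks for 20 chunks instead of 5, and a threshold of 0.0 only removes chunks whose similarity is negative.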

Retrieval Process

1. Query Classification

Queries are classified to optimize retrieval:

// List queries: "what are", "list", "enumerate"
// → Retrieves more chunks (topK × 4)

// Meta queries: "summarize", "overview"
// → Uses lower similarity threshold
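A keyword-based classifier along these lines would be one simple way to implement the rules above. The function name `classifyQuery`, whole-word matching, and checking list patterns before meta patterns are all assumptions; the actual classifier may use different patterns or precedence.

```typescript
type QueryKind = "list" | "meta" | "default";

// Trigger phrases taken from the comments above.
const LIST_PHRASES = ["what are", "list", "enumerate"];
const META_PHRASES = ["summarize", "overview"];

// True if any phrase appears in the text as whole words (so "listen" does not match "list").
function containsPhrase(text: string, phrases: readonly string[]): boolean {
  return phrases.some((p) => new RegExp("\\b" + p + "\\b").test(text));
}

// Hypothetical classifier; precedence (list before meta) is an assumption.
function classifyQuery(question: string): QueryKind {
  const q = question.toLowerCase();
  if (containsPhrase(q, LIST_PHRASES)) return "list";
  if (containsPhrase(q, META_PHRASES)) return "meta";
  return "default";
}
```

The returned kind can then drive the retrieval parameters: a larger chunk count for "list" and a lower similarity threshold for "meta".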

2. Similarity Search

// 1. Embed user question
// 2. Try RPC-based pgvector search
// 3. Fallback to JS-based similarity
// 4. Apply boosts for structured content
// 5. Diversify results across document sections
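For step 3, a JS-based fallback presumably computes similarity in application code. The sketch below uses cosine similarity, which is an assumption about the metric; `EmbeddedChunk` and `rankChunks` are hypothetical names, and the boosting and diversification steps are not shown.

```typescript
interface EmbeddedChunk {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 when either vector has zero length.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query embedding and returns the topK most similar.
function rankChunks(
  queryEmbedding: readonly number[],
  chunks: readonly EmbeddedChunk[],
  topK: number
): Array<{ chunk: EmbeddedChunk; similarity: number }> {
  return chunks
    .map((chunk) => ({ chunk, similarity: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

This scans every candidate chunk, so its cost grows linearly with the number of stored embeddings. That is a plausible reason to prefer the database-side pgvector search whenever it is available.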

3. Context Injection

Retrieved chunks are injected into the system prompt:

const systemPrompt = `
You have access to the following context from uploaded documents.
Use this information to answer questions accurately.

Context from document:
${relevantChunks.map(c => c.content).join('\n\n---\n\n')}
`;

Chat API Integration

When documents are attached to a chat:

// In /api/chat/route.ts
if (documentIds && documentIds.length > 0) {
  const relevantChunks = await retrieveRelevantChunks(
    userQuestion,
    documentIds,
    RAG_CONFIG.retrieval.defaultTopK
  );

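  if (relevantChunks.length > 0) {
    // Join chunk contents with the same separator used in the prompt template
    const contextText = relevantChunks
      .map((c) => c.content)
      .join('\n\n---\n\n');
    systemPrompt = buildRagSystemPrompt(contextText);
  }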
}

Supported File Types

Type	MIME Type	Extraction
PDF	application/pdf	pdf2json
Text	text/plain	Direct read
Markdown	text/markdown	Direct read

Best Practices

Document Preparation

Use clear headings - Helps chunk boundaries
Keep paragraphs focused - Better semantic matching
Include key terms - Improves retrieval accuracy

Query Tips

Be specific - "What are the 5 key features?" vs "features?"
Reference the document - "According to the document..."
Ask follow-ups - Build on previous context

Troubleshooting

No Results Found

Check document status is 'completed'
Verify embeddings exist in database
Lower matchThreshold in src/config/rag.ts if it has been raised above the default 0.0

Irrelevant Results

Increase defaultTopK so relevant chunks are more likely to be retrieved
Enable result diversification across document sections
Check the chunkSize and chunkOverlap settings

Slow Retrieval

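Ensure a pgvector index exists on the document_embeddings table
Reduce batchSize if embedding requests are slow or time out (this affects vectorization rather than query-time search)
Make sure the RPC-based pgvector search is used rather than the JS-based fallback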