RAG Pipeline Documentation

This document describes the end-to-end flow for uploaded documents: storage, chunking, embedding into pgvector, per-question retrieval, and model calls with the retrieved context attached.

Overview

Users can:

  1. Upload PDF, TXT, or Markdown
  2. Ask questions scoped to their library
  3. Get answers with retrieved chunks feeding the prompt

Architecture

Document Upload → Storage → Vectorization → Embeddings DB
                                                ↓
User Question → Embed Query → Similarity Search → Context Injection
                                                ↓
                                          AI Response

Upload Flow

1. File Upload (/api/upload)

// Accepts: PDF, TXT, MD files (max 10MB)
// Stores in Supabase Storage: uploads/{user_id}/{filename}
// Creates document record with status: 'processing'
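
The checks described above could be sketched as a small validator. This is a minimal illustration, not the route's actual code: `validateUpload`, `UploadCandidate`, and `storagePath` are hypothetical names, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```typescript
// Hypothetical sketch of the /api/upload checks; all names are illustrative.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // assumes "10MB" means 10 MiB

const ALLOWED_MIME_TYPES = ["application/pdf", "text/plain", "text/markdown"];

interface UploadCandidate {
  name: string;
  mimeType: string;
  sizeBytes: number;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Rejects unsupported MIME types and files over the size limit.
function validateUpload(file: UploadCandidate): ValidationResult {
  if (!ALLOWED_MIME_TYPES.includes(file.mimeType)) {
    return { ok: false, reason: `Unsupported type: ${file.mimeType}` };
  }
  if (file.sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File exceeds the 10MB limit" };
  }
  return { ok: true };
}

// Builds the storage key in the uploads/{user_id}/{filename} layout.
function storagePath(userId: string, filename: string): string {
  return `uploads/${userId}/${filename}`;
}
```

Validating before writing to Storage means a rejected file never gets a document record stuck in 'processing'.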

2. Vectorization (/api/rag)

Triggered after upload:

// 1. Download file from Storage
// 2. Extract text (PDF uses pdf2json)
// 3. Split into chunks (1000 chars, 200 overlap)
// 4. Generate embeddings via OpenAI
// 5. Store in document_embeddings table
// 6. Update document status: 'completed'
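
Step 3 is the easiest part to illustrate in isolation. The sketch below assumes a plain sliding window over characters; `chunkText` is a hypothetical name, and the real splitter may also respect sentence or heading boundaries.

```typescript
// Hypothetical sketch of step 3: fixed-size character chunks with overlap.
function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  // Each chunk starts (chunkSize - chunkOverlap) characters after the previous one.
  const stride = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, the last 200 characters of each chunk are repeated at the start of the next one, so a sentence that straddles a boundary appears whole in at least one chunk.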

Configuration

All RAG settings are in src/config/rag.ts:

Chunking Settings

chunking: {
  chunkSize: 1000,      // Characters per chunk
  chunkOverlap: 200,    // Overlap between chunks
  maxChunks: 10000,     // Safety limit
  minChunkSize: 100,    // Minimum chunk size
}
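
To see how these values interact, the number of chunks for a given text length can be estimated. The sketch assumes a plain sliding window with a stride of chunkSize minus chunkOverlap; `estimateChunkCount` is a hypothetical helper, not part of src/config/rag.ts.

```typescript
// Hypothetical helper: estimated chunk count for a sliding-window splitter.
function estimateChunkCount(textLength: number, chunkSize = 1000, chunkOverlap = 200): number {
  if (textLength <= 0) return 0;
  const stride = chunkSize - chunkOverlap;
  if (stride <= 0) throw new Error("chunkOverlap must be smaller than chunkSize");
  // One chunk covers the first chunkSize characters; each further chunk adds `stride` new ones.
  return Math.max(0, Math.ceil((textLength - chunkSize) / stride)) + 1;
}
```

Under these assumptions a 2,000-character document yields 3 chunks, and maxChunks: 10000 is reached at roughly 8 million characters. A plain-text upload near the 10MB limit could therefore exceed the safety limit; how the pipeline behaves in that case is not specified here.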

Embedding Settings

embedding: {
  model: "text-embedding-3-small",
  dimensions: 1536,
  batchSize: 20,
}
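
Assuming batchSize is the number of chunks sent per embedding request (an interpretation, not confirmed by the config itself), the grouping could look like the sketch below. `toBatches` is a hypothetical helper.

```typescript
// Hypothetical helper: split items into groups of at most batchSize.
function toBatches<T>(items: T[], batchSize = 20): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

For example, 45 chunks would be sent as three requests of 20, 20, and 5 chunks.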

Retrieval Settings

retrieval: {
  defaultTopK: 5,           // Chunks to return
  matchThreshold: 0.0,      // Minimum similarity
  listQueryMultiplier: 4,   // More chunks for list queries
}

Retrieval Process

1. Query Classification

Queries are classified to optimize retrieval:

// List queries: "what are", "list", "enumerate"
// → Retrieves more chunks (topK × 4)

// Meta queries: "summarize", "overview"
// → Uses lower similarity threshold
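
A keyword-based classifier along these lines might look like the sketch below. The phrases come from the comments above; `classifyQuery`, `effectiveTopK`, the "standard" category, and checking list phrases before meta phrases are all assumptions.

```typescript
// Hypothetical sketch of keyword-based query classification.
type QueryKind = "list" | "meta" | "standard";

const LIST_PATTERNS = ["what are", "list", "enumerate"];
const META_PATTERNS = ["summarize", "overview"];

function classifyQuery(question: string): QueryKind {
  const q = question.toLowerCase();
  // Assumption: list phrases take precedence when both kinds match.
  if (LIST_PATTERNS.some((p) => q.includes(p))) return "list";
  if (META_PATTERNS.some((p) => q.includes(p))) return "meta";
  return "standard";
}

// List queries retrieve defaultTopK × listQueryMultiplier chunks.
function effectiveTopK(kind: QueryKind, defaultTopK = 5, listQueryMultiplier = 4): number {
  return kind === "list" ? defaultTopK * listQueryMultiplier : defaultTopK;
}
```

With the default settings a list query retrieves 20 chunks instead of 5. Plain substring matching is deliberately simple and will also match words such as "specialist", so a real classifier may want word-boundary checks.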

2. Similarity Search

// 1. Embed user question
// 2. Try RPC-based pgvector search
// 3. Fallback to JS-based similarity
// 4. Apply boosts for structured content
// 5. Diversify results across document sections
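
The JS-based fallback in step 3 is not shown in this document. A minimal sketch, assuming it scores chunks by cosine similarity and then applies matchThreshold and topK, could look like this. `EmbeddedChunk`, `cosineSimilarity`, and `rankChunks` are hypothetical names, and the boosts and diversification from steps 4 and 5 are left out.

```typescript
// Hypothetical sketch of an in-memory similarity fallback.
interface EmbeddedChunk {
  id: string;
  content: string;
  embedding: number[];
}

interface ScoredChunk extends EmbeddedChunk {
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk, drops those below the threshold, and keeps the best topK.
function rankChunks(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  topK = 5,
  matchThreshold = 0.0
): ScoredChunk[] {
  return chunks
    .map((c) => ({ ...c, similarity: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((c) => c.similarity >= matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```

This approach compares the query against every stored chunk, so its cost grows linearly with the number of chunks. That is presumably why the indexed RPC search is tried first.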

3. Context Injection

Retrieved chunks are injected into the system prompt:

const systemPrompt = `
You have access to the following context from uploaded documents.
Use this information to answer questions accurately.

Context from document:
${relevantChunks.map(c => c.content).join('\n\n---\n\n')}
`;

Chat API Integration

When documents are attached to a chat:

// In /api/chat/route.ts
if (documentIds && documentIds.length > 0) {
  const relevantChunks = await retrieveRelevantChunks(
    userQuestion,
    documentIds,
    RAG_CONFIG.retrieval.defaultTopK
  );

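  if (relevantChunks.length > 0) {
    // Join chunk text with the same separator used in the Context Injection template
    const contextText = relevantChunks.map(c => c.content).join('\n\n---\n\n');
    systemPrompt = buildRagSystemPrompt(contextText);
  }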
}
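
buildRagSystemPrompt is called above but not defined in this document. A minimal sketch consistent with the template in the Context Injection section might look like this. The function body and the `buildContextText` helper are assumptions.

```typescript
// Hypothetical sketch; mirrors the system prompt template from Context Injection.
interface RetrievedChunk {
  content: string;
}

// Joins chunk text with a visible separator so the model can tell chunks apart.
function buildContextText(chunks: RetrievedChunk[]): string {
  return chunks.map((c) => c.content).join("\n\n---\n\n");
}

function buildRagSystemPrompt(contextText: string): string {
  return [
    "You have access to the following context from uploaded documents.",
    "Use this information to answer questions accurately.",
    "",
    "Context from document:",
    contextText,
  ].join("\n");
}
```

Keeping prompt assembly in one function means the chat route and any other caller produce the same prompt text.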

Supported File Types

Type      MIME Type        Extraction
PDF       application/pdf  pdf2json
Text      text/plain       Direct read
Markdown  text/markdown    Direct read
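
In code, the table above could be expressed as a lookup from MIME type to extraction method. The sketch only selects the method and does not call pdf2json. `ExtractionMethod` and `extractionMethodFor` are hypothetical names.

```typescript
// Hypothetical lookup mirroring the supported file types table.
type ExtractionMethod = "pdf2json" | "direct-read";

const EXTRACTION_BY_MIME = new Map<string, ExtractionMethod>([
  ["application/pdf", "pdf2json"],
  ["text/plain", "direct-read"],
  ["text/markdown", "direct-read"],
]);

// Returns undefined for unsupported types so the caller can reject the file.
function extractionMethodFor(mimeType: string): ExtractionMethod | undefined {
  return EXTRACTION_BY_MIME.get(mimeType);
}
```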

Best Practices

Document Preparation

  1. Use clear headings - Helps chunk boundaries
  2. Keep paragraphs focused - Better semantic matching
  3. Include key terms - Improves retrieval accuracy

Query Tips

  1. Be specific - "What are the 5 key features?" vs "features?"
  2. Reference the document - "According to the document..."
  3. Ask follow-ups - Build on previous context

Troubleshooting

No Results Found

  • Check document status is 'completed'
  • Verify embeddings exist in database
  • Lower matchThreshold in src/config/rag.ts if it has been raised above the default of 0.0

Irrelevant Results

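  • Increase defaultTopK if relevant chunks are being cut off, or raise matchThreshold to drop weak matches
  • Enable result diversification across document sections
  • Check the chunkSize and chunkOverlap settings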

Slow Retrieval

  • Ensure pgvector index exists
  • Reduce batchSize if embedding generation during vectorization is the slow step
  • Confirm the RPC-based pgvector search is being used rather than the JS-based fallback