My approach is to stuff as many documents as possible directly into the context. The context windows of frontier models are large enough for my use case of ~20-40 documents: 128K tokens for gpt-4o, 200K for o1/o3, and 1M for Gemini.
When stuffing them all into one query isn't possible, I split the documents across multiple queries and aggregate the answers, roughly as in the sketch below.
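A minimal sketch of that split-and-aggregate fallback, assuming the OpenAI Python SDK; the model name, batch size, and prompts are placeholders, and the document list is hypothetical:

```python
# Sketch: stuff a batch of documents into one prompt; if there are too many,
# split them across several queries and merge the partial answers.
# Assumes the OpenAI Python SDK (v1); `docs` is a hypothetical list of document strings.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # any large-context chat model

def ask(question: str, docs: list[str]) -> str:
    """Ask a question over a batch of documents stuffed directly into the context."""
    context = "\n\n---\n\n".join(docs)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided documents."},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

def ask_split(question: str, docs: list[str], batch_size: int = 10) -> str:
    """Fallback: query each batch of documents separately, then aggregate the answers."""
    partials = [
        ask(question, docs[i:i + batch_size])
        for i in range(0, len(docs), batch_size)
    ]
    merge_prompt = "Combine these partial answers into one answer:\n\n" + "\n\n".join(partials)
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": merge_prompt}],
    )
    return resp.choices[0].message.content
```

For example, `ask_split("Based on these documents, infer the industries where X technique may be useful", docs)` issues one query per batch and a final aggregation query.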
I've tried RAG, but matching query embeddings to chunk embeddings isn't as straightforward as it sounds: I noticed that relevant content was missed even with my modest number of documents. Semantic matching on query embeddings is one level above dumb keyword matching but one level below direct queries to LLMs.
Direct LLM queries seem to perform the best, especially when some intermediate understanding is required (like "Based on these documents, infer the industries where X technique may be useful"). That's not possible with simple embedding search unless some of the documents specifically use the umbrella word "industry" or its close synonyms.
Embedding search can probably be improved, e.g. by generating a synthetic answer and matching that answer's embedding to chunk embeddings, but I haven't tried such techniques.
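A rough sketch of that idea (often called HyDE, hypothetical document embeddings), assuming the OpenAI Python SDK and numpy; the model names and the `chunks` list are placeholders, not anything I've actually run:

```python
# Sketch of HyDE-style retrieval: embed a synthetic answer instead of the raw query,
# then match it against chunk embeddings by cosine similarity.
# Assumes the OpenAI Python SDK (v1) and numpy; `chunks` is a hypothetical list of text chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize for cosine similarity

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    # Generate a short synthetic answer to the query...
    synthetic = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Write a short passage that would answer: {query}"}],
    ).choices[0].message.content
    # ...and match its embedding (rather than the raw query's) to the chunk embeddings.
    q_vec = embed([synthetic])
    chunk_vecs = embed(chunks)
    scores = (chunk_vecs @ q_vec.T).ravel()  # cosine similarity, since vectors are normalized
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]
```

The retrieved chunks would then be passed to the LLM as context, as in plain RAG; the only change is which text gets embedded on the query side.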
https://products.aspose.net/pdf/chat-gpt/
At a glance, I see it supports some advanced features: automatic detection of multiple languages, and batching of requests to reduce LLM API call frequency and lower operational costs.
Just upload your documents to a OneDrive, SharePoint, or Teams site that you have access to and start asking questions.