Motivation: It's often difficult to make the vast amount of existing information accessible to new users: a question may have been asked a couple of times before, and we may even have a good guide covering it, *but* a new user may not be able to match their "point of confusion" to an actual question, a specific topic, etc. My idea was that a small LLM that has "read" all of the existing material might be extremely efficient at mapping a "plain text human question" to an actual starting point in the documentation.
Question: LLMs are not my field, so I'm kind of lost in the vast number of different tools, projects, repos, ... that seem to grow every day. So, what's your recommendation for something I should check out?
---
Additional information:
- All "documents" to be processed are either markdown files or jupyter notebooks (could clean/convert them to markdown)
- Since the documentations changes/grows (e.g., new tutorials, new functionality) I would like to "retrain" the model quite often, to ensure it's always up to date. This step should be easy and not to consuming (time/money).
- Anything that could be integrated into some kind of automated process, triggered on a documentation update, would be cool.
- Having the possibility to not only answer questions but also return links to relevant pages in the documentation would be amazing.
- The model does not have to be "good" at anything (besides answering questions / knowledge retrieval). However, if it could handle simple coding related questions with common tools, that would be a big plus (e.g., "How can I extract X into a pandas dataframe and only show Y?" => we may have a `to_pandas` function, but then "showing Y" would require a simple pandas command which the tool could also suggest).
The OpenAI docs have a list of things that fine-tuning CAN help with, and answering questions about documentation is notably absent: https://platform.openai.com/docs/guides/fine-tuning/common-u...
There are two methods that are a better bet for what you're trying to do here.
The first, if your documentation is less than maybe 50 pages of text, is to dump the entire documentation into the prompt each time. This used to be prohibitively expensive but all of the major model providers have prompt caching now which can make this option a lot cheaper.
Google Gemini can go up to 2 million tokens, so you can fit a whole lot of documentation in there.
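A minimal sketch of that dump-everything approach, assuming the `openai` Python package, markdown files under a hypothetical `docs/` directory, and a model/prompt of my own choosing (none of this is prescriptive):

```python
# Sketch: read every markdown file under docs/ and put it all in the prompt.
# Major providers cache long, repeated prompt prefixes, so repeated questions
# against the same documentation blob are billed at a reduced rate.
from pathlib import Path

from openai import OpenAI

docs = "\n\n".join(p.read_text() for p in sorted(Path("docs").glob("**/*.md")))

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any model with a large enough context window
    messages=[
        {"role": "system",
         "content": "Answer questions using only this documentation:\n\n" + docs},
        {"role": "user",
         "content": "How can I extract X into a pandas dataframe and only show Y?"},
    ],
)
print(response.choices[0].message.content)
```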
The other, more likely option, is RAG - Retrieval Augmented Generation. That's the trick where you run searches against your documentation for pages and examples that might help answer the user's question and stuff those into the prompt.
RAG is easy to build an initial demo of and challenging to build a really GOOD implementation of, but it's a well-trodden path at this point, so there are plenty of resources out there to help.
Here are my own notes on RAG so far: https://simonwillison.net/tags/rag/
You can quickly prototype how well these options would work using OpenAI GPTs, Claude Projects and Google's NotebookLM - each of those will let you dump in a bunch of documentation and then ask questions about it. Claude Projects includes all of that source material in every prompt (so it has a strict length limit) - GPTs and NotebookLM both implement some form of RAG, though the details are frustratingly undocumented.
Basically you'll use any LLM and a vector DB of your choice (I like ChromaDB to date). Write a tool that walks your source documents and chunks them. Submit each chunk to your LLM with a prompt that asks it to come up with search/retrieval questions for that chunk. Store the document and the questions in ChromaDB, cross-referencing each question to the document source (you can add the filename/path as metadata to the question) and to the relevant chunk (by its ID).
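A rough sketch of that indexing pass, assuming the `litellm` and `chromadb` packages with Chroma's default embedding function; the chunking heuristic, prompt wording and ID scheme are my own placeholders:

```python
# Sketch of the indexing pass: chunk the markdown docs, ask an LLM to write
# retrieval questions per chunk, and store questions + chunks in ChromaDB,
# with each question's metadata pointing back to the source file and chunk ID.
import hashlib
from pathlib import Path

import chromadb
from litellm import completion

db = chromadb.PersistentClient(path="./index")
chunks_col = db.get_or_create_collection("chunks")
questions_col = db.get_or_create_collection("questions")

for path in Path("docs").glob("**/*.md"):
    # Naive chunking on second-level headings; tune this for your docs.
    chunks = [c.strip() for c in path.read_text().split("\n## ") if c.strip()]
    for i, chunk in enumerate(chunks):
        chunk_id = f"{path}#{i}"
        chunks_col.upsert(
            ids=[chunk_id],
            documents=[chunk],
            metadatas=[{"source": str(path),
                        "sha256": hashlib.sha256(chunk.encode()).hexdigest()}],
        )
        resp = completion(
            model="gpt-4o-mini",  # any litellm-supported model
            messages=[{"role": "user", "content":
                       "Write 3 short questions a user might ask that this "
                       "documentation chunk answers, one per line:\n\n" + chunk}],
        )
        questions = [q.strip() for q in
                     resp.choices[0].message.content.splitlines() if q.strip()]
        questions_col.upsert(
            ids=[f"{chunk_id}:q{j}" for j in range(len(questions))],
            documents=questions,
            metadatas=[{"source": str(path), "chunk_id": chunk_id}
                       for _ in questions],
        )
```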
Run this tool whenever your docs change - you can automate this. Being intelligent about detecting new/changed content and how you chunk/generate questions can save you time and money and be a place to optimize.
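For the new/changed-content detection, one simple approach (my assumption, building on the sketch above, which stores a `sha256` field in each chunk's metadata) is to hash each chunk and skip the embedding and LLM work when the hash is unchanged:

```python
# Skip the LLM and embedding work for chunks whose content hasn't changed,
# by comparing a content hash against the one stored at index time.
import hashlib

def chunk_unchanged(chunks_col, chunk_id: str, chunk: str) -> bool:
    digest = hashlib.sha256(chunk.encode()).hexdigest()
    existing = chunks_col.get(ids=[chunk_id])
    return bool(existing["ids"]) and existing["metadatas"][0].get("sha256") == digest

# In the indexing loop above:
#     if chunk_unchanged(chunks_col, chunk_id, chunk):
#         continue
```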
To use it, accept user input, run that input as a text query against your vector DB, and submit both the results (with filenames and relevant chunks) and the user's query to an LLM with a prompt designed to elicit the kind of response you want based on the input and the relevant chunks. Show the response to the user. Loop if you want.
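And the query side, continuing the same sketch (the prompt wording, model and the 5-result cutoff are again placeholders):

```python
# Sketch of the query side: match the user's question against the stored
# retrieval questions, pull the referenced chunks, and hand both the chunks
# (with their source paths) and the question to the LLM.
import chromadb
from litellm import completion

db = chromadb.PersistentClient(path="./index")
chunks_col = db.get_collection("chunks")
questions_col = db.get_collection("questions")

def answer(user_question: str) -> str:
    hits = questions_col.query(query_texts=[user_question], n_results=5)
    chunk_ids = list({m["chunk_id"] for m in hits["metadatas"][0]})
    chunks = chunks_col.get(ids=chunk_ids)
    context = "\n\n".join(
        f"[source: {meta['source']}]\n{doc}"
        for doc, meta in zip(chunks["documents"], chunks["metadatas"])
    )
    resp = completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content":
             "Answer using the documentation excerpts below, and cite the "
             "source paths of the excerpts you used.\n\n" + context},
            {"role": "user", "content": user_question},
        ],
    )
    return resp.choices[0].message.content

print(answer("How can I extract X into a pandas dataframe and only show Y?"))
```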
You can build most of this with nothing more than the `litellm`, `langchain` and `huggingface` libraries. You'll be surprised how far you can get with such a dumb setup.
Yes, this is basic RAG [1]. That's how you do it without getting overwhelmed by all the tooling/libraries out there.
1: https://en.m.wikipedia.org/wiki/Retrieval-augmented_generati...