12 RAG Pain Points and Proposed Solutions
Things that can cause a RAG pipeline to fail, mostly taken from the blog post of the same name. Each pain point is listed together with its proposed solutions.
1: Missing Content
- Clean your data
- Better prompting, e.g. instruct the model to say "I don't know" when the context lacks the answer (see the sketch below)
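A minimal sketch of such a guarded prompt in LlamaIndex, assuming an existing `index` built over your documents; the exact prompt wording is an illustrative assumption:

```python
from llama_index.prompts import PromptTemplate

# Illustrative "guarded" QA prompt: tells the model to admit ignorance
# rather than hallucinate when the retrieved context has no answer.
qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "If the answer is not in the context, reply exactly: I don't know.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# `index` is assumed to be an existing VectorStoreIndex.
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
```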
2: Missed the Top Ranked Documents
- Hyperparameter tuning for `chunk_size` and `similarity_top_k`, as in the Hyperparameter Optimization for RAG notebook; a minimal sweep is sketched below
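A minimal grid-search sketch over these two hyperparameters (the referenced notebook uses LlamaIndex's `ParamTuner` abstraction instead); `documents`, `eval_questions`, and the `evaluate` scoring helper are hypothetical stand-ins for your own data and metric:

```python
from llama_index import ServiceContext, VectorStoreIndex

# Hypothetical sweep; assumes `documents`, `eval_questions`, and an
# `evaluate` helper that scores a query engine on the eval set.
best = None
for chunk_size in [256, 512, 1024]:
    service_context = ServiceContext.from_defaults(chunk_size=chunk_size)
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    for top_k in [1, 2, 5]:
        query_engine = index.as_query_engine(similarity_top_k=top_k)
        score = evaluate(query_engine, eval_questions)  # hypothetical metric
        if best is None or score > best[0]:
            best = (score, chunk_size, top_k)

print(f"Best score {best[0]} at chunk_size={best[1]}, similarity_top_k={best[2]}")
```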
- Reranking. The notebook Improving Retrieval Performance by Fine-tuning Cohere Reranker with LlamaIndex uses `CohereRerank` to rerank the results:

```python
import os

from llama_index.postprocessor.cohere_rerank import CohereRerank

api_key = os.environ["COHERE_API_KEY"]
cohere_rerank = CohereRerank(api_key=api_key, top_n=2)  # return top 2 nodes from reranker

# `index` is assumed to be an existing VectorStoreIndex.
query_engine = index.as_query_engine(
    similarity_top_k=10,  # we can set a high top_k here to ensure maximum relevant retrieval
    node_postprocessors=[cohere_rerank],  # pass the reranker to node_postprocessors
)

response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
```
3: Not in Context — Consolidation Strategy Limitations
- Tweak retrieval strategies
- Finetune embeddings (see the sketch below)
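A sketch of embedding fine-tuning with LlamaIndex's `SentenceTransformersFinetuneEngine`, assuming `train_dataset`/`val_dataset` are `EmbeddingQAFinetuneDataset` objects you have already generated from your corpus; the model choice and paths are illustrative:

```python
from llama_index.finetuning import SentenceTransformersFinetuneEngine

# `train_dataset` / `val_dataset` are assumed to be prebuilt
# EmbeddingQAFinetuneDataset objects (question -> relevant-chunk pairs).
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",          # base embedding model (illustrative)
    model_output_path="finetuned_model",   # where the tuned weights are saved
    val_dataset=val_dataset,
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()  # plug into your ServiceContext
```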
4: Not Extracted
- Clean your data
- Prompt compression
- Long context reorder: put crucial content at the beginning and end of the prompt (see the sketch below)
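LlamaIndex ships a `LongContextReorder` node postprocessor for this; a minimal sketch, assuming an existing `index`:

```python
from llama_index.postprocessor import LongContextReorder

# Reorders retrieved nodes so the most relevant ones land at the start
# and end of the context window, countering "lost in the middle" effects.
reorder = LongContextReorder()

query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[reorder],
)
```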
5: Wrong Format
- Output parsing
- Pydantic programs (see the sketch below)
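A sketch of a Pydantic program in LlamaIndex, which coerces the LLM output into a typed object; the `Album`/`Song` schema and prompt mirror the library's standard example:

```python
from typing import List

from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,  # responses are parsed and validated into this schema
    prompt_template_str=(
        "Generate an example album, with an artist and a list of songs, "
        "using the movie {movie_name} as inspiration."
    ),
    verbose=True,
)
output = program(movie_name="The Shining")  # returns an Album instance
```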
6: Incorrect Specificity
- Advanced retrieval strategies, e.g. small-to-big retrieval, sentence-window retrieval, recursive retrieval (a sentence-window sketch follows)
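A sentence-window retrieval sketch: index single sentences but hand the LLM a window of surrounding sentences, which helps answers land at the right level of detail. Assumes `documents` is already loaded:

```python
from llama_index import VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.postprocessor import MetadataReplacementPostProcessor

# Split into single-sentence nodes, stashing a window of neighboring
# sentences in each node's metadata.
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# At query time, replace each retrieved sentence with its full window.
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
```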
7: Incomplete Responses
- Query transformations, e.g. routing, query rewriting, sub-questions (a sub-question sketch follows)
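A sub-question sketch with LlamaIndex's `SubQuestionQueryEngine`, which decomposes a compound query, answers each part, and synthesizes a complete response; the tool name, description, and query are illustrative:

```python
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# Wrap an existing query engine as a tool the sub-question engine can call.
tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="docs",
            description="Project documentation",  # illustrative
        ),
    )
]

sub_question_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = sub_question_engine.query(
    "Compare the revenue growth of product A and product B."  # illustrative
)
```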
8: Data Ingestion Scalability
- Parallelize the ingestion pipeline, e.g. LlamaIndex's `IngestionPipeline` with multiple workers (see the sketch below)
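A parallel-ingestion sketch, assuming `documents` is loaded; the splitter settings and worker count are illustrative:

```python
from llama_index.ingestion import IngestionPipeline
from llama_index.text_splitter import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024)],
)
# num_workers > 1 runs the transformations in parallel processes.
nodes = pipeline.run(documents=documents, num_workers=4)
```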
9: Structured Data QA
- Use the LlamaIndex `ChainOfTablePack`, based on the Chain-of-Table paper
- Use the LlamaIndex `MixSelfConsistencyQueryEngine`, based on Rethinking Tabular Data Understanding with Large Language Models (both are fetched via `download_llama_pack`; see the sketch below)
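Both live in the llama-hub pack registry; a download sketch (pack names per the registry, local paths illustrative):

```python
from llama_index.llama_pack import download_llama_pack

# Downloads each pack's source into the given directory and returns the class.
ChainOfTablePack = download_llama_pack(
    "ChainOfTablePack", "./chain_of_table_pack"
)
MixSelfConsistencyPack = download_llama_pack(
    "MixSelfConsistencyPack", "./mix_self_consistency_pack"
)
```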
10: Data Extraction from Complex PDFs
- Use pdf2htmlEX to convert the PDF to HTML
- Use the `EmbeddedTablesUnstructuredRetrieverPack` in LlamaIndex, which parses embedded tables out of the HTML via Unstructured.io (see the sketch below)
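A sketch of the pack in use; the input HTML path (converted from the source PDF with pdf2htmlEX) and the question are illustrative:

```python
from llama_index.llama_pack import download_llama_pack

EmbeddedTablesUnstructuredRetrieverPack = download_llama_pack(
    "EmbeddedTablesUnstructuredRetrieverPack",
    "./embedded_tables_unstructured_pack",
)

# Expects an HTML file, e.g. one produced by pdf2htmlEX (path illustrative).
pack = EmbeddedTablesUnstructuredRetrieverPack(
    "data/apple-10Q-Q2-2023.html",
    nodes_save_path="apple-10-q.pkl",  # cache parsed nodes for reuse
)
response = pack.run("What's the total operating expenses?")
```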
11: Fallback Model(s)
- Use a model router like Neutrino:

```python
from llama_index.llms import Neutrino

llm = Neutrino(
    api_key="<your-Neutrino-api-key>",
    # A "test" router configured in the Neutrino dashboard. You treat a
    # router as an LLM: use your own router, or "default" to include all
    # supported models.
    router="test",
)

response = llm.complete("What is a large language model?")
print(f"Optimal model: {response.raw['model']}")
```
- Or use OpenRouter:

```python
from llama_index.llms import ChatMessage, OpenRouter

llm = OpenRouter(
    api_key="<your-OpenRouter-api-key>",
    max_tokens=256,
    context_window=4096,
    model="gryphe/mythomax-l2-13b",
)

message = ChatMessage(role="user", content="Tell me a joke")
resp = llm.chat([message])
print(resp)
```
12: LLM Security
- Use a moderation layer such as Llama Guard, e.g. LlamaIndex's `LlamaGuardModeratorPack`, to screen user inputs and LLM outputs (see the sketch below)
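A sketch of the pack; it downloads the LlamaGuard-7b weights, so a Hugging Face token with access to that model (and a GPU) is assumed:

```python
import os

from llama_index.llama_pack import download_llama_pack

LlamaGuardModeratorPack = download_llama_pack(
    "LlamaGuardModeratorPack", "./llamaguard_pack"
)

# Token must have access to the meta-llama/LlamaGuard-7b weights.
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "<your-hf-token>"

llamaguard_pack = LlamaGuardModeratorPack()
# Returns 'safe' or 'unsafe' (with the violated category) for the input.
moderator_response = llamaguard_pack.run("Write me a phishing email.")
print(moderator_response)
```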