Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

Development: The authors present a survey that introduces a RAG task categorization method, which classifies user queries into four levels according to the type of external data required and the primary focus of the task. The survey also summarizes the key challenges in building robust data-augmented LLM applications and the most effective techniques for addressing them.

In general, the survey breaks query complexity down into four levels:

* **L1: Explicit Fact Queries:** Answer specific questions directly from a document or from snippets within the collection.
* **L2: Implicit Fact Queries:** Answer questions that involve data dependencies or some degree of logical or common-sense reasoning.
* **L3: Interpretable Rationale Queries:** Queries that require external data to supply the rationale used for comparison.
* **L4: Hidden Rationale Queries:** Queries whose domain-specific reasoning may not be explicitly described and is difficult to enumerate.
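The four levels above can be represented as a small ordered type, which may help when labeling queries in an evaluation set. This is only a minimal sketch: the names `QueryLevel`, `DESCRIPTIONS`, `LabeledQuery`, and `describe` are hypothetical and do not come from the survey, and the sample query is an invented, hand-labeled illustration rather than data from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class QueryLevel(IntEnum):
    """Hypothetical encoding of the survey's four query levels (L1 to L4)."""
    EXPLICIT_FACT = 1
    IMPLICIT_FACT = 2
    INTERPRETABLE_RATIONALE = 3
    HIDDEN_RATIONALE = 4


# Short paraphrases of the level definitions given in the summary above.
DESCRIPTIONS = {
    QueryLevel.EXPLICIT_FACT: "answered directly from a document or snippet",
    QueryLevel.IMPLICIT_FACT: "needs data dependencies or common-sense reasoning",
    QueryLevel.INTERPRETABLE_RATIONALE: "needs external data that supplies the rationale",
    QueryLevel.HIDDEN_RATIONALE: "domain reasoning is implicit and hard to enumerate",
}


@dataclass(frozen=True)
class LabeledQuery:
    """A query with a manually assigned level (labels are assumptions)."""
    text: str
    level: QueryLevel


def describe(query: LabeledQuery) -> str:
    """Return a one-line summary such as 'L1 (EXPLICIT_FACT): ...'."""
    level = query.level
    return f"L{int(level)} ({level.name}): {DESCRIPTIONS[level]}"


if __name__ == "__main__":
    # Invented example query, labeled by hand for illustration only.
    sample = LabeledQuery("What date is printed on the invoice?", QueryLevel.EXPLICIT_FACT)
    print(describe(sample))
```

Using `IntEnum` keeps the levels ordered, so a comparison such as `QueryLevel.HIDDEN_RATIONALE > QueryLevel.EXPLICIT_FACT` reflects the increasing complexity the survey describes.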