STRUCTRAG: BOOSTING KNOWLEDGE INTENSIVE REASONING OF LLMS VIA INFERENCE-TIME HYBRID INFORMATION STRUCTURIZATION

Developments: The authors introduce StructRAG, a framework that identifies the most suitable structure for the documents fed into the prompt. They report that structuring the documents this way improves results. The core components are described below.

As seen on the internet:

🛣️ Hybrid Structure Router
Analyzes the input question and determines the best format to structure the data before processing it. It can choose from:
- Tables for tasks with a lot of statistical data
- Graphs for tasks requiring long-chain reasoning, like tracing cause-effect relationships
- Catalogues for summarizing or organizing hierarchical information
- Chunks for simpler, one-off tasks
- Algorithms for more procedural tasks

Each type benefits from a specific structure. For example, using a table for a statistical comparison task is much more efficient than just presenting the raw text.
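To make the routing step concrete, here is a minimal sketch in Python. It is not the authors' router: the text only says the router picks a structure from the question and document information, so the keyword cues, the `Structure` enum, and the `route` function below are all hypothetical stand-ins for whatever model actually makes that decision.

```python
from enum import Enum


class Structure(Enum):
    TABLE = "table"
    GRAPH = "graph"
    CATALOGUE = "catalogue"
    ALGORITHM = "algorithm"
    CHUNK = "chunk"


# Hypothetical keyword cues, invented for illustration only.
# A real router would be a learned model, not a keyword list.
CUES = [
    (Structure.TABLE, ("compare", "average", "how many", "percentage", "statistics")),
    (Structure.GRAPH, ("cause", "lead to", "relationship", "chain")),
    (Structure.CATALOGUE, ("summarize", "outline", "overview")),
    (Structure.ALGORITHM, ("steps", "procedure", "how to")),
]


def route(question: str) -> Structure:
    """Return the first structure whose cue appears in the question."""
    q = question.lower()
    for structure, keywords in CUES:
        if any(keyword in q for keyword in keywords):
            return structure
    # Nothing matched: fall back to plain chunks for simple, one-off questions.
    return Structure.CHUNK
```

The fallback to chunks mirrors the list above, where chunks are the choice for simpler tasks that need no special structure.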

🧱 Scattered Knowledge Structurizer
Once the Hybrid Structure Router has selected the best knowledge format, StructRAG takes all the relevant information and organizes it into the appropriate structure:
- For tables, it arranges data into rows and columns (e.g., comparing company financials across years).
- For graphs, it forms entity-relationship triples like “Company A → revenue growth → 10%.”
- For chunks, it keeps the text but filters out the noise, giving the model only what’s relevant.
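The three target formats above can be sketched as plain Python data structures. This only illustrates what the outputs might look like: in StructRAG the structurizer is LLM-based, whereas here the facts are assumed to be already extracted. The `FACTS` sample data and the helper names `to_table`, `to_triples`, and `filter_chunks` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (company, year, revenue) facts, assumed already extracted.
FACTS = [
    ("Company A", "2022", 100.0),
    ("Company A", "2023", 110.0),
    ("Company B", "2022", 80.0),
    ("Company B", "2023", 84.0),
]


def to_table(facts):
    """Table: one row per company, one column per year."""
    years = sorted({year for _, year, _ in facts})
    by_company = defaultdict(dict)
    for company, year, revenue in facts:
        by_company[company][year] = revenue
    header = ["company"] + years
    rows = [[c] + [by_company[c].get(y) for y in years] for c in sorted(by_company)]
    return header, rows


def to_triples(facts):
    """Graph: (entity, relation, value) triples."""
    return [(company, f"revenue {year}", revenue) for company, year, revenue in facts]


def filter_chunks(chunks, keywords):
    """Chunks: keep only passages that mention at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [c for c in chunks if any(k in c.lower() for k in lowered)]
```

The same facts feed both the table and the triples, which is the point of the step: the content stays the same and only its organization changes to suit the task.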

🛠️ Structured Knowledge Utilizer
This component decomposes complex questions into sub-questions and extracts relevant information from the structured knowledge to answer each one. Then, it integrates those sub-answers into a final inference. For example, if you ask the model, “Which company has shown the best growth over the last 5 years?” the Utilizer breaks this down into sub-questions like:
- What was each company's growth percentage?
- How did their revenue change year-on-year?
- How do those numbers compare?

It retrieves precise data from the structured knowledge (e.g., the table) and uses it to construct an answer that’s more accurate and contextually aware.
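The growth example above can be worked through by hand once the data is in a table. The sketch below hard-codes that decomposition in Python. In StructRAG the sub-questions would be generated and answered by a model, so the `TABLE` figures and the functions `growth_pct` and `best_growth` are hypothetical illustrations.

```python
# Hypothetical structured knowledge: revenue per company per year.
TABLE = {
    "Company A": {2019: 100.0, 2024: 150.0},
    "Company B": {2019: 200.0, 2024: 260.0},
}


def growth_pct(series):
    """Sub-question 1: growth percentage from the earliest to the latest year."""
    first_year, last_year = min(series), max(series)
    return 100 * (series[last_year] - series[first_year]) / series[first_year]


def best_growth(table):
    """Sub-question 2: compare the per-company figures and pick the largest."""
    growth = {name: growth_pct(series) for name, series in table.items()}
    best = max(growth, key=growth.get)
    return best, growth


best, growth = best_growth(TABLE)
print(best, growth)
```

With these sample figures Company A grows by 50% and Company B by 30%, so the final answer is Company A. Each sub-answer is read directly from the table instead of being re-derived from raw text.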

In their own words:

StructRAG framework consists of three modules designed to sequentially identify the most suitable structure type, construct structured knowledge in that format, and utilize that structured knowledge to infer the final answer. First, recognizing that different structure types are suited for different tasks, a hybrid structure router is proposed to determine the most appropriate structure type based on the question and document information of the current task. Second, given that constructing structured knowledge is complex and requires strong comprehension and generation abilities, an LLM-based scattered knowledge structurizer is employed to convert raw documents into structured knowledge in the optimal type. Finally, since questions in knowledge-intensive reasoning tasks can often be complex composite problems that are challenging to solve directly, a structured knowledge utilizer is used to perform question decomposition and precise knowledge extraction for more accurate answer inference.
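The sequential flow described in this passage (route, then structurize, then utilize) can be sketched as a simple composition of three callables. The function `struct_rag`, the type aliases, and the stub modules are assumptions made for illustration; they do not reproduce the authors' implementation, where each module is backed by an LLM.

```python
from typing import Callable, Sequence

# Hypothetical interfaces for the three modules.
Router = Callable[[str, Sequence[str]], str]
Structurizer = Callable[[str, Sequence[str]], object]
Utilizer = Callable[[str, object], str]


def struct_rag(question: str, documents: Sequence[str],
               router: Router, structurizer: Structurizer,
               utilizer: Utilizer) -> str:
    """Run the three modules in sequence and return the final answer."""
    structure_type = router(question, documents)         # 1. pick a structure type
    knowledge = structurizer(structure_type, documents)  # 2. build structured knowledge
    return utilizer(question, knowledge)                 # 3. infer the answer


# Trivial stubs standing in for the LLM-backed modules.
def stub_router(question, documents):
    return "chunk"


def stub_structurizer(structure_type, documents):
    # Drop empty passages; a real structurizer would do far more.
    return [d for d in documents if d.strip()]


def stub_utilizer(question, knowledge):
    return f"{len(knowledge)} chunk(s) considered for: {question}"
```

Calling `struct_rag("Q?", ["a", " ", "b"], stub_router, stub_structurizer, stub_utilizer)` passes the router's choice to the structurizer and the structurizer's output to the utilizer, the same order the passage describes.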