Developments: The authors develop a powerful graph-RAG-based system called MediGraphRag for medical discussions, reaching SOTA results. To do so, they most notably introduce 'Triple Graph Construction'.
- Dynamically selected chunk partitioning.
- Entity extraction using structured output: a name, a type, and a context describing how the entity mattered.
- Triple Link construction.
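As a rough illustration of the structured entity output listed above, the sketch below parses a JSON array of name/type/context records. The `Entity` class, the `parse_entities` helper, the JSON layout, and the sample record are all assumptions made for illustration; the notes do not say how the authors serialize the output.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str     # entity name as it appears in the chunk
    type: str     # entity category (hypothetical label set)
    context: str  # short note on how the entity mattered in the chunk


def parse_entities(llm_output: str) -> list[Entity]:
    """Parse a JSON array of {name, type, context} objects into Entity records."""
    records = json.loads(llm_output)
    return [Entity(r["name"], r["type"], r["context"]) for r in records]


# Made-up model output, used only to exercise the parser.
sample = (
    '[{"name": "metformin", "type": "Medication", '
    '"context": "mentioned as a treatment option in this chunk"}]'
)
entities = parse_entities(sample)
```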
Graph Construction: 'Triple Link'. They create a Triple Link Repository Graph (RepoGraph), which involves the creation of several graph layers. They create three graphs:
- Medical papers/books
- Medical Dictionaries
- UMLS, which is made up of well-defined medical vocabularies and their relationships.
They link entities to those extracted from medical books/papers, denoted E2, based on their relevance, which they compute using cosine similarity. They then create each relation as a concise phrase based on the entity and its associated references.
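The relevance-based linking described above can be sketched as a thresholded cosine-similarity match. This is a minimal sketch under stated assumptions: entities are represented by embedding vectors, and the `link_entities` helper, the 0.8 threshold, and the toy two-dimensional vectors are hypothetical, not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def link_entities(source_entities, reference_entities, threshold=0.8):
    """Return (source, reference, score) for every pair scoring >= threshold."""
    links = []
    for s_name, s_vec in source_entities.items():
        for r_name, r_vec in reference_entities.items():
            score = cosine_similarity(s_vec, r_vec)
            if score >= threshold:
                links.append((s_name, r_name, score))
    return links


# Toy embeddings, invented for the example.
source = {"chest pain": [1.0, 0.0]}
e2 = {"angina": [0.9, 0.1], "fracture": [0.0, 1.0]}
links = link_entities(source, e2)
```

With these toy vectors only the "chest pain" to "angina" pair clears the threshold, so a single link is produced.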
Next, they tag the graphs, starting with some pre-defined tags, and iteratively generate more tag summaries for closely related graphs. These tags fall into multiple categories. Language models then tag the graphs using a prompt akin to the following:
Generate a structured summary from the provided medical content,
strictly adhering to the following categories... {Tag Name: Description of the tag}...
They then iterate using hierarchical clustering based on tag similarity to group the graphs and generate synthesized tag summaries, with each graph starting in its own group. In each iteration they consider the top 20% most similar pairs and merge a pair only if its pairwise similarity crosses a specified threshold.
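One round of the merge step above might look like the following sketch. It assumes each group carries a tag vector, uses cosine similarity between those vectors, and averages the two vectors when merging; that averaging is a stand-in for regenerating a synthesized tag summary with a language model. The `merge_round` name, the 0.9 threshold, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def merge_round(groups, top_fraction=0.2, threshold=0.9):
    """Merge the most similar group pairs once and return the new group list."""
    # Score every pair of groups, most similar first.
    pairs = sorted(
        ((cosine(groups[i]["vec"], groups[j]["vec"]), i, j)
         for i, j in combinations(range(len(groups)), 2)),
        reverse=True,
    )
    # Only the top fraction of pairs are merge candidates.
    keep = max(1, math.ceil(len(pairs) * top_fraction)) if pairs else 0
    used, result = set(), []
    for score, i, j in pairs[:keep]:
        if score < threshold or i in used or j in used:
            continue  # below threshold, or one side was already merged this round
        used.update((i, j))
        vec = [(x + y) / 2 for x, y in zip(groups[i]["vec"], groups[j]["vec"])]
        result.append({"members": groups[i]["members"] + groups[j]["members"], "vec": vec})
    # Groups that were not merged carry over unchanged.
    result.extend(g for k, g in enumerate(groups) if k not in used)
    return result


# Each graph starts in its own group (toy tag vectors, invented for the example).
groups = [
    {"members": ["A"], "vec": [1.0, 0.0]},
    {"members": ["B"], "vec": [0.99, 0.1]},
    {"members": ["C"], "vec": [0.0, 1.0]},
    {"members": ["D"], "vec": [0.1, 1.0]},
]
merged = merge_round(groups)
```

Calling `merge_round` repeatedly until the number of groups stops shrinking would give the iterative behavior the notes describe.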
Retrieval: To do retrieval, they generate tags based on the query Q and then, broadly, find similar tags and construct a prompt that includes the concatenated entity names and relationships:
Given QUESTION: {Q}. GRAPH: {e_i[na] + R_{e_i}^{e_j} + e_j[na], ...}.
Answer the user question: QUESTION using the graph: GRAPH...
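Filling in the retrieval prompt can be sketched as simple string assembly over (entity name, relation, entity name) triples. The `build_prompt` helper and the example triple are hypothetical; only the template wording comes from the notes above.

```python
def build_prompt(question, triples):
    """Assemble the retrieval prompt from a question and (head, relation, tail) triples."""
    # Each triple becomes "head + relation + tail"; triples are comma-separated.
    graph = ", ".join(f"{head} + {rel} + {tail}" for head, rel, tail in triples)
    return (
        f"Given QUESTION: {question}. GRAPH: {{{graph}}}. "
        "Answer the user question: QUESTION using the graph: GRAPH..."
    )


# Invented example triple, used only to show the output shape.
prompt = build_prompt(
    "Which drug relieves angina",
    [("nitroglycerin", "relieves", "angina")],
)
```

Keeping the triples as plain tuples makes it easy to swap in whatever retrieval step produces them.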