MedGraphRAG
Published on 2026-03-10
Summary
- MedGraphRAG is a graph-based RAG framework designed to improve the accuracy and safety of LLMs in the medical field.
- Uses a 3-tier hierarchical graph that links private user data to established medical textbooks and foundation dictionaries.
Key Points
- The three tiers are as follows:
- Top level (User-Provided)
- Medium level (Medical Papers and Books)
- Bottom level (Fundamental Medical Dictionary)
- The paper proposes a U-Retrieve strategy that combines top-down retrieval with bottom-up response generation to answer user queries. The aim is to limit how much the LLM generates on its own and to keep its answers grounded in the retrieved facts.
- Meta-Graphs: These are weighted graphs, one per data chunk, that are merged to construct the system's comprehensive global knowledge graph.
- The pre-defined medical categories used for tag generation are symptoms, patient history, body functions and medications.
- The paper suggests a hybrid static-semantic method to divide larger medical documents into manageable data chunks. It applies a technique called Proposition Transfer to the text, which transforms the raw paragraphs into standalone, self-contained statements. Each statement is then passed to an LLM that decides, zero-shot, whether it belongs to an existing data chunk or should start a new one.
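A minimal sketch of the chunk-assignment loop described in the last point, assuming the propositions have already been extracted. The `belongs` callable is a hypothetical stand-in for the zero-shot LLM decision, and `shares_long_word` is only a toy judge so the example runs without a model. Checking every existing chunk (rather than only the most recent one) is also an assumption.

```python
from typing import Callable, List


def assign_to_chunks(
    propositions: List[str],
    belongs: Callable[[str, List[str]], bool],
) -> List[List[str]]:
    """Group standalone propositions into chunks.

    `belongs(proposition, chunk)` is a hypothetical stand-in for the
    zero-shot LLM call; it returns True if the proposition fits the chunk.
    """
    chunks: List[List[str]] = []
    for prop in propositions:
        for chunk in chunks:
            if belongs(prop, chunk):
                chunk.append(prop)
                break
        else:
            # No existing chunk accepted the proposition: start a new one.
            chunks.append([prop])
    return chunks


def _long_words(text: str) -> set:
    # Lowercased words longer than four characters, punctuation stripped.
    cleaned = (w.strip(".,;:").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 4}


def shares_long_word(prop: str, chunk: List[str]) -> bool:
    """Toy judge (assumption): any shared long word counts as a match."""
    return bool(_long_words(prop) & _long_words(" ".join(chunk)))
```

In a real pipeline, `belongs` would wrap a prompt to the LLM; the loop itself stays the same.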
Notes
3-Tier Graph
- Top Level:
- Consists of specific, confidential user data.
- User-specific, and updated more frequently than the other tiers.
- The paper uses MIMIC-IV for this.
- Entities are extracted from documents and then linked to entities in the second tier based on relevance.
- Medium Level:
- Built from up-to-date, peer-reviewed medical textbooks and articles.
- Acts as a bridge.
- Updated at a medium frequency, typically on an annual basis.
- MedC-K dataset used by the paper.
- Bottom Level:
- Provides detailed explanations of medical terms and their semantic relationships.
- Most fundamental and authoritative data tier.
- UMLS dataset used for this layer.
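One way to picture the tier linking is the sketch below. The `Entity` class, the `link_down` helper, the token-overlap (Jaccard) relevance score, and the 0.2 threshold are all illustrative assumptions; the notes only say that entities are linked to the tier below "based on relevance".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    description: str
    tier: int  # 1 = user data, 2 = papers and books, 3 = dictionary
    links: List[str] = field(default_factory=list)  # linked lower-tier names


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a toy relevance score (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link_down(entity: Entity, lower_tier: List[Entity], threshold: float = 0.2) -> None:
    """Link `entity` to every lower-tier entity whose description is relevant."""
    for candidate in lower_tier:
        if jaccard(entity.description, candidate.description) >= threshold:
            entity.links.append(candidate.name)


user = Entity("chest pain", "pain in the chest on exertion", tier=1)
tier2 = [
    Entity("angina", "chest pain on exertion due to reduced blood flow", tier=2),
    Entity("eczema", "itchy inflamed skin rash", tier=2),
]
link_down(user, tier2)
print(user.links)
```

The same `link_down` call could then connect tier-2 entities to tier-3 dictionary entries, so every user entity can be traced back to a definition.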
U-Retrieve Strategy
- Top-Down Retrieval:
- Structure the user's query using predefined medical tags.
- Using these summarized tag descriptions, the system performs a top-down matching process, starting from the largest, highest-level global graphs, and progressively indexes down into the smaller, more specific graphs.
- This downward matching is repeated until the system reaches the foundational layer where it activates multiple relevant medical entities.
- All the pertinent information related to these activated medical entities is gathered. This includes the content of the entities, their top-k related entities, their relationships and any associated foundational medical knowledge.
- Bottom-Up Response Generation:
- Once the content is retrieved, the LLM is prompted to generate an initial, intermediate text response.
- This intermediate response is then carried upwards and combined with the summarized tag information of the next higher-level graph.
- This is repeated until the highest level of the graph structure is reached.
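The two phases above can be sketched as a walk down and back up a tree of graphs. This is a simplified illustration: `GraphNode`, the tag-overlap matching, and the `generate(context, previous)` callable (a hypothetical stand-in for the LLM prompt) are assumptions, and only one branch is followed per level.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GraphNode:
    tags: Set[str]  # summarized tag description of this graph
    children: List["GraphNode"] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)  # filled on leaf graphs


def u_retrieve(
    root: GraphNode,
    query_tags: Set[str],
    generate: Callable[[str, str], str],
) -> str:
    # Top-down: at each level follow the child whose tags best match the query.
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: len(c.tags & query_tags))
        path.append(node)

    # Initial response from the entities activated at the lowest level.
    answer = generate(", ".join(node.entities), "")

    # Bottom-up: refine the response with each higher-level tag summary.
    for ancestor in reversed(path[:-1]):
        answer = generate(" ".join(sorted(ancestor.tags)), answer)
    return answer


def echo_generate(context: str, previous: str) -> str:
    """Toy stand-in for the LLM: concatenates what it was given."""
    return f"{previous} | {context}" if previous else context
```

With a real model, `generate` would prompt the LLM with the retrieved context and the previous draft; the traversal order is the part this sketch is meant to show.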
Meta-Graphs
- After user documents are segmented into chunks, and entities are extracted and linked, the system creates a meta-graph for each individual data chunk.
- The system prompts an LLM to identify relationships between the extracted entities based on their names, descriptions, definitions and associated lower-level medical knowledge.
- The LLM establishes these relationships by identifying the source and target entities, and then assigning a closeness score. The resulting weighted graph is referred to as a meta-graph.
- These individual meta-graphs are then merged iteratively using the generated tags and a similarity calculation.
- This bottom-up merging process repeats until a single global graph remains.
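A rough sketch of the merging loop, assuming each meta-graph carries a tag set and a dictionary of weighted edges. Using Jaccard similarity over tags, merging the single most similar pair per round, and keeping the higher closeness score for duplicate edges are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]  # (source entity, target entity)


@dataclass
class MetaGraph:
    tags: FrozenSet[str]
    edges: Dict[Edge, float]  # edge -> closeness score


def tag_similarity(a: MetaGraph, b: MetaGraph) -> float:
    """Jaccard similarity of the two tag sets (assumed metric)."""
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def merge(a: MetaGraph, b: MetaGraph) -> MetaGraph:
    """Union of tags and edges; duplicate edges keep the higher score."""
    edges = dict(a.edges)
    for edge, score in b.edges.items():
        edges[edge] = max(score, edges.get(edge, 0.0))
    return MetaGraph(a.tags | b.tags, edges)


def build_global_graph(graphs: List[MetaGraph]) -> MetaGraph:
    """Repeatedly merge the most similar pair until one graph remains."""
    if not graphs:
        raise ValueError("need at least one meta-graph")
    graphs = list(graphs)
    while len(graphs) > 1:
        i, j = max(
            combinations(range(len(graphs)), 2),
            key=lambda p: tag_similarity(graphs[p[0]], graphs[p[1]]),
        )
        merged = merge(graphs[i], graphs[j])
        graphs = [g for k, g in enumerate(graphs) if k not in (i, j)] + [merged]
    return graphs[0]
```

Keeping the intermediate merged graphs (instead of discarding them as this sketch does) would give the layered hierarchy that U-Retrieve walks through.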
Expanding MedGraphRAG
- Temporal Knowledge Graphs
- Real-time physiological data streams
- Standardized clinical risk-scoring systems
- The base idea is to augment the 3-tier structure of MedGraphRAG with time-stamped edges and agentic reasoning loops.
- The static meta-graphs in MedGraphRAG can be evolved into [[Temporal KGs]] that model a patient's health trajectory as a sequence of state-dependent snapshots.
- In our expansion, we can have a patient-centred graph that is defined by specific temporal and causal relationships.
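To make the time-stamped edge idea concrete, here is a small sketch of one possible representation. `TemporalEdge`, its validity interval fields, and the `snapshot` helper are hypothetical names for this expansion idea, not part of MedGraphRAG.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class TemporalEdge:
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the edge is still valid


def snapshot(edges: List[TemporalEdge], at: datetime) -> List[TemporalEdge]:
    """Return the edges that hold at time `at` (half-open interval)."""
    return [
        e
        for e in edges
        if e.valid_from <= at and (e.valid_to is None or at < e.valid_to)
    ]


history = [
    TemporalEdge("patient", "takes", "metformin",
                 datetime(2024, 1, 1), datetime(2024, 6, 1)),
    TemporalEdge("patient", "takes", "insulin", datetime(2024, 6, 1)),
]
for edge in snapshot(history, datetime(2024, 3, 1)):
    print(edge.relation, edge.target)
```

Calling `snapshot` at successive timestamps yields the sequence of state-dependent snapshots that would make up a patient's trajectory.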