MedGraphRAG

Published on 2026-03-10

Summary

  • MedGraphRAG is a graph-based RAG framework designed to improve the accuracy and safety of LLMs in the medical field.
  • It uses a 3-tier hierarchical graph that links private user data to established medical textbooks and foundational medical dictionaries.

Key Points

  • The three tiers are as follows:
    1. Top level (User-Provided)
    2. Medium level (Medical Papers and Books)
    3. Bottom level (Fundamental Medical Dictionary)
  • The paper proposes a U-Retrieve strategy that combines top-down retrieval with bottom-up response generation to answer user queries. The aim is to keep the LLM from generating unsupported content and to keep its answers grounded in the retrieved facts.
  • Meta-Graphs: These are weighted graphs, one per data chunk, that are merged to construct the system's comprehensive global knowledge graph.
  • The pre-defined medical categories used for tag generation are symptoms, patient history, body functions and medications.
  • The paper suggests a hybrid static-semantic method to divide larger medical documents into manageable data chunks. It first applies a technique called Proposition Transfer to the text, which transforms the raw paragraphs into standalone, self-contained statements. Each statement is then fed to an LLM that uses a zero-shot approach to decide whether the statement belongs to an existing data chunk or requires starting a new chunk.
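
The chunking loop described in the last point can be sketched as follows. This is a minimal sketch, not the paper's implementation: `belongs_to_chunk` is a hypothetical stand-in for the zero-shot LLM call, `keyword_overlap` is a toy replacement for it, and `max_chunk_size` is an assumed static cap representing the "static" half of the hybrid method.

```python
from typing import Callable, List


def chunk_propositions(
    propositions: List[str],
    belongs_to_chunk: Callable[[List[str], str], bool],
    max_chunk_size: int = 5,  # assumed static cap, not a value from the paper
) -> List[List[str]]:
    """Group standalone propositions into chunks, in document order."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for prop in propositions:
        # Start a new chunk when the cap is hit or the judge says "new topic".
        if current and (
            len(current) >= max_chunk_size or not belongs_to_chunk(current, prop)
        ):
            chunks.append(current)
            current = []
        current.append(prop)
    if current:
        chunks.append(current)
    return chunks


def keyword_overlap(chunk: List[str], prop: str) -> bool:
    """Toy stand-in for the LLM decision: any shared word longer than 4 chars."""
    def words(text: str) -> set:
        return {w.lower().strip(".,") for w in text.split() if len(w) > 4}

    return bool(words(" ".join(chunk)) & words(prop))


propositions = [
    "Metformin lowers blood glucose.",
    "Metformin can cause nausea.",
    "Hypertension raises stroke risk.",
]
chunks = chunk_propositions(propositions, keyword_overlap)
```

In a real pipeline the callable would wrap an LLM prompt; keeping it as a parameter makes the loop testable without a model.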

Notes

3-Tier Graph

  1. Top Level:
    • Consists of specific, confidential user data.
    • User-specific, and updated more frequently than the other two tiers.
    • The paper uses MIMIC-IV for this.
    • Entities are extracted from the documents and then linked to entities in the second tier based on relevance.
  2. Medium Level:
    • Built from up-to-date, peer-reviewed medical textbooks and articles.
    • Acts as a bridge between the user data in the top tier and the dictionary in the bottom tier.
    • Updated at a medium frequency, typically on an annual basis.
    • The paper uses the MedC-K dataset for this.
  3. Bottom Level:
    • Provides detailed explanations of medical terms and their semantic relationships.
    • Most fundamental and authoritative data tier.
    • The paper uses UMLS for this layer.
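
One way to picture the three tiers is as a single store of entities where each entity links only to the tier directly below it. The sketch below is an illustration under that assumption; the class names, the `grounding` helper, and the example entity names are all hypothetical and not taken from the paper or from any dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    name: str
    tier: int  # 1 = user data, 2 = papers and books, 3 = dictionary
    description: str = ""
    links_down: Set[str] = field(default_factory=set)  # names in the next tier


class TieredGraph:
    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def link(self, upper: str, lower: str) -> None:
        """Link an entity to one in the tier directly below it."""
        up, low = self.entities[upper], self.entities[lower]
        if low.tier != up.tier + 1:
            raise ValueError("links must go to the tier directly below")
        up.links_down.add(lower)

    def grounding(self, name: str) -> List[str]:
        """Follow links downward, breadth-first, and list supporting entities."""
        chain: List[str] = []
        frontier = [name]
        while frontier:
            current = frontier.pop(0)
            for child in sorted(self.entities[current].links_down):
                chain.append(child)
                frontier.append(child)
        return chain


graph = TieredGraph()
graph.add(Entity("patient: elevated blood glucose", tier=1))
graph.add(Entity("textbook: type 2 diabetes", tier=2))
graph.add(Entity("dictionary: diabetes mellitus", tier=3))
graph.link("patient: elevated blood glucose", "textbook: type 2 diabetes")
graph.link("textbook: type 2 diabetes", "dictionary: diabetes mellitus")
support = graph.grounding("patient: elevated blood glucose")
```

Here `support` lists the tier-2 and tier-3 entities that back a user-level entity, which is the role the lower tiers play as sources and definitions.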

U-Retrieve Strategy

  1. Top-Down Retrieval:
    • Structure the user's query using predefined medical tags.
    • Using these summarized tag descriptions, the system performs a top-down matching process, starting from the largest, highest-level global graphs, and progressively indexes down into the smaller, more specific graphs.
    • This downward matching is repeated until the system reaches the foundational layer where it activates multiple relevant medical entities.
    • All the pertinent information related to these activated medical entities is gathered. This includes the content of the entities, their top-k related entities, their relationships and any associated foundational medical knowledge.
  2. Bottom-Up Response Generation:
    • Once the content is retrieved, the LLM is prompted to generate an initial, intermediate text response.
    • This intermediate response is then carried upward and combined with the summarized tag information of the next higher-level graph.
    • This step is repeated until the highest level of the graph structure is reached.
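
The two passes above can be sketched as one function over a hierarchy of tagged graphs. This is a simplified sketch under several assumptions: Jaccard overlap stands in for the tag matching, the descent follows a single best branch instead of activating multiple entities, and `generate` is a hypothetical stand-in for the LLM call (`echo_llm` is a toy version that only concatenates text).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class GraphNode:
    tags: Set[str]  # summarized tag description of this graph
    children: List["GraphNode"] = field(default_factory=list)
    content: str = ""  # retrieved entity text, set on leaf graphs only


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def u_retrieve(
    root: GraphNode,
    query_tags: Set[str],
    generate: Callable[[str, str], str],  # (context, draft) -> refined draft
) -> str:
    # Top-down: descend from the global graph to the best-matching leaf.
    path = [root]
    while path[-1].children:
        best = max(path[-1].children, key=lambda c: jaccard(query_tags, c.tags))
        path.append(best)
    # Bottom-up: draft from the leaf content, then refine with each
    # ancestor's tag summary on the way back to the top.
    response = generate(path[-1].content, "")
    for node in reversed(path[:-1]):
        response = generate(" ".join(sorted(node.tags)), response)
    return response


def echo_llm(context: str, draft: str) -> str:
    """Toy stand-in for the LLM: append the new context to the draft."""
    return context if not draft else f"{draft} | {context}"


leaf_a = GraphNode(tags={"symptoms", "chest pain"}, content="angina entities")
leaf_b = GraphNode(tags={"medications", "metformin"}, content="metformin entities")
root = GraphNode(tags={"cardiology", "endocrinology"}, children=[leaf_a, leaf_b])
answer = u_retrieve(root, {"medications", "metformin"}, echo_llm)
```

The path recorded on the way down is what lets the response pick up broader context at each level on the way back up.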

Meta-Graphs

  • After user documents are segmented into chunks, and entities are extracted and linked, the system creates a meta-graph for each individual data chunk.
  • The system prompts an LLM to identify relationships between the extracted entities based on their names, descriptions, definitions and associated lower-level medical knowledge.
  • The LLM establishes these relationships by identifying the source and target entities and then assigning a closeness score. The resulting weighted graph is referred to as a meta-graph.
  • These individual meta-graphs are then merged iteratively using the generated tags and a similarity calculation.
  • This bottom-up merging process repeats until a single global graph remains.
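
The merging loop can be sketched as follows. The details here are assumptions made for illustration: Jaccard overlap of tag sets is used as the similarity measure, each round merges only the single most similar pair, and when two graphs share an edge the higher closeness score is kept. The notes above do not specify any of these choices.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source entity, target entity)


@dataclass
class MetaGraph:
    tags: Set[str]
    edges: Dict[Edge, float]  # closeness score per relationship


def tag_similarity(a: MetaGraph, b: MetaGraph) -> float:
    union = a.tags | b.tags
    return len(a.tags & b.tags) / len(union) if union else 0.0


def merge_pair(a: MetaGraph, b: MetaGraph) -> MetaGraph:
    edges = dict(a.edges)
    for edge, score in b.edges.items():
        # Assumption: keep the higher score when both graphs share an edge.
        edges[edge] = max(score, edges.get(edge, 0.0))
    return MetaGraph(tags=a.tags | b.tags, edges=edges)


def merge_all(graphs: List[MetaGraph]) -> MetaGraph:
    """Merge the most similar pair each round until one graph remains."""
    graphs = list(graphs)
    if not graphs:
        raise ValueError("need at least one meta-graph")
    while len(graphs) > 1:
        i, j = max(
            combinations(range(len(graphs)), 2),
            key=lambda pair: tag_similarity(graphs[pair[0]], graphs[pair[1]]),
        )
        merged = merge_pair(graphs[i], graphs[j])
        graphs = [g for k, g in enumerate(graphs) if k not in (i, j)] + [merged]
    return graphs[0]


g1 = MetaGraph({"symptoms", "fever"}, {("fever", "infection"): 0.8})
g2 = MetaGraph(
    {"symptoms", "cough"},
    {("cough", "infection"): 0.6, ("fever", "infection"): 0.5},
)
g3 = MetaGraph({"medications"}, {("metformin", "diabetes"): 0.9})
global_graph = merge_all([g1, g2, g3])
```

With this input, `g1` and `g2` merge first because they share the "symptoms" tag, and `g3` joins in the final round.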

Expanding MedGraphRAG

  • Temporal Knowledge Graphs

  • Real-time physiological data streams

  • Standardized clinical risk-scoring systems

  • The base idea is to augment the 3-tier structure of MedGraphRAG with time-stamped edges and agentic reasoning loops.

  • The static meta-graphs in MedGraphRAG can be evolved into [[Temporal KGs]] that can model a patient's health trajectory as a sequence of state-dependent snapshots.

  • In our expansion, we can have a patient-centred graph that is defined by specific temporal and causal relationships.
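
A minimal sketch of what time-stamped edges and snapshots could look like for this expansion. Everything here is hypothetical and belongs to the proposed extension, not to MedGraphRAG itself: the `TemporalEdge` fields, the half-open validity interval, and the example relations are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemporalEdge:
    source: str
    relation: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still valid

    def active_on(self, day: date) -> bool:
        # Half-open interval: valid_from <= day < valid_to.
        return self.valid_from <= day and (
            self.valid_to is None or day < self.valid_to
        )


def snapshot(edges: List[TemporalEdge], day: date) -> List[TemporalEdge]:
    """Return the edges that hold on the given day."""
    return [e for e in edges if e.active_on(day)]


def trajectory(
    edges: List[TemporalEdge], days: List[date]
) -> List[Tuple[date, List[TemporalEdge]]]:
    """Model the patient's course as a sequence of dated snapshots."""
    return [(day, snapshot(edges, day)) for day in days]


edges = [
    TemporalEdge("patient", "has_symptom", "fever", date(2024, 1, 1), date(2024, 1, 5)),
    TemporalEdge("patient", "takes", "paracetamol", date(2024, 1, 2)),
]
course = trajectory(edges, [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 6)])
```

Each snapshot is an ordinary static graph, so the existing meta-graph machinery could in principle be applied per snapshot, with causal relations added as further edge types.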