Multimodal GraphRAG

Graph-based retrieval-augmented generation connecting visual and textual content

Most RAG systems operate over text. Real-world knowledge is multimodal — documents contain figures, tables, diagrams, and images that carry information not captured in surrounding text. We are building graph-augmented retrieval pipelines that connect visual and textual content, enabling cross-modal queries and generation.

The graph structure links text passages to their associated figures and captions, captures the spatial and semantic relationships among multimodal elements, and serves as the retrieval backbone for a generation system that draws on both modalities.
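As an illustration of the kind of structure described above, the sketch below models a minimal typed graph linking text passages to figures. The node and relation names (`caption_of`, `references`) and the schema are hypothetical, chosen only to show how a cross-modal hop from a retrieved passage to a figure and its caption could work; they are not the project's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    modality: str  # "text" or "image" (illustrative labels)
    content: str   # passage text, or a path/URI for an image

@dataclass
class MultimodalGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def add_node(self, node_id, modality, content):
        self.nodes[node_id] = Node(node_id, modality, content)

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id, relation=None):
        # Follow edges in both directions so a text node can reach its
        # figures and a figure can reach its caption or citing passages.
        out = []
        for s, r, d in self.edges:
            if relation is not None and r != relation:
                continue
            if s == node_id:
                out.append(d)
            elif d == node_id:
                out.append(s)
        return out

g = MultimodalGraph()
g.add_node("p1", "text", "Figure 2 shows the attention map over image patches.")
g.add_node("f2", "image", "fig2.png")
g.add_node("c2", "text", "Figure 2: attention map for the query token.")
g.add_edge("c2", "caption_of", "f2")
g.add_edge("p1", "references", "f2")

# Cross-modal hop: from a retrieved passage to the figure it cites,
# then on to that figure's caption.
fig = g.neighbors("p1", relation="references")[0]
caption = g.neighbors(fig, relation="caption_of")[0]
```

A production system would replace the linear edge scan with an adjacency index and attach embeddings to nodes for similarity-based retrieval; the point here is only the typed cross-modal linkage.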

Team: Anirudh Chaitanya

External Collaborator: Pradeep Bolledu