Multimodal GraphRAG
Graph-based retrieval-augmented generation connecting visual and textual content
Most RAG systems operate over text alone, but real-world knowledge is multimodal: documents contain figures, tables, diagrams, and images that carry information not captured in the surrounding text. We are building graph-augmented retrieval pipelines that connect visual and textual content, enabling cross-modal queries and generation.
The graph structure links text passages to associated figures and captions, captures spatial and semantic relationships between multimodal elements, and serves as the retrieval backbone for a generation system that can draw on both modalities.
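The idea above can be sketched in miniature. The following is a hypothetical illustration, not the project's actual implementation: text passages and figures are nodes, edges carry a relation label (semantic "describes" links or spatial "same_page" adjacency), and cross-modal retrieval matches a text query against passage nodes and then hops along "describes" edges to reach the associated figures. All node contents and relation names here are invented for the example.

```python
# Minimal multimodal-graph sketch (assumed design, illustrative data only).
# Nodes: text passages and figures; edges: labeled relations between them.

nodes = {
    "p1":   {"kind": "text",   "content": "attention maps highlight key tokens"},
    "p2":   {"kind": "text",   "content": "training loss decreases over epochs"},
    "fig1": {"kind": "figure", "caption": "Attention heatmap for layer 3"},
    "fig2": {"kind": "figure", "caption": "Loss curve over 100 epochs"},
}

# Edges carry a relation label: semantic ("describes") or spatial ("same_page").
edges = [
    ("p1", "fig1", "describes"),
    ("p2", "fig2", "describes"),
    ("fig1", "fig2", "same_page"),
]

def retrieve_figures(query):
    """Cross-modal retrieval: match the query against text nodes,
    then follow 'describes' edges to reach linked figure nodes."""
    terms = query.lower().split()
    matched = {nid for nid, d in nodes.items()
               if d["kind"] == "text" and any(t in d["content"] for t in terms)}
    figures = set()
    for a, b, rel in edges:
        if rel != "describes":
            continue
        if a in matched and nodes[b]["kind"] == "figure":
            figures.add(b)
        if b in matched and nodes[a]["kind"] == "figure":
            figures.add(a)
    return sorted(figures)

print(retrieve_figures("attention"))  # -> ['fig1']
```

In a real pipeline the keyword match would be replaced by embedding similarity and the retrieved figures (with captions) would be passed alongside text passages to the generator, but the graph hop from text to figure is the core retrieval step.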
Team: Anirudh Chaitanya
External Collaborator: Pradeep Bolledu