May 10, 2026

MindMap-AI — Semantic Research Graph for Grounded PDF Intelligence

MindMap-AI converts academic PDFs into a Neo4j-backed semantic graph and answers questions using graph-selected evidence instead of ungrounded document summaries. The system links answers back to citations, graph relationships, and source PDF passages.

Role
Full-Stack Engineer & System Designer
Stack
FastAPI · Python · Neo4j · OpenAI API · Next.js · React · TypeScript · Zustand · React Force Graph · React PDF · TailwindCSS

Problem

Traditional document RAG systems often treat PDFs as flat text chunks and send large portions of the document directly to an LLM. This approach makes source tracing difficult, weakens relationship awareness between concepts, and increases the risk of unsupported or hallucinated answers. Research documents also contain interconnected entities, citations, claims, and semantic relationships that are difficult to explore through linear retrieval pipelines alone.

Solution

MindMap-AI converts research documents into a Neo4j-backed semantic graph through a multi-stage ingestion pipeline that includes parsing, semantic extraction, normalization, evidence linking, and graph construction. Instead of relying on raw-context prompting, the system retrieves graph-selected evidence through semantic traversal and evidence ranking before composing grounded answers. The frontend connects graph exploration, citations, source passages, and PDF highlighting into a unified research workflow.
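
The staged ingestion described above can be pictured as a chain of small functions that each read and extend a shared state object. The sketch below is only an illustration under stated assumptions: `PipelineState`, the stage names, and the toy heuristics (blank-line splitting, capitalized words as entity mentions) are hypothetical stand-ins, not the project's actual parser or extractor, and the graph-writing stage is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Data handed from one ingestion stage to the next (hypothetical shape)."""
    raw_text: str
    passages: list[str] = field(default_factory=list)
    # (surface form, index of the passage it was found in)
    mentions: list[tuple[str, int]] = field(default_factory=list)
    # normalized entity name -> indexes of passages that mention it
    entities: dict[str, list[int]] = field(default_factory=dict)


Stage = Callable[[PipelineState], PipelineState]


def parse(state: PipelineState) -> PipelineState:
    # Stand-in for PDF parsing: split the text into passages on blank lines.
    state.passages = [p.strip() for p in state.raw_text.split("\n\n") if p.strip()]
    return state


def extract(state: PipelineState) -> PipelineState:
    # Stand-in for semantic extraction: treat capitalized words as mentions.
    for index, passage in enumerate(state.passages):
        for token in passage.split():
            word = token.strip(".,;:()")
            if len(word) > 1 and word[:1].isupper():
                state.mentions.append((word, index))
    return state


def normalize_and_link(state: PipelineState) -> PipelineState:
    # Merge surface forms by lowercasing and keep the passages as evidence links.
    for surface, index in state.mentions:
        linked = state.entities.setdefault(surface.lower(), [])
        if index not in linked:
            linked.append(index)
    return state


def run_pipeline(raw_text: str, stages: list[Stage]) -> PipelineState:
    # Each stage only sees the shared state, so stages can be swapped independently.
    state = PipelineState(raw_text=raw_text)
    for stage in stages:
        state = stage(state)
    return state
```

Because every stage has the same signature, a different parser could replace `parse` without touching `extract` or `normalize_and_link`, which is the kind of decoupling the pipeline aims for.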

Decisions

  • Used Neo4j because semantic traversal and relationship querying are core system requirements
  • Used relation instances instead of simple graph edges to preserve evidence and provenance
  • Separated extraction from parsing so that changes in document structure affect only the parsing stage
  • Avoided full-document prompting to reduce hallucinated responses and improve traceability
  • Used Zustand for lightweight state management in the interaction-heavy frontend
  • Retrieved and ranked evidence before answer generation to improve answer grounding
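
To make the relation-instance decision concrete, here is a minimal sketch of one way such a model could look. Instead of a bare edge between two entities, each extracted claim becomes its own record that carries its source passage. The class names, the node labels (`Entity`, `RelationInstance`, `Evidence`), and the relationship types in the Cypher string are assumptions chosen for illustration, not the project's actual schema. The function only builds a query and its parameters; it does not connect to a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    doc_id: str
    page: int
    passage: str


@dataclass(frozen=True)
class RelationInstance:
    subject: str
    predicate: str
    obj: str
    evidence: Evidence

    @property
    def key(self) -> str:
        # One instance per (triple, source location): the same claim found on
        # two pages stays as two records, each with its own provenance.
        return "|".join(
            [self.subject, self.predicate, self.obj,
             self.evidence.doc_id, str(self.evidence.page)]
        )


def to_cypher(rel: RelationInstance) -> tuple[str, dict]:
    """Build a parameterized Cypher statement (hypothetical labels and types)."""
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (r:RelationInstance {key: $key}) "
        "SET r.predicate = $predicate "
        "MERGE (s)-[:SUBJECT_OF]->(r) "
        "MERGE (r)-[:HAS_OBJECT]->(o) "
        "MERGE (e:Evidence {doc_id: $doc_id, page: $page, passage: $passage}) "
        "MERGE (r)-[:SUPPORTED_BY]->(e)"
    )
    params = {
        "subject": rel.subject,
        "object": rel.obj,
        "key": rel.key,
        "predicate": rel.predicate,
        "doc_id": rel.evidence.doc_id,
        "page": rel.evidence.page,
        "passage": rel.evidence.passage,
    }
    return query, params
```

Modeling the relation as a node is what lets evidence hang off the claim itself: a plain edge between two entities would have no natural place to record which passage supports it when several documents assert the same thing.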

Architecture highlights

  • Separated PDF parsing, extraction, normalization, and graph writing into independent ingestion stages
  • Implemented relation-instance graph modeling to attach evidence and provenance directly to claims
  • Built graph-based retrieval before LLM synthesis to improve grounding and reduce hallucinations
  • Designed evidence-aware query orchestration with ranking and clustering stages
  • Connected semantic answers back to graph nodes, citations, and highlighted PDF passages
  • Created an interactive frontend workflow for graph exploration and semantic inspection
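
The retrieve-then-synthesize flow in the highlights above can be sketched as three steps: walk the graph outward from the entities named in the question, rank the passages attached to the reached entities, and only then build a prompt from the top-ranked evidence. Everything below is an illustrative assumption: the in-memory adjacency dict stands in for Neo4j, and the scoring rule (term overlap divided by one plus hop distance) is a placeholder, not the project's ranking method. The clustering stage is omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    passage_id: str
    entity: str  # graph node this passage is attached to
    text: str


def hop_distances(graph: dict[str, list[str]], seeds: list[str], max_hops: int) -> dict[str, int]:
    """Breadth-first traversal from the seed entities, bounded by max_hops."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_evidence(question: str, passages: list[Passage], dist: dict[str, int], top_k: int) -> list[Passage]:
    """Score reachable passages; closer entities and more shared terms rank higher."""
    terms = set(question.lower().split())
    scored = []
    for passage in passages:
        if passage.entity not in dist:
            continue  # unreachable from the question's entities: never sent to the LLM
        overlap = len(terms & set(passage.text.lower().split()))
        scored.append((overlap / (1 + dist[passage.entity]), passage))
    scored.sort(key=lambda pair: (-pair[0], pair[1].passage_id))
    return [passage for _, passage in scored[:top_k]]


def build_prompt(question: str, evidence: list[Passage]) -> str:
    """Compose a prompt that contains only the selected, numbered evidence."""
    lines = [f"[{i}] ({p.passage_id}) {p.text}" for i, p in enumerate(evidence, start=1)]
    header = "Answer using only the numbered evidence and cite it by number."
    return "\n".join([header, *lines, f"Question: {question}"])
```

Since each numbered line keeps its `passage_id`, a citation in the generated answer can be mapped back to a graph node and a source passage, which is what makes PDF highlighting possible downstream.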

Outcomes

  • Built an end-to-end PDF-to-graph semantic ingestion workflow
  • Implemented grounded semantic query answering with citation-aware retrieval
  • Created interactive graph exploration connected to PDF evidence navigation
  • Established backend, frontend, and end-to-end testing structure for semantic workflows
  • Designed a modular architecture that allows independent evolution of parsing and retrieval layers