Inspiration

The CodeGraph project was inspired by two challenges: 1) helping developers understand complex codebases visually, and 2) giving language models efficient code context to improve their programming assistance. Traditional code exploration tools often fail to capture the relationships between components, while LLMs run up against context-window limits when analyzing large codebases. We wanted a tool that makes code exploration more intuitive for humans while also giving LLMs a lightweight, structured graph representation that enables faster and more accurate code understanding and generation.

What it does

CodeGraph creates an interactive visual knowledge graph of Python code relationships, showing files, functions, and their connections. Most importantly, it provides:

  • A lightweight context graph for LLMs that dramatically improves their understanding of code structure without consuming excessive context window space (see the sketch after this list)
  • Semantic search capabilities allowing AI assistants to quickly find relevant code sections
  • MCP server integration enabling AI assistants like Claude and Cline to directly access and navigate code relationships (a server sketch appears later in this section)
  • Interactive web-based visualization for human developers
  • Database storage for persistent graph data
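
As a concrete example of that lightweight context graph, here is a minimal sketch, with toy node names and a made-up edge syntax, of how a NetworkX graph can be flattened into a terse, token-cheap string for an LLM prompt:

    import networkx as nx

    # Toy graph: nodes are files/functions, edges are typed relationships.
    g = nx.DiGraph()
    g.add_node("app.py", kind="file")
    g.add_node("app.main", kind="function")
    g.add_node("utils.load_config", kind="function")
    g.add_edge("app.py", "app.main", rel="defines")
    g.add_edge("app.main", "utils.load_config", rel="calls")

    def graph_to_context(graph: nx.DiGraph) -> str:
        """Serialize edges as terse 'src -rel-> dst' lines: far fewer
        tokens than pasting the underlying source files into the prompt."""
        return "\n".join(
            f"{u} -{d['rel']}-> {v}" for u, v, d in graph.edges(data=True)
        )

    print(graph_to_context(g))
    # app.py -defines-> app.main
    # app.main -calls-> utils.load_config

An edge list like this lets a model answer structural questions, such as what main depends on, without ever seeing the function bodies.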

By providing this structured graph representation to LLMs, CodeGraph enables them to:

  • Understand code architecture without needing the entire codebase in context
  • Navigate effectively between related code components
  • Maintain awareness of dependencies and relationships
  • Generate more contextually appropriate code solutions
  • Provide more accurate and helpful responses to code-related questions
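
The MCP integration mentioned earlier is what lets assistants pull this context on demand. Below is a minimal sketch of such a server using FastMCP, with a hypothetical neighbors tool and a toy in-memory graph; the real CodeGraph tool names and signatures may differ:

    import networkx as nx
    from fastmcp import FastMCP

    mcp = FastMCP("codegraph")

    # Toy stand-in for the persisted code graph.
    GRAPH = nx.DiGraph([("app.main", "utils.load_config")])

    @mcp.tool()
    def neighbors(symbol: str) -> list[str]:
        """List code entities that `symbol` points to in the graph."""
        return list(GRAPH.successors(symbol))

    if __name__ == "__main__":
        mcp.run()  # stdio transport, so clients like Claude can attach

Wired up this way, an assistant can call neighbors("app.main") instead of requesting entire files.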

How we built it

We built CodeGraph using a combination of:

  • Python for core functionality and parsing (combined with NetworkX and Pyvis in the sketch after this list)
  • NetworkX for graph data structures and algorithms
  • Pyvis for interactive web visualizations
  • MongoDB for metadata storage
  • Vector database (Qdrant) for embeddings and semantic search (see the sketch at the end of this section)
  • Sentence Transformers for generating vector embeddings
  • FastMCP for the Model Context Protocol server implementation
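
As a rough illustration of how the first three pieces fit together (a sketch under simplified assumptions, not the actual CodeGraph pipeline), Python's built-in ast module can extract function definitions and calls, NetworkX can store them as a typed graph, and Pyvis can render an interactive view:

    import ast
    import networkx as nx
    from pyvis.network import Network

    def build_code_graph(path: str, source: str) -> nx.DiGraph:
        """Record file->function 'defines' edges and function->callee 'calls' edges."""
        graph = nx.DiGraph()
        graph.add_node(path, kind="file")
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                graph.add_node(node.name, kind="function")
                graph.add_edge(path, node.name, rel="defines")
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        graph.add_edge(node.name, inner.func.id, rel="calls")
        return graph

    g = build_code_graph("example.py", "def f():\n    g()\n\ndef g():\n    pass\n")

    net = Network(directed=True)   # interactive HTML view for human developers
    net.from_nx(g)
    net.write_html("code_graph.html")

A real parser also has to resolve imports and attribute calls across files, which is where most of the parsing challenges described below came from.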

The architecture prioritizes an efficient, information-dense graph representation that gives LLMs maximum context while minimizing token usage.
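
For the semantic-search layer, here is a minimal sketch pairing Sentence Transformers with an in-memory Qdrant instance; the collection name, snippets, and query are invented for the demo:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
    client = QdrantClient(":memory:")                # in-memory mode for the demo

    client.create_collection(
        collection_name="code_chunks",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

    snippets = ["def load_config(path): ...", "def render_graph(g): ..."]
    client.upsert(
        collection_name="code_chunks",
        points=[
            PointStruct(id=i, vector=v.tolist(), payload={"code": s})
            for i, (v, s) in enumerate(zip(model.encode(snippets), snippets))
        ],
    )

    # Find code by describing what it does, not by matching keywords.
    hits = client.search(
        collection_name="code_chunks",
        query_vector=model.encode("read settings from a file").tolist(),
        limit=1,
    )
    print(hits[0].payload["code"])  # -> "def load_config(path): ..."

Because queries and code are embedded in the same vector space, a natural-language description can retrieve the matching function without any keyword overlap.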

Challenges we ran into

Throughout development, we faced several challenges:

  • Designing a graph representation that balances information density with token efficiency for LLMs
  • Determining the optimal level of abstraction for code relationships
  • Parsing complex Python code accurately across multiple files
  • Creating effective vector embeddings that capture code semantics
  • Implementing the MCP server to make the tool accessible to AI assistants
  • Optimizing performance for large projects that generate complex graphs

Accomplishments that we're proud of

We're particularly proud of:

  • Creating a lightweight graph representation that significantly improves LLM code understanding
  • Seamless integration with AI assistants through the Model Context Protocol
  • Semantic search that finds code from a plain-language description of its functionality
  • Efficient token usage that lets LLMs work with large codebases
  • Real-time updates that reflect changes to the codebase

What we learned

This project taught us valuable lessons about:

  • Optimizing knowledge representations for LLM context windows
  • Graph theory and visualization techniques
  • Vector embeddings and semantic search implementation
  • How LLMs understand and process code relationships
  • Building tools that enhance AI capabilities
  • MCP server development and integration with AI assistants

What's next for CodeGraph

  • Even more efficient graph representations for LLMs
  • Automated abstraction levels that adjust based on codebase size and complexity
  • Support for additional programming languages beyond Python
  • Integration with development environments (IDEs)
  • Advanced query capabilities allowing LLMs to navigate code relationships through natural language
  • Performance optimizations for handling enterprise-scale codebases
  • Expanded MCP capabilities for deeper AI integration

Throughout, the core focus will remain on providing rich, structured code context to LLMs in ways that minimize token usage while maximizing code understanding.
