Inspiration

The CodeGraph project was inspired by two challenges: 1) helping developers understand complex codebases visually, and 2) giving language models efficient code context to improve their programming assistance. Traditional code exploration tools often fail to capture the relationships between components, while LLMs run up against context-window limits when analyzing large codebases. We wanted a tool that makes code exploration more intuitive for humans while also giving LLMs a lightweight, structured graph representation that enables faster and more accurate code understanding and generation.

What it does

CodeGraph creates an interactive visual knowledge graph of Python code relationships, showing files, functions, and their connections. Most importantly, it provides:

  • A lightweight context graph for LLMs that dramatically improves their understanding of code structure without consuming excessive context window space (see the sketch after this list)
  • Semantic search capabilities allowing AI assistants to quickly find relevant code sections
  • MCP server integration enabling AI assistants like Claude and Cline to directly access and navigate code relationships (a server sketch appears later in this section)
  • Interactive web-based visualization for human developers
  • Database storage for persistent graph data
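
As a concrete example of that lightweight context graph, here is a minimal sketch, with toy node names and a made-up edge syntax, of how a NetworkX graph can be flattened into a terse, token-cheap string for an LLM prompt:

    import networkx as nx

    # Toy graph: nodes are files/functions, edges are typed relationships.
    g = nx.DiGraph()
    g.add_node("app.py", kind="file")
    g.add_node("app.main", kind="function")
    g.add_node("utils.load_config", kind="function")
    g.add_edge("app.py", "app.main", rel="defines")
    g.add_edge("app.main", "utils.load_config", rel="calls")

    def graph_to_context(graph: nx.DiGraph) -> str:
        """Serialize edges as terse 'src -rel-> dst' lines: far fewer
        tokens than pasting the underlying source files into the prompt."""
        return "\n".join(
            f"{u} -{d['rel']}-> {v}" for u, v, d in graph.edges(data=True)
        )

    print(graph_to_context(g))
    # app.py -defines-> app.main
    # app.main -calls-> utils.load_config

An edge list like this lets a model answer structural questions, such as what main depends on, without ever seeing the function bodies.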

By providing this structured graph representation to LLMs, CodeGraph enables them to:

  • Understand code architecture without needing the entire codebase in context
  • Navigate effectively between related code components
  • Maintain awareness of dependencies and relationships
  • Generate more contextually appropriate code solutions
  • Provide more accurate and helpful responses to code-related questions
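
The MCP integration mentioned earlier is what lets assistants pull this context on demand. Below is a minimal sketch of such a server using FastMCP, with a hypothetical neighbors tool and a toy in-memory graph; the real CodeGraph tool names and signatures may differ:

    import networkx as nx
    from fastmcp import FastMCP

    mcp = FastMCP("codegraph")

    # Toy stand-in for the persisted code graph.
    GRAPH = nx.DiGraph([("app.main", "utils.load_config")])

    @mcp.tool()
    def neighbors(symbol: str) -> list[str]:
        """List code entities that `symbol` points to in the graph."""
        return list(GRAPH.successors(symbol))

    if __name__ == "__main__":
        mcp.run()  # stdio transport, so clients like Claude can attach

Wired up this way, an assistant can call neighbors("app.main") instead of requesting entire files.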

How we built it

We built CodeGraph using a combination of:

  • Python for core functionality and parsing (combined with NetworkX and Pyvis in the sketch after this list)
  • NetworkX for graph data structures and algorithms
  • Pyvis for interactive web visualizations
  • MongoDB for metadata storage
  • Vector database (Qdrant) for embeddings and semantic search (see the sketch at the end of this section)
  • Sentence Transformers for generating vector embeddings
  • FastMCP for the Model Context Protocol server implementation
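
As a rough illustration of how the first three pieces fit together (a sketch under simplified assumptions, not the actual CodeGraph pipeline), Python's built-in ast module can extract function definitions and calls, NetworkX can store them as a typed graph, and Pyvis can render an interactive view:

    import ast
    import networkx as nx
    from pyvis.network import Network

    def build_code_graph(path: str, source: str) -> nx.DiGraph:
        """Record file->function 'defines' edges and function->callee 'calls' edges."""
        graph = nx.DiGraph()
        graph.add_node(path, kind="file")
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                graph.add_node(node.name, kind="function")
                graph.add_edge(path, node.name, rel="defines")
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        graph.add_edge(node.name, inner.func.id, rel="calls")
        return graph

    g = build_code_graph("example.py", "def f():\n    g()\n\ndef g():\n    pass\n")

    net = Network(directed=True)   # interactive HTML view for human developers
    net.from_nx(g)
    net.write_html("code_graph.html")

A real parser also has to resolve imports and attribute calls across files, which is where most of the parsing challenges described below came from.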

The architecture prioritizes an efficient, information-dense graph representation that gives LLMs maximum context while minimizing token usage.
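
For the semantic-search layer, here is a minimal sketch pairing Sentence Transformers with an in-memory Qdrant instance; the collection name, snippets, and query are invented for the demo:

    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
    client = QdrantClient(":memory:")                # in-memory mode for the demo

    client.create_collection(
        collection_name="code_chunks",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

    snippets = ["def load_config(path): ...", "def render_graph(g): ..."]
    client.upsert(
        collection_name="code_chunks",
        points=[
            PointStruct(id=i, vector=v.tolist(), payload={"code": s})
            for i, (v, s) in enumerate(zip(model.encode(snippets), snippets))
        ],
    )

    # Find code by describing what it does, not by matching keywords.
    hits = client.search(
        collection_name="code_chunks",
        query_vector=model.encode("read settings from a file").tolist(),
        limit=1,
    )
    print(hits[0].payload["code"])  # -> "def load_config(path): ..."

Because queries and code are embedded in the same vector space, a natural-language description can retrieve the matching function without any keyword overlap.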

Challenges we ran into

Throughout development, we faced several challenges:

  • Designing a graph representation that balances information density with token efficiency for LLMs
  • Determining the optimal level of abstraction for code relationships
  • Parsing complex Python code accurately across multiple files
  • Creating effective vector embeddings that capture code semantics
  • Implementing the MCP server to make the tool accessible to AI assistants
  • Optimizing performance for large projects that generate complex graphs

Accomplishments that we're proud of

We're particularly proud of:

  • Creating a lightweight graph representation that significantly improves LLM code understanding
  • Seamless integration with AI assistants through the Model Context Protocol
  • Semantic search that finds code from a plain-language description of its functionality
  • Efficient token usage that lets LLMs work with large codebases
  • Real-time updates that reflect changes to the codebase

What we learned

This project taught us valuable lessons about:

  • Optimizing knowledge representations for LLM context windows
  • Graph theory and visualization techniques
  • Vector embeddings and semantic search implementation
  • How LLMs understand and process code relationships
  • Building tools that enhance AI capabilities
  • MCP server development and integration with AI assistants

What's next for CodeGraph

  • Even more efficient graph representations for LLMs
  • Automated abstraction levels that adjust based on codebase size and complexity
  • Support for additional programming languages beyond Python
  • Integration with development environments (IDEs)
  • Advanced query capabilities allowing LLMs to navigate code relationships through natural language
  • Performance optimizations for handling enterprise-scale codebases
  • Expanded MCP capabilities for deeper AI integration

Throughout, the core focus will remain on providing rich, structured code context to LLMs in ways that minimize token usage while maximizing code understanding.
