Cache Explorer

Visualize CPU cache behavior in real time. See exactly which lines of your code cause cache misses.

Why Cache Explorer?

Before: "Why is my code slow?" → Guesswork, profilers, prayer

After: Exact line-by-line cache miss attribution

Quick Start

Docker (Easiest)

git clone https://github.com/AveryClapp/cache-explorer.git
cd cache-explorer
docker-compose up --build
# Open http://localhost:8080

From Source

git clone https://github.com/AveryClapp/cache-explorer.git
cd cache-explorer

# Build (requires LLVM 17-21 with 18 recommended, CMake, Ninja)
cd backend/cache-simulator && mkdir -p build && cd build && cmake .. -G Ninja && ninja && cd ../../..
cd backend/llvm-pass && mkdir -p build && cd build && cmake .. -G Ninja -DLLVM_DIR=$(llvm-config --cmakedir) && ninja && cd ../../..
cd backend/runtime && mkdir -p build && cd build && cmake .. -G Ninja && ninja && cd ../../..

# Run
cd backend/server && npm install && node server.js &
cd frontend && npm install && npm run dev
# Open http://localhost:5173

CLI Only

./backend/scripts/cache-explore mycode.c --config intel --json
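For scripting, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of the project: `build_command` and `run_json` are hypothetical helper names, only flags shown in this README are used, and it assumes `--json` writes a single JSON document to stdout (the schema is not described here, so no field names are assumed).

```python
import json
import subprocess

# Path taken from the "CLI Only" example above; adjust to your checkout.
CACHE_EXPLORE = "./backend/scripts/cache-explore"


def build_command(source, config=None, prefetch=None, fast=False, as_json=False):
    """Assemble an argv list using only flags that appear in this README."""
    cmd = [CACHE_EXPLORE, source]
    if config:
        cmd += ["--config", config]
    if prefetch:
        cmd += ["--prefetch", prefetch]
    if fast:
        cmd.append("--fast")
    if as_json:
        cmd.append("--json")
    return cmd


def run_json(source, **options):
    """Run the tool with --json and return the parsed output.

    Assumption: the report arrives on stdout as one JSON document.
    """
    cmd = build_command(source, as_json=True, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `build_command("mycode.c", config="intel", as_json=True)` reproduces the command shown above as a list suitable for `subprocess.run`.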

Features

  • Source-level attribution - See exactly which line caused each cache miss
  • 3C miss classification - Compulsory, Capacity, Conflict breakdown
  • MESI coherence - Full multi-core cache coherence simulation
  • False sharing detection - Find hidden performance killers in threaded code
  • 6 prefetch policies - None, Next-line, Stream, Stride, Adaptive, Intel DCU
  • 14 hardware presets - Intel, AMD, Apple Silicon, ARM, Educational
  • Real-time visualization - WebSocket streaming to interactive UI
  • Works offline - No cloud, no rate limits, your code stays local
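To make the 3C breakdown concrete, here is a small sketch of the textbook classification scheme. It is an illustration under stated assumptions, not the simulator's actual code: a miss is compulsory if the line was never referenced before, capacity if a fully associative LRU cache of the same total size would also miss, and conflict otherwise. The names `LRUCache` and `classify_3c` and the default geometry are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative LRU cache tracking line numbers only (no data)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, line):
        """Return True on a hit; on a miss, insert the line, evicting the LRU one."""
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)
        s[line] = None
        return False


def classify_3c(addresses, num_sets=4, ways=1, line_size=64):
    """Count hits and compulsory/capacity/conflict misses for an address trace."""
    real = LRUCache(num_sets, ways)
    # Reference model: fully associative, same total number of lines.
    ideal = LRUCache(1, num_sets * ways)
    seen = set()
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in addresses:
        line = addr // line_size
        hit_real = real.access(line)
        hit_ideal = ideal.access(line)  # always replayed to keep both models in sync
        if hit_real:
            counts["hit"] += 1
        elif line not in seen:
            counts["compulsory"] += 1
        elif not hit_ideal:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(line)
    return counts


# Two addresses that map to the same set of a direct-mapped cache.
print(classify_3c([0, 256, 0, 256]))
# {'hit': 0, 'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

The reference model has to be replayed on every access, which is one plausible reason a mode that skips classification can run faster.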

Hardware Presets

Vendor     Presets
Intel      12th Gen, 14th Gen, Xeon, Sapphire Rapids
AMD        Zen 3, Zen 4, EPYC
Apple      M1, M2, M3
ARM        AWS Graviton 3, Raspberry Pi 4
Learning   Educational (tiny caches to see misses easily)
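As a rough illustration of why a tiny cache makes misses easier to see, the sketch below replays the same access pattern through two direct-mapped caches. The sizes (4 lines and 64 lines of 64 bytes) are made up for the example and are not the parameters of any preset; `count_misses` is a hypothetical helper.

```python
def count_misses(addresses, num_lines, line_size=64):
    """Count misses in a direct-mapped cache with `num_lines` lines."""
    slots = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_lines
        if slots[index] != line:
            slots[index] = line  # replace whatever occupied this slot
            misses += 1
    return misses


# Walk a 1 KiB array of 4-byte elements twice.
trace = [i * 4 for i in range(256)] * 2

print(count_misses(trace, num_lines=4))   # 32: the second pass misses again
print(count_misses(trace, num_lines=64))  # 16: the second pass hits entirely
```

With the small cache the working set no longer fits, so the second pass over the array shows up as misses instead of disappearing into hits.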

How It Works

Source Code (.c/.cpp)
        │
        ▼
┌───────────────────────┐
│  LLVM Pass            │  Instruments every load/store
└───────────────────────┘
        │
        ▼
┌───────────────────────┐
│  Runtime Library      │  Captures: address, size, file:line
└───────────────────────┘
        │
        ▼
┌───────────────────────┐
│  Cache Simulator      │  MESI coherence, prefetching, TLB
└───────────────────────┘
        │
        ▼
┌───────────────────────┐
│  Web UI / JSON        │  Real-time visualization
└───────────────────────┘
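The flow above can be sketched in a few lines: each instrumented load/store becomes an event carrying address, size, and file:line, and the simulator charges every miss to the source line that issued the access. This is a simplified model assuming a single direct-mapped cache; the `Access` record and `attribute_misses` function are hypothetical names, not the project's API.

```python
from collections import Counter, namedtuple

# One record per instrumented load/store, mirroring the fields in the diagram.
Access = namedtuple("Access", "address size file line")


def attribute_misses(events, num_lines=8, line_size=64):
    """Replay events through a direct-mapped cache and count misses per file:line."""
    slots = [None] * num_lines
    misses = Counter()
    for ev in events:
        first = ev.address // line_size
        last = (ev.address + ev.size - 1) // line_size
        # An access that straddles a boundary touches more than one cache line.
        for cache_line in range(first, last + 1):
            index = cache_line % num_lines
            if slots[index] != cache_line:
                slots[index] = cache_line
                misses[f"{ev.file}:{ev.line}"] += 1
    return misses


events = [
    Access(0, 4, "mycode.c", 10),     # cold miss
    Access(4, 4, "mycode.c", 10),     # same cache line: hit
    Access(4096, 4, "mycode.c", 11),  # maps to the same slot: evicts line 0
    Access(62, 4, "mycode.c", 12),    # straddles two lines: two misses
]
for location, count in sorted(attribute_misses(events).items()):
    print(location, count)
```

Keying the counter on file:line is what turns a raw miss count into a per-line report.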

Installation

Prerequisites

  • LLVM 17-21 (18 recommended)
  • CMake 3.20+
  • Ninja (optional but faster)
  • Node.js 18+ (for web UI)

macOS

brew install llvm@18 cmake ninja node
# brew --prefix resolves the install location on both Apple Silicon and Intel Macs
export PATH="$(brew --prefix llvm@18)/bin:$PATH"

Ubuntu/Debian

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18
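sudo apt install cmake ninja-build nodejs npm
# llvm.sh installs versioned binaries (e.g. llvm-config-18); if plain llvm-config is not found:
export PATH="/usr/lib/llvm-18/bin:$PATH"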

Build

git clone https://github.com/AveryClapp/cache-explorer.git
cd cache-explorer

# Build everything
cd backend/cache-simulator && mkdir -p build && cd build
cmake .. -G Ninja && ninja && cd ../../..

cd backend/llvm-pass && mkdir -p build && cd build
cmake .. -G Ninja -DLLVM_DIR=$(llvm-config --cmakedir) && ninja && cd ../../..

cd backend/runtime && mkdir -p build && cd build
cmake .. -G Ninja && ninja && cd ../../..

# Start the server
cd backend/server && npm install && node server.js &

# Start the frontend
cd frontend && npm install && npm run dev

CLI Usage

# Basic analysis (assumes backend/scripts is on your PATH; otherwise call ./backend/scripts/cache-explore)
cache-explore mycode.c --config intel

# With prefetching simulation
cache-explore mycode.c --config amd --prefetch stream

# Fast mode (3x faster, skips 3C classification)
cache-explore mycode.c --fast

# JSON output for scripting
cache-explore mycode.c --json

# Custom optimization level
cache-explore mycode.c -O3 --config apple
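The `--prefetch` option selects a simulated prefetch policy. As a rough idea of what the simplest such policy does, the sketch below models next-line prefetching in its prefetch-on-miss form: a demand miss on line N also installs line N+1. The tool's own policies may behave differently; the `simulate` function and the fully associative LRU cache are assumptions made for the example.

```python
from collections import OrderedDict


def simulate(addresses, capacity=8, line_size=64, next_line_prefetch=False):
    """Count demand misses in a fully associative LRU cache of `capacity` lines."""
    cache = OrderedDict()

    def install(line):
        """Insert a line or refresh its LRU position, evicting the oldest if full."""
        if line in cache:
            cache.move_to_end(line)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[line] = None

    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line not in cache:
            misses += 1
            if next_line_prefetch:
                install(line + 1)  # fetch the following line alongside the miss
        install(line)
    return misses


# Sequential walk over 1 KiB in 8-byte steps (16 cache lines).
trace = list(range(0, 1024, 8))
print(simulate(trace))                           # 16
print(simulate(trace, next_line_prefetch=True))  # 8
```

On a purely sequential walk every second line is already present when it is first touched, so the demand miss count halves; irregular access patterns would see little or no benefit.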

Running Tests

cd backend/cache-simulator/build
./CacheLevelTest        # 22 tests
./CacheSystemTest       # 25 tests
./MESICoherenceTest     # 19 tests
./MultiCorePrefetchTest # 18 tests
./MultiCoreTLBTest      # 8 tests
./AdvancedInstrumentationTest # 31 tests
# Total: 123 tests

Limitations

  • Requires recompilation - Can't trace pre-compiled binaries (use Intel Pin for that)
  • No speculative execution - All accesses treated as committed
  • Single socket - No NUMA simulation

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT - See LICENSE for details.

Acknowledgments

Inspired by Compiler Explorer and Cachegrind.