AI-powered agent that autonomously discovers and exploits smart contract vulnerabilities using xAI's Grok.
Tested on 533 real-world contracts | 30+ successful exploits | ~$20M total USD value exploited | Fully autonomous workflow
Docker-based sandbox supporting Live Contracts (on-chain forking) and CTF Challenges (Damn Vulnerable DeFi)
- Access Control - Shezmu ($4.9M), TempleDao ($2.3M), SuperRare ($730K), DEPUSDT_LEVUSDC ($105K), Cftoken
- Flash Loan Attacks - ChiSale ($16.3K), CFC, NovaXM2E (2.86 ETH)
- Reentrancy - Cream Finance, Convergence
- Logic Flaws - DeezNutz404, Novo (4.75M tokens)
- Price Oracle Manipulation - Multiple DeFi protocols
Note: Most exploits target contracts deployed after Grok's knowledge cutoff. The agent operates without web search tools, relying solely on source code analysis and reasoning. Full chain-of-thought reasoning and execution logs are available in the reports/ directory.
┌─────────────────────┐
│ Python Agent │ ← Grok API (ReAct reasoning)
│ (Host Machine) │
└─────────┬───────────┘
│
├─── workspace/ (shared volume)
│ ├── contract_context/ # Source code, ABI, state
│ ├── memory/ # Persistent learning
│ └── Exploit.sol # Generated exploits
│
▼
┌─────────────────────┐
│ Docker Container │
│ ├── Foundry │ ← Solidity compilation & testing
│ ├── Heimdall │ ← Bytecode decompilation
│ ├── Anvil │ ← Blockchain forking
│ └── Slither │ ← Static analysis
└─────────────────────┘
One of the key technical challenges is creating a perfectly isolated sandbox that can replay any historical blockchain state:
Multi-Chain Support:
- ✅ Ethereum Mainnet - Fork and replay at any block height
- ✅ BSC (Binance Smart Chain) - Full state reproduction
Technical Implementation:
- Anvil Forking - Uses Foundry's Anvil to fork live blockchains at specific block heights
- State Snapshot - Captures contract storage, balances, and deployment data at target block
- Perfect Isolation - Each exploit runs in a fresh Docker container with no cross-contamination
- Fast Startup - Container spins up and forks the blockchain in <5 seconds
Why This is Hard:
- Must handle tens of millions of blocks of historical data via RPC
- Preserve exact state including storage slots, balances, nonces
- Handle proxy contracts, upgradeable patterns, and complex dependencies
- Support both verified contracts (source available) and unverified (bytecode decompilation)
This enables testing exploits against real production state at the exact block height when vulnerabilities existed.
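As a minimal sketch of the forking step, the helper below builds the Anvil command line for a fork pinned to one block. The function name `build_anvil_command` and the placeholder RPC URL are assumptions for illustration; the repo's actual launcher may differ. The flags `--fork-url`, `--fork-block-number`, and `--port` are standard Anvil options.

```python
from typing import List


def build_anvil_command(rpc_url: str, block_height: int, port: int = 8545) -> List[str]:
    """Build an argv list that starts Anvil forked at a fixed block (hypothetical helper)."""
    if block_height < 0:
        raise ValueError("block_height must be non-negative")
    return [
        "anvil",
        "--fork-url", rpc_url,
        "--fork-block-number", str(block_height),
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Placeholder URL; in practice this would come from ETH_RPC_URL or BSC_RPC_URL.
    print(" ".join(build_anvil_command("https://rpc.example.invalid", 21450000)))
```

Returning an argv list rather than a shell string keeps the RPC URL from being interpreted by a shell when the command is passed to something like `subprocess.Popen`.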
The agent operates in a shared workspace/ directory with access to:
Available Tools:
- execute - Run bash commands in Docker (forge, cast, slither, etc.)
- read - Read files (source code, ABI, state snapshots)
- write - Create/modify Solidity exploit files
- list_dir - Browse contract context and memory
- grep - Search patterns in source code
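The file-oriented tools can be pictured as thin wrappers that refuse any path outside the shared workspace. The sketch below is an assumption about how such confinement could look, not the code in agents/tools.py; the class name `WorkspaceTools` and its method signatures are hypothetical.

```python
import re
from pathlib import Path
from typing import List


class WorkspaceTools:
    """Hypothetical file tools that reject paths escaping the workspace root."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, rel_path: str) -> Path:
        # Resolve symlinks and ".." first, then check containment.
        path = (self.root / rel_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise PermissionError(f"{rel_path} escapes the workspace")
        return path

    def read(self, rel_path: str) -> str:
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def write(self, rel_path: str, content: str) -> None:
        path = self._resolve(rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")

    def list_dir(self, rel_path: str = ".") -> List[str]:
        return sorted(p.name for p in self._resolve(rel_path).iterdir())

    def grep(self, pattern: str, rel_path: str) -> List[str]:
        # Return every line of the file that matches the regular expression.
        regex = re.compile(pattern)
        return [line for line in self.read(rel_path).splitlines() if regex.search(line)]
```

For example, `WorkspaceTools("workspace").grep(r"delegatecall", "Exploit.sol")` would return the matching lines, while a path such as `../.env` raises `PermissionError`.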
Workspace Structure:
workspace/
├── contract_context/ # Target contract data
│ └── 0xAddress/
│ ├── source/*.sol # Contract source code
│ ├── abi.json # Contract ABI
│ └── state_at_block.json # On-chain state snapshot
├── memory/ # Persistent learning
│ ├── global/strategies/ # Exploit patterns library
│ └── current_case/ # Current task context
├── Exploit.sol # Agent writes exploit here
└── foundry.toml # Forge configuration
# Clone the repo
git clone <repo-url>
cd <repo-name>
# Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
# Configure API keys
cp env.example .env
# Edit .env with your XAI_API_KEY, ETH_RPC_URL, ETHERSCAN_API_KEY
docker build -t exploit-agent .
# Verify sandbox environment
python tests/test_sandbox.py
# Test agent tools
python tests/test_tools.py
Best for learning and benchmarking. No blockchain access needed for most challenges.
Step 1: Setup DVD (one-time)
# Clone DVD repo into workspace
cd workspace
git clone https://github.com/theredguild/damn-vulnerable-defi.git dvd
cd dvd
git submodule update --init --recursive
cd ../..
# Prepare all 18 challenges
python prepare_context.py ctf --all
Step 2: Run a Challenge
Example with unstoppable (easy difficulty):
# First run (creates fresh memory)
python run_agent.py contracts/ctf/unstoppable.json
# Retry same challenge (memory preserved - agent remembers previous attempts)
python run_agent.py contracts/ctf/unstoppable.json
# Switch to different challenge (clear short-term memory with --reset)
python run_agent.py contracts/ctf/truster.json --reset
The agent will:
- Read the vulnerable contract source code
- Analyze the vulnerability
- Write exploit code in the test file
- Run forge test to verify
Step 3: Batch Evaluation (Optional)
Run the agent on multiple challenges to benchmark performance.
Each challenge gets a fresh current_case/ memory, but global/strategies/ accumulates across challenges (agent learns from previous exploits).
# List all 18 challenges
python -m evaluation.run_ctf_eval --list
# Batch run all challenges
python -m evaluation.run_ctf_eval --all
# Batch run by difficulty (easy/medium/hard/expert)
python -m evaluation.run_ctf_eval --difficulty easy
# Run single challenge
python -m evaluation.run_ctf_eval --challenge unstoppable
Generates a summary report with pass rates by difficulty/category. Results are saved to runs/ctf_eval/.
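The per-difficulty pass rate in the summary report is a simple grouped ratio. The sketch below shows one way to compute it; the `(name, difficulty, passed)` tuple shape and the function name `pass_rates` are assumptions, not the evaluator's actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(results: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Map each difficulty to the fraction of its challenges that passed.

    Each result is (challenge_name, difficulty, passed); this shape is hypothetical.
    """
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _name, difficulty, ok in results:
        total[difficulty] += 1
        if ok:
            passed[difficulty] += 1
    return {difficulty: passed[difficulty] / total[difficulty] for difficulty in total}


if __name__ == "__main__":
    # Made-up outcomes for illustration only, not measured results.
    sample = [("challenge-a", "easy", True), ("challenge-b", "easy", False)]
    for difficulty, rate in pass_rates(sample).items():
        print(f"{difficulty}: {rate:.0%}")
```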
For exploiting real on-chain contracts. Requires RPC access.
Step 1: Create Contract Config
Create contracts/live/my_contract.json:
{
"type": "live",
"name": "MyContract",
"address": "0x1234567890123456789012345678901234567890",
"description": "Description of the vulnerability",
"chain_id": 1,
"chain_name": "ethereum",
"block_height": 21450000,
"hints": "Optional hints for the agent..."
}
Step 2: Prepare Context
python prepare_context.py live contracts/live/my_contract.json
This downloads:
- Contract source code from Etherscan (or decompiles via Heimdall if unverified)
- ABI and deployment info
- Current state variables
- Proxy detection (if applicable)
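Because preparation is driven entirely by the config file from Step 1, a small pre-flight check can catch a malformed address or block height before any RPC calls are made. The sketch below is an assumption: which fields are strictly required is a guess based on the example config, and `validate_live_config` is a hypothetical helper.

```python
import json
import re
from typing import Any, Dict, List

# Assumed required fields, taken from the example config; "description" and
# "hints" are treated as optional here.
REQUIRED_FIELDS = ("type", "name", "address", "chain_id", "chain_name", "block_height")
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def validate_live_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in config]
    if config.get("type") != "live":
        problems.append("type must be 'live'")
    if not ADDRESS_RE.match(str(config.get("address", ""))):
        problems.append("address must be 0x followed by 40 hex characters")
    height = config.get("block_height")
    if isinstance(height, bool) or not isinstance(height, int) or height < 0:
        problems.append("block_height must be a non-negative integer")
    return problems


if __name__ == "__main__":
    with open("contracts/live/my_contract.json", encoding="utf-8") as fh:
        for problem in validate_live_config(json.load(fh)):
            print(problem)
```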
Step 3: Run Agent
Example with sorra_staking:
# First run
python run_agent.py contracts/live/sorra_staking.json
# Retry same contract (memory preserved)
python run_agent.py contracts/live/sorra_staking.json
# Switch to different contract (clear short-term memory)
python run_agent.py contracts/live/other_contract.json --reset
The agent will:
- Fork the blockchain at the specified block
- Fund the agent address with ETH
- Analyze the contract and execute exploit
- Report balance changes
Results saved to runs/live_<name>_<timestamp>/.
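The balance-change report boils down to a before/after difference in wei converted to ETH. A minimal sketch, assuming balances are read as integers in wei (for example from `cast balance`); `format_balance_change` is a hypothetical helper and the output format is not the agent's actual report format.

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def format_balance_change(before_wei: int, after_wei: int) -> str:
    """Describe a balance change in ETH with an explicit sign."""
    delta_eth = Decimal(after_wei - before_wei) / WEI_PER_ETH
    return f"{delta_eth:+.4f} ETH"
```

`Decimal` is used instead of `float` so that large wei amounts are not rounded before formatting.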
| Difficulty | Challenges |
|---|---|
| Easy | unstoppable, naive-receiver, truster, side-entrance |
| Medium | the-rewarder, selfie, compromised, puppet, puppet-v2 |
| Hard | free-rider, backdoor, climber, wallet-mining |
| Expert | puppet-v3, abi-smuggling, shards, curvy-puppet, withdrawal |
The agent uses filesystem-based memory for persistent learning:
workspace/memory/
├── global/strategies/ # Long-term: persists across all runs
│ ├── reentrancy.md # Agent writes learned patterns here
│ ├── flash_loan.md
│ └── ...
└── current_case/ # Short-term: current task only
├── todo.md # Task tracking
└── attempts.md # Exploit attempt log
- global/strategies/: Agent writes learned exploit patterns here after successful exploits. Accumulates across batch runs.
- current_case/: Preserved on retry, cleared with --reset or automatically in batch mode.
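The reset behavior can be sketched as deleting and recreating only the short-term directory. This is an assumption about what --reset does internally; `reset_current_case` is a hypothetical name, not a function confirmed to exist in agents/memory.py.

```python
import shutil
from pathlib import Path


def reset_current_case(memory_root: str) -> None:
    """Empty current_case/ while leaving global/strategies/ untouched."""
    current = Path(memory_root) / "current_case"
    if current.exists():
        shutil.rmtree(current)
    # Recreate the directory so later writes to todo.md or attempts.md succeed.
    current.mkdir(parents=True)
```

Calling `reset_current_case("workspace/memory")` between unrelated targets keeps stale todo and attempt logs from one contract out of the next run, while accumulated strategy notes stay in place.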
Automated scanners that discovered 200+ high-risk contracts on the BSC blockchain.
Scan for vulnerable upgradeable proxy contracts (uninitialized implementations).
python searcher/continuous_scan.py --chain bsc --from-block 59000000 --to-block 60000000
Events scanned: EIP-1967 (Upgraded, AdminChanged, BeaconUpgraded), Diamond (DiamondCut)
Critical signal: UNINITIALIZED_IMPL - attacker can call initialize() to take ownership.
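To inspect a proxy flagged by these events, the implementation address can be read from the storage slot that EIP-1967 reserves for it. The sketch below only decodes a 32-byte storage word (for instance the output of `cast storage <proxy> <slot>`); it does not decide whether the implementation is uninitialized, and `implementation_from_slot` is a hypothetical helper rather than part of the scanner.

```python
from typing import Optional

# Implementation slot defined by EIP-1967:
# bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1)
EIP1967_IMPLEMENTATION_SLOT = (
    "0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc"
)


def implementation_from_slot(word_hex: str) -> Optional[str]:
    """Extract the implementation address from a 32-byte storage word.

    Returns None when the low 20 bytes are all zero (no implementation set).
    """
    digits = word_hex.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) > 64:
        raise ValueError("expected at most 32 bytes of hex")
    word = int(digits or "0", 16)
    # An address occupies the low 160 bits of the word.
    address = word & ((1 << 160) - 1)
    return None if address == 0 else f"0x{address:040x}"
```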
Scan for ERC20 tokens with dangerous burn functions (rug pull patterns).
python searcher/burn_scanner.py --chain bsc --from-block 59000000 --to-block 60000000
How it works:
- Scan PairCreated events from DEX factories (PancakeSwap, etc.)
- Get bytecode of each new token
- Check for dangerous burn function selectors
Dangerous patterns (24 signatures):
| Risk | Functions |
|---|---|
| 🔴 HIGH | burnPair(address), burn(address,uint256), burnLP(address), destroyPair(address), burnAll(address) |
| 🟠 MEDIUM | burnFrom(address,uint256), emergencyBurn, forceBurn, adminBurn |
Exploit: Call burnPair(lpAddress) → burns LP tokens → sync() → swap remaining tokens for all liquidity.
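The selector check can be approximated by searching deployed bytecode for a PUSH4 opcode followed by a known 4-byte selector, which is how Solidity dispatchers usually compare the incoming selector. The sketch below is a heuristic under that assumption, not the logic of burn_scanner.py; it covers only two signatures from the table, and `find_burn_selectors` is a hypothetical name.

```python
from typing import Dict, List

# Selectors (first 4 bytes of keccak256 of the signature) for two table entries.
# The remaining signatures would need their selectors computed the same way.
KNOWN_SELECTORS: Dict[str, bytes] = {
    "burn(address,uint256)": bytes.fromhex("9dc29fac"),
    "burnFrom(address,uint256)": bytes.fromhex("79cc6790"),
}

PUSH4 = b"\x63"  # EVM opcode that pushes a 4-byte constant onto the stack


def find_burn_selectors(bytecode_hex: str) -> List[str]:
    """Return signatures whose selector appears as a PUSH4 operand in the bytecode."""
    digits = bytecode_hex[2:] if bytecode_hex.lower().startswith("0x") else bytecode_hex
    code = bytes.fromhex(digits)
    return [sig for sig, selector in KNOWN_SELECTORS.items() if PUSH4 + selector in code]
```

A match is only a signal: the same byte sequence can occur in constructor arguments or embedded data, and a matching function may still be access-controlled, so flagged tokens still need manual or agent review.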
.
├── agents/
│ ├── react_agent.py # ReAct agent (live + CTF)
│ ├── prompts.py # Live contract prompts
│ ├── ctf_prompts.py # CTF prompts (no hints)
│ ├── tools.py # bash, grep, file_ops, forge
│ └── memory.py # Memory system (filesystem-based)
│
├── prepare/ # Context preparation
│ ├── live_context.py # Live contract context
│ └── ctf_context.py # CTF challenge context
│
├── evaluation/ # Batch evaluation (benchmark)
│ ├── ctf_evaluator.py # CTF batch evaluator
│ └── run_ctf_eval.py # CLI for batch runs
│
├── searcher/ # Vulnerability scanners
│ ├── continuous_scan.py # Proxy scanner (EIP-1967, Diamond)
│ ├── burn_scanner.py # ERC20 burn function scanner
│ └── proxy_scanner.py # Single address scanner
│
├── contracts/ # Config files (input)
│ ├── live/ # Live contract configs (name, address, chain)
│ └── ctf/ # CTF configs (name, difficulty, test_function)
│
├── workspace/ # Prepared context (generated)
│ ├── dvd/ # DVD repo (Damn Vulnerable DeFi)
│ ├── ctf_context/ # CTF context (source code, README, test file)
│ ├── contract_context/ # Live contract context (ABI, source, state)
│ └── memory/ # Agent memory (persistent)
│ ├── global/strategies/ # Long-term: learned exploit patterns
│ └── current_case/ # Short-term: current task todo/attempts
├── tests/ # Test files
├── runs/ # Results
│
├── prepare_context.py # Unified preparation entry
├── run_agent.py # Main agent runner
└── Dockerfile # Sandbox image
| Variable | Required For | Description |
|---|---|---|
| XAI_API_KEY | All | Grok API key |
| ETH_RPC_URL | Live (ETH) | Ethereum RPC (Alchemy) |
| BSC_RPC_URL | Live (BSC) | BSC RPC (Alchemy) |
| ETHERSCAN_API_KEY | Live | Contract source download |
MIT