ZhimaoL/surf-hack

Smart Contract Exploitation Agent

AI-powered agent that autonomously discovers and exploits smart contract vulnerabilities using xAI's Grok.

Tested on 533 real-world contracts | 30+ successful exploits | ~$20M total USD value exploited | Fully autonomous workflow

Docker-based sandbox supporting Live Contracts (on-chain forking) and CTF Challenges (Damn Vulnerable DeFi)

Successful Exploit Categories

  • Access Control - Shezmu ($4.9M), TempleDao ($2.3M), SuperRare ($730K), DEPUSDT_LEVUSDC ($105K), Cftoken
  • Flash Loan Attacks - ChiSale ($16.3K), CFC, NovaXM2E (2.86 ETH)
  • Reentrancy - Cream Finance, Convergence
  • Logic Flaws - DeezNutz404, Novo (4.75M tokens)
  • Price Oracle Manipulation - Multiple DeFi protocols

Note: Most exploits target contracts deployed after Grok's knowledge cutoff. The agent operates without web search tools, relying solely on source code analysis and reasoning. Full chain-of-thought reasoning and execution logs are available in the reports/ directory.

Architecture

┌─────────────────────┐
│   Python Agent      │  ← Grok API (ReAct reasoning)
│   (Host Machine)    │
└─────────┬───────────┘
          │
          ├─── workspace/ (shared volume)
          │    ├── contract_context/    # Source code, ABI, state
          │    ├── memory/              # Persistent learning
          │    └── Exploit.sol          # Generated exploits
          │
          ▼
┌─────────────────────┐
│  Docker Container   │
│  ├── Foundry        │  ← Solidity compilation & testing
│  ├── Heimdall       │  ← Bytecode decompilation
│  ├── Anvil          │  ← Blockchain forking
│  └── Slither        │  ← Static analysis
└─────────────────────┘

Blockchain Fork & Isolated Execution

A key technical challenge is building an isolated sandbox that can reproduce historical blockchain state at a chosen block.

Multi-Chain Support:

  • Ethereum Mainnet - Fork and replay at any block height
  • BSC (Binance Smart Chain) - Full state reproduction

Technical Implementation:

  • Anvil Forking - Uses Foundry's Anvil to fork live blockchains at specific block heights
  • State Snapshot - Captures contract storage, balances, and deployment data at target block
  • Perfect Isolation - Each exploit runs in a fresh Docker container with no cross-contamination
  • Fast Startup - Container spins up and forks the blockchain in under 5 seconds

Why This is Hard:

  • Must handle tens of millions of blocks of historical data via RPC
  • Preserve exact state including storage slots, balances, nonces
  • Handle proxy contracts, upgradeable patterns, and complex dependencies
  • Support both verified contracts (source available) and unverified (bytecode decompilation)

This enables testing exploits against real production state at the exact block height when vulnerabilities existed.
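
As a rough illustration of the forking step, the sketch below assembles an Anvil command line for a given RPC endpoint and block height. The helper name `build_anvil_command` and its defaults are assumptions made for this example and are not taken from the repository; only the `--fork-url`, `--fork-block-number`, `--port`, and `--chain-id` flags are standard Anvil options.

```python
from typing import List, Optional


def build_anvil_command(
    rpc_url: str,
    block_height: int,
    port: int = 8545,
    chain_id: Optional[int] = None,
) -> List[str]:
    """Return an argv list that starts Anvil forked at a fixed block.

    Hypothetical helper: the repository may launch Anvil differently.
    """
    if block_height < 0:
        raise ValueError("block_height must be non-negative")
    cmd = [
        "anvil",
        "--fork-url", rpc_url,
        "--fork-block-number", str(block_height),
        "--port", str(port),
    ]
    if chain_id is not None:
        # Optionally pin the chain ID reported by the fork.
        cmd += ["--chain-id", str(chain_id)]
    return cmd


if __name__ == "__main__":
    # Placeholder URL; substitute a real RPC endpoint before running Anvil.
    print(" ".join(build_anvil_command("https://example.invalid/rpc", 21450000)))
```

Returning an argv list rather than a shell string keeps the RPC URL from being interpreted by a shell when the list is passed to `subprocess.run`.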


Agent Workspace & Tools

The agent operates in a shared workspace/ directory with access to:

Available Tools:

  • execute - Run bash commands in Docker (forge, cast, slither, etc.)
  • read - Read files (source code, ABI, state snapshots)
  • write - Create/modify Solidity exploit files
  • list_dir - Browse contract context and memory
  • grep - Search patterns in source code
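
One way to picture the file-oriented tools in this list is as a small name-to-function registry confined to the workspace directory. The sketch below is an assumption about how such a registry could look, not the code in agents/tools.py; `make_tools` and `dispatch` are hypothetical names, and the Docker-backed `execute` tool is left out.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List


def make_tools(workspace: Path) -> Dict[str, Callable[..., object]]:
    """Build read/write/list_dir/grep tools restricted to `workspace`."""
    root = workspace.resolve()

    def _resolve(rel: str) -> Path:
        # Reject paths that would escape the workspace (e.g. "../secret").
        path = (root / rel).resolve()
        if path != root and root not in path.parents:
            raise PermissionError(f"{rel} escapes the workspace")
        return path

    def read(path: str) -> str:
        return _resolve(path).read_text()

    def write(path: str, content: str) -> int:
        target = _resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        return target.write_text(content)  # number of characters written

    def list_dir(path: str = ".") -> List[str]:
        return sorted(p.name for p in _resolve(path).iterdir())

    def grep(pattern: str, path: str) -> List[str]:
        regex = re.compile(pattern)
        return [line for line in read(path).splitlines() if regex.search(line)]

    return {"read": read, "write": write, "list_dir": list_dir, "grep": grep}


def dispatch(tools: Dict[str, Callable[..., object]], name: str, **kwargs) -> object:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](**kwargs)
```

Resolving every path against the workspace root before use is a cheap guard against a model-generated path wandering outside the shared volume.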

Workspace Structure:

workspace/
├── contract_context/           # Target contract data
│   └── 0xAddress/
│       ├── source/*.sol        # Contract source code
│       ├── abi.json           # Contract ABI
│       └── state_at_block.json # On-chain state snapshot
├── memory/                     # Persistent learning
│   ├── global/strategies/      # Exploit patterns library
│   └── current_case/          # Current task context
├── Exploit.sol                # Agent writes exploit here
└── foundry.toml               # Forge configuration
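
The layout above maps naturally onto a few path helpers. The following sketch assumes the directory names shown in the tree; the function names `context_paths` and `missing_inputs` are hypothetical and only illustrate how a pre-run check for prepared context might look.

```python
from pathlib import Path
from typing import Dict, List


def context_paths(workspace: Path, address: str) -> Dict[str, Path]:
    """Return the expected locations for one target contract."""
    base = workspace / "contract_context" / address
    return {
        "source_dir": base / "source",
        "abi": base / "abi.json",
        "state": base / "state_at_block.json",
        "exploit": workspace / "Exploit.sol",
        "foundry_config": workspace / "foundry.toml",
    }


def missing_inputs(workspace: Path, address: str) -> List[str]:
    """List prepared-context inputs that are absent for `address`."""
    paths = context_paths(workspace, address)
    missing: List[str] = []
    source_dir = paths["source_dir"]
    # At least one Solidity source file is expected under source/.
    if not (source_dir.is_dir() and any(source_dir.glob("*.sol"))):
        missing.append("source/*.sol")
    for key in ("abi", "state"):
        if not paths[key].is_file():
            missing.append(paths[key].name)
    return missing
```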

Quick Start

Prerequisites

# Clone the repo
git clone <repo-url>
cd <repo-name>

# Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium

# Configure API keys
cp env.example .env
# Edit .env with your XAI_API_KEY, ETH_RPC_URL, ETHERSCAN_API_KEY

Build Docker Image

docker build -t exploit-agent .

Test the Setup

# Verify sandbox environment
python tests/test_sandbox.py

# Test agent tools
python tests/test_tools.py

Usage

Option A: CTF Mode (Damn Vulnerable DeFi)

Best for learning and benchmarking. No blockchain access needed for most challenges.

Step 1: Setup DVD (one-time)

# Clone DVD repo into workspace
cd workspace
git clone https://github.com/theredguild/damn-vulnerable-defi.git dvd
cd dvd
git submodule update --init --recursive
cd ../..

# Prepare all 18 challenges
python prepare_context.py ctf --all

Step 2: Run a Challenge

Example with unstoppable (easy difficulty):

# First run (creates fresh memory)
python run_agent.py contracts/ctf/unstoppable.json

# Retry same challenge (memory preserved - agent remembers previous attempts)
python run_agent.py contracts/ctf/unstoppable.json

# Switch to different challenge (clear short-term memory with --reset)
python run_agent.py contracts/ctf/truster.json --reset

The agent will:

  1. Read the vulnerable contract source code
  2. Analyze the vulnerability
  3. Write exploit code in the test file
  4. Run forge test to verify

Step 3: Batch Evaluation (Optional)

Run the agent on multiple challenges to benchmark performance.

Each challenge gets a fresh current_case/ memory, but global/strategies/ accumulates across challenges (agent learns from previous exploits).

# List all 18 challenges
python -m evaluation.run_ctf_eval --list

# Batch run all challenges
python -m evaluation.run_ctf_eval --all

# Batch run by difficulty (easy/medium/hard/expert)
python -m evaluation.run_ctf_eval --difficulty easy

# Run single challenge
python -m evaluation.run_ctf_eval --challenge unstoppable

Generates a summary report with pass rates by difficulty and category. Results are saved to runs/ctf_eval/.
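
The pass-rate summary can be pictured as a simple aggregation over per-challenge outcomes. The sketch below assumes results arrive as `(challenge, difficulty, passed)` tuples; that record shape and the `pass_rates` name are illustrative guesses, not the evaluator's actual output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(results: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Compute the fraction of passed challenges per difficulty level."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _challenge, difficulty, ok in results:
        total[difficulty] += 1
        passed[difficulty] += int(ok)
    return {difficulty: passed[difficulty] / total[difficulty] for difficulty in total}


if __name__ == "__main__":
    # Made-up outcomes, for illustration only.
    sample = [
        ("unstoppable", "easy", True),
        ("truster", "easy", False),
        ("selfie", "medium", True),
    ]
    for difficulty, rate in pass_rates(sample).items():
        print(f"{difficulty}: {rate:.0%}")
```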


Option B: Live Contract Mode

For exploiting real on-chain contracts. Requires RPC access.

Step 1: Create Contract Config

Create contracts/live/my_contract.json:

{
  "type": "live",
  "name": "MyContract",
  "address": "0x1234567890123456789012345678901234567890",
  "description": "Description of the vulnerability",
  "chain_id": 1,
  "chain_name": "ethereum",
  "block_height": 21450000,
  "hints": "Optional hints for the agent..."
}
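
Before running preparation, a config like the one above can be sanity-checked. The sketch below is a minimal validator; which fields are mandatory is an assumption based on the example (the README does not say), and `validate_live_config` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

# Assumed required fields and types, inferred from the example config.
REQUIRED_FIELDS = {
    "type": str,
    "name": str,
    "address": str,
    "chain_id": int,
    "chain_name": str,
    "block_height": int,
}
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def validate_live_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            problems.append(f"missing field: {field}")
        elif type(config[field]) is not expected:
            problems.append(f"{field} should be {expected.__name__}")
    if isinstance(config.get("type"), str) and config["type"] != "live":
        problems.append('type should be "live"')
    address = config.get("address")
    if isinstance(address, str) and not ADDRESS_RE.match(address):
        problems.append("address is not a 20-byte hex string")
    height = config.get("block_height")
    if type(height) is int and height < 0:
        problems.append("block_height must be non-negative")
    return problems
```

Collecting every problem instead of raising on the first one makes it easier to fix a hand-written config in a single pass.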

Step 2: Prepare Context

python prepare_context.py live contracts/live/my_contract.json

This downloads:

  • Contract source code from Etherscan (or decompiles via Heimdall if unverified)
  • ABI and deployment info
  • Current state variables
  • Proxy detection (if applicable)

Step 3: Run Agent

Example with sorra_staking:

# First run
python run_agent.py contracts/live/sorra_staking.json

# Retry same contract (memory preserved)
python run_agent.py contracts/live/sorra_staking.json

# Switch to different contract (clear short-term memory)
python run_agent.py contracts/live/other_contract.json --reset

The agent will:

  1. Fork the blockchain at the specified block
  2. Fund the agent address with ETH
  3. Analyze the contract and execute exploit
  4. Report balance changes

Results are saved to runs/live_<name>_<timestamp>/.


CTF Challenges Reference

Difficulty | Challenges
Easy       | unstoppable, naive-receiver, truster, side-entrance
Medium     | the-rewarder, selfie, compromised, puppet, puppet-v2
Hard       | free-rider, backdoor, climber, wallet-mining
Expert     | puppet-v3, abi-smuggling, shards, curvy-puppet, withdrawal

Memory System

The agent uses filesystem-based memory for persistent learning:

workspace/memory/
├── global/strategies/     # Long-term: persists across all runs
│   ├── reentrancy.md      # Agent writes learned patterns here
│   ├── flash_loan.md
│   └── ...
└── current_case/          # Short-term: current task only
    ├── todo.md            # Task tracking
    └── attempts.md        # Exploit attempt log

  • global/strategies/: Agent writes learned exploit patterns here after successful exploits. Accumulates across batch runs.
  • current_case/: Preserved on retry, cleared with --reset or automatically in batch mode.
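
The reset behaviour described above (short-term memory cleared, long-term memory kept) can be sketched in a few lines. This is an illustration under the directory layout shown, not the code in agents/memory.py; `reset_current_case` is a hypothetical name.

```python
import shutil
from pathlib import Path


def reset_current_case(memory_root: Path) -> None:
    """Clear short-term memory while leaving global/strategies/ untouched."""
    current = memory_root / "current_case"
    if current.exists():
        shutil.rmtree(current)
    current.mkdir(parents=True)
    # Long-term memory is only created if absent, never deleted.
    (memory_root / "global" / "strategies").mkdir(parents=True, exist_ok=True)
```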

Vulnerability Searchers

Automated scanners that discovered 200+ high-risk contracts on the BSC blockchain.

Proxy Scanner

Scan for vulnerable upgradeable proxy contracts (uninitialized implementations).

python searcher/continuous_scan.py --chain bsc --from-block 59000000 --to-block 60000000

Events scanned: EIP-1967 (Upgraded, AdminChanged, BeaconUpgraded), Diamond (DiamondCut)

Critical signal: UNINITIALIZED_IMPL - attacker can call initialize() to take ownership.
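
For the EIP-1967 events listed above, the new implementation address arrives as an indexed topic, that is, a 32-byte word with the 20-byte address right-aligned. The sketch below shows only that decoding step; `address_from_word` is a hypothetical helper, and how the scanner in searcher/ actually processes logs is not documented here.

```python
from typing import Optional


def address_from_word(word_hex: str) -> Optional[str]:
    """Extract the right-aligned 20-byte address from a 32-byte hex word.

    Returns None for the zero address, which usually means "not set".
    """
    digits = word_hex[2:] if word_hex.lower().startswith("0x") else word_hex
    if len(digits) != 64:
        raise ValueError("expected a 32-byte (64 hex digit) word")
    bytes.fromhex(digits)  # raises ValueError on non-hex input
    address = digits[-40:].lower()
    if int(address, 16) == 0:
        return None
    return "0x" + address
```

The same decoding applies to a value read from an EIP-1967 storage slot, since addresses are stored right-aligned there as well.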


Burn Scanner

Scan for ERC20 tokens with dangerous burn functions (rug pull patterns).

python searcher/burn_scanner.py --chain bsc --from-block 59000000 --to-block 60000000

How it works:

  1. Scan PairCreated events from DEX factories (PancakeSwap, etc.)
  2. Get bytecode of each new token
  3. Check for dangerous burn function selectors

Dangerous patterns (24 signatures; examples below):

Risk      | Functions
🔴 HIGH   | burnPair(address), burn(address,uint256), burnLP(address), destroyPair(address), burnAll(address)
🟠 MEDIUM | burnFrom(address,uint256), emergencyBurn, forceBurn, adminBurn

Exploit: Call burnPair(lpAddress) → burns LP tokens → sync() → swap remaining tokens for all liquidity.


Project Structure

.
├── agents/
│   ├── react_agent.py          # ReAct agent (live + CTF)
│   ├── prompts.py              # Live contract prompts
│   ├── ctf_prompts.py          # CTF prompts (no hints)
│   ├── tools.py                # bash, grep, file_ops, forge
│   └── memory.py               # Memory system (filesystem-based)
│
├── prepare/                    # Context preparation
│   ├── live_context.py         # Live contract context
│   └── ctf_context.py          # CTF challenge context
│
├── evaluation/                 # Batch evaluation (benchmark)
│   ├── ctf_evaluator.py        # CTF batch evaluator
│   └── run_ctf_eval.py         # CLI for batch runs
│
├── searcher/                   # Vulnerability scanners
│   ├── continuous_scan.py      # Proxy scanner (EIP-1967, Diamond)
│   ├── burn_scanner.py         # ERC20 burn function scanner
│   └── proxy_scanner.py        # Single address scanner
│
├── contracts/                  # Config files (input)
│   ├── live/                   # Live contract configs (name, address, chain)
│   └── ctf/                    # CTF configs (name, difficulty, test_function)
│
├── workspace/                  # Prepared context (generated)
│   ├── dvd/                    # DVD repo (Damn Vulnerable DeFi)
│   ├── ctf_context/            # CTF context (source code, README, test file)
│   ├── contract_context/       # Live contract context (ABI, source, state)
│   └── memory/                 # Agent memory (persistent)
│       ├── global/strategies/  # Long-term: learned exploit patterns
│       └── current_case/       # Short-term: current task todo/attempts
├── tests/                      # Test files
├── runs/                       # Results
│
├── prepare_context.py          # Unified preparation entry
├── run_agent.py               # Main agent runner
└── Dockerfile                  # Sandbox image

Environment Variables

Variable          | Required For | Description
XAI_API_KEY       | All          | Grok API key
ETH_RPC_URL       | Live (ETH)   | Ethereum RPC (Alchemy)
BSC_RPC_URL       | Live (BSC)   | BSC RPC (Alchemy)
ETHERSCAN_API_KEY | Live         | Contract source download
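
A small pre-flight check can report which of these variables are missing for a given mode. The mode labels (`ctf`, `live-eth`, `live-bsc`) and the `missing_env` helper below are assumptions made for this sketch; the variable names and their grouping follow the table above.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical mode labels; the variable grouping mirrors the table above.
REQUIRED_BY_MODE: Dict[str, List[str]] = {
    "ctf": ["XAI_API_KEY"],
    "live-eth": ["XAI_API_KEY", "ETH_RPC_URL", "ETHERSCAN_API_KEY"],
    "live-bsc": ["XAI_API_KEY", "BSC_RPC_URL", "ETHERSCAN_API_KEY"],
}


def missing_env(mode: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variables that are unset or empty for `mode`."""
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode}")
    return [name for name in REQUIRED_BY_MODE[mode] if not env.get(name)]


if __name__ == "__main__":
    for mode in REQUIRED_BY_MODE:
        print(mode, "missing:", missing_env(mode))
```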

License

MIT
