Skip to content

Tanush-A/Refactorika

Repository files navigation

Refactorika

📚 Full documentation: docs/. New here? Refactorika has two branchesworking (the demo: MCP agent harness + four-arm benchmark) and main (this branch: the v3 graph engine). Read docs/branches.md first.

A graph-driven, verified refactoring engine for Python. Point it at a repo; it builds a reference-correct model of the whole program, plans a safe dependency order, applies deterministic transforms, and proves nothing broke — committing each verified change and reverting anything that fails its tests.

The pitch: refactoring is a whole-program graph problem, not a per-file one. The LLM brings judgment, deterministic engines bring correctness at scale, the graph connects them, and the test suite proves behavior is preserved.

What makes it correct

  • Real reference resolution, not regex. A symbol graph built from Jedi static analysis: a rename updates every true reference and nothing that merely shares the name; dead code is removed only when reachability proves it dead.
  • Deterministic engines own the edits. rope (cross-file rename), LibCST (node replacement, dead-code removal), ruff + autoflake (cleanup). The LLM emits compact specs, never diffs.
  • Verified, then committed. Every edit passes parse → ruff → pyright → pytest (tests impact-scoped to what the change can affect) before git commit; any failure reverts the files byte-for-byte. The full suite gates the run at baseline and finale.
  • Efficient by construction. Leaf-to-root ordering means each step builds on already-verified code; only impacted tests run per edit.

Quickstart

Full how-to (every flag, the MCP tools, the agent campaign, Redis, eval): docs/usage.md.

python3 -m venv .venv
.venv/bin/python -m pip install -e ".[dev]"        # add ".[semantic]" for embeddings/vector recall
docker compose up -d redis                          # optional: live Redis (redis-stack); else JSON fallback

# Dry-run on the bundled messy repo (no changes written): see the plan, the verified
# edits (dead code removed + cleanup), and the before/after metrics.
.venv/bin/refactorika demo_repo

# Inspect without running:
.venv/bin/refactorika demo_repo --show-graph     # the symbol graph, entry points, dead code
.venv/bin/refactorika demo_repo --show-plan      # the leaf-to-root worklist

# Apply in place (commits each verified edit to git):
.venv/bin/refactorika demo_repo --apply

# Reference-correct rename across the whole repo (the centerpiece) — deterministic:
.venv/bin/refactorika demo_repo --rename orders.compute_total=calculate_order_total

# Add LLM judgment: god-function decomposition with consistent naming via decision memory.
# The first run with ANTHROPIC_API_KEY records responses to .refactorika/llm_cache.json;
# subsequent runs replay that cache offline (no key needed).
.venv/bin/refactorika demo_repo --llm

# Run the agentic campaign (audit -> plan -> specialist agents) through the verified engine.
# Applies in place; complexity decomposition needs ANTHROPIC_API_KEY.
.venv/bin/refactorika demo_repo --agents

# Inspect decision memory / semantic neighbors:
.venv/bin/refactorika demo_repo --show-memory
.venv/bin/refactorika demo_repo --show-similar orders.compute_total

# Tests (offline — no Redis, no API key needed):
REFACTORIKA_OFFLINE=1 .venv/bin/python -m pytest -q

Run as an MCP server (use inside an agent)

.venv/bin/python -m refactorika.mcp_server   # stdio MCP server
claude mcp add refactorika -- .venv/bin/python -m refactorika.mcp_server

Tools: build_graph, get_plan, run_pipeline, run_agents, analyze_file, find_duplicates, find_dead_code, apply_and_verify(_multi), generate_docs, get_context_map, get_log. Full reference in docs/usage.md.

Providers, memory, and evaluation

  • Provider-agnostic LLM — generation via Claude or local Ollama, embeddings via local MiniLM or Ollama (separate, since Anthropic has no embeddings API). Selected by env (REFACTORIKA_LLM_PROVIDER, REFACTORIKA_EMBED_PROVIDER); a record/replay cache makes any provider reproducible. See .env.example.
  • Redis as the shared brain — decisions are stored in Redis (REDIS_URL, e.g. Redis Cloud) and recalled by semantic similarity so refactors stay consistent; local-JSON fallback for offline (REFACTORIKA_OFFLINE=1). Inspect with refactorika <dir> --show-memory; docker compose up -d redis runs a local instance.
  • RefactorBench evalmake fetch && make eval-inscope runs the engine on real OSS refactoring tasks; results in eval/results/. See docs/11-benchmarks-and-eval.md.

How it works

CLI / MCP  →  orchestrator  →  graph (Jedi)  →  planner (+LLM judgment)  →  engines (rope/LibCST/ruff)
                                                                              →  checker (gates + git)
                              Redis Iris = graph + decision memory + vectors (local-JSON fallback)

Layout

refactorika/graph/       reference-correct symbol graph + leaf-to-root order + impact
refactorika/transforms/  deterministic engines (rename, cleanup, dead_code, node_replace)
refactorika/pipeline/    orchestrator · planner · planner_llm · checker
refactorika/llm/         Anthropic client with record/replay cache + stub seam
refactorika/memory/      Redis Iris: agent/decision memory, vectors (JSON fallback)
refactorika/core/        schema · gates · storage
refactorika/cli.py       standalone Typer CLI      refactorika/mcp_server.py   MCP server
demo_repo/               deliberately messy target + its tests
docs/v3_spec.md          the source-of-truth spec (as built)

Python only; behavior-preserving structural refactors. See docs/v3_spec.md for the full spec and CLAUDE.md for project memory.

About

cal ai hackathon

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors