████████╗ ██████╗  ██████╗ ██╗     ██╗  ██╗██╗████████╗      ██████╗██╗     ██╗
╚══██╔══╝██╔═══██╗██╔═══██╗██║     ██║ ██╔╝██║╚══██╔══╝     ██╔════╝██║     ██║
   ██║   ██║   ██║██║   ██║██║     █████╔╝ ██║   ██║  █████╗██║     ██║     ██║
   ██║   ██║   ██║██║   ██║██║     ██╔═██╗ ██║   ██║  ╚════╝██║     ██║     ██║
   ██║   ╚██████╔╝╚██████╔╝███████╗██║  ██╗██║   ██║        ╚██████╗███████╗██║
   ╚═╝    ╚═════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝   ╚═╝         ╚═════╝╚══════╝╚═╝

> Where LLMs Collaborate, Not Compete

The universal orchestration layer that makes Claude, Gemini, Codex, and Qwen work together. Stop fighting with your AI. Start conducting it.

toolkit-cli is a multi-agent AI orchestration platform. Instead of working in isolation, Claude, Gemini, Codex, and Qwen collaborate on your development tasks. Its 80+ spec-aware commands enable 10-20x productivity through intelligent task distribution, conflict prevention, and synthesized AI collaboration.

Patent pending.

🎃 Launching October 31, 2025 • Midnight PDT (Halloween)

No inference tax • We don't scrape your code • Full context, no BS

The Spec-Context Revolution

toolkit-cli is different. Before any command, it reads your project's DNA—the specs, plans, and contracts that define your vision. It doesn't just fix code; it preserves your intent.

BEFORE: Generic AI

# Error: 'user_name' can be None
# Generic AI "fix":
user_name = get_user_name() or ""
# Just silences the error ❌

Standard AI tools ignore your specs and apply generic fixes that break your business logic.

AFTER: toolkit-cli

# Spec: "Invalid users must re-register"
# toolkit-cli fix:
user_name = get_user_name()
if user_name is None:
    return redirect_to_registration()
# Preserves your intent ✓

Reads your specs first. Fixes code while preserving business logic and design decisions.
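
To make the contrast above concrete, here is a minimal, self-contained sketch. `get_user_name` and `redirect_to_registration` are hypothetical stand-ins (the page does not define them), and the returned strings are illustrative only. It is not toolkit-cli's output, just the two behaviors side by side.

```python
from typing import Optional


def get_user_name(session: dict) -> Optional[str]:
    """Hypothetical lookup: returns None for an unknown or invalid user."""
    return session.get("user_name")


def redirect_to_registration() -> str:
    """Hypothetical handler: stands in for a real redirect response."""
    return "redirect:/register"


def generic_fix(session: dict) -> str:
    # Silences the None error, but an invalid user is greeted with an empty name.
    user_name = get_user_name(session) or ""
    return f"welcome:{user_name}"


def spec_aware_fix(session: dict) -> str:
    # Spec: "Invalid users must re-register"
    user_name = get_user_name(session)
    if user_name is None:
        return redirect_to_registration()
    return f"welcome:{user_name}"


if __name__ == "__main__":
    print(generic_fix({}))     # welcome:
    print(spec_aware_fix({}))  # redirect:/register
```

The generic version lets an invalid user through with an empty name; the spec-aware version routes them to registration, which is what the spec asks for.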

Every command. Every agent. Spec-aware.

Multi-Agent AI vs Single AI

| Feature | Single AI (ChatGPT) | Multi-Agent (toolkit-cli) |
|---|---|---|
| AI Perspectives | 1 model | 4+ (Claude, Gemini, Codex, Qwen) |
| Spec-Aware | | ✅ Reads project specs first |
| Pricing Model | $20/month forever | $20 one-time |
| Conflict Detection | | ✅ Automatic |
| Productivity Gain | 2-3x | 10-20x |

Single AI gives you one perspective. Multi-agent orchestration gives you diverse analysis, conflict resolution, and synthesized solutions from the best models working together.

Our Spec-Kit Enhancements

We forked the original spec-kit and supercharged it with features the original never had.

🤖

Multi-Agent Spec Generation

Our enhancement: Claude, Gemini, Codex, and Qwen collaborate to generate specs with diverse perspectives and conflict detection.

Original spec-kit: Single-agent only

80+ Spec-Aware Commands

Our enhancement: Every command reads specs first - /fix, /make, /improve, /debug all preserve your intent automatically.

Original spec-kit: Manual spec reading required

📋

Specification Review & Approval

Our enhancement: Auto-generated PRDs with cost/time estimates, approval workflows, and contract persistence.

Original spec-kit: No review system

🚀

Complete Product Lifecycle

Our enhancement: Full ideation → specification → engineering → deployment pipeline with 100% autonomous build.

Original spec-kit: Specification generation only

💰

Automatic Cost/Time Estimation

Our enhancement: Intelligent estimation algorithms calculate development hours, costs ($4K-$80K), and phase-by-phase timelines.

Original spec-kit: No estimation capabilities

🏗️

Smart Infrastructure Detection

Our enhancement: Automatically detects and provisions 50+ services (Vercel, Supabase, Stripe, etc.) from your spec requirements.

Original spec-kit: Manual infrastructure setup

We didn't just fork spec-kit—we reimagined it. Every enhancement makes specifications more actionable, more intelligent, and more integrated into the complete development lifecycle.

A Conductor's Toolkit

80+ slash commands. Each one spec-aware. Each one orchestrating multiple LLMs. Each one designed to make you unstoppable.

10s

/oneshot

Idea to production-ready codebase in 10 seconds. Backend, frontend, database, CI/CD—fully scaffolded.

🧩
10-PHASE

/strategy

10-phase product discovery: vision, ideation, market analysis, competition, design, validation—AI-powered strategy.

💩
QA

/bs

Pre-commit quality gate. Catches architectural drift, mock code, and scope creep before they hit main.

🚀
AUTO

/implement

Auto-execute all tasks from tasks.md. Sit back while your backlog clears itself with AI automation.
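
As a rough illustration of what "execute all tasks from tasks.md" involves, the sketch below finds the unchecked items in a task file. The file format is an assumption (Markdown checkboxes such as `- [ ] Add login endpoint`); the page does not document the real layout of tasks.md, and `parse_tasks` and `pending` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# Assumed format: Markdown checkboxes such as "- [ ] Add login endpoint".
TASK_LINE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def parse_tasks(text: str) -> List[Tuple[bool, str]]:
    """Return (done, description) pairs for every checkbox line."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append((match.group(1) != " ", match.group(2)))
    return tasks


def pending(text: str) -> List[str]:
    """Descriptions of tasks not yet checked off, in file order."""
    return [desc for done, desc in parse_tasks(text) if not done]


SAMPLE = """\
# Tasks
- [x] Create database schema
- [ ] Add login endpoint
- [ ] Write login tests
"""

if __name__ == "__main__":
    for description in pending(SAMPLE):
        print("next:", description)
```

An executor would then hand each pending description to an agent in order and check the box once the work is verified.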

👥
MULTI

/peer-review

4+ AI agents review your code in parallel. Get architectural, performance, and security insights simultaneously.

🔐
TUI

/keys

Beautiful TUI for .env files. Auto-masks secrets so you never expose them by accident. AI-assisted.
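
A minimal sketch of the masking idea, assuming plain `KEY=VALUE` lines and a policy of showing only the last four characters. Both the policy and the function names are assumptions for illustration, not the behavior of /keys itself.

```python
def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_env(text: str) -> str:
    """Mask the value of every KEY=VALUE line; keep comments and blank lines."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in line:
            out.append(line)
            continue
        key, value = line.split("=", 1)
        out.append(f"{key}={mask_value(value.strip())}")
    return "\n".join(out)


if __name__ == "__main__":
    print(mask_env("# keys\nAPI_KEY=sk-abcdef1234\n"))
```

Short values are masked entirely, so a four-character secret never appears in full.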

See all 80+ commands below: /specify /plan /debug /improve /tutorial /undo /license

One more thing.

We said 80+ commands. We weren't kidding.


Showing 81 commands
  • 💡 help-me: AI pair programmer
  • 📝 specify: Feature specs
  • oneshot: Idea to production
  • 🚀 project-init: Initialize project
  • 🔧 fix: Spec-aware fixes
  • 🧪 test: Testing strategy
  • 📋 plan: Implementation roadmap
  • tasks: Generate tasks.md
  • ⚒️ make: AI implementation
  • implement: Execute tasks.md
  • 🎯 next: Smart recommendations
  • go: Flow continuation
  • 🎨 ux: UX design
  • 🎨 mock-ups: Visual design
  • 🎨 responsive: Responsive testing
  • 📊 analyze-complexity: Complexity analysis
  • 🎯 suggest-refactor: Refactoring tips
  • 📝 summarize-code: Code documentation
  • 🔍 predict-dependency-impact: Dependency analysis
  • 📊 estimate-project-complexity: Project estimation
  • 🔨 improve: Multi-agent review
  • 🔥 brutal: Honest code audit
  • 👥 peer-review: Multi-agent review
  • 🪞 reflect: Code critique
  • ⚠️ weaknesses: Self-reflection
  • polish: Code polishing
  • 💩 bs: Detect violations
  • quick-win: Auto improvements
  • 🧪 generate-tests: Test generation
  • 🧪 user-flow-testing: UX analysis
  • 🔍 monitor: Production monitoring
  • 👁️ look: Visual testing & AI feedback
  • 🚀 ship: Deployment readiness
  • 🚀 deploy: Multi-platform deployment
  • ⏱️ benchmark: Performance benchmarks
  • 📈 performance: Performance analysis
  • 🔄 migrate: Framework migrations
  • ✔️ verify: Implementation verification
  • 👀 preview: Preview server
  • 📊 version: System status
  • 🔄 clone: Repository cloning
  • 🔍 diff: Git diff analysis
  • 🔍 debug: Root cause analysis
  • optimize: Performance optimization
  • 🧹 clean-up: Code cleanup
  • undo: Rollback changes
  • 📦 re-context: Rebuild context
  • 🛡️ security: Threat modeling
  • 🔐 keys: .env manager
  • 🚨 errors: Screenshot detection
  • 📚 docs: Documentation
  • 🔍 seo: SEO optimization
  • 🌐 domains: Domain discovery + trademarks
  • 🔮 synthesize: Multi-AI synthesis
  • 🧠 learn: Knowledge base
  • 🎓 tutorial: Interactive onboarding
  • 📄 pdf: PDF processing
  • 📊 progress: Progress tracking
  • 💰 costs: Cost management
  • 🔗 chain: Command chains
  • 🔍 new-feature: Discover features
  • 🤝 peers: P2P coordination
  • 🗂️ clipboard: Agent clipboard
  • 🤖 daemon: Multi-agent daemon
  • 💻 tui: Terminal UI
  • 🎮 interactive: Interactive mode
  • 🔑 license: License management
  • 📊 usage: Usage tracking
  • 🤖 ai-native: AI optimization
  • 🚀 init: Project setup
  • 🚀 product-manager (BETA): Idea → Deployed App
  • 🎯 strategy: Strategy workflow
  • 📋 spec-kit-plan: Spec kit planning
  • 🏃 sprints: Sprint planning
  • 📈 growth-hack: Growth analysis
  • 🔄 pipeline: CI/CD pipeline
  • ⚙️ preferences: User preferences
  • 📸 capture: Capture information
  • 🌊 flow: Workflow management
  • 🎨 theme: Theme management
  • 🧪 beta: Beta features

80+ commands. Zero configuration. Infinite possibilities.

Patent Pending

🐻 [BEWARE] 🐻

⚠️
EXPERIMENTAL BETA FEATURE
This feature is in active development. Expect bugs, breaking changes, and incomplete functionality.
⚠️
🧪 BETA: Product Manager

/product-manager (Beta Preview)

Experimental workflow for building apps through conversation.

How It Works (Experimental)

This workflow is still being tested and refined

1. 🎯 Ideation: Conversational Q&A gathers requirements. AI extracts structure from natural language.
2. 📋 Specification: Auto-generated PRD with cost/time estimates ($4K-$40K, 1-10 weeks).
3. ⚙️ Engineering: 100% autonomous build. Tests → Code → Deploy while you sleep.
4. 🚀 Go-to-Market: Deployment + SEO + analytics. Production URL + credentials delivered.
5. 🛠️ Maintenance: Production monitoring, anomaly detection, auto-scaling, backups.

😓

Traditional Way

  • Weeks of planning: Write PRDs, create mockups, review with team
  • Hire developers: $150k+/year, 3-6 month search
  • Months of development: Constant back-and-forth, missed requirements
  • Manual deployment: DevOps, servers, monitoring setup
  • Ongoing maintenance: Bugs, scaling issues, 24/7 monitoring
3-6 months
$50K-$200K+
🚀

With toolkit-cli

  • One conversation: Answer 10-15 questions, AI generates complete spec
  • Multi-agent AI team: Claude + Codex + Gemini + Qwen collaborate
  • Autonomous build: 100% hands-off, tests → code → deploy while you sleep
  • One-command deploy: Production URL + credentials automatically generated
  • AI monitoring: Anomaly detection, auto-scaling, self-healing infrastructure
24-48 hours
$500-$2K
100x faster, 50x cheaper

See It In Action

terminal
$ toolkit-cli product-manager "vintage camera auction platform"
🎯 ◉ Ideation → ○ Specification → ○ Engineering → ○ Go-to-Market → ○ Maintenance
🤖 AI: Let's build your camera auction platform! First question:
Who are your target users?
👤 You: Photographers and vintage camera collectors...
🤖 AI: Perfect! I understand you want:
• User marketplace for camera auctions
• Authentication + profile system
• Bidding engine with real-time updates
• Payment integration (Stripe)
• Mobile-first responsive design
✨ Generating specification contract...
✅ Specification Contract Generated
📋 Executive Summary: Market analysis complete
👥 User Stories: 8 stories generated
⚙️ Functional Requirements: 12 requirements
🏗️ Technical Architecture: Next.js + Supabase + Stripe
⚠️ Risk Assessment: 4 risks identified with mitigation
💰 Cost Estimate: $16,450 (160 hours × $100/hr + $50 AI + $200/mo infra)
⏱️ Timeline Estimate: 4 weeks (Design 20% → Dev 50% → QA 25% → Deploy 10%)
📁 Contract saved: .toolkit/product-manager/contracts/contract_1760483044.json
✅ AUTO-APPROVED FOR AUTONOMOUS BUILD
🚀 Proceeding to Engineering stage (100% autonomous, zero interaction)...

That's it. AI handles everything from here. Go to sleep. Wake up to a deployed app.
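
The cost line in the transcript above can be reproduced with simple arithmetic. This is a sketch under one assumption: the infrastructure fee is monthly and counted for two months. The transcript does not say how many months are included, but two is the value that makes the listed components add up to $16,450. `estimate_cost` is a hypothetical helper, not a toolkit-cli function.

```python
def estimate_cost(hours: int, hourly_rate: int, ai_cost: int,
                  infra_per_month: int, infra_months: int) -> int:
    """Total in dollars: labor + one-off AI spend + monthly infrastructure."""
    return hours * hourly_rate + ai_cost + infra_per_month * infra_months


# 160 h x $100/h = $16,000; + $50 AI; + 2 x $200 infra = $16,450
total = estimate_cost(160, 100, 50, 200, infra_months=2)
print(f"${total:,}")  # $16,450
```

With a single month of infrastructure the same formula gives $16,250, so the number of months matters when comparing estimates.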

⚠️ Beta Limitations & Known Issues

  • May fail on complex projects or unusual requirements
  • Cost/time estimates are rough approximations only
  • Generated code may have bugs or security issues
  • Not recommended for production use without manual review
  • Active development - expect breaking changes

Interested in testing this beta feature?

Run: toolkit-cli product-manager "your idea"

Remember: This is experimental software. Use at your own risk.

Works with Every AI

Use toolkit-cli with Claude Code, Cursor, Windsurf, Roo, or any editor you love. One tool. 12 AI agents. Infinite combinations.

~/my-project
$ toolkit-cli fix auth.py --ai "claude gemini qwen codex"
Claude analyzing architecture patterns...
Gemini reviewing UX implications...
Qwen checking i18n compliance...
Codex generating optimized solution...
Synthesized fix applied. 4 agents collaborated.

  • Claude: Sonnet, Opus, Haiku
  • Gemini: Pro, Flash, Ultra
  • 💧 Qwen: 2.5, Coder, Math
  • OpenAI: GPT-4, o1, Codex
  • Copilot: GitHub AI
  • ▶️ Cursor: AI-first IDE
  • 🌊 Windsurf: Codeium IDE
  • 🦘 Roo: Roo Cline
  • Ollama: Local models
  • 🔀 OpenRouter: Unified API
  • 🤖 Auggie: AI Assistant
  • 💻 Kilocode: Code AI
  • 🔓 OpenCode: Open source AI

💻

Claude Code

Native slash commands

# In Claude Code chat
/fix auth.py

Any Terminal

Direct CLI access

$ toolkit-cli fix auth.py
🚀

VS Code, Cursor...

Task integration

# tasks.json
"command": "toolkit-cli"
Provider agnostic. Zero lock-in. Pure freedom.

Mix and match models. Switch providers. Use local or cloud. Your workflow, your rules.

The 3 Wise Men

Claude + Codex + Gemini = AGI

--ai "claude gemini codex"

I use Claude, Codex, and Gemini for everything. Not because one is better—but because three minds are smarter than one.

Claude sees structure. Codex sees patterns. Gemini sees scale. Together, they catch what any single model misses.

This isn't about benchmarks or leaderboards. It's about collaboration. When three agents debate, synthesize, and build consensus—that's when you get something closer to AGI than any single model alone.

"The wisest human doesn't work alone. Why should your AI?"

Pay Once. Use Forever.

No subscriptions. No recurring charges. Multi-agent collaboration that pays for itself.

Individual / Small Business Fortune 500 / Big Tech

FREE

$0 One-time
  • 30-day unlimited trial
  • 2 hours/day forever after
  • Community Support

DEVELOPER (MOST POPULAR)

$20 One-time
  • 5 hours/day usage
  • Unlimited agents
  • Unlimited projects
  • Community Support

PROFESSIONAL

$100 One-time
  • 10 hours/day usage
  • Unlimited agents
  • Unlimited projects
  • Community Support

UNLIMITED

$200 One-time
  • Unlimited usage
  • Unlimited agents
  • Unlimited projects
  • Community Support
🏢

Big Tech Pricing

💰

If your email ends with one of these domains, different rules apply.

AI Company License

$500 per user per month

For OpenAI, Anthropic, Google, and other AI labs.

Enterprise License

$200 per user per month

For Fortune 500 companies.

  • 🍎 apple.com
  • 🔍 google.com
  • 🤖 openai.com
  • 🧠 anthropic.com
  • 🪟 microsoft.com
  • 📦 amazon.com
  • 🟢 nvidia.com

You're building AI that'll make billions. Pay your indie developers fairly. 💸

All tiers include lifetime access to updates. Usage limits reset daily. Try free for 30 days—unlimited.
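
The daily limits can be summarized as a small lookup. This is a sketch: the hour values come from the tier list on this page, while the function name, the tier keys, and the way trial days are counted (day 0 through day 29) are assumptions.

```python
from typing import Optional

# Daily limits in hours, taken from the tier list; None means unlimited.
DAILY_HOURS = {"free": 2, "developer": 5, "professional": 10, "unlimited": None}
TRIAL_DAYS = 30


def daily_limit(tier: str, days_since_signup: int) -> Optional[int]:
    """Hours allowed today, or None for unlimited."""
    if tier not in DAILY_HOURS:
        raise ValueError(f"unknown tier: {tier}")
    if tier == "free" and days_since_signup < TRIAL_DAYS:
        return None  # the free tier is unlimited during the 30-day trial
    return DAILY_HOURS[tier]


if __name__ == "__main__":
    print(daily_limit("free", 10))       # None (still in trial)
    print(daily_limit("free", 45))       # 2
    print(daily_limit("developer", 45))  # 5
```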

💼 Small business license = Individual pricing (pay once, use forever). Monthly billing only applies to Fortune 500 and Big Tech companies.
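
A sketch of how the email-domain rule could be checked. The domain list is taken from this page; which license each domain maps to is an assumption (the page names OpenAI, Anthropic, and Google as AI labs and does not assign the others), and `license_for` is a hypothetical helper.

```python
# Domains listed on the page. The split between the two groups is an assumption.
AI_LAB_DOMAINS = {"openai.com", "anthropic.com", "google.com"}
BIG_TECH_DOMAINS = {"apple.com", "microsoft.com", "amazon.com", "nvidia.com"}


def license_for(email: str) -> str:
    """Pick a license label from the part of the address after the last '@'."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in AI_LAB_DOMAINS:
        return "AI Company License ($500 per user per month)"
    if domain in BIG_TECH_DOMAINS:
        return "Enterprise License ($200 per user per month)"
    return "Individual / Small Business (one-time)"


if __name__ == "__main__":
    print(license_for("dev@OpenAI.com"))
    print(license_for("dev@nvidia.com"))
    print(license_for("dev@example.org"))
```

This version matches the exact domain only; subdomains and other Fortune 500 addresses would need extra handling.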

Frequently Asked Questions

What is multi-agent AI?

Multi-agent AI is a system where multiple AI models (like Claude, Gemini, and Codex) work together on a single task, each contributing unique perspectives. Unlike single-AI systems, multi-agent platforms orchestrate collaboration to produce higher-quality results through diverse analysis and conflict resolution. toolkit-cli pioneered this approach with patent-pending coordination technology.

How do you use multiple AI models together?

toolkit-cli makes it simple: install once, configure your API keys, then run any command with --ai "claude gemini codex". The platform automatically distributes tasks, synthesizes responses, and resolves conflicts—no manual coordination needed.

Example: toolkit-cli fix auth.py --ai "claude gemini" runs Claude for architecture analysis while Gemini reviews UX implications simultaneously.
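
The general pattern behind a command like this (fan the same task out to several agents, then merge what comes back) can be sketched in a few lines. This is not toolkit-cli's implementation: `claude` and `gemini` below are hypothetical stub functions standing in for real model calls, and the merge step only removes exact duplicates rather than resolving real conflicts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for real model calls; each returns a list of findings.
def claude(task: str) -> List[str]:
    return [f"architecture: review module boundaries in {task}"]


def gemini(task: str) -> List[str]:
    return [f"ux: check error messages shown by {task}"]


AGENTS: Dict[str, Callable[[str], List[str]]] = {"claude": claude, "gemini": gemini}


def run_agents(task: str, ai: str) -> Dict[str, List[str]]:
    """Run the space-separated agents named in `ai` on the same task, in parallel."""
    names = ai.split()
    if not names:
        raise ValueError("no agents given")
    unknown = [n for n in names if n not in AGENTS]
    if unknown:
        raise ValueError(f"unknown agents: {unknown}")
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {name: pool.submit(AGENTS[name], task) for name in names}
        return {name: fut.result() for name, fut in futures.items()}


def synthesize(results: Dict[str, List[str]]) -> List[str]:
    """Merge findings in agent order, dropping exact duplicates."""
    merged: List[str] = []
    for name, findings in results.items():
        for finding in findings:
            line = f"[{name}] {finding}"
            if line not in merged:
                merged.append(line)
    return merged


if __name__ == "__main__":
    for line in synthesize(run_agents("auth.py", "claude gemini")):
        print(line)
```

The `ai` string mirrors the `--ai "claude gemini"` argument shown above; everything else is illustrative.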

Are there AI tools with one-time payment instead of subscriptions?

Yes! toolkit-cli uses a pay-once, use-forever model. Pay $20-$200 once for lifetime access to all 80+ commands and future updates. No monthly fees, no recurring charges, no subscription traps. Your API keys, your data, your ownership. Try free for 30 days with unlimited usage, then 2 hours/day forever even on the free tier.

Should I use Claude or Gemini for coding?

Use both! Claude excels at architecture and technical analysis, while Gemini provides strong UX insights and accessibility checks. toolkit-cli lets you run them in parallel with commands like toolkit-cli fix --ai "claude gemini" to get the best of both. Add Codex for implementation and Qwen for i18n coverage.

What's the difference between multi-agent AI and single AI?

Single AI (like ChatGPT alone) gives one perspective. Multi-agent AI orchestrates 4+ models simultaneously—Claude for architecture, Gemini for UX, Qwen for i18n, Codex for implementation. Result: 10-20x better solutions through diverse analysis, automatic conflict detection, and synthesized recommendations. toolkit-cli pioneered this with patent-pending coordination algorithms that prevent AI agents from contradicting each other.

How We Got Here

Not a software engineer. For 30 years, tried to learn to code—dyslexic hands and ADHD kept my fingers two words behind my mind. Became a software architect instead. Built startups. Hired 300+ developers on oDesk/Upwork over 2 decades. 80,000 billed hours. Learned from PRs and screenshots, watching developers from CERN, NASA, and Microsoft build what these hands couldn't.

2016, tried to invent agentic AI. Lost millions. Lost friends. Lost everything. Last startup, Findy—Perplexity's search intelligence combined with Scale AI's data labeling, powered by public typeahead transformers—years before Perplexity existed. Lost everything again.

August 2024, started over. Claude Dev became Cline, then Cursor, then Roo. Forked Kilo, built an AI OS, made a fucking mess. But 25 billion tokens later, figured out the gaps in context. Testing Amazon's Kiro—saw the potential, but the mistakes led to spec-kit. Toolkit was born.

Every line of code written by AI. Built in Claude Code in the terminal—no IDE required. Alpha, but already production-capable.

49 now. Living at mom and dad's. Licensing toolkit to build a snowboard factory in the mountains. Make boards with friends. Live the dream that's been 30 years in the making.

If this helps you build your dreams, you just helped me live mine.

Never fucking quit.

Thanks to Tom Latzo, Fredrick Karlsson, Serge Gulin, and Vladimir Glafirov for teaching me what I couldn't learn on my own.

Aaron Rosenthal
aka roseyballs
