Inspiration

Typing cryptic commands into cmd or PowerShell is frustrating, especially for non-tech users or during late-night debugging. I wanted a shell where you just say "open my downloads, kill all Chrome tabs eating 80% CPU, then create a Python venv named awesome-project" and it happens safely. Built during vibe-coding sessions with local models to keep it private, fast, and offline-capable.

What it does

NL-Shell-V2 is a natural-language interface for Windows that lets users control their OS with everyday English. Type or speak commands like:
"Show me my recent screenshots" "Delete temp files older than 30 days but ask first" "Install Python 3.12 in a new env and run my script" "Block website facebook.com for the next 2 hours"
Under the hood:
- Ollama runs a small, fast local model (3–4B params) to interpret intent.
- A risk analyzer classifies danger (read-only vs delete vs network vs elevation).
- Safe commands flow to a chain executor that runs them via subprocess; risky ones require user confirmation first (see the sketch below).
- Modular: model choice, risk thresholds, and custom command mappings live in a JSON config.
- Packs into a single .exe via PyInstaller, so there's no install hassle.
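As a rough illustration of that flow, here's a minimal sketch of a confirmation-gated executor: a rule-based first pass assigns a coarse risk category, and anything beyond read-only asks before running. The patterns, category names, and function names are simplified stand-ins, not the project's actual module API.

```python
# Illustrative sketch of a confirmation-gated executor. The rule patterns,
# category names, and function names are simplified stand-ins, not the
# project's actual module API.
import re
import subprocess

# Rule-based first pass: map command patterns to coarse risk categories.
RISK_RULES = [
    (r"\b(del|rd|rmdir|Remove-Item)\b", "delete"),
    (r"\b(netsh|New-NetFirewallRule)\b", "network"),
    (r"\b(runas)\b|-Verb\s+RunAs", "elevation"),
]

def classify_risk(command: str) -> str:
    """Return the first matching risk category, defaulting to read-only."""
    for pattern, category in RISK_RULES:
        if re.search(pattern, command, re.IGNORECASE):
            return category
    return "read-only"

def run_command(command: str) -> None:
    """Execute a PowerShell command, asking for confirmation if it looks risky."""
    risk = classify_risk(command)
    if risk != "read-only":
        answer = input(f"[{risk}] About to run: {command!r} - proceed? [y/N] ")
        if answer.strip().lower() != "y":
            print("Skipped.")
            return
    # Run through PowerShell so cmdlets like Remove-Item resolve.
    subprocess.run(["powershell", "-NoProfile", "-Command", command], check=False)

if __name__ == "__main__":
    run_command("Get-ChildItem $env:USERPROFILE/Downloads")
```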
It's privacy-first (100% local), beginner-friendly, and extensible via plugins and scripts.

How we built it

Started with Ollama plus Python glue code. Used AI tools (Cursor and Copilot) to rapidly prototype the modules: prompt engineering, risk-classifier logic, and subprocess safety wrappers. Iterated prompts live in Ollama to boost accuracy on Windows-specific intents. Added a safety layer with hybrid rule-based + LLM checks to prevent disasters. Built a standalone launcher and shortcut installer for a real-user feel. All of it happened in short vibe-coding bursts: no massive planning, just a build → test → refine loop.

Challenges we ran into

- LLM hallucination on exact Windows commands → solved with few-shot examples and forced JSON output (see the sketch below).
- Balancing speed vs accuracy on 3B models → switched to Phi-4 Mini / Qwen 3 for 1.5–2× faster inference.
- Ensuring risk detection doesn't block legit commands → tunable thresholds plus a user override.
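To make the JSON-forcing point concrete, here's a rough sketch of the kind of call we mean, using the ollama Python client's format="json" option with a few-shot system prompt. The prompt wording, output schema, and model tag are illustrative examples, not the exact prompts shipped in NL-Shell-V2.

```python
# Rough sketch of forcing structured JSON output with a few-shot prompt via the
# ollama Python client. The prompt wording, schema, and model tag are examples.
import json
import ollama

FEW_SHOT = """You translate plain English into a single Windows PowerShell command.
Reply ONLY with JSON: {"command": "<powershell>", "risk": "read-only|delete|network|elevation"}

Example:
User: show me my recent screenshots
Assistant: {"command": "Get-ChildItem $env:USERPROFILE/Pictures/Screenshots | Sort-Object LastWriteTime -Descending | Select-Object -First 10", "risk": "read-only"}
"""

def interpret(user_text: str, model: str = "qwen2.5:3b") -> dict:
    """Ask the local model for a command + risk label as a JSON object."""
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": FEW_SHOT},
            {"role": "user", "content": user_text},
        ],
        format="json",  # constrains the reply to syntactically valid JSON
    )
    return json.loads(response["message"]["content"])

if __name__ == "__main__":
    print(interpret("delete temp files older than 30 days but ask first"))
```

Note that format="json" only guarantees syntactically valid JSON, not the exact fields, so it's still worth validating the expected keys before handing the result to the executor.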
Accomplishments that we're proud of

- A functional local NL shell prototype in just a few weeks of part-time work.
- Built-in safety that actually works (categorizes rm -rf equivalents, network changes, etc.).
- A standalone .exe that feels like a real app, not a script.
- Runs offline on consumer laptops: true vibe-coding freedom.
What we learned

- Prompt engineering plus structured output (JSON mode in Ollama) is magic for reliability.
- Small models plus quantization beat bigger ones for interactive tools.
- Safety layers are non-negotiable for real-world agent-like systems.

What's next for NL-Shell-V2

Add voice input (Whisper), Linux/macOS support, a plugin ecosystem, and fine-tuning on command datasets. The dream: a full AI OS companion.
Built With
- antigravity
- claude
- ollama
- python