Inspiration
The inspiration came from a common fear in the AI community: uncontrolled autonomy. As AI agents become more capable of "Computer Use," the thought of a bot clicking buttons on your screen while you aren't looking is terrifying.
I wanted to build an agent that felt like a High-End Butler: someone with the keys to the house (your Mac) who wouldn't dream of opening a door without a subtle nod of approval. Seeing the potential of the DigitalOcean Gradient platform, I realized I could host the "Brain" in a secure, high-performance cloud environment while keeping the "Hands" local and strictly regulated.
What it does
ExecutiveButler is a hybrid AI agent.
- Remote Reasoning: You chat with a Llama 3 / DeepSeek model hosted on DigitalOcean Gradient from any device (web or mobile).
- Local Execution: The agent can perform real macOS system actions: listing files, taking screenshots, opening tabs, or summarizing documents.
- Safety First: Before any action is executed, a native macOS popup appears on your computer. The action only proceeds if you manually click "Approve."
- Cloud Memory: If the agent takes a screenshot for you, it doesn't just sit on your hard drive; it is uploaded to a DigitalOcean Space, allowing you to view it instantly on your mobile device via a secure, time-limited URL.
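The "Safety First" rule above can be sketched as a small gate that refuses to run an action unless an approver returns True. This is a minimal illustration, not the project's actual code: the names `Approver`, `tk_approver`, and `run_with_approval` are hypothetical, and the Tkinter dialog is only one possible approver.

```python
from typing import Any, Callable

# Hypothetical type: an approver receives a human-readable description
# and returns True only when the user explicitly approves.
Approver = Callable[[str], bool]


def tk_approver(description: str) -> bool:
    """Ask via a Tkinter yes/no dialog (imported lazily; needs a display)."""
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.withdraw()  # hide the empty root window and show only the dialog
    try:
        return bool(
            messagebox.askyesno("ExecutiveButler", f"Approve action?\n\n{description}")
        )
    finally:
        root.destroy()


def run_with_approval(
    description: str, action: Callable[[], Any], approver: Approver
) -> dict:
    """Run `action` only if `approver` says yes; otherwise report a denial."""
    if not approver(description):
        return {"status": "denied", "result": None}
    return {"status": "approved", "result": action()}
```

Passing the approver in as a parameter keeps the gate testable without a screen: a test can supply `lambda d: False` and confirm that the action never runs.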
How we built it
The project uses a "Split-Brain" architecture:
- The Cloud Brain: Built using the DigitalOcean Gradient Agent Development Kit (ADK). This handles the intent recognition and decides which tools to call.
- The Communication Bridge: We used Model Context Protocol (MCP) to standardize the way the Cloud Agent talks to the local machine.
- The Desktop Client: A Python-based local server that listens for commands. It uses AppleScript for macOS system control and Tkinter for the real-time approval UI.
- Storage: DigitalOcean Spaces serves as our multimodal memory, storing screenshots using the boto3 SDK to provide a seamless feedback loop from Desktop to Mobile.
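To make the Desktop Client's role concrete, here is a rough sketch of how a local server might map incoming commands to handlers. It assumes a simplified JSON message of the form `{"tool": ..., "args": {...}}`, which is an illustration only and not the MCP wire format; `TOOLS`, `tool`, `handle_message`, and `osascript_argv` are hypothetical names.

```python
import json
import os
from typing import Any, Callable, Dict

# Hypothetical registry: tool name -> local handler.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Register a function as a callable tool under `name`."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@tool("list_files")
def list_files(path: str = ".") -> list:
    """Return the sorted entry names in a directory."""
    return sorted(os.listdir(path))


def osascript_argv(script: str) -> list:
    """Build the argv for running one AppleScript line via macOS `osascript -e`."""
    return ["osascript", "-e", script]


def handle_message(raw: str) -> str:
    """Decode a JSON command, dispatch it, and return a JSON reply."""
    try:
        msg = json.loads(raw)
        handler = TOOLS[msg["tool"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"ok": False, "error": "unknown or malformed command"})
    try:
        result = handler(**msg.get("args", {}))
    except Exception as exc:  # report handler failures instead of crashing
        return json.dumps({"ok": False, "error": str(exc)})
    return json.dumps({"ok": True, "result": result})
```

Because only names present in the registry can be dispatched, the cloud side cannot ask the local machine to run anything that was not explicitly registered.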
Challenges we ran into
The biggest hurdle was Network Latency and Connectivity. Bridging a cloud-hosted agent to a local machine behind a residential firewall required a robust WebSocket bridge so that the Agent didn't time out while waiting for a human to click the "Approve" button.
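One way to keep a connection from going quiet while a human decides is to send a periodic "pending" frame until the answer arrives. The sketch below shows that idea with asyncio only. The `send` callable stands in for whatever the real WebSocket bridge uses, and the frame shapes, interval, and deadline are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_with_keepalive(
    approval: "asyncio.Future[bool]",
    send: Callable[[dict], Awaitable[None]],
    interval: float = 5.0,
    deadline: float = 120.0,
) -> bool:
    """Wait for the user's decision, sending a 'pending' frame every `interval` seconds.

    Returns the decision, or False if `deadline` seconds pass with no answer.
    """
    loop = asyncio.get_running_loop()
    give_up_at = loop.time() + deadline
    while True:
        remaining = give_up_at - loop.time()
        if remaining <= 0:
            await send({"type": "timeout"})
            return False
        try:
            # shield() keeps the approval future alive when wait_for times out
            return await asyncio.wait_for(
                asyncio.shield(approval), min(interval, remaining)
            )
        except asyncio.TimeoutError:
            await send({"type": "pending"})
```

Treating an expired deadline as a denial keeps the default safe: if nobody clicks "Approve," nothing runs.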
Additionally, handling Multimodal Feedback was tricky. Simply telling the user "I took a screenshot" isn't helpful if they are on their phone and the file is on their Mac. Integrating DigitalOcean Spaces to act as a "Cloud Clipboard" was the breakthrough that made the remote experience actually usable.
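The "Cloud Clipboard" flow might look roughly like the sketch below: upload the screenshot, then hand back a time-limited link. It assumes `client` behaves like a boto3 S3 client pointed at a Spaces endpoint. The function names, the key naming scheme, and the 300-second default expiry are hypothetical choices, not details from the project.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def screenshot_key(now: Optional[datetime] = None, prefix: str = "screenshots") -> str:
    """Build a timestamped object key (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}/{now.strftime('%Y%m%dT%H%M%SZ')}.png"


def share_screenshot(
    client: Any, bucket: str, local_path: str, key: str, expires_in: int = 300
) -> str:
    """Upload a file and return a time-limited GET URL for it.

    `client` is expected to behave like a boto3 S3 client, for example one
    created with boto3.client("s3", endpoint_url=<your Spaces endpoint>, ...).
    """
    client.upload_file(local_path, bucket, key, ExtraArgs={"ContentType": "image/png"})
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Taking the client as an argument keeps credentials and endpoint configuration out of the helper, and lets the upload logic be exercised with a stub in tests.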
Accomplishments that we're proud of
- Zero-Trust Execution: Successfully implemented a "Human-in-the-loop" workflow where the AI cannot bypass the local user's permission.
- Full-Stack Integration: Effectively combined Gradient (AI), Spaces (Storage), and App Platform (Hosting) into a single cohesive product.
- Cross-Device Flow: The feeling of typing a command on a phone and seeing a physical popup appear on a laptop thousands of miles away is incredibly satisfying.
What we learned
I learned that the Model Context Protocol (MCP) is a game-changer for the "Internet of Actions." It allows us to stop building "wrappers" and start building "bridges." I also gained a deep appreciation for the DigitalOcean Gradient ecosystem, specifically how it simplifies the deployment of complex, agentic workflows that would typically require a massive DevOps team to manage.
What's next for ExecutiveButler
- Fine-Tuning: Hosting a specialized "System-Control" model on a DigitalOcean GPU Droplet to reduce the token cost of long system-log summaries.
- Voice Integration: Using the fal.ai multimodal models via Gradient to allow for "Voice-to-Action" commands.
- Proactive Monitoring: Teaching the butler to "watch" the screen for security threats (like unauthorized logins) and send an emergency "Block Action" request to the user's phone.