Inspiration

The inspiration came from a common fear in the AI community: uncontrolled autonomy. As AI agents become more capable of "Computer Use," the thought of a bot clicking buttons on your screen while you aren't looking is terrifying.

I wanted to build an agent that felt like a high-end butler: someone with the keys to the house (your Mac) who wouldn't dream of opening a door without a subtle nod of approval. Seeing the potential of the DigitalOcean Gradient platform, I realized I could host the "Brain" in a secure, high-performance cloud environment while keeping the "Hands" local and strictly regulated.

What it does

ExecutiveButler is a hybrid AI agent.

  1. Remote Reasoning: You chat with a Llama 3 / DeepSeek model hosted on DigitalOcean Gradient from any device (web or mobile).
  2. Local Execution: The agent can perform real macOS system actions: listing files, taking screenshots, opening tabs, or summarizing documents.
  3. Safety First: Before any action is executed, a native macOS popup appears on your computer. The action only proceeds if you manually click "Approve."
  4. Cloud Memory: If the agent takes a screenshot for you, it doesn't just sit on your hard drive; it is uploaded to a DigitalOcean Space, allowing you to view it instantly on your mobile device via a secure, time-limited URL.
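
The "Safety First" step can be pictured as a small gate that sits between the agent's request and the local tool. The sketch below is illustrative only and is not the project's actual code: `ActionRequest`, `run_with_approval`, and `tk_ask_user` are hypothetical names, and the Tkinter prompt is one plausible way to show the Approve dialog.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ActionRequest:
    """One action the remote agent wants to run locally (hypothetical shape)."""
    tool: str      # e.g. "take_screenshot"
    summary: str   # human-readable text shown in the approval prompt


def run_with_approval(
    request: ActionRequest,
    tools: Dict[str, Callable[[], str]],
    ask_user: Callable[[ActionRequest], bool],
) -> str:
    """Run the requested tool only if it is known AND the user approves."""
    handler = tools.get(request.tool)
    if handler is None:
        return f"rejected: unknown tool {request.tool!r}"
    if not ask_user(request):
        return f"denied: user declined {request.tool!r}"
    return handler()


def tk_ask_user(request: ActionRequest) -> bool:
    """Show a yes/no dialog; assumed stand-in for the real approval popup."""
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.withdraw()  # hide the empty main window, show only the dialog
    try:
        return messagebox.askyesno("ExecutiveButler", f"Approve: {request.summary}?")
    finally:
        root.destroy()
```

Passing the prompt in as `ask_user` keeps the gate independent of the UI toolkit, so the same check works with a dialog on the Mac or a stub in tests.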

How we built it

The project uses a "Split-Brain" architecture:

  • The Cloud Brain: Built using the DigitalOcean Gradient Agent Development Kit (ADK). This handles the intent recognition and decides which tools to call.
  • The Communication Bridge: We used the Model Context Protocol (MCP) to standardize the way the Cloud Agent talks to the local machine.
  • The Desktop Client: A Python-based local server that listens for commands. It uses AppleScript for macOS system control and Tkinter for the real-time approval UI.
  • Storage: DigitalOcean Spaces serves as our multimodal memory, storing screenshots using the boto3 SDK to provide a seamless feedback loop from Desktop to Mobile.
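
As a rough illustration of the Desktop Client, here is one way a Python process might turn a tool name into an AppleScript call through the macOS `osascript` command. The tool names and script templates are assumptions made for this sketch, not the project's real tool set.

```python
import subprocess
from typing import List

# Hypothetical allowlist: tool name -> AppleScript template.
SCRIPT_TEMPLATES = {
    "open_url": "open location {arg}",
    "list_desktop": 'tell application "Finder" to get name of every item of desktop',
}


def applescript_quote(text: str) -> str:
    """Wrap text as an AppleScript string literal, escaping \\ and \"."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_command(tool: str, argument: str = "") -> List[str]:
    """Build the osascript argv for an allowlisted tool (KeyError otherwise)."""
    template = SCRIPT_TEMPLATES[tool]
    script = template.format(arg=applescript_quote(argument))
    return ["osascript", "-e", script]


def run_tool(tool: str, argument: str = "") -> str:
    """Execute the tool on macOS and return its trimmed standard output."""
    result = subprocess.run(
        build_command(tool, argument),
        capture_output=True,
        text=True,
        timeout=30,
        check=True,  # raise if osascript exits with an error
    )
    return result.stdout.strip()
```

Keeping the scripts in a fixed table means the cloud agent can only name a tool and supply an argument; it never sends raw AppleScript to the machine.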

Challenges we ran into

The biggest hurdle was Network Latency and Connectivity. Bridging a cloud-hosted agent to a local machine behind a residential firewall required a robust WebSocket bridge to ensure the Agent didn't time out while waiting for a human to click the "Approve" button.
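
One common way to avoid that kind of timeout is to send periodic heartbeats while the approval is pending. The sketch below shows the idea with plain `asyncio`; it is an assumption about how such a wait could be structured, not the bridge's actual implementation, and `send_heartbeat` stands in for whatever keep-alive message the WebSocket layer would send.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_approval(
    decision: "asyncio.Future[bool]",
    send_heartbeat: Callable[[], Awaitable[None]],
    heartbeat_every: float = 5.0,
    give_up_after: float = 120.0,
) -> bool:
    """Wait for the user's decision, sending heartbeats in the meantime.

    Returns the decision, or False (treated as a denial) if nothing
    arrives before give_up_after seconds have passed.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + give_up_after
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False
        try:
            # shield() keeps the pending decision alive when this wait times out.
            return await asyncio.wait_for(
                asyncio.shield(decision),
                timeout=min(heartbeat_every, remaining),
            )
        except asyncio.TimeoutError:
            await send_heartbeat()
```

Defaulting to a denial when the deadline passes keeps the failure mode safe: an unanswered prompt never turns into an executed action.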

Additionally, handling Multimodal Feedback was tricky. Simply telling the user "I took a screenshot" isn't helpful if they are on their phone and the file is on their Mac. Integrating DigitalOcean Spaces to act as a "Cloud Clipboard" was the breakthrough that made the remote experience actually usable.
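
The "Cloud Clipboard" idea can be sketched with boto3, since Spaces exposes an S3-compatible API. This is a minimal sketch under assumptions: the key layout, function names, and the 10-minute expiry are illustrative, and credentials are assumed to come from the usual `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables set to a Spaces key pair.

```python
from datetime import datetime, timezone
from pathlib import Path


def screenshot_key(local_path: str, now: datetime) -> str:
    """Build a dated object key, e.g. 'screenshots/2024/05/01/120000-shot.png'."""
    return f"screenshots/{now:%Y/%m/%d/%H%M%S}-{Path(local_path).name}"


def upload_and_share(local_path: str, bucket: str, region: str, expires_in: int = 600) -> str:
    """Upload a file to a Space and return a time-limited download URL."""
    import boto3  # third-party; imported here so screenshot_key has no dependency

    client = boto3.client(
        "s3",
        region_name=region,
        endpoint_url=f"https://{region}.digitaloceanspaces.com",
    )
    key = screenshot_key(local_path, datetime.now(timezone.utc))
    client.upload_file(local_path, bucket, key)
    # The presigned URL grants read access to this one object until it expires.
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

The agent can then return the URL in its chat reply, so the screenshot opens on the phone without making the object itself public.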

Accomplishments that we're proud of

  • Zero-Trust Execution: Successfully implemented a "Human-in-the-loop" workflow where the AI cannot bypass the local user's permission.
  • Full-Stack Integration: Effectively combined Gradient (AI), Spaces (Storage), and App Platform (Hosting) into a single cohesive product.
  • Cross-Device Flow: The feeling of typing a command on a phone and seeing an approval popup appear on a laptop thousands of miles away is incredibly satisfying.

What we learned

I learned that the Model Context Protocol (MCP) is a game-changer for the "Internet of Actions." It allows us to stop building "wrappers" and start building "bridges." I also gained a deep appreciation for the DigitalOcean Gradient ecosystem, specifically how it simplifies the deployment of complex, agentic workflows that would typically require a massive DevOps team to manage.

What's next for ExecutiveButler

  1. Fine-Tuning: Hosting a specialized "System-Control" model on a DigitalOcean GPU Droplet to reduce the token cost of long system-log summaries.
  2. Voice Integration: Using the fal.ai multimodal models via Gradient to allow for "Voice-to-Action" commands.
  3. Proactive Monitoring: Teaching the butler to "watch" the screen for security threats (like unauthorized logins) and send an emergency "Block Action" request to the user's phone.
