Inspiration

Maes grew out of a desire to make technology more inclusive and usable for people with disabilities. Our primary goal was to build a tool that lets people with limited mobility control their computer applications using only their voice.

What it does

Maes is a voice-controlled assistant that allows users to interact with applications on their computer using simple voice commands. It listens for voice input, processes the command, and interacts with the operating system to perform tasks such as opening apps, navigating websites, and even simulating mouse movements.
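As a rough illustration of the "process the command" step, the sketch below turns a transcript into a structured action. The phrases, the `Action` type, and the `parseCommand` name are assumptions made for this example; they are not Maes's actual command set.

```typescript
// Hypothetical action shapes; the real Maes commands may differ.
type Action =
  | { kind: "open-app"; target: string }
  | { kind: "open-url"; target: string }
  | { kind: "move-cursor"; direction: "up" | "down" | "left" | "right" }
  | { kind: "unknown"; transcript: string };

function parseCommand(transcript: string): Action {
  // Normalize: lowercase, drop trailing punctuation, collapse whitespace.
  const text = transcript
    .toLowerCase()
    .trim()
    .replace(/[.,!?]+$/, "")
    .replace(/\s+/g, " ");

  // "go to example.com" -> open a website.
  const url = text.match(/^(?:go to|visit|navigate to) (\S+)$/);
  if (url) {
    const site = url[1];
    const target = /^https?:\/\//.test(site) ? site : `https://${site}`;
    return { kind: "open-url", target };
  }

  // "move the cursor left" -> simulated mouse movement.
  const move = text.match(/^move (?:the )?(?:mouse|cursor) (up|down|left|right)$/);
  if (move) {
    return {
      kind: "move-cursor",
      direction: move[1] as "up" | "down" | "left" | "right",
    };
  }

  // "open spotify" -> launch an application.
  const open = text.match(/^(?:open|launch|start) (.+)$/);
  if (open) {
    return { kind: "open-app", target: open[1] };
  }

  // Anything else is passed back so the caller can ask the user to repeat.
  return { kind: "unknown", transcript };
}
```

Returning an explicit "unknown" action, rather than guessing, gives the rest of the app a clear point at which to ask the user to repeat the command.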

How we built it

We built Maes using a combination of technologies, including Electron.js for the frontend, Node.js for the backend, and various speech recognition APIs to process voice commands. The backend was built to handle voice recognition and trigger actions on the user's computer based on pre-defined commands.

  1. Electron.js: The core framework for the desktop app, letting us build the interface with web technologies.
  2. Node.js: Used for backend functionality and communication between the frontend and operating system.
  3. Speech Recognition API: We used speech recognition libraries to process the voice commands and convert them into executable actions.
  4. AI Cursor: We also implemented an AI cursor, which shows what the system is trying to do, enhancing the visual feedback for the user.
  5. Eleven Labs: We integrated Eleven Labs' advanced AI and speech synthesis technology to improve the voice commands, making Maes more interactive and responsive.
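To give a sense of how a Node.js backend can hand a recognized command to the operating system, here is a minimal sketch that builds the platform-specific command for launching an app by name. The `buildOpenCommand` helper is hypothetical, and the choice of `open -a` on macOS and `cmd /c start` on Windows is an assumption about one reasonable approach, not a description of what Maes actually runs.

```typescript
// Describes a program to run and the arguments to pass to it.
interface LaunchCommand {
  file: string;
  args: string[];
}

// Build the command that launches an application by name.
// `platform` is expected to be a value of Node's process.platform.
function buildOpenCommand(appName: string, platform: string): LaunchCommand {
  switch (platform) {
    case "darwin":
      // macOS: `open -a <AppName>` launches an installed application.
      return { file: "open", args: ["-a", appName] };
    case "win32":
      // Windows: `start` is a cmd built-in; the empty string is the
      // window title argument that `start` expects before the target.
      return { file: "cmd", args: ["/c", "start", "", appName] };
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

In a Node.js process, the result could be passed to `execFile` from `node:child_process`, for example `execFile(cmd.file, cmd.args, callback)`, with `process.platform` supplying the second argument to `buildOpenCommand`. Passing arguments as an array rather than a single shell string avoids interpolating spoken text directly into a shell command.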

Challenges we ran into

While building Maes, we encountered several challenges:

  1. Speech Recognition Accuracy: Achieving high accuracy with speech recognition was difficult, especially in noisy environments. We had to tune how we used the speech recognition API and make sure the system could differentiate between similar-sounding commands.
  2. Cross-platform compatibility: Ensuring that Maes would work seamlessly across both macOS and Windows was challenging, as the voice recognition APIs differ between platforms. This required significant adjustments to ensure consistent performance.
  3. Simulating mouse actions: Making sure the AI cursor could appear freely across different app boundaries was tricky. It took some time to figure out how to display it without being confined to the main app window.
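One way to cope with slightly misheard transcripts, and to notice when two commands are too close to tell apart, is fuzzy matching against the list of known commands. The sketch below uses Levenshtein edit distance on the transcript text. This is an assumed technique for illustration, not necessarily what Maes does, and text edit distance is only a rough proxy for how alike two phrases sound. The function names and the default threshold are hypothetical.

```typescript
// Classic Levenshtein edit distance, keeping only two rows of the table.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the single closest known command, or null when the transcript is
// too far from every command or when two commands tie for closest.
function matchCommand(
  heard: string,
  known: string[],
  maxDistance = 2
): string | null {
  const text = heard.toLowerCase().trim();
  let best: string | null = null;
  let bestDistance = Infinity;
  let tied = false;

  for (const candidate of known) {
    const d = levenshtein(text, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
      tied = false;
    } else if (d === bestDistance) {
      tied = true;
    }
  }

  if (tied || bestDistance > maxDistance) return null;
  return best;
}
```

Treating a tie as "no match" is a deliberate choice here: when a transcript is equally close to two commands, asking the user to repeat is safer than running the wrong action.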

Accomplishments that we're proud of

  • Voice Control: We are proud to have created a functional voice-controlled assistant that can perform a wide range of tasks, improving accessibility for users with disabilities.
  • Cross-Platform Compatibility: We successfully made Maes work across both macOS and Windows, ensuring that it can be used by a broader audience.
  • AI Cursor: We developed an AI cursor that can float freely across the screen, providing users with a real-time indication of what the system is trying to do.
  • Integration of Eleven Labs: Leveraging Eleven Labs' technology for speech recognition and AI synthesis was a key accomplishment in making the voice assistant more interactive and accurate.
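For readers curious how a cursor can float outside the main app window in Electron, a common approach is a second, transparent, frameless, always-on-top window that covers the display and lets clicks pass through. The sketch below builds the options for such a window. It is an assumption about one workable setup, not a record of how Maes implements its AI cursor; `overlayWindowOptions` and `cursor.html` are hypothetical names.

```typescript
// Minimal display rectangle (same fields as Electron's Rectangle type).
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Options for a hypothetical overlay window that spans the given display.
function overlayWindowOptions(bounds: Bounds) {
  return {
    ...bounds, // cover the whole display
    transparent: true, // only the drawn cursor is visible
    frame: false, // no title bar or window borders
    alwaysOnTop: true, // stay above other applications
    skipTaskbar: true, // do not show up as a separate app window
    focusable: false, // never steal keyboard focus from the user
    hasShadow: false,
    resizable: false,
  };
}

// In an Electron main process this might be used roughly as follows:
//
//   import { app, BrowserWindow, screen } from "electron";
//
//   app.whenReady().then(() => {
//     const bounds = screen.getPrimaryDisplay().bounds;
//     const overlay = new BrowserWindow(overlayWindowOptions(bounds));
//     overlay.setIgnoreMouseEvents(true);           // clicks reach apps below
//     overlay.setAlwaysOnTop(true, "screen-saver"); // raise above most windows
//     overlay.loadFile("cursor.html");              // hypothetical cursor page
//   });
```

Keeping the overlay in its own window means the main Maes window can be minimized or hidden while the cursor indicator stays visible.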

What we learned

  • We learned the importance of testing with real users to ensure that speech recognition and voice commands work effectively in various environments.
  • We discovered how challenging it can be to build a system that interacts with operating systems at a low level, especially when dealing with multiple platforms.
  • The project taught us how to integrate speech recognition, AI, and system-level interactions to create a cohesive user experience.

What's next for Maes

The future of AI, especially in accessibility, is bright. We hope to expand Maes further, adding more advanced voice commands, enhanced AI-driven features, and more intuitive user interfaces. We also plan to integrate machine learning for better voice recognition accuracy and responsiveness, building on the work done by Eleven Labs to create a more intelligent system.
