InSightAI | Devpost

Inspiration:

Our inspiration stemmed from the realization that the pinnacle of innovation occurs at the intersection of deep curiosity and an expansive space to explore one's imagination. Recognizing the barriers faced by learners—particularly their inability to gain real-time, personalized, and contextualized support—we envisioned a solution that would empower anyone, anywhere to seamlessly pursue their inherent curiosity and desire to learn.

What it does:

Our platform is a revolutionary step forward in the realm of AI-assisted learning. It integrates advanced AI technologies with intuitive human-computer interactions to enhance the context a generative AI model can work within. By analyzing screen content—be it text, graphics, or diagrams—and amalgamating it with the user's audio explanation, our platform grasps a nuanced understanding of the user's specific pain points. Imagine a learner pointing at a perplexing diagram while voicing out their doubts; our system swiftly responds by offering immediate clarifications, both verbally and with on-screen annotations.

How we built it:

We architected a Flask-based backend, creating RESTful APIs to seamlessly interface with user input and machine learning models. Integration of Google's Speech-to-Text enabled the transcription of users' learning preferences, and the incorporation of the Mathpix API facilitated image content extraction. Harnessing the prowess of the GPT-4 model, we've been able to produce contextually rich textual and audio feedback based on captured screen content and stored user data. For frontend fluidity, audio responses were encoded into base64 format, ensuring efficient playback without unnecessary re-renders.

Challenges we ran into:

Scaling the model to accommodate diverse learning scenarios, especially in the broad fields of maths and chemistry, was a notable challenge. Ensuring the accuracy of content extraction and effectively translating that into meaningful AI feedback required meticulous fine-tuning.

Accomplishments that we're proud of:

Successfully building a digital platform that not only deciphers image and audio content but also produces high utility, real-time feedback stands out as a paramount achievement. This platform has the potential to revolutionize how learners interact with digital content, breaking down barriers of confusion in real-time. One of the aspects of our implementation that separates us from other approaches is that we allow the user to perform ICL (In Context Learning), a feature that not many large language models don't allow the user to do seamlessly.

What we learned:

We learned the immense value of integrating multiple AI technologies for a holistic user experience. The project also reinforced the importance of continuous feedback loops in learning and the transformative potential of merging generative AI models with real-time user input.

Built With

Submitted to

HackMIT 2023
- Winner Arrowstreet Capital - Best Generative AI Data Analysis Hack

Created by

Model + Pipeline development

Eidan Erlich
In the backend, I designed RESTful APIs within a Flask application that interfaces seamlessly with user input and machine learning models. I integrated Google's Speech-to-Text during an onboarding process to transcribe users' learning preferences from audio files, while also incorporating the Mathpix API for image content extraction. To bridge user interactions, I employed the GPT-4 model, generating context-rich textual and audio responses based on image content and stored user data. Enhancing frontend interactivity, I encoded audio responses into base64 format, allowing for streamlined playback in the React framework without unwanted re-renders.

Elijah Umana
Full-Stack Software & Machine Learning engineer leveraging AI for smarter solutions. Thrives on innovation and continuous learning
Full-Stack development.
Integrated routing and inference using React and Flask. Designed the architecture of the project, focusing on a scalable and resilient solution. Used prior Deep Learning experience to optimize the solution, leveraged Mathpix, Google cloud, TensorFlow and LangChain.

Victor Samsonov