Inspiration

We were inspired by the 3Blue1Brown YouTube channel, which explains advanced math topics with elegant animations. We felt that tailoring this easy-to-understand format to every student's needs would greatly improve how people learn.

What it does

Clulus lives in your browser. When you request a hint, it pulls the HTML contents of your page and sends them to Gemini, all without needing to click a button. From Gemini's output, Clulus creates a text hint plus a video and audio explanation. All of this can be accessed directly on the same page, so you never have to screenshot your progress and send it to ChatGPT.

How we built it

We built Clulus' frontend using React, which reaches the video generation functionality through a Flask server. When you hover over the tooltip, we create a screenshot by parsing the HTML on your screen and send it to Gemini to process. Gemini's results are passed to our Flask backend, which creates a video using Manim, the same library that 3Blue1Brown created for his videos. The results are also sent to the ElevenLabs API to create realistic-sounding narration that explains the problem.

Challenges we ran into

In the beginning, we had trouble narrowing the scope of our project idea. Once we settled on the current idea, we had difficulty transferring large files, such as screenshots and entire video animations, between the frontend and backend. We compensated for the long video generation time with some frontend tricks that make the wait feel shorter.

Accomplishments that we're proud of

Our proudest accomplishment was building a fully functional, end-to-end pipeline that transforms a simple math question into a dynamic, animated video explanation. Our pipeline can take a screenshot query, convert it into a concise problem statement, intelligently break the problem down, and generate a precise JSON payload that scripts a complete scene in Manim.
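A minimal sketch of what a scene-scripting JSON payload and its validation could look like. The field names (`title`, `steps`, `kind`, `expression`, `x_range`) and the `parse_scene` helper are assumptions made for illustration, not the exact schema Clulus uses, and the sketch stops short of the actual Manim rendering.

```python
import json
from dataclasses import dataclass

# Hypothetical payload shape; the real Clulus schema is not shown in this write-up.
EXAMPLE_PAYLOAD = """
{
  "title": "Derivative of x^2",
  "steps": [
    {"kind": "text", "content": "Apply the power rule."},
    {"kind": "plot", "expression": "x**2", "x_range": [-3, 3]}
  ]
}
"""

# Fields each (assumed) step kind must carry before it can be rendered.
REQUIRED_FIELDS = {
    "text": {"content"},
    "plot": {"expression", "x_range"},
}


@dataclass
class Step:
    kind: str
    params: dict


@dataclass
class ScenePlan:
    title: str
    steps: list


def parse_scene(raw: str) -> ScenePlan:
    """Parse the model's JSON and reject steps the renderer could not handle."""
    data = json.loads(raw)
    steps = []
    for i, item in enumerate(data["steps"]):
        kind = item.get("kind")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"step {i}: unknown kind {kind!r}")
        missing = REQUIRED_FIELDS[kind] - set(item)
        if missing:
            raise ValueError(f"step {i}: missing {sorted(missing)}")
        params = {k: v for k, v in item.items() if k != "kind"}
        steps.append(Step(kind, params))
    return ScenePlan(data["title"], steps)


plan = parse_scene(EXAMPLE_PAYLOAD)
for step in plan.steps:
    print(step.kind, step.params)
```

Validating the payload before rendering means a malformed step fails fast with a readable error instead of partway through a long video render.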

The agent can solve and visualize a wide range of problems, from calculus (derivatives and integrals) to algebra (finding roots) and geometry (plotting shapes). It can also dynamically generate multi-function plots, such as showing a function and its derivative on the same axes to create a powerful visual learning moment. Lastly, it can render geometric scenes, correctly interpreting when a problem requires drawing shapes instead of plotting a function.
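To illustrate the function-plus-derivative plot, here is a small sketch that samples a function and a numerical estimate of its derivative over the same x values, so both curves can share one set of axes. The helper name `sample_with_derivative` and the central-difference approach are assumptions for illustration; the agent may derive the derivative differently (for example, symbolically).

```python
def sample_with_derivative(f, x_min, x_max, n=50, h=1e-5):
    """Return (x, f(x)) and (x, f'(x)) point lists over the same x values.

    The derivative is approximated with a central difference,
    (f(x + h) - f(x - h)) / (2 * h), which is an assumption of this sketch.
    """
    xs = [x_min + i * (x_max - x_min) / (n - 1) for i in range(n)]
    f_points = [(x, f(x)) for x in xs]
    df_points = [(x, (f(x + h) - f(x - h)) / (2 * h)) for x in xs]
    return f_points, df_points


# Example: f(x) = x^3, whose derivative is 3x^2.
f_points, df_points = sample_with_derivative(lambda x: x ** 3, -2.0, 2.0, n=5)
for (x, y), (_, dy) in zip(f_points, df_points):
    print(f"x={x:+.1f}  f(x)={y:+.3f}  f'(x)~{dy:+.3f}")
```

Both point lists can then be handed to whatever plotting layer draws the curves, with the two series in different colors on the same axes.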

What we learned

This hackathon was a deep dive into the art and science of advanced prompt engineering and the power of structured outputs. Our key takeaway was the incredible potential of tool calling. Initially, we struggled to get consistent JSON from the model, but by reframing the task as the AI "calling a function" with specific arguments, we achieved a near-perfect level of reliability. In particular, we benefited greatly from tweaking system instructions, refining schema definitions, and providing few-shot examples.
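The "calling a function" framing can be sketched as follows: the task is described as a tool with a JSON-Schema-style parameter declaration, and whatever arguments the model returns are checked against that declaration before use. The tool name `render_scene`, its parameters, and the `validate_call` helper are hypothetical; this is a simplified illustration, not the exact declaration we sent to Gemini or any specific SDK's format.

```python
# Hypothetical tool declaration in a JSON-Schema-like style.
RENDER_SCENE_TOOL = {
    "name": "render_scene",
    "description": "Render a short animated explanation of a math problem.",
    "parameters": {
        "type": "object",
        "properties": {
            "problem": {"type": "string"},
            "topic": {
                "type": "string",
                "enum": ["calculus", "algebra", "geometry"],
            },
            "expressions": {"type": "array"},
        },
        "required": ["problem", "topic"],
    },
}

# Map the schema's type names to Python types for a basic check.
_PY_TYPES = {"string": str, "array": list, "object": dict, "number": (int, float)}


def validate_call(tool: dict, call: dict) -> list:
    """Return a list of problems with a model-produced call (empty if valid)."""
    errors = []
    if call.get("name") != tool["name"]:
        errors.append(f"unexpected tool: {call.get('name')!r}")
    args = call.get("args", {})
    schema = tool["parameters"]
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected argument: {key}")
        elif not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {key}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors


good = {"name": "render_scene",
        "args": {"problem": "Differentiate x^2", "topic": "calculus"}}
bad = {"name": "render_scene", "args": {"topic": "poetry"}}
print(validate_call(RENDER_SCENE_TOOL, good))
print(validate_call(RENDER_SCENE_TOOL, bad))
```

Because the arguments are constrained by a schema instead of free-form text, a bad response shows up as a short list of concrete errors that can trigger a retry.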

What's next for Clulus

Our immediate goal is to refine and package Clulus as a publicly available extension on the Chrome Web Store. We want students and educators to be able to use this tool to supplement their learning and teaching right from their browser. The next major step is to create an "open line of dialogue" with the agent, allowing users to ask follow-up questions, request different visualizations, or ask for a simpler explanation, with the agent generating new video clips and audio narration on the fly.
