Inspiration
The inspiration for our project, LectureGen, comes from our experience trying to learn about a new domain through papers. These papers, intentionally or not, often put up a barrier that keeps non-experts from understanding what's going on. But whenever our professors turn a paper into a lecture for journal club, the material becomes quite approachable and even inspires us to learn more. With LectureGen, people can digest any paper they want to read as an easy-to-understand YouTube lecture, and since it's on YouTube, that lecture can help everyone else grappling with the same paper! Our aim is to lower the barrier for diving into new fields, so people can learn what they want in the way they learn best.
What it does
In LectureGen, you pick a paper and we take care of the rest to produce your desired lecture. If you watch the YouTube video on our application, you have the option to interrupt the lecture to get clarification on whatever you need! Whether you prefer to work through videos at your own pace or want a little extra help while you're watching, LectureGen has you covered.
How we built it
We first process the PDF with Gemini's strong vision capabilities so that unnecessary parts of the paper don't end up in the lecture content. The lecture itself is synthesized with o1-preview for the highest quality we could get, and then another model creates the concise slides along with the in-depth speaker notes that are read aloud with Cartesia's text-to-speech. Having all three modes of absorbing the information really helps keep you focused and helps you understand the material better, just like a professor's lecture does! If you watch the video on our platform, you can interrupt the lecture to talk to our empathetic voice agent powered by Hume, whose context is seeded by our RAG pipeline over the paper using Chroma.
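For the curious, here's a minimal sketch of how a couple of these pieces could fit together in Python. The model names, prompts, and helper functions are illustrative assumptions rather than our exact code:

```python
# Illustrative sketch only: prompts and names are assumptions, not our exact code.
import chromadb
from openai import OpenAI

client = OpenAI()

def synthesize_lecture(paper_text: str) -> str:
    """Turn the Gemini-extracted paper text into a lecture script."""
    # o1-preview takes everything in a single user message (no system role),
    # which is part of why its prompting differs from other LLMs.
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{
            "role": "user",
            "content": "Write a clear, journal-club-style lecture script "
                       "for this paper:\n\n" + paper_text,
        }],
    )
    return response.choices[0].message.content

def build_paper_index(chunks: list[str]):
    """Index paper chunks in Chroma for the voice agent's RAG context."""
    collection = chromadb.Client().create_collection("paper")
    collection.add(
        documents=chunks,
        ids=[f"chunk-{i}" for i in range(len(chunks))],
    )
    return collection

def retrieve_context(collection, question: str, k: int = 4) -> str:
    """Fetch the k most relevant chunks to seed the Hume agent with."""
    results = collection.query(query_texts=[question], n_results=k)
    return "\n\n".join(results["documents"][0])
```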
Challenges we ran into
- YouTube doesn't allow you to upload a video programmatically without the uploading account completing an OAuth flow. We found a reasonable tradeoff: an admin panel that approves the videos a user has generated before uploading them to our own YouTube channel (see the upload sketch after this list).
- o1-preview doesn't natively support structured outputs and requires different prompting from other LLMs, so we worked around that by using smaller models for the parts it couldn't do on its own (see the structuring sketch after this list).
- Integrating everything! We used many different technologies, and connecting them all took longer than expected.
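Here's roughly what the approval-then-upload step looks like with the YouTube Data API via google-api-python-client. The function and variable names are assumptions; the OAuth credentials belong to our own channel, which is what makes the admin-panel tradeoff work:

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

def upload_approved_video(credentials, path, title, description):
    """Upload a video the admin panel has approved to our YouTube channel."""
    youtube = build("youtube", "v3", credentials=credentials)
    request = youtube.videos().insert(
        part="snippet,status",
        body={
            "snippet": {"title": title, "description": description},
            "status": {"privacyStatus": "public"},
        },
        # Resumable uploads are the recommended way to send video-sized files.
        media_body=MediaFileUpload(path, resumable=True),
    )
    response = request.execute()
    return response["id"]  # the new YouTube video ID
```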
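And a sketch of the structured-outputs workaround: o1-preview writes the lecture as free text, and a smaller model that does support structured outputs reshapes it into a slide schema. The schema and the choice of gpt-4o-mini here are illustrative assumptions:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Slide(BaseModel):
    title: str
    bullets: list[str]
    speaker_notes: str  # in-depth notes later read aloud via text-to-speech

class SlideDeck(BaseModel):
    slides: list[Slide]

def structure_lecture(lecture_text: str) -> SlideDeck:
    """Have a smaller model split the free-form lecture into typed slides."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # assumed; any model with structured outputs works
        messages=[
            {"role": "system",
             "content": "Split this lecture into concise slides with speaker notes."},
            {"role": "user", "content": lecture_text},
        ],
        response_format=SlideDeck,
    )
    return completion.choices[0].message.parsed
```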
What's next for LectureGen
- We want auto-replies on the YouTube channel for each video based on what's happening on our platform. We think this could create a whole new type of interaction on YouTube, where people can learn whatever they want while also getting immediate feedback, all in public. Lots of these LLM tools silo the knowledge people gain, which is worse for everyone, and we'd like to help change that.
- Dynamically change the lectures as we learn more about the user and what piques their interest in the lecture. This is how we've seen good professors run lectures, and it's something we'd love to build.