Inspiration

Suppose a learner asks a chatbot to explain DFS, or depth-first search. The chatbot replies with the same explanation and code found in a textbook. A short animation explaining DFS would help the learner grasp it faster. So we can all agree that a chatbot that replies to user queries with images and animations can be far more learner-friendly.

Current generative models can produce artistic images and videos, but if you ask them to draw a binary tree with 7 nodes, they fail to produce quality output. In this scenario, generating an algorithm animation with a generalized or pre-trained model is a far-fetched dream. So a dataset of curated images or animations in a specific field (in this case, CS topics) is a viable way to elevate the performance of GenAI models.

Here comes Code Tutor, featuring a multimodal chatbot that replies to user queries on CS topics with appropriate animations. Alongside the chatbot, our platform also offers tools such as a quiz builder, a suggester, and a lecture builder for a seamless CS learning experience.

What it does

Code Tutor provides a pool of different tools to help learners learn CS topics in a more efficient, streamlined, and modular way.

A multimodal chatbot that uses our own dataset as a knowledge base

Replies to user queries on different CS topics with animations. We created a dataset of animations covering a broad range of CS topics.

Teach Your Own Chatbot

There is a concept called "learning through teaching". This tool creates a student bot, a professor bot, and a lecture plan. Learners can set the topic and difficulty of that lecture plan. The student bot then asks the learner a series of questions. If the learner can't answer correctly, they can ask the professor bot to clear up the confusion.

A Quiz Builder

Replies with AI-generated, customizable quizzes. Learners can set the quiz difficulty and the number of questions.

A Suggester

Replies with suggestions. Learners can ask for suggestions such as: i) coding problems (Codeforces, CodeChef, LeetCode), ii) YouTube videos, and iii) blogs and articles on a specific topic.

How we built it

At first, we created an animation dataset covering various CS topics. To generate this dataset, we prompted GPT-4o to produce Python code that creates animations/GIFs. We then manually scrutinized the Python code and rendered the GIFs.

Then we vectorized the dataset and used it as the knowledge base for the multimodal chatbot. The chatbot generates its text responses by prompting conversational models.
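The retrieval step over the vectorized dataset can be sketched roughly as follows. This is a minimal illustration, assuming each animation entry already has a precomputed embedding vector; the embedding model and vector store are omitted, and all names here are hypothetical, not our actual code.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_animation(query_vec, knowledge_base):
    # knowledge_base: list of (gif_path, embedding) pairs.
    # Return the GIF whose embedding is closest to the query embedding.
    best_path, best_score = None, -1.0
    for gif_path, emb in knowledge_base:
        score = cosine_similarity(query_vec, emb)
        if score > best_score:
            best_path, best_score = gif_path, score
    return best_path

# Toy example with 3-dimensional "embeddings":
kb = [("dfs.gif", np.array([1.0, 0.0, 0.0])),
      ("bfs.gif", np.array([0.0, 1.0, 0.0]))]
print(retrieve_animation(np.array([0.9, 0.1, 0.0]), kb))  # dfs.gif
```

In practice a vector store handles the nearest-neighbor search, but the core idea is the same: match the query embedding to the closest stored animation.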

Prompt engineering helped us retrieve suggestion links (coding problems from Codeforces and other coding platforms, YouTube videos, and articles). We then checked each link to confirm it was accessible.

For quizzes and suggestions, responses were retrieved in JSON format using GPT models. User customizations were given as input to the prompt template. Prompt engineering helped us mitigate repetitive and irrelevant content.
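A rough sketch of how user customizations can flow into a prompt template and how the JSON reply can be validated in the backend; the template wording and field names here are illustrative assumptions, not our exact prompts.

```python
import json

QUIZ_PROMPT_TEMPLATE = (
    "Generate a {num_questions}-question multiple-choice quiz on {topic} "
    "at {difficulty} difficulty. Respond ONLY with JSON of the form: "
    '{{"questions": [{{"question": "...", "options": ["..."], "answer": "..."}}]}}'
)

def build_quiz_prompt(topic, difficulty, num_questions):
    # Inject the learner's customizations into the prompt template.
    return QUIZ_PROMPT_TEMPLATE.format(
        topic=topic, difficulty=difficulty, num_questions=num_questions)

def parse_quiz_response(raw_reply):
    # Validate the model's JSON reply; a malformed reply
    # returns None so the caller can retry the request.
    try:
        data = json.loads(raw_reply)
        if not isinstance(data.get("questions"), list):
            return None
        return data
    except json.JSONDecodeError:
        return None

print(build_quiz_prompt("graph traversal", "easy", 5))
```

Validating the JSON before rendering is what lets erroneous model responses be caught and retried server-side instead of reaching the learner.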

Challenges we ran into

How can we leverage LLMs to generate GIFs?

Generative models are good at creating artistic images but bad at drawing a tree with 7 nodes. So directly generating GIFs or animations was far from easy. Instead, we first generated Python code that renders the animation as a GIF, using GPT-4o inference, and then scrutinized that code by hand.
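For illustration, the kind of code we asked GPT-4o to produce looks roughly like this: a hand-written minimal sketch using Pillow (not an actual generated sample from our dataset), animating a bar growing frame by frame and saving the frames as a GIF.

```python
from PIL import Image, ImageDraw

def make_growing_bar_gif(path, frames=10, size=(120, 40)):
    # Render a simple animation: a bar growing left to right,
    # one Pillow image per frame, saved together as a GIF.
    images = []
    for i in range(frames):
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        width = int(size[0] * (i + 1) / frames)
        draw.rectangle([0, 10, width, 30], fill="steelblue")
        images.append(img)
    images[0].save(path, save_all=True, append_images=images[1:],
                   duration=100, loop=0)

make_growing_bar_gif("demo.gif")
```

A real dataset entry (say, a DFS traversal) draws the data structure per step instead of a bar, but the frame-by-frame structure is the same, which is also what made manual scrutiny of the generated code feasible.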

Retrieval or Fine-tuning?

Since we had created our own dataset (Python code that generates animations), we considered fine-tuning an open-source model on it so that the model could generate the Python code without errors. But the idea proved too optimistic: even GPT-4o was failing to generate correct, quality animation code from our prompts.

LLMs are not good enough to search the internet and provide working links

At first, we thought of using Gemini models, assuming they were good at Google search. However, the results were not good enough and were sometimes worse than those of GPT models. The SERP API was also an option, but we faced country restrictions. So we manually check every link suggested by GPT in the backend (for example, i) the response code must be 200, and ii) for YouTube videos, the mqdefault.jpg thumbnail must return 200).
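The backend link check can be sketched like this. The thumbnail trick relies on YouTube serving `mqdefault.jpg` only for valid video IDs; the helper names are ours, and the actual HTTP calls are of course only as reliable as the network.

```python
import urllib.request
import urllib.error

def youtube_thumbnail_url(video_id):
    # YouTube serves mqdefault.jpg only for existing video IDs,
    # so a 200 response implies the video itself exists.
    return f"https://img.youtube.com/vi/{video_id}/mqdefault.jpg"

def link_is_alive(url, timeout=5):
    # Keep a suggested link only if it answers with HTTP 200.
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ValueError):
        return False
```

Filtering suggestions through a check like this before they reach the learner is what turned unreliable model output into dependably working links.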

Accomplishments that we're proud of

1. An animation dataset

At first, we thought GPT-4o was enough to generate correct Python animation code. However, the erroneous code it produced led us to create our own dataset for this project.

2. Quality of AI-generated Content

Erroneous responses are handled in the backend. As a result, we could ensure an effective learning experience for learners.

What we learned

1. If the dataset is small, retrieving similar data points is better than fine-tuning on them.

2. Some useful prompt-engineering techniques (e.g., i) avoiding 'no' words, ii) penalizing the LLM for bad responses).

3. For the backend, we had always used EJS or similar frameworks. FastAPI was a new experience for us, and we enjoyed it.

4. How to use LangChain as an agent and as a retrieval tool.

5. Fine-tuning is not always better. Current models are powerful enough to generate good text, and prompt engineering does the rest.

What's next for Code Tutor

Models are becoming cheaper, more powerful, and more multimodal every day. So our target is to make the tools stronger. Specifically:

1. For the suggester, our next target is to use the SERP API and build a LangChain agent to execute the task.

2. Instead of retrieval-based animation replies, our next target is to gather enough animations to fine-tune a model on our scrutinized Python code, so that the fine-tuned model can generate correct, quality animation code itself.

3. More customizable quizzes and lectures.

4. Currently, we can't publicize the platform's URL due to budget constraints, as we are using personal tokens everywhere. Our next target is to open the interface to everyone.

Team members

1) Sayem Shahad, BUET

2) Labid Al Nahiyan, BUET
