Inspiration

We were inspired by our previous experience using OpenAI's ChatGPT, which led us to build our project on OpenAI.

What it does

The Retrieval-Augmented Generation (RAG) Chatbot is an OpenAI-powered program that answers user questions based on the ICDS ROAR user guide. When the user types a question, the chatbot searches the text loaded from the user guide, retrieves the most relevant passages, and uses OpenAI to generate an answer from them.
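
The retrieve-then-generate loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's LangChain code: bag-of-words vectors and cosine similarity stand in for OpenAI embeddings, the guide snippets are placeholder text rather than quotes from the ROAR user guide, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical. The generation step is left out; the resulting prompt is what would be sent to an OpenAI chat model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved context."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical placeholder snippets standing in for the user guide text.
guide_chunks = [
    "To log in to ROAR, connect to the login node over SSH.",
    "Batch jobs are submitted to the scheduler with a job script.",
    "Storage quotas differ between home, work, and scratch directories.",
]

question = "How do I log in to ROAR?"
top = retrieve(question, guide_chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline, an embedding model and a vector store would replace `embed` and the linear scan in `retrieve`, but the shape of the flow (score every chunk, keep the top k, augment the prompt) stays the same.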

How we built it

First, we saved the text of the ICDS ROAR user guide into a .txt file. Since the user guide included multiple images and diagrams, we used ChatGPT to generate detailed text descriptions of each image and inserted them into the .txt file in place of the images. Then, we built a GUI-based chatbot application with Python's Tkinter module and used the LangChain library to connect to OpenAI for document retrieval and language processing.
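
Before retrieval, a long .txt file is usually split into overlapping chunks so that each piece fits comfortably in a prompt and can be embedded separately. LangChain provides text splitters for this step; the pure-Python function below is only a hypothetical stand-in that illustrates the idea, with `chunk_size` and `overlap` measured in characters and the file name `roar_user_guide.txt` assumed for illustration.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


# In the app, the text would come from the guide file (hypothetical name):
# text = open("roar_user_guide.txt", encoding="utf-8").read()
text = "ROAR user guide placeholder text. " * 40
chunks = split_text(text, chunk_size=200, overlap=20)
print(len(chunks), "chunks")
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, which makes it less likely that retrieval misses it.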

Challenges we ran into

The first challenge was processing the diagrams in the ROAR user guide. We considered loading a .rtf or .pdf file so that the program could read the original images, but we decided it would be more efficient to convert the images to text descriptions. That way, the RAG only has to search textual information instead of processing every image for each new query. Another challenge was

Accomplishments that we're proud of

We are proud of building a RAG without any prior knowledge of RAGs before this hackathon. We are also proud of our perseverance through various challenges and a great deal of debugging.

What we learned

We learned about RAG, a technique that grounds a large language model's answers in a specific knowledge base by retrieving relevant passages and adding them to the prompt. This is particularly useful for enterprises and institutions that want to work with company-specific or private information. In our case, the knowledge base was the ICDS ROAR user guide. It was fascinating to learn the steps involved in building a RAG: loading the text files, transforming the text into vectors, storing them in a database, and choosing a retriever type. Although this project used only one type of retriever and one type of embedding, knowing the alternatives will be useful for future projects and hackathons.

We also learned to use Tkinter to build an interactive GUI.

Later, in an attempt to improve our chatbot, we looked into implementing BM25 to make its responses quicker and more helpful. We found that while it would have sped up responses by only a fraction of a second, it introduced a separate problem related to token usage. Text is measured in tokens, and with the OpenAI model we were using, we had a maximum of 4,000 tokens for five dollars. Going past 4,000 tokens would have cost more, and we felt that wasn't worth such a small improvement.
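
BM25 ranks passages by weighted keyword matches rather than by embedding similarity. For illustration, here is a minimal from-scratch sketch of the Okapi BM25 score; it is not the code the team tried, the tokenizer is a simple lowercase word split, the IDF uses the common "+1 inside the log" variant that keeps weights positive, and `k1 = 1.5` and `b = 0.75` are typical default values rather than tuned ones. The example chunks are placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 scorer over a fixed, non-empty list of text chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        if not docs:
            raise ValueError("docs must be non-empty")
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequencies per chunk
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))  # document frequency
        # IDF variant with +1 inside the log, which keeps every weight positive.
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1) for t, c in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of chunk i for the query."""
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[int]:
        """Indices of the k highest-scoring chunks."""
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


chunks = [
    "Submit batch jobs with a job script.",
    "Log in to ROAR over SSH.",
    "Storage quotas for scratch directories.",
]
bm25 = BM25(chunks)
best = bm25.top_k("how do I submit a batch job", k=1)
print(chunks[best[0]])
```

Because BM25 needs no embedding calls, scoring is cheap; the trade-off is that it only rewards exact word overlap, so many setups combine it with embedding-based retrieval rather than replacing one with the other.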

What's next for RAG Chatbot HackPSU

We plan to redevelop the RAG Chatbot using Streamlit, a Python framework for web app development.
