Inspiration

Having always been a huge fan of choose-your-own-adventure books, I wanted to create an application that sparks creativity from a unique image and delivers an immersive experience where we are the masters of our own choices.

What it does

VisualQuest has the user upload an image; the image is analyzed and a story is generated, with choices for the user to speak aloud and decide. Each choice carries into the next segment until the story concludes.

How we built it

VisualQuest has the user upload an image, which the Llama 3.2 90B Vision model analyzes; the resulting description is passed to a Llama 3.2 3B model that writes the story. Another function is chained to this prompt to rewrite the story and generate choices for the user to decide.
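Roughly, the chain looks like the sketch below. It assumes an OpenAI-compatible endpoint serving the Llama models; the base URL, model names, and prompts are illustrative placeholders rather than our exact code.

```python
import base64
from openai import OpenAI  # any OpenAI-compatible client for a hosted Llama endpoint

# Placeholder endpoint, key, and model names -- swap in whichever Llama host you use.
client = OpenAI(base_url="https://your-llama-host/v1", api_key="YOUR_KEY")

def describe_image(image_path: str) -> str:
    """Step 1: the vision model turns the uploaded image into a scene description."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="llama-3.2-90b-vision",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in vivid detail."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def write_segment(description: str) -> str:
    """Step 2: the text model drafts the opening story segment from the description."""
    resp = client.chat.completions.create(
        model="llama-3.2-3b",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Write the opening of an adventure story set in: {description}"}],
    )
    return resp.choices[0].message.content

def add_choices(segment: str) -> str:
    """Step 3: a chained prompt rewrites the segment and appends choices for the player."""
    resp = client.chat.completions.create(
        model="llama-3.2-3b",
        messages=[{"role": "user",
                   "content": "Rewrite this segment and end it with three numbered "
                              f"choices the reader can take next:\n\n{segment}"}],
    )
    return resp.choices[0].message.content

story = add_choices(write_segment(describe_image("upload.jpg")))
```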

Each choice branches the story in a different direction. The user makes a choice by speaking into the mic and stating the action to take; AssemblyAI transcribes the streamed speech into text, which is passed back into the Llama model until the story reaches its conclusion.
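The voice loop looks roughly like this sketch, which uses AssemblyAI's Python SDK real-time transcriber. Exact class names and callbacks vary between SDK versions, so treat it as illustrative.

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"  # placeholder

final_text = []

def on_data(transcript: aai.RealtimeTranscript):
    # Keep only finalized utterances; partial transcripts stream in continuously.
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        final_text.append(transcript.text)

def on_error(error: aai.RealtimeError):
    print("Transcription error:", error)

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
)
transcriber.connect()

# Stream microphone audio until the user stops speaking
# (requires the SDK's extras: pip install "assemblyai[extras]").
mic = aai.extras.MicrophoneStream(sample_rate=16_000)
try:
    transcriber.stream(mic)
finally:
    transcriber.close()

spoken_choice = " ".join(final_text)  # fed back into the Llama model as the next prompt
```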

Challenges we ran into

Handling the choices was difficult at first because of the randomness of the prompts when iterating to the next segment. Concluding the story was another obstacle, which we solved by using a different prompt and a dedicated function to handle the final segment.
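One way to tame that randomness, close in spirit to what we ended up doing, is to request structured output for each segment and route the final turn through a dedicated concluding prompt. The helper below is a hypothetical sketch; `chat` stands in for whatever Llama call you use, and the turn cap is an assumed value.

```python
import json
from typing import Callable

MAX_TURNS = 5  # assumed cap on segments before we force an ending

def next_segment(chat: Callable[[str], str], history: str, choice: str, turn: int) -> dict:
    """Ask for JSON so the segment and choices come back in a predictable shape."""
    if turn >= MAX_TURNS:
        # A separate concluding prompt keeps the model from dangling new choices forever.
        text = chat(f"Conclude this story in one final segment, with no choices:\n{history}")
        return {"segment": text, "choices": []}
    raw = chat(
        f"Continue the story after the reader chose: {choice}\n"
        'Respond as JSON: {"segment": "...", "choices": ["...", "...", "..."]}\n'
        f"{history}"
    )
    return json.loads(raw)  # in practice, validate and retry on malformed JSON
```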

Accomplishments that we're proud of

I am proud to have created a multimodal AI agent that chains different prompts to create such an immersive experience.

What we learned

How to chain prompts across multiple models and transcribe streamed speech to text.

What's next for VisualQuest

Adding text-to-speech narration support.

Built With

  • assemblyai
  • llama
  • python
  • streamlit