Inspiration
Current apps like Apple's Screen Time and Toggl don't keep track of what projects/tasks you're working on across your suite of apps. We wanted to bridge this gap using the power of LLMs. We were also inspired by Rewind.ai's implementation of a desktop search engine.
What it does
Show you analytics on much time you spent on different tasks, projects, websites and apps.
How we built it
We used Airtable as our backend, Python scripts to generate screenshots efficiently, Pytesseract for OCR on screenshots, GPT4 Vision Preview to understand what the OCR meant, Together.AI and LLaMaChat for analytics and a chat interface, NextJs for our frontend interface, Flask for our frontend API.
Challenges we ran into
Vision model's rate limits and query limits for Tier 1.
Accomplishments that we're proud of
Getting the Screenshot+OCR+Vision integration working smoothly.
What we learned
Tons about large language models, how to deal with screenshot data, GPT Vision
What's next for OpenTrack.ai
Add the ability to understand deeper context details like the connections between different applications in people's workflow. Right now, we use pre-set "tags" to describe projects and categories and use the vision API to classify based on those. We would expand the functionality to come up with tags and projects on the fly based on existing screengrab data.
Built With
- airtable
- gpt4
- llamachat
- nextjs
- pytesseract
- python
- together.ai
- typescript
Log in or sign up for Devpost to join the conversation.