Inspiration

Existing apps like Apple's Screen Time and Toggl don't keep track of which projects or tasks you're working on across all of your apps. We wanted to bridge this gap using LLMs. We were also inspired by Rewind.ai's desktop search engine.

What it does

Shows you analytics on how much time you spend on different tasks, projects, websites, and apps.
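The writeup doesn't say how durations are computed. One simple approach, assuming screenshots are captured at a roughly regular interval, is to let each screenshot stand for the time until the next one and sum those gaps per label. The record fields (`time`, `app`, `tag`), the `time_per_label` helper, and the five-minute idle cap are all hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_per_label(records, field, max_gap=timedelta(minutes=5)):
    """Sum time per value of `field` ("app", "tag", ...).

    Hypothetical scheme: each screenshot is credited with the gap until the
    next screenshot, capped at `max_gap` so idle stretches aren't counted.
    """
    totals = defaultdict(timedelta)  # timedelta() == zero duration
    ordered = sorted(records, key=lambda r: r["time"])
    for current, nxt in zip(ordered, ordered[1:]):
        gap = nxt["time"] - current["time"]
        totals[current[field]] += min(gap, max_gap)
    return dict(totals)


# Example: four screenshots with hypothetical app/tag labels.
start = datetime(2024, 1, 1, 9, 0)
records = [
    {"time": start, "app": "VS Code", "tag": "Hackathon"},
    {"time": start + timedelta(minutes=1), "app": "Chrome", "tag": "Hackathon"},
    {"time": start + timedelta(minutes=2), "app": "Slack", "tag": "Chat"},
    {"time": start + timedelta(minutes=30), "app": "VS Code", "tag": "Hackathon"},
]
by_tag = time_per_label(records, "tag")
by_app = time_per_label(records, "app")
```

Capping each gap keeps a long idle stretch (here, the 28 minutes after the Slack screenshot) from being credited in full to whatever was on screen last; the final screenshot contributes nothing because it has no successor.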

How we built it

We used Airtable as our backend, Python scripts to capture screenshots efficiently, Pytesseract to run OCR on the screenshots, GPT-4 Vision Preview to interpret the OCR output, Together.AI and LLaMaChat for analytics and a chat interface, Next.js for our frontend, and Flask for our frontend's API.
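A minimal sketch of the screenshot-then-OCR step, assuming one capture produces a timestamped text record that is later sent to the vision model. In a real run, `grab` could be Pillow's `ImageGrab.grab` and `ocr` could be `pytesseract.image_to_string`; they are passed in as parameters here so the sketch runs without a display or a Tesseract install. The `Capture` record and `capture_once` helper are hypothetical, not the project's actual code:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable


@dataclass
class Capture:
    """One screenshot's OCR result (hypothetical record shape)."""
    taken_at: datetime
    text: str


def capture_once(
    grab: Callable[[], Any],
    ocr: Callable[[Any], str],
    clock: Callable[[], datetime] = datetime.now,
) -> Capture:
    """Take one screenshot, OCR it, and return a compact text record."""
    taken_at = clock()  # timestamp before the (slow) OCR step
    image = grab()      # e.g. PIL.ImageGrab.grab
    raw = ocr(image)    # e.g. pytesseract.image_to_string
    # Collapse runs of whitespace/newlines so the text sent to the model is compact
    return Capture(taken_at=taken_at, text=" ".join(raw.split()))


# Example with stand-in callables so it runs anywhere:
record = capture_once(
    grab=lambda: "fake-image",
    ocr=lambda img: "  def main():\n    pass  ",
    clock=lambda: datetime(2024, 1, 1, 9, 0),
)

# Real usage (requires Pillow and a Tesseract install):
# from PIL import ImageGrab
# import pytesseract
# record = capture_once(ImageGrab.grab, pytesseract.image_to_string)
```

Injecting the capture and OCR functions also makes it easy to swap in a different OCR engine or to test the rest of the pipeline on saved screenshots.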

Challenges we ran into

Working within the vision model's rate limits and query limits at Tier 1.
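The writeup doesn't say how the rate limits were handled. A common mitigation is to retry rejected requests with exponential backoff and jitter. In this sketch, `RateLimitError` is a placeholder for whatever exception the client raises (for example, `openai.RateLimitError` in the v1 OpenAI Python SDK), and `with_backoff` is a hypothetical helper:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for the client library's rate-limit exception."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Exponential backoff plus random jitter so workers don't retry in lockstep
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")


# Example: a request that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits: list[float] = []
result = with_backoff(flaky_request, sleep=waits.append)
```

Backoff only smooths over bursts; with a hard per-day quota, the other lever is sending fewer requests, for example by skipping screenshots whose OCR text hasn't changed since the last one.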

Accomplishments that we're proud of

Getting the screenshot, OCR, and vision integration working smoothly.

What we learned

Tons about large language models, how to handle screenshot data, and GPT-4 Vision.

What's next for OpenTrack.ai

Add the ability to understand deeper context, such as the connections between the different applications in a person's workflow. Right now, we use preset "tags" to describe projects and categories, and use the vision API to classify each screenshot against those tags. We want to extend this so the system can come up with new tags and projects on the fly from existing screenshot data.
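The current preset-tag approach can be sketched as two text-handling steps around the model call: build a prompt that lists the allowed tags, then map the model's free-text reply back onto exactly one of them. The prompt wording, the `build_prompt` and `parse_tag` helpers, and the "Uncategorized" fallback are hypothetical, and the vision API call itself is omitted:

```python
def build_prompt(ocr_text: str, tags: list[str]) -> str:
    """Ask the model to pick exactly one preset tag for a screen's text."""
    tag_list = "\n".join(f"- {tag}" for tag in tags)
    return (
        "Classify the screen below into exactly one of these tags:\n"
        f"{tag_list}\n"
        "Reply with the tag name only.\n\n"
        f"Screen text:\n{ocr_text}"
    )


def parse_tag(reply: str, tags: list[str], fallback: str = "Uncategorized") -> str:
    """Map a model reply onto a preset tag, ignoring case and stray punctuation."""
    cleaned = reply.strip().strip(".\"'").lower()
    for tag in tags:
        if cleaned == tag.lower():
            return tag
    return fallback  # reply didn't match any preset tag


tags = ["Hackathon", "Coursework", "Social media"]
prompt = build_prompt("from flask import Flask", tags)
```

Validating the reply against the preset list keeps the analytics from fragmenting into near-duplicate labels ("hackathon", "Hackathon."). Generating tags on the fly would relax exactly this constraint, so some merging of similar new tags would likely be needed.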
