Inspiration

We love traveling, but it's a pity to visit somewhere, take a picture of it, and then forget all about it. Booking a tour guide to learn about local sites is expensive, and most tourist apps are a hassle. We wanted to build a complete tourist companion that is both informative and easy to use.

What it does

Once you take a picture, the device automatically uploads it to an online album for easy management (think Google Photos). The album updates live and is interactive! Crucially, we run Google Cloud Vision on the photo to identify which tourist attraction it shows, fetch information about it online, and play that information back to the user as audio.

How we built it

The user side is built on a Raspberry Pi: once the user takes a picture (triggered by a push button), the picture is sent to a server. The server runs Google Cloud Vision on the image, labels it with a landmark name, and puts it in a folder that is automatically synchronized with the web album we created. If the picture does not contain a landmark, we fall back to another algorithm (using pretrained YOLO weights, https://github.com/pjreddie/darknet) to detect objects in the image.
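The labeling step above can be sketched roughly as follows. This is a minimal illustration, not our exact server code: `pick_label` and `run_yolo` are hypothetical helpers (the latter standing in for the darknet/YOLO call), while `ImageAnnotatorClient.landmark_detection` is the real Google Cloud Vision client API and needs credentials configured to run.

```python
def pick_label(landmark_names, yolo_labels):
    """Fallback logic: prefer a Vision landmark; otherwise use YOLO objects."""
    if landmark_names:
        return landmark_names[0]              # e.g. the detected landmark name
    if yolo_labels:
        return ", ".join(sorted(set(yolo_labels)))
    return "unknown"


def label_image(image_bytes):
    """Sketch of the server-side labeling step (assumes Vision credentials)."""
    from google.cloud import vision           # pip install google-cloud-vision
    client = vision.ImageAnnotatorClient()
    response = client.landmark_detection(image=vision.Image(content=image_bytes))
    names = [lm.description for lm in response.landmark_annotations]
    yolo = run_yolo(image_bytes)              # hypothetical darknet/YOLO wrapper
    return pick_label(names, yolo)
```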

Any information that needs to be returned to the user is run through a text-to-speech engine, which converts it into an audio file that is sent back to the Raspberry Pi for playback.
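A minimal sketch of that text-to-speech step, assuming the gTTS library (one possible engine; the writeup doesn't name which one we used). `audio_filename` is a hypothetical helper that derives a safe file name for the generated clip.

```python
import re


def audio_filename(label):
    """Hypothetical helper: turn a landmark label into a safe .mp3 filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{slug}.mp3"


def synthesize(text, label):
    """Convert the fetched description into an audio file for the Pi to play.
    Requires network access (gTTS calls Google's TTS service)."""
    from gtts import gTTS                     # pip install gTTS
    path = audio_filename(label)
    gTTS(text=text, lang="en").save(path)
    return path
```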

Challenges we ran into

The Raspberry Pi refused to connect to the U of T network, and we spent a long time looking for a workaround. In the end, we connected the Raspberry Pi to a smartphone's hotspot, which solved the issue.

It was also difficult to send files seamlessly between the Raspberry Pi and our computer. We tried many approaches and settled on SSH, because it is not limited to local connections (unlike Samba). This means our program can truly work anywhere in the world.
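The SSH transfer boils down to pushing files with scp. A small sketch of how the server could hand the audio file back to the Pi; the host, user, and paths here are placeholders, not our actual values.

```python
import subprocess


def build_scp_command(local_path, host, remote_dir, user="pi"):
    """Build the scp command that copies a file to the Pi over SSH."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


def send_to_pi(local_path, host, remote_dir, user="pi"):
    """Run the transfer; raises CalledProcessError if scp fails.
    Assumes key-based SSH auth is already set up with the Pi."""
    subprocess.run(build_scp_command(local_path, host, remote_dir, user),
                   check=True)
```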

Accomplishments that we're proud of

It all works, and better than we expected!

What we learned

Raspberry Pi 4, Google Cloud, OpenCV, JavaScript, React, IoT, SSH, text-to-speech, Python

What's next for iStick

Optimize the time required to obtain a result: the current bottleneck is the image-capture process, and we hope to speed it up. We could also add a screen so the user can see what the camera is capturing.
