Inspiration

We were inspired by a Pokémon image generator by sayakpaul on Hugging Face (current working demo: https://huggingface.co/spaces/sayakpaul/pokemon-sd-kerascv).

What it does

Our model generates Petr images from a user's text prompt using a deep neural network.
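
At inference time this boils down to a single text-to-image call. The snippet below is a minimal sketch assuming a Stable Diffusion style checkpoint served through the Hugging Face diffusers library; the model ID is a stand-in base model, not our actual fine-tuned weights.

```python
# Minimal text-to-image sketch (assumes the `diffusers` library and a
# Stable Diffusion style checkpoint; the model ID is a placeholder).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in base model, not our weights
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "Petr the anteater wearing a graduation cap, digital art"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("petr.png")
```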

How we built it

We built our data pipeline around the instaloader library (https://github.com/instaloader/instaloader) to scrape and download posts from about 50 different Petr Instagram accounts. Instaloader downloads every image in every post, and most posts contained images that were not Petr images (e.g., drop locations, real people), so we manually cleaned more than 1,000 images, deleting everything that was not a Petr image. We used OpenCV to resize and standardize the images to 512x512 resolution so the neural network could process them more efficiently, then converted them to PIL objects, which made uploading to a Hugging Face repository easy. From there, a script reads the cleaned profile folders and generates a JSON file of the images and their captions. That JSON file is fed to the neural network for training; the network learns text-to-image representations, allowing a user to generate images from text input.
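
As a rough illustration of that pipeline, the sketch below scrapes one account with instaloader, resizes the cleaned images to 512x512 with OpenCV, and writes an images-plus-captions manifest. The account name, folder layout, and manifest fields are simplified placeholders rather than our exact scripts.

```python
# Sketch of the scrape -> resize -> manifest pipeline (paths and the
# account list are placeholders; error handling is trimmed for brevity).
import json
from pathlib import Path

import cv2
import instaloader
from PIL import Image

ACCOUNTS = ["petr_account_placeholder"]   # we pulled from roughly 50 Petr accounts
RAW_DIR, CLEAN_DIR = Path("raw"), Path("clean")
CLEAN_DIR.mkdir(exist_ok=True)

# 1) Download every post (instaloader grabs every image in each post).
loader = instaloader.Instaloader(download_videos=False, save_metadata=False)
for name in ACCOUNTS:
    profile = instaloader.Profile.from_username(loader.context, name)
    for post in profile.get_posts():
        loader.download_post(post, target=str(RAW_DIR / name))

# 2) After manual cleaning, resize the survivors to 512x512 and build the manifest.
records = []
for img_path in CLEAN_DIR.glob("**/*.jpg"):
    img = cv2.resize(cv2.imread(str(img_path)), (512, 512))
    pil_img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    pil_img.save(img_path)                       # overwrite with the standardized copy
    caption_file = img_path.with_suffix(".txt")  # instaloader saves captions as .txt
    caption = caption_file.read_text() if caption_file.exists() else ""
    records.append({"image": str(img_path), "caption": caption})

with open("dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```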

Challenges we ran into

We initially planned to use another neural network to generate captions for each image, but switched to the captions already provided with each post in order to save time. We also had issues installing OpenCV for preprocessing, and eventually opted to install it with conda instead of pip. We then faced difficulties formatting the images in a way the neural network could consume: after hours of research we found that we needed to save each image as a PIL (Pillow) object and Latin-1 encode the byte representations. It took a while to understand how to use PIL and to settle on the proper image encoding.
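
The Latin-1 trick works because Latin-1 maps every byte value 0-255 to exactly one character, so raw image bytes can round-trip through a text field such as a JSON string. The snippet below is a small sketch of how that serialization can look; the field names are illustrative, not our exact schema.

```python
# Sketch: round-tripping PIL image bytes through a JSON-safe string using
# Latin-1, which maps all 256 byte values one-to-one to characters.
import io
import json
from PIL import Image

# Serialize: PIL image -> PNG bytes -> Latin-1 string.
img = Image.open("petr_example.png").convert("RGB")
buf = io.BytesIO()
img.save(buf, format="PNG")
record = {"caption": "Petr with a jetpack",
          "image_bytes": buf.getvalue().decode("latin-1")}
blob = json.dumps(record)

# Deserialize: Latin-1 string -> bytes -> PIL image.
loaded = json.loads(blob)
restored = Image.open(io.BytesIO(loaded["image_bytes"].encode("latin-1")))
assert restored.size == img.size
```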

Accomplishments that we're proud of

Despite everything, we're proud that we were able to scrape and download every image (roughly 1,200 images) and then write code that organizes the post information into a list of dictionaries in a JSON file for the generator to use. We spent a while reading machine learning papers to identify a suitable neural network architecture, and eventually settled on a diffusion-based architecture, since diffusion models have been shown to beat GANs at image synthesis.

What we learned

It is difficult to use BeautifulSoup with Instagram. We learned about image processing with PIL and OpenCV, especially how to efficiently format images to pass through a deep neural network. We also learned a lot about the underlying basis of deep neural network models and developed stronger skills in fine-tuning them.

What's next for PetrGen

We want to implement live scraping that continuously adds new Petr images to our dataset, so that our model can significantly improve the quality of its generated images over time.

Built With

python, instaloader, opencv, pillow, hugging-face