Scene completion demo

This is a toy implementation of "Scene Completion Using Millions of Photographs" (Hays and Efros, SIGGRAPH 2007), written for 6.870 Object Recognition and Scene Understanding.


NEW! MATLAB code for this implementation can be downloaded here.

You will also need Antonio Torralba's GIST descriptor code, available here.

The results shown below were obtained using a GIST-matching algorithm by Michael G. Ross. The implementation above is very similar, but gives slightly different nearest neighbors.


This implementation uses a database of 54,000 256x256 images from the Oliva lab, for which we already have gist-based nearest-neighbor data. Nearly all of these images are outdoor scenes (city streets, forests, mountains, etc.). For this demo, I took 7 test images and retrieved their 10 nearest neighbors in gist space.
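As a rough illustration of what "nearest neighbors in gist space" means, here is a minimal Python sketch. It is not the MATLAB code linked above: the function name `k_nearest`, the use of plain Euclidean distance, and the tiny 3-dimensional descriptors are all assumptions made for the example. Real GIST descriptors are much longer and would be computed with the GIST code mentioned earlier.

```python
import math

def k_nearest(query, database, k=10):
    """Return the indices of the k descriptors in `database` closest to `query`.

    Distance is plain Euclidean distance (an assumption for this sketch).
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank every database entry by its distance to the query descriptor.
    ranked = sorted(range(len(database)), key=lambda i: dist(query, database[i]))
    return ranked[:k]

# Toy stand-ins for precomputed descriptors (hypothetical values).
database = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.1, 0.0, 0.1],
    [5.0, 5.0, 5.0],
]
query = [0.0, 0.1, 0.0]

neighbors = k_nearest(query, database, k=2)
print(neighbors)  # [0, 2]
```

A linear scan like this is fine for a demo; with precomputed neighbor data, as used here, the search does not need to be repeated at all.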

Test images with their 10 nearest neighbors

Image Image

Image Image

For these first two test images, some of the nearest neighbors are actually the same scene photographed from a slightly different angle.

Image Image

Image Image

The street and mountain are fairly common types of scenes in our database, so even though there were no exact matches, there are many very similar scenes.

Image Image

Image Image

Image Image

These last three scenes are not very prototypical and not well represented in the database, so the nearest neighbors are not very good matches (although they still tend to have some elements in common with the test image).

For each test image, I ran two scene completion tests. In each test, I simply cut out an object or region of the test image and replaced it with the corresponding region from a near neighbor. (The paper takes several extra steps to find the best alignment between the test image and the neighbor and to blend the two images, all of which I skipped in this toy version.) Click on the images to see the results, or scroll down for some examples.
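The cut-and-replace step can be sketched in a few lines of Python. This is only an illustration of the idea, not the downloadable MATLAB code: the function name `fill_region`, the boolean `mask` marking the cut-out region, and the representation of images as nested lists are assumptions for the example.

```python
def fill_region(test_img, neighbor_img, mask):
    """Copy pixels from neighbor_img into test_img wherever mask is True.

    All three arguments are equally sized 2-D lists (rows of pixel values
    or booleans). A new image is returned; the inputs are not modified.
    No alignment or blending is performed.
    """
    return [
        [n if m else t for t, n, m in zip(t_row, n_row, m_row)]
        for t_row, n_row, m_row in zip(test_img, neighbor_img, mask)
    ]

# Toy 3x3 grayscale images (hypothetical values).
test_img = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
neighbor_img = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]

# The lower-right 2x2 block is the region that was cut out.
mask = [
    [False, False, False],
    [False, True, True],
    [False, True, True],
]

filled = fill_region(test_img, neighbor_img, mask)
for row in filled:
    print(row)
# [1, 1, 1]
# [1, 9, 9]
# [1, 9, 9]
```

Because the fill is taken from the same pixel coordinates in the neighbor, the quality of the result depends entirely on how well the two scenes already line up.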

Image Image

Image Image

Image Image

Image Image

Image Image

Image Image

Image Image

Results

The better the neighbor matches the original image, the better the result:

Image Image Image

Neighbors that don't match the original tend to produce really bad results:

Image Image Image

But sometimes the filled image looks pretty good, even though the fill is completely wrong:

Image Image

(The left image shows the stairs filled in with part of a crosswalk; the right image shows some of the water filled in with grass.)

Conclusion

Even in this toy version, the results look promising. A large, representative database of images is important for getting good results. Aligning the original and fill images would also give better results, but you can do pretty well even without this step (since the near neighbors in gist space tend to be pretty well aligned anyway).
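For readers curious what a simple alignment step might look like, here is a hedged Python sketch. It is not the paper's alignment procedure and is not part of this toy implementation: the brute-force translation search, the mean-squared-difference score, and the names `best_shift` and `max_shift` are all assumptions chosen for illustration.

```python
def best_shift(test_img, neighbor_img, mask, max_shift=2):
    """Find the (dy, dx) offset into neighbor_img that best matches test_img.

    Test pixel (y, x) is compared with neighbor pixel (y + dy, x + dx).
    Pixels where mask is True (the hole) are ignored, as are pixels that
    fall outside the neighbor after shifting. The score is the mean squared
    difference over the remaining pixels; lower is better.
    """
    h, w = len(test_img), len(test_img[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(h):
                for x in range(w):
                    ny, nx = y + dy, x + dx
                    if mask[y][x] or not (0 <= ny < h and 0 <= nx < w):
                        continue
                    diff = test_img[y][x] - neighbor_img[ny][nx]
                    total += diff * diff
                    count += 1
            if count == 0:
                continue  # no usable overlap for this shift
            score = total / count
            if best is None or score < best[0]:
                best = (score, dy, dx)
    # Fall back to no shift if nothing could be compared.
    return (best[1], best[2]) if best is not None else (0, 0)

# Toy 4x4 image with a distinct value at every pixel (hypothetical data).
test_img = [[10 * y + x for x in range(4)] for y in range(4)]
# The neighbor shows the same content moved one column to the right.
neighbor_img = [[99] + row[:-1] for row in test_img]
# A single-pixel hole at row 1, column 1.
mask = [[False] * 4 for _ in range(4)]
mask[1][1] = True

shift = best_shift(test_img, neighbor_img, mask)
print(shift)  # (0, 1)
```

The fill for a hole pixel (y, x) would then be read from the neighbor at (y + dy, x + dx) instead of from the same coordinates.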