Efficient image loading

When it comes to writing optimized code for computer vision, image loading plays an important role. It can be a bottleneck in many CV tasks and is often the culprit behind poor performance, so we need to get images from disk as fast as possible.

The most obvious example is the Dataloader class of any CNN training framework. Image loading there must be fast; otherwise, the training procedure becomes CPU-bound and wastes precious GPU time.

Today we are going to look at several Python libraries that let us read images efficiently:

  • OpenCV
  • Pillow
  • Pillow-SIMD
  • TurboJpeg

Also, we will cover alternative methods of image loading from databases using:

  • LMDB
  • TFRecords

Finally, we will compare the loading time per image and find out which one is the winner!

Installation

Before we start, we need to create a virtual environment:

$ virtualenv -p python3.7 venv
$ source venv/bin/activate

Then, install the required libraries:

$ pip install -r requirements.txt

Now we can go forward with our tasks.

Ways to load images

Structure

Usually, we need to load several images stored either in a database or simply in a folder. In our scenario, an abstract image loader should be able to store the path to such a database or folder and load one image at a time from it. We also need to measure how long some parts of the code take, and some initialization may be required before loading starts. Our ImageLoader class looks like this:

import os
from abc import ABC, abstractmethod


class ImageLoader(ABC):
    extensions: tuple = \
        (".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".tfrecords")

    def __init__(self, path: str, mode: str = "BGR"):
        self.path = path
        self.mode = mode
        self.dataset = self.parse_input(self.path)
        self.sample_idx = 0

    def parse_input(self, path):

        # single image or tfrecords file
        if os.path.isfile(path):
            assert path.lower().endswith(
                self.extensions,
            ), f"Unsupported extension, please use one of {self.extensions}"
            return [path]

        if os.path.isdir(path):
            # lmdb environment
            if any(file.endswith(".mdb") for file in os.listdir(path)):
                return path
            else:
                # folder with images: keep supported files only, in a stable order
                paths = [
                    os.path.join(path, image)
                    for image in sorted(os.listdir(path))
                    if image.lower().endswith(self.extensions)
                ]
                return paths

        raise FileNotFoundError(f"Path does not exist: {path}")

    def __iter__(self):
        self.sample_idx = 0
        return self

    def __len__(self):
        return len(self.dataset)

    @abstractmethod
    def __next__(self):
        pass

Image decoding functions in different libraries can return images in different channel orders: RGB or BGR. We use BGR as the default mode, but an image can always be converted into the required format. If you are curious, there is a fun historical reason why OpenCV uses the BGR format.
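Because BGR and RGB differ only in the order of the channels along the last axis, the conversion can also be sketched in plain NumPy by reversing that axis. This is a minimal illustration on a synthetic array, assuming an H x W x 3 uint8 image; swap_rb is a hypothetical helper, whereas the loaders below use cv2.cvtColor.

```python
import numpy as np


def swap_rb(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 image between BGR and RGB (hypothetical helper).

    Reversing the channel axis works in both directions. The reversed slice is
    a strided view, so np.ascontiguousarray makes a compact copy, which some
    functions (for example, OpenCV drawing functions) may require.
    """
    assert image.ndim == 3 and image.shape[2] == 3, "expected an H x W x 3 image"
    return np.ascontiguousarray(image[..., ::-1])


# a 1 x 2 "BGR" image: one pure-blue pixel and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = swap_rb(bgr)
print(rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```

Applying the helper twice returns the original channel order, which is why a single function covers both directions.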

Now we can inherit new classes from the base class and use them for our task.

OpenCV

The first library is OpenCV. A single function reads an image from disk: cv2.imread.

import cv2
from timeit import default_timer as timer


class CV2Loader(ImageLoader):
    def __next__(self):
        # stop iteration once every image has been returned
        if self.sample_idx >= len(self.dataset):
            raise StopIteration
        start = timer()
        # get image path by index from the dataset
        path = self.dataset[self.sample_idx]
        # read the image (OpenCV decodes it in BGR order)
        image = cv2.imread(path)
        full_time = timer() - start

        if self.mode == "RGB":
            start = timer()
            # change color mode
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            full_time += timer() - start

        self.sample_idx += 1
        return image, full_time

Before visualizing images, note that OpenCV's cv2.imshow function expects an image in BGR format. Some libraries use RGB mode by default; in that case, we convert the images to BGR so that they are displayed correctly.

You can try loading your own image with this function using our example script.

To test the OpenCV loader, use this command:

$ python3 show_image.py --path images/cat.jpg --method cv2

This command, like the similar ones below, shows the image and its loading time for the selected library.

If everything goes well, you will see an image in the window like this:

![cat example](cat-example.jpg)

You can also show all images from a folder. Instead of a specific image, pass the path to a folder of images:

$ python3 show_image.py --path images/pexels --method cv2

This shows all images from the folder one at a time, together with their loading times. To stop the demo, press the ESC key.

Pillow

Let’s now try the Pillow library (the maintained fork of PIL). We can read an image with the Image.open function.

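import cv2
import numpy as np
from PIL import Image
from timeit import default_timer as timer


class PILLoader(ImageLoader):
    def __next__(self):
        # stop iteration once every image has been returned
        if self.sample_idx >= len(self.dataset):
            raise StopIteration
        start = timer()
        # get image path by index from the dataset
        path = self.dataset[self.sample_idx]
        # read the image as a numpy array (Pillow decodes it in RGB order);
        # the context manager closes the file once the pixels are decoded
        with Image.open(path) as pil_image:
            image = np.asarray(pil_image)
        full_time = timer() - start

        if self.mode == "BGR":
            start = timer()
            # change color mode
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            full_time += timer() - start

        self.sample_idx += 1
        return image, full_time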

We also convert the Image object to a NumPy array, since we are likely to apply augmentations or other pre-processing next, and NumPy is the default choice for that.
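One practical caveat: Image.open keeps the mode the file was stored in, so a grayscale ("L") or palette-based ("P") image becomes a 2-D array, and the cv2.cvtColor(image, cv2.COLOR_RGB2BGR) call above would then fail. A minimal sketch of one workaround, assuming every image should end up with three RGB channels, is to call Image.convert("RGB") before the NumPy conversion. load_rgb_array is a hypothetical helper, not part of the benchmarked loader, and the extra conversion step costs some time.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def load_rgb_array(path: str) -> np.ndarray:
    """Hypothetical helper: return an H x W x 3 uint8 RGB array for any image
    Pillow can open, whatever mode ("L", "P", "RGBA", ...) it was stored in."""
    with Image.open(path) as pil_image:
        # convert() returns a new, fully loaded image, so closing the file is safe
        rgb_image = pil_image.convert("RGB")
    return np.asarray(rgb_image)


# usage: save a small grayscale PNG and read it back as a 3-channel array
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gray.png")
    Image.new("L", (4, 2), color=128).save(path)
    array = load_rgb_array(path)

print(array.shape, array.dtype)  # (2, 4, 3) uint8
```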

To check this out on a single image, use:

$ python3 show_image.py --path images/cat.jpg --method pil

To run it on a folder of images:

$ python3 show_image.py --path images/pexels --method pil

Pillow-SIMD

Pillow-SIMD is a fork of Pillow with higher performance. It uses SIMD instructions to read and transform images faster while keeping the same API as standard Pillow.

Pillow and Pillow-SIMD cannot be used simultaneously in the same virtual environment, because both provide the same PIL package and the one installed last overwrites the other.

To use Pillow-SIMD and avoid conflicts between the two packages, create a new virtual environment and run:

$ pip install pillow-simd

Alternatively, uninstall the existing Pillow package and then install Pillow-SIMD:

$ pip uninstall pillow
$ pip install pillow-simd

You don’t need to change anything in the code; the previous example still works. To check that everything is fine, use the commands from the Pillow section:

$ python3 show_image.py --path images/cat.jpg --method pil
$ python3 show_image.py --path images/pexels --method pil

TurboJpeg

Another library is TurboJpeg. As its name suggests, it can only read JPEG-compressed images.

Let’s create an image loader using TurboJpeg.

from timeit import default_timer as timer

from turbojpeg import TurboJPEG


class TurboJpegLoader(ImageLoader):
    def __init__(self, path, **kwargs):
        super(TurboJpegLoader, self).__init__(path, **kwargs)
        # create TurboJPEG object for image reading
        self.jpeg_reader = TurboJPEG()

    def __next__(self):
        # stop iteration once every image has been returned
        if self.sample_idx >= len(self.dataset):
            raise StopIteration
        start = timer()
        # read the input file as bytes; the context manager closes the file
        with open(self.dataset[self.sample_idx], "rb") as file:
            data = file.read()
        full_time = timer() - start

        # pixel format for decode(): 0 = RGB, 1 = BGR (the default mode)
        pixel_format = 0 if self.mode == "RGB" else 1

        start = timer()
        # decode raw image
        image = self.jpeg_reader.decode(data, pixel_format)
        full_time += timer() - start

        self.sample_idx += 1
        return image, full_time

Unlike cv2.imread and Image.open, TurboJpeg does not open the file for us: we read the input file as a string of bytes and then decode it.

You can try it with the following commands, but remember that TurboJpeg can only process JPEG images:
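Since TurboJpeg fails on anything that is not a JPEG, it can help to filter a folder before building the loader. A minimal sketch, assuming we check the file signature instead of trusting the extension: JPEG data starts with the SOI marker FF D8, followed by another marker that begins with FF. is_jpeg and jpeg_paths are hypothetical helpers, not part of the tutorial code.

```python
import os
import tempfile

# SOI marker (FF D8) followed by the first byte of the next marker (FF)
JPEG_SIGNATURE = b"\xff\xd8\xff"


def is_jpeg(path: str) -> bool:
    """Return True if the file begins with the JPEG signature bytes."""
    with open(path, "rb") as file:
        return file.read(len(JPEG_SIGNATURE)) == JPEG_SIGNATURE


def jpeg_paths(folder: str) -> list:
    """Return the sorted paths of regular files in `folder` that look like JPEGs."""
    paths = sorted(os.path.join(folder, name) for name in os.listdir(folder))
    return [path for path in paths if os.path.isfile(path) and is_jpeg(path)]


# usage: a folder with one fake JPEG header and one PNG signature
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "a.jpg"), "wb") as file:
        file.write(b"\xff\xd8\xff\xe0" + bytes(16))
    with open(os.path.join(folder, "b.png"), "wb") as file:
        file.write(b"\x89PNG\r\n\x1a\n")
    found = [os.path.basename(path) for path in jpeg_paths(folder)]

print(found)  # ['a.jpg']
```

Reading only three bytes per file keeps this check cheap compared with attempting a full decode.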

$ python3 show_image.py --path images/cat.jpg --method turbojpeg
$ python3 show_image.py --path images/pexels --method turbojpeg

LMDB

A commonly used approach to image loading when speed is a priority is to convert the data into a better representation, such as a database or a serialized buffer, beforehand. One of the biggest advantages of such “databases” is that they can operate with zero system calls per data access, while the file system requires several system calls per data access. We can create an LMDB database that stores all images in key-value format.

The following function creates an LMDB environment with our images. An LMDB “environment” is essentially a folder with special files created by the LMDB library. The function only requires a list of image paths and a save path:

import os

import cv2
import lmdb
import numpy as np


def store_many_lmdb(images_list, save_path):

    # number of images in our folder
    num_images = len(images_list)
    # all file sizes
    file_sizes = [os.path.getsize(item) for item in images_list]
    # the maximum file size index
    max_size_index = np.argmax(file_sizes)
    # maximum database size in bytes (a generous upper bound)
    map_size = num_images * cv2.imread(images_list[max_size_index]).nbytes * 10

    # create lmdb environment
    env = lmdb.open(save_path, map_size=map_size)

    # start writing to environment
    with env.begin(write=True) as txn:
        for i, image in enumerate(images_list):
            with open(image, "rb") as file:
                # read image as bytes
                data = file.read()
                # get image key (zero-padded index)
                key = f"{i:08}"
                # put the key-value into database
                txn.put(key.encode("ascii"), data)

    # close the environment
    env.close()
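The map_size formula above is deliberately generous: it multiplies the number of images by the decoded size of the image with the largest file, and then by 10, although the database only stores the encoded bytes. Since map_size is just the maximum size LMDB may grow to, any sufficiently large value works. If a tighter bound is preferred, one option is to sum the file sizes directly. The sketch below rests on our own assumptions (a 4096-byte page, roughly one extra page per record, and a safety factor of 2); estimate_map_size is a hypothetical helper, not part of the original code.

```python
import os
import tempfile


def estimate_map_size(images_list, safety_factor=2, page_size=4096):
    """Hypothetical helper: estimate an LMDB map_size (in bytes) for a database
    that stores each file's raw encoded bytes as one value.

    Assumption: each record needs at most about one extra page (large values are
    stored in whole pages); the total is then multiplied by a safety factor.
    """
    total = sum(os.path.getsize(path) + page_size for path in images_list)
    return total * safety_factor


# usage with two dummy files of 1000 and 3000 bytes
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for name, size in (("a.jpg", 1000), ("b.jpg", 3000)):
        path = os.path.join(folder, name)
        with open(path, "wb") as file:
            file.write(bytes(size))
        paths.append(path)
    map_size = estimate_map_size(paths)

print(map_size)  # (1000 + 4096 + 3000 + 4096) * 2 = 24384
```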

There is a Python script, create_lmdb.py, which creates an LMDB environment with images:

  • --path should contain the path to your folder of collected images
  • --output should be the directory where the LMDB environment will be created

$ python3 create_lmdb.py --path images/pexels --output lmdb/images

Now that the LMDB environment has been created, we can load our images from it. Let’s create a new loader class.

To load images from the database, we first need to open it for reading. The new open_database method does this and returns a cursor for navigating the opened database. When the cursor reaches the end of the data, the __iter__ method moves it back to the start of the database.
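The order in which the cursor visits records follows the keys: by default, LMDB keeps keys sorted byte-wise, so iteration is lexicographic. This is why store_many_lmdb zero-pads the index with f"{i:08}". The plain-Python sketch below (no LMDB needed) compares padded and unpadded keys, using Python's bytes ordering as a stand-in for LMDB's default key comparison.

```python
# keys as store_many_lmdb writes them (zero-padded) and naive unpadded keys
padded = [f"{i:08}".encode("ascii") for i in range(12)]
unpadded = [str(i).encode("ascii") for i in range(12)]

# Python sorts bytes lexicographically, like LMDB's default key order
print(sorted(padded) == padded)  # True: cursor order matches insertion order
print(sorted(unpadded)[:4])      # [b'0', b'1', b'10', b'11']: 10 and 11 jump ahead of 2
```

Eight digits keep the order correct for up to 10^8 images.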

LMDB stores the data, but it has no built-in image decoder, so we use the cv2.imdecode function here.

from timeit import default_timer as timer

import cv2
import lmdb
import numpy as np


class LmdbLoader(ImageLoader):
    def __init__(self, path, **kwargs):
        super(LmdbLoader, self).__init__(path, **kwargs)
        self.path = path
        self._dataset_size = 0
        self.dataset = self.open_database()

    # we need to open the database to read images from it
    def open_database(self):
        # open the environment by path (read-only: we never write here)
        lmdb_env = lmdb.open(self.path, readonly=True)
        # start reading
        lmdb_txn = lmdb_env.begin()
        # create cursor to iterate through the database
        lmdb_cursor = lmdb_txn.cursor()
        # place the cursor on the first record
        lmdb_cursor.first()
        # get number of items in full dataset
        self._dataset_size = lmdb_env.stat()["entries"]
        return lmdb_cursor

    def __iter__(self):
        # set the cursor to the first database element
        self.dataset.first()
        self.sample_idx = 0
        return self

    def __next__(self):
        # stop once every record has been returned
        if self.sample_idx >= self._dataset_size:
            raise StopIteration
        start = timer()
        # get raw image
        raw_image = self.dataset.value()
        # convert it to numpy
        image = np.frombuffer(raw_image, dtype=np.uint8)
        # decode image
        image = cv2.imdecode(image, cv2.IMREAD_COLOR)
        full_time = timer() - start

        if self.mode == "RGB":
            start = timer()
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            full_time += timer() - start

        start = timer()
        # step to the next element in database
        self.dataset.next()
        full_time += timer() - start
        self.sample_idx += 1
        return image, full_time

    def __len__(self):
        # get dataset length
        return self._dataset_size
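Because the database returns raw encoded bytes, any decoder that accepts an in-memory buffer could take the place of cv2.imdecode. As a sketch, assuming Pillow is installed, the value can be wrapped in io.BytesIO and decoded with Image.open. decode_bytes is a hypothetical helper and not the decoder used in our measurements; note that it returns RGB, whereas cv2.imdecode returns BGR.

```python
import io

import numpy as np
from PIL import Image


def decode_bytes(raw_image: bytes) -> np.ndarray:
    """Hypothetical helper: decode encoded image bytes (e.g. an LMDB value)
    into an H x W x 3 uint8 array in RGB order using Pillow."""
    with Image.open(io.BytesIO(raw_image)) as pil_image:
        return np.asarray(pil_image.convert("RGB"))


# round trip: encode a small RGB image to PNG bytes in memory, then decode it
buffer = io.BytesIO()
Image.new("RGB", (3, 2), color=(10, 20, 30)).save(buffer, format="PNG")
decoded = decode_bytes(buffer.getvalue())
print(decoded.shape, decoded[0, 0].tolist())  # (2, 3, 3) [10, 20, 30]
```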

Once the environment and the loader class are ready, we can check that they work correctly by showing images from the database. This time, the --path argument should point to the LMDB environment. Remember that you can stop the demo with the ESC key.

$ python3 show_image.py --path lmdb/images --method lmdb

TFRecords

Another useful database format is TFRecords. To read data efficiently, it can be helpful to serialize your data and store it in a set of files (100-200 MB each) that can each be read linearly (TensorFlow manual).
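To follow that recommendation with a large image collection, the files can be grouped into shards whose total encoded size stays close to a target. Below is a minimal greedy sketch. It assumes shard size can be estimated from file sizes alone, ignoring per-record TFRecord overhead, and it uses a default target of 150 MB, which is our own pick within the recommended range. shard_by_size is a hypothetical name.

```python
def shard_by_size(sizes, target_bytes=150 * 1024 * 1024):
    """Greedily group item indices into shards of roughly `target_bytes` each.

    A new shard starts when adding the next item would push the current shard
    past the target; a single oversized item still gets a shard of its own.
    """
    shards, current, current_bytes = [], [], 0
    for index, size in enumerate(sizes):
        if current and current_bytes + size > target_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# file sizes in bytes, e.g. [os.path.getsize(p) for p in images_list]
sizes = [60, 50, 70, 20, 200, 10]
print(shard_by_size(sizes, target_bytes=120))  # [[0, 1], [2, 3], [4], [5]]
```

Each group of indices could then be written to its own .tfrecords file.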

Before we create the TFRecords file, we need to choose the structure of each record. TFRecords can store many additional features with every item: for example, the file name or the image width and height, if needed. All these features are described in a Python dictionary, for example:

import tensorflow as tf

image_feature_description = {
    "height": tf.io.FixedLenFeature([], tf.int64),
    "width": tf.io.FixedLenFeature([], tf.int64),
    "filename": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
    "image_raw": tf.io.FixedLenFeature([], tf.string),
}

In our example, we store only the raw image bytes and a unique integer key called “label”.

import os
import tensorflow as tf


def _byte_feature(value):
    """Convert string / byte into bytes_list."""
    if isinstance(value, type(tf.constant(0))):
        # BytesList can't unpack string from EagerTensor.
        value = value.numpy()
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    """Convert bool / enum / int / uint into int64_list."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def image_example(image_string, label):
    feature = {
        "label": _int64_feature(label),
        "image_raw": _byte_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def store_many_tfrecords(images_list, save_file):

    assert save_file.endswith(
        ".tfrecords"
    ), 'File path is wrong, it should contain "*myname*.tfrecords"'

    directory = os.path.dirname(save_file)
    # create the output directory if needed (dirname is empty for a bare file name)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # start writer
    with tf.io.TFRecordWriter(save_file) as writer:
        # loop over the image paths
        for label, filename in enumerate(images_list):
            # read the image as a bytes string
            with open(filename, "rb") as file:
                image_string = file.read()
            # save the data as tf.Example object
            tf_example = image_example(image_string, label)
            # and write it into database
            writer.write(tf_example.SerializeToString())

Note that our TFRecords loader decodes images with the tf.image.decode_jpeg function, because all our images are stored as JPEG files. You can also use tf.image.decode_image as a universal decoder.

To check the correctness of the created database you can show images from it:

$ python3 show_image.py --path tfrecords/images.tfrecords --method tfrecords

Loading time comparison

Now we have five different image loading methods (Pillow and Pillow-SIMD share the same loader code). Let’s find out which one is the best!

We use a set of free JPEG images of different shapes from pexels.com, and all time measurements are averaged over 5000 iterations. Averaging also mitigates the impact of OS- and hardware-specific behavior such as data caching: the first iteration of the first method under evaluation is expected to suffer from the initial loading of the data from disk into the cache, while the other methods will not.
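Each number in the tables below is a mean or median over many per-image times returned by the loaders. The sketch below shows one way to gather and summarize them. It assumes loaders that yield (image, seconds) pairs and raise StopIteration at the end of the data; collect_times and summarize_times are hypothetical names, and benchmark.py may do this differently. The toy timings illustrate the caching effect described above: one slow first read pulls the mean up but leaves the median almost unchanged.

```python
import statistics


def collect_times(loader, iters):
    """Call next() `iters` times on a loader yielding (image, seconds) pairs,
    restarting it with iter() whenever it runs out of images."""
    times = []
    iterator = iter(loader)
    for _ in range(iters):
        try:
            _, seconds = next(iterator)
        except StopIteration:
            iterator = iter(loader)
            _, seconds = next(iterator)
        times.append(seconds)
    return times


def summarize_times(times):
    """Return (mean, median) of per-image read times in seconds."""
    return statistics.mean(times), statistics.median(times)


# hypothetical timings: the first read is slow because the file is not cached yet
times = [0.020] + [0.001] * 9
mean_time, median_time = summarize_times(times)
print(f"mean {mean_time:.4f} s, median {median_time:.4f} s")  # mean 0.0029 s, median 0.0010 s
```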

All experiments were run in both BGR and RGB modes to cover different tasks. Remember that Pillow and Pillow-SIMD cannot be used in the same virtual environment, so to build the final comparison table we ran two separate experiments for Pillow and Pillow-SIMD.

To run the measurements, use:

$ python3 benchmark.py --path images/pexels --method cv2 pil turbojpeg lmdb tfrecords --iters 100 --mode BGR

| Library | Mode | Mean read time (sec) | Median read time (sec) |
|---|---|---|---|
| OpenCV | BGR | 0.003591 | 0.0010559 |
| OpenCV | RGB | 0.003731 | 0.0010915 |
| Pillow | BGR | 0.004018 | 0.0012519 |
| Pillow | RGB | 0.003960 | 0.0012235 |
| Pillow-SIMD | BGR | 0.002825 | 0.0008151 |
| Pillow-SIMD | RGB | 0.002791 | 0.0007866 |
| TurboJpeg | BGR | 0.002259 | 0.0006032 |
| TurboJpeg | RGB | 0.002257 | 0.0006026 |
| LMDB | BGR | 0.003509 | 0.0009936 |
| LMDB | RGB | 0.003560 | 0.0010263 |
| TFRecords | BGR | 0.002818 | 0.0010221 |
| TFRecords | RGB | 0.002640 | 0.0009445 |

![mean median bgr](mean-median-bgr.png)
![mean median rgb](mean-median-rgb.png)

It is also interesting to compare the reading speed of the two databases with the same decoder, which shows which database loads its data faster. For this experiment, we use the cv2.imdecode function for both TFRecords and LMDB.

| Loader | Mean read time (sec) | Median read time (sec) |
|---|---|---|
| TFRecords | 0.004182 | 0.001356 |
| LMDB | 0.003688 | 0.001023 |

All experiments were run on:

  • Intel® Core™ i7-2600 CPU @ 3.40GHz × 8
  • Ubuntu 16.04 64-bit
  • Python 3.7

Summary

In this post, we considered several approaches to image loading and compared them with each other. The results on JPEG images are quite interesting. TurboJpeg is the fastest library for loading images as NumPy arrays, with one caveat: it can only read JPEG files.

Another important finding is that Pillow-SIMD is faster than the original Pillow. In our task, loading speed increased by nearly 40% (mean read time dropped from about 4.0 ms to about 2.8 ms).
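The 40% figure can be checked against the mean read times in the comparison table. The ratio of Pillow's time to Pillow-SIMD's time gives the speedup in images per second, while one minus the inverse ratio gives the fraction of read time saved. A quick calculation with the table's values:

```python
# mean read times (seconds) from the comparison table
pillow = {"BGR": 0.004018, "RGB": 0.003960}
pillow_simd = {"BGR": 0.002825, "RGB": 0.002791}

for mode in ("BGR", "RGB"):
    speedup = pillow[mode] / pillow_simd[mode]         # throughput ratio
    time_saved = 1 - pillow_simd[mode] / pillow[mode]  # fraction of read time removed
    print(f"{mode}: {speedup:.2f}x faster, {time_saved:.0%} less time per image")

# BGR: 1.42x faster, 30% less time per image
# RGB: 1.42x faster, 30% less time per image
```

So Pillow-SIMD reads about 42% more images per second, which corresponds to roughly 30% less time per image.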

If you plan to use an image database, TFRecords shows better mean results than LMDB, in particular because of its built-in decoder function. On the other hand, when both use the same decoder, LMDB reads images faster. You can always combine a decoder and a database, for example, using TurboJpeg as the decoder and LMDB as the image storage.



Read Next

VideoRAG: Redefining Long-Context Video Comprehension

VideoRAG: Redefining Long-Context Video Comprehension

Discover VideoRAG, a framework that fuses graph-based reasoning and multi-modal retrieval to enhance LLMs' ability to understand multi-hour videos efficiently.

AI Agent in Action: Automating Desktop Tasks with VLMs

AI Agent in Action: Automating Desktop Tasks with VLMs

Learn how to build AI agent from scratch using Moondream3 and Gemini. It is a generic task based agent free from…

The Ultimate Guide To VLM Evaluation Metrics, Datasets, And Benchmarks

The Ultimate Guide To VLM Evaluation Metrics, Datasets, And Benchmarks

Get a comprehensive overview of VLM Evaluation Metrics, Benchmarks and various datasets for tasks like VQA, OCR and Image Captioning.

Subscribe to our Newsletter

Subscribe to our email newsletter to get the latest posts delivered right to your email.

Subscribe to receive the download link, receive updates, and be notified of bug fixes

Which email should I send you the download link?

 

Get Started with OpenCV

Subscribe To Receive
Image

We hate SPAM and promise to keep your email address safe.​