Augmenting Data for Machine Learning

Training a machine learning model on sparse data sometimes calls for data augmentation: transforming existing images so that the model treats each variant as something new and learns to recognize the same patterns from different perspectives. Common image augmentations include rotation, shearing, zooming, cropping, flipping, changing the brightness level, and swapping the background or other layers, among many others. You can even alter images using more advanced techniques, such as principal component analysis (PCA), or manual operations on pixels to change the image to your taste.
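To make a few of these transformations concrete, here is a minimal sketch using NumPy, assuming a grayscale image stored as a 2-D array of 8-bit pixel values. The function names (flip_horizontal, rotate_90, adjust_brightness, center_crop) are made up for illustration and do not come from any particular library.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror the image left to right by reversing the column order."""
    return image[:, ::-1]

def rotate_90(image, turns=1):
    """Rotate the image counterclockwise by 90 degrees, `turns` times."""
    return np.rot90(image, k=turns)

def adjust_brightness(image, factor):
    """Scale pixel values, then clip back into the valid 0-255 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def center_crop(image, height, width):
    """Cut a (height, width) window out of the middle of the image."""
    top = (image.shape[0] - height) // 2
    left = (image.shape[1] - width) // 2
    return image[top:top + height, left:left + width]

# A tiny 2x3 grayscale "image" stands in for a real photo (an assumption).
image = np.array([[10, 20, 30],
                  [40, 50, 60]], dtype=np.uint8)

flipped = flip_horizontal(image)
rotated = rotate_90(image)
brighter = adjust_brightness(image, 5.0)
cropped = center_crop(image, 2, 1)
```

Each function returns a new view or copy and leaves the input untouched, so several variants can be produced from the same source image.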

Sometimes you’ll want to perform augmentation even when you have enough data to train on, in order to make the model more invariant to these kinds of changes. This helps the model recognize the same object in new situations, even if it appears in a different position, at a different size, distorted, or illuminated differently.

There are two main methods for implementing data augmentation, each of which may make more or less sense depending on your resources and the amount of data that needs to be augmented. The first, known as offline augmentation, iterates through each of your images, performs the augmentations, and adds the new images to your dataset all in one batch before training. The second, often called online augmentation, instead augments batches of images on the fly just before they are ingested by your model. Online augmentation is preferable when you have very large datasets, since generating and storing augmented copies of every image up front would take extraordinarily long and multiply the size of the dataset.
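The difference between the two methods can be sketched as follows. This is only an illustration under simple assumptions: images are small NumPy arrays held in a list, and random_augment, augment_offline, and augment_on_the_fly are hypothetical helper names rather than functions from any library.

```python
import numpy as np

def random_augment(image, rng):
    """Apply a random horizontal flip and a random brightness change."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    factor = rng.uniform(0.8, 1.2)
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def augment_offline(images, copies_per_image, rng):
    """Offline: build the enlarged dataset once, before training starts."""
    augmented = list(images)  # keep the originals
    for image in images:
        for _ in range(copies_per_image):
            augmented.append(random_augment(image, rng))
    return augmented

def augment_on_the_fly(images, batch_size, rng):
    """On the fly: yield freshly augmented batches; nothing extra is stored."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        yield np.stack([random_augment(image, rng) for image in batch])

# Six random 4x4 grayscale images stand in for a real dataset (an assumption).
rng = np.random.default_rng(seed=0)
images = [rng.integers(0, 256, size=(4, 4), dtype=np.uint8) for _ in range(6)]

offline_dataset = augment_offline(images, copies_per_image=2, rng=rng)
online_batches = list(augment_on_the_fly(images, batch_size=4, rng=rng))
```

In the offline version the dataset grows from 6 to 18 images in memory, while the generator only ever holds one batch at a time and produces different random variants each time it is run over the data.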

Regardless of which overall method you use, there are a number of packages available to help with augmentation tasks, including OpenCV, Pillow, scikit-image, keras.preprocessing, TensorFlow, and many others. Depending on the format your data is currently in, some of these packages may make more sense than others, since choosing one that works with your existing format avoids the extra computation of converting the data just to perform an augmentation. If these packages don’t fit your particular use, you can always write your own custom functions to augment the data in the way most suitable to your needs. Images are often stored as arrays of pixels, which lets us take advantage of linear algebra for efficient computation, so you can always manipulate these pixels by hand through matrix operations, reshaping, clipping, or convolving the data held in the arrays.
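As a rough sketch of what manipulating pixels by hand can look like, the example below uses NumPy to convert a color image to grayscale with a matrix product and to blur an image with a naive convolution. The helper names to_grayscale and convolve2d are made up for illustration, and the loop-based convolution is written for clarity rather than speed.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB image to (H, W) with a weighted sum.

    The weights are the ITU-R BT.601 luma coefficients.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights

def convolve2d(image, kernel):
    """Naive 'valid' 2-D convolution of a grayscale image with a kernel."""
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# A 4x4 ramp of values 0..15 stands in for a real grayscale image.
image = np.arange(16, dtype=np.float32).reshape(4, 4)

# A 3x3 box-blur kernel: every output pixel is the mean of its neighborhood.
box_blur = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)
blurred = convolve2d(image, box_blur)

# A 1x2 RGB image containing one white pixel and one black pixel.
rgb = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(rgb)
```

Because no padding is applied, the blurred output is smaller than the input (2x2 here), so a real pipeline would usually pad the image first or use a library routine that handles borders.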

While I’ve talked mostly about image augmentation, you can augment other types of data as well. This can be tricky depending on the context, and often requires specialized knowledge about how each kind of data, such as audio, radar, or lidar, is stored. Each of these might require different techniques to enhance or augment the data.
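For a flavor of how this looks outside of images, here is a small sketch for audio, assuming the signal is a 1-D NumPy array of samples. Adding noise and shifting in time are only two simple options, and the names add_noise and time_shift are hypothetical; real audio pipelines often need more care than this.

```python
import numpy as np

def add_noise(waveform, noise_level, rng):
    """Add Gaussian noise with standard deviation `noise_level`."""
    noise = rng.normal(0.0, noise_level, size=waveform.shape)
    return waveform + noise

def time_shift(waveform, shift):
    """Circularly shift the samples; the end wraps around to the start."""
    return np.roll(waveform, shift)

# One second of a 440 Hz sine wave at an assumed 8 kHz sample rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440 * t)

rng = np.random.default_rng(seed=0)
noisy = add_noise(waveform, noise_level=0.01, rng=rng)
shifted = time_shift(waveform, shift=100)
```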

I hope this brief intro was helpful to your learning process, as I’m continuously learning all the things myself. Good luck in your journey!