Play with Images: OpenCV example for beginners

Sep 14, 2021

In this article, we will learn how to perform certain operations that change the pixel properties of an image. We will look at turning a color image into other forms, such as grayscale and binary, using different techniques. We will also see how to visualize these changes effectively. All of this will be done using a very powerful image processing library called OpenCV.

Pip command to install the library:

pip install opencv-python

Let’s see how to read, display and save an image that we wish to process. These steps involve three functions: cv2.imread(), cv2.imshow(), and cv2.imwrite(). You will also learn how to display images using Matplotlib. So, let’s get started.

Download this picture of a cat. We will see how to perform various operations on this image.

Open any Python IDE you like and make a new file called ‘Playwithimages.py’, where .py is the Python file extension. Then write the following code to open the image with OpenCV:

import numpy as np
import cv2
img = cv2.imread('cat.jpg',1)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Let’s have a look at what this code means. The first line imports NumPy, a library for scientific computing that is an essential component of most machine learning applications.

We abbreviate numpy as np for our convenience, and Python will recognize np as numpy from this point onwards. The same can be done with other libraries. The second line of code imports the OpenCV library for use in image processing. The third line creates a variable called img and uses the function cv2.imread() to read the image of the cat that we saved.

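Note: For this to work with just the file name, the image must be in the directory you run the Python script from (normally the same directory as the script); otherwise pass the full path. If the file cannot be read, cv2.imread() returns None instead of raising an error.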

The function cv2.imread() has two arguments. The first is the path of the image to read and the second specifies how it should be read. The second argument can take the values 0, 1, and -1: 0 loads the image in grayscale, 1 loads it in color (any transparency in the image is ignored), and -1 loads it unchanged, including the alpha channel if there is one. If the second argument is not specified, 1 is used by default.

Once the image is read, the fourth line will show/display the image using the cv2.imshow() function. The function also has two arguments. The first argument is the title of the window where we want the image to be displayed. If the window doesn’t exist, a window of that name will be created. The second is the name of the variable which contains the image, which in this case is img which we created in the previous line.

The next two lines keep the image on screen until any key is pressed on the keyboard. Once a key is pressed, the last line closes the window that displays the image.

The cv2.imshow() command must always be paired with cv2.waitKey(). cv2.waitKey() is a keyboard binding function whose only argument is a time in milliseconds. The function waits that long for a key press. If 0 is passed, it waits indefinitely until a key is pressed. Its return value can also be used to detect specific keys, such as ‘q’ for quitting. Please note that this function also processes many other GUI events, so it must be called for the image to actually be displayed.

cv2.destroyAllWindows(), as the name suggests, destroys all the windows we created. If you want to destroy a specific window, use cv2.destroyWindow() instead and pass it the exact window name as the argument.

import numpy as np
import cv2
img = cv2.imread('cat.jpg')
cv2.imshow('image',img)
i = cv2.waitKey(0)                      # wait for a key press and keep its code
if i == ord('s'):                       # 's' key: save a copy before closing
    cv2.imwrite('cat_saved.jpg',img)
cv2.destroyAllWindows()                 # any key, including Esc (code 27), closes the window

The above code reads and displays the image, and saves a copy if you press the ‘s’ key. Pressing Esc (key code 27) closes the window without saving.

The cv2.imwrite() function has two arguments. The first is the name of the file to save; its extension selects the format. In this case, we save the file as ‘cat_saved.jpg’. The second argument is the variable that contains the picture we want to save. This completes the first part of our objective.

Alternatively, we can also use the matplotlib library to display the images. Matplotlib is a plotting library that gives us publication quality plots of our data. Here’s how we do it:

import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('cat.jpg',0)
plt.imshow(img, cmap = 'gray', interpolation = 'bicubic')
plt.xticks([]), plt.yticks([])  # to hide tick values on X and Y axis 
plt.show()

The first two lines are the same as in the previous code. The third line imports the pyplot package from matplotlib and the fourth reads the image in grayscale. The following link explains the meaning of the various arguments on line 5: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html

Grayscale and Binary Images

It is possible to convert a grayscale image to a binary image using a technique called image thresholding. The reason we do this is to further simplify visual data for analysis.

You may be wondering why grayscale may not be a sufficient simplification. The reason is that grayscale still has 256 possible values per pixel, ranging from 0 to 255. Image thresholding can fundamentally simplify the image by converting every pixel to white or black, based on a threshold value.

Let’s assume that we want the threshold to be 127. Then everything that was 127 and under would be converted to 0 (or black) and everything above 127 would be converted to 255 (or white).

If you don’t convert to grayscale before performing image thresholding, you will still get thresholded pictures, but they will contain color because each channel is thresholded separately, which doesn’t really help in most cases. This is the reason it is recommended to convert all images to grayscale before performing any binary thresholding operations.

The OpenCV function used for converting a color image to grayscale is called cv2.cvtColor()

gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

The first argument is the source image, which should be a color image. The second argument specifies that the image should be converted to the Gray color space. There are other color spaces like HSV.

Note: The color is converted from BGR to Gray. This is because OpenCV reads color images in the BGR format and not in the RGB format. If you need the RGB image for any reason, you can use this same command with the second argument as cv2.COLOR_BGR2RGB.

The OpenCV function used for converting a grayscale image to binary is called cv2.threshold().

th1, threshold = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

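The first argument is the source image, which should be a grayscale image as explained earlier. The second argument is the threshold value used to classify the pixel values, usually in the range 125–150. The third argument is the value assigned to pixels that exceed the threshold (255, or white, here), and the fourth is the type of thresholding to apply. The function returns two values: the threshold that was used and the thresholded image.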

In some cases, we will need to adapt to a specific picture, say a book, which has many curvatures that may cause shadows at the curvatures. In this case, with simple binary thresholding, the region at the curvature will be completely turned to black, making the entire exercise useless. For this reason, we can use adaptive thresholding.

th2=cv2.adaptiveThreshold(img,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,11,2)

Here, the first argument img is the image to be thresholded, which must be a single-channel grayscale image. The second argument, 255, is the value assigned to a pixel if its value is greater than the calculated threshold. The third argument, cv2.ADAPTIVE_THRESH_MEAN_C, indicates that the threshold is calculated using the mean of the values around a pixel. The fourth, cv2.THRESH_BINARY, specifies the type of thresholding to be performed once the threshold is calculated. The fifth argument is the block size around a pixel used to calculate the mean; it must be odd. In this case, an 11×11 grid around every pixel is used. The last argument is a constant that is subtracted from the mean. In this case, the threshold for every pixel is [mean(11×11 grid around pixel) - 2].

Shrinking and Zooming

Let’s now take a look at the scaling function offered by OpenCV. We can use this function to perform tasks such as zooming and shrinking images, which is vital in many image processing applications. Scaling is just resizing of the image. OpenCV comes with a function cv2.resize() for this purpose. Let’s first shrink an image. Then we will zoom it back using the same scale so that we can see how the image was affected by the shrinking.

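import cv2
import numpy as np
img = cv2.imread('cat.jpg')
# shrink to one tenth of the original size
res = cv2.resize(img,None,fx=0.1, fy=0.1, interpolation = cv2.INTER_AREA)
# zoom the shrunk image (res, not img) back up by the same factor
res = cv2.resize(res,None,fx=10, fy=10, interpolation = cv2.INTER_AREA)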

Try experimenting by changing the scaling factors and compare the different outputs that you get! You can observe that there is an argument in the cv2.resize() function called interpolation. When the size of an image changes, we need to calculate the pixel values of the new image from the old one. This process is called interpolation and there are several ways of doing it; INTER_AREA is one such method.

Using cv2.INTER_AREA gives a pixelated image when zooming. To get a better image when zoomed, try more complex interpolation methods like cv2.INTER_CUBIC.

Gaussian Blur

Just like one-dimensional signals, images can also be filtered using various low-pass filters (LPF) and high-pass filters (HPF). An LPF helps in removing noise, blurring the image, etc. An HPF helps define the edges in an image.

Gaussian blurring is a kind of low-pass filter that uses a “Gaussian kernel”. A Gaussian kernel is an array of weights (usually square) whose values follow a Gaussian, or bell-shaped, curve: the weight is largest at the center and falls off smoothly with distance. Each output pixel is therefore a weighted average of its neighborhood, which blurs the image smoothly.

The function used for this purpose is:

cv2.GaussianBlur(image, (kernel_width,kernel_height), standard_deviation)

Here we specify the image to blur, the width and height of the kernel (both must be positive and odd), and the standard deviation of the Gaussian. The standard deviation given is used in the X direction, and in the Y direction as well unless a separate value is passed. To explore the mathematics further, click here.

Converting BGR to HSV

Computer vision and graphics make use of several color spaces. Commonly used ones include RGB (Red, Green, Blue) and CMYK (Cyan, Magenta, Yellow, Key), which use various combinations of primary colors to generate a spectrum of other colors.

Often, we come across scenarios where we are interested in specific colored objects in an image. This can be carried out more efficiently in alternate color spaces like HSV.

HSV is slightly different in that it is defined in a way that is similar to how humans perceive color. HSV stands for hue, saturation, and value. This color space describes a color by its hue or tint (H), its saturation or amount of grey (S), and its brightness value (V).

The generalized function used for color conversion is

cv2.cvtColor(input_image, flag)

where flag determines the type of conversion. For BGR to HSV, we use the flag cv2.COLOR_BGR2HSV.

Similarly, BGR to Gray conversion can also be done using this method, where we use the flag cv2.COLOR_BGR2GRAY. To find the flags for conversions to other color spaces follow this link.

Putting it all together!

Now that we’ve understood the various processes that can be carried out on an image, it’s time to put it all together in a single program and compare the outputs to better appreciate these functions. Each of these functions has a specific purpose in various machine learning applications. We decide which technique to use based on the image available and the kind of information we wish to extract from it.

To stack the images, we use the np.concatenate() function from the numpy library. This works because OpenCV images are always stored as numpy arrays, so we combine images by simply concatenating the arrays.

The following command will stack 3 images side by side:

temp1 = np.concatenate((img1,img2,img3), axis=1)

The following command will stack one image on top of another:

img_final = np.concatenate((temp1,temp2), axis=0)

The first argument for this command is a tuple of images, put within parentheses and separated by commas: (img1,img2,…etc.). The second argument is the axis to concatenate along: axis 0 signifies vertical concatenation, which means one image will be on top of the other, and axis 1 signifies horizontal concatenation, which means the images will be side-by-side.
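The effect of the axis argument is easiest to see from the array shapes. The sketch below uses three tiny stand-in arrays instead of real images; their sizes and values are arbitrary.

```python
import numpy as np

# Three stand-in "images", each 2 rows x 3 columns x 3 channels.
img1 = np.full((2, 3, 3), 10, dtype=np.uint8)
img2 = np.full((2, 3, 3), 20, dtype=np.uint8)
img3 = np.full((2, 3, 3), 30, dtype=np.uint8)

row = np.concatenate((img1, img2, img3), axis=1)   # side by side: widths add up
grid = np.concatenate((row, row), axis=0)          # stacked: heights add up

print(row.shape)    # (2, 9, 3)
print(grid.shape)   # (4, 9, 3)
```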

Final Code

Now let’s apply the manipulations we learnt and stack six images (the original plus five processed versions) in a 2×3 grid.

Note:
The grid is built from three-channel BGR arrays of identical size, so np.concatenate() needs every image in that shape. The gray image, the shrunk image and the HSV image therefore need special consideration.
The gray image consists of just one array of pixels, but the grid expects three arrays (B, G and R). So we use cv2.merge((img_gray,img_gray,img_gray)) to build a gray image with three identical channels. This does not affect the manipulation we have made (converting from BGR to gray).
When we shrink an image, its width and height reduce. However, np.concatenate() expects all arrays to be the same size, so we zoom the image back up by the same amount that we shrank it (shrink to 0.1, then zoom by 10). This shows you a pixelated image. The sizes match exactly only if the original width and height are multiples of 10; otherwise, resize back to the original width and height explicitly.
All pixel values in the HSV image are in the HSV color space. Since cv2.imshow() interprets them as BGR, the colors of that image will look extremely unnatural.

import numpy as np
import cv2

img1 = cv2.imread('cat.jpg')                                   # original (BGR)
h, w, bpp = np.shape(img1)                                      # height, width, channels

img_gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.merge((img_gray, img_gray, img_gray))                # gray, as three channels
th1, img3 = cv2.threshold(img2, 127, 255, cv2.THRESH_BINARY)    # thresholded
img4 = cv2.resize(img1, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)
# zoom back (about 10x) to exactly w x h so the grid lines up
img4 = cv2.resize(img4, (w, h), interpolation=cv2.INTER_AREA)
img5 = cv2.GaussianBlur(img1, (9, 9), 10)                       # blurred
img6 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)                    # HSV values shown as BGR

temp1 = np.concatenate((img1, img2, img3), axis=1)              # top row
temp2 = np.concatenate((img4, img5, img6), axis=1)              # bottom row
img_final = np.concatenate((temp1, temp2), axis=0)              # 2x3 grid

cv2.imshow("result", img_final)
cv2.waitKey(0)
cv2.destroyAllWindows()

There you go!

Your final output after putting all these images together will look something like this.

The images in order are: Original image, Grayscale image, Thresholded image, Shrunk and Zoomed image, Gaussian Blurred image, and image in HSV color space.

With this, you are equipped with the basic skills to manipulate an image and extract useful information from it for your machine learning application.