Understand the relationship between linear algebra and image processing.
We now take a quick detour back to Block A!
Digital images consist of hundreds, thousands, or millions of discrete 'picture elements', otherwise known as pixels. Each pixel can be thought of as a single square point of coloured light. You are probably familiar with terms such as 480p and 720p. The number 480 stands for the number of pixels in the vertical direction of the image: a 480p video consists of frames which have 858 * 480 pixels in total.
Take a standard image and zoom in until you can see the individual pixels.
When you zoom in, you may notice that each pixel has a single colour, and that neighbouring pixels can have different colours (indicating object boundaries). However, when we view them from a distance, these pixels seem to blend together to form a coherent image.
Digital images use some color model to create a broad range of colors from a small set of primary colors. Although there are several different color models that are used for images, the most commonly occurring one is the RGB (Red, Green, Blue) model [1].
The RGB model is an additive color model, which means that the primary colors are mixed together to form other colors. In the RGB model, the primary colors are red, green, and blue – thus the name of the model. Each primary color is often called a channel.
Most frequently, the amount of each primary colour added is represented as an integer in the closed range [0, 255]. A larger number in a channel means that more of that primary colour is present.
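As a small illustration (not part of the workshop materials), the snippet below mixes the three channels to build a few colours; the colour names are just labels chosen for this example.

```python
# Each colour is a triplet (R, G, B), with every channel in the range 0-255.
red    = (255, 0, 0)      # only the red channel is "switched on"
green  = (0, 255, 0)
blue   = (0, 0, 255)
yellow = (255, 255, 0)    # mixing full red and full green gives yellow
orange = (255, 165, 0)    # less green than yellow shifts the colour towards red

for name, (r, g, b) in [("red", red), ("yellow", yellow), ("orange", orange)]:
    print(f"{name:7s} -> R={r:3d}, G={g:3d}, B={b:3d}")
```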
You may also come across images described as having a 24-bit depth. This comes from how computers store numbers. Computers use the binary number system, in which each symbol in a number is called a bit instead of a digit, and each bit can take only two values (0 and 1). Each channel can take one of 256 possible values (0 through 255), and 256 distinct values can be represented using exactly 8 bits. Since we have 3 channels (R, G, B), each stored using 8 bits, a pixel is therefore represented using 24 bits.
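As a quick sanity check (a minimal sketch, not taken from the workshop notebook), the snippet below prints an example channel value in binary and counts the bits needed for a full pixel:

```python
value = 200                       # an example channel value in the range 0-255
print(bin(value))                 # binary representation: 0b11001000
print(f"bits per channel: {len(format(value, '08b'))}")   # padded to 8 bits
print(f"bits per RGB pixel: {3 * 8}")                     # 3 channels * 8 bits = 24
```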
Convert the decimal number 255 to binary.
In sum, using the RGB model, we can represent any colour as a list of three values [R, G, B], one for each of the red, green, and blue channels.
Look up a standard RGB colour table on Google. What is unique about the colours black, gray, and white in the RGB model?
Taken together, an image is a two-dimensional array of pixels, where each pixel is a coloured (square) box whose colour and intensity are specified by a 24-bit RGB triplet.
We now know that digital images are represented as rectangular arrays of individually-coded square pixels, and that the intensity or colour of each pixel can be represented as an RGB triplet, e.g., (34, 255, 67). Now that we have understood how digital images are represented, let's turn our attention to how we can use Python to represent and manipulate such digital images. In Python, images are stored in a similar manner using standard data structures. In particular, images are stored as three-dimensional NumPy arrays: the first two dimensions correspond to the rows and columns of pixels, and the third dimension holds the colour channels of each pixel.
The rectangular shape of the array corresponds to the shape (num rows, num columns) of the image. The "depth" of the array for a skimage image is three, with one layer for each of the three channels R, G, and B. When we think of a pixel in an image, we usually think of its (x, y) coordinates (in a left-hand coordinate system), like (113, 45), and its colour, specified as an RGB triple like (245, 134, 29). In a Python image array, the same pixel would be accessed with (y, x) indices (45, 113) and would hold the RGB colour (245, 134, 29).
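As an illustration (a minimal sketch assuming scikit-image is installed; the sample image and the pixel indices are just examples, not taken from the workshop notebook):

```python
import skimage.data

# Load a sample RGB image that ships with scikit-image.
image = skimage.data.astronaut()

# The array has shape (num rows, num columns, 3 channels).
print(image.shape)          # (512, 512, 3)
print(image.dtype)          # uint8: each channel is an 8-bit integer (0-255)

# The pixel at x = 113, y = 45 is indexed as [row, column] = [45, 113].
r, g, b = image[45, 113]
print(f"R={r}, G={g}, B={b}")
```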
Since images are represented digitally as matrices, we can use the tools of linear algebra we have learned to do some really cool things with images. We will start by examining the PCA algorithm, and in the next workshop we will look at the concept of convolution.
## Image Processing & Linear Algebra - PCA Algorithm
Let's first examine a popular application of linear algebra to image processing in the context of dimensionality reduction: the Principal Component Analysis (PCA) algorithm. In simple terms, this refers to the process of reducing the size of a large matrix while retaining most of the information contained in it. While this sounds a bit abstract, it may make more sense once we link it to facial recognition!
Today we take facial recognition for granted. However, when the technology was still being developed, linear algebra and the PCA algorithm played a key role! The idea was to collect a large database of faces, with each face represented as a multidimensional vector (recall that images are just matrices). Researchers then used these vectors to obtain certain key vectors, which they termed eigenfaces. Interestingly, they found that they could capture a large amount of the variance in the human face using relatively few eigenfaces! Further, using this reduced representation of the human face, they could reconstruct any human face to reasonable accuracy.
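To make this concrete, here is a minimal sketch of computing eigenfaces with scikit-learn's PCA on the Olivetti faces dataset; the dataset, the number of components, and the variable names are illustrative choices, not necessarily what the workshop notebook uses.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

# Each 64x64 face image is flattened into a 4096-dimensional vector.
faces = fetch_olivetti_faces()
X = faces.data                      # shape: (400 faces, 4096 pixels)

# Keep the 50 directions of greatest variance (the "eigenfaces").
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)    # shape: (400, 50)

# Each eigenface can be reshaped back into a 64x64 image and displayed.
eigenfaces = pca.components_.reshape(-1, 64, 64)
print(f"Variance captured by 50 eigenfaces: {pca.explained_variance_ratio_.sum():.2%}")
```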
But how is this useful in practice? Suppose there is a no-fly list at Schiphol. The manager of Schiphol just has to feed in all the faces on the no-fly list and obtain the eigenfaces using the PCA algorithm. Now, when a person enters Schiphol, the security system can automatically detect whether the key features in that person's face match any of the saved eigenfaces. If there is a close match, then it is highly likely that the person is on the no-fly list, and the system can flag security.
We will spend the rest of the afternoon exploring facial recognition with the PCA algorithm. Do not worry if you find it hard to understand how the algorithm works. What's useful to notice is that the PCA algorithm uses concepts you are already familiar with (mean, variance, transpose, inverse, and dot products), and if you were implementing the algorithm from scratch, you would need to know how these concepts taken together lead to the PCA algorithm. In your role as a data scientist or a consultant, it is more important to develop a high-level intuition for such algorithms and to understand how to apply them and interpret their results. If you are curious, here's a slightly more technical introduction to eigenfaces and what they really represent.
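For the curious, a from-scratch PCA might look roughly like the sketch below (a simplified illustration using NumPy's eigendecomposition; the workshop notebook may implement it differently):

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Project the rows of X onto its top principal components.

    X: data matrix with one observation (e.g. one flattened face) per row.
    """
    # 1. Centre the data: subtract the mean of each column (the "mean face").
    X_centred = X - X.mean(axis=0)

    # 2. Covariance matrix (built from transposes and dot products).
    cov = (X_centred.T @ X_centred) / (X_centred.shape[0] - 1)

    # 3. Eigendecomposition of the (symmetric) covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Keep the eigenvectors with the largest eigenvalues (most variance).
    top = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, top]

    # 5. Project the centred data onto those components.
    return X_centred @ components

# Example: reduce 100 random 20-dimensional observations to 3 dimensions.
X = np.random.rand(100, 20)
print(pca_from_scratch(X, 3).shape)   # (100, 3)
```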
Please download and open the Jupyter notebook PCAinPractice from your GitHub repositories, explore the code, and answer the following questions.
hint: observe the intensity of the pixel values
What features do the later eigenfaces capture?
hint: the mean face is always captured from the original face images.
hint: a picture speaks more than a thousand tables.
In capturing 90% of the variance in the data using PCA, by what ratio have we reduced the size of the original dataset?
hint: think about DS1 where you interpreted the world as data and variables.
List three advantages and three disadvantages of facial recognition AI.
Complete the Linear Algebra course on Codecademy, which can be found here, and upload a certificate of completion to your learning logs, OR complete the Matrices course on Khan Academy, which can be found here, and upload evidence of completion to your learning logs.