Convolution Kernel Example

by Steven J. Owens (unless otherwise attributed)

This is an attempt at a very detailed explanation of how convolution kernels work. Even excellent explanations like the one at http://setosa.io/ev/image-kernels/ leave out steps, because they're mostly written for programmers. This article is written for non-programmers.

This article is about understanding how convolution kernels are implemented. There's lots of good writing elsewhere about convolution kernels in general; please go read some of that to give yourself a basic understanding. A good start might be the Wikipedia article "Kernel (image processing)".

In spite of what I just said, I'm going to give you a little general description of convolution kernels. If you're already familiar with this, you might want to skip down to the next section.

Convolution and convolution kernels can be applied to many types of data, but in this case we're talking about images. Convolution kernels are used a lot in image processing, either for visual effects like blurring and sharpening, or for analysis tasks like edge detection.

An image can be considered a 2-dimensional matrix or 2-dimensional array, i.e. a set of rows and columns, where each value is a number representing the color of the corresponding pixel. More often each pixel gets a set of numbers, typically red, green and blue (RGB), or cyan, magenta, yellow and black (CMYK). That technically makes it a 3-dimensional array or matrix, but a lot of image processing tutorials seem to just take the multiple color values for granted.

Note: That taking-it-for-granted may be because it's quite common for those multiple color values to be combined into a single number through a technique called bit-shifting, which is a bit too complicated to get into here.
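For the curious, here's a quick sketch of what that bit-shifted packing looks like in Python (this is just an illustration; you don't need it for the rest of this article). Each 8-bit color channel gets shifted into its own byte of a single integer:

```python
red, green, blue = 200, 100, 50

# Pack three 8-bit channel values into one integer by shifting each
# channel into its own byte: the result reads as 0xRRGGBB in hex.
packed = (red << 16) | (green << 8) | blue
print(hex(packed))  # 0xc86432

# Unpack by shifting back down and masking off the other bytes.
assert (packed >> 16) & 0xFF == red
assert (packed >> 8) & 0xFF == green
assert packed & 0xFF == blue
```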

So what the heck is a convolution? A convolution is where the program iterates through every pixel of the input image, and applies a sort of recipe, called a kernel, to generate the color value of the corresponding pixel in the output image. I started to describe that here, then I remembered that's what the rest of this page is for.

I will assume you've read a bit about image convolution and understand at least what a kernel looks like, i.e. the most common simple kernel example is a 3 by 3 grid of values, like this:

1 1 1
1 1 1
1 1 1

Or this:

0 -1 0
-1 5 -1
0 -1 0

The kernel is a list of coordinates and weightings. When kernels are illustrated the way I have it above, the coordinates are implied: the center cell of the kernel is the input pixel, and the surrounding cells are applied to pixels in the same relative positions. In practice, a program usually defines a kernel something like this Python example of a list of lists:

self.offsets_list = [ [-1, -1,  0],
                      [-1,  0, -1],
                      [-1,  1,  0],
                      [ 0, -1, -1],
                      [ 0,  0,  5],
                      [ 0,  1, -1],
                      [ 1, -1,  0],
                      [ 1,  0, -1],
                      [ 1,  1,  0] ]

If I reformat that code to make the layout look more like the square above, it becomes more obvious:

self.offsets_list = [ [-1, -1,  0], [-1, 0, -1], [-1, 1,  0], 
                      [ 0, -1, -1], [ 0, 0,  5], [ 0, 1, -1],
                      [ 1, -1,  0], [ 1, 0, -1], [ 1, 1,  0] ]

The first two values are a coordinate offset, a (row, column) position relative to the center pixel. So in this example the first entry is [-1, -1, 0]. The [-1, -1] part means one row up and one column to the left of the current pixel. The third value is a "weighting": take the input value from the pixel at the location given by the coordinate offset, and multiply it by the weighting value to get an output value. Add up all nine of the output values and write the total to the output pixel. (Some kernels also divide the total by the number of inputs to average it; see the Averaging section below.) In a color image, this step would be done separately for each of the red, green and blue channels (or CMYK, etc.).
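That recipe can be sketched in a few lines of Python. This is a minimal sketch under my own naming, not code from any particular library: it applies the offsets list to one pixel of a grayscale image, summing the weighted neighbor values.

```python
# Sharpen kernel as a list of [row_offset, col_offset, weight] entries.
offsets_list = [
    [-1, -1,  0], [-1, 0, -1], [-1, 1,  0],
    [ 0, -1, -1], [ 0, 0,  5], [ 0, 1, -1],
    [ 1, -1,  0], [ 1, 0, -1], [ 1, 1,  0],
]

def apply_kernel(image, row, col, kernel):
    """Compute the output value for one pixel of a grayscale image."""
    total = 0
    for row_offset, col_offset, weight in kernel:
        total += image[row + row_offset][col + col_offset] * weight
    return total

# The 7-row by 6-column example image used later in this article.
image = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 2, 2, 2, 2],
]

print(apply_kernel(image, 1, 1, offsets_list))  # 1
```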

Edge Pixels

One of the things that complicates explaining convolution kernels is what happens when you're dealing with pixels at the edge of the picture, where there is no "neighbor" pixel to use as an input.

In this example we are going to sidestep this whole question to keep things simple.

Reminder: in most programming languages you start counting from 0. So the first row, first column is (0, 0); the first row, second column is (0, 1); and so on. We're going to start at pixel (1, 1) and end at pixel (1, 4), completely ignoring the pixels along the first row at the top edge (that's (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)) and the pixels on either end of the second row (that's (1, 0) and (1, 5)).
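In code, skipping the edge pixels just means starting the loop counters at 1 and stopping one short of the last row and column. A sketch of the loop bounds for our 7-row by 6-column example (variable names are mine):

```python
height = 7  # rows in the example image
width = 6   # columns

# Collect the coordinates of every "interior" pixel, i.e. every pixel
# that has a full set of neighbors on all sides.
interior = []
for row in range(1, height - 1):     # rows 1 through 5, skipping 0 and 6
    for col in range(1, width - 1):  # columns 1 through 4, skipping 0 and 5
        interior.append((row, col))

print(interior[:4])   # [(1, 1), (1, 2), (1, 3), (1, 4)]
print(len(interior))  # 20 interior pixels out of 42 total
```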

Averaging

In the example below, I left out averaging the values. If we were averaging, the final total would be divided by the number of inputs, in this case 9. Some convolutions average the values and some don't; a blurring convolution usually does. In this case, we're not averaging.
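To illustrate the difference, here's a sketch of a 3 by 3 box-blur kernel, which weights every neighbor 1 and then averages. Applied to a patch of pixels that are all 2, the weighted sum is 18, and dividing by the 9 inputs gives back 2:

```python
# Box-blur: every cell weighted 1, then the sum is averaged.
blur_kernel = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]

patch = [
    [2, 2, 2],
    [2, 2, 2],
    [2, 2, 2],
]

total = 0
for r in range(3):
    for c in range(3):
        total += patch[r][c] * blur_kernel[r][c]

print(total)       # 18
print(total // 9)  # 2  <- the averaged (blurred) output value
```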

Example Layout

In each step below, the "Input pixels" grid shows the 3 by 3 neighborhood of the pixel we're calculating the output value for: the center value is the current input pixel, and the surrounding values are its neighbors in the input image.


Step 0: Inputs

Input Image
2 2 2 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 2 2 2 2
Kernel
0 -1 0
-1 5 -1
0 -1 0
Output Image
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Step 1: calculate the output pixel at (1, 1)

Input Image
2 2 2 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 2 2 2 2

Input Pixels
2 2 2
2 2 3
2 2 3
x Kernel
0 -1 0
-1 5 -1
0 -1 0
= Apply Kernel
2 x 0 2 x -1 2 x 0
2 x -1 2 x 5 3 x -1
2 x 0 2 x -1 3 x 0
= Kernel Results
0 -2 0
-2 10 -3
0 -2 0
=
1
Output Image
0 0 0 0 0 0
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
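The arithmetic in Step 1 can be checked with a few lines of Python (a sketch; the names are mine). It multiplies the input neighborhood by the kernel cell by cell, then sums everything:

```python
# The 3x3 input neighborhood around pixel (1, 1), and the sharpen kernel.
input_pixels = [
    [2, 2, 2],
    [2, 2, 3],
    [2, 2, 3],
]
kernel = [
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
]

# Multiply cell by cell, then sum all nine products.
results = [[input_pixels[r][c] * kernel[r][c] for c in range(3)]
           for r in range(3)]
output_value = sum(sum(row) for row in results)

print(results)       # [[0, -2, 0], [-2, 10, -3], [0, -2, 0]]
print(output_value)  # 1
```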

Step 2: calculate the output pixel at (1, 2)

Input Image
2 2 2 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 2 2 2 2
Input Pixels
2 2 2
2 3 2
2 3 2
x Kernel
0 -1 0
-1 5 -1
0 -1 0
= Apply Kernel
2 x 0 2 x -1 2 x 0
2 x -1 3 x 5 2 x -1
2 x 0 3 x -1 2 x 0
= Kernel Results
0 -2 0
-2 15 -2
0 -3 0
=
6
Output Image
0 0 0 0 0 0
0 1 6 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Step 3: calculate the output pixel at (1, 3)

Input Image
2 2 2 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 2 2 2 2
Input Pixels
2 2 2
3 2 2
3 2 2
x Kernel
0 -1 0
-1 5 -1
0 -1 0
= Apply Kernel
2 x 0 2 x -1 2 x 0
3 x -1 2 x 5 2 x -1
3 x 0 2 x -1 2 x 0
= Kernel Results
0 -2 0
-3 10 -2
0 -2 0
=
1
Output Image
0 0 0 0 0 0
0 1 6 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Step 4: calculate the output pixel at (1, 4)

Input Image
2 2 2 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 3 2 2 2
2 2 2 2 2 2
Input Pixels
2 2 2
2 2 2
2 2 2
x Kernel
0 -1 0
-1 5 -1
0 -1 0
= Apply Kernel
2 x 0 2 x -1 2 x 0
2 x -1 2 x 5 2 x -1
2 x 0 2 x -1 2 x 0
= Kernel Results
0 -2 0
-2 10 -2
0 -2 0
=
2
Output Image
0 0 0 0 0 0
0 1 6 1 2 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Step 5: Rinse, repeat

Now you repeat this process for each of the remaining rows, i.e. the pixels at (2, 1), (2, 2), (2, 3), (2, 4), then the pixels at (3, 1), (3, 2), (3, 3), (3, 4), and so on.
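Putting it all together, the whole rinse-and-repeat loop looks something like this sketch in Python (my own naming, assuming the edge pixels of the output image are simply left at 0):

```python
def convolve(image, kernel):
    """Apply a 3x3 kernel to every interior pixel, leaving edges at 0."""
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            total = 0
            for kr in range(3):      # kernel row: 0, 1, 2
                for kc in range(3):  # kernel column: 0, 1, 2
                    total += image[row + kr - 1][col + kc - 1] * kernel[kr][kc]
            output[row][col] = total
    return output

image = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 3, 2, 2, 2],
    [2, 2, 2, 2, 2, 2],
]
kernel = [
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
]

result = convolve(image, kernel)
print(result[1])  # [0, 1, 6, 1, 2, 0]
```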

