Convolution Kernel Example
by Steven J. Owens (unless otherwise attributed)
This is an attempt at a very detailed explanation of how convolution
kernels work. Even excellent explanations like the one at
http://setosa.io/ev/image-kernels/
leave out steps, because they're mostly
written for programmers. This article is written for non-programmers.
This article is about understanding how convolution kernels are
implemented. There's lots of good writing elsewhere about convolution
kernels in general. Please go read some of that to give yourself a
basic understanding. A good start might be the Wikipedia article
"Kernel (image processing)".
In spite of what I just said, I'm going to give you a little
general description of convolution kernels. If you're already
familiar with this, you might want to skip down to the next section.
Convolution and convolution kernels can be applied to a variety of
types of data, but in this case we're talking about images.
Convolution kernels are used a lot in processing images, either for
the output effect or to do things like edge-detection.
An image can be considered a 2-dimensional matrix or 2-dimensional
array, i.e. a set of rows and columns, where each value is a number
representing the color of the corresponding pixel. More often, each
pixel gets a set of numbers, often red, green and blue (RGB), or cyan,
magenta, yellow and black (CMYK). This technically makes it a
3-dimensional array or matrix, but a lot of image processing tutorials
seem to just take the multiple color values for granted.
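For example, a tiny image can be sketched in Python as a list of rows,
where each row is a list of pixel values (the variable names here are
just for illustration):

```python
# A tiny 3 x 4 grayscale image as a list of rows (a 2-dimensional
# array). Each number is the brightness of one pixel.
gray_image = [
    [2, 2, 2, 2],
    [2, 3, 3, 2],
    [2, 2, 2, 2],
]

# The pixel in the second row, third column (counting from zero):
print(gray_image[1][2])  # -> 3

# A color image adds a third dimension: each pixel is a list of red,
# green and blue values instead of a single number.
rgb_image = [
    [[255, 0, 0], [0, 255, 0]],    # row 0: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 0]],  # row 1: a blue pixel, a yellow pixel
]
print(rgb_image[0][1])  # -> [0, 255, 0]
```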
Note: That taking-it-for-granted may be because it's quite
common for those multiple color values to be combined into a single
number through a technique called bit-shifting, which is a bit too
complicated to get into here.
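For the curious, here is a minimal sketch of that packing, assuming 8
bits per color channel (the exact layout varies between image formats):

```python
# Pack three 8-bit color values (0-255) into one integer by shifting
# each channel into its own group of 8 bits and OR-ing them together.
def pack_rgb(red, green, blue):
    return (red << 16) | (green << 8) | blue

# Unpack by shifting back down and masking off the other channels.
def unpack_rgb(packed):
    return ((packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF)

packed = pack_rgb(255, 128, 0)  # an orange-ish color
print(hex(packed))              # -> 0xff8000
print(unpack_rgb(packed))       # -> (255, 128, 0)
```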
So what the heck is a convolution? A convolution is where the
program iterates through every pixel of the input image, and applies a
sort of recipe, called a kernel, to generate the color value of the
corresponding pixel in the output image. I started to describe that
here, then I remembered that's what the rest of this page is for.
I will assume you've read a bit about image convolution and
understand at least what a kernel looks like, i.e. the most common
simple kernel example is a 3 by 3 grid of values. This article uses
the following kernel (a common "sharpen" kernel) throughout:

 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |
The kernel is a list of coordinates and weightings. When kernels
are illustrated the way I have it above, the coordinates are implied -
the center cell of the kernel is the input pixel and the surrounding
cells are applied to pixels in the same relative position. In practice,
a program usually defines a kernel something like this Python example,
a list of lists:
self.offsets_list = [ [-1, -1, 0],
[-1, 0, -1],
[-1, 1, 0],
[ 0, -1, -1],
[ 0, 0, 5],
[ 0, 1, -1],
[ 1, -1, 0],
[ 1, 0, -1],
[ 1, 1, 0] ]
If I reformat that code to make the layout look more like the square
above, it becomes more obvious:
self.offsets_list = [ [-1, -1, 0], [-1, 0, -1], [-1, 1, 0],
[ 0, -1, -1], [ 0, 0, 5], [ 0, 1, -1],
[ 1, -1, 0], [ 1, 0, -1], [ 1, 1, 0] ]
The first two values are a pair of coordinate offsets relative to the
center pixel. So in this example the first entry is [-1, -1, 0]. The
[-1, -1] part means one up and one to the left of the current pixel.
The third value is a "weighting": take the input value from the pixel
whose location is given by the coordinate offsets, and multiply it by
the weighting value to get an output value. Add up all nine output
values and write that total to the output pixel. (Some kernels then
divide the total by 9 to average it; see the Averaging section below.)
In a color image, this step would be done individually for each of the
red, green and blue values (or CMYK, etc.).

Edge Pixels
One of the things that complicates explaining convolution kernels
is what happens when you're dealing with pixels at the edge of the
picture, where there is no "neighbor" pixel to use as an input.
In this example we are going to sidestep this whole question to
keep things simple.
Reminder, in most programming languages you start counting from 0.
So the first row, first column is (0, 0), then first row, second
column is (0, 1), etc. We're going to start at pixel (1, 1) and end
at pixel (1, 4), and completely ignore the pixels along the first row
at the top edge (that's (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0,
5)) and the pixels on either end of the second row (that's (1, 0) and
(1, 5)).
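Putting the recipe together for one interior pixel, a sketch in Python
might look like this (using the offsets_list weights from earlier; the
function name is just for illustration):

```python
# The kernel as (row offset, column offset, weight) triples, matching
# the offsets_list shown earlier.
offsets_list = [[-1, -1, 0], [-1, 0, -1], [-1, 1, 0],
                [ 0, -1, -1], [ 0, 0, 5], [ 0, 1, -1],
                [ 1, -1, 0], [ 1, 0, -1], [ 1, 1, 0]]

def apply_kernel(image, row, col):
    # For each entry, look up the neighboring pixel at the offset
    # position, multiply it by the weighting, and add it all up.
    total = 0
    for row_offset, col_offset, weight in offsets_list:
        total += image[row + row_offset][col + col_offset] * weight
    return total

# A tiny all-2 image: the weights add up to 1 (5 - 4), so a flat
# region of the image comes out unchanged.
image = [[2, 2, 2],
         [2, 2, 2],
         [2, 2, 2]]
print(apply_kernel(image, 1, 1))  # -> 2
```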
Averaging
In the example below, I left out averaging the values. If we were
averaging, the final total would be divided by the number of inputs,
in this case 9. Some convolutions average the values and some don't;
a blurring convolution usually averages. In this case, we're not.
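For contrast, a simple 3 by 3 box blur does average: every weight is 1
and the total is divided by the number of inputs. A sketch, assuming
the same list-of-lists image layout used in this article:

```python
# A box-blur kernel: all nine weights are 1.
blur_offsets = [[dr, dc, 1] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def blur_pixel(image, row, col):
    total = 0
    for row_offset, col_offset, weight in blur_offsets:
        total += image[row + row_offset][col + col_offset] * weight
    # The averaging step: divide by the number of inputs (9).
    return total / 9

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(blur_pixel(image, 1, 1))  # -> 5.0 (the average of all nine values)
```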
Reading the Example
In each step below, we pull out the 3 by 3 block of input pixels that
feeds the convolution kernel. The center pixel of that block is the
pixel we're calculating the output value for; the eight surrounding
pixels are the other inputs.
Step 0: Inputs

Input Image
2 | 2 | 2 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |

Kernel
 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |

Output Image
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
Step 1: calculate the output pixel at (1, 1)

Input Image
2 | 2 | 2 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |

Input pixels
2 | 2 | 2 |
2 | 2 | 3 |
2 | 2 | 3 |

x

Kernel
 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |

=

Apply Kernel
2 x 0  | 2 x -1 | 2 x 0  |
2 x -1 | 2 x 5  | 3 x -1 |
2 x 0  | 2 x -1 | 3 x 0  |

=

Kernel Results
 0 | -2 |  0 |
-2 | 10 | -3 |
 0 | -2 |  0 |

Sum: 0 + -2 + 0 + -2 + 10 + -3 + 0 + -2 + 0 = 1

Output Image
0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
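As a quick sanity check, the nine products above can be added up in a
couple of lines of Python:

```python
# The nine input pixels around (1, 1) and the kernel weights, row by row.
inputs  = [2, 2, 2,
           2, 2, 3,
           2, 2, 3]
weights = [ 0, -1,  0,
           -1,  5, -1,
            0, -1,  0]

# Multiply each input by its weight and add up the results.
total = sum(value * weight for value, weight in zip(inputs, weights))
print(total)  # -> 1
```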
Step 2: calculate the output pixel at (1, 2)

Input Image
2 | 2 | 2 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |

Input pixels
2 | 2 | 2 |
2 | 3 | 2 |
2 | 3 | 2 |

x

Kernel
 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |

=

Apply Kernel
2 x 0  | 2 x -1 | 2 x 0  |
2 x -1 | 3 x 5  | 2 x -1 |
2 x 0  | 3 x -1 | 2 x 0  |

=

Kernel Results
 0 | -2 |  0 |
-2 | 15 | -2 |
 0 | -3 |  0 |

Sum: 0 + -2 + 0 + -2 + 15 + -2 + 0 + -3 + 0 = 6

Output Image
0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 6 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |

Step 3: calculate the output pixel at (1, 3)

Input Image
2 | 2 | 2 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |

Input pixels
2 | 2 | 2 |
3 | 2 | 2 |
3 | 2 | 2 |

x

Kernel
 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |

=

Apply Kernel
2 x 0  | 2 x -1 | 2 x 0  |
3 x -1 | 2 x 5  | 2 x -1 |
3 x 0  | 2 x -1 | 2 x 0  |

=

Kernel Results
 0 | -2 |  0 |
-3 | 10 | -2 |
 0 | -2 |  0 |

Sum: 0 + -2 + 0 + -3 + 10 + -2 + 0 + -2 + 0 = 1

Output Image
0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 6 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |

Step 4: calculate the output pixel at (1, 4)

Input Image
2 | 2 | 2 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 3 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |

Input pixels
2 | 2 | 2 |
2 | 2 | 2 |
2 | 2 | 2 |

x

Kernel
 0 | -1 |  0 |
-1 |  5 | -1 |
 0 | -1 |  0 |

=

Apply Kernel
2 x 0  | 2 x -1 | 2 x 0  |
2 x -1 | 2 x 5  | 2 x -1 |
2 x 0  | 2 x -1 | 2 x 0  |

=

Kernel Results
 0 | -2 |  0 |
-2 | 10 | -2 |
 0 | -2 |  0 |

Sum: 0 + -2 + 0 + -2 + 10 + -2 + 0 + -2 + 0 = 2

Output Image
0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 6 | 1 | 2 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
Step 5: Rinse, repeat
Now you repeat this process for each of the remaining rows, i.e. the
pixels at (2, 1), (2, 2), (2, 3), (2, 4), then the pixels at (3, 1),
(3, 2), (3, 3), (3, 4), and so on.
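The whole walk-through can be reproduced in a few lines of Python.
This sketch loops over every interior pixel, applies the kernel, skips
the edge pixels (leaving them 0), and skips the averaging step, as
described above:

```python
# The kernel as (row offset, column offset, weight) triples.
offsets_list = [[-1, -1, 0], [-1, 0, -1], [-1, 1, 0],
                [ 0, -1, -1], [ 0, 0, 5], [ 0, 1, -1],
                [ 1, -1, 0], [ 1, 0, -1], [ 1, 1, 0]]

# The same input image used in the steps above.
input_image = [[2, 2, 2, 2, 2, 2],
               [2, 2, 3, 2, 2, 2],
               [2, 2, 3, 2, 2, 2],
               [2, 2, 3, 2, 2, 2],
               [2, 2, 3, 2, 2, 2],
               [2, 2, 3, 2, 2, 2],
               [2, 2, 2, 2, 2, 2]]

rows, cols = len(input_image), len(input_image[0])

# Start with an all-zero output image the same size as the input.
output_image = [[0] * cols for _ in range(rows)]

# Loop over interior pixels only, skipping the one-pixel border.
for row in range(1, rows - 1):
    for col in range(1, cols - 1):
        total = 0
        for row_offset, col_offset, weight in offsets_list:
            total += input_image[row + row_offset][col + col_offset] * weight
        output_image[row][col] = total

for output_row in output_image:
    print(output_row)

# Printed output:
# [0, 0, 0, 0, 0, 0]
# [0, 1, 6, 1, 2, 0]
# [0, 1, 5, 1, 2, 0]
# [0, 1, 5, 1, 2, 0]
# [0, 1, 5, 1, 2, 0]
# [0, 1, 6, 1, 2, 0]
# [0, 0, 0, 0, 0, 0]
```

Note that the second row matches the values worked out step by step
above, and the vertical stripe of 3s comes out strengthened relative
to its neighbors, which is what a sharpen kernel is for.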