Image Processing – Basic Concepts
Lecturer: 周昌民
Contents
Images: sampling and resolution
Manipulation: filtering, geometrical transformations
Compression: GIF, JPEG, JPEG2000
Sampling and resolution
An input device (e.g. a camera or scanner) samples (measures) the colours in a scene at a finite number of points on a 2D rectangle.
Resolution can refer to the number of points sampled (e.g. 640 by 480) or to the dot density (e.g. 300 dpi, dots per inch). Pixel depth is the number of bits stored per pixel, which sets the number of quantisation levels used for each colour, e.g. 24-bit colour uses 8 bits (256 levels) for each of R, G and B.
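As a small illustration of pixel depth, the hypothetical `quantise` function below maps a measured intensity in [0, 1] to one of 2^bits levels. It is a sketch that assumes uniform quantisation; real devices may apply gamma or other non-linear curves.

```python
def quantise(intensity: float, bits: int) -> int:
    """Map a measured intensity in [0.0, 1.0] to an integer level 0 .. 2**bits - 1."""
    levels = 2 ** bits
    # Clamp to the valid range, then scale; 1.0 is folded into the top level.
    intensity = min(max(intensity, 0.0), 1.0)
    return min(int(intensity * levels), levels - 1)

levels_per_channel = 2 ** 8              # 8 bits for each of R, G and B
total_colours = levels_per_channel ** 3  # 24-bit colour
print(levels_per_channel, total_colours)   # 256 16777216
print(quantise(0.5, 8), quantise(1.0, 8))  # 128 255
```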
Image Manipulation
Why would we want to manipulate an image?
Deficiencies in the image: focus blur, motion blur, red-eye, poor lighting, noise, ...
Special effects required: sepia, painting styles, combining images, ...
What methods are available?
Pixel-level processing
Statistical processing
Pixel-group processing
Geometrical transformations
Pixel level changes
Brightness – add an equal value to the R, G and B values of each pixel
Contrast – multiply the RGB values by some factor and then reset the overall brightness
Colour balance – Vary the R, G and B brightnesses independently
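The three pixel-level operations above can be sketched on a single RGB pixel as follows. This assumes 8-bit channels clipped to 0..255 and interprets "reset overall brightness" as scaling about a mid-grey pivot of 128; the function names and the pivot are illustrative choices, not part of the slides.

```python
def clamp(v: float) -> int:
    """Round and clip a channel value to the 8-bit range 0..255."""
    return max(0, min(255, round(v)))

def brightness(rgb, delta):
    # Add the same value to R, G and B.
    return tuple(clamp(c + delta) for c in rgb)

def contrast(rgb, factor, pivot=128):
    # Scale distances from mid-grey, so the overall brightness stays put.
    return tuple(clamp(pivot + factor * (c - pivot)) for c in rgb)

def colour_balance(rgb, gains):
    # Scale R, G and B independently.
    return tuple(clamp(c * g) for c, g in zip(rgb, gains))

print(brightness((100, 150, 200), 30))                   # (130, 180, 230)
print(contrast((100, 150, 200), 2.0))                    # (72, 172, 255)
print(colour_balance((100, 150, 200), (1.1, 1.0, 0.9)))  # (110, 150, 180)
```

Clipping matters: doubling the contrast pushes the blue channel past 255, so it saturates.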
Colour manipulation
Grey scale – average of R, G and B weighted to the human perceptual system, e.g. 0.299R + 0.587G + 0.114B (the ITU-R BT.601 luma weights: green counts most, blue least)
"Greying out" (e.g. a disabled button) – blend pixel values with grey: e.g. R' = (R+200)/2, G' = (G+200)/2, etc.
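Both colour manipulations can be sketched per pixel. The greyscale function assumes the ITU-R BT.601 luma weights 0.299, 0.587 and 0.114; the grey level 200 for greying out is the example value from the slide, and the function names are hypothetical.

```python
def to_grey(rgb):
    # Perceptually weighted average (BT.601 luma weights).
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def grey_out(rgb, grey=200):
    # Blend each channel 50/50 with a light grey, as for a disabled button.
    return tuple((c + grey) // 2 for c in rgb)

print(to_grey((255, 255, 255)))  # 255
print(grey_out((0, 0, 255)))     # (100, 100, 227)
```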
Statistical Processing
Histogram equalisation (automatically adjusts contrast)
Create a histogram H with one bin for each allowed grey level, e.g. for G grey levels:
for each pixel (x, y), H[image(x,y)]++;
Create a cumulative histogram Hc:
Hc[0] = H[0]; for each grey level g from 1 to G-1, Hc[g] = Hc[g-1] + H[g];
Hc[g] is now non-decreasing and Hc[G-1] equals the number of pixels in the image.
Rescale Hc to the range of grey levels, i.e.
for each grey level g, Hc[g] = Hc[g]*(G-1)/(width*height);   (integer division)
Remap the image grey levels:
for each pixel (x, y), image(x,y) = Hc[image(x,y)];
Histogram equalisation
Example image with 90 pixels and 10 grey levels [0-9]:
histogram = {0, 0, 0, 10, 20, 30, 20, 10, 0, 0}
Hc = {0, 0, 0, 10, 30, 60, 80, 90, 90, 90}
Hc' = {0, 0, 0, 1, 3, 6, 8, 9, 9, 9}
new histogram = {0, 10, 0, 20, 0, 0, 30, 0, 20, 10}
Equalisation pushes intensities apart: dark pixels get darker and light pixels get lighter.
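The four steps of histogram equalisation can be sketched on a flat list of grey levels. The function name `equalise` is hypothetical; rescaling by (G - 1)/(number of pixels) with integer division is the choice that reproduces the Hc' values of the 90-pixel example.

```python
def equalise(pixels, levels):
    """Histogram-equalise a flat list of grey levels in range(levels)."""
    # 1. Histogram: one bin per grey level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # 2. Cumulative histogram.
    cum, running = [], 0
    for count in hist:
        running += count
        cum.append(running)
    # 3. Rescale so the last bin maps to the top grey level (integer division).
    n = len(pixels)
    lut = [c * (levels - 1) // n for c in cum]
    # 4. Remap every pixel through the lookup table.
    return [lut[p] for p in pixels], lut

# Rebuild the 90-pixel, 10-level example image from its histogram.
hist = [0, 0, 0, 10, 20, 30, 20, 10, 0, 0]
pixels = [g for g, count in enumerate(hist) for _ in range(count)]
new_pixels, lut = equalise(pixels, 10)
print(lut)                                       # [0, 0, 0, 1, 3, 6, 8, 9, 9, 9]
print([new_pixels.count(g) for g in range(10)])  # [0, 10, 0, 20, 0, 0, 30, 0, 20, 10]
```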
Pixel group processing
e.g. Convolution: each new pixel value is a weighted sum of the original values of the neighbouring pixels.
A filter specifies the weights in the sum.
Separable 1-D filters can often be used for efficiency: filter(i,j) = filter_x(i) filter_y(j).
Different (positive and negative) filter coefficients (weights) have different effects, e.g.
values like {1, 4, 6, 4, 1}/16 will blur the pixels;
values like {-1, -3, 3, 1}/8 will perform edge detection.
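A separable filter can be sketched by filtering every row with filter_x and then every column with filter_y. The helper names are hypothetical, images are plain lists of rows, and border pixels are handled by clamping indices (one of several common choices). Strictly, the code computes a correlation (the kernel is not flipped), which is identical to convolution for symmetric kernels such as this blur.

```python
def convolve_1d(row, kernel):
    """Weighted sum of neighbours along one row; indices are clamped at the borders."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        total = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)
            total += w * row[j]
        out.append(total)
    return out

def convolve_separable(image, kx, ky):
    # Filter every row with kx, then every column with ky.
    rows = [convolve_1d(r, kx) for r in image]
    cols = [convolve_1d(list(c), ky) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

blur = [k / 16 for k in (1, 4, 6, 4, 1)]
image = [[0] * 5 for _ in range(5)]
image[2][2] = 256  # a single bright pixel
for row in convolve_separable(image, blur, blur):
    print([round(v) for v in row])
```

Blurring a single bright pixel spreads it into the outer product of the kernel with itself: the rows print as [1, 4, 6, 4, 1], [4, 16, 24, 16, 4], [6, 24, 36, 24, 6], and symmetrically below.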
Blurring
Use a low-pass filter, e.g. {1,4,6,4,1}/16 applied along rows then columns.
Edge enhancement using unsharp masking.
Subtracting the blurred version from the original leaves just the "edges". Adding these edges back to the original brings out the areas of contrast.
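Unsharp masking can be sketched in 1-D on a step edge, using the {1, 4, 6, 4, 1}/16 low-pass filter for the blurred version. The `amount` parameter (how strongly the edges are added back) and the function names are illustrative assumptions.

```python
def blur_1d(row, kernel=(1, 4, 6, 4, 1)):
    """Low-pass filter a row with a normalised kernel, clamping indices at the ends."""
    half, norm, n = len(kernel) // 2, sum(kernel), len(row)
    return [sum(w * row[min(max(i + k - half, 0), n - 1)] for k, w in enumerate(kernel)) / norm
            for i in range(n)]

def unsharp_mask(row, amount=1.0):
    blurred = blur_1d(row)
    edges = [o - b for o, b in zip(row, blurred)]        # original - blurred
    return [max(0, min(255, round(o + amount * e)))        # add edges back, clip to 8 bits
            for o, e in zip(row, edges)]

step = [50, 50, 50, 50, 200, 200, 200, 200]
print(unsharp_mask(step))  # [50, 50, 41, 3, 247, 209, 200, 200]
```

The dark side of the edge gets darker and the bright side brighter, which is what makes the edge look crisper; flat regions are unchanged.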
Edge-detecting filters
Filters such as {1, 3, -3, -1} can be applied either horizontally or vertically (usually after smoothing) to locate intensity changes (edges).
[Figure panels: horizontal edges (+127); vertical edges (+127)]
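The edge filter can be sketched along rows or along columns of a small image. This assumes the response is divided by 8 (the sum of the absolute weights) and offset by 127 so that "no edge" shows as mid-grey, matching the "+127" in the figure panels; the smoothing step is skipped for brevity and the function names are hypothetical.

```python
def edge_response(row, kernel=(1, 3, -3, -1), offset=127):
    """Apply a 1-D edge filter along one row of grey levels, clipped to 0..255."""
    n, half = len(row), len(kernel) // 2
    out = []
    for i in range(n):
        # Clamp indices at the borders so the output has the same length as the input.
        s = sum(w * row[min(max(i + k - half, 0), n - 1)] for k, w in enumerate(kernel))
        out.append(max(0, min(255, offset + s // 8)))
    return out

def edge_image(image, along_rows=True):
    """Filter along rows (responds to changes in x) or along columns (changes in y)."""
    if along_rows:
        return [edge_response(r) for r in image]
    cols = [edge_response(list(c)) for c in zip(*image)]
    return [list(r) for r in zip(*cols)]

step = [[20, 20, 20, 20, 60, 60, 60, 60]] * 4      # dark-to-bright step in x
print(edge_image(step)[0])                      # [127, 127, 127, 122, 107, 122, 127, 127]
print(edge_image(step, along_rows=False)[0])    # all 127: nothing changes in y
```

With this sign convention a dark-to-bright transition dips below mid-grey; the {-1, -3, 3, 1} form of the filter would make it rise above instead.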
Combined effects
e.g. embossed = a * original + b * (127 + edges at angle θ)
e.g. {b, 3b, 127+a, -3b , b}
Art effects, e.g. charcoal sketch (looks like an edge detector); also paint strokes that perform local, directional blending of colours for pointillism, etc.
Geometrical transformations
Map each pixel (x,y) to some other position (x',y'):
newImage(x,y) = oldImage(x',y')
This uses backward coordinate mapping; can you see why? We would usually sample oldImage at the non-integer position (x',y') using bilinear interpolation.
Many simple effects, e.g.
Shearing: x' = x + ay, y' = y
Rotation: x' = x cos(T) - y sin(T), y' = x sin(T) + y cos(T)
More complex effects, e.g.
Interpolate translations of points across the image
Free-form deformations, thin-plate splines, etc.
Rotation: x' = x cos(T) - y sin(T), y' = x sin(T) + y cos(T)
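Backward mapping with bilinear interpolation can be sketched for rotation about the image centre: for each output pixel, compute the source position (x', y') with the rotation formula and sample the old image there. The function names are hypothetical, pixels outside the old image are assumed to be 0, and because the formula maps output to source, the picture appears rotated by -T (use -T for the opposite sense).

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at a non-integer (x, y); 0 outside the image."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xi, yi):
        return img[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

def rotate(img, theta):
    """Backward mapping: every output pixel looks up where it came from."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # x' = x cos T - y sin T, y' = x sin T + y cos T, measured from the centre
            xs = c * dx - s * dy + cx
            ys = s * dx + c * dy + cy
            out[y][x] = bilinear(img, xs, ys)
    return out

print(bilinear([[0, 10], [20, 30]], 0.5, 0.5))  # 15.0
```

Mapping backwards guarantees that every output pixel receives exactly one value; mapping forwards can leave holes or write several source pixels onto the same target.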
Using interpolated user-specified translations
Image Compression
Image compression is required for storage and transmission.
Lossless compression methods
No data is lost in the compression; suitable for all kinds of data, e.g. text.
Lossy compression methods
Data is thrown away in the compression cycle: choose data to which the human visual system is insensitive, e.g. small high-frequency components.
File formats: GIF, JPEG, JPEG2000
Repetition suppression
e.g. Run-length encoding
aaaaaabbbbbbccddddd... → a6b6ccd5...
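The run-length example can be reproduced with a short sketch. It assumes, as the example suggests, that only runs of three or more characters are replaced by "character + count" while shorter runs are copied literally, and that the data contains no digits (otherwise the counts would be ambiguous); `run_length_encode` and `min_run` are hypothetical names.

```python
from itertools import groupby

def run_length_encode(text, min_run=3):
    """Replace runs of min_run or more identical characters by char + count."""
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        out.append(f"{ch}{n}" if n >= min_run else ch * n)
    return "".join(out)

print(run_length_encode("aaaaaabbbbbbccddddd"))  # a6b6ccd5
```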
Statistical encoding
e.g. Huffman encoding
Use short binary strings for common characters and longer binary strings for uncommon characters.
aaadaabbbaacaaabbaacaabaab at 8 bits each = 26*8 = 208 bits
With the code a = 0, b = 10, c = 110, d = 111:
000111001010100011000010100011000100010 = 39 bits
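A Huffman code for the example string can be built by repeatedly merging the two least frequent subtrees. This is a minimal sketch (the function name is hypothetical, and a text with only one distinct symbol would get an empty code); tie-breaking may give different bit patterns from the table above, but the code lengths, and hence the 39-bit total, are the same.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bit string} built by merging the two rarest subtrees."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix 0 onto the left subtree's codes and 1 onto the right's.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

text = "aaadaabbbaacaaabbaacaabaab"
code = huffman_code(text)
bits = "".join(code[ch] for ch in text)
print(len(text) * 8, len(bits))  # 208 39
```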
GIF image compression
GIF images use a mixture of:
Restricted colours (a palette of at most 256 different colours)
Dictionary-based encoding (the LZW algorithm), which also compactly codes repeated runs of pixels
Therefore GIF is lossless for images with at most 256 colours
i.e. they can be reconstructed exactly
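LZW can be sketched on a character string: the encoder emits dictionary indices for the longest prefix it has already seen and adds that prefix plus the next character as a new entry. This is a simplified sketch; GIF applies LZW to palette indices and adds variable code widths and clear codes, which are omitted here, and the function names are hypothetical (the decoder assumes a non-empty code list).

```python
def lzw_compress(data):
    """Emit dictionary indices for the longest already-seen prefixes."""
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in data:
        if w + ch in table:
            w += ch
        else:
            out.append(table[w])
            table[w + ch] = len(table)  # learn a new, longer string
            w = ch
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for k in codes[1:]:
        # The one tricky case: k may be the entry the encoder has only just defined.
        entry = table[k] if k in table else w + w[0]
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)

print(lzw_compress("aaaaaaa"))  # [97, 256, 257, 97]
```

Runs of a repeated value are learned as ever-longer dictionary entries, which is why no separate run-length stage is needed.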
Lossy Image Compression
The human eye is fairly insensitive to certain kinds of image information:
Large objects are generally more important than fine detail, textures, etc.
Colours can be quantised more coarsely.
Intensity is more important than hue.
This is quite different from audio compression.
JPEG Image Compression
Algorithm overview
Transform and code each 8x8 block independently:
Perform the Discrete Cosine Transform (DCT) on each block
Differentially quantise the block's DCT values
Run-length encode along a zig-zag path
Statistically encode the resulting string
The Discrete Cosine Transform (DCT)
Separates the image's high- and low-frequency components
Related to the Fourier Transform
The DCT itself is reversible, i.e. lossless
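$$\mathrm{DCT}(u,v) = \frac{1}{\sqrt{2N}}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \mathrm{pixel}(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$
$$C(u), C(v) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$$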
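The DCT formula can be transcribed directly (and slowly: O(N^4) per block) to check that it is reversible. The sketch assumes N = 8 as in JPEG; for N = 8 the 1/sqrt(2N) factor equals 2/N, which makes the transform orthonormal, so the inverse uses the same weights summed over frequencies instead of pixels. The function names are hypothetical.

```python
import math

N = 8  # JPEG block size

def C(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block):
    """Forward 2-D DCT of an N x N block: result[u][v]."""
    return [[(1 / math.sqrt(2 * N)) * C(u) * C(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT: result[x][y]."""
    return [[(1 / math.sqrt(2 * N)) * sum(
                C(u) * C(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

flat = [[100] * N for _ in range(N)]
print(round(dct2(flat)[0][0]))  # 800: a flat block has only a (0,0) component
```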
A visual map of the DCT
Each pixel in the DCT block is a weighted sum of the pixels in the input block
This diagram shows which weights are applied for each pixel in the DCT.
Differential Quantisation
Quantise higher frequency components with fewer levels
The human eye is relatively insensitive to high-frequency components.
This is where data is thrown away.
More coarsely quantised frequency components require fewer bits to store.
Varying the values in the quantisation matrix allows different compression levels.
Trade off quality for small file size.
The lowest frequency component is at (0,0) and the highest at (N-1,N-1).
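Quantisation and dequantisation can be sketched as division by a step size followed by rounding, and multiplication back. The matrix below is a hypothetical one whose step grows with the frequency u + v, chosen only for illustration; the JPEG standard supplies its own example tables.

```python
def quant_matrix(n=8, step=4):
    # Hypothetical matrix: coarser steps for higher frequencies (not a standard table).
    return [[1 + step * (u + v) for v in range(n)] for u in range(n)]

def quantise_block(coeffs, q):
    # Divide by the step and round: small high-frequency values become 0.
    return [[round(c / s) for c, s in zip(cr, qr)] for cr, qr in zip(coeffs, q)]

def dequantise_block(levels, q):
    # Decoder side: multiply back; the rounding error is the data thrown away.
    return [[l * s for l, s in zip(lr, qr)] for lr, qr in zip(levels, q)]

q = quant_matrix(2, 4)                         # [[1, 5], [5, 9]]
levels = quantise_block([[800, 30], [-25, 3]], q)
print(levels)                                  # [[800, 6], [-5, 0]]
print(dequantise_block(levels, q))             # [[800, 30], [-25, 0]]
```

The small highest-frequency value (3) is lost entirely, while the large low-frequency values survive; a larger `step` trades more quality for a smaller file.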
Zig-zag encoding
Use a zig-zag path to encode the block; this tends to group the 0 coefficients together
Use run-length encoding on the resulting string
Use Huffman encoding on the run-length-encoded string
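The zig-zag path and the run-length step can be sketched as follows. The path visits anti-diagonals in alternating directions, which reproduces the JPEG scan order. The run-length step is simplified to (zeros skipped, value) pairs ending in an "EOB" (end-of-block) marker; real JPEG codes the (0,0) term separately and packs each pair as a run/size symbol plus amplitude bits. Function names are hypothetical.

```python
def zigzag_order(n):
    """(row, col) pairs along anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def zigzag_rle(block):
    """Zig-zag scan, then (zeros skipped, value) pairs terminated by 'EOB'."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()                  # trailing zeros are implied by EOB
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs + ["EOB"]

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 50, -3, 2, 5
print(zigzag_rle(block))  # [(0, 50), (0, -3), (0, 2), (1, 5), 'EOB']
```

Because quantisation zeroes most high-frequency coefficients, the scan ends in a long tail of zeros that collapses into the single EOB marker.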
JPEG compression: additional steps
Some additional steps are performed to squeeze more compression out of the data
Colour
The image is first converted to YUV colour space, and the Y (luminance) channel is coded with higher quality than the two colour channels (U and V). The colour channels are also often "down-sampled", i.e. reduced by a factor of 2 along rows and/or columns.
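The colour step can be sketched with the YCbCr conversion used by JFIF files (a scaled and offset form of YUV), followed by 2x2 averaging of a chroma channel. The averaging filter is an assumption; encoders may down-sample differently, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF conversion; Cb and Cr are offset by 128 to stay in 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def downsample_2x2(channel):
    """Halve a chroma channel along rows and columns by averaging 2x2 blocks."""
    h, w = len(channel), len(channel[0])
    return [[(channel[i][j] + channel[i][j + 1] + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
             for j in range(0, w - 1, 2)] for i in range(0, h - 1, 2)]

print(downsample_2x2([[1, 3, 5, 7], [1, 3, 5, 7]]))  # [[2.0, 6.0]]
```

Down-sampling both chroma channels by 2 in each direction keeps only a quarter of their samples, so the image needs half the samples overall before any other compression.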
Predictive compression
The first element in the DCT block is essentially the brightness of the block. These values are coded separately using predictive compression to remove redundancy between blocks
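Predictive coding of the block brightness values can be sketched as sending each block's (0,0) DCT value as the difference from the previous block's. Neighbouring blocks usually have similar brightness, so the differences are small. The function names are hypothetical, and the first block is predicted from 0.

```python
def dc_differences(dc_values):
    """Code each block's (0,0) value as the difference from the previous block's."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

def dc_reconstruct(diffs):
    """Decoder: a running sum undoes the differencing exactly."""
    total, out = 0, []
    for d in diffs:
        total += d
        out.append(total)
    return out

print(dc_differences([800, 810, 805, 805]))  # [800, 10, -5, 0]
```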
JPEG decompression
Decode strings: reverse the Huffman and run-length encoding, then follow the zig-zag path to reconstruct the N by N block
"Dequantise" block values, e.g. if quantised to 4 levels and decoded to 256 levels, multiply each value by 64
Inverse DCT (very similar to the DCT)
JPEG 2000
Completely different algorithm from standard JPEG
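Uses embedded wavelet coding: EZW illustrates the approach, although the JPEG 2000 standard itself specifies the related EBCOT coder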
Based on wavelet theory
Doesn't involve blocking of the data like standard JPEG
Blocking artefacts are common in over-compressed JPEG images.
Wavelet transforms
The Fourier Transform (FT) converts a signal (or image) into its component frequencies
Loses spatial information, e.g. doesn't tell us where the high frequencies are located in the image
DCT – similar to an FT applied to each block
Retains some spatial information (i.e. the location of the block), but loses frequency correlations between blocks
Wavelet Transform (WT) – a (smooth) trade-off between frequency and spatial representation
Filter horizontally with two filters:
even pixels: low-pass filter; odd pixels: high-pass wavelet
Group the low-pass filtered components to the left and the high-pass filtered components to the right
Repeat vertically
Repeat recursively on the low-pass image, and so on
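The filter-and-group procedure can be sketched with the Haar wavelet, the simplest choice: pairwise averages form the low-pass half and pairwise half-differences the high-pass half. JPEG 2000 itself uses longer filters (the 5/3 and 9/7 wavelets), so this is an illustration of the structure, not the standard's transform; it assumes a square image with a power-of-two side, and the function names are hypothetical.

```python
def haar_1d(row):
    """One level: averages (low-pass) on the left, half-differences (high-pass) on the right."""
    lows = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    highs = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return lows + highs

def haar_2d(image, levels=1):
    """Filter rows, then columns, then recurse on the top-left (low-pass) quadrant."""
    img = [list(r) for r in image]
    size = len(img)
    for _ in range(levels):
        for i in range(size):                                  # horizontal pass
            img[i][:size] = haar_1d(img[i][:size])
        for j in range(size):                                  # vertical pass
            col = haar_1d([img[i][j] for i in range(size)])
            for i in range(size):
                img[i][j] = col[i]
        size //= 2                                             # next level: low-pass quadrant only
    return img

print(haar_2d([[1, 2], [3, 4]]))  # [[2.5, -0.5], [-1.0, 0.0]]
```

In a smooth image most high-pass values are close to zero, which is what the zero-tree coder below exploits.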
Exploiting subband correlation
Subband correlation
Although the wavelet transform decorrelates image information within a subband, there is still a high degree of correlation between subbands. Early wavelet methods struggled to find a compact way to exploit this correlation.
Zero-trees
EZW encodes zeros, rather than the data.
A zero at a coarse scale is a good predictor of zeros at a finer scale.
A lot of information can be encoded by specifying just the root of a zero-tree, similar to run-length encoding.
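A zero-tree test can be sketched on an n x n wavelet pyramid using the usual parent-child layout, in which coefficient (r, c) has the four children (2r, 2c), (2r, 2c+1), (2r+1, 2c) and (2r+1, 2c+1) at the next finer scale. The special case of the coarsest low-pass coefficient (0, 0) is left out for brevity, and the function names are hypothetical.

```python
def children(r, c, n):
    """Children of (r, c) at the next finer scale; none at the finest scale."""
    if (r, c) == (0, 0) or 2 * r >= n or 2 * c >= n:
        return []
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]

def is_zerotree_root(coeffs, r, c, threshold):
    """True if (r, c) and all its descendants are insignificant (|w| < threshold)."""
    n = len(coeffs)
    stack = [(r, c)]
    while stack:
        i, j = stack.pop()
        if abs(coeffs[i][j]) >= threshold:
            return False
        stack.extend(children(i, j, n))
    return True

coeffs = [[0] * 4 for _ in range(4)]
coeffs[0][0], coeffs[0][1], coeffs[3][3] = 50, 10, 9
print(is_zerotree_root(coeffs, 1, 0, 8))   # True: one symbol covers 5 coefficients
print(is_zerotree_root(coeffs, 1, 1, 8))   # False: descendant (3, 3) is significant
print(is_zerotree_root(coeffs, 1, 1, 16))  # True at a coarser quantisation step
```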
The chances of finding a zero-tree increase with coarser quantisation, since more small values are set to zero.
Successive quantisation
Quantise the (remaining) image data using successively finer quantisation steps. This leaves a binary image to encode at each step:
values are only 0 or 1, and zero-trees completely specify a binary image.
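Successive quantisation can be sketched as a bit-plane decomposition of integer coefficient magnitudes: each halving of the step Q yields one binary image, and summing Q times each plane rebuilds the values. This is a simplification, not full EZW (no dominant and refinement passes), and it assumes the starting step is the largest power of two not exceeding the maximum magnitude, so that each step halves cleanly; signs are kept separately. The function names are hypothetical.

```python
def bit_planes(coeffs):
    """Binary images of |w| for steps Q = 2^k, ..., 2, 1 (coarsest first)."""
    peak = max(abs(w) for w in coeffs)
    q = 1
    while q * 2 <= peak:
        q *= 2
    planes = []
    while q >= 1:
        planes.append((q, [(abs(w) // q) % 2 for w in coeffs]))
        q //= 2
    return planes

def rebuild(planes, signs):
    """Sum Q * bit over all planes, then restore the signs."""
    mags = [sum(q * plane[i] for q, plane in planes) for i in range(len(signs))]
    return [s * m for s, m in zip(signs, mags)]

planes = bit_planes([13, -6, 2, 0])
for q, plane in planes:
    print(q, plane)
# 8 [1, 0, 0, 0]
# 4 [1, 1, 0, 0]
# 2 [0, 1, 1, 0]
# 1 [1, 0, 0, 0]
```

Stopping after the first few planes gives a coarser but complete image, which is what makes the coding "embedded".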
Encode using zero-trees: successive zero-trees encode less significant bits.
Successive quantisation algorithm
1. Choose a high quantisation step Q (half the max value)
2. Until the finest quantisation step is reached, repeat:
2.1 Quantise wavelet coefficients using Q ie. if (w(x,y)