

Prior Approaches to Foveation & ROI in Compression
Anup Basu et al.
1. The VR Transform
The variable resolution (VR) transform [1,4] used here has two parameters that affect the resulting image: the scaling factor s and the distortion ratio δ, which controls the distortion toward the edges of the image relative to the fovea. A high δ value gives a sharply defined fovea with a poorly defined periphery; a small δ value brings the resolution of the fovea and the periphery closer together.
Under the VR transform, a pixel with polar coordinates (r,θ) is mapped to (v, θ) where:
$$v = s \ln(r\delta + 1)$$
This transformation is easily reversed, giving r in terms of v:
$$r = \frac{\exp(v/s) - 1}{\delta}$$
The value s is a scaling factor used to control the overall compression ratio. It is calculated so that
points at the maximum distance from the fovea in the original image are at the maximum possible
distance in the VR image:
$$s = \frac{v_{\max}}{\ln(r_{\max}\delta + 1)}$$
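A minimal Python sketch of the radial VR mapping, its inverse, and the choice of s, written directly from the three formulas above. The function names are illustrative, and only the radius is transformed; the angle θ passes through unchanged.

```python
import math

def vr_scale(r_max, v_max, delta):
    """Scaling factor s that sends the farthest original point (r_max) to v_max."""
    return v_max / math.log(r_max * delta + 1)

def vr_forward(r, delta, s):
    """Radial VR mapping: v = ln(r * delta + 1) * s (theta is unchanged)."""
    return math.log(r * delta + 1) * s

def vr_inverse(v, delta, s):
    """Inverse mapping: r = (exp(v / s) - 1) / delta."""
    return (math.exp(v / s) - 1) / delta

# Example: shrink a maximum radius of 200 pixels to 100 with delta = 0.5.
s = vr_scale(r_max=200, v_max=100, delta=0.5)
print(vr_forward(200, 0.5, s))                        # approximately 100.0
print(vr_inverse(vr_forward(37.5, 0.5, s), 0.5, s))   # approximately 37.5
```

The first print confirms that s places the edge of the original image exactly at the edge of the VR image; the second shows that the inverse recovers the original radius.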
The images resulting from this basic VR transform are not rectangular, which makes them difficult to use in applications that require rectangular images, such as MPEG encoding. The problem is magnified with high δ values and when the fovea is not located at the center of the image. There are two approaches to this problem. The first, the stretched VR (SVR) transform, uses multiple scaling factors, each dependent on the angle θ in polar coordinates. This method does a reasonable job of preserving the isotropic properties of the original formulae, but it is relatively complex. It is also unsuitable for foveated MPEG compression, because the lookup table must be recalculated, at considerable computational cost, whenever the fovea changes its location.
Another approach to dealing with nonrectangular compressed images is to greatly simplify the formulae by treating the horizontal and vertical components separately, giving the Cartesian variable resolution (CVR) transform [1]. For a given image with the fovea located at $(x_0, y_0)$, for every pixel $(x, y)$ in the original image we define its distances from the fovea in the x and y directions as $d_x$ and $d_y$, respectively:
$$d_x = x - x_0, \qquad d_y = y - y_0$$
Then $(x, y)$ is mapped to $(x_1, y_1)$, where:
$$x_1 = x_0 + S_x \ln(d_x\delta + 1), \qquad y_1 = y_0 + S_y \ln(d_y\delta + 1)$$
In other words, a pixel that lies $d_x$ and $d_y$ units from the fovea in the x and y directions is moved to $d_{vx}$ and $d_{vy}$ units away, where:
$$d_{vx} = S_x \ln(d_x\delta + 1), \qquad d_{vy} = S_y \ln(d_y\delta + 1)$$
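Because the two axes are handled independently, the CVR mapping reduces to one scalar function applied per axis. A minimal Python sketch follows, with illustrative names `cvr_axis` and `cvr_map`. It assumes that offsets on the negative side of the fovea are handled by applying the logarithm to |d| and restoring the sign, since ln(dδ + 1) is undefined for d < -1/δ and the text does not spell out a sign convention.

```python
import math

def cvr_axis(d, delta, scale):
    """Offset along one axis: dv = ln(|d| * delta + 1) * scale, sign of d kept.

    Assumption: the log is applied to the magnitude of the offset and the sign
    is restored, so pixels left of (or above) the fovea stay on that side.
    """
    return math.copysign(math.log(abs(d) * delta + 1) * scale, d)

def cvr_map(x, y, fovea, delta, sx, sy):
    """Map original pixel (x, y) to (x1, y1) = (x0 + dvx, y0 + dvy)."""
    x0, y0 = fovea
    return (x0 + cvr_axis(x - x0, delta, sx),
            y0 + cvr_axis(y - y0, delta, sy))

# A pixel 10 columns right of the fovea moves to about 3.58 columns right of it;
# its row offset is 0, so y is unchanged.
print(cvr_map(110, 50, fovea=(100, 50), delta=0.5, sx=2.0, sy=2.0))
```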
The values of $S_x$ and $S_y$ are scaling factors used to control the overall compression ratio. If the scaling factors are computed independently of the fovea position, the compressed image will vary in size depending on where the fovea is; in exchange, the lookup table constructed for mapping the original image to a foveated image does not need to be recomputed each time the fovea changes its location. This is important for reducing the computational load when implementing a moving fovea in decoded MPEG frames.
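The trade-off described above can be sketched as follows, assuming a single fixed scaling factor per axis (the width of 320 and the scale of 8.0 are arbitrary example values, and the helper names are hypothetical). The table of logarithmic offsets is built once; moving the fovea only changes which entries are read, while the width of the foveated row changes with the fovea position.

```python
import math

def build_cvr_lut(max_offset, delta, scale):
    """dv = ln(d * delta + 1) * scale for every offset d = 0..max_offset.

    The table depends only on delta and the scaling factor, not on the fovea.
    """
    return [math.log(d * delta + 1) * scale for d in range(max_offset + 1)]

def map_column(x, x0, lut):
    """Foveated x1 for column x when the fovea is at column x0 (table lookup only)."""
    d = x - x0
    dv = lut[abs(d)]
    return x0 + (dv if d >= 0 else -dv)  # sign handling is an assumption

def foveated_width(width, x0, lut):
    """Span of the foveated row: left extent + right extent + the fovea column."""
    return lut[x0] + lut[width - 1 - x0] + 1

width = 320
lut = build_cvr_lut(width - 1, delta=0.5, scale=8.0)   # built once
for x0 in (40, 160, 280):                              # fovea moves, table reused
    print(x0, round(foveated_width(width, x0, lut), 1))
```

The printed width is largest when the fovea is near the center of the row, which is the size variation the text accepts in exchange for a fovea-independent table.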
The CVR transform can be easily reversed at the decoding end to produce an undistorted image
(Figure 2, right) from a foveated image (Figure 2, center).
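Inverting the per-axis formula gives d = (exp(d_v / S) - 1) / δ, the same form as the inverse of the radial VR mapping. A minimal sketch of the inverse CVR for a single point, under the same sign-handling assumption as the forward mapping; the function names are hypothetical.

```python
import math

def cvr_axis(d, delta, scale):
    """Forward CVR offset (log of |d|, sign restored afterwards; an assumption)."""
    return math.copysign(math.log(abs(d) * delta + 1) * scale, d)

def icvr_axis(dv, delta, scale):
    """Inverse CVR offset: d = (exp(|dv| / scale) - 1) / delta, sign restored."""
    return math.copysign((math.exp(abs(dv) / scale) - 1) / delta, dv)

def icvr_map(x1, y1, fovea, delta, sx, sy):
    """Recover the original pixel position (x, y) from a foveated one (x1, y1)."""
    x0, y0 = fovea
    return (x0 + icvr_axis(x1 - x0, delta, sx),
            y0 + icvr_axis(y1 - y0, delta, sy))

# Round trip for one pixel: forward CVR, then ICVR.
fovea, delta, sx, sy = (100, 50), 0.5, 2.0, 2.0
x1 = fovea[0] + cvr_axis(130 - fovea[0], delta, sx)
y1 = fovea[1] + cvr_axis(20 - fovea[1], delta, sy)
print(icvr_map(x1, y1, fovea, delta, sx, sy))   # approximately (130.0, 20.0)
```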
2. Foveated MPEG
In this paper, the CVR compression techniques are combined with MPEG to produce a hybrid
CVR/MPEG video compression format. Figure 1 illustrates how the input image sequence is
passed first through the CVR routines before being further compressed by the MPEG algorithm.
The process of decompressing a CVR/MPEG file consists of MPEG decoding followed by the inverse CVR transformation. The software used to implement MPEG [3] was not altered. The CVR parameters are stored in a separate file that is paired with the compressed video file. The necessary parameters include the decompressed image dimensions, the compression parameter, the distortion factor δ, and the fovea location for each image.
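The text lists what the separate CVR parameter file must carry but not its layout. A minimal sketch of one possible layout, assuming a JSON file and the hypothetical class and field names below; any serialization that preserves these values would serve equally well.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class CVRParams:
    width: int          # decompressed (original) image width
    height: int         # decompressed (original) image height
    compression: float  # compression parameter
    delta: float        # distortion factor
    foveae: list = field(default_factory=list)  # one (x0, y0) per frame

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        data["foveae"] = [tuple(p) for p in data["foveae"]]  # JSON arrays -> tuples
        return cls(**data)

params = CVRParams(width=352, height=240, compression=0.75, delta=0.5,
                   foveae=[(176, 120), (180, 118)])
text = params.to_json()
print(CVRParams.from_json(text) == params)   # True
```

Storing one fovea entry per frame matches the requirement that the fovea location be known for each image when the inverse transform is applied.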
Figure 1: CVR/MPEG video compression. (Block diagram: the image sequence and the CVR parameters feed CVR Compression, whose output is passed, together with the MPEG parameters, to the MPEG Encoder.)
In other words, CVR compression allows the size of an image sequence to be adjusted to the available network resources, and any MPEG encoder can then be used to compress the resulting CVR-compressed sequence. The CVR parameters must be transmitted to the client site (receiver end) before the hybrid compressed video is transmitted.
At the client site, the transmitted CVR/MPEG sequence is first passed through an MPEG decoder. This produces a sequence of images that are still distorted by the CVR transform, so the inverse CVR (ICVR) transform must be applied to undo the distortion and compression introduced by CVR before the final uncompressed video is displayed.
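At the pixel level, the ICVR step can be sketched as backward mapping: for every pixel of the full-size output, the forward CVR formula says where that pixel was placed in the foveated frame, and the value is sampled there. The sketch below uses nearest-neighbor sampling on nested lists and assumes the foveated frame starts where the original's top-left corner lands, so the fovea sits at (Sx·ln(δ·x0 + 1), Sy·ln(δ·y0 + 1)) in foveated coordinates. The function names and this layout are assumptions; a practical decoder would more likely interpolate and use precomputed tables.

```python
import math

def cvr_offset(d, delta, scale):
    """Forward CVR offset; log of |d| with the sign restored (an assumption)."""
    return math.copysign(math.log(abs(d) * delta + 1) * scale, d)

def icvr_image(fov, width, height, fovea, delta, sx, sy):
    """Rebuild a width x height image from the decoded foveated frame `fov`.

    Backward mapping: each output pixel takes the value of the foveated pixel
    nearest to where the forward CVR would have moved it.
    """
    x0, y0 = fovea
    # Assumed layout: original pixel (0, 0) lands at foveated (0, 0), which
    # puts the fovea at (fx0, fy0) inside the foveated frame.
    fx0 = cvr_offset(x0, delta, sx)
    fy0 = cvr_offset(y0, delta, sy)
    fh, fw = len(fov), len(fov[0])
    out = []
    for y in range(height):
        fy = round(fy0 + cvr_offset(y - y0, delta, sy))
        fy = min(fh - 1, max(0, fy))          # clamp to the foveated frame
        row = []
        for x in range(width):
            fx = round(fx0 + cvr_offset(x - x0, delta, sx))
            fx = min(fw - 1, max(0, fx))
            row.append(fov[fy][fx])
        out.append(row)
    return out

# A 5x5 "decoded" frame expanded back to 9x9 with the fovea at the center.
fov = [[10 * r + c for c in range(5)] for r in range(5)]
scale = 2 / math.log(3)      # chosen so the 9-pixel axis fits in 5 pixels
restored = icvr_image(fov, 9, 9, fovea=(4, 4), delta=0.5, sx=scale, sy=scale)
for row in restored:
    print(row)
```

Near the fovea, neighboring output pixels read distinct foveated pixels; toward the edges, several output pixels share one foveated pixel, which is the loss of peripheral resolution visible in the decompressed figures.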
Figure 2 shows an original image (left), a CVR-compressed version of it (middle), and the result of decompressing the CVR-compressed image (right). In this example δ was 0.5 and the compression ratio was adjusted to about 0.75. Figures 3-5 show the effects of changing the fovea location, the compression ratio (controlled by the scaling factor for a fixed δ), and δ itself. Notice that the hat in Figure 3, where the fovea is on the hat, is much clearer than the one in Figure 2. Also, as δ gets closer to zero, the resolution drops off more slowly away from the fovea, while the sharpness at the fovea is reduced. For example, the hat in Figure 5 is much clearer than the one in Figure 4, while the nose and mouth are not as clear in Figure 5.
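Differentiating v = s·ln(rδ + 1) makes this trade-off explicit: the local sampling rate is dv/dr = sδ/(rδ + 1), so the fovea is sampled (rδ + 1) times more densely than a point at distance r, while s itself depends on δ through s = v_max/ln(r_max·δ + 1). The same holds per axis for CVR, with d_x or d_y in place of r. The short check below uses arbitrary example sizes (r_max = 100 shrunk to v_max = 50) and the two δ values used in Figures 4 and 5; the function name is hypothetical.

```python
import math

def sampling_rates(r, r_max, v_max, delta):
    """Return (rate at the fovea, rate at distance r), where rate = dv/dr."""
    s = v_max / math.log(r_max * delta + 1)    # scaling factor from the text
    fovea_rate = s * delta                     # dv/dr at r = 0
    return fovea_rate, fovea_rate / (r * delta + 1)

for delta in (0.5, 0.03):
    fovea, edge = sampling_rates(r=100, r_max=100, v_max=50, delta=delta)
    print(f"delta={delta}: fovea {fovea:.2f}, edge {edge:.2f}, ratio {fovea / edge:.0f}")
```

With δ = 0.5 the fovea is sampled 51 times more densely than the edge; with δ = 0.03 the ratio is only 4. The smaller δ also gives a lower rate at the fovea and a higher rate at the edge, matching the observation that the hat is clearer but the nose and mouth are softer in Figure 5.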
Figure 2: CVR compression/decompression with δ = 0.5 and 75% compression, fovea on nose.
Figure 3: CVR compression/decompression with δ = 0.5 and 75% compression, fovea on hat.
Figure 4: CVR compression/decompression with δ = 0.5 and 90% compression, fovea on nose.
Multiresolution foveation methods rely on transmitting the overall image at low resolution and a region of interest at higher resolution. In these methods, the degradation of image quality cannot be controlled through simple variations in transformation parameters. Moreover, the sudden change of resolution at the boundaries between regions of interest and peripheral areas may be less acceptable than a continuous variation of resolution.
Figure 5: CVR compression/decompression with δ = 0.03 and 90% compression, fovea on nose.
5. Experimental results and analysis
Data was collected to find the relationship between the compression ratio and the scaling and distortion factors. A hierarchy of MPEG clips was constructed using different scaling and distortion factors. Depending on the current bandwidth and the user's requirements, one clip from this hierarchy is selected and sent to the user. Distortion values are reported as a scaled value δ′ ranging from 10 to 900, to avoid decimal numbers.
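The selection step is not specified further in the text. One simple policy, sketched below under explicit assumptions, tags each pre-encoded clip with its measured bitrate and picks the largest one that fits the current bandwidth. The `Clip` fields, the clip names, the bitrate figures, and the fall-back to the smallest clip are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clip:
    name: str
    bitrate_kbps: float  # measured average bitrate (hypothetical values below)
    # In practice each clip would also record its scaling factor and delta'.

def select_clip(hierarchy, bandwidth_kbps):
    """Largest-bitrate clip that fits the bandwidth; smallest clip if none fits."""
    fitting = [c for c in hierarchy if c.bitrate_kbps <= bandwidth_kbps]
    if fitting:
        return max(fitting, key=lambda c: c.bitrate_kbps)
    return min(hierarchy, key=lambda c: c.bitrate_kbps)

hierarchy = [Clip("clip_a", 420.0), Clip("clip_b", 510.0), Clip("clip_c", 640.0)]
print(select_clip(hierarchy, 600.0).name)   # clip_b
print(select_clip(hierarchy, 300.0).name)   # clip_a (nothing fits, smallest sent)
```

Using bitrate as the ranking key is a proxy for quality; a policy that also weighs the user's requirements would need an extra field per clip.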
5.1 Effect of Scaling Factor on Compression Ratio
Figure 7 shows that the background was strongly affected by variations of the scaling factor, while the fovea remained relatively unchanged.
Figure 8 shows that the compression ratio increases as the scaling factor increases.
Figure 7: Effect of scale on an MPEG frame. (Panels: Bikes frame 100 with scale = 94, 96, and 98; δ′ = 100.)
Figure 8: Compression ratio (vertical axis) vs. scaling factor (horizontal axis, 90 to 98) for bike.mpg, airwolf.mpg, berger.mpg, and ski.mpg.
5.2 Effect of Distortion on Compression Ratio
The distortion δ′ only controls how strongly the image is foveated; it does not affect the compression ratio of a still image. However, because an MPEG encoder is used to encode a sequence of foveated still images, the relationship between adjacent images matters.
Figure 9: Foveated MPEG frame with different distortion factors. (Panels: Bikes frame 100 with scale = 96, δ′ = 10 and δ′ = 900.)
Figure 10: Compression ratio (vertical axis) vs. distortion factor δ′ (horizontal axis: 10, 50, 100, 500, 900) for bike.mpg, airwolf.mpg, berger.mpg, and ski.mpg.
A high δ′ value causes the resolution of the periphery to drop substantially relative to the fovea, which reduces the similarity between adjacent images and makes the MPEG encoder produce larger P frames. A low δ′ value causes the resolution of the periphery to drop only slightly with distance from the fovea, which keeps adjacent foveated images closely related. Figure 9 shows that with higher distortion factors the periphery outside the fovea is degraded. Motion in this area can then no longer be matched between two frames, which results in larger inter-coded frames. This explains why, in Figure 10, lower distortion factors δ′ yield higher compression ratios than higher δ′ values, while also keeping the periphery clearer. Thus, by combining various distortion factors with different scaling factors, a given compression ratio can be achieved while maintaining high perceptual quality.
Figure 11: Foveated MPEG vs. VR MPEG. (Panels: continuously foveated MPEG, frame 100; multiresolution MPEG, frame 100.)
5.3 Comparison of VR MPEG and Foveated MPEG
The Foveated MPEG results are compared with traditional multiresolution MPEG. At the same compression ratio, Foveated MPEG gives more satisfactory results. Figure 11 shows multiresolution MPEG [2] and Foveated MPEG at the same compression ratio of 2. Foveated MPEG avoids the interlacing problem caused by the pyramid mechanism and offers smoother, perceptually more acceptable results. In addition, the size of a video sequence can be adapted more easily in Foveated MPEG by modifying two parameters (the scaling and distortion factors); for VR MPEG, it is not easy to adapt the size in a systematic manner.
REFERENCES
[1] A. Basu, A. Sullivan, and K. J. Wiebe, “Variable resolution teleconferencing,” IEEE Systems, Man, and Cybernetics Conference Proceedings, 1993, pp. 170-175. Extended versions in International Conference on Pattern Recognition, 1994, and IEEE Transactions on SMC, 1998.
[2] T. H. Reeves and J. A. Robinson, “Adaptive foveation of MPEG video,” Proceedings of the Fourth ACM International Multimedia Conference, 1996, pp. 231-241.
[3] ISO/IEC 11172-2 International Standard, “Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video,” August 1993.
[4] C. Weiman and G. Chaikin, “Logarithmic spiral grids for image processing,” Computer Graphics and Image Processing, 1979, pp. 197-226.