E->Mpeg2 Medical Video Compression Improvements
Recommendations to MPEG 89 Session, London 2009
   - Wayne Picard Fergus, Department of Electrical Engineering, University of
     Alberta, Canada, wpicard@ece.ualberta.ca, May 31 2009

Abstract
The world's economy now feeds off promoting problems that often have simple cures if
thinking outside the box is used. This paper describes novel approaches to cancer
detection and cure through compression based vision algorithms for medical robots,
intended to reduce the high cost of medical care. A cure for computer eyestrain and
fatigue, using a $100 laptop terminal with a more vision friendly 3D LCD screen display
algorithm, is described in detail. Finally, a solution to the world's monetary crisis is
proposed: pegging the US dollar to Federal Reserve Bank real estate mortgages instead of
precious metals to promote confidence. An oatmeal breakfast made with tea is proposed as
a cure for world hunger due to poverty, and the car based US economy could be saved by
promoting large cars and energy conservation simultaneously through car pools of 4 or
more occupants. A survey of home and work postal codes among interested parties would be
a good start. Termite stomach bacteria could turn straw into liquid gold (oil), and
hydrogen from seawater is possible using silver chloride, which is light sensitive like
the silver halides used in photographic film, with a nickel catalyst (Egypt). Also
helping with global warming are a 1000% to 10,000% improvement in lithium ion battery
development (China) and laser fusion of deuterium fuel pellets (Siberia) for electric
power.


Mpeg Compression Review


   MPEG-1 Motion Compensation [2]: B-Frame Coding based on bidirectional motion
   compensation

Figure 1 (From Intro to Mpeg Video Coding, S.P. Vimal, CSIS Group)

In the figure above, before the difference is taken and DCT coded, the left frame is the
I reference (or intra-frame) and the right frame is a P frame. Together they are used to
generate the center B (bidirectionally predicted) frame. P and B frames are
inter-frames.
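
As a minimal sketch of the bidirectional prediction described above, the following Python fragment predicts a B frame as the rounded average of its two reference frames and computes the difference that would then be DCT coded. It assumes zero motion (no motion vectors are applied), and the function names and sample values are hypothetical, not taken from the figure.

```python
def predict_b_frame(ref_past, ref_future):
    """Predict a B frame as the rounded average of two reference frames."""
    return [
        [(p + f + 1) // 2 for p, f in zip(row_p, row_f)]
        for row_p, row_f in zip(ref_past, ref_future)
    ]


def residual(actual, predicted):
    """Prediction error that would be DCT coded instead of the whole picture."""
    return [
        [a - p for a, p in zip(row_a, row_p)]
        for row_a, row_p in zip(actual, predicted)
    ]


# Hypothetical 2x2 luminance samples (assumption, for illustration only).
i_frame = [[100, 100], [100, 100]]
p_frame = [[110, 120], [90, 100]]
b_actual = [[106, 110], [95, 101]]

b_predicted = predict_b_frame(i_frame, p_frame)
b_error = residual(b_actual, b_predicted)
```

The residual values are small compared with the pixel values, which is why coding the difference is cheaper than coding the frame itself.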
   MPEG-1 Motion Compensation [3]: MPEG Frame Sequence

   • Notation
      – M: interval between a P-Frame and the preceding I or P Frame
      – N: interval between two consecutive I-Frames
      – In the figure, M=3 and N=9

Figure 2 (From Intro to Mpeg Video Coding, S.P. Vimal, CSIS Group)
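
The M and N notation can be illustrated with a short Python sketch that lists the frame types in display order. The function name `gop_pattern` is hypothetical, and the sketch assumes N is a multiple of M, as in the M=3, N=9 case of the figure.

```python
def gop_pattern(n, m, length=None):
    """Return frame types in display order for I-frame interval n and anchor interval m."""
    if length is None:
        length = n
    types = []
    for i in range(length):
        pos = i % n  # position inside the current group of pictures
        if pos == 0:
            types.append("I")
        elif pos % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern(9, 3))  # prints IBBPBBPBB
```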

While the reference I frames are sent at about 76/sec in a DVD video, space can be saved
in the frame for 16x16 or 32x32 coefficients (1080/4080p) by substituting much smaller P
frames until a scene change requires a new I frame and a complete refresh, maximizing
coding efficiency.




Figure 3: 2x2, 4x4 and 8x8 DCT Basis Functions, same for 16x16 and 32x32 DCTs

While only the 64 basis functions of an 8x8 DCT are shown, they are backward compatible
with the increasingly fine lines of larger transforms, with only a log2(n) (or 4x for
32x32) speed penalty. The 16x16 or 32x32 DCT extends the picture's coefficient weighting
of the above basis functions to approximate the I intra-frame picture in finer and finer
detail, with increasingly small horizontal and vertical lines.
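
For readers who want to reproduce patterns like those in Figure 3, here is a minimal sketch that generates a single 2D basis pattern for any block size. It assumes the standard orthonormal DCT-II definition, which the text does not state explicitly, and the helper names are hypothetical.

```python
import math


def dct_basis(n, u, v):
    """Return the n x n DCT-II basis pattern for vertical frequency u and horizontal frequency v."""
    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [
            scale(u) * scale(v)
            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
            * math.cos((2 * x + 1) * v * math.pi / (2 * n))
            for x in range(n)
        ]
        for y in range(n)
    ]


def inner(a, b):
    """Sum of element-wise products of two equally sized patterns."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


dc = dct_basis(8, 0, 0)       # flat pattern in the upper left corner
first_h = dct_basis(8, 0, 1)  # lowest horizontal frequency
fine = dct_basis(32, 3, 5)    # one pattern from a 32x32 transform
```

Higher values of `u` and `v` give finer horizontal and vertical lines, and the patterns for a given block size are mutually orthogonal, which is what allows an image block to be written as a weighted sum of them.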
   • For each 16x16 macro-block (MB) only 1, possibly 2, motion vectors (MV) are
     transmitted.
   • The search window for MBs is 512 pixels in each direction with ½ pixel
     resolution.
   • In P and B inter-frames, prediction differences or error images (and not the whole
     picture) are coded with the 2D DCT and transmitted with the MVs.
   • DCT coefficients are quantized with a weighting matrix, with coarse high frequency
     quantization and fine low frequency quantization favouring image quality (human
     perception optimization).
   • The P and B inter-frame error signal takes into account differences (a differential
     subtraction of pixels) between predicted and actual encoder input frames, as well
     as quantization errors in the encoder DPCM loop.
   • Mpeg2 improvements consist of a downloadable quantization matrix and scalability
     extensions in framing that can be used for 1080/4080p improvements in resolution.
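
The weighted quantization point in the list above can be sketched as follows. The matrix here is purely illustrative (step size growing with frequency) and is an assumption, not the default MPEG matrix; the function names are hypothetical.

```python
def make_quant_matrix(n=8, base=8, step=4):
    """Illustrative matrix: the step size grows with horizontal plus vertical frequency."""
    return [[base + step * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, qmatrix):
    """Divide each DCT coefficient by its step size and round with Python's round()."""
    return [
        [int(round(c / q)) for c, q in zip(row_c, row_q)]
        for row_c, row_q in zip(coeffs, qmatrix)
    ]


def dequantize(levels, qmatrix):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [
        [lv * q for lv, q in zip(row_l, row_q)]
        for row_l, row_q in zip(levels, qmatrix)
    ]


qmatrix = make_quant_matrix()
coeffs = [[30.0] * 8 for _ in range(8)]  # hypothetical block with equal energy everywhere
levels = quantize(coeffs, qmatrix)
restored = dequantize(levels, qmatrix)
```

With equal input energy at every frequency, the low frequency corner survives quantization while the highest frequency coefficient is rounded to zero, which is the perceptual trade-off the list describes.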

Zero Eye Strain Terminal for Education
The primary focus of the project is to initiate ideas for the development of a world
standard computing platform. The platform consists of terminals based on a video
compression decoder chip (sub-QCIF up to commercial theatre Panavision resolution),
connected via a high speed communications network to a remote server such as a home PC
or a parallel computer with television and internet hosts. Under study will be the
changes required to MPEG-2 (such as the Enhanced MPEG2 compression standard described
below) and possibly to the MPEG 7 XML formats (under development) to achieve this. The
present text based computing platforms would move toward an internet multi-media type
interface and would benefit from hyperlinks embedded in their material. This
architecture would make supercomputer power available even to cell phones and network
appliances, e.g. a convection oven (via Bluetooth). The primary emphasis in the user
interface section will be on the effects of the video compression delay of about 0.7
seconds. This latency rules out most interactive games, which rely on an immediate
response, but office applications and the internet can be supported with some hardware
support (primarily buffering and local echoing of input data, similar to a half duplex
dumb terminal, as well as raster-op state machines for local cursor movement on the
screen). This application is targeted at the third world, where some latency can be
tolerated in the interests of low cost.
The actual clipboard window interface simply involves intercepting the remote program's
video screen on the central server and applying adjustable resolution E->MPEG2
compression before sending it out through normal remote user web channels. The
clipboard buffer data is also intercepted and sent to the remote terminal on another
internet port, where it is displayed for editing with the remote keyboard. Pressing the
enter key on the remote terminal sends it back to the host's buffer with an automatic
paste command that inserts any changes at the cursor point. Remote mouse controlled
cursor commands will be passed through so that icons, commands and text can be selected
without any modifications to the running software.
The zero eye strain is the result of the inherently adjustable resolution of the image
using E->Mpeg2 compression (from QCIF cell phone quality up to 35 mm film or 4080p
resolution, which, as with Mpeg2, causes no detectable eye strain over long periods of
viewing).

E->Mpeg2
        Cell phone QCIF to 35 MM Panavision film resolution
In the down-converting of 4080p 35 mm motion picture frames to 480/1080p, all of the
high definition information is thrown away by sub-sampling algorithms, and DVD player
makers such as Toshiba try to restore this detail by clever interpolation at a later
stage in the viewing process by up-converting.
What is needed is for the 4080/1080p frame to be converted to 4x4 super macro-blocks,
each a 32x32 pixel Discrete Cosine Transform array of 900 coefficients, instead of
down-converting a super macro-block into a single 8x8 one by throwing away 15/16 of the
information that should be saved. Each macro-block of the super macro-block could be
sent in the I reference frame of Mpeg2 in sequence, starting from the upper left hand
corner DCT basis function coefficients in the standard zigzag order, with only a times 4
speed penalty. As the cosine transform is a weighted combination of basis block
patterns of increasing horizontal and vertical pattern resolution (the patterns of
Figure 3, scaled by the coefficients), the excess super macro-block coefficients could
be sent in sequential I frame blocks depending on the screen resolution, i.e. 4x4 for
QCIF size, then 8x8 for 480p, 16x16 for 1080p, then 32x32 for 4080p (35 mm film maximum
resolution). This gives extra detail and backward compatibility with a standard I frame
480p macro-block (all 64 coefficients used in an enhanced 8x8 480p picture), or the
coefficients can be left as is for 1080/4080p resolution (4080p for commercial
Panavision theatre viewing using the same disk). No matter what size is used, the basis
functions of the DCT are backwards compatible, fixed to the upper left origin, and the
DCT coefficients scale the quantity of each basis function used to approximate the
original image.
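
One possible reading of the layered coefficient scheme above is sketched below: the coefficient positions of a 32x32 block are split into nested layers (4x4, 8x8, 16x16, 32x32), each scanned in zigzag order and each holding only the positions not already sent in a smaller layer. The function names and the layering rule are assumptions for illustration, not a defined bitstream format.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the scan direction on successive diagonals.
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order


def resolution_layers(sizes=(4, 8, 16, 32)):
    """Split coefficient positions into nested layers, one per target resolution."""
    layers = {}
    previous = 0
    for size in sizes:
        layers[size] = [
            (r, c) for (r, c) in zigzag_order(size)
            if r >= previous or c >= previous  # skip positions sent in a smaller layer
        ]
        previous = size
    return layers


layers = resolution_layers()
counts = {size: len(positions) for size, positions in layers.items()}
```

A decoder that only supports 8x8 blocks would read the first two layers and ignore the rest. In this sketch the layer sizes are 16, 48, 192 and 768 positions, which sum to the 1024 positions of a full 32x32 block.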
An I frame is sometimes skipped in favour of an extra P BB P frame sequence in order to
increase Mpeg2 video times on a DVD disk. I propose to use this method with extra
resolution I frames sent (for enhanced picture resolution) using existing Mpeg2 framing
codes for 480p and new I frame codes for the remaining enhanced detail coefficients, up
to 32x32, for 1080/4080p compatibility on a standard MPEG2 DVD disk. A new I frame would
be sent only for scene changes, which would result in greater detail for still pixels
and dirty pixels for motion areas, which should not be a problem for human visual
perception. As the Walter Murch discussion below describes, an action scene could be
encoded using standard 8x8 DCT compression, due to the inability of the human eye to
perceive fine detail here, but dialog scenes at 4 cuts/minute instead of 30 could be
coded at 32x32 DCT compression for full 35 mm film quality, which the human eye has time
to appreciate. A scene change every 15 seconds results in roughly 38 I frames with
nearly identical coefficients, which can be compressed into 2.6 super I frames of 900
coefficients each for 32x32 resolution, or 9.3 I frames for 16x16 (256 coefficient)
resolution. The DCT coding would be adjusted on the fly for the scene change rate and
the optimum resolution for the available I frame bandwidth.
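
The coefficient budget in the paragraph above can be checked with a short calculation. This sketch assumes the budget is simply the number of redundant I frames times the 64 coefficients of a standard 8x8 block, divided by the coefficients per enhanced block; that interpretation and the function name are assumptions, not stated in the text.

```python
def super_i_frames(i_frames, coeffs_per_block=64, coeffs_per_super_block=900):
    """Enhanced I frames that fit in the coefficient budget of the standard ones."""
    return i_frames * coeffs_per_block / coeffs_per_super_block


budget_32 = super_i_frames(38, 64, 900)  # 32x32 case, using the 900 figure from the text
budget_16 = super_i_frames(38, 64, 256)  # 16x16 case, 256 coefficients
```

Under these assumptions the results (about 2.7 and 9.5) are close to the 2.6 and 9.3 quoted above; the small difference may come from "roughly 38" being a rounded figure.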
       SUNDAY, APRIL 08, 2007
Walter Murch on the Music of the Spheres... and Editing
Walter Murch, editor of such films as Apocalypse Now and Cold Mountain, is an
omnivorous intellect, as this free-ranging discussion shows. Among the most
incredible assertions:
For instance: to make a convincing action sequence requires, on average, fourteen different
camera angles a minute. I don’t mean fourteen cuts – you can have many more than fourteen
cuts per minute – but fourteen new views. Let’s say there is a one-minute action scene with
thirty cuts, so that the average length of each is two seconds – but, of those thirty cuts, sixteen
of them will be repeats of a previous camera angle.


Now what you have to keep in mind is that the perceiving brain reacts differently to
completely new visual information than it does to something it has seen before. In the second
case, there is already a familiar template into which the information can be placed, so it can
be taken in faster and more readily.


So with fourteen “untemplated” angles a minute, a well-shot action sequence will feel thrilling
and yet still comprehensible: just on the edge of chaos, which is how action feels if you are in
the middle of it. If it’s less than fourteen, the audience will feel like something is lacking, and
they’ll disengage; if it’s more than fourteen, so much new information is being thrown at the
audience that they’ll also disengage, though for different reasons.


At the other end of the spectrum, dialogue scenes seem to need an average of four new
camera angles a minute. Less than that, and the scene will seem flat and perfunctory; more
than that, and it will be hard for the audience to concentrate on the performances and the
meaning of the dialogue: the visual style will get in the way of the verbal content and the
subtleties of the actors’ performances.


This rule of “four to fourteen” seems to hold across all kinds of films and different styles and
periods of filmmaking.
Timely Cancer Detection Through E->Mpeg2 Imaging and Robotic Surgery
Further thoughts on curing cancer without waiting for a massively parallel network of
computers to be built (for PET scan analysis) lead me to the following differential,
instead of brute force, detection algorithm. Simply take a complete PET or MRI scan in
middle age, when a person is healthy and cancer free. This would serve as a reference
database, stored on a set of HD/Blu-ray discs, to be compared to the next scan taken
every six months to a year. Any difference, according to my technical background,
indicates a tumour or cancer whose progress can be tracked and cured on demand. E->Mpeg2
would be required for its resolution. The following algorithm for robotic machine
perception (in combination with the 3D from 2D imaging algorithm described below) can be
used for independent autonomous surgical robots for cancer tumour removal (as detected
above) or organ transplants. A qualified surgeon could supervise the simultaneous work
of 5 to 10 robots, reducing the costs of our medical system by 50% to 75%!
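
The differential detection idea can be sketched as a comparison of a baseline scan with a follow-up scan, flagging positions whose intensity has changed by more than a threshold. This is only an illustration: it assumes the two scans are already aligned (registered) and on the same intensity scale, and the function name, data layout and threshold are hypothetical.

```python
def changed_voxels(baseline, follow_up, threshold):
    """Return (slice, row, col) positions whose intensity changed by more than threshold."""
    flagged = []
    for z, (slice_a, slice_b) in enumerate(zip(baseline, follow_up)):
        for y, (row_a, row_b) in enumerate(zip(slice_a, slice_b)):
            for x, (a, b) in enumerate(zip(row_a, row_b)):
                if abs(b - a) > threshold:
                    flagged.append((z, y, x))
    return flagged


# Hypothetical single-slice scans (assumption, for illustration only).
baseline = [[[10, 10, 10], [10, 10, 10]]]
follow_up = [[[10, 11, 10], [10, 10, 40]]]

suspects = changed_voxels(baseline, follow_up, threshold=5)
```

The threshold separates small scan-to-scan noise from larger changes; the flagged positions would still need review, since the sketch cannot tell what caused a change.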

Surgical Robotic Vision
In Mpeg2, many of the viewed frames are generated from a present and a future frame
(actually from a past and a present frame, which gives Mpeg2 an inherent time delay and
makes it useless for teleconferencing). For robotic vision, if the present frame is
compared to a desired arrangement or outcome generated by the FPGA brain described
below, the Mpeg2 algorithm will generate predicted in-between images with motion
vectors and an error signal that can be used to drive a robot's motors or actuators to
achieve each predicted image in real life, i.e. vision control of a robot. The clips
from Terminator I, in which the present Governor of California tracks his prey in the
robot's viewfinder, would be realized with the prey replaced by surgical operations
performed by autonomous robots (helping instead of harming humans)!
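
As a rough sketch of how a motion vector and error signal could be obtained between the present frame and a desired frame, the following full-search block matching uses the sum of absolute differences (SAD) as the matching cost. The choice of SAD, the function names and the tiny frames are assumptions for illustration; the text does not specify the matching method or how the result would be mapped to actuator commands.

```python
def sad(frame, target, top, left, dy, dx, size):
    """Sum of absolute differences between a target block and a displaced block of frame."""
    return sum(
        abs(target[top + r][left + c] - frame[top + dy + r][left + dx + c])
        for r in range(size) for c in range(size)
    )


def best_motion_vector(frame, target, top, left, size, search):
    """Full search for the displacement (dy, dx) that best matches the target block."""
    height, width = len(frame), len(frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue  # candidate block would fall outside the frame
            cost = sad(frame, target, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]


def frame_with_object(top, left, height=6, width=6):
    """Hypothetical test frame: a bright 2x2 object on a dark background."""
    return [
        [200 if top <= r < top + 2 and left <= c < left + 2 else 0 for c in range(width)]
        for r in range(height)
    ]


present = frame_with_object(1, 1)  # where the object is now
desired = frame_with_object(3, 2)  # where the object should end up
dy, dx, error = best_motion_vector(present, desired, top=3, left=2, size=2, search=2)
```

Here the vector (-2, -1) points from the desired block position back to where the object currently is, so the object would have to move by the opposite amount, and the residual error of 0 means the match is exact.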


Medical Robot Electronic Brain

A typical run of the mill brain cell consists of a neuron with thousands of
interconnecting synapse connections, some of which are hard wired over millions of years
of evolution, while others, like the ones in football players, are programmed slowly
through experience and practice.
An FPGA gate array, thousands of times smaller and running many millions of times faster
than the human brain, can duplicate this learning and experience given the right
synaptic processing element for each programmable element (replacing the logic circuit
used now) and the aforementioned connections, through simple user programming of the
device.
Two configurations are possible for the decision or analysis element: either a
processing element feeding other sequential elements, as is found in the neuron
configuration in human brains and in present FPGAs, and/or a von Neumann memory
programmable configuration similar to present Single Instruction Multiple Data (SIMD)
single cell processors fed by a large single bit memory slice in vast parallel arrays.
Simultaneous processing of 3D visual images of unlimited bit width is feasible with the
latter.
As described in Ref [3] (PC-TV Integration in Mpeg2 and HDTV Frameworks), the layout
footprint of a nanotechnology switching element such as the CAEN resonant diode is 800
nm, versus 100,000 nm for a CMOS transistor. These elements are fabricated with a fault
prone chemical process requiring fault tolerance, which can be achieved by doubling or
tripling the logic gates and requiring a majority vote at critical circuit decision
points. The resulting circuitry would be fail safe compared to CMOS fabrications, so
this is a small price to pay for a 500x improvement in circuit density.
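
The majority vote mentioned above corresponds to the well known triple modular redundancy technique. A minimal software model, with hypothetical function names and a single simulated fault, might look like this:

```python
def majority_vote(a, b, c):
    """Return the value agreed on by at least two of three redundant gate outputs."""
    return bool((a and b) or (a and c) or (b and c))


def redundant_and(x, y, faulty_copy=None):
    """Evaluate three copies of an AND gate, optionally inverting one to model a fault."""
    outputs = [bool(x and y) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] = not outputs[faulty_copy]
    return majority_vote(*outputs)


result = redundant_and(True, True, faulty_copy=1)  # one of the three copies is wrong
```

A single faulty copy is outvoted by the other two, so the tripled gate still returns the correct result; two simultaneous faults would not be masked.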



3D Imaging (Human Vision Friendly)
Another use for the extra I frame bandwidth with E->Mpeg2 would be a second stereoscopic
image to allow 3D compatibility modes instead of, or along with, the extra enhanced
resolution. 3D, if done as per present production motion pictures from Disney Studios,
is more friendly to the human visual system (unlike 2D motion pictures and television),
especially on the more recent LCD screens, which do not scatter the image as theatre
screens, CRT and plasma displays do.
3D conversion of present 2D movies is easily achieved if the center 2D frame is
converted from raster to vector format (using an old version of Corel Draw, which had a
separate utility for this). Once vectorized, the center image could be converted into
left and right view frames, with motion vectors generated to move the blocks of the
original center 2D image around to achieve 3D picture sequences, along with the
prediction error. Only the 3D motion vectors and the error between the original center
2D frame and the two desired stereoscopic 3D images would be sent with the original 2D
center image! If the quality is not acceptable, then an electronic artist using edge
detection and detail fill could be used (using my electronic brain built with FPGAs as
described above).
The key idea is to multiplex the I reference frames, which tend to contain the same
information for a few seconds, with about 76 I frames/sec containing the same
information at present as long as the scene does not change (based on my HDTV Terminal
paper calculations for a 6.5 Mbps communications rate as found on present DVDs). Motion
compensation vector saturation could determine scene changes, at which point the super
macro-block sequencing could be reset to the new I frame coefficients of the next scene.
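
One way to turn "motion vector saturation" into a test is to count how many vectors in a frame sit at the limit of the search range. The sketch below does this; the 50% threshold, the function name and the sample vectors are assumptions for illustration only.

```python
def is_scene_change(motion_vectors, search_range, saturation_fraction=0.5):
    """Flag a scene change when many motion vectors sit at the search-range limit."""
    if not motion_vectors:
        return False
    saturated = sum(
        1 for dy, dx in motion_vectors
        if abs(dy) >= search_range or abs(dx) >= search_range
    )
    return saturated / len(motion_vectors) >= saturation_fraction


# Hypothetical per-block vectors for two frames, with a search range of 16 pixels.
steady_scene = [(0, 1), (1, 0), (0, 0), (-1, 2)]
after_cut = [(16, -3), (-16, 16), (5, 16), (0, 1)]

steady_flag = is_scene_change(steady_scene, search_range=16)
cut_flag = is_scene_change(after_cut, search_range=16)
```

When the flag is raised, the encoder in this scheme would send a fresh I frame and restart the sequence of enhanced coefficients.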

World Problem Thinking
   Energy Crisis and Large Car Safety
      By indicating a willingness to car pool with fellow commuters who have similar
      home/work zip or postal codes (during a census or survey etc.), combined with a
      sliding gas price, we can beat global warming, the energy crisis, the balance of
      payments and GM's impending bankruptcy due to lack of sales compared to smaller,
      more energy efficient cars. The sliding price would depend on the amount used per
      month per car registration bar code at the time of purchase, and would subsidize
      car poolers who take turns driving in Detroit friendly, bigger American sized cars
      for safety. This could be done in one single step. I believe the more gas one
      uses, the more per litre should be paid to finance this subsidy (even for country
      specific OPEC pricing)! A separate lane for cars with 4 passengers, and for buses
      too, could be provided. I do not believe that public transportation (unless in
      New York) is the answer to our problems. Based on personal experience at a rural
      IBM plant in Quebec, Canada, car pooling with widely different personalities is
      the spice of life: I still laugh remembering some of the conversations, work
      stories and rumours that we shared! By buying gas with a car's registration to
      track consumption on a sliding price scale, together with normal gas station
      camera surveillance, interstate transportation of stolen cars could be eliminated
      (which is where they are disposed of, and the primary cause of high car
      insurance)!



Financial Crisis
          o Dollar backed by Real Estate Mortgage
          Alan Greenspan got it right, despite being criticised for increasing the
          money supply several fold when he was the Federal Reserve chairman
          during the Clinton and later George Bush presidencies, as the money was
          lent to banks to finance one of the greatest housing booms in history in the
          form of mortgages to homeowners. While he claimed that market
          exuberance caused the recent rash of foreclosures when interest rates rose
          in response to an overheated economy and a rapid rise in oil prices, a few
          bad apples didn't spoil the barrel. Unlike Japan, with a 1000% drop in
          residential (10,000% for commercial) real estate from its boom year
          peaks (an order of magnitude conservative estimate compared to other
          published data), the US has had only a 30% drop in something that has real
          value and a cash flow, unlike precious metals such as gold and silver, which
          in spite of minor industrial uses are useless as an investment except in
          times of panic on world markets. Because of this disparity, the world will
          never quite catch the Japanese sickness of the last decade, as is being
          predicted.
          The Federal Reserve should swap its mortgage money with the Treasury
          Bills held by China and the world, and stop the run on the US dollar, whose
          sharp long term drop will kill the world's economy. A country's real estate
          should back its currency increases in the form of new mortgages, and not
          gold or silver, which have no intrinsic value and whose scarcity and reliance
          led to the great depression (also blamed on the panic caused by women's
          temperance unions or prohibition in some circles)!

           o Stock Trading Sanity
           The panic on world markets can be controlled by allowing stocks to be
           sold by investors only when the market is going up and bought only when it
           is going down; otherwise there would be a limit on the dollar amount of
           the trade per day. Taking profits should have its limits, to ensure that
           investors are rewarded for their foresight while allowing latecomers to
           have some stock gains too, without everyone being selfish to the market's
           detriment.
      World Poverty
         o Hydroponics
             On the subject of photosynthesis, I don't see why a toilet hooked up
             to hydroponics based farms in the warmer, sunnier regions of the
             world couldn't produce food at a rate orders of magnitude greater
             than the water wasting irrigated farms and fertilizer eco-systems
             found at present. Plastic pipe based growing channels would
             support orders of magnitude more people with present water
             supplies and available sunlight.


                o Cost of Living Reduction
Further thoughts on food concern oatmeal, as found in Quaker Oats Instant
Oatmeal packets with sulphites. The oat plant is almost a weed as far as growing
conditions are concerned, surviving even in the highlands of Scotland (which are
above the tree line and as barren as the moon). When two packets of Instant
Oatmeal replace a tea bag in a cup of tea (after several minutes), a healthy, system
friendly breakfast with a caffeine pick me up is achieved, especially as part of a
bran muffin with margarine for $1.39 CDN at McDonalds (value pick selection).
This tasty treat eliminates many of the complaints against oatmeal, which could
serve as food for humans instead of the fodder and hay grown for cattle and other
wasteful meat farming operations in poor soil/climate regions. A cheese Pizza-Toast
made from a bread slice could form the lunch, and it doesn't matter a whit what
increased populations do as long as there is food to feed them and cheap, low
interest rate mortgages. (Economics 2001)!


References
   1. Image and Video Compression etc., 2nd ed., Y.Q. Shi and H. Sun, CRC
      Press 2008, ISBN 978-0-8493-7364-0
   2. Novel Architecture for a Computer Utility Service Using HDTV as a
      Home Terminal, paper presented at SPIE Photonics East, Shanghai 2001
   3. PC-TV Integration in Mpeg2 and HDTV Frameworks, Masters Thesis,
      University of Alberta 2004, picardconsultants.com
   4. picardconsultants.com website

				