Camera parameters
Understanding images:
1- Acquiring Images
2- Human Vision

Reference: The Image Processing Handbook, Fifth Edition, John C. Russ

1- Acquiring Images
Video cameras
• The tube-type camera has now been
  largely supplanted by the solid-state chip
  camera, the simplest form of which is the
  CCD (charge-coupled device).
                          CCD
• The camera chip contains an array of diodes that
  function as light buckets. Light that enters the
  semiconductor raises electrons from the valence to the
  conduction band, so the number of electrons is a direct
  linear measure of the light intensity.
• The diodes are formed in a perfectly regular
  pattern, with no image distortion or sensitivity to
  the presence of stray fields.
CCD
• Each CCD bucket represents one "pixel" in
  the camera (this word has a lot of different
  meanings in different contexts, so it must be
  used with some care).
• With anywhere from a few hundred thousand
  to several million detectors on the chip, it is
  impractical to run wires directly to each one to
  read out the signal. Instead, the electrons that
  accumulate in each bucket due to incident
  photons are transferred, one line at a time, to
  a readout row.
• On a clock signal, each column of pixels shifts
  the charge by one location. This places the
  contents of the buckets into the readout row,
  and that row is then shifted, one pixel at a
  time but much more rapidly, to dump the
  electrons into an amplifier, where the
  information can be sent out as an analog
  signal from a video camera or measured
  immediately to produce a numeric output from
  a digital camera.
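A minimal sketch of this readout sequence (Python; the array size and charge values are arbitrary assumptions, not the handbook's):

    import numpy as np

    # Hypothetical 4 x 5 array of accumulated electrons (one value per bucket)
    chip = np.arange(20, dtype=float).reshape(4, 5)
    rows, cols = chip.shape

    readout = []
    for _ in range(rows):
        # Clock signal: every column shifts its charge one location,
        # so the bottom row of buckets lands in the readout register.
        readout_register = chip[-1].copy()
        chip[1:, :] = chip[:-1, :].copy()
        chip[0, :] = 0.0                    # the top row is now empty
        # The readout register is shifted out one pixel at a time to the amplifier,
        # where each charge would be amplified and digitized (or sent out as analog video).
        for charge in readout_register:
            readout.append(charge)

    # Rows come out starting with the one nearest the readout register.
    image = np.array(readout).reshape(rows, cols)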
Spectral response
• One significant problem with the chip camera is its spectral
  response. Even if the chip is reversed and thinned so that light
  enters from the side opposite the electrodes, very little blue light
  penetrates into the semiconductor to produce electrons. On the
  other hand, infrared light penetrates easily, and these cameras have
  red and infrared (IR) sensitivity that far exceeds that of human vision,
  usually requiring the installation of a blocking filter to exclude it
  (because the IR light is not focused to the same plane as the visible
  light and would produce blurred or fogged images).




[Figure: spectral response of a chip camera compared with human vision]
Problems with video cameras using chips
• During the transfer and readout process,
  unless the camera is shuttered either
  mechanically or electrically, photons
  continue to produce electrons in the chip.
  This produces a large background signal
  that further degrades (reduces) dynamic
  range and may produce blurring.
• ** The Panasonic AG-DVX100A we use
  has a minimum shutter speed of 1/60 sec.
                  Color cameras
• Our cameras use three sensors (3CCD). A prism array splits the
  incoming light into red, green, and blue components, which are
  recorded by three different sensors whose outputs are combined
  electronically to produce a standard video image.
• This approach is more costly, since three chips are needed, but for
  video applications they need not be of particularly high resolution
  (fewer pixels).
          Camera resolution
• The signal coming from the chip is an analog
  voltage, even if the digitization takes place within
  the camera housing. This means that the voltage
  cannot vary rapidly enough to correspond to
  brightness differences for every pixel.
• A single-chip color camera places a filter array
  (e.g., a Bayer pattern) over the detectors, so the
  missing color values at each pixel must be
  interpolated from neighboring detectors.
• Because of this interpolation, the actual image
  resolution with a single-chip camera and filter
  arrangement will, in most cases, be one-half to
  two-thirds the value that might be expected from
  the advertised number of pixels in the camera.
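A minimal sketch of that interpolation step (Python; the checkerboard green layout and random data are placeholders, not an exact Bayer implementation):

    import numpy as np

    # Placeholder raw sensor frame: each site recorded only one filtered color.
    raw = np.random.rand(6, 6)
    # Assume green was sampled on a checkerboard of sites (roughly Bayer-like).
    green_mask = (np.indices(raw.shape).sum(axis=0) % 2) == 0

    green = np.where(green_mask, raw, 0.0)
    counts = green_mask.astype(float)

    # Interpolate missing green values from the four horizontal/vertical neighbors
    # (edges wrap around here purely for brevity).
    neighbor_sum = (np.roll(green, 1, 0) + np.roll(green, -1, 0)
                    + np.roll(green, 1, 1) + np.roll(green, -1, 1))
    neighbor_cnt = (np.roll(counts, 1, 0) + np.roll(counts, -1, 0)
                    + np.roll(counts, 1, 1) + np.roll(counts, -1, 1))
    green_full = np.where(green_mask, raw, neighbor_sum / np.maximum(neighbor_cnt, 1.0))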
Electronics and bandwidth limitations
• With either the single-chip or three-chip camera, the blue
  channel is typically the noisiest due to the low chip
  sensitivity to blue light and the consequent need for
  greater amplification.
• Video images are often digitized into a 640 × 480 array
  of stored pixels, but this is not the actual resolution of
  the image.
• The broadcast bandwidth limits the high frequencies
  and eliminates any rapid variations in brightness and
  color. A video image has no more than 330 actual
  elements of resolution in the horizontal direction for the
  brightness (luminance) signal, and about half that for
  the color (chrominance) information.
Electronics and bandwidth limitations
• Color information is intentionally reduced in
  resolution because human vision is not very
  sensitive to blurring of color beyond boundary
  lines.
• Obtaining absolute color information from video
  cameras is impossible because of the broad
  range of wavelengths passed through each filter,
  the variation in illumination color (e.g., with slight
  voltage changes on an incandescent bulb), and
  the way the color information is encoded.
                      pixel
• The word "pixel" in some contexts refers to the
  number of light detectors in the camera (without
  regard to any color filtering, and sometimes
  including ones around the edges that do not
  contribute to the actual image but are used to
  measure dark current).
• In other contexts, it describes the number of
  recorded brightness or color values stored in the
  computer, although these may represent empty
  magnification.
                   pixel
• In other situations, the term is used to
  describe the displayed points of light on
  the computer monitor, even if the image is
  shown in a compressed or enlarged size.
• For the most common types of image-
  acquisition devices, such as cameras, the
  pixels represent an averaging of the signal
  across a finite area of the scene or
  specimen.
          Gray-scale resolution
• Commercial flash analog-to-digital converters usually measure each
  voltage reading to produce an 8-bit number from 0 to 255.
• A typical "good" camera specification of a 49-dB signal-to-noise ratio
  implies that only 7 bits of real information are available and that the
  eighth bit is random noise.
• When this stored image is subsequently displayed from memory, the
  numbers are used in a digital-to-analog converter to produce
  voltages that control the brightness of a display monitor, often a
  cathode-ray tube (CRT) or liquid crystal display (a flat-screen LCD).
• This process is comparatively noise free and high resolution, since
  computer display technology has been developed to a high level for
  other purposes. These displays typically have 256 steps of
  brightness for the red, green, and blue signals, and when equal
  values are supplied to all three, the result is perceived as a neutral
  gray value.
       Gray-scale resolution
• The human eye cannot distinguish all 256
  different levels of brightness in this type of
  display. About 20 to 30 gray levels can be
  visually distinguished on a CRT, LCD, or
  photographic print, suggesting that the
  performance of the digitizers in this regard is
  more than adequate, at least for those
  applications where the performance of the eye
  was enough to begin with and where the
  purpose of the imaging is to produce prints.
                    Color imaging
• Another important development in video is digital video (DV)
  recording. Digital video writes each scan line onto the tape at an
  angle, using a moving head that rotates as the tape moves past it.
• The signal is encoded as a series of digital bits, which offers several
  advantages. Just as CD technology replaced analog audio tapes,
  the digital video signal is not subject to loss of fidelity as images are
  transmitted or copies are made. More important, the high-frequency
  information that is discarded in analog recording because of
  broadcast bandwidth limits is preserved in digital recording.
              Color imaging
• The result is that DV images have about 500 ×
  500-pixel resolution and nearly 8 bits of contrast,
  and can be read into the computer without a
  separate digitizer board, since they are already
  digital in format.
• Some DV video cameras are now replacing the
  tape cartridge with flash (solid state) memory to
  store the data, but the same format is used.
   Digital camera limitations
• It is image compression that creates the most
  important problem. JPEG (Joint
  Photographic Experts Group) and other
  forms of image compression are used.
  Edges are broken up and shifted, color
  and density values are altered, and fine
  details can be eliminated or moved.
       Color spaces - YIQ/YUV
• The video camera signal is usually broadcast using the YIQ scheme.
• Conversion from RGB (the brightness of the individual red, green,
  and blue signals, as captured by the camera and stored in the
  computer) to YIQ/YUV and to the other color encoding schemes is
  straightforward and loses no information except for possible round-
  off errors.
• Y, the "luminance" signal, is just the brightness of a panchromatic
  monochrome image that would be displayed by a black-and-white
  television receiver. It combines the red, green, and blue signals in
  proportion to the human eye's sensitivity to them.
• The I and Q (or U and V) components of the color signal are chosen
  for compatibility with the hardware used in broadcasting; the I signal
  is essentially red minus cyan, while Q is magenta minus green. The
  relationship between YIQ and RGB is shown in Table 1.1. An
  inverse conversion from the encoded YIQ signal to RGB simply
  requires inverting the matrix of values.
       Color spaces - YIQ/YUV
Inter-conversion of RGB and YIQ Color Scales

 • Y = 0.299 R + 0.587 G + 0.114 B
 • I = 0.596 R – 0.274 G – 0.32 B
 • Q = 0.211 R – 0.523 G + 0.312 B
 • R = 1.000 Y + 0.956 I + 0.621 Q
 • G = 1.000 Y – 0.272 I – 0.647 Q
 • B = 1.000 Y – 1.106 I + 1.703 Q
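These coefficients can be applied directly as a matrix multiplication; a minimal sketch in Python (RGB values assumed to be in the 0-1 range):

    import numpy as np

    # Forward RGB -> YIQ matrix, using the coefficients from the table above
    RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                           [0.596, -0.274, -0.320],
                           [0.211, -0.523,  0.312]])

    def rgb_to_yiq(rgb):
        """rgb: array-like of shape (..., 3) with values in [0, 1]."""
        return np.asarray(rgb) @ RGB_TO_YIQ.T

    def yiq_to_rgb(yiq):
        # The inverse conversion simply inverts the matrix
        return np.asarray(yiq) @ np.linalg.inv(RGB_TO_YIQ).T

    print(rgb_to_yiq([1.0, 1.0, 1.0]))   # pure white: Y = 1, I and Q essentially 0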
       Color spaces - RGB
• The RGB color space is a Cartesian cubic
  space, since the red, green, and blue
  signals are independent and can be added
  to produce any color within the cube.
• There are other encoding schemes that
  are more useful for image processing and
  are more closely related to human
  perception.
Color spaces - CIEXYZ
• The CIE (Commission Internationale
  de L’Eclairage) chromaticity diagram is
  a two-dimensional plot defining color.
  The third (perpendicular) axis is the
  luminance, which corresponds to the
  panchromatic brightness which, like
  the Y value in YUV, produces a
  monochrome (gray scale) image. The
  other two coordinates, called x and y, are
  always positive (unlike the U and V
  values) and combine to define any
  color that we can see.

X = 0.412453 R + 0.357580 G + 0.180423 B
Y = 0.212671 R + 0.715160 G + 0.072169 B
Z = 0.019334 R + 0.119193 G + 0.950227 B
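The two coordinates plotted in the chromaticity diagram are x = X/(X+Y+Z) and y = Y/(X+Y+Z); a minimal sketch in Python using the matrix above:

    import numpy as np

    RGB_TO_XYZ = np.array([[0.412453, 0.357580, 0.180423],
                           [0.212671, 0.715160, 0.072169],
                           [0.019334, 0.119193, 0.950227]])

    def chromaticity(rgb):
        """Return the CIE (x, y) chromaticity coordinates of a linear RGB color."""
        X, Y, Z = RGB_TO_XYZ @ np.asarray(rgb, dtype=float)
        total = X + Y + Z
        return X / total, Y / total

    print(chromaticity([1.0, 1.0, 1.0]))   # white plots near (0.31, 0.33), the center of the diagram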
Color spaces - CIEXYZ
• A triangle on the CIE diagram, with its corners at the red, green, and
  blue locations of emission phosphors used in a cathode-ray tube
  (CRT), defines all of the colors that the tube can display.
• Some colors cannot be created by mixing these three phosphor
  colors, shown by the fact that they lie outside the triangle. The range
  of possible colors for any display or other output device is called the
  "gamut".
• Complementary colors are shown in the CIE diagram by drawing a
  line through the central point, which corresponds to white light.
  Thus, a line from green passes through white to magenta.
• One of the drawbacks of the CIE diagram is that it does not indicate
  the variation in color that can be discerned by eye. Sometimes this
  is shown by plotting a series of ellipses on the diagram. These are
  much larger in the green area, where small changes are poorly
  perceived, than elsewhere. Variation in saturation (distance out from
  the center toward the edge) is usually more easily discerned than
  variation in hue (position around the diagram).
             Color spaces - HSV
• The CIE diagram provides a tool for color definition, but corresponds
  neither to the operation of hardware nor directly to human vision.
• An approach that does is embodied in the HSV (hue, saturation, and
  value), HSI (hue, saturation, intensity), and HLS (hue, lightness, and
  saturation) systems. These are closely related to each other and to
  the artist’s concept of tint, shade, and tone. In this system, hue is the
  color as described by wavelength, for instance the distinction
  between red and yellow. Saturation is the amount of the color that is
  present, for instance the distinction between red and pink. The third
  axis (called lightness, intensity, or value) is the amount of light, the
  distinction between a dark red and light red or between dark gray
  and light gray.
            Color spaces - HSV
• The space in which these three values are plotted can be shown as a
  circular or hexagonal cone or double cone, or sometimes as a
  cylinder. It is most useful to imagine the space as a double cone, in
  which the axis of the cone is the gray-scale progression from black
  to white, distance from the central axis is the saturation, and the
  direction is the hue.
• The HSI spaces are useful for image processing because they
  separate the color information in ways that correspond to the human
  visual system’s response, and also because the axes correspond to
  many physical characteristics of specimens
• But these spaces are mathematically awkward: not only does the
  hue value cycle through the angles from 0 to 360° and then wrap
  around, but the conical spaces mean that increasing the intensity or
  luminance can alter the saturation.
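Python's standard library includes such a conversion; a small sketch illustrating hue, saturation, and value (the sample colors are arbitrary):

    import colorsys

    # Hue is returned as a fraction of a full turn (0-1), i.e., the angle around the cone.
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # pure red  -> (0.0, 1.0, 1.0)
    print(colorsys.rgb_to_hsv(1.0, 0.5, 0.5))   # pink      -> same hue, lower saturation
    print(colorsys.rgb_to_hsv(0.3, 0.3, 0.3))   # dark gray -> saturation 0, value 0.3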
Color spaces - CIELAB
• A geometrically simpler space that is close enough to the HSI
  approach for most applications and easier to deal with
  mathematically is the spherical L*a*b* model.
• L* as usual is the gray-scale axis, or luminance, while a* and b* are
  two orthogonal axes that together define the color and saturation.
• The a* axis runs from red (+a*) to green (–a*) and the b* axis from
  yellow (+b*) to blue (–b*).
• Notice that the hues do not have the same
  angular distribution in this space as in the
  usual color wheel. These axes offer a
  practical compromise between the simplicity
  of the RGB space that corresponds to
  hardware and the more physiologically based
  spaces such as HSI, which are used in many
  color management systems,
  spectrophotometers, and colorimeters.
        Color spaces - CIELAB
• The CIELab color space is considered to be "perceptually uniform,"
  meaning that a just-detectable visual difference constitutes a
  constant distance in any location or direction within the space.
  Despite the apparent precision of the six-digit numerical values in
  the conversion equations, this is somewhat oversimplified, since the
  numbers are based on a limited number of human testers, and
  technically this space applies to the viewing of a hard-copy print
  under specific illuminating conditions, not to emissive displays such
  as a computer monitor. Nevertheless, CIELab is widely used as a
  standard space for comparing colors. It has several additional
  shortcomings, the most important of which is that simple radial lines
  do not maintain a constant hue.
     Color spaces – L* B* C*
• A much simpler spherical color model than this
  can be used as a medium for image processing
  that retains the essential character of CIELab
  and is simpler than a true HSI space because it
  uses orthogonal axes instead of representing
  color as an angle. In this spherical space the
  luminance (intensity) value L* is simply the
  average of R, G, and B, and the a* and b*
  values are coordinates along two orthogonal
  directions. A typical conversion is:
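A minimal sketch of a conversion of this general form (Python; the specific opponent-axis formulas below are illustrative assumptions, not the handbook's coefficients):

    def rgb_to_spherical_lab(r, g, b):
        # Illustrative only: the opponent-axis formulas below are assumptions.
        L = (r + g + b) / 3.0        # luminance = average of R, G, B (as stated above)
        a = (r - g) / 2.0            # assumed red-green opponent axis
        b_axis = (r + g) / 2.0 - b   # assumed yellow-blue opponent axis
        return L, a, b_axis

    print(rgb_to_spherical_lab(0.5, 0.5, 0.5))   # neutral gray -> a and b are both 0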
                  Storing
• Color images are typically digitized as 24-
  bit RGB, meaning that 8 bits or 256
  (linear) levels of brightness for red, green,
  and blue are stored. This is enough to
  allow display on video or computer
  screens, or for printing purposes, but,
  depending on the dynamic range of the
  data, it may not be enough to adequately
  measure small variations within the image.
                  Color correction
•   When images are acquired under different lighting conditions, the color
    values recorded are affected.
• Human vision is tolerant of considerable variation in lighting, apparently
    using the periphery (surroundings) of the viewing field to normalize the color
    interpretation.
• Some cameras, especially video cameras, have an automatic white-point
    correction that allows recording an image from a gray card with no color,
    and using that to adjust subsequent colors. This same correction can be
    applied in software by adjusting the relative amount of red, green, and blue
    to set a region that is known to be without any color to pure gray.
In our Panasonic AG-DVX100A:
  White Balance: Custom, Presets, Automatic
  White Balance Presets: Indoor, Outdoor
• Displaying colors on the screen and measuring them with a
    spectrophotometer produces another calibration curve, which is used to
    adjust the values sent to the display.
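A minimal sketch of that software white-point adjustment (Python; the gray-card region and value ranges are assumptions):

    import numpy as np

    def white_balance(image, gray_region):
        """image: float RGB array of shape (H, W, 3), values in [0, 1].
        gray_region: index (e.g., a pair of slices) selecting an area known to be colorless."""
        patch = image[gray_region].reshape(-1, 3)
        means = patch.mean(axis=0)               # average R, G, B over the gray card
        gains = means.mean() / means             # scale each channel so the card becomes neutral
        return np.clip(image * gains, 0.0, 1.0)

    # Example: assume the gray card occupies a 50 x 50 block in the top-left corner
    # corrected = white_balance(img, (slice(0, 50), slice(0, 50)))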
              Color correction
• A simpler, but quite accurate approach to color
  adjustment for image acquisition is tristimulus correction.
  This requires measuring a test image with areas of pure
  red, green, and blue color, such as the calibration card
  shown in Figure 1.57. In practical situations, such a card
  can be included in the scene being digitized.
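A minimal sketch of the tristimulus idea (Python; the measured patch values are placeholders, not data from Figure 1.57):

    import numpy as np

    # Average RGB recorded from the pure red, green, and blue patches of the card (placeholders)
    measured = np.array([[0.80, 0.12, 0.10],    # camera's response to the red patch
                         [0.15, 0.75, 0.12],    # response to the green patch
                         [0.08, 0.10, 0.70]])   # response to the blue patch

    reference = np.eye(3)                       # what the patches should have produced

    # Solve for the 3 x 3 matrix that maps measured colors onto the reference colors
    correction = np.linalg.solve(measured, reference)

    def correct(rgb):
        return np.asarray(rgb) @ correction

    print(correct([0.80, 0.12, 0.10]))          # the red patch now maps to (1, 0, 0)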
                2- Human Vision
What we see and why
• We acquire images with two forward-facing eyes capable of
  detecting light over a wavelength range of about 400 to 700 nm
  (blue to red).
• Human beings are intensely visual creatures. Most of the
  information we acquire comes through our eyes (and the related
  circuitry in our brains), rather than through touch, smell, hearing, or
  taste.
• It is not easy for humans to imagine what the world looks like to a
  bat. Indeed, even the word "imagine" demonstrates the problem.
  The root word "image" implies a picture, constructed inside the mind,
  reflecting our dependence on images to organize our perceptions of
  the world, and our language illustrates that bias.
• Understanding the differences in the types of information to be
  extracted, and the biases introduced by our vision systems, is a
  necessary requirement for the scientist who would trust his or her
  results.
Is a picture worth a thousand words?
• An oft-quoted adage states that "a picture is worth a thousand
  words", but the statement is wrong in many ways. First, a
  typical image, digitized and stored in a computer, occupies
  the space of several million words of text, even though the
  resolution of modern digital cameras is far less than that of
  the human eye, which has about 160 million rods and cones.
• Second, as a means of communicating information from one
  person to another, the image is very inefficient. There is little
  reason to expect another person to derive the same
  information from a picture as we did without some supporting
  information to bring it to their attention and create a context
  for interpreting it. Research indicates that cultural differences
  strongly affect what we see in an image. And, of course,
  recognition of something in a scene that is familiar to the
  observer strongly influences where attention is focused.
               Human Recognition
•   The goal of much of human vision is recognition. To
    be recognized, an object or feature must have a
    name, some label that our consciousness can
    assign. Behind that label is a mental model of the
    object that can be expressed either in words,
    images, memories of associated events, or perhaps
    in other forms.
•   The basic technique that lies at the root of human
    vision is comparison. Nothing in images is measured
    by the eye and mind; we have no rulers and
    protractors in our heads for size, and no
    spectrophotometers for color or brightness. Features
    that can be viewed next to each other with similar
    orientation, surroundings, and lighting can be
    compared most easily.
•   A "threshold logic unit" implements the process that
    can signal recognition based on the weighted sum of
    many inputs. Recognition is frequently described in
    terms of a "grandmother cell." This is a theoretical
    construct, and not a single physical cell someplace
    in the brain, but it provides a useful framework to
    describe some of the significant features of the
    recognition process.
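A minimal sketch of such a threshold logic unit (Python; the clues, weights, and threshold are invented for illustration):

    # Each input is a clue (1 = present, 0 = absent); negative weights act as exclusions.
    weights = {"gray_hair": 2.0, "glasses": 1.0, "familiar_gait": 2.5, "too_tall": -3.0}
    threshold = 4.0

    def recognized(clues):
        """Signal recognition when the weighted sum of the inputs exceeds the threshold."""
        total = sum(weights[name] * value for name, value in clues.items())
        return total >= threshold

    print(recognized({"gray_hair": 1, "glasses": 1, "familiar_gait": 1, "too_tall": 0}))  # True
    print(recognized({"gray_hair": 1, "glasses": 0, "familiar_gait": 1, "too_tall": 1}))  # False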
                   Human Recognition
•   Many people have had the experience of thinking that they recognized someone
    ("grandmother") and then, on closer inspection, realized that it was not actually the right
    person at all. There were enough positive clues, and an absence of negative clues, to
    trigger the recognition process. Perhaps in a different situation, or with a different point of
    view, we would not have made that mistake. On the other hand, setting the threshold value
    on the weighted sum of positive inputs too high, while it would reduce false positives, would
    be inefficient, requiring too much time to collect more data.
•   The benefit of the fast and efficient procedure is the ability to perform recognitions based
    on incomplete data.
•   In some implementations of this logic, it is possible to assign a probability or a degree of
    confidence to an identification, but the utility of this value depends to a high degree on the
    quality of the underlying model. This can be represented as the weights in a neural
    network, as the rules in a fuzzy logic system, or in some other form. In human recognition,
    the list of factors in the model is not so explicit. Writing down all of the characteristics that
    help to identify grandmother (and especially the negative exclusions) is very difficult. In
    most scientific experiments, we try to enumerate the important factors, but there is always
    a background level of underlying assumptions that may or may not be shared by those who
    read the results.
                Technical specs
• Given the size of the lens aperture in the human eye (about 5 × 10^-3 m)
  and the wavelength of light (about 5 × 10^-7 m for green), the
  theoretical resolution should be about 10^-4 radians, or 1/3 arc
  min. The lens focuses light onto the retina, and it is only in
  the fovea, the tiny portion of the retina (covering
  approximately 2°) in which the cones are most densely
  packed, that the highest resolution is retained in the sensed
  image. One arc min is a reasonable estimate for the overall
  resolution performance of the eye, a handy number that can
  be used to estimate distances.
• Estimate the size of the smallest objects you can resolve,
  multiply by 3000, and that is how far away you are in the
  same units. For example, a car about 13 ft long can be
  resolved from an airplane at 40,000 ft, and so on.
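The factor of roughly 3000 follows directly from the 1 arc min figure (a quick check, with the angle in radians):

    \theta \approx 1~\text{arc min} = \frac{1}{60}\cdot\frac{\pi}{180}~\text{rad} \approx 2.9\times 10^{-4}~\text{rad},
    \qquad d \approx \frac{s}{\theta} \approx 3400\, s

For the car example, 40,000 ft / 13 ft ≈ 3100, consistent with this rule of thumb.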
               Technical specs
• The figure of 160 million rods and cones in the retina does not
  by itself determine the actual resolution of images. When we "look
  at" something, we rotate our head or our eyeballs in their
  sockets so that the image of that point falls onto the fovea,
  where the cone density is highest.
• The periphery ‫ محيط‬of our vision has relatively fewer cones
  (which respond to color) as compared with rods (which sense
  only brightness), and is important primarily for sensing motion
  and for judging scene illumination so that we can correct for
  color balance and shading.
• Human vision achieves something quite miraculous by rapidly
  shifting the eye to look at many different locations in a scene
  and, without any conscious effort, combining those bits and
  pieces into a single perceived image.
               Technical specs
• The eye can capture images over a very wide range of
  illumination levels, covering about nine or ten orders of
  magnitude ranging from a few dozen photons on a starlit night
  to a bright sunny day on the ski slopes.
• Some adaptation comes from changing the aperture with
  the iris, but most of it depends on processing in the retina.
• Adaptation to changing levels of illumination takes some
  time, up to several minutes, depending on the amount of
  change.
• In the darkest few orders of magnitude we lose color
  sensitivity and use only the rods. Since the fovea is rich in
  cones but has few rods, looking just "next to" what we want to
  see (averted vision) is a good strategy in the dark. It shifts the
  image over to an area with more rods to capture the dim
  image, although with less resolution.
                Technical specs
• Rods are not very sensitive to light at the red end of the
  visible spectrum.
• The cones come in three kinds, each of which responds
  over slightly different wavelength ranges: long-,
  medium-, and short-wavelength. By comparing the
  response of each type of cone, the eye characterizes
  color. Yellow is a combination of red and green,
  magenta is the relative absence of green, and so on.
• The overall sensitivity of the eye is greatest for green light
  and poorest for blue light. But this sensitivity comes at a
  price: it is within this same range of green wavelengths that
  our ability to distinguish one color from another is poorest.
                   Color for Humans
•   Like most of the things that the eye does, the perception of color is
    determined in a comparative rather than an absolute way. It is only by
    comparing something to a known color reference that we can really
    estimate color at all.
•   The usual color reference is a white object, since that (by definition) has all
    colors. If the scene we are looking at contains something known to be a
    neutral gray in color, then any variation in the color of the illumination can
    be compensated for.
•   This is not so simple as it might seem, because many objects do not reflect
    light of all colors equally and appear to change color with angle or
    illumination.
•   Because just three types of color-sensitive cone receptors are available,
    each with broad and overlapping wavelength response, there are many
    different combinations of wavelengths of light that evoke the same visual
    response, which may furthermore vary from individual to individual.
•   Color matching is a specialized topic that depends on careful calibration.
   RGB and wavelength of light
• Generating colors using RGB
  components on different computer
  displays is difficult enough because
  of the different phosphors (and their
  aging) and the difference between
  CRTs (cathode-ray tubes), LCDs
  (liquid crystal displays), and other
  devices.
• As a useful way to represent the
  variation of colors with wavelength,
  a model such as that shown in the
  figure can be used (Bruton 2005).
                      Brightness
• Being able to detect brightness or color is not the same thing
  as being able to measure it or detect small variations in either
  brightness or color.
• While human vision functions over some nine to ten orders of
  magnitude, we cannot view a single image that covers such a
  wide range, nor can we detect variations of 1 part in 10^9.
• One of the common methods for improving the visibility of
  local detail is computer enhancement that reduces the global
  (long-range) variation in brightness while increasing the local
  contrast. This is typically done by comparing a pixel to its local
  neighborhood: if the pixel is slightly brighter than its
  neighbors, it is made brighter still, and vice versa (a sketch
  follows below).
• Local and abrupt (sudden) changes in brightness (or color)
  are the most readily noticed details in images.
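A minimal sketch of such a local-contrast enhancement (Python; the neighborhood size and gain are assumptions):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def enhance_local_contrast(image, size=15, gain=1.5):
        """image: 2-D float array of brightness values."""
        local_mean = uniform_filter(image, size=size)   # long-range (smooth) brightness
        detail = image - local_mean                     # deviation of each pixel from its neighborhood
        # Damp the long-range variation while boosting the local detail
        return 0.5 * local_mean + gain * detail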
Colors we can see and colors that are produced
• Overall, the eye can detect only about 20 to 30 shades of gray in an
  image, and in many cases fewer will produce a visually satisfactory
  result.
• 30 shades of brightness in each of the red, green, and blue cones
  would suggest that 30^3 = 27,000 colors might be distinguished, but
  that is not so. Sensitivity to color changes at the ends of the
  spectrum is much better than in the middle (in other words, greens
  are hard to distinguish from each other).
• Only about a thousand different colors can be distinguished.
• Since computer displays offer 256 shades of brightness for the R,
  G, and B phosphors, or 256^3 = 16 million colors, we might expect
  that they could produce any color we can see, but this is not the case.
  Both computer displays and printed images suffer from limitations in
  gamut — the total range of colors that can be produced — as
  compared with what we can see. This is another reason that the
  "typical image" may not actually be representative of the object or
  class of objects.
      Acuity (seeing sharpness)
•   Acuity (spatial resolution) is normally specified in units of cycles per degree.
•   Human vision achieves its highest spatial resolution in just a small area at
    the center of the field of view (the fovea), where the density of light-
    sensing cones is highest. At a 50-cm viewing distance, details with a width
    of 1 mm represent an angle of slightly more than 1/10 of a degree.
•   The upper limit (finest detail) visible with the human eye is about 50 cycles
    per degree, which would correspond to a grating in which the brightness
    varied from minimum to maximum about five times over that same 1 mm. At
    that fine spacing, 100% contrast would be needed, in other words black
    lines and white spaces. This is where the common specification arises that
    the finest lines distinguishable without optical aid are about 100 μm.
•   Brightness variations about 1-mm wide represent a spatial frequency of
    about 9 cycles per degree, and under ideal viewing conditions can be
    resolved with a contrast of a few percent, although this assumes the
    absence of any noise in the image and a very bright image. (Acuity drops
    significantly in dark images or in ones with superimposed random
    variations, and is much poorer at detecting color differences than brightness
    variations.)
•   At a normal viewing distance of about 50 cm, 1 mm on the image is about
    the optimum size for detecting the presence of detail. On a typical computer
    monitor, that corresponds to about 4 pixels.
      Acuity (seeing sharpness)
•   Because the eye does not "measure" brightness, but simply makes
    comparisons, it is very difficult to distinguish brightness differences unless
    the regions are immediately adjacent.
•   Figure 2.10a shows four gray squares, two of which are 5% darker than the others.
    Because they are separated, the ability to compare them is limited. Even if
    the regions are adjacent, as in Figure 2.10b, if the change from one region to another
    is gradual, it cannot be detected. Only when the step is abrupt, as in Figure
    2.10c, can the eye easily determine which regions are different.
       What the eye tells the brain
•   Human vision is more than rods and cones in the retina. An enormous amount of
    processing takes place, some of it immediately in the retina and some in the visual cortex
    at the rear of the brain. The light-sensing rods and cones are at the back, and light must
    pass through several layers of processing cells to reach them. Only about 10% of the
    photons that enter the human eye are detected.
•   There are about 100 times as many sensors as there are neural connections in the optic
    nerve, implying a considerable processing to extract the meaningful information.
•   Within the retina, outputs from the individual light sensors are combined and compared by
    layers of neurons. Comparing the output from one sensor or region with that from the
    surrounding sensors, so that excitation of the center is tested against the inhibition from the
    surroundings, is a basic step that enables the retina to ignore regions that are uniform or
    only gradually varying in brightness, and to efficiently detect locations where a change in
    brightness occurs.
•   Detection of the location of brightness changes (feature edges) creates a kind of mental
    sketch of the scene, which is dominated by the presence of lines, edges, corners, and
    other simple structures.
•   The extraction of changes in brightness or color with position or with time explains a great
    deal about what we see in scenes, and about what we miss.
•   Changes that occur gradually with position, such as shading of light on a wall, are ignored.
    We have to exert a really conscious effort to notice such shading. But even small changes
    in brightness of a few percent are visible when they occur abruptly, producing a definite
    edge. Similarly, any part of a scene that is static over time tends to be ignored, but when
    something moves it attracts our attention.
           Spatial comparisons
• The basic idea behind center-surround or excitation-inhibition
  logic is comparing the signals from a central region (which
  can be a single detector or progressively larger scales by averaging
  detectors together) to the output from a surrounding annular region
  (a sketch follows this list).
• Similar excitation-inhibition comparisons are made for color values.
  Boundaries between blocks of color are detected and emphasized,
  while the absolute differences between the blocks are minimized.
• Whether described in terms of brightness, hue, and saturation or the
  artist’s tint, shade, and tone, or various mathematical spaces, three
  parameters are needed.
• Simplistically, we can think of the sum of all three, or perhaps a
  weighted sum that reflects the different sensitivities of the red,
  green, and blue detectors, as being the brightness, while ratios of
  one to another are interpreted as hue, with the ratio of the greatest
  to the least corresponding to the saturation.
• Many of the questions about exactly how color information is
  processed in the human visual system have not yet been answered.
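A common computational analogue of this center-surround comparison is a difference-of-Gaussians filter; a minimal sketch (Python; the two scales are assumptions):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def center_surround(image, center_sigma=1.0, surround_sigma=3.0):
        """Excitation from a small central region minus inhibition from a larger surround.
        Uniform or slowly varying regions give a response near zero; edges stand out."""
        center = gaussian_filter(image, center_sigma)
        surround = gaussian_filter(image, surround_sigma)
        return center - surround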
                      Example
• The ―light gray‖ walls on
  the shadowed side of the
  building in the figure are
  actually darker than the
  ―black‖ shingles on the
  sunlit side, but our
  perception understands
  that the walls are light
  gray, the trim white, and
  the shingles black on all
  sides of the building.
   Local to global hierarchies
• Interpretation of elements in a scene relies heavily on
  grouping them together to form features and objects.
• Our natural world does not consist of lines and points;
  rather, it consists of objects that are represented by
  connected lines and points. Our vision systems perform
  this grouping naturally, and it is usually difficult to
  deconstruct a scene or structure into its component
  parts.
• Learning how to look for everything and nothing is a hard
  skill to master. Most of us see only the things we expect
  to see, or at least the familiar things for which we have
  previously acquired labels and models.
                   It’s about time
• Staring at a fixed pattern or color target for a brief time will
  chemically deplete the rods or cones. Then looking at a blank page
  will produce an image of the negative or inverse of the original.
• Motion sensing is obviously important. It alerts us to changes in our
  environment that may represent threats or opportunities. And the
  ability to extrapolate motion lies at the heart of tracking capabilities
  that enable us to perform actions such as catching a thrown ball.
• Humans are very good at tracking moving objects and predicting
  their path, taking into account air resistance and gravity (for
  instance, an outfielder catching a fly ball). Most animals do not have
  this ability. What we typically describe in humans as ―eye–hand‖
  coordination involves an enormous amount of subtle computation
  about what is happening in the visual scene.
• The human eye has some 150+ million light sensors, and for each of
  them some 25,000 to 50,000 processing neurons are at work,
  extracting lots of information that evolution has decided we can use
  to better survive.
                           3D
• Humans use stereo vision by rotating the eyes in their
  sockets to bring the same feature to the fovea in each
  eye. It is the feedback from the muscles to the brain that
  tells us whether one feature is closer than another,
  depending on whether the eyes had to rotate in or out as
  we directed our attention from the first feature to the
  second.
• Stereoscopy only works for things that are fairly close. At
  distances beyond about 100 ft, the angular differences
  become too small to notice.
• Relative size also plays an important role in judging
  distance.
Seeing what isn’t there, and vice versa
• One problem that plagues eyewitness testimony and
  identification is that we tend to see (i.e., pick out from a
  scene) things that are familiar (i.e., already have mental
  labels).
• We fail to see or recognize things that are unfamiliar,
  misjudge things for which we do not have an appropriate
  set of stored clues, and truly believe that we have seen
  characteristics in one image that have been seen in
  other instances that are remembered as being similar.
  That is what it means to be human, and those are the
  tendencies that a careful scientific observer must combat
  in analyzing images.
             Image compression
• Compression discards what people do not easily see in images.
  Human vision is sensitive to abrupt local changes in brightness,
  which correspond to edges. These are kept, although they may shift
  slightly in location and in magnitude. On the other hand, absolute
  brightness is not visually perceived, so it is not preserved. Since
  changes in brightness of less than a few percentage points are
  practically invisible, and even larger variations cannot be seen if
  they occur gradually over a distance in the image, compression can
  eliminate such details with minor effect on visual interpretation.
• Color information is reduced in resolution because boundaries are
  primarily defined by changes in brightness. The first step in most
  compression schemes is to reduce the amount of color information,
  either by averaging it over several neighboring pixels or by reducing
  the number of colors used in the image, or both (a sketch of the
  averaging approach follows this list).
• It is also possible to reduce the size of video or movie files by finding
  regions in the image that do not change, or do not change rapidly or
  very much. In some cases, the background behind a moving object
  can be simplified, even blurred, while the foreground feature can be
  compressed because we do not expect to see fine detail on a
  moving object.
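A minimal sketch of that color-averaging step (Python; the luminance weights follow the earlier YIQ table, and the 2 × 2 averaging block is an assumption):

    import numpy as np

    def reduce_color_resolution(rgb, block=2):
        """rgb: float array of shape (H, W, 3) in [0, 1]; H and W assumed divisible by `block`.
        Keeps brightness at full resolution but averages the color differences over
        block x block neighborhoods."""
        y = rgb @ np.array([0.299, 0.587, 0.114])        # luminance, full resolution
        chroma = rgb - y[..., None]                      # per-channel color differences
        h, w, _ = rgb.shape
        blocks = chroma.reshape(h // block, block, w // block, block, 3)
        averaged = blocks.mean(axis=(1, 3), keepdims=True)
        chroma_low = np.broadcast_to(averaged, blocks.shape).reshape(h, w, 3)
        return np.clip(y[..., None] + chroma_low, 0.0, 1.0)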
Conclusion
• Human vision is an extremely powerful tool, evolved over millennia
  (thousands of years) to extract from scenes those details that are
  important to our survival as a species. The processing of visual
  information combines a hierarchy of highly parallel neural circuits to
  detect and correlate specific types of detail within images. Many
  shortcuts that work "most of the time" are used to speed recognition.
  Studying the failure of these tricks, revealed in various visual illusions,
  aids in understanding of the underlying processes.
• An awareness of the failures and biases of human vision is also
  important to the scientist who relies on visual examination of images to
  acquire or interpret data.
• Visual inspection is a comparative, not a quantitative process, and it is
  easily biased by the presence of other information in the image.
  Computer image-analysis methods are available that overcome most of
  these specific problems, but they provide answers that are only as good
  as the questions that are asked. In most cases, if the scientist does not
  visually perceive the features or trends in the raw images, their
  subsequent measurement will not be undertaken.