					COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
              OBJECTS IN 3D ENVIRONMENT




                      Sravan Kumar Reddy Mothe
     B.Tech., Jawaharlal Nehru Technological University, India, 2007




                               PROJECT




                   Submitted in partial satisfaction of
                   the requirements for the degree of



                        MASTER OF SCIENCE


                                    in


                    MECHANICAL ENGINEERING


                                    at




        CALIFORNIA STATE UNIVERSITY, SACRAMENTO


                                SPRING
                                 2011
   COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
                 OBJECTS IN 3D ENVIRONMENT




                                 A Project


                                    by


                         Sravan Kumar Reddy Mothe




Approved by:


________________________________, Committee Chair
Yong S. Suh, Ph.D.

_________________________
Date




                                     ii
Student: Sravan Kumar Reddy Mothe




       I certify that this student has met the requirements for format contained in the

University format manual, and that this project is suitable for shelving in the Library and

credit is to be awarded for the Project.




________________________, Graduate Coordinator              _____________________
Kenneth Sprott, Ph. D.                                       Date




Department of Mechanical Engineering




                                            iii
                                         Abstract


                                            of


       COMPUTER RECOGNITION SYSTEM FOR DETECTING AND TRACKING
                     OBJECTS IN 3D ENVIRONMENT




                                            by


                               Sravan Kumar Reddy Mothe


In recent times computer recognition has become a powerful tool for many robotic applications. Applications such as inspection, surveillance, industrial automation and gaming need 3D positional data in order to interact with the external environment, and this data can be obtained through computer recognition. Computer recognition can be performed with many different tools, such as OpenCV, Matlab and OpenGL. OpenCV provides an optimized library of over 500 useful functions for detection, tracking, image transformation, 3D vision and more.


       The scope of this project is to obtain the 3D position of an object from two sensors. The sensors are two cameras, which need to be calibrated before they observe the 3D object. Calibration is the process in which the output images from the two cameras are aligned so that corresponding pixel points end up on the same image rows. After calibration, the two images from camera 1 and camera 2 are fed into the OpenCV 3D functions. This application is designed primarily for a Ping Pong game shooter. The coding part of the project consists of writing C++ code, using the OpenCV libraries, to calibrate the cameras and to recognize and track the 3D object. The output of the code is the 3D position of the player's bat, expressed in the camera coordinate system. This 3D positional data can be fed to the shooter so that the shooter's joints can move automatically and shoot the ball exactly to the player's bat for training purposes. The same 3D vision technology can be used in many other applications such as industrial robots, unmanned vehicles, intelligent surveillance, medical devices and gaming.




_______________________________, Committee Chair
Yong S. Suh, Ph.D.



______________________
Date




                                              v
                                ACKNOWLEDGMENTS


While working on this project, some people helped me to reach where I am today and I

would like to thank all for their support and patience.


Firstly, I would like to thank Professor Dr. Yong S. Suh for giving me the opportunity to do this project. His continuous support helped me develop an immense interest in the work. Dr. Suh provided many of the sources of information that I needed from the beginning of the project until the end, and he was always there to meet, talk and answer the questions that came up during the project.


Special thanks to my advisor Dr. Kenneth Sprott for helping me complete the writing of this report; without his encouragement and constant guidance I could not have finished it.


Finally, I would also like to thank my family, my friends and the Mechanical Engineering department, who helped me complete this project work successfully. Without the people mentioned above the project would not have come out the way it did. Thank you all.




                                             vi
                                 TABLE OF CONTENTS


                                                                        Page


Acknowledgments                                                          vi

List of Figures                                                          ix

Software Specifications                                                  x

Chapter


1. INTRODUCTION AND BACKGROUND                                           1

   1.1    Introduction to computer vision                                2

   1.2    Applications of computer vision                                2

   1.3    Tools available for computer vision                            8

          1.3.1   OpenCV (Open Source Computer Vision)                   8

          1.3.2   VXL (Vision-something-Libraries)                       9

          1.3.3   BLEPO                                                  10

          1.3.4   MinGPU (A minimum GPU library for computer vision)     10

   1.4    Description of the project                                     11

2. EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING
   PROCEDURES                                                            13

   2.1    Steps to calibrate stereo cameras and obtain 3D data           13

          2.1.1   Step1 ( Loading OpenCV library)                        14

          2.1.2   Step2 (Preparing chessboard and setting up cameras)    17

          2.1.3   Step3 (C++ program to capture calibration data)        19
                                           vii
         2.1.4   Step4 (Code for calibration and for 3D object tracking)   25

3. IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION                       26

   3.1 Important functions used for this project                           26

         3.1.1   cvFindChessboardCorners                                   26

         3.1.2   cvDrawChessboardCorners                                   28

         3.1.3   cvStereoCalibrate                                         28

         3.1.4   cvComputeCorrespondEpilines                               32

         3.1.5   cvInitUndistortRectifyMap                                 33

         3.1.6   cvStereoBMState                                           36

         3.1.7   cvReprojectImageTo3D                                      39

   3.2 Pseudo code for stereo calibration and 3D vision                    40

4. RESULTS OF CALIBRATION AND 3D VISION                                    73

   4.1 Project Application Information                                     73

   4.2 Coordinates of a colored object in front of cameras                 75

   4.3   Results are graphically shown                                     76

5. CONCLUSION                                                              81

6. FUTURE WORK                                                             83

Bibliography                                                               84




                                          viii
                                    LIST OF FIGURES

                                                                           Page

1. Figure 1.1: Interaction of various fields of study defining interests
               in computer vision                                           2

2. Figure 2.1: Setting up OpenCV path for Environmental Variables           14

3. Figure 2.2: Creating of new Win32 Console Application                    16

4. Figure 2.3: Loading libraries to Visual Studio 2010                      16

5. Figure 2.4: Loading Additional Dependencies to the project               17

6. Figure 2.5: Chessboard used for calibration of stereo cameras            18

7. Figure 2.6: Two USB cameras are fixed to a solid board in front of
               3D object                                                    19

8. Figure 2.7: Calibration data from camera 1 and camera 2                  24

9. Figure 2.8: Text file that the calibration code reads from               25

10. Figure 4.1: Camera coordinate system                                    75

11. Figure 4.2: Detected corners on the image taken from left camera        76

12. Figure 4.3: Detected corners on the image taken from right camera       77

13. Figure 4.4: Rectified image pair                                        78

14. Figure 4.5: Displays average error to its sub-pixel accuracy            78

15. Figure 4.6: Coordinates of an object with respective to left camera     79

16. Figure 4.7: Disparity between left and right image                      80




                                             ix
                              SOFTWARE SPECIFICATIONS


1. The initial requirement to run the program is a C++ compiler; the preferred compiler is Visual Studio 2010.

2. Download the OpenCV libraries and load them into Visual Studio.

3. Create a new project in Visual Studio by opening the folder that is on the disc, then open the source file and run it. Note: two USB cameras should be connected before running the source file.

4. OpenCV loading procedures are clearly illustrated in the report.

5. Operating System: Windows 7 or Windows Vista (preferred).

6. System requirements: 4 GB RAM and 2.53 GHz processor speed (preferred).




                                            x
                                                                                         1

                                          Chapter 1

                       INTRODUCTION AND BACKGROUND


1.1)   Introduction to computer vision:


Vision is our most powerful sense. It provides us with a remarkable amount of information about our surroundings and enables us to interact intelligently with the environment. Through it we learn the positions and identities of objects and the relations between them. Vision is also our most complicated sense. The knowledge we have accumulated about how our biological vision system operates is still fragmentary and confined mostly to the processing stages directly concerned with signals from the sensors. Today, one can find vision systems that successfully deal with a variable environment as parts of machines.


       Computer vision (image understanding) is a technology that studies how to reconstruct and understand a 3D scene from its 2D images in terms of the properties of

the structures present in the scene. Computer vision is concerned with modeling and

replicating human vision using computer software and hardware. It combines knowledge

in computer science, electrical engineering, mathematics, physiology, biology, and

cognitive science. It needs knowledge from all these fields in order to understand and

simulate the operation of the human vision system. As a scientific discipline, computer

vision is concerned with the theory behind artificial systems that extract information from
                                                                                             2

images. The image data can take many forms, such as video sequences, views from

multiple cameras, or multi-dimensional data.


[Figure 1.1 is a diagram showing Computer Science and Engineering, Electronics Engineering, Mechanical Engineering, Biological Studies, Artificial Intelligence/Cognitive Studies and Robotics interacting to define the interests of computer vision.]

 Figure 1.1: Interaction of various fields of study defining interests in computer vision.


1.2) Applications of computer vision:


Much of artificial intelligence deals with autonomous planning or deliberation for robot

systems to navigate through an environment. A detailed understanding of these

environments is required to navigate through them. Information about the environment

could be provided by a computer vision system, acting as a vision sensor and providing

high-level information about the environment and the robot. Potential application areas
                                                                                            3

for vision-driven automated systems are many. Each brings its own particular problems

which must be resolved by system designers if successful operation is to be achieved but,

generally speaking, applications can be categorized according to the processing

requirements they impose. To illustrate I briefly describe a number of such areas of

application.


Examples are categorized under their principal application area. [Ref: 8]


           Three-dimensional modeling:


       1.    Creates 3D models from a set of images; objects are imaged on a calibration pattern.

       2.    Photo Modeler software allows creation of texture-mapped 3-D models from a

             small number of photographs. Uses some manual user input.

       3.    Uses projected light to create a full 3-D textured model of the human face or

             body in sub-second times.


           Traffic and road management:


       1. Created the Auto scope system that uses roadside video cameras for real-time

             traffic management. Over 100,000 cameras are in use.

       2. Imaging and scanning solutions for road network surveying.


           Web Applications:


       1. Image retrieval based on face recognition.
                                                                                           4

    2. Develops a system for image search on the web. Uses GPUs for increased

       performance.

    3. Image retrieval based on content.

    4. Virtual makeover website, TAAZ.com uses computer vision methods to allow

       users to try on makeup, hair styles, sunglasses, and jewelry.


    Security and Biometrics:


    1. Systems for intelligent video surveillance.

    2. Systems for biometric face recognition.

    3. Fingerprint recognition systems with a novel sensor.

    4. Systems for behavior recognition in real-time video surveillance.

    5. Fingerprint recognition systems.

    6. Smart video surveillance systems.

    7. Security systems using novel sensors, such as registered visible and thermal

       infrared images and use of polarized lighting.

    8. Security systems for license plate recognition, surveillance, and access control.

    9. Image processing and computer vision for image forensics.

    10. Automated monitoring systems, including face and object recognition.

    11. Detection and identification of computer users.

    12. Detection and monitoring of people in video streams.

    13. Face verification and other biometrics for passport control.
                                                                                         5

    People tracking:


    1. Tracking people within stores for sales, marketing, and security.

    2. Systems for counting and tracking pedestrians using overhead cameras.

    3. Tracking people in stores to improve marketing and service.


    Object Recognition for Mobile Devices:


    1. Visual search for smart phones, photo management, and other applications.

    2. Image recognition and product search for camera phones.


    Industrial automation and inspection:


    1. Industrial robots with vision for part placement and inspection.

    2. Vision systems for the plastics industry.

    3. Inspection systems for optical media, sealants, displays, and other industries.

    4. Develops 3D scanners for sawmills and other applications.

    5. Vision systems for industrial inspection tasks, including food processing,

       glassware, medical devices, and the steel industry.

    6. Develops 3D vision systems using laser sensors for inspection of wood

       products, roads, automotive manufacturing, and other areas.

    7. Industrial mobile robots that use vision for mapping and navigation.

    8. Trainable computer vision systems for inspection and automation.

    9. Laser-based inspection and templating systems.
                                                                                       6

    10. Vision systems for surface inspection and sports vision applications.

    11. Systems to inspect output from high-speed printing presses.

    12. Vision systems for textile inspection and other applications.

    13. Systems for inspection and process control in semiconductor manufacturing.

    14. Automated inspection systems for printed circuit boards and flat panel displays.

    15. Creates 3D laser scanning systems for automotive and other applications.

    16. Has developed a system for accurate scanning of 3D objects for the automotive

       and other industries. The system uses a 4-camera head with projection of

       textured illumination to enable accurate stereo matching.


    Games and Gesture Recognition:


    1. Time-of-flight range sensors and software for gesture recognition. Acquired by

       Microsoft in 2010.

    2. Tracks human gestures for playing games or interacting with computers.

    3. Real-time projected infrared depth sensor and software for gesture recognition.

       Developed the sensing system in Microsoft's Xbox Kinect.

    4. Interactive advertising for projected displays that tracks human gestures.

     5. Uses computer vision to track the hand and body motions of players to control

        the Sony PlayStation.
                                                                                      7

    Film and Video: Sports analysis:


    1. Uses multiple cameras to provide precise tracking in table tennis, cricket, and

       other sports for refereeing and commentary.

    2. Creates photorealistic 3D visualization of sporting events for sports

       broadcasting and analysis.

    3. Systems for tracking sports action to provide enhanced broadcasts.

    4. Develops Piero system for sports analysis and augmentation.

    5. Systems for tracking sports players and the ball in real time, using some human

       assistance. (My project can be used for this application.)

    6. Vision systems to provide real-time graphics augmentation for sports

       broadcasts.

    7. Provides 3D tracking of points on the human face or other surfaces for character

       animation. Uses invisible phosphorescent makeup to provide a random texture

       for stereo matching.

    8. Systems for creating virtual television sets, sports analysis, and other

       applications of real-time augmented reality.

    9. Video content management and delivery, including object identification and

       tracking.

    10. Systems for tracking objects in video or film and solving for 3D motion to allow

       for precise augmentation with 3D computer graphics.
                                                                                       8

1.3) Tools available for computer vision:


1.3.1) OpenCV (Open Source Computer Vision):


OpenCV is a library of programming functions for real time computer vision

applications. The library is written in C and C++ and runs under different platforms

namely Linux, Windows and Mac OS X. OpenCV was designed with a strong focus on real-time applications. Further automatic optimization on Intel architectures can be achieved

by Intel’s Integrated Performance Primitives (IPP), which consists of low-level optimized

routines in many different algorithmic areas. One of OpenCV’s goals is to provide a

flexible computer vision infrastructure that helps us build fairly sophisticated vision

applications quickly. The OpenCV Library has over 500 functions that can be used for

many areas in vision, including factory product inspection, medical imaging,

surveillance, user interface, camera calibration, stereo vision, and robotics.


OpenCV core main libraries:


 1. “CVAUX” for Experimental/Beta.

 2. “CXCORE” for Linear Algebra and Raw matrix support, etc.

 3. “HIGHGUI” for Media/Window Handling and Read/write AVIs, window displays,

 etc.
                                                                                      9

OpenCV's latest version is available from http://SourceForge.net/projects/opencvlibrary. We can download the OpenCV library and build it into Visual Studio 2010; the steps to build the library are illustrated later in this report.
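As a quick check that the library has been built and linked correctly, a minimal "hello world" program such as the sketch below can be compiled and run; it is not part of the project code, and the image file name is only a placeholder. It loads an image from disk and displays it in a window.

#include "cv.h"
#include "highgui.h"
#include <stdio.h>

int main()
{
    // load a test image from the project folder (placeholder file name)
    IplImage* img = cvLoadImage("test.jpg", CV_LOAD_IMAGE_COLOR);
    if (!img)
    {
        printf("Could not load test.jpg\n");
        return -1;
    }
    cvNamedWindow("Hello OpenCV", CV_WINDOW_AUTOSIZE);
    cvShowImage("Hello OpenCV", img);
    cvWaitKey(0);                       // wait for a key press before closing
    cvReleaseImage(&img);
    cvDestroyWindow("Hello OpenCV");
    return 0;
}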



1.3.2) VXL (Vision-something-Libraries):


VXL is a collection of C++ libraries designed for computer vision research and

implementation. It was created from TargetJr with the aim of making a light, fast and

consistent system. VXL is written in ANSI/ISO C++ and is designed to be portable over

many platforms.


Core libraries in VXL are:


  1. VNL (Numeric): Numerical containers and algorithms like matrices, vectors,

  decompositions, optimizers.


  2. VIL (Imaging): Loading, saving and manipulating images in many common file

  formats, including very large images.


  3. VGL (Geometry): Geometry for points, curves and other elementary objects in 1, 2

  or 3 dimensions.


  4. VSL (Streaming I/O), VBL (Basic templates), VUL (Utilities): Miscellaneous

  platform-independent functionality.
                                                                                              10

As well as the core libraries, there are libraries covering numerical algorithms, image

processing, co-ordinate systems, camera geometry, stereo, video manipulation, and

structure recovery from motion, probability modeling, GUI design, classification, robust

estimation, feature tracking, topology, structure manipulation, 3D imaging, etc. Each

core library is lightweight, and can be used without reference to the other core libraries.


1.3.3) BLEPO:


Blepo is an open-source C/C++ library to facilitate computer vision research and

education. Blepo is designed to be easy to use, efficient, and extensive.


   1. It enables researchers to focus on algorithm development rather than low-level details such as memory management, reading/writing files, capturing images, and visualization, without sacrificing efficiency;

   2. It enables educators and students to learn image manipulation in a C++ environment that is easy to use; and

   3. It captures a repository of the more mature, well-established algorithms so that others, both within and outside the community, can use them without having to reinvent the wheel.


1.3.4) MinGPU: A minimum GPU library for computer vision:


In computer vision it is becoming popular to implement algorithms in whole or in part on a Graphics Processing Unit (GPU), due to the superior speed GPUs can offer compared to CPUs. Using the GPU, two well-known computer vision algorithms have been implemented – Lucas-Kanade optical flow and optimized normalized cross-correlation – as well as homography transformation between two 3D views. MinGPU is a library which contains, in as minimal a form as possible, all of the necessary functions to convert existing CPU code to the GPU. MinGPU provides simple interfaces which can be used to load a 2D array into the GPU and perform operations on it. All GPU- and OpenGL-related code is encapsulated in the library; therefore users of this library need not know any details of how the GPU works. Because GPU programming is currently not simple for anyone outside the computer graphics community, this library can serve as an introduction to the GPU world for researchers who have never used the GPU before. The library works with both nVidia and ATI families of graphics cards and is configurable.


1.4) Description of the Project:


The goal of the project is to design a sensor that detects and tracks objects in a 3D environment; in this project it is designed specifically for a Ping Pong game shooter. The sensors used for this project are stereo cameras, and these cameras let the shooter know where the player's bat is. The project is basically about writing code using the OpenCV library so that the cameras can see the player's bat. The output from the code is the exact 3D location of the bat (X, Y and Z coordinates) relative to the sensor.


The output from the stereo cameras is just a video stream, and this video stream is further analyzed by a computer program written in C++. The program basically consists of different stages, namely Calibration, Rectification, Disparity, Background Subtraction and Reprojection3D. Calibration removes the distortion in the video streams, which is usually caused by improper alignment and by the quality of the lenses. The Rectification stage rectifies both images so that corresponding pixels in the two images are vertically aligned (they lie on the same image rows). The Disparity stage calculates the differences in x-coordinates on the image planes of the same feature viewed in the left and right cameras. The Background Subtraction stage removes the background so that the program sees only the colored object (the bat) for tracking. Finally, Reprojection3D takes the disparity as an input and outputs the coordinates of the object relative to the sensors. These coordinates can be fed into the shooter so that it knows the bat position and can implement advanced training modes for the player to practice the game.
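The following is a condensed sketch of that pipeline using the OpenCV 2.1 C API functions described in Chapter 3. It is only an outline, not the project source code on the disc: the matrices (_M1, _D1, _R, _T, _R1, _P1, _Q, ...), the remap buffers (mx1, my1, mx2, my2) and the frame images (leftRaw, leftRect, rightRaw, rightRect, disparity, image3D) are assumed to be declared and allocated elsewhere, largely as in the pseudo code of Section 3.2.

// 1. Calibration: estimate intrinsics, distortion and the R, T between the cameras
cvStereoCalibrate(&_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints,
                  &_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F,
                  cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, 1e-5),
                  CV_CALIB_SAME_FOCAL_LENGTH);
// 2. Rectification: compute rectifying transforms and the per-pixel remap tables
cvStereoRectify(&_M1, &_M2, &_D1, &_D2, imageSize, &_R, &_T,
                &_R1, &_R2, &_P1, &_P2, &_Q, 0);
cvInitUndistortRectifyMap(&_M1, &_D1, &_R1, &_P1, mx1, my1);
cvInitUndistortRectifyMap(&_M2, &_D2, &_R2, &_P2, mx2, my2);
cvRemap(leftRaw,  leftRect,  mx1, my1);      // row-aligned left image
cvRemap(rightRaw, rightRect, mx2, my2);      // row-aligned right image
// 3. Disparity: block-matching correspondence on the rectified pair
CvStereoBMState* bm = cvCreateStereoBMState(CV_STEREO_BM_BASIC, 64);
cvFindStereoCorrespondenceBM(leftRect, rightRect, disparity, bm);
// 4. Reprojection: turn disparity into (X, Y, Z) camera coordinates through Q
cvReprojectImageTo3D(disparity, image3D, &_Q);

The background-subtraction step that isolates the colored bat is omitted from this sketch for brevity.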
                                                                                          13

                                         Chapter 2


  EXPERIMENTAL SETUP AND OPENCV LIBRARY LOADING PROCEDURES


2.1) Steps to calibrate stereo cameras and obtain 3D data:


   1. Loading OpenCV libraries into Visual Studio 2010 and check with the sample

        program whether it is working well or not. If step 1 is okay move to step 2.

   2.   Make a good-sized chessboard and tape it to a solid piece of wood or plastic (make sure the pattern is not bent or calibration will not work). Focus the cameras so that the text on the chessboard is readable, and do not modify the focus or the distance between the cameras during or after calibration, since these two are important factors in calibration. If the position or focus of the cameras is modified, we need to calibrate once again.

   3. Compile a C++ program that captures chessboard pictures from the left and right cameras at the same time. The code takes 16 sets of calibration pictures. The waiting time between sets of pictures is 10 seconds, so that we can move the chessboard into a different position in front of the stereo cameras. All constant values are preprogrammed in the code and can be adjusted according to the requirements.

   4. After getting the calibration data from step 3, we can execute the stereo calibration code that outputs the 3D data of a colored object; it takes the calibration data from step 3 as its input. The program calibrates the stereo cameras and locates the 3D object. X, Y and Z are the output values of the colored object relative to the stereo cameras. Repeat step 3 until we get good results: the average error reported by the code should be as small as possible, usually less than 0.3. The smaller the error, the better the output (3D data). Each step is clearly illustrated below:


2.1.1) Step1 (Loading OpenCV libraries):


1. Download the OpenCV source files from http://sourceforge.net/projects/opencvlibrary/. The downloaded package should contain the folders Bin, Src, Include and Lib. The downloaded OpenCV2.1.0 folder should be placed in the Program Files folder of the computer.

2. Go to Start menu -> Computer and right click on it -> click on Properties -> click on Advanced System Settings -> click on Environment Variables -> add the new path in the User Variables box as shown in Figure 2.1.




Figure 2.1: Setting up OpenCV path for Environmental Variables.
                                                                                     15

3. To get installation done we need to follow these steps shown below in figures and

   text.

   Go to Start -> All Programs -> Microsoft Visual Studio 2010 (Express) -> Microsoft

   Visual (Studio / C++ 2010 Express).

     File -> New -> Project

     Name: 'OpenCV_Helloworld', with selecting ‘Win32Console Application' click

      'OK' and click 'Finish'


     Go to project -> OpenCV_Helloworld Properties...Configuration Properties ->

      VC++ Directories

     Go to Executable Directories and click add: ‘C:\ProgramFiles\OpenCV2.1\bin;’

     Go to Include Directories and click add: ' C:\Program

      Files\OpenCV2.1\include\opencv;'

     Go to Library Directories and click add: ' C:\Program Files\OpenCV2.1\lib;'

     Go to Source Directories and click add and add five source files

      ‘C:\ProgramFiles\OpenCV2.1\src\cv;C:\ProgramFiles\OpenCV2.1\src\cvaux;C:\P

      rogramFiles\OpenCV2.1\src\cvaux\vs;C:\ProgramFiles\OpenCV2.1\src\cxcore;C:

      \ProgramFiles\OpenCV2.1\src\highgui;C:\ProgramFiles\OpenCV2.1\src\ml;C:\Pr

      ogramFiles\OpenCV2.1\src1;’

     Go to Linker -> Input -> Additional Dependencies and click add:

      ‘cvaux210d.lib;cv210d.lib;cxcore210d.lib;highgui210d.lib;ml210d.lib;’
                                                        16




  Figure 2.2: Creating new Win32 Console Application.




Figure 2.3: Loading libraries to Visual Studio 2010.
                                                                                       17




Figure 2.4: Loading Additional Dependencies to the project.


2.1.2) Step2 ( Preparing chessboard and seting up cameras):


Prepare a chessboard of 90 by 70 centimeters, sized for cameras which are 40 centimeters apart. The chessboard should have at least 9 boxes in the vertical direction and 6 boxes in the horizontal direction, and not fewer than these values. The reason is that the cameras would not be able to recognise the corners if the box size is too small, so we should make sure the boxes are as large as possible and the number of boxes is kept to the minimum for the above-mentioned board size. Figure 2.5 displays the chessboard used for this project.


        The cameras are placed approximately 40 centimeters apart. After calibration the cameras should not be moved, because the functions in the code establish the rotational and translational relation between the two cameras, and distance and focus are the major factors in that relation. The larger the distance between the cameras, the larger the distance at which we can track an object from the cameras. The cameras are fixed to a solid board at a constant separation so that we do not have to calibrate every time we need to track an object or get its 3D data. Figure 2.6 shows the camera setup.




Figure 2.5: Chessboard used for calibration of stereo cameras.
                                                                                  19




Figure 2.6: Two USB cameras are fixed to a solid board in front of 3D object.


2.1.3) Step3 (C++ program to capture calibration data):


The program below is used to check whether the OpenCV libraries are working well and to capture the sets of calibration image pairs.


#include "cv.h"

#include "cxmisc.h"

#include "highgui.h"

#include <vector>

#include <string>

#include <algorithm>
                                                                                    20

#include <stdio.h>

#include <ctype.h>

int main(int argc, char **argv)

 {

  //Initializing image arrays and capture arrays for frames, gray frames from two

 cameras.

  IplImage* frames[2];

  IplImage* framesGray[2];

  CvCapture* captures[2];

  for(int lr=0;lr<2;lr++)

     {

     captures[lr] = 0;

     frames[lr]=0;

     framesGray[lr] = 0;

     }

 int r =cvWaitKey(1000);//setting wait time to 1000 milliseconds before capturing

 chessboard images.

 captures[0] = cvCaptureFromCAM(0); //capture from first cam

 captures[1] = cvCaptureFromCAM(1); //capture from second cam

 int count=0;
                                                                                     21

while(1)// the loop grabs every captured frame; once count reaches the number of picture pairs needed for calibration, the loop exits.

 {

 frames[0] = cvQueryFrame(captures[0]);//getting frame

 frames[1] = cvQueryFrame(captures[1]);

 cvShowImage("RawImage1",frames[0]);//showing raw image

 cvShowImage("RawImage2",frames[1]);

 framesGray[0] = cvCreateImage(cvGetSize(frames[0]),8,1);//creating image for gray

 image conversion

 framesGray[1] = cvCreateImage(cvGetSize(frames[1]),8,1);

 cvCvtColor(frames[0], framesGray[0], CV_BGR2GRAY); //converting BGR image

 to gray scale image

 cvCvtColor(frames[1], framesGray[1], CV_BGR2GRAY);

 cvShowImage("frame1",framesGray[0]);//show converted gray image

 cvShowImage("frame2",framesGray[1]);

  printf("count=%d\n",count);

 if(count==0)

     {

     cvSaveImage("calibleft1.jpg",framesGray[0]); //saving gray image to drive

     cvSaveImage("calibright1.jpg",framesGray[1]);

     count++;
                                                                                        22

    //count is incremented to 1; on the next iteration the else if for count==1 runs, and the same process repeats for the remaining image pairs

    }

else if(count==1)

{

cvSaveImage("calibleft2.jpg",framesGray[0]);

cvSaveImage("calibright2.jpg",framesGray[1]);

count++;

}

else if(count==2)

{

cvSaveImage("calibleft3.jpg",framesGray[0]);

cvSaveImage("calibright3.jpg",framesGray[1]);

count++;

}

else if(count==3)

{

cvSaveImage("calibleft4.jpg",framesGray[0]);

cvSaveImage("calibright4.jpg",framesGray[1]);

count++;

}
                                                                                      23

else if(count==4)

{

cvSaveImage("calibleft5.jpg",framesGray[0]);

cvSaveImage("calibright5.jpg",framesGray[1]);

count++;

}

else if(count==5)

{

cvSaveImage("calibleft6.jpg",framesGray[0]);

cvSaveImage("calibright6.jpg",framesGray[1]);

count++;

}// if we need more image sets then we need to add more else if functions.

else if(count==6)
    {
     cvSaveImage("calibleft7.jpg",framesGray[0]);
     cvSaveImage("calibright7.jpg",framesGray[1]);
    count++;
    }
else
    break; // all calibration image pairs have been saved, so exit the while loop
int c = cvWaitKey(5000); // for every picture pair the program waits 5000 milliseconds so that we can change the position of the chessboard
}}
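The repeated else-if blocks above can also be written more compactly by formatting the pair index into the file name with sprintf. The fragment below is only a sketch of that alternative (it is not the code supplied on the disc) and assumes the same count and framesGray variables as in the listing above.

char nameL[64], nameR[64];
if (count < 7)                                   // 7 pairs, as in the listing above
{
    sprintf(nameL, "calibleft%d.jpg",  count + 1);
    sprintf(nameR, "calibright%d.jpg", count + 1);
    cvSaveImage(nameL, framesGray[0]);           // save the grayscale pair to disk
    cvSaveImage(nameR, framesGray[1]);
    count++;
}
else
    break;                                       // all pairs captured, leave while(1)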
                                                                                     24




Figure 2.7: Calibration data from camera 1 and camera 2.


All images are automatically saved into 'C:\SampleProgram\SampleProgram\SampleProgram\'. If the project name is 'SampleProgram' and the C++ file name is also 'SampleProgram', then all image files should be copied to 'C:\SampleProgram\SampleProgram\Debug\StereoData' so that the calibration code can read a text file which lists the locations of these images.


 The text file is shown below in a picture.




Figure 2.8: Text file that the calibration code reads from.


2.1.4) Step4 (Code for calibration and for 3D object tracking):


Stereo calibration code and functions are illustrated clearly in the next section of this

report. The outputs and execution is showed in video named StereoCalibOP.avi. The

source code, calibration data (images), and video is in the compact disc attached at the

end of the report.
                                                                                       26

                                          Chapter 3


         IMPORTANT OPENCV FUNCTIONS AND CODE FOR 3D VISION


3.1) Important functions used for this project:


3.1.1) cvFindChessboardCorners: This is an OpenCV built-in function used to find the inner corners of a chessboard. Calibration of the stereo cameras is done by locating the chessboard corners to sub-pixel accuracy. Detecting the chessboard corners in the camera images can be achieved with the function below.

int cvFindChessboardCorners (


const void* image, //image – Source chessboard view; it must be an 8-bit grayscale or

color image//


CvSize patternSize, //patternSize – The number of inner corners per chessboard row and

column ( patternSize = cvSize(columns, rows) )//


CvPoint2D32f* corners, // corners – The output array of corners detected//


int* cornerCount=NULL, // cornerCount – The output corner counter. If it is not NULL,

it stores the number of corners found//


int flags=CV_CALIB_CB_ADAPTIVE_THRESH)


//flags –Various operation flags, can be 0 or a combination of the following values:
                                                                                            27

CV_CALIB_CB_ADAPTIVE_THRESH - Adaptive threshold is to convert the image to

black and white, rather than a fixed threshold level (computed from the average image

brightness).


CV_CALIB_CB_NORMALIZE_IMAGE - Normalize the image gamma with function

cv EqualizeHist() before applying fixed or adaptive threshold value.


CV_CALIB_CB_FILTER_QUADS - Additional criteria (like contour area, perimeter,

square-like shape) to filter out false quads that are extracted at the contour retrieval stage.


The function attempts to determine whether the input image is a view of the chessboard

pattern and locate the internal chessboard corners. The function returns a non-zero value

if all of the corners have been found and they have been placed in a certain order (row by

row, left to right in every row), otherwise, if the function fails to find all the corners or

reorder them, it returns 0. For example, a regular chessboard has 9x7 squares has 8x6

internal corners, that is, points, where the black squares touch each other. The coordinates

detected are approximate, and to determine their position more accurately, we may use

the function cvFindCornerSubPix().


Note: the function requires some white space around the board to make the detection more robust in various environments (otherwise, if there is no border and the background is dark, the outer black squares cannot be segmented properly and so the square grouping and ordering algorithm will fail).//
                                                                                      28

3.1.2) cvDrawChessboardCorners: The function draws the individual chessboard corners

detected as red circles if the board was not found or as colored corners connected with

lines if the board was found.


cvDrawChessboardCorners(


CvArr* image, // image – The destination image; it must be an 8-bit color image//


CvSize patternSize, //Same as in function cvFindChessboardCorners()//


CvPoint2D32f* corners, //Same as in function cvFindChessboardCorners()//


int count, // The number of corners//


int patternWasFound, //Indicates whether the complete board was found or not. One may just pass the return value of the cvFindChessboardCorners() function.// )
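A short usage sketch of these two functions is shown below. It assumes that 'gray' is an 8-bit grayscale IplImage* of the current frame, 'color' is an 8-bit color copy of it, and nx, ny hold the numbers of inner corners; these names are placeholders, not variables from the project code.

std::vector<CvPoint2D32f> corners(nx * ny);      // one entry per inner corner
int cornerCount = 0;
int found = cvFindChessboardCorners(gray, cvSize(nx, ny), &corners[0], &cornerCount,
                                    CV_CALIB_CB_ADAPTIVE_THRESH |
                                    CV_CALIB_CB_NORMALIZE_IMAGE);
if (found)
{
    // refine the detected corners to sub-pixel accuracy
    cvFindCornerSubPix(gray, &corners[0], cornerCount, cvSize(11, 11), cvSize(-1, -1),
                       cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 30, 0.01));
}
// draw the detected corners on the color image for visual inspection
cvDrawChessboardCorners(color, cvSize(nx, ny), &corners[0], cornerCount, found);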


3.1.3) cvStereoCalibrate: Stereo calibration is the process of computing the geometrical

relationship between the two cameras in space. Stereo calibration depends on finding the

rotation matrix R and translation vector T between the two cameras.


cvStereoCalibrate(


constCvMat* objectPoints, // objectPoints, is an N-by-3 matrix containing the physical

coordinates of each of the K points on each of the M images of the 3D object such that N
                                                                                           29

= K × M. When using chessboards as the 3D object, these points are located in the

coordinate frame attached to the object (and usually choosing the Z-coordinate of the

points on the chessboard plane to be 0), but any known 3D points may be used. //

const CvMat* imagePoints1,

const CvMat* imagePoints2, // imagePoints1 and imagePoints2 are N-by-2 matrices containing the left and right pixel coordinates (respectively) of all of the object points. If you performed calibration using a chessboard for the two cameras, then imagePoints1 and imagePoints2 are just the respective returned values for the multiple calls to cvFindChessboardCorners() for the left and right camera views. //


const CvMat* pointCounts, // Integer 1xM or Mx1 vector (where M is the number of calibration pattern views) containing the number of points in each particular view. The sum of the vector elements must match the size of objectPoints and imagePoints. //


CvMat* cameraMatrix1,


CvMat* cameraMatrix2,


The input/output first and second camera matrices, each of the form

    cameraMatrix = [ fx  0  cx ]
                   [ 0  fy  cy ]
                   [ 0   0   1 ]

where fx and fy are the focal lengths of the camera, and cx and cy are the coordinates of the center of projection on the image plane.

CvMat* distCoeffs1,
                                                                                         30

CvMat* distCoeffs2,


//The input/output lens distortion coefficients for the first and second camera: 5x1 or 1x5 floating-point vectors (k1, k2, p1, p2, k3).//


CvSize imageSize, // Size of the image, used only to initialize intrinsic camera matrix.//


CvMat* R, //The output rotation matrix between the 1st and the 2nd cameras’ coordinate

systems.//


CvMat*T, //The output translation vector between the cameras’ coordinate systems.//


CvMat* E=0, //The optional output essential matrix.//


CvMat* F=0, //The optional output fundamental matrix.//


CvTermCriteria term_crit
= cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 1e-6), // The termination criteria for the iterative optimization algorithm.//

int flags=CV_CALIB_FIX_INTRINSIC) //Different flags, may be 0 or a combination of the following values:


CV_CALIB_FIX_INTRINSIC - If it is set, camera Matrix as well as distcoeffs are fixed,

so that only R, T, E and F are estimated.
                                                                                             31

CV_CALIB_USE_INTRINSIC_GUESS - The flag allows the function to optimize some

or all of the intrinsic parameters, depending on the other flags, but the initial values are

provided by us.


CV_CALIB_FIX_PRINCIPAL_POINT - The principal points are fixed during the

optimization.



CV_CALIB_FIX_FOCAL_LENGTH - fx and fy are fixed.

CV_CALIB_FIX_ASPECT_RATIO - fy is optimized, but the ratio fx/fy is fixed.

CV_CALIB_SAME_FOCAL_LENGTH - Enforces fx1 = fx2 and fy1 = fy2.


CV_CALIB_ZERO_TANGENT_DIST - Tangential distortion coefficients for each

camera are set to zeros and fixed there.


CV_CALIB_FIX_K1,           CV_CALIB_FIX_K2,           CV_CALIB_FIX_K3 -         Fixes        the

corresponding radial distortion coefficient (the coefficient must be passed to the function)


The function estimates the transformation between the two cameras making a stereo pair. For a stereo camera the relative position and orientation of the two cameras are fixed. If the pose of the calibration object relative to the first camera is (R1, T1) and relative to the second camera is (R2, T2), those poses relate to each other; that is, given (R1, T1) it should be possible to compute (R2, T2) - we only need to know the position and orientation of the 2nd camera relative to the 1st camera. That is what the function does. It computes (R, T) such that

    R2 = R * R1,    T2 = R * T1 + T

Optionally, it computes the essential matrix E:

    E = [  0   -T2   T1 ]
        [  T2   0   -T0 ] * R
        [ -T1   T0   0  ]

where T0, T1, T2 are the components of the translation vector T. The function can also compute the fundamental matrix F:

    F = inv(cameraMatrix2)^T * E * inv(cameraMatrix1)

Besides the stereo-related information, the function can also perform full calibration of

each of the 2 cameras. However, because of the high dimensionality of the parameter

space and noise in the input data the function can diverge from the correct solution.
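A usage sketch is given below. It follows the pattern of the pseudo code in Section 3.2: the CvMat headers wrap plain C arrays, and _objectPoints, _imagePoints1, _imagePoints2 and _npoints are assumed to have already been filled from the detected chessboard corners; the flag combination is only an example.

double M1[3][3], M2[3][3], D1[5], D2[5], R[3][3], T[3], E[3][3], F[3][3];
CvMat _M1 = cvMat(3, 3, CV_64F, M1), _M2 = cvMat(3, 3, CV_64F, M2);
CvMat _D1 = cvMat(1, 5, CV_64F, D1), _D2 = cvMat(1, 5, CV_64F, D2);
CvMat _R  = cvMat(3, 3, CV_64F, R),  _T  = cvMat(3, 1, CV_64F, T);
CvMat _E  = cvMat(3, 3, CV_64F, E),  _F  = cvMat(3, 3, CV_64F, F);

cvSetIdentity(&_M1);  cvSetIdentity(&_M2);       // initial guess for the intrinsics
cvZero(&_D1);         cvZero(&_D2);              // start with zero distortion

cvStereoCalibrate(&_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints,
                  &_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F,
                  cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 100, 1e-5),
                  CV_CALIB_FIX_ASPECT_RATIO + CV_CALIB_ZERO_TANGENT_DIST +
                  CV_CALIB_SAME_FOCAL_LENGTH);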


3.1.4) cvComputeCorrespondEpilines:

The OpenCV function cvComputeCorrespondEpilines() computes, for a list of points in

one image, the epipolar lines in the other image. For any given point in one image, there

is a different corresponding epipolar line in the other image. Each computed line is

encoded in the form of a vector of three points (a, b, c) such that the epipolar line is

defined by the equation: ax + by + c = 0


cvComputeCorrespondEpilines(


const CvMat* points, // The input points: a 2xN, Nx2, 3xN or Nx3 array (where N is the number of points). A multi-channel 1xN or Nx1 array is also acceptable//
                                                                                                33

int whichImage, // Index of the image (1 or 2) that contains the points.//


const CvMat* F, // The fundamental matrix that can be estimated using FindFundamentalMat or StereoRectify.//

CvMat* lines // The output epilines, a 3xN or Nx3 array. Each line ax + by + c = 0 is encoded by 3 numbers (a, b, c). //)


For points in one image of a stereo pair, the function computes the corresponding epilines in the other image. From the fundamental matrix definition, the line l(2)_i in the second image for the point p(1)_i in the first image (i.e. when whichImage = 1) is computed as

    l(2)_i = F * p(1)_i

and, vice versa, when whichImage = 2, l(1)_i is computed from p(2)_i as

    l(1)_i = F^T * p(2)_i

Line coefficients are defined up to a scale; they are normalized such that a^2 + b^2 = 1.
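A minimal usage sketch (placeholder names, not project code): given the fundamental matrix _F from cvStereoCalibrate() and an Nx2 CV_32FC1 matrix pts1 of points detected in the first image, the epipolar lines in the second image can be computed as follows.

CvMat* lines2 = cvCreateMat(N, 3, CV_32F);       // each row holds (a, b, c)
cvComputeCorrespondEpilines(pts1, 1, &_F, lines2);
// a point (x, y) in the second image lies on its epiline when a*x + b*y + c is near zero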


3.1.5) cvInitUndistortRectifyMap:

The function cvInitUndistortRectifyMap() outputs mapx and mapy. These maps indicate

from where we should interpolate source pixels for each pixel of the destination image;

the     maps   can    then   be    plugged     directly   into   cvRemap().         The    function

cvInitUndistortRectifyMap() is called separately for the left and the right cameras so that

we can obtain their distinct mapx and mapy remapping parameters. The function

cvRemap() may then be called, using the left and then the right maps each time we have

new left and right stereo images to rectify.
                                                                                                34




cvInitUndistortRectifyMap(


const CvMat* cameraMatrix,


const CvMat* distCoeffs,


const CvMat* R, // The optional rectification transformation in object space (3x3

matrix). R1 or R2, computed by StereoRectify can be passed here. If the matrix is NULL,

the identity transformation is assumed.//


const CvMat*newCameraMatrix,


CvArr* map1, // The first output map of type CV_32FC1 or CV_16SC2 - the second

variant is more efficient.//


CvArr* map2, // The second output map of type CV_32FC1 or CV_16UC1 - the second

variant is more efficient.//)


The function computes the joint un-distortion and rectification transformation and

represents the result in the form of maps for Remap. The undistorted image will look like

the original, as if it was captured with a camera with camera matrix = newCameraMatrix and zero distortion. In the case of a stereo camera, newCameraMatrix is normally set to P1 or P2 computed by StereoRectify. Also,
                                                                                         35

this new camera will be oriented differently in the coordinate space, according to R. That,

for example, helps to align two heads of a stereo camera so that the epipolar lines on both

images become horizontal and have the same y- coordinate (in the case of horizontally

aligned stereo camera).


The function actually builds the maps for the inverse mapping algorithm that is used by Remap. That is, for each pixel (u, v) in the destination (corrected and rectified) image the function computes the corresponding coordinates in the source image (i.e. in the original image from the camera). The process is the following:

    x <- (u - c'x) / f'x
    y <- (v - c'y) / f'y
    [X Y W]^T <- R^-1 * [x y 1]^T
    x' <- X / W,   y' <- Y / W
    x" <- x'*(1 + k1*r^2 + k2*r^4 + k3*r^6) + 2*p1*x'*y' + p2*(r^2 + 2*x'^2)
    y" <- y'*(1 + k1*r^2 + k2*r^4 + k3*r^6) + p1*(r^2 + 2*y'^2) + 2*p2*x'*y'
    mapx(u, v) <- x"*fx + cx
    mapy(u, v) <- y"*fy + cy

where r^2 = x'^2 + y'^2, (k1, k2, p1, p2, k3) are the distortion coefficients, (f'x, f'y, c'x, c'y) come from newCameraMatrix and (fx, fy, cx, cy) from cameraMatrix.


In the case of a stereo camera this function is called twice, once for each camera head,

after StereoRectify, which in its turn is called after StereoCalibrate. But if the stereo

camera was not calibrated, it is still possible to compute the rectification transformations

directly from the fundamental matrix using StereoRectifyUncalibrated. For each camera
                                                                                           36

the function computes homograph H as the rectification transformation in pixel domain,

not a rotation matrix R in 3D space. The R can be computed from H as


                                                     where the camera matrix can be chosen

arbitrarily.
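A sketch of how this function is used in the stereo case is given below; _R1, _R2, _P1 and _P2 are assumed to come from cvStereoRectify(), imageSize is the camera frame size, and the image variable names are placeholders.

IplImage* mx1 = cvCreateImage(imageSize, IPL_DEPTH_32F, 1);
IplImage* my1 = cvCreateImage(imageSize, IPL_DEPTH_32F, 1);
IplImage* mx2 = cvCreateImage(imageSize, IPL_DEPTH_32F, 1);
IplImage* my2 = cvCreateImage(imageSize, IPL_DEPTH_32F, 1);
cvInitUndistortRectifyMap(&_M1, &_D1, &_R1, &_P1, mx1, my1);   // left camera maps
cvInitUndistortRectifyMap(&_M2, &_D2, &_R2, &_P2, mx2, my2);   // right camera maps
// for every new frame pair, remap to obtain row-aligned (rectified) images
cvRemap(leftRaw,  leftRect,  mx1, my1);
cvRemap(rightRaw, rightRect, mx2, my2);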

3.1.6) cvStereoBMState* cvCreateStereoBMState(


int preset=CV_STEREO_BM_BASIC, // Any of the parameters can be overridden after

creating the structure.//


int numberOfDisparities=0// The number of disparities. If the parameter is 0, it is taken

from the preset; otherwise the supplied value overrides the one from preset.//)


#define CV_STEREO_BM_NARROW 2


#define CV_STEREO_BM_FISH_EYE 1


#define CV_STEREO_BM_BASIC 0


The function creates the stereo correspondence structure and initializes it. It is possible to

override       any   of     the   parameters    at    any    time    between      the    calls

to FindStereoCorrespondenceBM.


typedef struct CvStereoBMState {

//pre filters (normalize input images):
                                                                                          37

int preFilterType;

int preFilterSize;//for 5x5 up to 21x21

int preFilterCap;

//correspondence using Sum of Absolute Difference (SAD):

int SADWindowSize; // Could be 5x5,7x7, ..., 21x21

int minDisparity;

int numberOfDisparities;//Number of pixels to search

//post filters (knock out bad matches):

int textureThreshold; //minimum allowed

float uniquenessRatio;// Filter out if:

// [ match_val - min_match <uniqRatio*min_match ] over the corr window area

int speckleWindowSize;//Disparity variation window

int speckleRange;//Acceptable range of variation in window

// temporary buffers

CvMat* preFilteredImg0;

CvMat* preFilteredImg1;

CvMat* slidingSumBuf;

} CvStereoBMState;



The state structure is allocated and returned by the function cvCreateStereoBMState().

This function takes the parameter preset, which can be set to any one of the following.
                                                                                       38

CV_STEREO_BM_BASIC

Sets all parameters to their default values

CV_STEREO_BM_FISH_EYE

Sets parameters for dealing with wide-angle lenses

CV_STEREO_BM_NARROW

Sets parameters for stereo cameras with narrow field of view

This function also takes the optional parameter numberOfDisparities; if nonzero, it

overrides the default value from the preset. Here is the specification:

The state structure, CvStereoBMState{}, is released by calling
void cvReleaseStereoBMState(CvStereoBMState **BMState);

Any stereo correspondence parameters can be adjusted at any time between cvFindStereo

CorrespondenceBM calls by directly assigning new values of the state structure fields.

The correspondence function will take care of allocating/reallocating the internal buffers

as needed.

Finally, cvFindStereoCorrespondenceBM() takes in rectified image pairs and outputs a

disparity map given its state structure:

void cvFindStereoCorrespondenceBM(

const CvArr *leftImage,


const CvArr *rightImage,


CvArr *disparityResult,
                                                                                       39

CvStereoBMState *BMState

);
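A usage sketch for the block-matching functions is shown below; the parameter values are examples only, and leftRect and rightRect are assumed to be 8-bit single-channel rectified images of size imageSize.

CvStereoBMState* BMState = cvCreateStereoBMState(CV_STEREO_BM_BASIC, 0);
BMState->SADWindowSize       = 9;
BMState->numberOfDisparities = 64;          // must be a multiple of 16
BMState->minDisparity        = 0;
BMState->textureThreshold    = 10;
BMState->uniquenessRatio     = 15;
IplImage* disparity = cvCreateImage(imageSize, IPL_DEPTH_16S, 1);
cvFindStereoCorrespondenceBM(leftRect, rightRect, disparity, BMState);
cvReleaseStereoBMState(&BMState);           // free the state when done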


3.1.7) cvReprojectImageTo3D (


const CvArr* disparity, // The input single-channel 16-bit signed or 32-bit floating-point

disparity image.//


CvArr* _3dImage, // The output 3-channel floating-point image of the same size

as disparity. Each element of_3dImage(x, y, z) will contain the 3D coordinates of the

point (x, y), computed from the disparity map.//


const CvMat* Q, // The          perspective transformation matrix that can be obtained

with StereoRectify.//


int handleMissingValues=0 //If true, the pixels with the minimal disparity (which correspond to the outliers) are transformed to 3D points with some very large Z value (currently set to 10000).//

The function transforms a 1-channel disparity map into a 3-channel image representing a 3D surface. That is, for each pixel (x, y) and the corresponding disparity d = disparity(x, y) it computes:

    [X Y Z W]^T = Q * [x y d 1]^T
    _3dImage(x, y) = (X/W, Y/W, Z/W)

The matrix Q can be an arbitrary 4x4 matrix computed by StereoRectify.
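A sketch of this final step is shown below (placeholder names, not project code). The disparity image is assumed to come from cvFindStereoCorrespondenceBM, whose 16-bit output is a fixed-point value scaled by 16, so it is converted to float and divided by 16 before reprojection.

IplImage* disp32  = cvCreateImage(imageSize, IPL_DEPTH_32F, 1);
IplImage* image3D = cvCreateImage(imageSize, IPL_DEPTH_32F, 3);
cvConvertScale(disparity, disp32, 1.0/16.0);   // undo the x16 fixed-point scaling
cvReprojectImageTo3D(disp32, image3D, &_Q);
// image3D now holds (X, Y, Z) for every pixel in the left camera coordinate system,
// e.g. the coordinates of pixel (x, y):
CvScalar xyz = cvGet2D(image3D, y, x);         // xyz.val[0..2] = X, Y, Z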
                                                                                           40

3.2) Pseudo code for stereo calibration and 3D vision:

Source Code for this project is available in the compact disc attached at the end of the

report.

Include the header files which OpenCV provides for computer vision, and the standard 'C++' headers too. Below are the headers:

 #include <cv.h>

 #include <cxmisc.h>

 #include <highgui.h>

 #include <cvaux.h>

 #include <vector>

 #include <string>

 #include <algorithm>

 #include <stdio.h>

 #include <ctype.h>

using namespace std;

All the elements of the standard C++ library are declared within what is called a

namespace, the namespace with the name ‘std’. So in order to access its functionality we

declare with this expression that we will be using these entities. This line is very frequent

in C++ programs that use the standard library

Given a list of chessboard images, the number of inner corners (nx, ny) on the chessboards, and a flag useUncalibrated selecting calibrated (0) or uncalibrated (1: use the fundamental matrix from cvStereoCalibrate(), 2: compute the fundamental matrix separately) stereo, calibrate the cameras and display the rectified results along with the computed disparity images.



static void

Creating a function named StereoCalib which takes four parameters from the main program: the first is the name of the text file that has the list of image files for calibration; the second and third inputs are the numbers of chessboard inner corners in the X direction and Y direction respectively; the fourth input is an integer, useUncalibrated, which selects whether the uncalibrated path is used in this StereoCalib function (0: do not use it, 1: use it).

StereoCalib(const char* imageList, int nx, int ny, int useUncalibrated)

{

Initializing constant values and declaring the integers, vectors and float elements that are used by this program.

int displayCorners = 1; Setting 'displayCorners' to true; it is used in an if statement further ahead.

int showUndistorted = 1; Setting 'showUndistorted' to true; it is used in an if statement further ahead.

bool isVerticalStereo = false;

const int maxScale = 1;

const float squareSize = 1.f; //Set this to your actual square size.

The image list (the first input to this function) is opened for reading in text mode ("rt") using fopen: FILE* f = fopen(imageList, "rt");
                                                                                             42

Declaration of i, j, lr, nframes, n = nx*ny, N = 0 as integers;

Declaration of arrays of strings, CvPoints in 3D or 2D, chars.

vector<string> imageNames[2];

vector<CvPoint3D32f> objectPoints;

vector<CvPoint2D32f> points[2];

vector<int> npoints;

vector<uchar> active[2];

vector<CvPoint2D32f> temp(n);

CvSize is a structure used to hold an image size. It is initially declared as {0, 0} and the values are stored in 'imageSize'

CvSize imageSize = {0, 0};

Initializing arrays, vectors.

Creating multi dimensional arrays for creating CvMat

double M1[3][3], M2[3][3], D1[5], D2[5], Q[4][4];

double R[3][3], T[3], E[3][3], F[3][3];

Creating matrices using function CvMat for saving camera matrix, distortion, rotational,

translation, fundamental and projection matrices between camera 1 and 2.

CvMat _M1 = cvMat(3, 3, CV_64F, M1 ); CameraMatrix for camera 1

CvMat _M2 = cvMat(3, 3, CV_64F, M2 ); CameraMatrix for camera 2

CvMat _D1 = cvMat(1, 5, CV_64F, D1 ); Distortion coefficients for camera 1

CvMat _D2 = cvMat(1, 5, CV_64F, D2 ); Distortion coefficients for camera 2

CvMat _R = cvMat(3, 3, CV_64F, R ); Rotational matrix between cam 1 and 2

CvMat _T = cvMat(3, 1, CV_64F, T );Translational matrix between cam 1 and 2

CvMat _E = cvMat(3, 3, CV_64F, E ); Essential matrix between cam 1 and 2

CvMat _F = cvMat(3, 3, CV_64F, F ); Fundamental matrix between 1 and 2

CvMat _Q = cvMat(4, 4, CV_64F, Q); Projection matrix between 1 and 2



If ( displayCorners is true ) then

cvNamedWindow( "corners", 1 ); this function creates a new window named corners; the flag value 1 (CV_WINDOW_AUTOSIZE) sizes the window to fit the displayed image.

// READ IN THE LIST OF CHESSBOARDS:

if( !f ) if the file could not be opened (i.e., the fopen call above failed), the block below prints an error and returns.

{

fprintf(stderr, "can't open file %s\n", imageList ); displays an error message if the program could not open the image list.

return;

}

for(i=0;;i++)

{

An array that can store up to 1024 characters (one file name per line).

char buf[1024];

int count = 0, result=0;

lr = i % 2;

Creating a reference to the point vector of the current camera; lr (the remainder of i divided by 2) alternates between camera 0 and camera 1 on successive iterations.

vector<CvPoint2D32f>& pts = points[lr];

if( !fgets( buf, sizeof(buf)-3, f )) if no more file names can be read from the list, exit the loop.

break;

size_t len = strlen(buf);

while( len > 0 && isspace(buf[len-1]))

buf[--len] = '\0'; this loop trims trailing whitespace from the file name; the if statement below skips file names that begin with '#'.

if( buf[0] == '#')

continue;

IplImage* img = cvLoadImage( buf, 0 ); loads the file with openCV function

cvLoadImage.

if( !img ) if the file could not be loaded as an image, then

break;

imageSize = cvGetSize(img); getting the size of image by openCV function.

imageNames[lr].push_back(buf); the file name is stored for the corresponding camera, and the program moves to the next step of finding the chessboard corners in the images from camera 1 and camera 2.

//FIND CHESSBOARDS AND CORNERS THEREIN:

for( int s = 1; s <= maxScale; s++ )

{

IplImage* timg = img; initially the temporary image is just the loaded image.

if( s > 1 ) if the scale factor is greater than one, the image is enlarged before corner detection.

{

Creating a temporary image with the same properties as the original image, scaled by 's'.

timg = cvCreateImage(cvSize(img->width*s,img->height*s),

img->depth, img->nChannels );

resizing img into timg using the OpenCV function cvResize.

cvResize( img, timg, CV_INTER_CUBIC );

}

All operations are done on the temporary image instead of the originally loaded image; the return value of cvFindChessboardCorners is stored in 'result'.

result = cvFindChessboardCorners( timg, cvSize(nx, ny),

&temp[0], &count,

CV_CALIB_CB_ADAPTIVE_THRESH |

CV_CALIB_CB_NORMALIZE_IMAGE);

The above function takes temp image and number of corners ‘nx’ and ‘ny’.

if( timg != img )

then release temp image data as below.

cvReleaseImage( &timg );

if( result || s == maxScale )

for( j = 0; j < count; j++ )

{

The statements below divide the found corner coordinates by 's' to map them back to the original image scale and store them in temp[j].

temp[j].x /= s;

temp[j].y /= s;

}

if( result )

break;

}

Below loop displays the corners on the image.

if( displayCorners )

{

printf("%s\n", buf); displays file which the program is finding corners from.

Creating another temp image called cimg to display corners.

IplImage* cimg = cvCreateImage( imageSize, 8, 3 );

cvCvtColor( img, cimg, CV_GRAY2BGR ); converts the grayscale image to a color image for drawing.

cvDrawChessboardCorners( cimg, cvSize(nx, ny), &temp[0],

count, result ); Draw chessboard corners with colored circles around the found corners

with connecting lines between points.

cvShowImage( "corners", cimg ); displays image on the screen with corners on

chessboard.

cvReleaseImage( &cimg );

if( cvWaitKey(0) == 27 ) //Allow ESC to quit

exit(-1);

}

else

putchar('.');

N = pts.size();

pts.resize(N + n, cvPoint2D32f(0,0));

active[lr].push_back((uchar)result);

//assert( result != 0 );

if( result )

{

Calibration will suffer without sub-pixel interpolation, so the found corner locations are refined to sub-pixel accuracy.

cvFindCornerSubPix( img, &temp[0], count, cvSize(11, 11), cvSize(-1,-1),

cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 30, 0.01));

copy( temp.begin(), temp.end(), pts.begin() + N );

}

cvReleaseImage( &img );

}

fclose(f);

printf("\n");

// HARVEST CHESSBOARD 3D OBJECT POINT LIST:

nframes = active[0].size();//Number of good chessboards found

objectPoints.resize(nframes*n);

for( i = 0; i < ny; i++ )

for( j = 0; j < nx; j++ )

objectPoints[i*nx + j] = cvPoint3D32f(i*squareSize, j*squareSize, 0);

for( i = 1; i < nframes; i++ )

copy( objectPoints.begin(), objectPoints.begin() + n, objectPoints.begin() + i*n );
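As a worked example of this layout (using the nx = 5, ny = 7 and squareSize = 1 values passed from main at the end of the program), the first board's 35 object points are stored row by row as

(0,0,0), (0,1,0), (0,2,0), (0,3,0), (0,4,0),
(1,0,0), (1,1,0), ... , (6,4,0),

and the copy loop above repeats this same 35-point pattern once for every good chessboard frame.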

npoints.resize(nframes,n);

N = nframes*n;

CvMat _objectPoints = cvMat(1, N, CV_32FC3, &objectPoints[0] );

CvMat _imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] );

CvMat _imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] );

CvMat _npoints = cvMat(1, npoints.size(), CV_32S, &npoints[0] );

cvSetIdentity(&_M1);

cvSetIdentity(&_M2);

cvZero(&_D1);

cvZero(&_D2);



// CALIBRATE THE STEREO CAMERAS

printf("Running stereo calibration ...");

fflush(stdout);

The object points and image points from the above section are needed for stereo camera

calibration.

cvStereoCalibrate( &_objectPoints, &_imagePoints1, &_imagePoints2, &_npoints,

&_M1, &_D1, &_M2, &_D2, imageSize, &_R, &_T, &_E, &_F,

cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, 1e-5),

CV_CALIB_FIX_ASPECT_RATIO +

CV_CALIB_ZERO_TANGENT_DIST +

CV_CALIB_SAME_FOCAL_LENGTH);

printf(" done\n");

CALIBRATION QUALITY CHECK

Because the output fundamental matrix implicitly includes all the output information, we

can check the quality of calibration using the epipolar geometry constraint:

m2^t*F*m1=0

vector<CvPoint3D32f> lines[2]; declaring two vectors to hold the epipolar line coefficients (a, b, c) for each camera.

points[0].resize(N); points resize for every image for two cameras.

points[1].resize(N);

_imagePoints1 = cvMat(1, N, CV_32FC2, &points[0][0] ); pixel points of first camera

_imagePoints2 = cvMat(1, N, CV_32FC2, &points[1][0] ); pixel points of second camera

lines[0].resize(N); creating a line for every corner point in the image from camera 1

lines[1].resize(N); creating a line for every corner point in the image from camera 2

CvMat _L1 = cvMat(1, N, CV_32FC3, &lines[0][0]);

CvMat _L2 = cvMat(1, N, CV_32FC3, &lines[1][0]);

Always work in undistorted space

cvUndistortPoints( &_imagePoints1, &_imagePoints1, &_M1, &_D1, 0, &_M1 );

cvUndistortPoints( &_imagePoints2, &_imagePoints2, &_M2, &_D2, 0, &_M2 );

cvUndistortPoints mathematically removes the lens distortion from the points; rectification, done later, mathematically aligns the images with respect to each other.

cvComputeCorrespondEpilines( &_imagePoints1, 1, &_F, &_L1 );

cvComputeCorrespondEpilines( &_imagePoints2, 2, &_F, &_L2 );

double avgErr = 0;

The error measure (avgErr) is the sum of the distances of the points from their corresponding epipolar lines, measured to sub-pixel accuracy.
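Since cvComputeCorrespondEpilines returns each epipolar line as coefficients (a, b, c) normalized so that a^2 + b^2 = 1, the distance from a point (x, y) to its line is simply |a*x + b*y + c|. The per-point error accumulated in the loop below is therefore

err_i = |a2*x1 + b2*y1 + c2| + |a1*x2 + b1*y2 + c1|,

where (x1, y1) is the point in the first image, (a2, b2, c2) its matching epipolar line in the second image, and vice versa.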

for( i = 0; i < N; i++ )

{

double err = fabs(points[0][i].x*lines[1][i].x + points[0][i].y*lines[1][i].y + lines[1][i].z)

+ fabs(points[1][i].x*lines[0][i].x + points[1][i].y*lines[0][i].y + lines[0][i].z);

avgErr += err;

}

printf( "avg err = %g\n", avgErr/(nframes*n) );



COMPUTE AND DISPLAY RECTIFICATION

if( showUndistorted ) this block runs because showUndistorted was set to 1 above.

{

Initializing matrices for rectification purposes.

CvMat* mx1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );

CvMat* my1 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );

CvMat* mx2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );

CvMat* my2 = cvCreateMat( imageSize.height,imageSize.width, CV_32F );

CvMat* img1r = cvCreateMat( imageSize.height,imageSize.width, CV_8U );

CvMat* img2r = cvCreateMat( imageSize.height,imageSize.width, CV_8U );

IplImage* disp= cvCreateImage( imageSize, IPL_DEPTH_16S, 1 );

CvMat* vdisp = cvCreateMat( imageSize.height,imageSize.width, CV_8U );

CvMat* pair;

double R1[3][3], R2[3][3], P1[3][4], P2[3][4];

CvMat _R1 = cvMat(3, 3, CV_64F, R1);

CvMat _R2 = cvMat(3, 3, CV_64F, R2);

int c=0;

IF CALIBRATED (BOUGUET'S METHOD)

if( useUncalibrated == 0 )

{

CvMat _P1 = cvMat(3, 4, CV_64F, P1);

CvMat _P2 = cvMat(3, 4, CV_64F, P2);

cvStereoRectify( &_M1, &_M2, &_D1, &_D2, imageSize,&_R, &_T,&_R1, &_R2,

&_P1, &_P2, &_Q,0/*CV_CALIB_ZER O_DISPARITY*/ );

Return parameters are Rl and Rr, the rectification rotations for the left and right image planes. Similarly, we get back the 3-by-4 left and right projection matrices Pl and Pr. An optional return parameter is Q, the 4-by-4 reprojection matrix used by cvReprojectImageTo3D to get the 3D coordinates of the object.
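For reference, the reprojection matrix produced by cvStereoRectify for horizontal stereo has the form

Q = [ 1   0   0    -cx
      0   1   0    -cy
      0   0   0     f
      0   0  -1/Tx  (cx - cx')/Tx ]

where (cx, cy) is the principal point of the left rectified camera, f the rectified focal length, Tx the horizontal baseline between the cameras, and cx' the principal point x-coordinate of the right camera (equal to cx when CV_CALIB_ZERO_DISPARITY is set).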

isVerticalStereo = fabs(P2[1][3]) > fabs(P2[0][3]);

Precompute maps for cvRemap()

cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_P1,mx1,my1);

cvInitUndistortRectifyMap(&_M2,&_D2,&_R2,&_P2,mx2,my2);

The mx and my maps for each camera produced by cvInitUndistortRectifyMap are lookup tables of source pixel locations, used by cvRemap to produce row-aligned left and right images.

}

OR ELSE HARTLEY'S METHOD

else if( useUncalibrated == 1 || useUncalibrated == 2 )

Use intrinsic parameters of each camera, but compute the rectification transformation

directly from the fundamental matrix.

{

double H1[3][3], H2[3][3], iM[3][3];

CvMat _H1 = cvMat(3, 3, CV_64F, H1);

CvMat _H2 = cvMat(3, 3, CV_64F, H2);

CvMat _iM = cvMat(3, 3, CV_64F, iM);

Just to show that an independently computed fundamental matrix F can also be used:

if( useUncalibrated == 2 )

The fundamental matrix F is just like the essential matrix E, except that

F operates in image pixel coordinates whereas E operates in physical coordinates. The

fundamental matrix has seven parameters, two for each epipole and three for the

homography that relates the two image planes.
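For reference, the two matrices are related through the camera matrices: if M1 and M2 are the intrinsic matrices of the two cameras, then

F = M2^(-T) * E * M1^(-1),   with  p2^T * E * p1 = 0  and  m2^T * F * m1 = 0,

where p1, p2 are corresponding points in physical (normalized camera) coordinates and m1, m2 are the same points in pixel coordinates.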

cvFindFundamentalMat( &_imagePoints1,&_imagePoints2, &_F);

The function cvStereoRectifyUncalibrated computes the rectification transformations

without knowing intrinsic parameters of the cameras and their relative position in space,

hence the suffix "Uncalibrated". Another related difference from cvStereoRectify is that

the function outputs not the rectification transformations in the object (3D) space, but the

planar perspective transformations, encoded by the homography matrices H1 and H2.

cvStereoRectifyUncalibrated( &_imagePoints1,&_imagePoints2, &_F,imageSize,&_H1,

&_H2,3);

cvInvert(&_M1, &_iM);

cvMatMul(&_H1, &_M1, &_R1);

cvMatMul(&_iM, &_R1, &_R1);

cvInvert(&_M2, &_iM);

cvMatMul(&_H2, &_M2, &_R2);

cvMatMul(&_iM, &_R2, &_R2);
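Written out, the three calls above for each camera amount to

R1 = M1^(-1) * H1 * M1   and   R2 = M2^(-1) * H2 * M2,

which converts the pixel-space homographies H1 and H2 into rectification transforms expressed in each camera's own coordinates, so that they can be fed to cvInitUndistortRectifyMap just like the rotations returned by cvStereoRectify.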

Precompute map for cvRemap()

cvInitUndistortRectifyMap(&_M1,&_D1,&_R1,&_M1,mx1,my1);

cvInitUndistortRectifyMap(&_M2,&_D2,&_R2,&_M2,mx2,my2);

}

else

assert(0);

cvNamedWindow( "rectified", 1 );



RECTIFY THE IMAGES AND FIND DISPARITY MAPS

if( !isVerticalStereo )

Creating a single display image that holds both rectified views: for horizontal stereo it is twice the original width, and in the else branch below (vertical stereo) it is twice the original height, so the left and right rectified images can be seen side by side in one image.

pair = cvCreateMat( imageSize.height, imageSize.width*2, CV_8UC3 );

else

pair = cvCreateMat( imageSize.height*2, imageSize.width, CV_8UC3 );

Setup for finding stereo correspondences

Correspondence is computed with a sliding sum-of-absolute-differences (SAD) window; typical window sizes range from 5-by-5 and 7-by-7 up to 21-by-21. For each feature centered in the left image, the corresponding row in the right image is searched for the best match.
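For rectified, row-aligned cameras the disparity d = xl - xr of a matched feature relates to its depth by

Z = f * T / d,

where f is the focal length in pixels and T the baseline between the cameras, so larger disparities correspond to closer objects and each disparity value defines a plane of constant depth.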



CvStereoBMState *BMState = cvCreateStereoBMState();

assert(BMState != 0);



BMState->preFilterSize=7; In the pre-filtering step, the input images are normalized to

reduce lighting differences and to enhance image texture.

BMState->preFilterCap=30;

BMState->SADWindowSize=5; Correspondence is computed by a sliding SAD window

BMState->minDisparity=0; minDisparity is where the matching search should start.

CvScalar p;

if(c>0)

{

if(p.val[2]>-100.00)

{

BMState->numberOfDisparities=256;

The disparity search is then carried out over ‘numberOfDisparities’ counted in pixels.

printf("nd%d\n",BMState->numberOfDisparities);

}

else

{

BMState->numberOfDisparities=128;

printf("nd%d\n",BMState->numberOfDisparities);

}

}

else

{

BMState->numberOfDisparities=256;

}

Each disparity limit defines a plane at a fixed depth from the cameras.

BMState->textureThreshold=10; the texture threshold rejects regions with too little texture to match reliably.

BMState->uniquenessRatio=15;

BMState->speckleWindowSize=21;

BMState->speckleRange=4;

c++;

CvCapture *capture1 = 0; Initializing capture for camera parameters.

CvCapture *capture2 = 0;

IplImage *imageBGR1 = 0; Initializing frame images from capture and setting initially

to null.

IplImage *imageBGR2 = 0;

int    key = 0;

capture1 = cvCaptureFromCAM( 0 ); cvCaptureFromCAM opens the first and second cameras (device indices 0 and 1).

capture2 = cvCaptureFromCAM( 1 );

Checking if capture is happening.

if ( !capture1 ) {

fprintf( stderr, "Cannot initialize webcam!\n" );



} Create a window for the video

cvNamedWindow( "result", CV_WINDOW_AUTOSIZE );

while( key != 'q' ) this loop is used to get frames from the capture continuously

{ get a frame

imageBGR1 = cvQueryFrame( capture1 );

imageBGR2 = cvQueryFrame( capture2 );

Check

if( !imageBGR1 ) break; if no frame was grabbed, break out of the loop.

Create some GUI windows for output display.

cvShowImage("Input Image1", imageBGR1);

cvShowImage("Input Image2", imageBGR2);

IplImage* imageHSV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); Full HSV color

image.

cvCvtColor(imageBGR1, imageHSV1, CV_BGR2HSV); converting the image from BGR to HSV, which makes it easy to separate the color planes.

cvShowImage("s1",imageHSV1);

initializing planes

IplImage* planeH1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.

IplImage* planeS1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation

component.

IplImage* planeV1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness

component.

cvCvtPixToPlane(imageHSV1, planeH1, planeS1, planeV1, 0); splitting the HSV image of the current frame into its three planes.

IplImage* planeH11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.

IplImage* planeS11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation

component.

IplImage* planeV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness

component.

Converting pixel to plane

cvCvtPixToPlane(imageHSV1, planeH11, planeS11, planeV11, 0); Extracting the 3 color

components.

Setting up saturation and brightness to maximum in order to separate Hue plane from

others.

cvSet(planeS11, CV_RGB(255,255,255));

cvSet(planeV11, CV_RGB(255,255,255));

IplImage* imageHSV11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3);

Full HSV color image.

IplImage* imageBGR11 = cvCreateImage( cvGetSize(imageBGR1), 8, 3);

Full RGB color image.

cvCvtPlaneToPix( planeH11, planeS11, planeV11, 0, imageHSV11 );

Combine separate color components into one.

cvCvtColor(imageHSV11, imageBGR11, CV_HSV2BGR);

Convert from HSV back to BGR.

cvReleaseImage(&planeH11);cvReleaseImage(&planeS11);

cvReleaseImage(&planeV11);

cvReleaseImage(&imageHSV11); cvReleaseImage(&imageBGR11);

Doing the same thing for the other color components as was done above with the Hue component.

IplImage* planeH21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.

IplImage* planeS21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation

component.

IplImage* planeV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 1);Brightness

component.

cvCvtPixToPlane(imageHSV1, planeH21, planeS21, planeV21, 0); //Extract the 3 color

components.

cvSet(planeS21, CV_RGB(255,255,255));

//cvSet(planeV21, CV_RGB(255,255,255));

IplImage* imageHSV21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); HSV color

image.

IplImage* imageBGR21 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); RGB color

image.

cvCvtPlaneToPix( planeH21, planeS21,planeV21, 0, imageHSV21 ); combine the

separate color components.

cvCvtColor(imageHSV21, imageBGR21, CV_HSV2BGR); Convert from HSV back to BGR.

cvReleaseImage(&planeH21);cvReleaseImage(&planeS21);cvReleaseImage(&planeV21

);

cvReleaseImage(&imageHSV21);cvReleaseImage(&imageBGR21);

IplImage* planeH31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Hue component.

IplImage* planeS31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Saturation

component.

IplImage* planeV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); Brightness

component.

cvCvtPixToPlane(imageHSV1, planeH31, planeS31, planeV31, 0); Extract the 3 color

components.

//cvSet(planeS31, CV_RGB(255,255,255));

cvSet(planeV31, CV_RGB(255,255,255));

IplImage* imageHSV31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // Full HSV

color image.

IplImage* imageBGR31 = cvCreateImage( cvGetSize(imageBGR1), 8, 3); // Full RGB

color image.

cvCvtPlaneToPix( planeH31, planeS31, planeV31, 0, imageHSV31 ); Combine separate

color components into one.

cvCvtColor(imageHSV31, imageBGR31, CV_HSV2BGR); Convert from HSV back to BGR.

cvReleaseImage(&planeH31);cvReleaseImage(&planeS31);cvReleaseImage(&planeV31

);

cvReleaseImage(&imageHSV31);cvReleaseImage(&imageBGR31);

cvThreshold(planeH1, planeH1, 170, UCHAR_MAX, CV_THRESH_BINARY);

cvThreshold(planeS1, planeS1, 171, UCHAR_MAX, CV_THRESH_BINARY);

cvThreshold(planeV1, planeV1, 136, UCHAR_MAX, CV_THRESH_BINARY);

// Show the thresholded HSV channels

IplImage* img1 = cvCreateImage( cvGetSize(imageBGR1), 8, 1); // Greyscale output

image.

cvAnd(planeH1, planeS1, img1);imageColor = H {BITWISE_AND} S.

cvAnd(img1, planeV1, img1);

imageColor = H {BITWISE_AND} S {BITWISE_AND} V.
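As a hedged aside (not used in the project code), the same kind of binary color mask can be built in a single call with cvInRangeS; the threshold values below simply mirror the illustrative values used in the cvThreshold calls above:

// Alternative sketch: one-call HSV range mask (threshold values are illustrative only).
IplImage* mask1 = cvCreateImage(cvGetSize(imageBGR1), 8, 1);
cvInRangeS(imageHSV1,
           cvScalar(170, 171, 136, 0),   // lower H, S, V bounds
           cvScalar(256, 256, 256, 0),   // upper bounds (exclusive)
           mask1);                        // mask1 is 255 where all three channels are in range
cvReleaseImage(&mask1);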

Show the output image on the screen.

cvNamedWindow("color Pixels1", CV_WINDOW_AUTOSIZE);

cvShowImage("color Pixels1", img1); the color mask from camera 1 is displayed, and the same steps above are repeated for the second camera.

IplImage* imageHSV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV

color image.

cvCvtColor(imageBGR2, imageHSV2, CV_BGR2HSV);

//cvShowImage("s",imageHSV2);

IplImage* planeH2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Hue component.

IplImage* planeS2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Saturation

component.

IplImage* planeV2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);// Brightness

component.

cvCvtPixToPlane(imageHSV2, planeH2, planeS2, planeV2, 0);

splitting the HSV image of the current frame into its three planes.

IplImage* planeH12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.

IplImage* planeS12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Saturation

component.

IplImage* planeV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Brightness

component.

cvCvtPixToPlane(imageHSV2, planeH12, planeS12, planeV12, 0); Extract the 3 color

components.

cvSet(planeS12, CV_RGB(255,255,255));cvSet(planeV12, CV_RGB(255,255,255));

IplImage* imageHSV12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV

color image.

IplImage* imageBGR12 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB

color image.

cvCvtPlaneToPix( planeH12, planeS12, planeV12, 0, imageHSV12 );// Combine separate

color components into one.

cvCvtColor(imageHSV12, imageBGR12, CV_HSV2BGR);// Convert from HSV back to BGR.

cvReleaseImage(&planeH12);cvReleaseImage(&planeS12);cvReleaseImage(&planeV12

);

cvReleaseImage(&imageHSV12);cvReleaseImage(&imageBGR12);

IplImage* planeH22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.

IplImage* planeS22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);Saturation

component.

IplImage* planeV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 1);Brightness

component.

cvCvtPixToPlane(imageHSV2, planeH22, planeS22, planeV22, 0); Extract the 3 color

components.

cvSet(planeS22, CV_RGB(255,255,255));//cvSet(planeV2, CV_RGB(255,255,255));

IplImage* imageHSV22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV

color image.

IplImage* imageBGR22 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB

color image.

cvCvtPlaneToPix( planeH22, planeS22, planeV22, 0, imageHSV22 );// Combine the separate color components into one.

cvCvtColor(imageHSV22, imageBGR22, CV_HSV2BGR);// Convert from HSV back to BGR.

cvReleaseImage(&planeH22);cvReleaseImage(&planeS22);cvReleaseImage(&planeV22

);

cvReleaseImage(&imageHSV22);cvReleaseImage(&imageBGR22);

IplImage* planeH32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Hue component.

IplImage* planeS32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Saturation

component.

IplImage* planeV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); Brightness

component.

cvCvtPixToPlane(imageHSV2, planeH32, planeS32, planeV32, 0); // Extract the 3 color

components.

//cvSet(planeS3, CV_RGB(255,255,255));cvSet(planeV32, CV_RGB(255,255,255));

IplImage* imageHSV32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full HSV

color image.

IplImage* imageBGR32 = cvCreateImage( cvGetSize(imageBGR2), 8, 3); // Full RGB

color image.

cvCvtPlaneToPix( planeH32, planeS32, planeV32, 0, imageHSV32 ); Combine separate

color components into one.

cvCvtColor(imageHSV32, imageBGR32, CV_HSV2BGR); Convert from HSV back to BGR.

cvReleaseImage(&planeH32);cvReleaseImage(&planeS32);cvReleaseImage(&planeV32

);cvReleaseImage(&imageHSV32);cvReleaseImage(&imageBGR32);



cvThreshold(planeH2, planeH2, 170, UCHAR_MAX, CV_THRESH_BINARY);

cvThreshold(planeS2, planeS2, 171, UCHAR_MAX, CV_THRESH_BINARY);

cvThreshold(planeV2, planeV2, 136, UCHAR_MAX, CV_THRESH_BINARY);

// Show the thresholded HSV channels

IplImage* img2 = cvCreateImage( cvGetSize(imageBGR2), 8, 1); // Greyscale output

image.

cvAnd(planeH2, planeS2, img2);// imagecolor = H {BITWISE_AND} S.

cvAnd(img2, planeV2, img2);// imagecolor = H {BITWISE_AND} S {BITWISE_AND}

V.

// Show the output image on the screen.

//cvNamedWindow("color Pixels2", CV_WINDOW_AUTOSIZE);

//cvShowImage("color Pixels2", img2);

These color-pixel images are inputted to cvRemap to rectify both color images.

if( img1 && img2 ) if both color masks exist, then

{

CvMat part;

cvRemap( img1, img1r, mx1, my1 );

cvRemap( img2, img2r, mx2, my2 );

if( !isVerticalStereo || useUncalibrated != 0 )

{

// When the stereo camera is oriented vertically then useUncalibrated==0 does not

transpose the image, so the epipolar lines in the rectified images are vertical. Stereo

correspondence function does not support such a case.

Applying cvFindStereoCorrespondenceBM to the rectified images to get the disparity map.

cvFindStereoCorrespondenceBM( img1r, img2r, disp, BMState);

//cvSave("disp.txt", disp); optionally saves the disparity map to a text file (commented out).

IplImage* real_disparity= cvCreateImage( imageSize, IPL_DEPTH_8U, 1 );

cvConvertScale( disp, real_disparity, 1.0/16, 0 ); cvNamedWindow( "disparity" );

cvShowImage( "disparity",real_disparity );

if( useUncalibrated == 0 )//Using Bouguet, we can calculate the depth

{

Calculating depth with the Bouguet method using the function 'cvReprojectImageTo3D'; its input is the disparity calculated in the step above.

IplImage* depth = cvCreateImage( imageSize, IPL_DEPTH_32F, 3 );

cvReprojectImageTo3D(real_disparity , depth, &_Q);

Steps to display the depth of the colored object (X, Y, Z coordinates from the cameras) in the console window:

The below steps are just finding out minimum, maximum and average distance of colored

pixel from the camera.

int l=0;

float r[10000];float o[10000];float m[10000]; creating arrays for storing pixel values

which are extracted from depth image.

for(int i=0;i<imageSize.height;i++){

for (int j=0;j<imageSize.width;j++)

{

CvScalar s;

p=cvGet2D(depth,i,j); // get the (i,j) pixel value

s=cvGet2D(real_disparity,i,j); reading the disparity at (i,j) from the real_disparity image; only pixels with a non-zero disparity (those belonging to the colored object) are kept by the if statement below.

if(s.val[0]!=0)

{

//printf("X=%f, Y=%f, Z=%f\n",p.val[0],p.val[1],p.val[2]);

r[l]=p.val[2];

o[l]=p.val[0];

m[l]=p.val[1];

//printf("%f",r[l]);

//printf("------------next->>>>>>>>");

l++;

}

//if(l==1)

//break;

}

}

float minr =r[0];float mino =o[0];float minm =m[0];

float maxr= r[0];float maxo= o[0];float maxm= m[0];

float sumr=0;float sumo=0;float summ=0;

float avgr; float avgo; float avgm;

Calculating the sum of all X, Y and Z values of the colored pixels.

for(int pl=0;pl<l;pl++)

{

sumr=sumr+r[pl];sumo=sumo+o[pl];summ=summ+m[pl];

Calculating the minimum and maximum of the X, Y and Z values of the colored pixels, just like finding the min, max and average of the elements of an array.

if (r[pl] < minr)

{

minr = r[pl];

}

if (r[pl] > maxr)

{

maxr = r[pl];

}

if (o[pl] < mino)

{

mino = o[pl];

}

Calculating maximum;

if (o[pl] > maxo)

{

maxo = o[pl];

}

if (m[pl] < minm)

{

minm = m[pl];

}

if (m[pl] > maxm)

{

maxm = m[pl];

}}

avgr=(float)sumr/l;

avgo=(float)sumo/l;

avgm=(float)summ/l;

Outputting value to console window

printf("MAX Z=%f\nMIN Z=%f\n",minr,maxr);

printf("AVG Z=%f\n",avgr);

printf("MAX X=%f\nMIN X=%f\n",mino,maxo);

printf("AVG X=%f\n",avgo);

printf("MAX Y=%f\nMIN Y=%f\n",minm,maxm);

printf("AVG Y=%f\n",avgm);

printf("X=%f, Y=%f, Z=%f\n",avgo,avgm,avgr);

printf("----------\n");
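The same minimum, maximum and average statistics could also be accumulated on the fly, without the intermediate r, o and m arrays; a minimal sketch (not the project code, and assuming <float.h> is included for DBL_MAX) is:

// Streaming accumulation of the colored-object statistics (illustrative only).
int nPix = 0;
double sx = 0, sy = 0, sz = 0;
double zMin = DBL_MAX, zMax = -DBL_MAX;
for (int ii = 0; ii < imageSize.height; ii++)
    for (int jj = 0; jj < imageSize.width; jj++)
    {
        CvScalar d3 = cvGet2D(depth, ii, jj);           // (X, Y, Z) of this pixel
        CvScalar ds = cvGet2D(real_disparity, ii, jj);  // its disparity value
        if (ds.val[0] == 0) continue;                   // skip pixels with no match
        sx += d3.val[0]; sy += d3.val[1]; sz += d3.val[2];
        if (d3.val[2] < zMin) zMin = d3.val[2];
        if (d3.val[2] > zMax) zMax = d3.val[2];
        nPix++;
    }
if (nPix > 0)
    printf("X=%f, Y=%f, Z=%f\n", sx / nPix, sy / nPix, sz / nPix);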

}

The part below (commented out) is used only to draw green lines across the pair image so that we can check whether the left and right images are properly rectified. /* if( !isVerticalStereo )

{

cvGetCols( pair, &part, 0, imageSize.width ); Copy elements from multiple adjacent

columns of an array

cvCvtColor( img1r, &part, CV_GRAY2BGR );

cvGetCols( pair, &part, imageSize.width, imageSize.width*2 );

cvCvtColor( img2r, &part, CV_GRAY2BGR );

for( j = 0; j < imageSize.height; j += 16 )

cvLine( pair, cvPoint(0,j),

cvPoint(imageSize.width*2,j),

CV_RGB(0,255,0));

}

else

{

cvGetRows( pair, &part, 0, imageSize.height );

cvCvtColor( img1r, &part, CV_GRAY2BGR );

cvGetRows( pair, &part, imageSize.height,

imageSize.height*2);

cvCvtColor( img2r, &part, CV_GRAY2BGR );

for( j = 0; j < imageSize.width; j += 16 )

cvLine( pair, cvPoint(j,0),

cvPoint(j,imageSize.height*2),

CV_RGB(0,255,0));

}

//cvShowImage( "rectified", pair );*/

key = cvWaitKey(1);

}

cvReleaseImage( &img1 );

cvReleaseImage( &img2 );

}

cvReleaseStereoBMState(&BMState);

cvReleaseMat( &mx1 );

cvReleaseMat( &my1 );

cvReleaseMat( &mx2 );

cvReleaseMat( &my2 );

cvReleaseMat( &img1r );

cvReleaseMat( &img2r );

cvReleaseImage( &disp );

}

}

int main(void) // main function

{

StereoCalib("1.txt", 5, 7, 0);

return 0;}

                                         Chapter 4

                   RESULTS OF CALIBRATION AND 3D VISION

4.1) Project Application information:


The major purpose of this project is to design a sensor that can detect and track the player's bat for a table tennis shooter. The sensor can be linked to a robotic system that throws the ball to the player's bat for training purposes. The sensor is designed to detect a colored object (the player's bat) at a range of 2 to 3 meters. The sensing elements are stereo cameras placed at a convenient location for tracking; the outputs are the (X, Y, Z) coordinates of the colored object expressed in the coordinate system of one of the cameras. These coordinates can be inputted into the robotic

system (shooter) so that the shooter can move its joints to throw the ball for different kinds of training. The shooter basically has two training modes. 1. Random mode: the shooter can throw anywhere on the table, for a later stage of training. 2. Spot mode: the shooter throws the ball onto the table so that, after bouncing, it hits the player's bat exactly; for spot mode the shooter needs the bat coordinates from the stereo cameras (sensors). Intelligence and machine learning techniques could be added to the shooter once the coordinates of the object are available.


For this project we used two USB cameras as the sensor. The cameras are placed approximately 50 cm apart to see the object 2 to 3 meters away from the sensors. Before the cameras observe the 3D object, they need to be calibrated so that the images from the two cameras are vertically aligned.


There are a few constraints that need to be followed every time we track the object. The constraints are as follows:


   1. The object being tracked should always be clearly seen by both cameras; if not, the
      code stops outputting the coordinates of the object and resumes once the cameras
      see the object again.

   2. The camera focus and the distance between the cameras are arranged so that the
      player's playing area is clearly seen by both cameras. The player's bat should be
      within the image plane of both cameras.

   3. The cameras are not moved or refocused after calibration; if they are moved, we
      need to calibrate once again with new calibration data.

   4. For tracking an object at a larger distance, the camera separation, angle, and
      focus need to be adjusted accordingly. The larger the distance between the
      cameras, the farther the object that can be tracked. The focus can be adjusted
      according to the size of the object being tracked. The distance between the
      cameras is the major factor for tracking the object.

5.   The outputs are the (X, Y, Z) coordinates of the colored object. The outputs are
     categorized as Min, Max and Average because not all pixels show the same value,
     but the average value is approximately equal (within 1 to 3 cm) to the coordinates
     of the object that we are tracking.


4.2) Coordinates of a colored object in front of cameras:




Figure 4.1: Camera coordinate system (a point directly in front of the camera has coordinates (0, 0, Z)).



X, Y and Z values are from the left camera. The program continuously outputs the

coordinates of a colored object in a console window.



Maximum Z=-91.37 cm (The maximum depth value of a pixel in the disparity image).

Minimum Z= -70.63 cm (The minimum depth value of a pixel in the disparity image).

Average Z=-76.74 cm (The average depth value of the pixels in the disparity image, which is approximately the real depth of the colored object).

Maximum X=-51.85 cm (The maximum X value of a pixel in the disparity image).

Minimum X= -40.46 cm (The minimum X value of a pixel in the disparity image).

Average X=-43.89 cm (The average X value of the pixels in the disparity image, which is approximately the real X value of the colored object).

Maximum Y=-3.51 cm (The maximum Y value of a pixel in the disparity image).

Minimum Y= -2.08 cm (The minimum Y value of a pixel in the disparity image).

Average Y=-2.83 cm (The average Y value of the pixels in the disparity image, which is approximately the real Y value of the colored object).



4.3) Results are graphically shown below with left and right calibration data.




Figure 4.2: Detected corners on the image taken from left camera.




Figure 4.3: Detected corners on the image taken from right camera.




Figure 4.4: Rectified image pair.

Each pixel in the left image is vertically aligned with the corresponding pixel in the right image.




Figure 4.5: Displays average error to its sub-pixel accuracy.




Figure 4.6: Coordinates of an object with respect to the left camera.

The colored object is moved towards and away from the cameras to check whether they are tracking it correctly.




Figure 4.7: Disparity between left and right image.



Disparity is the pixel distance between matching pixels in the left and right images. Each disparity limit defines a plane at a fixed depth from the cameras.
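Disparity and depth are inversely related (Z = f*T/d for rectified cameras, with focal length f in pixels and baseline T). As a purely illustrative example with hypothetical numbers: for a focal length of 500 pixels and a 50 cm baseline, a disparity of 100 pixels corresponds to Z = 500*50/100 = 250 cm, while a disparity of 125 pixels corresponds to Z = 200 cm, so nearer objects produce larger disparities.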

                                       Chapter 5

                                    CONCLUSION


The purpose of this project is to design a sensor that can detect the player's bat for a table tennis shooter. We designed this sensor using two USB cameras and the OpenCV computer vision library, which provides the important functions needed for this task. The program was coded in C++ using the OpenCV library. Its function is to calibrate the stereo cameras, rectify the distortion in the images and finally output the coordinates of the object. The program reads a number of chessboard images from a text file which contains a list of left and right stereo

(chessboard) image pairs, which are used to calibrate the cameras and then also rectify

the images. The code next reads the left and right image pairs and finds the chessboard

corners to sub-pixel accuracy. The code saves object and image points for all images.

With the list of image and object points from the good chessboard images, the code calls the important cvStereoCalibrate() to calibrate the cameras. The calibration function outputs the camera matrix and distortion vector for the two cameras; it also outputs the rotation matrix, the translation vector, the essential matrix, and the fundamental matrix. The accuracy of calibration can be checked by measuring how nearly the points in one image lie on the epipolar lines of the other image. Epilines are computed by the function cvComputeCorrespondEpilines. The average absolute dot product of the points with their epipolar lines (the point-to-line distance) is 0.29

with our calibration data. The code then computes the rectification maps using either the uncalibrated (Hartley) method cvStereoRectifyUncalibrated() or the calibrated (Bouguet) method cvStereoRectify(). Here we have a rectification map for each source pixel of both the left and right images. The distance between the cameras and their focus should not be changed after calibration, because the calibration establishes the rotational and translational relation

between the two cameras. We initialized the two cameras and made them take images of a colored object for tracking. A color-thresholding technique in HSV space is used so that the images from the two cameras show only the colored object. The images from the left and right cameras are rectified using cvRemap(), and lines are drawn across the image pairs so that we can see how well the rectified images are aligned. The rectified images are processed with the block-matching state created by cvCreateStereoBMState(). We then computed the disparity maps using cvFindStereoCorrespondenceBM(). This disparity is passed to the function cvReprojectImageTo3D() to get the depth image. The (X, Y, Z) coordinates of the colored object are encoded in the depth image; this function also takes the reprojection matrix, which encodes the distance between the two cameras and the focal length of each camera.

These coordinates of the colored object (the player's bat) can be linked to the shooter; this lets the shooter throw the ball exactly to the player's bat for training purposes. The shooter is now able to see a 3D scene, which allows it to use machine learning and intelligent techniques for advanced training. Machine learning would let the shooter understand the player's skill on different shots and throw the ball intelligently for all kinds of training purposes.

                                       Chapter 6

                                    FUTURE WORK


 Work can be extended by adding machine learning techniques for better accuracy of

   results.

 Work can be extended by designing and implementing a table tennis shooter (a
   robotic system) that can throw a ball to the player's bat for training purposes. This
   robot can take the 3D coordinates of the bat from this project's code.

 Work can be extended for industrial inspection purposes.

 Work can be extended to develop an application for part modeling; this application

   can be linked to different modeling software (Example: Solid Works, CATIA, Pro-E,

   etc) for faster part modeling.

 Work can be extended for detecting an object in front of vehicle (Unmanned

   Vehicle).
