International Journal of Computer Engineering and Technology (IJCET)
ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online)
Volume 3, Issue 3, October - December (2012), pp. 501-509
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2012): 3.9580 (Calculated by GISI), www.jifactor.com




    FAST ALGORITHM FOR VIDEO QUALITY ENHANCING USING
          VISION-BASED HAND GESTURE RECOGNITION
                             Ms. Shaikh Shabnam Shafi Ahmed
                             Department of Computer Engineering
                     Shri Jagdish Prasad Jhabarmal Tibrewala University,
                                       Rajasthan, India
                                 shabnamfsayyad@gmail.com

                                    Dr. Shah Aqueel Ahmed
                  Department of Electronics & Instrumentation Engineering
                  Royal Institute of Technology & Science, Hyderabad, India
                                   Shah_aqueel@yahoo.co.in

                               Mr. Sayyad Farook Bashir
                          Department of Mechanical Engineering
                  Genba Sopanrao Moze College of Engineering, Pune, India
                              farooksayyad@rediffmail.com


 ABSTRACT

         The goal of this paper is to develop a real-time system capable of understanding
 commands given by hand gestures. The system follows a hybrid approach: it recognizes both
 motion-based and static hand gestures. Implementing the algorithm under real-time
 constraints was one of the most difficult tasks. The system is a novel application that allows
 communication of the most necessary commands to the computer. Further, the system can
 work with any camera that supports streaming video input to the computer. Its touchless
 interaction and mouse-replacement capability use advanced computer vision to convert
 simple hand movements into direct mouse control in any environment. Thus, the method
 eliminates the need to install any physical hardware device to interact with the computer;
 instead, all the necessary functions are accessible using hand gestures. It allows a natural
 and more user-friendly man-machine interaction, thereby contributing novel communication
 methods for HCI that can support new commands not possible with current input devices.
 The system can also help restore movies, and since the application has a relatively small
 learning curve, it should be easy for any type of user to master and use effectively.

 Keywords: Hand gesture recognition, Tracking, Fast Algorithm, Vision-based.




1. INTRODUCTION

        The idea builds on the work of Attila Licsar and Tamas Sziranyi [1]. The interface is
simple enough to be run using an ordinary webcam and requires little training. The use of
hand gestures provides an attractive alternative to cumbersome interface devices for human-
computer interaction (HCI). In particular, visual interpretation of hand gestures can help in
achieving the ease and naturalness desired for HCI. The main focus of the paper is on bare
hands, using a simple web camera with a frame rate of approximately 30 frames per second,
to communicate to the computer all basic commands required by a human-computer interface.
Digital video has become an integral part of everyday life. Video enhancement is an active
topic in computer vision that has received much attention in recent years. The aim is to
improve the visual appearance of the video, or to provide a “better” transform representation
for future automated video processing, such as analysis, detection, segmentation, and
recognition. Moreover, it helps analyze background information that is essential for
understanding object behavior without requiring expensive human visual inspection. The
video enhancement problem can be formulated as follows: given a low-quality input video,
produce a high-quality output video for a specific application.

2. EARLY APPROACHES

    Asanterabi Malima et al. [2] approached the hand gesture recognition problem in a robot
control context using markers on the finger tips. An associated algorithm is used to detect the
presence and color of the markers, through which one can identify which fingers are active in
the gesture. The inconvenience of placing markers on the user’s hand makes this an infeasible
approach in practice. Vafadar and Alireza Behrad [3] used a template-based approach: the
data obtained is compared against some reference data and, using thresholds, the data is
categorized into one of the gestures available in the reference data. This is a simple approach
requiring little calibration, but it suffers from noise and does not work with overlapping
gestures. Byung-Woo Min et al. [4] developed a Hidden Markov Model (HMM) approach,
which is commonly used and has been widely exploited for temporal gesture recognition. An
HMM consists of states and state transitions with observation probabilities. For each gesture
a separate HMM is trained, and recognition of a gesture is based on which particular HMM
generates the maximum probability. This method suffers from the training time involved and
from its complex working nature, as the results are unpredictable because of the hidden
states. For gesture recognition, Wing Kwong Chung et al. [5] presented a hand gesture
recognition model based on a real-time Haar wavelet representation. In addition to voice and
controller pads, hand gestures can also be an effective way of communication between
humans and robots, or even between auditorily handicapped people and robots. Mu-Chun Su
[6] suggested a method using a neural network, which is based on modeling the human
nervous system element called the neuron and its interaction with other neurons to transfer
information. Each node consists of an input function, which computes the weighted sum, and
an activation function, which generates the response based on the weighted sum. Byung-Woo
Min et al. [7] developed a method for gesture recognition using a Hidden Markov Model; this
method has been widely exploited for temporal gesture recognition. Bhuyan et al. [8] have
described the advantage of a VOP-based


method for the segmentation of hand images. Their acceleration feature works efficiently only
when the spatial end position of the preceding gesture differs from the start position of the next
gesture in the connected gesture sequence. Shewta and Pankaj [9] have argued that an ANN
provides a good and powerful solution for gesture recognition: artificial neural networks are
applicable to multivariate non-linear problems and have fast computational ability. Gesture
recognition is important for developing alternative human-computer interaction modalities. Zhou
Ren et al. [10] have worked on hand gesture recognition using the Kinect sensor, which is very
different from a normal web camera. Hand gesture based Human-Computer Interaction (HCI) is
one of the most natural and intuitive ways to communicate between people and machines, since
it closely mimics how humans interact with each other. Hamid A. Jalab [11] proposed a method
that succeeds in extracting features from hand gesture images, based on hand segmentation using
both a wavelet network and an ANN. Qing and Nicolas [12] proposed a method that used a
formal grammar to represent hand gestures and postures, though with limited scope. The method
involves simple gestures requiring the fingers to be extended in various configurations, which are
mapped to the formal grammar through specific tokens and rules. The system involves a tracker
and a glove, and has poor accuracy and a very limited gesture set. Christopher Lee and
Yangsheng Xu [13] developed a glove-based gesture recognition system that was able to
recognize 14 of the letters from the hand alphabet, learn new gestures, and update the model of
each gesture in the system online, at a rate of 10 Hz. Over the years advanced glove devices have
been designed, such as the Sayre Glove, Dexterous Hand Master, and Power Glove. A spatio-
temporal volume analysis method was proposed by Vafadar and Behrad [14], which tracks the
movement of the hand across the images of the scene and the motion in the image sequence. The
information about the motion is obtained from derivatives, and it is assumed that, against a static
background, the hand is the fastest changing object in the scene. The flow field is then refined
using refinement and variance constraints; this flow field captures the characteristics of the given
gesture.

3. HAND GESTURE RECOGNITION

         Consider a navigation problem in which the system responds to hand pose signs given by
a human, captured through a camera. The interest is in an algorithm that can identify a hand pose
sign in the input image as one of five possible commands (or counts). The identified command is
then used as a control input for controlling the video file to perform the desired action or execute a
certain task. For examples of the signs used in the algorithm, see Figure 1. The signs could be
associated with various meanings depending on the function. For example, a “one” count towards
the left means “move previous”, a “two” count is used for adjusting volume, a “three” count is
used for play/pause, a “four” count is used for zoom, and finally a “five” count is used for Alt-Tab.
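As an illustration, the count-to-command mapping described above can be sketched as a small lookup table. The function and command names here are hypothetical placeholders, not identifiers from the described system:

```python
# Hypothetical mapping from a recognized finger count to a player command,
# mirroring the actions described in the text.
COMMANDS = {
    1: "move_previous",   # a "one" count towards the left
    2: "adjust_volume",
    3: "play_pause",
    4: "zoom",
    5: "alt_tab",
}

def dispatch(count):
    """Return the command string for a recognized count, or None if the
    count is not one of the five supported gestures."""
    return COMMANDS.get(count)
```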




           Figure 1: Set of hand gestures, or “counts” considered in the paper.




4. SYSTEM OVERVIEW

        The product depends on external interfaces such as a web camera. The overall
architectural components are shown in Figure 2.




                   Figure 2: Architectural Components of the system.

5. VIDEO ENHANCEMENT

5.1 Extending the Mixture of Gaussians to Remove Shadows and Highlights.
Our first improvement is to extend the mixture of Gaussians approach to remove highlights
and shadows. To this end, the highlight and shadow selection parameters of the statistical
background disturbance technique were deployed to ensure that these factors were removed.
This led to some improvement.
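As a sketch of the mask post-processing this implies: OpenCV's MOG2 background subtractor, for example, marks detected shadows with the value 127 in its foreground mask, so removing shadows from the mask reduces to keeping only confident foreground pixels. The function below assumes that labeling convention; it is an illustration, not the paper's exact implementation:

```python
import numpy as np

def remove_shadow_labels(fg_mask):
    """Keep only confident foreground pixels (value 255) and drop
    shadow/highlight labels (OpenCV's MOG2 convention marks shadows
    as 127). Returns a clean binary 0/255 mask."""
    return np.where(fg_mask == 255, 255, 0).astype(np.uint8)
```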
5.2 The Enhanced Foreground/Background Selector.
The extended mixture of Gaussians algorithm is combined with the statistical background
disturbance technique. A pixel is labeled as foreground only if both techniques independently
label it as such; otherwise it is labeled as background. This process gave better results
than either of the original algorithms on the data sets considered.
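The AND-combination rule of the enhanced selector can be sketched directly, assuming both detectors output binary masks:

```python
import numpy as np

def combine_masks(mog_mask, disturbance_mask):
    """Label a pixel foreground (255) only when BOTH detectors agree,
    i.e. a logical AND of the two binary masks; everything else is
    background (0), as described for the enhanced selector."""
    both = np.logical_and(mog_mask > 0, disturbance_mask > 0)
    return both.astype(np.uint8) * 255
```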
5.3 Dynamic Gaussian Background Distributions.
The mixture of Gaussians has more attractive properties than the statistical background
disturbance technique. The mixture of Gaussians updates the background model during the
extraction process repeatedly using the online input sequence. In contrast, the statistical
background disturbance technique uses a fixed background model. This led us to combine the
two techniques so that processing favors the more robust one. Hence, the mixture of
Gaussians forms the basis of the model, and the statistical background disturbance technique
is used to force the mixture of Gaussians to expand its background distributions only when a
pixel is labeled as a motion pixel and the background disturbance technique disagrees. The
algorithm first specifies two distribution sizes as a compromise model: one fits most of the
background pixels without affecting the detection of a moving object; the other is larger, to
approximate the remaining background pixels. The extended mixture of Gaussians algorithm
is then run with the small distribution size as the initial variance for each new distribution.
The pixels identified as moving objects are


tested using the statistical background disturbance technique. The test determines whether
the technique agrees with the labeled motion pixels; if not, the background distributions are
enlarged (using the larger background distribution size) in an attempt to fit those pixels.
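A minimal sketch of the two-size compromise, reduced to a single Gaussian per pixel for clarity. The threshold of k standard deviations and the single-Gaussian simplification are assumptions; the paper's full model is a mixture:

```python
import numpy as np

def relabel_with_expanded_variance(pixels, mean, small_sigma, large_sigma, k=2.5):
    """First test pixels against the small background variance; pixels
    flagged as motion are then re-tested against the enlarged variance.
    A pixel stays foreground only if even the enlarged distribution
    cannot explain it (logical AND of the two tests)."""
    motion_small = np.abs(pixels - mean) > k * small_sigma
    motion_large = np.abs(pixels - mean) > k * large_sigma
    return motion_small & motion_large
```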

6. METHODS OF ENHANCING VIDEO QUALITY

        Formally, an image (or video frame) is represented as RGB triplets mapped onto a 2D
grid, with each channel taking integer values in {0, …, N}; most commonly N = 255. In the
HSV color space, the V component is defined as the maximum of the RGB values, thus
V(x, y) = max(I(x, y)), taken element-wise. The probability mass function (PMF) for V is:

                 p(v) = n_v / Σ_u n_u                                            (1)

where n_v is the number of pixels whose V channel equals v.
The cumulative distribution function (CDF) of V is:

                 C(v) = Σ_{u=0}^{v} p(u)                                         (2)

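Under these definitions, the PMF and CDF of the V channel can be computed as follows. This is a sketch assuming an 8-bit RGB input array of shape (height, width, 3):

```python
import numpy as np

def value_pmf_cdf(rgb, n_levels=256):
    """Compute the PMF and CDF of the HSV value channel,
    V(x, y) = max over the RGB channels, per Eqs. 1 and 2."""
    v = rgb.max(axis=-1)                                  # V = max(R, G, B)
    counts = np.bincount(v.ravel(), minlength=n_levels)   # n_v for each level
    pmf = counts / counts.sum()                           # Eq. 1
    cdf = np.cumsum(pmf)                                  # Eq. 2
    return pmf, cdf
```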

A simple form of histogram equalization, often referred to by image editing software as
“Auto-Levels”, computes a linear transform for all colors in an image. The name derives from
the fact that the algorithm tries to find a “level” value to which the white color should be
mapped, and another value for black:

                 l_out = (l_in − b_est) · (w_d − b_d) / (w_est − b_est) + b_d    (3)

where l_out is the output luminance, l_in is the input luminance, b_est is the estimated black
level, w_est is the estimated white level, w_d is the desired white level, and b_d is the desired
black level. The luminance is usually computed as defined by the YUV, HSL or La*b*
formulations. Typically these values are chosen such that the output covers the entire
dynamic range of the color space in which the adjustment is made. The advantage of this
formulation is that it is color-space independent (the exact luminance formulation does not
really matter), and it is very fast, being a simple linear transform.
Using the formulation from Eq. 2, a good estimate for w_est is such that C(w_est) = 0.995,
and b_est is chosen such that C(b_est) = 0.005. This formulation would assign w_est and b_est
values in the set {0, …, N}. It is more convenient, though, to work with values in the [0, 1]
interval, and as such, we will assume that the variables we work with are normalized to this
interval. The formulation for luminosity that we employ is the “value” component in the HSV
color space. This decision was made experimentally, as it yielded the best enhancement
results overall, and it is impossible to generate out-of-gamut colors by adjusting V.
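A minimal sketch of this Auto-Levels transform on a normalized V channel, using the quantile-based estimates C(b_est) = 0.005 and C(w_est) = 0.995. Clipping out-of-range outputs to [0, 1] is an assumption added for safety:

```python
import numpy as np

def auto_levels(v, black_pct=0.005, white_pct=0.995, b_d=0.0, w_d=1.0):
    """Apply the Auto-Levels linear transform of Eq. 3 to a V channel
    normalized to [0, 1]. Black/white levels are estimated from the
    0.5th and 99.5th percentiles of the CDF."""
    b_est = np.quantile(v, black_pct)
    w_est = np.quantile(v, white_pct)
    if w_est <= b_est:          # degenerate histogram: leave input untouched
        return v
    out = (v - b_est) * (w_d - b_d) / (w_est - b_est) + b_d
    return np.clip(out, 0.0, 1.0)
```

After the transform, the surviving values span the full desired range, which is exactly the stretch the text describes.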
Applying this formulation independently on each frame of the video is not desirable, as in
many cases the output will have a noticeable flicker due to jumps between nearby estimates
of b_est and w_est. To alleviate this problem, we can compute these values per frame, but not
use them directly.
Using a shot (or cut) detector, a better estimate can be computed based on the statistics of the
entire shot. In order to have a conservative estimate that will not severely change outlier
frames, it is possible to use the 95th percentile for w_shot and the 5th percentile for b_shot.
Simply using these values will yield a reasonable output, but unfortunately, the output will be too


conservative to be practical. An alternative approach is to use a moving average of w and b
over a window of frames. This has the side effect of “delaying” fades and enhancing
flickering on videos that were filmed with cameras that employ automatic gain control, since
the statistics will vary across the same shot.
A compromise between the estimates at frame level, denoted as b_frame and w_frame, and
the shot level (b_shot and w_shot) can be obtained by:

                 b_est = α_frame · b_frame + (1 − α_frame) · b_shot              (4)

w_est can be computed using the same pattern. The parameter α_frame decides how much
influence the local estimate has over the more global shot estimate. This formulation allows
more frames to be corrected when compared to the per-shot parameters. At the same time, a
smaller number of frames will be adjusted than when using the per-frame statistics. A good
choice for α_frame was found to be 0.5.
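The frame/shot compromise is a one-line convex combination, with α_frame = 0.5 as the reported default:

```python
def blend_estimate(frame_stat, shot_stat, alpha=0.5):
    """Blend a per-frame statistic (b_frame or w_frame) with the per-shot
    statistic (b_shot or w_shot); alpha is the frame weight (0.5 was
    found to be a good choice)."""
    return alpha * frame_stat + (1.0 - alpha) * shot_stat
```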
The algorithm thus far works well in the general case. However, there is a class of frames
for which it does not do very well: whenever there is high contrast between the foreground
and the background, and the foreground object is “dark”, the above algorithm makes the
object perceptually darker and further increases the contrast.
To the human eye, the end result seems lower quality, because it is now much harder to
distinguish any features in the dark object. To alleviate this problem, if the corrected color
has a lower value than the original, then the output is computed as follows:

                 l_corr = l_in + (l_out − l_in) / y_l                            (5)

y_l is a parameter which can be adjusted by the user for the desired output. If it is set to one,
then the default algorithm is applied. Experimentally, we found that 4 is a good choice for y_l.
The final luminance value of the pixel, l'_out, is defined as:

                 l'_out = l_out,     if l_out ≥ l_in
                 l'_out = l_corr,    otherwise                                   (6)

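A sketch of this dark-object safeguard. The linear attenuation below matches the stated behavior (y_l = 1 reproduces the default output, and larger y_l reduces the darkening), but the exact published formula could not be recovered from the text, so this form is an assumption:

```python
def soften_darkening(l_in, l_out, y_l=4.0):
    """If auto-levels would darken a pixel (l_out < l_in), divide the
    darkening by y_l; otherwise pass the corrected value through.
    With y_l = 1 this reduces to the unmodified algorithm."""
    if l_out >= l_in:
        return l_out
    return l_in + (l_out - l_in) / y_l
```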
An intriguing aspect of video enhancement is the Abney effect. An increase in lightness
causes a perceived decrease in saturation. Therefore, in order to preserve the perceived
saturation, we modify the saturation channel sout as follows:

                 s_out = s_in + σ · (l'_out − l_in)                              (7)

where σ is a parameter; σ = 0.15 yields a good compromise in enhancement. This formulation
modifies saturation as a function of the luminosity difference in order to increase the
perceived clarity of the image.
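The saturation compensation can be sketched as follows. The additive form is an assumption; only the direction of the correction (raise saturation when luminance increases) and the value σ = 0.15 come from the text. Clamping to [0, 1] is also an added assumption:

```python
def preserve_saturation(s_in, l_in, l_out_final, sigma=0.15):
    """Compensate the Abney effect: adjust saturation in proportion to
    the change in luminance, then clamp to the valid [0, 1] range."""
    s_out = s_in + sigma * (l_out_final - l_in)
    return min(1.0, max(0.0, s_out))
```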
Some videos may not contain enough information for our algorithm to be effective. This class
of videos can easily be detected: if the dynamic range of the luminance is too small, then no
global enhancement algorithm can work. Experimentally, we found that if the input has fewer
than ten unique luminance values, the output will likely not be visually pleasing (although it
may present more details).
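This detectability claim reduces to a simple histogram check; the threshold of ten unique values is taken from the text:

```python
import numpy as np

def has_enough_range(v, min_unique=10):
    """Return True when the luminance channel has enough distinct values
    for global enhancement to be worthwhile (fewer than ten unique
    values was found to give unpleasing output)."""
    return np.unique(v).size >= min_unique
```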


7. SOFTWARE DESIGN SPECIFICATION

7.1 Use case Diagram
Use case diagrams are one of the five diagrams in the UML for modeling the dynamic aspects
of systems (activity diagrams, statechart diagrams, sequence diagrams, and collaboration
diagrams are the four other kinds of diagrams in the UML for modeling the dynamic aspects
of systems). Use case diagrams are central to modeling the behavior of a system, a subsystem,
or a class. Each one shows a set of use cases and actors and their relationships. You apply use
case diagrams to model the use case view of a system. For the most part, this involves
modeling the context of a system, subsystem, or class, or modeling the requirements of the
behavior of these elements. Use case diagrams are important for visualizing, specifying, and
documenting the behavior of an element. They make systems, subsystems, and classes
approachable and understandable by presenting an outside view of how those elements may
be used in context.




                                    Figure 3: Use case diagram

7.2 Class Diagram
A class diagram shows a set of classes, interfaces, and collaborations and their relationships.
These diagrams are the most common diagram found in modeling object-oriented systems.
Class diagrams address the static design view of a system. Class diagrams that include active
classes address the static process view of a system.




                                  Figure 4: Class Diagram



8. CONCLUSION

        The product that we are developing aims to change the way people use the computer
system. Presently, the webcam, microphone and mouse are an integral part of the computer
system. Our product, which uses only two of them, i.e. the webcam and microphone, would
completely eliminate the mouse. This would lead to a new era of Human Computer
Interaction (HCI) where no physical contact with the device is required. The technology can
be further enhanced for use in robotics, gaming, and developing systems that could
understand human behavior from the way users interact.

9. REFERENCES

[1]    Attila Licsar and Tamas Sziranyi, “Hand Gesture Based Film Restoration”, pp. 215-222,
       2004.
[2]    Asanterabi Malima, Erol Özgür, and Müjdat Çetin, “A Fast Algorithm for Vision-Based
       Hand Gesture Recognition for Robot Control”, IEEE Int. Conf. on Systems, Man, and
       Cybernetics, pp. 289-296, 2008.
[3]    Vafadar and Alireza Behrad, “Human Hand Gesture Recognition Using Spatio-Temporal
       Volume for Human Computer Interaction”, IEEE Int. Symposium on Telecommunications,
       pp. 713-718, 2008.
[4]    Byung-Woo Min, Ho-Sub Yoon, Jung Soh, Yun-Mo Yang, and Toshiaki Ejima, “Hand
       Gesture Recognition Using Hidden Markov Model”, IEEE Int. Conf. on Systems, Man,
       and Cybernetics, pp. 4232-4235, 1997.
[5]    Wing Kwong Chung, Xinyu Wu, and Yangsheng Xu, “A Real-time Hand Gesture
       Recognition Based on Haar Wavelet Representation”, Proc. IEEE Int. Conf. on Robotics
       and Biomimetics, pp. 336-341, 2008.
[6]    Mu-Chun Su, “A Fuzzy Rule-Based Approach to Spatio-Temporal Gesture Recognition”,
       IEEE Trans. on Systems, Man, and Cybernetics - Part C: Applications and Reviews,
       Vol. 30, No. 2, pp. 276-281, 2000.
[7]    Byung-Woo Min, Ho-Sub Yoon, Jung Soh, Yun-Mo Yang, and Toshiaki Ejima, “Hand
       Gesture Recognition Using Hidden Markov Model”, IEEE Int. Conf. on Systems, Man,
       and Cybernetics, pp. 4232-4235, 1997.
[8]    M. K. Bhuyan, D. Ghosh, and P. K. Bora, “Feature Extraction from 2D Gesture
       Trajectory in Dynamic Hand Gesture Recognition”, IEEE, pp. 432-441, 2006.
[9]    Shewta Yewale and Pankaj Bharne, “Artificial Neural Network Approach for Hand
       Gesture Recognition”, International Journal of Engineering Science and Technology,
       Vol. 3, No. 4, pp. 2603-2608, 2011.
[10]   Zhou Ren, Jingjing Meng, Junsong Yuan, and Zhengyou Zhang, “Robust Hand Gesture
       Recognition with Kinect Sensor”, ACM Multimedia 2011, pp. 978-979, 2011.
[11]   Hamid A. Jalab, “Static Hand Gesture Recognition for Human Computer Interaction”,
       Information Technology Journal, pp. 1-7, 2012.
[12]   Qing Chen and Nicolas D. Georganas, “Hand Gesture Recognition Using Haar-Like
       Features and a Stochastic Context-Free Grammar”, IEEE Trans. on Instrumentation and
       Measurement, Vol. 57, No. 8, pp. 1562-1571, 2008.
[13]   Christopher Lee and Yangsheng Xu, “Online, Interactive Learning of Gestures for
       Human/Robot Interfaces”, Carnegie Mellon University, The Robotics Institute,
       Pittsburgh, Pennsylvania, USA, 1996.



[14] Vafadar and Alireza Behrad, “Human Hand Gesture Recognition Using Spatio-Temporal
     Volume for Human Computer Interaction”, IEEE Int. Symposium on Telecommunications,
     pp. 713-718, 2008.
[15] Ankit Vidyarthi and Ankita Kansal, “A Survey Report on Digital Images Segmentation
     Algorithms”, International Journal of Computer Engineering & Technology (IJCET),
     Volume 3, Issue 2, pp. 85-91, 2012, Published by IAEME.
[16] Gopal Thapa, Kalpana Sharma, and M. K. Ghose, “Multi Resolution Motion Estimation
     Techniques for Video Compression: A Survey”, International Journal of Computer
     Engineering & Technology (IJCET), Volume 3, Issue 2, pp. 399-406, 2012, Published by
     IAEME.
[17] Reeja S. R. and N. P. Kavya, “Motion Detection for Video Denoising - The State of the
     Art and the Challenges”, International Journal of Computer Engineering & Technology
     (IJCET), Volume 3, Issue 2, pp. 518-525, 2012, Published by IAEME.
[18] Steven Lawrence Fernandes and G. Josemin Bala, “Analysing Recognition Rate of LDA
     and LPP Based Algorithms for Face Recognition”, International Journal of Computer
     Engineering & Technology (IJCET), Volume 3, Issue 2, pp. 115-125, 2012, Published by
     IAEME.
[19] V. Radhakrishna, C. Srinivas, and C. V. Guru Rao, “High Performance Pattern Search
     Algorithm Using Three Sliding Windows”, International Journal of Computer Engineering
     & Technology (IJCET), Volume 3, Issue 2, pp. 543-552, 2012, Published by IAEME.




