Modeling the Constraints of Human Hand Motion


                                      John Lin, Ying Wu, Thomas S. Huang
                                                   Beckman Institute
                                      University of Illinois at Urbana-Champaign
                                                   Urbana, IL 61801
                                        {jy-lin, yingwu, huang}

Abstract

     Hand motion capturing is one of the most important parts of gesture interfaces. Many current approaches to this task involve a formidable nonlinear optimization problem in a large search space. Motion capturing can be achieved more cost-efficiently when the motion constraints of the hand are taken into account. Although some constraints can be represented as equalities or inequalities, there exist many constraints that cannot be explicitly represented. In this paper, we propose a learning approach that models the hand configuration space directly. The redundancy of the configuration space can be eliminated by finding a lower-dimensional subspace of the original space. Finger motion is modeled in this subspace based on the linear behavior observed in real motion data collected with a CyberGlove. Employing the constrained motion model, we are able to capture finger motion efficiently from video inputs. Several experiments show that the proposed model is helpful for capturing articulated motion.

1 Introduction

     In recent years, a significant effort has been devoted to gesture recognition and related work in body motion analysis, driven by the interest in more natural and immersive Human-Computer Interaction (HCI). As the cost of powerful computers decreases and PCs become more widespread, a more natural interface is desired to replace traditional input devices such as the mouse and keyboard. Gestures, being one of the most natural ways humans communicate with each other, are thus an apparent choice for a more natural interface. Effective recognition of hand gestures would provide major advantages not only in virtual environments and other HCI applications, but also in areas such as teleconferencing, surveillance, and human animation.

     Recognizing hand gestures, however, involves capturing the motion of a highly articulated human hand with roughly 30 degrees of freedom (DoF). Hand motion capturing involves finding the global hand movement and the local finger motion such that the hand posture can be recovered. One possible way to analyze hand motion is the appearance-based approach, which emphasizes the analysis of hand shapes in images [5]. However, local hand motion is very hard to estimate by this means. Another possible way is the model-based approach [3, 4, 6, 7, 9]. With a single calibrated camera, local hand motion parameters can be estimated by fitting a 3D hand model to the observed images.

     One model-based method uses gradient-based constrained nonlinear programming techniques to estimate the global and local hand motion simultaneously [6]. The drawback of this approach is that the optimization is often trapped in local minima. Another idea is to model the surface of the hand and estimate hand configurations using the "analysis-by-synthesis" approach [3]. Candidate 3D models are projected onto the image plane and the best match is found with respect to some similarity measurement. Essentially, this is a search problem in a very high-dimensional space, which makes the method computationally intensive. A decomposition method has also been adopted to analyze articulated hand motion by separating hand motion into its global motion and local finger motions [9].

     Although the 3D model-based approach makes motion capturing from monocular images possible, it also faces some challenging difficulties. Many current methods for hand posture estimation involve searching for the optimal hand posture in a huge hand configuration space, due to the high DoF of hand geometry. Such a search is computationally expensive and the optimization is prone to local minima. At the same time, many current approaches suffer from […]

     However, although the human hand is a highly articulated object, it is also highly constrained. There are dependencies among fingers and joints. Applying the motion constraints among fingers and finger joints can greatly reduce the size or dimensionality of the search space, which in turn makes the estimation of hand postures more
cost-efficient. Another major advantage of applying hand motion constraints is the ability to synthesize natural hand motion and produce realistic hand animation, which would be very useful for synthesizing sign languages.

     Little has been done regarding the study of hand constraints beyond the commonly used ones. Even though constraints help reduce the size of the search space, too many or overly complicated constraints would add to the computational complexity, so which constraints to adopt becomes an important issue. Some constraints have already been presented, studied, and used in many previous works [3, 4, 9]. The common ones include constraints on joints within the same finger, constraints on joints between fingers, and the maximum range of finger motions. All of these are presented as either equalities or inequalities. However, due to the large variation in finger motion, there are yet more constraints that cannot be explicitly represented by equations.

     In this paper we propose a learning approach that models the constraints directly from data sampled in the hand configuration space (C-space). Each point in this configuration space corresponds to the set of joint angles of a hand state, which is what model-based approaches commonly estimate. Rather than studying the global hand motion, we focus only on the analysis of local finger motions and their constraints, with the help of a CyberGlove developed by Virtual Technologies Inc. Moreover, we study the constraints of hand motions that are natural and feasible for everyone.

     In Section 2, a description of a commonly adopted hand kinematical model and the CyberGlove is given. Section 3 describes how we model the configuration space and the observations drawn from this model. Section 4 shows the results of some preliminary examples of hand posture estimation that take advantage of this model. Section 5 concludes our work and discusses some future directions regarding the modeling of human motion constraints.

2 Hand skeleton model

     Figure 1: Kinematical structure and joint notations

     The human hand is highly articulated. To model the articulation of the fingers, the kinematical structure of the hand must be modeled. In our research, the skeleton of a hand is abstracted as a stick figure, with each finger a kinematical chain whose base frame is at the palm and whose fingertip is the end-effector. Such a hand kinematical model, with the name of each joint, is shown in Figure 1. This kinematical model has 27 degrees of freedom (DoF).

     Each of the four fingers has four DoF. The distal interphalangeal (DIP) joint and proximal interphalangeal (PIP) joint each have one DoF, and the metacarpophalangeal (MCP) joint has two DoF, for flexion and abduction. The thumb has a different structure from the other four fingers and has five DoF: one for the interphalangeal (IP) joint, and two for each of the thumb MCP joint and the trapeziometacarpal (TM) joint, again for flexion and abduction. The fingers together thus have 21 DoF. The remaining 6 DoF come from the rotational and translational motion of the palm, with 3 DoF each. These 6 parameters are ignored here, since we focus on the estimation of the local finger motions rather than the global motion.

     Articulated local hand motion, i.e., finger motion, can be represented by a set of joint angles θ, or the hand state. To capture hand motion, glove-based devices have been developed that directly measure the joint angles and spatial positions by attaching a number of sensors to the hand joints. The CyberGlove is such a device.

     The goal of vision-based analysis of hand gestures is to estimate the hand joint angles, or hand states, without such physical devices, based solely on visual information. However, glove-based devices can help in collecting ground-truth data, which enables the modeling and learning processes in visual analysis.

     In our study, we employ a right-handed CyberGlove. The glove has four sensors for the thumb; an MCP and a PIP sensor for the MCP (θ_MCP-F) and PIP (θ_PIP) flexion angles of each of the four fingers; and three more abduction sensors for the abduction/adduction angles (θ_MCP-AA) between these four fingers. There are fifteen sensor readings of finger joint angles in total; we are therefore able to characterize the local finger motion by 15 parameters. The glove can be calibrated to measure the angles to within 5 degrees of accuracy, which is acceptable for gesture recognition, since finger postures that differ by five degrees still appear to be the same posture.

3 Modeling the constraints

     Modeling motion constraints is crucial to effective and efficient motion capturing. A comprehensive study of hand/finger motion constraints and a learning approach to modeling the natural movement constraints are given in this section.
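To make the parameterization of Section 2 concrete, the 15-dimensional hand state measured by the glove can be sketched as follows. This is a minimal illustration: the field names and their ordering are our own assumptions for exposition, not the CyberGlove's actual sensor layout.

```python
# Hypothetical layout of the 15-parameter local hand state described above:
# per-finger MCP and PIP flexion for the four fingers, three inter-finger
# abduction/adduction angles, and four thumb sensor angles. Names are ours.
FINGERS = ["index", "middle", "ring", "pinky"]

STATE_LAYOUT = (
    [f"{f}_MCP_flexion" for f in FINGERS]      # 4 MCP flexion angles
    + [f"{f}_PIP_flexion" for f in FINGERS]    # 4 PIP flexion angles
    + ["abd_index_middle", "abd_middle_ring",
       "abd_ring_pinky"]                       # 3 abduction/adduction angles
    + [f"thumb_{i}" for i in range(4)]         # 4 thumb sensor angles
)
assert len(STATE_LAYOUT) == 15

# DoF accounting from the text: 4 fingers x 4 DoF plus 5 DoF for the thumb
# gives 21 local DoF; the glove observes 15 of them directly (DIP flexion,
# for example, has no dedicated sensor and must be inferred).
LOCAL_DOF = 4 * 4 + 5
assert LOCAL_DOF == 21
```

The 6 unmeasured degrees of freedom (21 − 15) are exactly the ones the constraints of Section 3 eliminate or recover from the measured angles.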
3.1 Constraints overview

     Hand/finger motion is constrained: the hand cannot make arbitrary gestures. There are many examples of such constraints. For instance, fingers cannot bend backward very far, and the pinky cannot be bent without bending the ring finger. The natural movements of the human hand are implicitly shaped by such motion constraints.

     Some motion constraints have a closed-form representation, and these are often employed in current research on animation and visual motion capturing [3, 4, 9]. However, a large number of motion constraints are very difficult to express in closed form; how to model such constraints still needs further investigation. Here we present some of the most commonly used motion constraints and justify the use of 15 parameters to represent the hand motion.

     Hand constraints can be roughly divided into three types. Type I constraints are the limits on finger motion that result from hand anatomy, usually referred to as static constraints. Type II constraints are the limits imposed on joints during motion, usually referred to as dynamic constraints in previous work. Type III constraints are those obeyed in performing natural motion, and have not yet been explored. Below we describe each type in more detail.

     Type I constraints. This type of constraint refers to the limits of the range of finger motions as a result of hand anatomy. We consider only the range of motion of each finger that can be achieved without applying external forces, such as bending the fingers backward with the other hand. This type of constraint is usually represented by the following inequalities:

     0° ≤ θ_MCP-F ≤ 90°,
     0° ≤ θ_PIP ≤ 110°,
     0° ≤ θ_DIP ≤ 90°, and
     −15° ≤ θ_MCP-AA ≤ 15°.          (1)

Another commonly adopted constraint states that the middle finger displays little abduction/adduction motion, so the following approximation is made for the middle finger:

     θ_MCP-AA = 0.          (2)

This removes one DoF from the 21-DoF model. Similarly, the TM joint also displays limited abduction motion, which is likewise approximated by 0:

     θ_TM-AA = 0.          (3)

As a result, the thumb motion is characterized by 4 parameters instead of 5. Finally, the index, middle, ring, and little fingers are planar manipulators, i.e., the DIP, PIP, and MCP joints of each finger move in one plane, since the DIP and PIP joints each have only 1 DoF, for flexion.

     Type II constraints. This type of constraint refers to the limits imposed on joints during finger motions. These constraints are often called dynamic constraints and can be subdivided into intra-finger constraints and inter-finger constraints. The intra-finger constraints are constraints between joints of the same finger. A commonly used one, based on hand anatomy, states that in order to bend the DIP joint, the PIP joint must also be bent, for the index, middle, ring, and little fingers. The relation can be approximated as:

     θ_DIP = (2/3) θ_PIP.          (4)

By combining Eqs. 2-4, we are able to reduce the 21-DoF model to one approximated by 15 DoF. Experiments in previous work have shown that postures can be estimated using these constraints without severe degradation in performance.

     Inter-finger constraints refer to those imposed on joints between fingers. For instance, when one bends the index finger at the MCP joint, one naturally has to bend the middle MCP joint as well. Many such Type II constraints and related equations can be found in [2, 4]. However, there are yet more constraints that cannot be explicitly represented by equations.

     Type III constraints. These constraints are imposed by the naturalness of hand motions and are more subtle to detect. Almost nothing has been done to account for these constraints in simulating natural hand motion. Type III constraints differ from Type II in that they have nothing to do with limitations imposed by hand anatomy; rather, they result from common and natural movements. Even though the naturalness of hand motion differs from person to person, it is similar for everybody. For instance, the most natural way for a person to make a fist from an open hand is to curl all the fingers at the same time instead of curling one finger at a time. This type of constraint also cannot be explicitly represented by equations.

3.2 Modeling the constraints in C-space

     It is difficult to explicitly represent the constraints of natural hand motion in closed form. However, they can be learned from a large and representative set of training samples; we therefore propose to construct the configuration space (i.e., the joint angle space) and learn the constraints directly from the empirical data using the approach described below. For notational convenience, let us denote the feasible C-space by Φ ⊂ ℝ^15, with each configuration denoted by φ = (θ_1, θ_2, …, θ_15).

     1. Locating base states ζ_i in Φ. We directly locate the base states by fixing the hand in the desired configurations and measuring the 15 parameters associated with the corresponding state. Since the sensors are very sensitive to finger movements, small variations in finger posture will also be recorded and will be considered as
the same state. As a result, we use the centroid of the set of training data D_i = {x_ij, j = 1, …, N} as the location of the base state ζ_i. An alternative would be to collect a huge set of training samples x_i from predefined motions and apply a clustering algorithm to locate the base states. However, since we have full control over how the hand is configured to form each base state, we do not need clustering algorithms to locate the base states in C-space.

     Figure 2a: Some base hand states

     Figure 2b: Unfeasible configurations

In our model, the hand gestures are roughly classified into 32 discrete states by quantizing each finger into one of two states: fully extended or curled. The reason for choosing these two states is that the entire motion of a finger falls roughly between them; the whole set of 32 states therefore roughly characterizes the entire hand motion (Figure 2). However, since not everyone is able to bend the pinky without bending the ring finger, or without the help of the thumb holding the pinky, four of the states are not achievable by everyone without applying external forces. These four states (Figure 2b) are therefore not included in our set of base states in the C-space modeling. Finally, configurations that are similar are considered the same state. For instance, the case with the five fingers spread wide apart and the case with all fingers straightened but held together are considered the same state.

     2. Motion modeling. With the set of base states ζ_i established, we then collect motion data for state transitions in order to model the configurations traversed during natural hand motion. A large number of sets of motion data are collected in order to observe the Type II and III constraints of natural hand motions. An example of the motion of making and opening a fist is shown in Figure 3.

     Figure 3: Joint angle measurements from the motion of making and opening a fist.

     3. Dimensionality reduction. From Figure 3, we can clearly observe correlations in the joint angle measurements. Therefore, using the data collected from the static states and the finger motions, we perform Principal Component Analysis (PCA) to reduce the dimension of the model, and thus the search space, while preserving the components with the highest energy. We note that 95% of the energy is contained in the 7 dimensions with the largest eigenvalues. We thus perform the mapping ℝ^15 → ℝ^7 on Φ by projecting the original model onto a lower-dimensional subspace Φ_c ⊂ ℝ^7 spanned by the principal directions associated with these 7 largest eigenvalues.

     4. Interpolation in compressed C-space. Once the set of base states ζ_i has been determined, the whole feasible configuration space Φ can be approximated by these base states together with an interpolation scheme. Our approach takes a linear interpolation in the lower-dimensional configuration subspace Φ_c. Each configuration φ_c in Φ_c is represented by the linear interpolation

     φ_c = Σ_{i=1}^{28} α_i ζ_i,          (5)

in which ζ_i is the location of base state i and the α_i are the interpolation parameters for φ_c.

3.3 Model characteristics

     Our model has three main advantages that will help reduce the search space in gesture recognition. First, the
model is compact, due to the dimensionality reduction using PCA. Second, the motion constraints are automatically incorporated into the model. Third, a linear behavior is observed in the state transitions in C-space. The reason the motion constraints are incorporated into the model is that we sample directly from natural hand motions: configurations outside the permissible range imposed by hand anatomy are not achievable in natural hand motion. Consequently, the inequalities and equalities, including the intra-finger constraints [3, 4], such as θ_MCP-F = kθ_PIP with k ≥ 0, and the inter-finger constraints [4], are automatically covered by this model.

     Figure 4: Motion transitions between four states.

     Figure 5: Motion transitions between eight states.

     An interesting phenomenon regarding the Type III motion constraints is observed in the motion data: we observe a nearly linear transition between states in C-space. An example is shown for the case of transitioning between four states, obtained by moving only the index and middle fingers (Figure 4). We have projected the C-space into ℝ^2 for visualization in this case. The four corners are the locations of the four discrete base states. A linear transition is clearly observed in Figure 4; the middle lines are the paths resulting from curling and extending both fingers together. This result reflects the high correlation between fingers during natural movements. Although state transitions need not be performed in this manner, and there are infinitely many ways to move from one configuration to another, when the fingers move in their most natural way they take a nearly straight-line path in C-space. This observation justifies Eq. 5 for estimating hand configurations. Another example is shown for three-finger motions projected into ℝ^3; the eight base states are roughly located at the eight corners of a cube (Figure 5).

4 Experiments

     In order to evaluate the validity of this model, we perform some experiments using low-level visual features and estimate postures constituted by a subset of the 28 base states. The input images are assumed to be segmented.

4.1 General approach

     Using the observed linear behavior, we are able to approximate a configuration by taking the following steps:
1. In the training stage, associate each base state ζ_i with a feature vector ψ_i.
2. Extract features ψ_input from the input 2D image, such as edges, area, centroid, etc.
3. Compute α_i = h(ψ_i, ψ_input), where h(ψ_i, ψ_input) measures the closeness of ψ_input to ψ_i.
4. Based on the observation made from the Type III motion constraints, linearly interpolate the estimated configuration in the compressed space Φ_c: φ_c,estimate = Σ_i α_i ζ_i.
5. Reconstruct the estimated configuration state φ_estimate ∈ ℝ^15 from φ_c,estimate.

4.2 Experimental results

     Figure 6: Configuration estimations.

     Figure 7: Comparison of different techniques. (a) Original image. (b) Estimation without Type II & III constraints. (c) Estimation without Type III constraints. (d) Estimation with Type I, II & III constraints.

     The results of the experiments are shown in Figure 6. The first row shows some input images and the second
row shows the reconstructed 3D hand model based on the estimation by our approach. The results are visually agreeable. These preliminary experiments show that motion constraints play an important role in hand posture estimation. More accurate and cost-efficient estimation can be obtained when a better motion constraint model is applied. Better results can also be obtained with better feature extraction methods, which will be implemented in future research.

     A comparison of estimations using different types of constraints is shown in Figure 7. In Figure 7(b), estimation without applying the Type II and III constraints results in a feasible, yet unnatural, configuration. In Figure 7(c), a closer approximation is obtained without applying the Type III constraints, although the DIP and PIP joints should bend more to approximate a fist. Finally, applying all three types of constraints together produces the best result, with a more natural approximation, in Figure 7(d).

5 Conclusion/Future Development

     A posture estimation problem generally involves a search in a high-dimensional C-space. Useful hand constraints have been demonstrated to be able to greatly […] configuration estimation. Nevertheless, such modeling provides a different interpretation of hand motions, and the current results look promising.

Acknowledgement

     This work was supported in part by the National Science Foundation Alliance Program and Grant CDA 96-24396.

References

[1] C. Chang, W. Tsai, "Model-Based Analysis of Hand Gestures From Single Images Without Using Marked Gloves Or Attaching Marks on Hands", ACCV2000, pp. 923-930, 2000.

[2] C.S. Chua, H.Y. Guan and Y.K. Ho, "Model-based Finger Posture Estimation", ACCV2000, pp. 43-48.

[3] J. Kuch and Thomas S. Huang, "Vision-Based Hand Modeling and Tracking for Virtual Teleconferencing and Telecollaboration", ICCV95, pp. 666-671, 1995.

[4] J. Lee, T. Kunii, "Model-based Analysis of Hand
reduce the search space, and thus improve gesture                      Posture”, IEEE Computer Graphics and Applications,
                                                                       Sept., pp.77-86, 1995.
recognition results. Many constraints can be represented
in simple closed forms while many more can not and have            [5] V. Pavlovic, R. Sharma, Thomas S. Huang, “Visual
not been found.                                                        Interpretation of Hand Gestures for Human-Computer
     In this paper, we presented a novel approach to model             Interaction: A Review,”, IEEE PAMI, Vol. 19, No. 7,
the hand constraints. Our model has three characteristics.             July, pp.677-695, 1997
First, it is compact by utilizing PCA technique. Second, it
incorporates constraints that can and cannot be represented        [6] J. Rheg, T. Kanade, “Model-Based Tracking of Self-
by equations. Third, it displays a linear behavior in state            Occluding Articulated Objects”, IEEE Int’l Conf.
transitioning as a result of natural motion.         These             Computer Vision, pp.612-617. 1995.
properties together simplify configuration estimation in C-        [7] N. Shimada, et al., “Hand Gesture Estimation and
space as shown in Eq 5 by a simple interpolation with                  Model Refinement Using Monocular Camera-
linear polynomials. Some preliminary gesture estimation                Ambiguity Limitatio by Inequalty Constraints”, Proc.
experiments are shown, taking advantage of this model.                 Of the 3rd Conf. On Face and Gesture Recognition,
     However, there is still much to be done to improve                1998.
this model. For instance, more states can be included to
further refine the model. Deciding which states to choose          [8] Y. Wu, Thomas S. Huang, “Human Hand Modeling,
will require more analysis of the C-space. Furthermore,                Analysis and Animation in the Context of HCI”,
other constraints might exist in the C-space that haev not             ICIP99, Japan, Oct., 1999.
yet been observed. Finally, even though a nearly linear            [9] Y. Wu, Thomas S. Huang, “Capturing Human Hand
behavior is observed in state transition, it is not exactly            Motion: A Divide-and-Conquer Approach”, IEEE Int’l
linear. A more detailed study can better approximate the               Conf. Computer Vision, Greece, 1999.
trajectories, which in turn would help improve the