
Modeling the Constraints of Human Hand Motion

John Lin, Ying Wu, Thomas S. Huang
Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801
{jy-lin, yingwu, huang}@ifp.uiuc.edu

Abstract

Hand motion capturing is one of the most important parts of gesture interfaces. Many current approaches to this task generally involve a formidable nonlinear optimization problem in a large search space. Motion capturing can be achieved more cost-efficiently by considering the motion constraints of a hand. Although some constraints can be represented as equalities or inequalities, there exist many constraints that cannot be explicitly represented. In this paper, we propose a learning approach to model the hand configuration space directly. The redundancy of the configuration space can be eliminated by finding a lower-dimensional subspace of the original space. Finger motion is modeled in this subspace based on the linear behavior observed in real motion data collected by a CyberGlove. Employing the constrained motion model, we are able to efficiently capture finger motion from video inputs. Several experiments show that our proposed model is helpful for capturing articulated motion.

1 Introduction

In recent years, there has been a significant effort devoted to gesture recognition and related work in body motion analysis, due to interest in a more natural and immersive Human-Computer Interaction (HCI). As the cost of more powerful computers decreases and PCs become more popular, a more natural interface is desired to replace traditional input devices such as the mouse and keyboard. Gestures, as one of the most natural ways humans communicate with each other, thus become an apparent choice for a more natural interface. Effective recognition of hand gestures would provide major advantages not only in virtual environments and other HCI applications, but also in areas such as teleconferencing, surveillance, and human animation.

Recognizing hand gestures, however, involves capturing the motion of a highly articulated human hand with roughly 30 degrees of freedom (DoF). Hand motion capturing involves finding the global hand movement and the local finger motion such that the hand posture can be recovered. One possible way to analyze hand motion is the appearance-based approach, which emphasizes the analysis of hand shapes in images [5]. However, local hand motion is very hard to estimate by this means. Another possible way is the model-based approach [3, 4, 6, 7, 9]. With a single calibrated camera, local hand motion parameters can be estimated by fitting a 3D hand model to the observed images.

One model-based method is to use gradient-based constrained nonlinear programming techniques to estimate the global and local hand motion simultaneously [6]. The drawback of this approach is that the optimization is often trapped in local minima. Another idea is to model the surface of the hand and estimate hand configurations using the "analysis-by-synthesis" approach [3]. Candidate 3D models are projected onto the image plane and the best match is found with respect to some similarity measurement. Essentially, this is a search problem in a very high-dimensional space, which makes the method computationally intensive. A decomposition method has also been adopted to analyze articulated hand motion by separating hand motion into its global motion and local finger motions [9].

Although the 3D model-based approach makes motion capturing from monocular images possible, it also faces some challenging difficulties. Many current methods for hand posture estimation involve searching for the optimal hand posture in a huge hand configuration space, due to the high DoF of hand geometry. Such a search is computationally expensive and the optimization is prone to local minima. At the same time, many current approaches suffer from self-occlusion.

However, although the human hand is a highly articulated object, it is also highly constrained: there are dependencies among fingers and joints. Applying the motion constraints among fingers and finger joints can greatly reduce the size or dimensionality of the search space, which in turn makes the estimation of hand postures more cost-efficient. Another major advantage of applying hand motion constraints is the ability to synthesize natural hand motion and produce realistic hand animation, which would be very useful for synthesizing sign languages.

Little has been done regarding the study of hand constraints beyond the commonly used ones. Even though constraints help reduce the size of the search space, too many or overly complicated constraints would also add to the computational complexity, so which constraints to adopt becomes an important issue. Some constraints have already been presented, studied, and used in many previous works [3, 4, 9]. The common ones include constraints on joints within the same finger, constraints on joints between fingers, and the maximum range of finger motions. All of these are presented as either equalities or inequalities. However, due to the large variation in finger motion, there are yet more constraints that cannot be explicitly represented by equations.

In this paper we propose a learning approach to model the constraints directly from sampled data in the hand configuration space (C-space). Each point in this configuration space corresponds to the set of joint angles of a hand state, which is what model-based approaches commonly estimate. Rather than studying the global hand motion, we focus only on the analysis of local finger motions and constraints, with the help of a CyberGlove developed by Virtual Technologies Inc. Moreover, we study the constraints of hand motions that are natural and feasible for everyone.

In Section 2, a description of a commonly adopted hand kinematical model and the CyberGlove is given. Section 3 describes how we model the configuration space and the observations of this model. Section 4 shows the results of some preliminary examples of hand posture estimation taking advantage of this model. Section 5 concludes our work and discusses some future directions regarding the modeling of human motion constraints.

2 Hand skeleton model

The human hand is highly articulated. To model the articulation of fingers, the kinematical structure of the hand should be modeled. In our research, the skeleton of a hand is abstracted as a stick figure, with each finger a kinematical chain whose base frame is at the palm and whose fingertip is the end-effector. Such a hand kinematical model is shown in Figure 1 with the names of each joint. This kinematical model has 27 degrees of freedom (DoF).

Figure 1: Kinematical structure and joint notations (thumb: IP, MCP, and TM joints; index, middle, ring, and pinky fingers: DIP, PIP, and MCP joints).

Each of the four fingers has four DoF. The distal interphalangeal (DIP) joint and the proximal interphalangeal (PIP) joint each have one DoF, and the metacarpophalangeal (MCP) joint has two DoF, due to flexion and abduction. The thumb has a different structure from the other four fingers and has five degrees of freedom: one for the interphalangeal (IP) joint, and two each for the thumb MCP joint and the trapeziometacarpal (TM) joint, due to flexion and abduction. The fingers together thus have 21 DoF. The remaining 6 degrees of freedom come from the rotational and translational motion of the palm, with 3 DoF each. These 6 parameters are ignored here, since we focus only on the estimation of the local finger motion rather than the global motion.

Articulated local hand motion, i.e. finger motion, can be represented by a set of joint angles θ, or the hand state. To capture hand motion, glove-based devices have been developed that directly measure the joint angles and spatial positions by attaching a number of sensors to the hand joints. The CyberGlove is such a device. The goal of vision-based analysis of hand gestures is to estimate the hand joint angles or hand states without such physical devices, based solely on visual information. A glove-based device can, however, help collect ground-truth data, which enables the modeling and learning processes in visual analysis.

In our study, we employ a right-handed CyberGlove. The glove has four sensors for the thumb; an MCP and a PIP sensor measuring the MCP (θ_MCP-F) and PIP (θ_PIP) flexion angles of each of the four fingers; and three more abduction sensors for the abduction/adduction angles (θ_MCP-AA) between these four fingers. There are in total fifteen sensor readings of the finger joint angles; we are therefore able to characterize the local finger motion by 15 parameters. The glove can be calibrated to measure the angles accurately to within 5 degrees, which is acceptable for gesture recognition, since finger postures that differ by five degrees still appear to be the same posture.

3 Modeling the constraints

Modeling motion constraints is crucial to effective and efficient motion capturing. A comprehensive study of hand/finger motion constraints and a learning approach to modeling the natural movement constraints are given in this section.
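To make the stick-figure model of Section 2 concrete, the sketch below computes a fingertip position for one non-thumb finger treated as a planar 3-link chain (MCP, PIP, DIP flexion in a single plane, as the planar-manipulator assumption allows). The link lengths are illustrative values chosen by us, not measurements from the paper.

```python
import numpy as np

# Planar forward kinematics for one finger, modeled as a 3-link chain
# (MCP -> PIP -> DIP), following the stick-figure abstraction of Section 2.
# Link lengths below are assumed, illustrative values (cm).
LINKS = np.array([4.5, 2.5, 2.0])  # proximal, middle, distal phalanges

def fingertip(theta_mcp, theta_pip, theta_dip):
    """Fingertip position in the finger's flexion plane.

    The MCP joint sits at the origin; angles are flexions in radians,
    accumulated along the chain as in standard planar kinematics.
    """
    angles = np.cumsum([theta_mcp, theta_pip, theta_dip])
    x = float(np.sum(LINKS * np.cos(angles)))
    y = float(np.sum(LINKS * np.sin(angles)))
    return x, y

# A fully extended finger lies along the x-axis:
print(fingertip(0.0, 0.0, 0.0))  # (9.0, 0.0)
```

Applying an intra-finger coupling such as Eq. 4 would simply mean calling `fingertip(theta_mcp, theta_pip, 2.0 / 3.0 * theta_pip)`, so the chain is driven by two parameters instead of three.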
3.1 Constraints overview

Hand/finger motion is constrained: the hand cannot make arbitrary gestures. There are many examples of such constraints. For instance, fingers cannot bend backward very far, and the pinky cannot be bent without bending the ring finger. The natural movements of human hands are implicitly shaped by such motion constraints. Some motion constraints have a closed-form representation, and these are often employed in current research on animation and visual motion capturing [3, 4, 9]. However, a large number of motion constraints are very difficult to express in closed form, and how to model such constraints still needs further investigation. Here we present some of the most commonly used motion constraints and justify the use of 15 parameters to represent hand motion.

Hand constraints can be roughly divided into three types. Type I constraints are the limits of finger motion that result from hand anatomy, usually referred to as static constraints. Type II constraints are the limits imposed on joints during motion, usually referred to as dynamic constraints in previous work. Type III constraints are those observed in performing natural motion, and have not yet been explored. Below we describe each type in more detail.

Type I constraints. This type refers to the limits of the range of finger motion as a result of hand anatomy. We only consider the range of motion that each finger can achieve without applying external forces, such as bending a finger backward with the other hand. This type of constraint is usually represented by the following inequalities:

  0° ≤ θ_MCP-F ≤ 90°,  0° ≤ θ_PIP ≤ 110°,  0° ≤ θ_DIP ≤ 90°,  and  −15° ≤ θ_MCP-AA ≤ 15°.  (1)

Another commonly adopted constraint states that the middle finger displays little abduction/adduction motion, so the following approximation is made for the middle finger:

  θ_MCP-AA = 0.  (2)

This removes one DoF from the 21-DoF model. Similarly, the TM joint also displays limited abduction motion and is approximated by 0 as well:

  θ_TM-AA = 0.  (3)

As a result, thumb motion is characterized by 4 parameters instead of 5. Finally, the index, middle, ring, and little fingers are planar manipulators, i.e. the DIP, PIP, and MCP joints of each finger move in one plane, since the DIP and PIP joints each have only 1 DoF, for flexion.

Type II constraints. This type refers to the limits imposed on joints during finger motion. These constraints are often called dynamic constraints and can be subdivided into intra-finger and inter-finger constraints. Intra-finger constraints hold between joints of the same finger. A commonly used one, based on hand anatomy, states that in order to bend the DIP joint, the PIP joint must also be bent, for the index, middle, ring, and little fingers. The relation can be approximated as follows:

  θ_DIP = (2/3) θ_PIP.  (4)

By combining Eqs. 2-4, we are able to reduce the 21-DoF model to one approximated by 15 DoF. Experiments in previous work have shown that postures can be estimated using these constraints without severe degradation in performance.

Inter-finger constraints are imposed on joints between fingers. For instance, when one bends the index finger at the MCP joint, one naturally has to bend the middle MCP joint as well. Many such Type II constraints and related equations can be found in [2, 4]. However, there are yet more constraints that cannot be explicitly represented by equations.

Type III constraints. These constraints are imposed by the naturalness of hand motion and are more subtle to detect. Almost nothing has been done to account for these constraints when simulating natural hand motion. Type III constraints differ from Type II in that they have nothing to do with limitations imposed by hand anatomy; rather, they result from common and natural movements. Even though the naturalness of hand motion differs from person to person, it is similar for everybody. For instance, the most natural way for a person to make a fist from an open hand is to curl all the fingers at the same time instead of curling one finger at a time. This type of constraint also cannot be explicitly represented by equations.

3.2 Modeling the constraints in C-space

It is difficult to explicitly represent the constraints of natural hand motion in closed form. However, they can be learned from a large and representative set of training samples; we therefore propose to construct the configuration space (i.e., the joint angle space) and learn the constraints directly from empirical data using the approach described below. For notational convenience, let us denote the feasible C-space by Φ ⊂ R^15, with each configuration denoted by φ = (θ_1, θ_2, ..., θ_15).

1. Locating base states ζ_i in Φ. We directly locate the base states by fixing the hand in desired configurations and measuring the 15 parameters associated with the corresponding state. Since the sensors are very sensitive to finger movement, small variations in finger posture will also be recorded and should be treated as the same state. We therefore use the centroid of the set of training data D_i = {x_ij, j = 1..N} as the location of the base state ζ_i. An alternative would be to collect a huge set of training samples x_i from predefined motions and apply a clustering algorithm to locate the base states. However, since we have full control of how a hand must be configured to form each base state, we do not need to apply clustering algorithms to locate the base states in C-space.

In our model, the hand gestures are roughly classified into 32 discrete states by quantizing each finger into one of two states: fully extended or curled. The reason for choosing these two states is that the entire motion of a finger falls roughly between them; the whole set of 32 states therefore roughly characterizes the entire hand motion (Figure 2a). However, since not everyone is able to bend the pinky without bending the ring finger (or without the thumb holding the pinky), four of the states are not achievable by everyone without applying external forces. These four states (Figure 2b) are therefore not included in our set of base states for C-space modeling. Finally, configurations that are similar are considered the same state. For instance, the case with the five fingers spread wide apart and the case with all fingers straightened but held together are considered the same.

Figure 2a: Some base hand states. Figure 2b: Unfeasible configurations.

2. Motion modeling. With the set of base states ζ_i established, we then collect motion data for state transitions in order to model the configurations traversed during natural hand motion. A large number of motion data sets are collected in order to observe the Type II and III constraints of natural hand motion. An example of the motion between making and opening a fist is shown in Figure 3.

Figure 3: Joint angle measurements (in radians) from the motion of making and opening a fist.

3. Dimensionality reduction. From Figure 3 we can clearly observe correlations among the joint angle measurements. Therefore, using the data collected from both the static states and the finger motions, we perform Principal Component Analysis (PCA) to reduce the dimension of the model, and thus the search space, while preserving the components with the highest energy. We note that 95% of the energy is contained in the 7 dimensions with the largest eigenvalues. We thus perform the mapping R^15 → R^7 on Φ by projecting the original model onto a lower-dimensional subspace Φ_c ⊂ R^7 spanned by the principal directions associated with these 7 largest eigenvalues.

4. Interpolation in the compressed C-space. Once the set of base states ζ_i has been determined, the whole feasible configuration space Φ can be approximated from these base states together with an interpolation scheme. Our approach takes a linear interpolation in the lower-dimensional configuration subspace Φ_c. Each configuration φ_c in Φ_c is represented using a linear interpolation, i.e.,

  φ_c = Σ_{i=1..28} α_i ζ_i^c,  (5)

in which ζ_i^c is the location of base state i and the α_i are the parameters for φ_c.

3.3 Model characteristics

Our model has three main advantages that help reduce the search space in gesture recognition. First, the model is compact due to the dimensionality reduction by PCA. Second, the motion constraints are automatically incorporated into the model. Third, a linear behavior is observed in the state transitions in C-space.

Motion constraints are incorporated into this model because we sample directly from natural hand motions. Configurations outside the permissible range limited by hand anatomy are not achievable in natural hand motion. Consequently, the inequalities and equalities, including the intra-finger constraints [3, 4] such as θ_MCP-F = k θ_PIP with k ≥ 0, and the inter-finger constraints [4], are automatically covered by this model.

An interesting phenomenon regarding the Type III motion constraints is observed in the motion data: we observe a nearly linear transition between states in C-space. An example is shown for the case of transitioning between four states, obtained by moving only the index and middle fingers (Figure 4). We have projected the C-space into R^2 for observation in this case. The four corners are the locations of the four discrete base states, and a linear transition is clearly visible. The middle lines are the paths resulting from curling and extending both fingers together, which reflects the high correlation between fingers during natural movements. Although state transitions need not be performed in this manner, and there exist infinitely many ways to move from one configuration to another, when the fingers move in their most natural way they take a nearly straight-line path in C-space. This observation justifies Eq. 5 for estimating hand configurations. Another example is shown for three-finger motions projected into R^3; the eight base states are roughly located at the eight corners of a cube (Figure 5).

Figure 4: Motion transitions between four states. Figure 5: Motion transitions between eight states.

4 Experiments

In order to evaluate the validity of this model, we performed experiments using low-level visual features to estimate postures constituted by a subset of the 28 base states. The input images are assumed to be segmented.

4.1 General approach

Using the observed linear behavior, we approximate a configuration by the following steps:

1. In the training stage, associate each base state ζ_i^c with a feature vector ψ_i.
2. Extract features ψ_input from the input 2D image, such as edges, area, centroid, etc.
3. Compute α_i = h(ψ_i, ψ_input), where h(ψ_i, ψ_input) measures the closeness of ψ_input to ψ_i.
4. Based on the observation made from the Type III motion constraints, linearly interpolate the estimated configuration in the compressed space Φ_c:

  φ_c^estimate = Σ_{i=1..28} α_i ζ_i^c.  (6)

5. Reconstruct the estimated configuration φ_estimate ∈ R^15 from φ_c^estimate.

4.2 Experimental results

Figure 6: Configuration estimations.
Figure 7: Comparison of different techniques. (a) original image. (b) estimation without Type II & III constraints. (c) estimation without Type III constraints.
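The five-step procedure of Section 4.1 can be sketched in a few lines, under assumed shapes and data. The paper does not specify the closeness function h or the feature extractor, so the softmax over negative feature distances below is our own placeholder, and all arrays are randomly generated stand-ins for the learned base states and PCA statistics.

```python
import numpy as np

# Assumed setup: 28 base states in the PCA-compressed C-space (R^7),
# one (hypothetical) 5-dimensional feature vector per base state, and
# PCA statistics mapping R^7 back to the full R^15 joint-angle space.
rng = np.random.default_rng(0)
zeta_c = rng.normal(size=(28, 7))      # base states zeta_i^c
psi_base = rng.normal(size=(28, 5))    # feature vectors psi_i
mean15 = np.zeros(15)                  # PCA mean of the training data
components = rng.normal(size=(7, 15))  # principal directions (rows)

def estimate(psi_input):
    """Steps 3-5 of Section 4.1 for one input feature vector."""
    # Step 3: alpha_i = h(psi_i, psi_input); here h is an assumed
    # softmax over negative distances, so the weights sum to one.
    d = np.linalg.norm(psi_base - psi_input, axis=1)
    w = np.exp(-d)
    alpha = w / w.sum()
    # Step 4: linear interpolation in the compressed space (Eq. 6).
    phi_c = alpha @ zeta_c
    # Step 5: reconstruct the estimated 15-DoF configuration.
    return mean15 + phi_c @ components

phi_estimate = estimate(rng.normal(size=5))
print(phi_estimate.shape)  # (15,)
```

Because the weights are convex combinations of feasible base states, the reconstruction stays near the learned subspace of natural configurations, which is exactly how the model keeps the Type II and III constraints implicit.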
Figure 7 (continued): (d) estimation with Type I, II & III constraints.

The results of the experiments are shown in Figure 6. The first row shows some input images and the second row shows the reconstructed 3D hand model based on the estimates produced by our approach. The results are visually agreeable. Such preliminary experiments show that motion constraints play an important role in hand posture estimation. More accurate and cost-efficient estimation can be obtained when a better motion constraint model is applied, and better results can be obtained with better feature extraction methods, which will be implemented in future research.

A comparison of estimations using different types of constraints is shown in Figure 7. In Figure 7(b), estimation without applying the Type II and III constraints results in a feasible yet unnatural configuration. In Figure 7(c), a closer approximation is obtained without applying the Type III constraints, although the DIP and PIP joints should bend more to approximate a fist. Finally, applying all three types of constraints together produces the best result, with a more natural approximation, in Figure 7(d).

5 Conclusion/Future Development

A posture estimation problem generally involves a search in a high-dimensional C-space. Useful hand constraints have been demonstrated to greatly reduce the search space, and thus improve gesture recognition results. Many constraints can be represented in simple closed forms, while many more cannot and have not yet been found.

In this paper, we presented a novel approach to modeling the hand constraints. Our model has three characteristics. First, it is compact, through use of the PCA technique. Second, it incorporates constraints that both can and cannot be represented by equations. Third, it displays a linear behavior in state transitions as a result of natural motion. These properties together simplify configuration estimation in C-space, as shown in Eq. 5, to a simple interpolation with linear polynomials. Some preliminary gesture estimation experiments taking advantage of this model were shown.

However, there is still much to be done to improve this model. For instance, more states can be included to further refine the model; deciding which states to choose will require more analysis of the C-space. Furthermore, other constraints might exist in the C-space that have not yet been observed. Finally, even though a nearly linear behavior is observed in state transitions, it is not exactly linear. A more detailed study could better approximate the trajectories, which in turn would help improve the configuration estimation. Nevertheless, such modeling provides a different interpretation of hand motions, and the current results look promising.

Acknowledgement

This work was supported in part by the National Science Foundation Alliance Program and Grant CDA 96-24396.

References

[1] C. Chang, W. Tsai, "Model-Based Analysis of Hand Gestures From Single Images Without Using Marked Gloves Or Attaching Marks on Hands", ACCV2000, pp. 923-930, 2000.
[2] C.S. Chua, H.Y. Guan, Y.K. Ho, "Model-based Finger Posture Estimation", ACCV2000, pp. 43-48, 2000.
[3] J. Kuch, T.S. Huang, "Vision-Based Hand Modeling and Tracking for Virtual Teleconferencing and Telecollaboration", ICCV95, pp. 666-671, 1995.
[4] J. Lee, T. Kunii, "Model-based Analysis of Hand Posture", IEEE Computer Graphics and Applications, Sept., pp. 77-86, 1995.
[5] V. Pavlovic, R. Sharma, T.S. Huang, "Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review", IEEE PAMI, Vol. 19, No. 7, July, pp. 677-695, 1997.
[6] J. Rehg, T. Kanade, "Model-Based Tracking of Self-Occluding Articulated Objects", IEEE Int'l Conf. Computer Vision, pp. 612-617, 1995.
[7] N. Shimada, et al., "Hand Gesture Estimation and Model Refinement Using Monocular Camera - Ambiguity Limitation by Inequality Constraints", Proc. of the 3rd Conf. on Face and Gesture Recognition, 1998.
[8] Y. Wu, T.S. Huang, "Human Hand Modeling, Analysis and Animation in the Context of HCI", ICIP99, Japan, Oct., 1999.
[9] Y. Wu, T.S. Huang, "Capturing Human Hand Motion: A Divide-and-Conquer Approach", IEEE Int'l Conf. Computer Vision, Greece, 1999.
