Multi-view Gymnastic Activity Recognition with Fused HMM

                      Ying Wang, Kaiqi Huang, and Tieniu Tan

                      National Laboratory of Pattern Recognition,
            Institute of Automation, Chinese Academy of Sciences, Beijing

       Abstract. More and more researchers are focusing on multi-view activity
       recognition, because a single fixed view cannot provide enough
       information for recognition. In this paper, we use multi-view features
       to recognize six kinds of gymnastic activities. First, shape-based
       features are extracted from two orthogonal cameras in the form of the
       R transform. Then a multi-view approach based on the Fused HMM is
       proposed to combine the different features for recognizing similar
       gymnastic activities. Compared with other activity models, our method
       achieves better performance even in the case of frame loss.

1    Introduction
Human activity recognition is a hot topic in computer vision. There is a wide
range of open questions in this field, such as dynamic background modelling,
object tracking under occlusion, and activity recognition [1]. Most previous
activity recognition methods depend on the view direction. These works make
the strong assumption that the low-level features for later activity
recognition are obtained without any ambiguity. However, recognizing actions
from a single camera is affected by the unavoidable fact that parts of the
action are not visible to the camera because of self-occlusion. Moreover, an
action looks different from each view, and some activities may not be captured
because of the loss of depth information. In [2], Madabhushi and Aggarwal
recognized 12 different actions in the frontal or lateral view using the
movement of the head, but they were not able to model and test all the actions
because of self-occlusion for some actions in the frontal view. Great efforts
have therefore been made to find robust and accurate approaches to this problem.
   In fact, while performing an action, the object essentially generates a
view-independent 3D trajectory or shape in (X, Y, Z) space with respect to time.
Thus 3D methods can recognize activity efficiently without the trouble of
self-occlusion and depth information loss. In [3], the authors extracted 3D
shape to recognize human posture using support vector machines. In [4],
Chellappa et al. chose six joint points of the body and calculated 3D invariants
for each posture. In [5], the Motion History Volume (MHV) was proposed to
extract view-invariant features in Fourier space for recognizing actions from a
variety of viewpoints. Alignment and comparison were performed using the
Fourier transform in

Y. Yagi et al. (Eds.): ACCV 2007, Part I, LNCS 4843, pp. 667–677, 2007.
© Springer-Verlag Berlin Heidelberg 2007

   [Figure: original video → extracted silhouettes → R transform → PCA feature →
   activity model parameters (two component HMMs with observations O11 ... O1N,
   O21 ... O2N and hidden states U11 ... U1N, U21 ... U2N) → recognized activity
   (AtoS, AtoC, Aup, AupLup, Lup, Still)]

               Fig. 1. The flowchart of multi-view activity recognition

cylindrical coordinates around the vertical axis [5]. All 3D methods need point
correspondence in order to use affine transformations in activity learning and
recognition, which has a high computational cost. To avoid this, a simple
mechanism is to use 2D data from several views, which can describe the
activities as a whole at low computational cost. Some approaches combine the
features from different views while constructing the activity model. In [6], Bui
et al. constructed the Abstract Hidden Markov Model (AHMM) to hierarchically
encode the wide spatial locations from different views and describe activities
at different levels of detail. Other researchers try to fuse multi-view
information directly at the feature level. In [7], Bobick et al. used two
cameras at orthogonal views to recognize activity by temporal templates. Motion
history information (MHI) was proposed to represent activity, but it carries
only temporal information and no spatial information about the motion. In [8],
Huang proposed a representation, “Envelop Shape”, obtained from the silhouettes
of objects using two orthogonal cameras for view-insensitive action recognition.
However, “Envelop Shape” simply overlaps the silhouettes from the two views,
which inevitably destroys the correspondence between consecutive frames of each
view. 2D data from multiple views are easy to acquire, but how to use them
efficiently deserves further research.
   In this paper, six kinds of gymnastic activities are recognized. From a
single view, many movements look so similar that they cannot be classified
correctly. As in [7,8], we use two cameras with orthogonal views to capture more
silhouette features. Unlike previous work, we use the R transform, a novel
shape descriptor, to represent gymnastic activities. Then activity models,
fused HMMs (FHMMs) based on the features extracted by the R transform, are
trained for the six kinds of gymnastic activities; these models can merge the
different activity features captured from the different views. The overall
system architecture is illustrated in Fig. 1.
   The remainder of this paper is organized as follows. Section 2 describes the
six kinds of gymnastic activities analysed in this paper. Sections 3 and 4
provide a detailed description of the R transform and FHMM. Section 5
demonstrates the effectiveness of the proposed method by comparison with other
activity models. Finally, some conclusions are drawn in Section 6.

2       Activity Description
In this study, we focus our attention on gymnastic activity. Gymnastics is
rhythmical, and each activity starts with the person standing motionless with
arms down and ends in the same stance. In this framework, we divide the
gymnastic video of each person into six kinds of activities. Table 1 describes
these activities with their respective activity numbers and abbreviations. Some
silhouette examples sampled from the video sequence of each activity are shown
in Fig. 2. In each sub-figure, the first row shows the silhouettes from the
frontal view and the second row shows the silhouettes from the lateral view.

                  Table 1. Description of the six activity types

 No.  Ab.     Activity description
 1    AtoS    Raise arms out to the side with elbows straight to shoulder
              height, hold the arms at that height, and then lower them to the
              sides of the body.
 2    AtoC    Lift both arms up towards the ceiling and then return them to the
              starting position.
 3    Aup     Raise one arm forward with elbow straight up to the ceiling and
              then lower it.
 4    AupLup  Lift both arms up towards the ceiling while moving one leg
              backwards, then move the arms backwards while moving one leg
              forwards, and finally return arms and leg to the starting
              position.
 5    Lup     Lift both hands up to shoulder height while raising one thigh
              until it is parallel to the floor with the knee pointing to the
              ceiling, and finally return arms and leg to the starting position.
 6    Still   Keep the body still (which occurs at the end of each activity).

   These activities are so similar that they are difficult to discriminate from
a single view. Because of the loss of depth information, fore-and-aft movement
facing the camera cannot be captured on a 2D image plane. The shape sequence
then shows only a little variance, which cannot represent the detailed activity
information and is hard to recognize, as shown in Fig. 2.4. For example, from
the frontal view, activity 1 and activity 5 are different movements but have
quite similar shape variance, as shown in Fig. 2.1 and 2.5. So do activities 2
and 4 (Fig. 2.2 and 2.4). From the lateral view, activities 1 and 6 have the
same shape variance (Fig. 2.1 and 2.6). In order to discriminate these seemingly
similar but actually different activities, two views are needed to provide more
abundant information. As shown in Fig. 2, some activity sequences have similar
variance from one view, but differences can be found from the other view. So
activity sequences that are easily misclassified from a single view can be
recognized correctly from two views.

   [Figure: silhouette sequences for each activity, frontal view in the first
   row and lateral view in the second row of each panel:
   (1) Hand upwards to shoulder height;
   (2) Two hands upwards to ceiling;
   (3) One hand upwards to ceiling;
   (4) Two hands upwards to ceiling, leg forwards and backwards alternately;
   (5) One leg upwards until it is parallel to the floor;
   (6) Body keeping still]

      Fig. 2. Examples of extracted silhouettes in video sequences from two views

3     Low-Level Feature Representation by R Transform

Feature representation is the key step of human activity recognition because
it abstracts the original data into a compact and reliable format for later
processing. In this paper, we adopt a novel feature descriptor, the R transform,
which is an extended Radon transform [10].
   The two-dimensional Radon transform is the integral of a function over the
set of lines in all directions, which is roughly equivalent to finding the
projection of a shape on any given line. For a discrete binary image f(x, y),
its Radon transform is defined by [11]:

   T_{Rf}(\rho, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, \delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = \mathrm{Radon}\{f(x, y)\}      (1)

where θ ∈ [0, π], ρ ∈ (−∞, ∞) and δ(·) is the Dirac delta function,

   \delta(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}      (2)

However, the Radon transform is sensitive to scaling, translation and rotation,
and hence an improved representation, called the R transform, is introduced
[9,10]:

   \mathcal{R}_f(\theta) = \int_{-\infty}^{\infty} T_{Rf}^{2}(\rho, \theta)\, d\rho      (3)

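   The R transform has several useful properties in shape representation for
activity recognition [9,10]:
   Translate the image by a vector \vec{u} = (x_0, y_0):

   \int_{-\infty}^{\infty} T_{Rf}^{2}(\rho - x_0\cos\theta - y_0\sin\theta, \theta)\, d\rho = \int_{-\infty}^{\infty} T_{Rf}^{2}(\nu, \theta)\, d\nu = \mathcal{R}_f(\theta)      (4)

   Scale the image by a factor α:

   \frac{1}{\alpha^{2}} \int_{-\infty}^{\infty} T_{Rf}^{2}(\alpha\rho, \theta)\, d\rho = \frac{1}{\alpha^{3}} \int_{-\infty}^{\infty} T_{Rf}^{2}(\nu, \theta)\, d\nu = \frac{1}{\alpha^{3}} \mathcal{R}_f(\theta)      (5)

   Rotate the image by an angle θ0:

   \int_{-\infty}^{\infty} T_{Rf}^{2}(\rho, \theta + \theta_0)\, d\rho = \mathcal{R}_f(\theta + \theta_0)      (6)

   According to the symmetric property of the Radon transform, and letting ν = −ρ:

   \int_{-\infty}^{\infty} T_{Rf}^{2}(-\rho, \theta \pm \pi)\, d\rho = -\int_{\infty}^{-\infty} T_{Rf}^{2}(\nu, \theta \pm \pi)\, d\nu = \int_{-\infty}^{\infty} T_{Rf}^{2}(\nu, \theta \pm \pi)\, d\nu = \mathcal{R}_f(\theta \pm \pi)      (7)
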
From equations (4)-(7), one can see that:

1. Translation in the plane does not change the result of the R transform.
2. A scaling of the original image only changes the amplitude. Here, in order
   to remove the influence of body size, the result of the R transform is
   normalized to the range [0, 1].
3. A rotation of θ0 in the original image leads to a phase shift of θ0 in the
   R transform. In this paper, the recognized activities rarely have such
   rotation.
4. Considering equation (7), the period of the R transform is π. Thus a
   180-dimensional shape vector is sufficient to represent the spatial
   information of a silhouette.
   Therefore, the R transform is robust to geometric transformation, which makes
it appropriate for activity representation. According to [9], the R transform
outperforms other moment-based descriptors, such as the Wavelet moment, Zernike
moment and Invariant moment, on similar but actually different shape sequences,
even in the case of noisy data.
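
   The definition and normalization above can be illustrated with a minimal
sketch. It is not the authors' implementation: it assumes a binary silhouette
given as a list of 0/1 rows, approximates the Radon projection by binning each
foreground pixel at the rounded value of x cos θ + y sin θ, and uses one angle
per degree. The function name r_transform and its parameters are hypothetical.
Because of the rounding, translation invariance holds only approximately at
most angles in this discretization.

```python
import math

def r_transform(image, n_angles=180):
    """Approximate R transform of a binary image (list of rows of 0/1).

    For each angle, foreground pixels are projected onto the line direction
    and binned by rounded rho (a crude discrete Radon projection). The R
    transform value is the sum of squared bin counts, and the curve is
    normalized to [0, 1] to remove the influence of body size.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    curve = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        projection = {}  # rho bin -> number of foreground pixels
        for x, y in points:
            rho = round(x * c + y * s)
            projection[rho] = projection.get(rho, 0) + 1
        curve.append(sum(count * count for count in projection.values()))
    peak = max(curve)
    return [v / peak for v in curve] if peak else curve

# A horizontal bar of 10 pixels: all pixels fall into one bin at 90 degrees.
bar = [[1] * 10]
curve = r_transform(bar)
print(len(curve), curve[0], curve[90])
```

   The normalization by the peak value corresponds to property 2, and the
default of 180 angles corresponds to property 4.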

   [Figure: three rows of key frames (Arm up and leg up; Arm back, leg forwards;
   Arm up, leg backwards). For each of the two views, the silhouette is followed
   by its R transform curve, plotted over 0 to 180 on the horizontal axis with
   values from 0 to 1 on the vertical axis.]

      Fig. 3. R transform of key frames for different activities from two views

   Fig. 3 shows silhouette examples extracted from different activities. Each
row shows the same frame from two views, and the sub-figure following each
silhouette is its R transform. The R transform curves of different activities
vary differently across the two views. For example, from the frontal view, the
R transform curve of the first row has two peaks, one at about 90° and the
other at about 170°. That of the second row has no peak, and that of the last
row has a peak close to 10°. This shows that the R transform can describe the
spatial information sufficiently and characterize the different activity
silhouettes.

4    Fused Hidden Markov Model

The next step is to combine the features obtained from the different views.
Here we employ FHMM, which was proposed by Pan for bimodal speech processing
[12]. Like the Coupled HMM (CHMM) [13], FHMM consists of two HMMs, as shown in
Fig. 4 (where circles represent observations and rectangles represent hidden
states; each red rectangle is one HMM component). However, whereas CHMM connects
the hidden states of the two HMMs, FHMM connects the hidden state nodes of one
HMM to the observation nodes of the other, as shown in Fig. 4.

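   [Figure: two panels, (1) CHMM and (2) FHMM. Each panel shows two component
   HMMs (HMM 1 and HMM 2) with observation nodes O11 ... O1N and O21 ... O2N and
   hidden state nodes U11 ... U1N and U21 ... U2N.]

               Fig. 4. The graphical structure of CHMM and FHMM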

   Assume O1 and O2 are two different observation videos. In FHMM parameter
training, the focus is on estimating the joint probability function P(O1, O2).
However, estimating the joint likelihood P(O1, O2) directly is not desirable,
because the computation is inefficient and a large amount of training data is
required. To solve this problem, Pan et al. train two HMMs separately and then
use their respective parameters to estimate an optimal solution for P(O1, O2)
[12,14]. In the maximum entropy sense, an optimal solution for P(O1, O2) is
given by the following equation [15]:

   P(O_1, O_2) = P(O_1)\, P(O_2)\, \frac{P(w, v)}{P(w)\, P(v)}      (8)

where w = f(O1), v = g(O2), and f(·) and g(·) are mapping functions, which must
satisfy the following requirements:
 1. The dependencies between w and v describe the dependencies between
    O1 and O2 to some extent.
 2. P(w, v) is easy to estimate.
   In other words, f(·) and g(·) should maximize the mutual information of w
and v [16]. This is an ill-posed inverse problem with more than one solution.
Specifically, we choose w = Û1 = arg max_{U1} log p(O1, U1) and v = O2 according
to the maximum mutual information criterion [16]. Then equation (8) is expressed
as:

   P(O_1, O_2) = P(O_1)\, P(O_2)\, \frac{P(\hat{U}_1, O_2)}{P(\hat{U}_1)\, P(O_2)} = P(O_1)\, P(O_2 \mid \hat{U}_1)      (9)

Finally, the computation of the joint probability P(O1, O2) is reduced to
estimating P(O1) and P(O2 | Û1). Accordingly, the learning algorithm of FHMM
includes three steps [12,14]:
1) Learn the parameters of the two individual HMMs independently by the EM
   algorithm: (Π1, A1, B1) and (Π2, A2, B2).
2) Determine the optimal hidden states of the HMMs using the Viterbi algorithm
   with the obtained parameters: Û1 and Û2.
3) Estimate the coupling parameters P(O2 | Û1) using the known parameters:

      P(O_2 = k \mid \hat{U}_1 = i) = \frac{\sum_{t=0}^{T-1} \delta(O_2^t - k)\, \delta(\hat{U}_1^t - i)}{\sum_{t=0}^{T-1} \delta(\hat{U}_1^t - i)}

      where k ranges over the observation values of O2 and i ranges over the
      hidden states of HMM 1.
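
   Steps 2) and 3) can be sketched as follows. This is only an illustration
under stated assumptions, not the implementation of [12,14]: the observations
are assumed to be discrete symbols (for example, vector-quantized features),
the HMM parameters (pi, A, B) are assumed to be already trained in step 1), and
the function names viterbi and estimate_coupling are hypothetical.

```python
import math

def _log(p):
    # log that tolerates zero probabilities
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, pi, A, B):
    """Most likely hidden state sequence of a discrete HMM (log domain)."""
    n = len(pi)
    score = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = score
        # best predecessor j for each current state i
        ptr = [max(range(n), key=lambda j: prev[j] + _log(A[j][i]))
               for i in range(n)]
        score = [prev[ptr[i]] + _log(A[ptr[i]][i]) + _log(B[i][o])
                 for i in range(n)]
        backpointers.append(ptr)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def estimate_coupling(obs2, states1, n_states, n_symbols):
    """Relative-frequency estimate of P(O2 = k | U1 = i) by counting frames."""
    counts = [[0] * n_symbols for _ in range(n_states)]
    for k, i in zip(obs2, states1):
        counts[i][k] += 1
    coupling = []
    for row in counts:
        total = sum(row)
        # states never visited get an all-zero row in this sketch
        coupling.append([c / total if total else 0.0 for c in row])
    return coupling

# Toy example with a 2-state HMM 1 and three observation symbols in view 2.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
states1 = viterbi([0, 0, 1, 1], pi, A, B)
print(states1)
print(estimate_coupling([2, 2, 0, 1], states1, n_states=2, n_symbols=3))
```

   The counting in estimate_coupling mirrors the ratio of sums in step 3):
the numerator counts frames where view 2 emits symbol k while HMM 1 is in state
i, and the denominator counts all frames in which HMM 1 is in state i.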

5     Experimental Analysis
5.1    Experimental Data and Feature Extraction
The experimental data are synchronized videos (320×240, 25 fps) obtained by two
cameras placed roughly orthogonally. The experiments are based on 300
low-resolution video sequences of 50 different people, each performing the six
gymnastic activities described in Table 1. The resulting silhouettes contain
holes and shadows due to imperfect background segmentation. To train the
activity models, holes, shadows and other noise are removed manually to form
ground truth data. 180 of the 300 sequences (30 of the 50 people) are used for
training, while the other 120 sequences (20 of the 50 people) are used for
recognition.
   Then the R transform is used to extract the spatial information of the
posture in the video. Because the R transform is non-orthogonal, the
180-dimensional shape vector is redundant. PCA is therefore employed to obtain
compact and accurate information from each video sequence. According to a
preliminary analysis of each activity, we find that 10 principal components are
enough to represent 98% of the variance. Then six FHMMs, each consisting of a
2-state HMM for the frontal view and a 3-state HMM for the lateral view, are
constructed to combine the features of the two views and model the six kinds of
activities in Table 1. FHMM achieves good recognition results, as shown in
Table 2 (each activity has 20 testing samples). Activity 4 achieves the best
recognition rate, 95%. The poorest result is 75%, for activity 3.

                    Table 2. Recognition results based on FHMM

Activity     1. AtoS  2. AtoC  3. Aup  4. AupLup  5. Lup  6. Still  Correct recognition rate
1. AtoS           16        2       1          0       0         1  80%
2. AtoC            1       17       2          0       0         0  85%
3. Aup             2        2      15          0       0         1  75%
4. AupLup          0        0       0         19       1         0  95%
5. Lup             0        0       0          1      18         1  90%
6. Still           1        1       2          0       0        16  80%
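
   The recognition rates in Table 2 can be checked with a few lines of
arithmetic. The sketch below assumes that each row of the table is the tested
activity, each column is the recognized label, and blank cells are zero; the
overall rate it prints is derived from the table and is not a figure quoted in
the text.

```python
# Confusion matrix copied from Table 2 (blank cells taken as 0).
LABELS = ["AtoS", "AtoC", "Aup", "AupLup", "Lup", "Still"]
CONFUSION = [
    [16, 2, 1, 0, 0, 1],
    [1, 17, 2, 0, 0, 0],
    [2, 2, 15, 0, 0, 1],
    [0, 0, 0, 19, 1, 0],
    [0, 0, 0, 1, 18, 1],
    [1, 1, 2, 0, 0, 16],
]

def per_class_rates(confusion):
    """Correct recognition rate of each activity: diagonal / row sum."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

def overall_rate(confusion):
    """Fraction of all test sequences that were recognized correctly."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total

for label, rate in zip(LABELS, per_class_rates(CONFUSION)):
    print(f"{label}: {rate:.0%}")
print(f"overall: {overall_rate(CONFUSION):.1%}")  # 101 of 120 sequences
```

   Each row sums to the 20 testing samples per activity, and the per-activity
rates reproduce the last column of Table 2.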

5.2    Comparison with Other Graphical Models
To evaluate its robustness and coupling ability, FHMM is compared with CHMM and
the Independent HMM (IHMM). The two component HMMs in CHMM and IHMM have the
same structures as those of FHMM, i.e. a 2-state HMM for the frontal view and a
3-state HMM for the lateral view. Both models use the same training and testing
data as FHMM.
   The structure of CHMM is shown in Fig. 4.1, and more details on parameter
training and inference can be found in [13].
   IHMM assumes that O1 and O2 are independent, so the joint probability of the
two observations is computed as P(O1, O2) = P(O1)P(O2). This means IHMM simply
multiplies the observation probabilities of two independent HMMs.
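
   The IHMM score can be illustrated with a short sketch. It assumes discrete
observation symbols and already trained parameters (pi, A, B) for each view,
all of which are assumptions made for illustration; the function names are
hypothetical. The product P(O1)P(O2) is computed in the log domain as a sum,
using the scaled forward algorithm for each HMM, and every observation is
assumed to have nonzero probability under the model.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs) of a discrete HMM via the scaled forward algorithm."""
    n = len(pi)
    log_prob = 0.0
    alpha = None
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                     for i in range(n)]
        scale = sum(alpha)  # P(o_t | o_0 ... o_{t-1}), assumed nonzero
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def ihmm_log_score(obs1, hmm1, obs2, hmm2):
    """log of P(O1) P(O2): the two views are scored independently."""
    return (forward_log_likelihood(obs1, *hmm1)
            + forward_log_likelihood(obs2, *hmm2))

# Toy models: each hmm is a (pi, A, B) tuple.
hmm_front = ([1.0], [[1.0]], [[0.5, 0.5]])
hmm_side = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]])
print(ihmm_log_score([0, 1, 0], hmm_front, [0, 1], hmm_side))
```

   Because the two terms are simply added, nothing in this score depends on
how the two views co-occur over time.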
   As shown in Fig. 5.1, although the three methods achieve different
recognition rates for each activity, the overall performance ranking is:

                              FHMM > CHMM > IHMM

   IHMM obtains the worst recognition performance. This is because it does
not consider the correlations of observation from two views. As described in
Section 2, some activity seem so similar just from one view, and thus it hard
to avoid misrecognition. This misrecognition increases linearly with the product
of P (O1 ) and P (O2 ). The recognition performance of CHMM is better than
that of IHMM but worse than that of FHMM, because CHMM optimizes all the
parameters globally by iteratively updating the component HMM’s parameters
and coupling parameter. Therefore more training data and more iterations are
needed for it to achieve convergence. Considering the same training data with
FHMM, but more requirements of training data, the parameters of CHMM may

                            1                                                                                           1                                                                                    1
                                                                                   CHMM                                                                                                                                                                            CHMM
[Figure: three bar-chart panels comparing CHMM, FHMM and IHMM. Vertical axis: correct recognition rate. Horizontal axis: activity 1 to 6 (AtoH; AtoC; Aup; AupLup; Lup; Still). Panels: 1. Ground truth data. 2. Frame loss data. 3. Frame loss data.]

Fig. 5. Recognition rates for CHMM, FHMM and IHMM in the case of different data

not be robust. Moreover, the two chains of a CHMM are linked only through their
hidden states, which cannot fully represent the statistical relationship between
observations extracted from different cameras: the dependence between the hidden
states is too weak to represent coupled observation sequences accurately. In
contrast, FHMM links the hidden states of one HMM to the observations of the
other, which gives it a stronger coupling ability than CHMM.
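
As a rough illustration of the two coupling schemes contrasted above, the structures are commonly written as follows. The notation is assumed for this sketch and may differ from the symbols used earlier in the paper: $O^{(1)}, O^{(2)}$ are the observation sequences of the two views, $u_t^{(i)}$ is the hidden state of chain $i$ at time $t$, and $\hat U^{(1)}$ is the most likely hidden state sequence of the first HMM.

```latex
% CHMM: each chain's state depends on the previous states of both chains
P\big(u_t^{(1)} \mid u_{t-1}^{(1)}, u_{t-1}^{(2)}\big), \qquad
P\big(u_t^{(2)} \mid u_{t-1}^{(1)}, u_{t-1}^{(2)}\big)

% FHMM: observations of one HMM are conditioned on the estimated
% hidden states of the other HMM
p\big(O^{(1)}, O^{(2)}\big) \approx p\big(O^{(1)}\big)\,
p\big(O^{(2)} \mid \hat U^{(1)}\big), \qquad
\hat U^{(1)} = \arg\max_{U} p\big(U \mid O^{(1)}\big)
```

In the first form the only link between the views passes through the state transitions. In the second the link is a direct state-to-observation term, which is the stronger coupling referred to in the text.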

5.3   Comparison Experiments in the Case of Frame Loss
To compare the robustness of CHMM, FHMM and IHMM, we simulate 120 sequences
(20 samples per activity type) with frame loss by removing 10 frames (frames 26
to 35; each activity has about 50 to 90 frames). Panel 2 of Fig. 5 illustrates
the recognition results for the three models. Although the recognition rates are
lower than on the ground truth data, FHMM still outperforms CHMM and IHMM.
This indicates that FHMM is more robust to frame loss in video than the other
two models.
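
The per-activity rates plotted in Fig. 5 can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the function name `per_class_rate` and the toy labels are hypothetical, and the predictions are made up for illustration rather than taken from the paper.

```python
from collections import Counter

# Activity names as they appear on the horizontal axis of Fig. 5
ACTIVITIES = ["AtoH", "AtoC", "Aup", "AupLup", "Lup", "Still"]

def per_class_rate(true_labels, predicted_labels):
    """Fraction of correctly recognized sequences for each activity."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    # Skip activities that have no test sequences to avoid dividing by zero
    return {a: correct[a] / totals[a] for a in ACTIVITIES if totals[a] > 0}

# Toy example with made-up predictions (not results from the paper)
true_labels = ["AtoH", "AtoH", "Aup", "Aup"]
predicted = ["AtoH", "AtoC", "Aup", "Aup"]
rates = per_class_rate(true_labels, predicted)
```

With 20 samples per activity, as in the experiments here, each rate is a multiple of 0.05.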
    To test the coupling ability of the three models, we also simulate 120
frame-loss sequences (20 samples per activity type), this time removing 10
frames from the frontal view (frames 26 to 35) and a different 10 frames from
the lateral view (frames 16 to 25). Panel 3 of Fig. 5 illustrates the
recognition results for the three models. Compared with panel 2, the performance
of FHMM and IHMM does not change much, but that of CHMM decreases noticeably
(compare the blue bars in panels 2 and 3). Note that CHMM does not always
perform better than IHMM: for activities 1 and 4, CHMM gets even worse results
than IHMM. Because the frame loss in the two views does not happen at the same
time, the state relationship coupled in CHMM is destroyed, which leads to a
lower recognition rate.
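
The two frame-loss settings described in this section can be simulated roughly as follows. This is a sketch under stated assumptions: the helper `drop_frames` is hypothetical, frame indices are taken to be 1-indexed and inclusive, and plain integers stand in for the per-frame feature vectors.

```python
def drop_frames(frames, first, last):
    """Return a copy of `frames` without frames first..last (1-indexed, inclusive)."""
    return frames[:first - 1] + frames[last:]

# Stand-ins for 60 per-frame feature vectors from each camera
frontal = list(range(1, 61))
lateral = list(range(1, 61))

# First setting: the same frames (26 to 35) are lost in both views
sync_frontal = drop_frames(frontal, 26, 35)
sync_lateral = drop_frames(lateral, 26, 35)

# Second setting: frames 26 to 35 are lost in the frontal view,
# and frames 16 to 25 in the lateral view
async_frontal = drop_frames(frontal, 26, 35)
async_lateral = drop_frames(lateral, 16, 25)
```

In the second setting the two views still have the same length, but the frames at a given index no longer come from the same time instant, which is the misalignment the text blames for the drop in CHMM performance.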

6     Conclusion
From the theoretical and experimental analysis of the proposed approach, we
find that FHMM based on the R transform descriptor has several advantages for
multi-view activity recognition. Firstly, only the silhouette is taken as input,
which is easier to obtain than meaningful feature points that need tracking and
correspondence. Secondly, the R transform descriptor captures both the boundary
and the internal content of the shape. The computation of the 2D R transform is
linear, so the computation cost is low. Moreover, the R transform performs well
for similar but actually different shape sequences, e.g. gymnastic activities.
Thirdly, activity features based on multiple views are easy to acquire and carry
abundant information for discriminating similar activities. Finally, compared
with CHMM and IHMM, FHMM gets the best performance with lower model complexity
and computational cost. Even in the case of frame loss data, FHMM shows strong
robustness, with great binding ability to couple inputs from two cameras.

Acknowledgement
The work reported in this paper was funded by research grants from the Na-
tional Basic Research Program of China (No. 2004CB318110), the National
Natural Science Foundation of China (No. 60605014, No. 60335010 and No.
2004DFA06900) and CASIA Innovation Fund for Young Scientists.

References

 1. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object
    motion and behaviors. IEEE Trans. on Systems, Man and Cybernetics, Part C:
    Applications and Reviews 34, 334–352 (2004)
 2. Madabhushi, A.R., Aggarwal, J.K.: Using movement to recognize human activity.
    In: ICIP, vol. 4, pp. 698–701 (2000)
 3. Cohen, I., Li, H.: Inference of human postures by classification of 3D human body
    shape. In: IEEE International Workshop on FG, pp. 74–81. IEEE Computer Society
    Press, Los Alamitos (2003)
 4. Parameswaran, V., Chellappa, R.: Human Action-Recognition Using Mutual
    Invariants. Computer Vision and Image Understanding 98, 295–325 (2005)
 5. Weinland, D., Ronfard, R., Boyer, E.: Free Viewpoint Action Recognition using
    Motion History Volumes. In: CVIU (2006)
 6. Bui, H., Venkatesh, S., West, G.: Policy Recognition in the Abstract Hidden Markov
    Model. Journal of Artificial Intelligence Research 17, 451–499 (2002)
 7. Bobick, A., Davis, J.: The recognition of human movement using temporal tem-
    plates. In: PAMI, vol. 23, pp. 257–267 (2001)
 8. Huang, F., Di, H., Xu, G.: Viewpoint Insensitive Posture representation for action
    recognition (2006)
 9. Wang, Y., Huang, K., Tan, T.: Human Activity Recognition based on R Transform.
    In: The 7th IEEE International Workshop on Visual Surveillance, IEEE Computer
    Society Press, Los Alamitos (2007)
10. Tabbone, S., Wendling, L., Salmon, J.-P.: A new shape descriptor defined on the
    Radon transform. Computer Vision and Image Understanding 102 (2006)
11. Deans, S.R.: Applications of the Radon Transform. Wiley Interscience Publications,
    Chichester (1983)
12. Pan, H., Levinson, S.E., Huang, T.S., Liang, Z.-P.: A Fused Hidden Markov Model
    with Application to Bimodal Speech Processing. IEEE Transactions on Signal Pro-
    cessing 52, 573–581 (2004)
13. Brand, M., Oliver, N., Pentland, A.: Coupled hidden Markov models for complex
    action recognition. In: CVPR, pp. 994–999 (1997)
14. Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications
    in Speech Recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
15. Luttrell, S.P. (ed.): The use of Bayesian and entropic methods in neural network
    theory. Maximum Entropy and Bayesian Methods, pp. 363–370. Kluwer, Boston
16. Pan, H., Liang, Z.-P., Huang, T.S.: Estimation of the joint probability of
    multisensory signals. Pattern Recognition Letters 22, 1431–1437 (2001)