A New Passage Ranking Algorithm for Video Question Answering


Yu-Chieh Wu (1), Yue-Shi Lee (3), Jie-Chi Yang (2), and Show-Jane Yen (3)

(1) Department of Computer Science and Information Engineering, National Central University,
(2) Graduate Institute of Network Learning Technology, National Central University,
    No.300, Jhong-Da Rd., Jhongli City, Taoyuan County 32001, Taiwan, R.O.C.
    bcbb@db.csie.ncu.edu.tw, yang@cl.ncu.edu.tw
(3) Department of Computer Science and Information Engineering, Ming Chuan University,
    No.5, De-Ming Rd, Gweishan District, Taoyuan 333, Taiwan, R.O.C.
    {leeys, sjyen}@mcu.edu.tw

         Abstract. Developing a question answering (Q/A) system typically involves
         integrating abundant linguistic resources, such as syntactic parsers and named
         entity recognizers, which not only impose a time cost but are also unavailable in
         many other languages. Ranking-based approaches offer both efficiency and
         multilingual portability, but most of them are biased toward high-frequency words.
         In this paper, we propose a new passage ranking algorithm that extends textQ/A
         toward videoQ/A by searching lexical information in videos. The method takes both
         N-gram matches and word density into account and finds the optimal match sequence
         using dynamic programming. It is also efficient enough to handle real-time online
         video question answering. We evaluated our method with 150 actual users' questions
         on a 45GB video collection. Four well-known and multilingually portable ranking
         approaches were adopted for comparison. Experimental results show that our method
         outperforms the second-best approach by a relative 25.64% in MRR score.

1 Introduction

With the rapid expansion of video media sources such as news, TV shows, and movies,
there is an increasing demand for automatic retrieval and browsing of video data.
Generally speaking, the most effective way of managing video data is to support video
document retrieval based on input keyword queries. In a modern video retrieval scenario,
users query the system with short or natural language questions in their own language,
e.g., “Where is the beginning of the Chinese culture?” or “In China, what is the most
exquisite pottery city?” They expect the system to return short answers rather than
whole videos, which may be in different languages.
   Supporting these goals involves several research fields, such as video content
extraction, information processing, and question answering. Extracting content from
videos is a difficult and complex task that is becoming an important issue. Textual,
visual, and audio information are the most frequently adopted features for this purpose.
Among them, text in videos, especially closed captions, is the most powerful source of
high-level semantics, since it is closely related to the current content and
state-of-the-art optical character recognition (OCR) techniques are far more robust than
existing speech or visual object analysis approaches [11]. Thus, almost all video content
retrieval research starts with video text recognition [3] [8] [18] [21]. The well-known
Informedia and TREC-VID projects are typical examples.
   Over the past few years, several studies have addressed the use of either videoOCR or
speech recognition (SR) techniques to support video question answering. Lin et al. [9]
presented an early work that combined a simple videoOCR with lexical term weighting
methods. They focused on extracting the “white” text components in images and used a
hand-created keyword list to enlarge the lexicon. They also presented three strategies
for reducing OCR errors. In 2003, Yang et al. [19] proposed a complex videoQ/A system
that integrates abundant resources such as WordNet, the Internet, shallow parsers, named
entity taggers, and human-made rules. They employed the term weighting approach of [12]
to rank the video shots that contain high-frequency related words. When such a system is
ported to another language or domain, these components usually have to be re-trained on
a large annotated corpus, which often requires a huge effort, for example building
treebank bracketing corpora. Cao et al. [2] proposed a domain-dependent video question
answering system to help learners engage in the learning process. Unlike the previous
literature, Cao used pattern-matching approaches to pinpoint answers, where the pattern
set was constructed by a domain expert. In the same year, Wu et al. [18] presented a
cross-language videoQ/A framework based on the videoOCR techniques described in [6] [9].
They auto-translated Chinese OCR transcripts into English and performed English Q/A by
combining named entity taggers, mapping rules, and WordNet. Alternatively, Zhang and
Nunamaker [21] treated video clips as answers and applied the TFIDF (term frequency *
inverse document frequency) term weighting scheme to retrieve manually segmented clips.
In that approach, they also hand-created the ontology and combined rich resources such
as parsers and WordNet.
   In this paper, we present a new passage ranking algorithm for extending textQ/A toward
videoQ/A based on searching text information in videos. Users interact with our videoQ/A
system through natural language questions about the content of the expected videos. Our
system retrieves relevant video fragments and supports both visual and text outputs in
response to the user's questions. We consider a passage to be a suitable answer unit for
videos, since it forms a very natural unit of response for question answering. Lin et
al. [10] showed that users prefer passages over exact answer phrases because
paragraph-sized chunks provide context. In contrast to previous studies [2] [9] [18]
[19], the proposed method is relatively simple yet effective.

2 System Architecture and Video Processing

The overall architecture of the proposed videoQ/A system is shown in Fig. 1. First, the
video processing module recognizes the caption words as a transcript through text
localization, extraction & tracking, and OCR. Second, the Q/A processing module ranks
the segmented passages in response to the input question. In this section, we describe
the overall videoOCR processing. Section 3 presents the proposed passage ranking
algorithm.

                  Fig. 1. Framework of video question answering

2.1 Video Processing

Our video processing takes a video and recognizes the closed captions as text. An
example of the input and output of the whole video processing component can be seen in
Fig. 2. Our videoOCR consists of three important steps: text localization, extraction &
tracking, and OCR. The goal of text localization is to find the text area precisely.
Here, we employ edge-based filtering [1] [11] and a slightly modified version of the
coarse-to-fine top-down block segmentation method [8] to locate text components in a
frame. The former removes most non-edge areas with a global and local thresholding
strategy [5], while the latter incrementally segments and refines text blocks using
horizontal and vertical projection profiles.
    The extraction & tracking step integrates multiple frames and binarizes the text
pixels. A video consists of a sequence of image frames, and a text component frequently
appears in more than one frame. To integrate them, we slightly modify Lin's approach [9]
by counting the proportion of overlapping edge pixels between two frames. If the
proportion is above 70%, the two frames are considered to contain the same text area and
are merged by averaging the gray intensity of each pixel in the same text component.
After multi-frame merging, Lyu's text extraction algorithm [11] is used to binarize the
text components. Unlike previous approaches [3] [9], this method does not need to assume
that the text is either bright or dark (but it does assume that the text color is
stable). At the end of this step, the output text components are ready for OCR.
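
   As a rough illustration of the multi-frame integration rule described above, the
following Python sketch checks the 70% overlap threshold and averages gray intensities.
It is only a sketch under stated assumptions: representing edge pixels as coordinate
sets, the function names, and normalizing the overlap by the smaller edge set are
choices made here and are not specified in the paper or in [9].

```python
def edge_overlap_ratio(edges_a, edges_b):
    """Proportion of edge pixels shared by two frames.

    Each argument is a set of (row, col) edge-pixel coordinates.
    ASSUMPTION: the overlap is normalized by the smaller edge set; the
    paper only speaks of "the proportion of overlapping edge pixels".
    """
    if not edges_a or not edges_b:
        return 0.0
    return len(edges_a & edges_b) / min(len(edges_a), len(edges_b))


def same_text_area(edges_a, edges_b, threshold=0.70):
    """Two frames are treated as showing the same text above the threshold."""
    return edge_overlap_ratio(edges_a, edges_b) > threshold


def merge_frames(gray_a, gray_b):
    """Average the gray intensity pixel by pixel.

    Frames are equal-sized 2-D lists of gray values (hypothetical format).
    """
    return [[(pa + pb) / 2 for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(gray_a, gray_b)]


if __name__ == "__main__":
    a = {(0, 0), (0, 1), (0, 2), (1, 0)}
    b = {(0, 0), (0, 1), (0, 2), (2, 2)}
    if same_text_area(a, b):
        print(merge_frames([[100, 200]], [[200, 100]]))
```

In a real system the edge sets would come from the edge-based filtering step, and the
averaging would be restricted to the pixels of the tracked text component.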
   The aim of OCR is to convert the binarized text image into ASCII text. In this paper,
we re-implement a naïve OCR system based on nearest neighbor classification algorithms
and clustering techniques [6] [18] [3]. We also adopt the word re-ranking method (see
[9], strategy 3) to reduce OCR errors.

                  Fig. 2. Text extraction results of an input image

2.2 Document Processing and Passage Segmentation

Searching for answers in the whole video collection is impractical, since most of the
words are irrelevant to the question. With text retrieval technology, the search space
can be greatly reduced and limited to a small number of relevant documents. Document
retrieval methods are well developed and have been successfully applied to retrieving
relevant documents for question answering [12] [14] [20]. We replicated Okapi BM-25
[13], one of the most effective retrieval algorithms, to find the videoOCR documents
related to the input question. For Chinese word indexing, we simply use “overlapping
bigrams”, i.e., pairs of consecutive Chinese atomic characters, which have been
successfully applied to many Chinese document retrieval tasks [15].
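
   The overlapping bigram indexing unit can be sketched in a few lines of Python. The
function name and the handling of whitespace and single-character inputs are
assumptions made for this illustration.

```python
def overlapping_bigrams(text):
    """Return every pair of consecutive characters in `text`.

    Whitespace is dropped first; a single-character input is returned
    as-is (both are assumptions made for this sketch).
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


if __name__ == "__main__":
    for term in overlapping_bigrams("中國文化"):
        print(term)
```

Each bigram then serves as an index term for the document retrieval model.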
    Usually, words that occur in the same frame provide a sufficient and complete
description. We thus treat the words that appear in the same frame as a sentence.
Passages are then formed by grouping every 3 sentences, with one sentence overlapping
the previous passage. An example of a sentence can be found in Fig. 2. The sentence of
this frame is the concatenation of the two text lines, i.e., “speed-up to 1.75 miles/hr
in six minutes”.
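
   The passage segmentation rule (three sentences per passage, one sentence shared with
the previous passage) amounts to a sliding window with step 2. The sketch below is a
minimal Python version; the function name and the way a short trailing passage is kept
are assumptions, since the paper does not describe the boundary case.

```python
def segment_passages(sentences, size=3, overlap=1):
    """Group sentences into passages of `size` sentences.

    Consecutive passages share `overlap` sentences, so the window moves
    by `size - overlap`. A shorter final passage is kept (assumption).
    """
    step = size - overlap
    passages = []
    for start in range(0, len(sentences), step):
        passages.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break  # the window already reached the last sentence
    return passages


if __name__ == "__main__":
    for passage in segment_passages(["s1", "s2", "s3", "s4", "s5"]):
        print(passage)
```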

3 Passage Ranking Algorithm

The ranking model receives the segmented passages from the previous steps and outputs
the top-N passages in response to the question. Tellex et al. [16] compared seven
passage retrieval algorithms, such as BM-25 and density-based methods, on the TREC-Q/A
task, excluding two ad-hoc methods that needed either human-generated patterns or an
inference ontology, which were unavailable. They indicated that the density-based
methods achieved better performance than the other ranking models. In 2005, Cui et al.
[4] showed that their fuzzy relation syntactic matching method outperformed the
density-based methods by up to 78% in relative performance. Its limitation is that it
requires a dependency parser, a thesaurus, and training data. In many Asian languages
such as Chinese and Japanese, parsing is more difficult because the word segmentation
problem must be resolved first. Besides, developing a thesaurus and labeled training
examples for Q/A is costly and time-consuming. Compared to Cui's method, traditional
term weighting models are much less costly, more portable, and more practical.
     Basically, traditional term weighting methods give more weight to passages that
contain high-frequency term matches. Even though density-based methods further take the
word distribution into account, they are still biased toward high-frequency terms if the
passage contains abundant keywords rather than N-gram chunks. Usually, N-gram
information is much more important than high-frequency unigram words. For example, a
passage that contains the three unigrams “optical”, “character”, and “recognition”
frequently receives a score similar to that of a passage that contains the trigram
“optical ∧ character ∧ recognition”. An N-gram chunk is often much less ambiguous than
its individual unigrams. Thus, we put emphasis on N-gram matches and also take density
into account.
     To find the match sequence efficiently, we design an algorithm that approximately
discovers the optimal match sequence. First, we introduce the following notation:

                      passage $P = PW_1, PW_2, \ldots, PW_N$
                      question $Q = QW_1, QW_2, \ldots, QW_M$
                      match sequence $S_P = s_1, s_2, \ldots, s_N$

$PW_i$ and $QW_i$ are the i-th words in the passage and the question, respectively.
Here, a word is an atomic Chinese character. The match sequence $S_P$ represents the
lexical match relation obtained by mapping each question word to the passage: $s_i$
indicates whether the corresponding word in P matches (1) or does not match (0) a word
in Q. From a probabilistic point of view, given a passage P we want to find the best-fit
match sequence $S_P$ that maximizes $\mathrm{Prob}(S_P \mid P, Q)$, which can be
considered the generative probability of the sequence $S_P$ given P and Q. By applying
Bayes' rule, we have

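\[
\begin{aligned}
\mathrm{Prob}(S_P \mid P, Q) &= \frac{\mathrm{Prob}(P, Q \mid S_P)}{\mathrm{Prob}(P, Q)} \times \mathrm{Prob}(S_P)
  \cong \mathrm{Prob}(P, Q \mid S_P) \times \mathrm{Prob}(S_P) \\
&= \max_{\forall S_P} \{ \mathrm{Prob}(P, Q \mid S_P) \times \mathrm{Prob}(S_P) \}
\end{aligned}
\tag{1}
\]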

We skip $\mathrm{Prob}(P, Q)$ since it is the same for every match sequence.
$\mathrm{Prob}(S_P)$ represents the generative probability of $S_P$ among all possible
state sequences. The above equation searches for the optimal sequence that generates P
and Q with maximum probability. We further define the following equation to compute
$\mathrm{Prob}(S_P = \tilde{S}_P)$ for a given state sequence $\tilde{S}_P$.
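\[
\mathrm{Prob}(S_P = \tilde{S}_P) = \frac{1}{Z(S_P)} \sum_{j=2}^{ngram\_cnt - 1}
  \frac{|Sub_{j+1}|^{\alpha_1} + |Sub_j|^{\alpha_1}}{dist(Sub_j, Sub_{j+1})^{\alpha_2}}
  \times \frac{ngram\_cnt - 2}{M}
\tag{2}
\]

  where $Sub_j = PW_m, PW_{m+1}, \ldots, PW_{m+n}$ subject to
  $Sub_j = QW_{m'}, QW_{m'+1}, \ldots, QW_{m'+n}$, that is,
  $\tilde{s}_m = \tilde{s}_{m+1} = \tilde{s}_{m+2} = \cdots = \tilde{s}_{m+n} = 1$.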
$ngram\_cnt$ is the number of N-gram matches in the match sequence $S_P$. Here we add
two fixed additional N-grams (N=1) at the start-of-passage (SOP) and end-of-passage
(EOP), so that every match sequence contains at least three matched N-grams. $Sub_j$ is
the j-th N-gram of $\tilde{S}_P$, i.e., an N-gram match between the passage and the
question. It starts at the m-th word of P and exactly matches the equally long N-gram of
question words from $QW_{m'}$ to $QW_{m'+n}$. $\alpha_1$ and $\alpha_2$ are parameters
that control the impact of the subsequence length and the distance measurement. If we
simply set $\alpha_1 = 1$ and $\alpha_2 = 1$, equation (2) considers only the number of
N-gram matches rather than the length of the N-grams. We empirically set the two
parameters to $\alpha_1 = 1.5$ and $\alpha_2 = 0.5$, which were found to be effective in
our experiments. $Z(S_P)$ is a normalizing constant determined by the requirement that
$\sum_{S_P} \mathrm{Prob}(S_P) = 1$ over all $S_P$:

\[
Z(S_P) = \sum_{S_P} \sum_{j=2}^{ngram\_cnt - 1}
  \frac{|Sub_{j+1}|^{\alpha_1} + |Sub_j|^{\alpha_1}}{dist(Sub_j, Sub_{j+1})^{\alpha_2}}
  \times \frac{ngram\_cnt - 2}{M}
\tag{3}
\]

   The normalization factor $Z(S_P)$ in equation (2) does not affect the overall
computation; it merely serves as a constant multiplier. By deduction from equation (1),
the score of the given passage P is:

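\[
\begin{aligned}
\mathrm{Prob}(P, Q \mid S_P) &\cong \max_{S_P} \mathrm{Prob}(S_P \mid P, Q) \times \mathrm{Prob}(S_P)
  \cong \max_{S_P} \mathrm{Prob}(S_P) \\
&\cong \max_{S_P} \sum_{j=2}^{ngram\_cnt - 1}
  \frac{|Sub_{j+1}|^{\alpha_1} + |Sub_j|^{\alpha_1}}{dist(Sub_j, Sub_{j+1})^{\alpha_2}}
  \times \frac{ngram\_cnt - 2}{M}
\end{aligned}
\tag{4}
\]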
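
To make the scoring function concrete, the following Python sketch evaluates the
unnormalized sum of equations (2) to (4) for one given match sequence. It is an
illustration under explicit assumptions, not the authors' implementation: the input
format (a list of matched N-grams as start/length pairs), the function name, and in
particular the definition of dist() as the gap between the end of one N-gram and the
start of the next are choices made here, because the paper does not define dist().

```python
def sequence_score(ngrams, passage_len, question_len, alpha1=1.5, alpha2=0.5):
    """Unnormalized score of one match sequence.

    ngrams: list of (start, length) pairs with 1-indexed start positions in
            the passage, sorted and non-overlapping (hypothetical format).
    """
    # SOP and EOP are added as two extra unigrams, as stated in the text.
    subs = [(0, 1)] + list(ngrams) + [(passage_len + 1, 1)]
    ngram_cnt = len(subs)

    def dist(a, b):
        # ASSUMPTION: start of the later N-gram minus the last position of
        # the earlier one. The paper leaves dist() undefined.
        return b[0] - (a[0] + a[1] - 1)

    total = 0.0
    # The printed sum runs over j = 2 .. ngram_cnt - 1 (1-indexed) and pairs
    # Sub_j with Sub_{j+1}; with 0-indexed lists this is j = 1 .. ngram_cnt - 2.
    for j in range(1, ngram_cnt - 1):
        cur, nxt = subs[j], subs[j + 1]
        total += (nxt[1] ** alpha1 + cur[1] ** alpha1) / dist(cur, nxt) ** alpha2
    return total * (ngram_cnt - 2) / question_len


if __name__ == "__main__":
    # One trigram at passage positions 4..6, passage of 10 words, M = 3.
    print(sequence_score([(4, 3)], passage_len=10, question_len=3))
```

The normalizing constant $Z(S_P)$ is omitted because it is identical for every candidate
sequence of the same passage and therefore does not change which sequence is best.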

The goal of determining the optimal match sequence is to find the best-fit
$\tilde{S}_P$ that maximizes equation (4). However, there are $2^N$ possible sequences,
so the optimum match sequence satisfying equation (4) cannot be found by exhaustive
search in polynomial time. Therefore, we propose a Viterbi-like algorithm that
approximately finds the optimal match sequence using dynamic programming (see Fig. 4).
Fig. 3 lists the preprocessing step of the proposed algorithm.
   As shown in Fig. 3, the goal of Algorithm 1 is to produce the element list, which
stores all locations at which each question word appears in P. For question word
$QW_i$, $Loc^i_j$ indicates the j-th occurrence of $QW_i$ in P, where
$1 \le j \le L_i$ and $L_i$ is the number of times the question word appears in P.
   Algorithm 2 uses a Viterbi-like method to compute the optimal score in the forward
direction and to trace the path in the backward stage. The variable $\psi_t(Loc^t_j)$
records the node of the incoming arc that led to the most probable path. Using dynamic
programming, we can calculate the most probable path through the whole trellis, as
outlined in Algorithm 2. The transition score between two elements in Algorithm 2 is
defined by equation (5). Note that we give a discounted transition score¹ if the
position $Loc^{t+1}_j < Loc^t_i$. In this way, passages that preserve the word order of
the given question are given more weight. After path tracing,
$Q^* = \{q^*_1, q^*_2, \ldots, q^*_M\}$ indicates the best-fit state sequence
$S^* = \{s^*_1, s^*_2, \ldots, s^*_N\}$, obtained by setting the locations corresponding
to $q^*$ to 1 and all others to 0. By applying equation (4), the score of $S_P$ is
obtained. Due to the length limitation, we provide an example of estimating the passage
score with the above algorithms on the web page².

1   In this paper, we simply set the discounting factor as the square of the distance,
     which resulted in satisfactory ranking performance.

\[
a_{ij} = \frac{1}{y}, \qquad
y = \begin{cases}
Loc^{t+1}_j - Loc^t_i & \text{if } Loc^{t+1}_j > Loc^t_i \\
|Loc^{t+1}_j - Loc^t_i|^2 & \text{otherwise}
\end{cases}
\tag{5}
\]
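
As a small illustration of equation (5), the Python sketch below computes the transition
score between a location of question word t and a location of question word t+1. The
function name is hypothetical, and returning 0 when both locations coincide is an
assumption added here to avoid a division by zero; the paper does not discuss that case.

```python
def transition_score(loc_prev, loc_next):
    """Transition score a_ij = 1 / y from equation (5).

    loc_prev: passage position chosen for question word t.
    loc_next: candidate passage position for question word t + 1.
    """
    gap = loc_next - loc_prev
    if gap == 0:
        # ASSUMPTION: one passage position cannot serve two question words.
        return 0.0
    if gap > 0:
        y = gap          # word order preserved: plain distance
    else:
        y = gap ** 2     # word order violated: squared distance as discount
    return 1.0 / y


if __name__ == "__main__":
    print(transition_score(3, 6))  # in order, distance 3
    print(transition_score(6, 3))  # out of order, distance 3
```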

                       Algorithm 1: Extracting_Possible_Match_Position (P, Q)
                       Input: given a passage P = PW_1, PW_2, …, PW_N
                              question Q = QW_1, QW_2, …, QW_M
                       Output: Set of element lists { Loc^1_1, Loc^1_2, …, Loc^1_{L_1},
                               Loc^2_1, …, Loc^2_{L_2}, …, Loc^M_1, …, Loc^M_{L_M} }
                       For (i := 1 ~ M) {              // each question word
                               L_i := 0;
                               For (j := 1 ~ N) {      // each passage word
                                       If (QW_i = PW_j) {
                                               L_i ++;
                                               Loc^i_{L_i} := j;
                                       }
                               }
                       }
       Fig. 3. A preprocessing algorithm for finding the possible match positions

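
The preprocessing step of Fig. 3 can be written compactly in Python. The sketch below
is one possible rendering; the function name and the use of 1-indexed positions in plain
lists are choices made for this illustration.

```python
def extract_match_positions(passage, question):
    """Return, for each question word, all 1-indexed positions in the passage.

    The i-th inner list corresponds to Loc^i_1 .. Loc^i_{L_i}; it is empty
    when the question word does not occur in the passage.
    """
    return [[j for j, pw in enumerate(passage, start=1) if pw == qw]
            for qw in question]


if __name__ == "__main__":
    print(extract_match_positions("abcab", "ab"))
```

Since a word is an atomic character here, strings can be passed directly; lists of
tokens work in the same way.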
               Algorithm 2: Finding-the-Optimal-Match-Sequence (set_of_element-list)
               Input: Set of element lists { Loc^1_1, Loc^1_2, …, Loc^1_{L_1},
                      Loc^2_1, …, Loc^2_{L_2}, …, Loc^M_1, …, Loc^M_{L_M} }
               Output: Optimal path $Q^* = \{q^*_1, q^*_2, \ldots, q^*_M\}$

               Initialization: $\delta_1(Loc^1_i) = 1$ where $1 \le i \le L_1$
                               $\psi_1(Loc^1_i) = 0$

               Recursion:      $\delta_{t+1}(Loc^{t+1}_j) = \max_{1 \le i \le L_t} \{\delta_t(Loc^t_i)\, a_{ij}\}$
                               $\psi_{t+1}(Loc^{t+1}_j) = \arg\max_{1 \le i \le L_t} \{\delta_t(Loc^t_i)\, a_{ij}\}$
                               where $1 \le t \le M - 1$, $1 \le j \le L_{t+1}$

               Termination:    $q^*_M = \arg\max_{1 \le i \le L_M} \delta_M(Loc^M_i)$

               Backtracking:   $Q^* = \{q^*_1, q^*_2, \ldots, q^*_M\}$ so that $q^*_t = \psi_{t+1}(q^*_{t+1})$

Fig. 4. A Viterbi-like algorithm to find the optimal match sequence for a passage
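
The forward recursion and backtracking of Fig. 4 can be sketched in Python as follows.
This is a minimal sketch rather than the authors' code: the function names are
hypothetical, question words without any occurrence in the passage are simply skipped,
and a zero transition score is used when two locations coincide. Both of these boundary
rules are assumptions, since the paper does not specify them.

```python
def transition_score(loc_prev, loc_next):
    """a_ij = 1 / y as in equation (5); 0 for identical positions (assumption)."""
    gap = loc_next - loc_prev
    if gap == 0:
        return 0.0
    return 1.0 / gap if gap > 0 else 1.0 / gap ** 2


def best_match_path(loc_lists):
    """Return one passage position per matched question word.

    loc_lists: one list of 1-indexed passage positions per question word.
    """
    # ASSUMPTION: question words that never occur in the passage are skipped.
    layers = [locs for locs in loc_lists if locs]
    if not layers:
        return []
    delta = [1.0] * len(layers[0])        # initialization: delta_1 = 1
    backptr = []                          # psi for layers 2..M
    for t in range(1, len(layers)):
        new_delta, pointers = [], []
        for loc_next in layers[t]:
            scores = [d * transition_score(loc_prev, loc_next)
                      for d, loc_prev in zip(delta, layers[t - 1])]
            best_i = max(range(len(scores)), key=scores.__getitem__)
            new_delta.append(scores[best_i])
            pointers.append(best_i)
        delta = new_delta
        backptr.append(pointers)
    # Termination: best final node, then follow the back pointers.
    idx = max(range(len(delta)), key=delta.__getitem__)
    path = [idx]
    for pointers in reversed(backptr):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [layers[t][i] for t, i in enumerate(path)]


if __name__ == "__main__":
    # Question "abc" against passage "abxabc": a at 1,4; b at 2,5; c at 6.
    print(best_match_path([[1, 4], [2, 5], [6]]))
```

The returned positions are the entries of the match sequence that are set to 1; all
other passage positions are set to 0 before the passage score is computed.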

    Again, our method does not simply count the number of matched words in a passage;
instead, it seeks the match sequence that contains “dense” and “long” N-gram match
relations between P and Q. In contrast, the original density-based method does not try
to find the optimal match position for each word; rather, it estimates the term
distribution and the weighted number of matched words in the passage. When there are
only unigram matches between P and Q, our method behaves somewhat like traditional
density-based approaches. The difference in this case is that our method tries to find
the optimal “one-to-one” match relation that leads to the densest distribution, rather
than a “one-to-all” relation.

2   http://dblab87.csie.ncu.edu.tw/bcbb/tvqs/

4 Experimental Results

4.1 Dataset and Evaluations

The testing video corpus was mainly collected from 93 Discovery films. Table 1
summarizes the characteristics of this corpus. In comparison with previous videoQ/A
studies [2] [9] [18] [19] [21], which made use of fewer than 6 films and fewer than 40
questions, our video collection and test question set are substantially larger. The
video data were converted into OCR documents through the video processing (see Section
2.1). The videoOCR method recognized 49951 sentences and 677060 single Chinese
characters. On average, a video contains 537 sentences and 7280 Chinese characters.
   We collected 150 actual users' questions to evaluate the overall videoQ/A
performance. Following the answer assessment process of the TREC-Q/A task [17], each
question was judged by two assessors and disagreements were marked. A domain expert
reviewed each ambiguous label to determine or correct the final answer. If the system
returns the correct answer frames, the response is judged as a right answer. In this
paper, we use the mean reciprocal rank (MRR) score (Voorhees, 2000), which is widely
used for evaluation, to measure the passage ranking performance, while the recall and
precision rates are used to evaluate the accuracy.
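
   For reference, MRR with a rank cut-off can be computed as in the Python sketch
below. The input format (one list of correctness judgments per question, in ranked
order) and the function name are assumptions made for this illustration.

```python
def mean_reciprocal_rank(ranked_judgments, top_k=5):
    """Mean reciprocal rank over all questions.

    ranked_judgments: for each question, a list of booleans telling whether
    the passage at that rank is correct. Only the first `top_k` ranks count;
    a question with no correct passage among them contributes 0.
    """
    if not ranked_judgments:
        return 0.0
    total = 0.0
    for judgments in ranked_judgments:
        for rank, is_correct in enumerate(judgments[:top_k], start=1):
            if is_correct:
                total += 1.0 / rank
                break
    return total / len(ranked_judgments)


if __name__ == "__main__":
    runs = [[True], [False, True], [False, False, False]]
    print(mean_reciprocal_rank(runs, top_k=5))
```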

                   Table 1: Statistics of the Discovery programs
                Number of videos                                  93
                Number of recognized sentences                  49951
                Number of recognized Chinese characters        677060
                Total video data size                          45.2GB

4.2 Results

First, we evaluate our videoOCR method in terms of text localization and overall OCR
recognition rates. The Discovery collection is a large video dataset, which makes it
difficult to test comprehensively. Instead, we examine our method on a small sample of
the videos consisting of 30 short clips. Our videoOCR sampled two frames per second at
352x240 resolution from the MPEG-1 Discovery movies. In total, 1684 image frames were
derived from the 30 clips, containing 2195 text areas. Table 2 lists the experimental
results on text localization (Table 2(a)) and the overall videoOCR (Table 2(b)).
    For the overall videoQ/A evaluation, we compare our ranking algorithm with TFIDF
(term frequency multiplied by inverse document frequency), BM-25 [13], and density-based
[7] approaches. These ranking models received the same segmented passages and retrieved
top-ranked passages with their own methods. Empirically, we found that a retrieved
passage contains 48.78 Chinese words on average, which represents a very short but more
complete fragment than merely returning an answer phrase. Table 3 summarizes the overall
videoQ/A results using the different ranking algorithms.
   As seen in Table 3, our method achieved an MRR(top5) score of 0.572, which
outperformed the TFIDF, BM-25, and density-based approaches. In comparison to the
second-best method (TFIDF), the proposed ranking algorithm is relatively 6.31% better in
MRR(top5) score, and 5.52% and 5.75% better in terms of recall and precision rates.
Compared to the BM-25 model, our method is relatively 19.16%, 16.02%, and 15.72% better
in MRR(top5), recall, and precision rates, respectively.

Table 2(a): Text localization performance with pixel and area based evaluations
Recall (Pixel-based)    Precision (Pixel-based)    Recall (Area-based)   Precision (Area-based)
      92.97%                   95.06%                   98.93%                  97.63%

                       Table 2(b): Overall videoOCR performance
                             Recall       Precision      F1-measure
                            86.51%         83.05%          84.74%
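
As a consistency check on Table 2(b), the F1-measure is the harmonic mean of precision
and recall. Assuming that standard definition is the one used, the reported values fit
together:

```latex
\[
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2 \times 0.8305 \times 0.8651}{0.8305 + 0.8651} \approx 0.8474
\]
```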

     Table 3: System performance for different passage ranking algorithms
                              TFIDF        BM-25        Density-based     Our method
      MRR (Top1)              0.479        0.413           0.280            0.506
      MRR (Top5)              0.538        0.480           0.370            0.572
      Recall (Top5)           0.597        0.543           0.472            0.630
      Precision (Top5)        0.174        0.159           0.137            0.184
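
The relative improvements quoted in the text can be approximately recomputed from the
rounded values in Table 3, as in the Python sketch below. The helper name is
hypothetical, and the results may differ slightly in the last digit from the figures in
the text, presumably because the table entries are rounded.

```python
# Top-5 rows of Table 3, copied as printed.
TABLE3 = {
    "MRR (Top5)":       {"TFIDF": 0.538, "BM-25": 0.480, "Density-based": 0.370, "Our method": 0.572},
    "Recall (Top5)":    {"TFIDF": 0.597, "BM-25": 0.543, "Density-based": 0.472, "Our method": 0.630},
    "Precision (Top5)": {"TFIDF": 0.174, "BM-25": 0.159, "Density-based": 0.137, "Our method": 0.184},
}


def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    for metric, row in TABLE3.items():
        for baseline in ("TFIDF", "BM-25", "Density-based"):
            gain = relative_gain(row["Our method"], row[baseline])
            print(f"{metric:17s} vs {baseline:13s}: {gain:6.2f}%")
```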

5 Conclusion

Usually, question answering requires integrating abundant external knowledge. This
paper proposes a lightweight and multilingually portable videoQ/A system that extends
the textQ/A method to multimedia. The system returns the retrieved passages with the
corresponding video clips in response to the question. The experiments showed that the
proposed method outperforms the TFIDF, TF, BM-25, and density-based approaches in terms
of MRR score. In the future, we plan to adopt well-known speech recognition techniques
to enhance the system performance.

References


1. Cai, M., Song, J., and Lyu, M. R. A new approach for video text detection. In Proceedings of
   International Conference on Image Processing, pages 117-120, 2002.
2. Cao, J., and Nunamaker J. F. Question answering on lecture videos: a multifaceted approach,
   International Conference on Digital Libraries, pages 214 – 215, 2004.
3. Chang, F., Chen, G. C., Lin, C. C., and Lin, W. H. Caption analysis and recognition for
   building video indexing systems. ACM Multimedia systems, 10(4): 344-355, 2005.
4. Cui, H., Sun, R., Li, K., Kan, M., and Chua, T. Question answering passage retrieval using
   dependency relations. In Proceedings of the 28th ACM SIGIR Conference on Research and
   Development in Information Retrieval, pages 400-407, 2005.
5. Fan, J., Yau, D. K. Y., Elmagarmid, A. K., and Aref, W. G. Automatic image segmentation
   by integrating color-edge extraction and seeded region growing. IEEE Trans. On Image
   Processing, 10(10): 1454-1464, 2001.
6. Hong, T., Lam, S. W., Hull, J. J., and Srihari, S. N. The design of a nearest-neighbor
   classifier and its use for Japanese character recognition. In Proceedings of Third
   International Conference on Document Analysis and Recognition, pages 270-291, 1995.
7. Lee, G. G., Seo, J. Y., Lee, S. W., Jung, H. M., Cho, B. H., Lee, C. K., Kwak, B. K., Cha, J.
   W., Kim, D. S., An, J. H., and Kim, H. S. SiteQ: Engineering high performance QA system
   using lexico-semantic pattern matching and shallow NLP. In Proceedings of the 10th Text
   Retrieval Conference, pages 437-446, 2001.
8. Lienhart, R. and Wernicke, A. Localizing and segmenting text in images and videos. IEEE
   Trans. Circuits and Systems for Video Technology, 12(4): 243-255, 2002.
9. Lin, C. J., Liu, C. C., and Chen, H. H. A simple method for Chinese video OCR and its
   application to question answering. Computational linguistics and Chinese language process-
   ing, 6(2): 11-30, 2001.
10. Lin, J., Quan, D., Sinha, V., Bakshi, K., Huynh, D., Katz, B., Karger, D. R. What makes a
   good answer? The role of context in question answering. In Proceedings of the 9th
   International Conference on Human-Computer Interaction (INTERACT), pages 25-32, 2003.
11. Lyu, M. R., Song, J., and Cai, M. A comprehensive method for multilingual video text
   detection, localization, and extraction. IEEE Trans. Circuits and Systems for Video Tech-
   nology, 15(2): 243-255, 2005.
12. Pasca, M., and Harabagiu, S. High-performance question answering. In Proceedings of the
   24th ACM SIGIR Conference on Research and Development in Information Retrieval,
   pages 366-374, 2001.
13. Robertson, E., Walker, S., and Beaulieu, M. Okapi at TREC-7: automatic ad hoc, filtering,
   VLC and interactive track. In Proceedings of the 7th Text Retrieval Conference, 1998.
14. Rus, V., and Moldovan, D. High precision logic form transformation. International Journal
   on Artificial Intelligence Tools, 11(3): 437-454, 2002.
15. Savoy, J. Comparative study on monolingual and multilingual search models for use with
   Asian languages. ACM transactions on Asian language information processing (TALIP),
   4(2): 163-189, 2005.
16. Tellex, S., Katz, B., Lin, J. J., Fernandes, A., and Marton, G. Quantitative evaluation of
   passage retrieval algorithms for question answering. In Proceedings of the 26th ACM SIGIR
   Conference on Research and Development in Information Retrieval, pages 41-47, 2003.
17. Voorhees, E. M. Overview of the TREC 2001 question answering track. In Proceedings of
   the 10th Text Retrieval Conference , pages 42-52, 2001.
18. Wu, Y. C., Lee, Y. S., Chang, C. H. CLVQ: Cross-language video question/answering
   system. In Proceedings of 6th IEEE International Symposium on Multimedia Software En-
   gineering, pages 294-301, 2004.
19. Yang, H., Chaisorn, L., Zhao, Y., Neo, S. Y., and Chua, T. S. VideoQA: Question answering
   on news video. In Proceedings of the 11th ACM International Conference on Multimedia,
   pages 632-641, 2003a.
20. Yang, H., Chua, T. S., Wang, S. G., and Koh, C. K. Structural use of external knowledge
   for event-based open domain question answering. In Proceedings of the 26th ACM SIGIR
   Conference on Research and Development in Information Retrieval, pages 33-40, 2003b.
21. Zhang, D., and Nunamaker, J. A natural language approach to content-based video index-
   ing and retrieval for interactive E-learning. IEEE Transactions on Multimedia, 6(3): 450-
   458, 2004.
