
Web Track Submission, Imperial College

TREC-10

August 2001



1. Web Ad Hoc Task


This year’s Web Ad Hoc Task comprises 50 new queries, numbered 501 to 550.
Three runs were submitted: icadhoc1, icadhoc2 and icadhoc3.

icadhoc1 and icadhoc2 are derived from linkage analysis combined with the reuse of similarity
measures obtained from text analysis. Text-only search results were submitted in icadhoc3.

icadhoc1:

The authority values are computed according to the AVERAGE algorithm suggested in section 3.
The authority value of a page is the average of the similarity measures of the pages that link to it:

    \[ \mathrm{Authority}(p) \;=\; \frac{1}{|\{q : q \rightarrow p\}|} \sum_{q \rightarrow p} \mathrm{Similarity}(q) \]
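As an illustration only, the Python sketch below computes AVERAGE authority values from an
in-link map and a table of text-retrieval similarity scores. The identifiers (average_authority,
in_links, similarity) are assumptions made for this sketch, not names from the actual system.

    def average_authority(in_links, similarity):
        """AVERAGE (illustrative sketch): the authority of p is the mean similarity
        of the pages linking to p.

        in_links:   dict mapping a page to the set of pages q with a link q -> p
        similarity: dict mapping a page to its text-retrieval similarity for the query
                    (pages not retrieved are treated as having similarity 0).
        """
        authority = {}
        for p, sources in in_links.items():
            scores = [similarity.get(q, 0.0) for q in sources]
            authority[p] = sum(scores) / len(scores) if scores else 0.0
        return authority

    # Example: "c" is linked to by "a" (0.8) and "b" (0.4), so authority("c") = 0.6
    print(average_authority({"c": {"a", "b"}}, {"a": 0.8, "b": 0.4, "c": 0.5}))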

icadhoc2:

The authority values are computed according to the SIM algorithm suggested in section 3. The
authority value of a page is its own similarity measure plus the average of the similarity measures
of the pages that link to it (the page's own value plus the authority conferred by other pages):

    \[ \mathrm{Authority}(p) \;=\; \mathrm{Similarity}(p) \;+\; \frac{1}{|\{q : q \rightarrow p\}|} \sum_{q \rightarrow p} \mathrm{Similarity}(q) \]
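A corresponding sketch for SIM, under the same assumed data structures, simply adds the page's
own similarity to the AVERAGE term:

    def sim_authority(in_links, similarity):
        """SIM (illustrative sketch): a page's own similarity plus the average
        similarity of its in-links. Identifiers are assumptions for this sketch."""
        authority = {}
        for p, own_score in similarity.items():
            scores = [similarity.get(q, 0.0) for q in in_links.get(p, ())]
            avg = sum(scores) / len(scores) if scores else 0.0
            authority[p] = own_score + avg
        return authority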

icadhoc3:

Results of the Managing Gigabytes text search system (documents indexed in truncated form, as explained in section 3).


What is expected of the Web Ad Hoc Task submission?

The text-only search run was submitted as a baseline against which to assess the benefit or harm
of the link-based methods for web retrieval.

The AVERAGE and SIM algorithms were chosen because they yielded the best results for TREC-9's
fifty queries among all the link-based algorithms assessed in my experiments. Although these
algorithms did not improve on the average text-only search results for last year's queries, they
perform well when linkage analysis works, and when it fails the loss is only marginal. We want to
see whether AVERAGE and SIM fare better on this year's queries. Moreover, the human judgements
for TREC-10's Web Ad Hoc Task are an opportunity to test the validity of the technique underlying
AVERAGE and SIM: the combination of similarity-measure reuse and linkage analysis. Since these
algorithms are new, they may prove an interesting alternative to other link-based methods such as
HITS and PageRank.


2. Home Page Finding Task


This year’s Home Page Finding Task comprises 145 new queries, numbered 1 to 145.
Two runs were submitted: ichp1 and ichp2.

ichp1 is derived from linkage analysis. Text-only search results were submitted in ichp2.

ichp1 results from the merging of two ranking lists:

    -   a first list L1 of text-only search results, as submitted in ichp2.
    -   a second list L2 obtained by linkage and anchor-text analysis.

To construct L2, an “anchor-text index” was built, in which each document is replaced by the
anchor texts of its outgoing links. For each query, a text-based search is carried out on the
anchor-text index. The ranking thus obtained enables us to compute hub values for the retrieved
documents. The authority value of each document is then:

    \[ \mathrm{Authority}(p) \;=\; \sum_{q \rightarrow p} \mathrm{Hubvalue}(q) \]


Using these authority values, we build the ranking list L2, and then merge L1 and L2 using the
rank-merging technique discussed in section 3.4.
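The actual merging scheme used for ichp1 is the one described in section 3.4 and is not reproduced
here. Purely as an illustrative placeholder, the sketch below merges two ranked lists of document
ids by each document's best rank, breaking ties in favour of L1; the function name and tie-breaking
rule are assumptions for this sketch.

    def merge_ranked_lists(l1, l2):
        """Illustrative stand-in for the rank merge of section 3.4: order documents
        by their best (lowest) rank in either list, preferring L1 on ties."""
        rank1 = {doc: r for r, doc in enumerate(l1)}
        rank2 = {doc: r for r, doc in enumerate(l2)}
        worst = len(rank1) + len(rank2)  # rank assigned to documents missing from a list
        return sorted(
            set(rank1) | set(rank2),
            key=lambda d: (min(rank1.get(d, worst), rank2.get(d, worst)),
                           rank1.get(d, worst)),
        )

    # Example: merging ["d1", "d2", "d3"] with ["d2", "d4"] gives ["d1", "d2", "d4", "d3"]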


What is expected of the Home Page Finding submission?

The text-only search run was submitted as a baseline against which to assess the benefit or harm
of the link-based method for home page finding.

Since the anchor-text and rank-merging technique described above is the same one that improved
home page finding results for a set of 100 home pages in my experiments, I hope that ichp1 will
yield an improvement over ichp2. This would corroborate the results of WT10g's designers [1],
according to whom linkage analysis is beneficial to home page finding.




[1] P. Bailey, N. Craswell and D. Hawking, Engineering a multi-purpose test collection for Web
retrieval experiments, draft, Proc. 10th TREC, June 6, 2001.
