					               An Experiment in Social Search

                            Jill Freyne1 and Barry Smyth1

               Smart Media Institute, Department of Computer Science,
                 University College Dublin, Belfield, Dublin 4, Ireland
                        {Jill.Freyne, Barry.Smyth}@ucd.ie



        Abstract. Social search is an approach to Web search that attempts
        to offer communities of like-minded individuals more targeted search
        services, based on the search behaviour of their peers, bringing together
        ideas from Web search, social networking and personalization. In this
        paper we describe the I-SPY architecture for social search and present
        the results of a recent live-user evaluation that highlight the potential
        benefits of our approach in a realistic search setting.


1     Introduction
The brief history of Web search to date is characterised by a variety of significant
technological developments that have seen Web search engines evolve beyond
their information retrieval (IR) origins. For example, Meta-search [1, 2] was an
early attempt to combine the capabilities of many underlying search engines in
order to improve overall coverage and relevance ranking. More recently, search
engines such as Google [3] have argued for the need to consider factors such as
link-connectivity information (in addition to the more traditional IR term-based
factors) as a way to guide search towards more informative documents; see also
[4]. Currently most of the main Web search engines adopt a “one size fits all”
approach to search—two users with the same query receive the same result-list,
regardless of their preferences or context—and while there is broad agreement
that this is far from optimal, practical personalization techniques that can both
cope with Internet-scale search tasks and remain acceptable to today’s
privacy-conscious users have been slow to emerge. That said, a number of
researchers have looked at the issue of context-sensitive search, where the
search engine draws additional context information from the searcher [5].
    One area of research that may offer the right mix of personalization and
user privacy comes from recent work on the intersection between social networking
and Web search. Social networking applications such as Friendster
(www.friendster.com) or Orkut (www.orkut.com) allow users to create, maintain and
participate in online communities, and provide a range of applications and services
to help these communities socialise more effectively on-line and off-line.
Inevitably, the members of a given community will all share certain characteristics
and interests; for example, the members of a “caving and potholing” community will
obviously all have an interest in caving and potholing activities, but they are also
likely to share a range of peripheral preferences, such as a general interest in
outdoor activities or regular travel. The point is that these well-defined
communities of like-minded individuals provide a backdrop for social search
applications: search engines that are sensitive to the needs and preferences of a
specific community of users, operating within a well-defined topical domain. Indeed
social search is an increasingly important topic for the search industry and many
commentators have predicted that it will become the “next big thing”. In this paper
we will describe our own model of social search, called I-SPY, but other similar
services such as Eurekster (www.eurekster.com) are rapidly emerging to try to take
advantage of this social dimension to search. Indeed it is worth speculating about
the intentions of Google in this regard, especially since they have launched their
own social networking service (www.orkut.com) but have yet to announce how
they plan to integrate it with the Google search engine.

    The support of the Informatics Research Initiative of Enterprise Ireland is
    gratefully acknowledged.

    I-SPY operates as a post-processing engine for traditional search engines,
or as a meta-search service over a number of underlying search engines. It al-
lows groups of users to establish online search communities and anonymously
records their interaction patterns as they search. These interaction patterns—
the queries provided and the resulting selections—allow I-SPY to adaptively
re-rank the result-lists of future queries in a way that is sensitive to the prefer-
ences of a given community of users. For example, over time I-SPY will learn that
a community of AI researchers are more likely to be interested in the work of the
Berkeley professor than the basketball star when they search under the vague
query, “Michael Jordan”; I-SPY actively promotes such relevant sites for this
community of users. In the following sections we will describe the operation of
I-SPY in detail (see Section 3) and present the results of a live-user study over a
particular community of users (Section 4). First we will outline related work with
a particular focus on research that has sought to exploit the interaction patterns
of users to provide other types of adaptive, collaborative information services on
the Web. We will look, in particular, at collaborative browsing services and their
relationship to our own proposal for collaborative search.


2   Related Work

The research described in this paper—the I-SPY system and its collaborative-
ranking features—touches on a number of areas of related research. One legiti-
mate view of our work is that I-SPY is focused on disambiguating vague queries,
that it provides a form of context-sensitive search. As such we could compare
our work to other attempts at developing context-sensitive search techniques,
differentiating between explicit [6] and implicit [7] forms of context or between
external [8] or local [9] sources of context. Alternatively, one might view I-SPY’s
approach to search as a way of coping with disparities between the query-space
and indexing-space and cite related work in which researchers have attempted
to leverage knowledge about the query-space to solve this problem and improve
search performance [10–12]. Indeed, these particular perspectives are considered
in [13, 14]. However, as indicated in the introduction, we will take a different
standpoint, one that looks at I-SPY from an interaction perspective, one that
highlights the social and collaborative aspects of information seeking activities.


    To begin with it is worth highlighting the work of Hill and Hollan [15] in
which they introduced the concept of computational wear in the context of digital
information. Real world objects are history-rich—a well-read book naturally falls
open on a popular page, important chapters become dog-eared with regular use,
and paragraphs may be underlined or annotated with margin notes—and this
history often helps us to make better use of these objects. In contrast, digital
information is usually history-poor—Web pages do not change as surfers view
them, for example—and this lack of history can limit the manner in which we
can use and benefit from digital information. Hill and Hollan describe ways in
which interaction histories can be incorporated into digital objects. Specifically
they describe the Edit Wear and Read Wear applications which allow the user
to leave wear marks on a document as they edit or read it. These wear marks
are represented visually as part of the document’s scroll-bar and help the user
to quickly appreciate where the main focus of editing or reading activities has
been.


    The Footprints project [16], inspired by the work of Hill and Hollan, applied
the idea of computational wear and interaction histories to the Web, building
a number of tools to help users navigate through sites. For example, one tool
graphically represented the traffic through a Web site, allowing surfers to appre-
ciate the links that are being followed regularly by others. Another Footprints
tool visualises the paths (sequences of links) followed by individual surfers. These
tools represent the interaction histories of previous users, and they allow the cur-
rent user to adapt their browsing pattern as appropriate. Rather than surfing
blind, the current user can be guided by the actions of others. The Footprints
tools facilitate a form of collaborative browsing, an idea that is closely related
to the work of [17].


    The I-SPY philosophy is related to the above. I-SPY also attempts to exploit
interaction history or computational wear, as we shall discuss in the next section,
but rather than focusing on editing or reading documents, or browsing Web
pages, I-SPY focuses on search. It records the interactions of searchers with
result-lists and uses these interactions to improve result-ranking and helps the
searcher to understand which results have been found relevant in the past. Thus
search becomes a more social activity. New searchers benefit from the searches
carried out by past users. They see the results that past users have liked, and
they benefit from an ordering of results that is sensitive to the degree to which
these past results have been preferred.


3     An Architecture for Social Search

The basic architecture for our model of social search, implemented by I-SPY
(ispy.ucd.ie), is presented in Figure 1. On the face of it, I-SPY is a meta-search
engine, adapting a user query, q_T, for a series of underlying search engines,
S_1, ..., S_n, and merging each of their result-lists, R_1, ..., R_n, to produce a final
result-list, R, that is returned to the user. The uniqueness of I-SPY stems from
a combination of important features: (1) its capturing of interaction histories;
(2) the use of these interaction histories to re-rank search results; and (3) its
ability to separate the interaction histories of individual search communities
(social groups) so that this re-ranking can take place in a community-sensitive
manner.




       Fig. 1. The I-SPY system architecture for social, collaborative search.




3.1   Interaction Histories

Each time a user selects a page, p_j, from a result-list generated by I-SPY in
response to a query, q_i, a record of this selection is noted by incrementing a
counter in I-SPY’s so-called hit-matrix, H; see Figure 1. Thus, the value of H_ij
represents the number of times that p_j has been selected for query q_i. The
hit-matrix represents the interaction history (in the sense of [16]) relative to a
set of queries and their relevant results.
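
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the I-SPY implementation: the class name `HitMatrix`, its method names, and the example page addresses are assumptions made here, and the matrix is stored sparsely as nested dictionaries rather than as a dense array.

```python
from collections import defaultdict


class HitMatrix:
    """Sparse hit-matrix: hits[query][page] is the number of selections.

    Hypothetical structure for illustration; the paper does not describe
    how I-SPY stores its hit-matrix internally.
    """

    def __init__(self):
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query, page):
        # A user selected `page` from the result-list returned for `query`.
        self.hits[query][page] += 1

    def hit_value(self, query, page):
        # Use .get so that lookups never create empty rows as a side effect.
        return self.hits.get(query, {}).get(page, 0)


H = HitMatrix()
H.record_selection("jaguar", "cars.example.com/jaguar")
H.record_selection("jaguar", "cars.example.com/jaguar")
H.record_selection("jaguar", "wildlife.example.org/jaguar")
```

Here each row corresponds to a query and each column to a page, so a row with no entries simply means that the query has not yet attracted any selections.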


3.2   Collaborative Ranking

The hit-matrix is a record of what past searchers have viewed as relevant to their
queries and I-SPY takes advantage of this information to re-rank some of the
search results that are returned from the meta-search. If any of the meta-search
results, for a query q_T, have non-zero hit-values in the hit-matrix row that
corresponds to q_T, then this provides further evidence (the number of past selections)
that these results are relevant to q_T. Furthermore the degree of relevance can be
estimated as the proportion of selections that a given page has received for this
query; see Equation 1. This relevance score can be used to rank these previously
selected results ahead of the other results returned by the meta-search. So the
first results presented to the user are those that have been previously selected
for their query, ordered according to their past selection probability (see
Figure 2(a)). The remainder of the results are ordered according to their standard
meta-search score. The hope is that the promoted results will turn out to be
more relevant to the searchers, helping them to locate their target information
more efficiently; we will test this hypothesis directly in Section 4.
\[
  \mathrm{Relevance}(p_j, q_T) = \frac{H_{Tj}}{\sum_{\forall j} H_{Tj}} \tag{1}
\]
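
As a hedged illustration of Equation 1 and the promotion step described in this section, the following Python sketch computes the relevance of each page from one hit-matrix row and places previously selected results ahead of the rest. The function names and the representation of a row as a dictionary of counts are assumptions made for this example; the meta-search result-list is assumed to arrive already ordered by its standard score.

```python
def relevance(hit_row, page):
    """Equation 1: share of all selections for the query that went to `page`.

    `hit_row` maps page -> hit count for the target query (one matrix row).
    """
    total = sum(hit_row.values())
    if total == 0:
        return 0.0
    return hit_row.get(page, 0) / total


def rerank(meta_results, hit_row):
    """Promote previously selected pages, ordered by relevance.

    `meta_results` is the merged meta-search list, best first. Pages with
    no recorded hits keep their meta-search order after the promoted ones.
    """
    promoted = [p for p in meta_results if hit_row.get(p, 0) > 0]
    rest = [p for p in meta_results if hit_row.get(p, 0) == 0]
    # list.sort is stable, so ties keep their original meta-search order.
    promoted.sort(key=lambda p: relevance(hit_row, p), reverse=True)
    return promoted + rest


row = {"a": 1, "c": 3}
ranked = rerank(["a", "b", "c", "d"], row)
```

With the example row, page `c` has received three of the four recorded selections and so is promoted above `a`, while `b` and `d` follow in their original order.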


3.3   Search Communities
I-SPY is designed to support multiple hit-matrices, each for a different com-
munity of searchers. For example, consider one group of searchers interested in
motoring information and another interested in wildlife. The query, “jaguar” has
very different meanings for each of these communities: the former are looking
for information about the high-performance car while the latter are interested
in the big cat variety. Ordinarily a standard search engine or meta-search engine
would respond to each community in the same way. However, with I-SPY, the
likelihood is that the previous interactions of the motoring community will have
produced a hit-matrix that prioritises car-related sites for the “jaguar” query,
while the wildlife hit-matrix will prioritise wildlife-related sites; see Figure 2(a).
    Thus, I-SPY’s collaborative ranking is designed to operate for well-defined
communities of searchers. To facilitate this I-SPY allows individual users or user-
groups to configure their own search service by filling out an online form; see
Figure 2(b). The result is a unique hit-matrix that is linked to a search community
and a URL that contains a version of I-SPY whose queries are associated with
this new hit-matrix. Alternatively, the searchers can add a piece of JavaScript to
their site to include a search-box that is linked to their own hit-matrix.
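
The separation of interaction histories by community can be sketched as one hit-matrix per community identifier. The community names, page addresses and helper functions below are hypothetical and serve only to illustrate the "jaguar" example; the paper does not describe how I-SPY keys or stores its hit-matrices.

```python
from collections import defaultdict


def new_hit_matrix():
    # One sparse hit-matrix: query -> page -> number of selections.
    return defaultdict(lambda: defaultdict(int))


# Hypothetical registry: community identifier -> that community's hit-matrix.
communities = defaultdict(new_hit_matrix)


def record_selection(community, query, page):
    communities[community][query][page] += 1


def top_page(community, query):
    """Most frequently selected page for `query` within one community."""
    row = communities.get(community, {}).get(query, {})
    return max(row, key=row.get) if row else None


record_selection("motoring", "jaguar", "cars.example.com/jaguar")
record_selection("motoring", "jaguar", "cars.example.com/jaguar")
record_selection("wildlife", "jaguar", "wildlife.example.org/jaguar")
```

Because each community writes only to its own matrix, the same query string can accumulate entirely different selection counts in the motoring and wildlife communities.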

4     Evaluation
Collaborative ranking on its own is unlikely to be successful in a general-purpose
search context because the alternative meanings of vague queries are likely to
be merged in the hit-matrix; for example, we would expect to find selections for
car sites and wildlife sites recorded for the “jaguar” query in a general-purpose
search engine. The secret of I-SPY is that this problem is largely eliminated once
we allow for the separation of interaction histories for different communities of
users. And we believe that this particular combination of collaborative ranking
and community-based search will pay dividends when it comes to improving
overall search performance. To test this hypothesis we report on a live trial of
I-SPY that was carried out in late 2003 on a community of third-year computer
science students.

Fig. 2. (a) A result-list for a motoring community for the query “jaguar”; the eyes
denote promoted results. (b) I-SPY’s on-line configuration form facilitates the creation
of new community-specific search services.



4.1   Setup
A total of 92 students were asked to answer 25 general knowledge computer
science questions. They were directed to use I-SPY (configured to use Google,
HotBot, AllTheWeb, and Teoma) to source their answers and they were di-
vided into two groups of 45 and 47 students, respectively. The first group served
as a training group. They did not benefit from I-SPY’s collaborative ranking
technique—the results were ranked using the standard meta-search ranking func-
tion only—but their queries and selections were used to construct a hit-matrix
for the second group of users (the test group) who did benefit from I-SPY’s
collaborative ranking.
    Each user group was allotted 45 minutes to complete the questions. Overall
more than 70% of the training group’s queries were repeated by the test group,
and these repeated queries were used by approximately 65% of the 92 students.
This high repeat rate suggests a close correspondence between the query for-
mation capabilities of each group of users. However, it is worth pointing out
that the test users tended to use slightly shorter (fewer terms) queries than the
training users; the average query length for the test group was 2.16, compared
to 2.57 for the training group. All other things being equal this might suggest
that the training group were better able to produce focused queries and that the
training users might be at a slight advantage when it comes to their inherent
search abilities.

4.2   Selection Behaviour
Our first question concerns the selection behaviour of the training and test
users. If I-SPY’s collaborative ranking technique is working to promote more
relevant results to higher positions within result-lists then we should find that
the test users are selecting results nearer to the top of result-lists, when compared
to the selections of the training users.
    Figure 3(a) plots the position of result selections during each search session
for the training and test users; for each search session we plot the median
position of the selected results. Although the position of results varies considerably
from search session to search session, as expected, there is a marked difference
between the two groups: the test users are clearly selecting results that appear
nearer to the top of result-lists (lower position values) than the training users.
We attribute this to the improved position of relevant results for the test users
as a consequence of I-SPY’s collaborative ranking technique.
To provide a clearer picture of the benefits of I-SPY’s




Fig. 3. (a) Median positions of selected results; (b) Mean positions of all result
selections between training and test groups; (c) Mean number of questions at-
tempted/correctly answered per student; (d) Percentage of students that achieve a
given test-score.




collaborative ranking we can summarise the results of Figure 3(a) by computing
the mean position values of the selections of the training group and compare
these to the mean positions of the selections of the test group; see Figure 3(b).
The test users selected results with an average position of 2.24 whereas the train-
ing users selected results with an average position of 4.26; a 47% reduction in
the position of selected results for the test users compared to the training users,
and a strong indicator of the benefit of I-SPY’s collaborative ranking function.
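
For reference, the 47% figure follows directly from the two mean positions reported above:

```latex
\[
  \frac{4.26 - 2.24}{4.26} = \frac{2.02}{4.26} \approx 0.474
\]
```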
   These results indicate that the test users were more likely to select results
from higher positions within the result-lists. We argue that this is because these
users were able to benefit from the interaction histories of their peers within
the training group because I-SPY’s collaborative ranking technique was actively
promoting the selections of these peers. The hope is that these community prefer-
ences will turn out to be useful results, when it comes to the students answering
their questions, and we will consider this issue in the next section.

4.3   Search Accuracy
Of course finding out that the test users selected results from higher positions in
the result-lists, compared to the training users, does not really tell us anything
about the usefulness of these results. It may be, for example, that the selections
of the training users were not so relevant to the task at hand and that I-SPY’s
eagerness to promote these misleading results simply encouraged the test users to
follow false-leads, and thus hampered their ability to answer the test questions. If
this is the case then our experiment in social search will have failed. If, however,
the promoted results were more likely to be relevant, then we should find that the
test users were able to answer questions more efficiently. We should find that the
test users attempt more questions than the training users and that they answer
more of these attempted questions correctly. In short, we should find that the
test students achieve higher overall test-scores.
    Figure 3(c) compares the training and test users in terms of the mean number
of questions attempted per student and the mean number of correctly
answered questions per student. The results indicate a clear advantage for the
test users: they did in fact attempt more questions on average than the training
group (13.93 versus 9.93, respectively) and they did answer more of these questions
correctly (11.54 versus 7.58, respectively). It is worth noting that both these
differences are significant at the 0.01 significance level. Indeed it is revealing to
note that the test group of users answered more questions correctly (11.54) than
the training group even managed to attempt (9.93). To look at this another way:
of the 9.93 questions attempted, on average, by the training group, only 76% of
these questions (7.58) are answered correctly. The test group not only managed
to attempt 40% more questions but they answered a higher proportion of these
attempted questions correctly (11.54 out of 13.93 or 82%).
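
The proportions quoted in this paragraph can be checked from the four reported means. The short Python sketch below uses only the figures given in the text; the variable names are introduced here for illustration.

```python
# Mean questions attempted / answered correctly per student (from the text).
training_attempted, training_correct = 9.93, 7.58
test_attempted, test_correct = 13.93, 11.54

# Share of attempted questions answered correctly in each group.
training_accuracy = training_correct / training_attempted  # roughly 0.76
test_accuracy = test_correct / test_attempted              # roughly 0.83 (reported as 82%)

# Relative increase in questions attempted by the test group.
extra_attempts = test_attempted / training_attempted - 1   # roughly 0.40

print(f"{training_accuracy:.3f} {test_accuracy:.3f} {extra_attempts:.3f}")
```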
    Figure 3(d) plots the percentage of students in each group that achieved
different overall test-scores; the test score is the overall percentage of the 25
questions that a student has answered correctly. These scores clarify the gap
that exists between the performance of the training and test groups. For example,
more than twice as many test students (70% of the test group, or 33 students)
achieved a pass grade (40% or greater) compared to the training students (30%, or
13 students). And while 56% of the test group achieved an honours grade (55% or
greater), none of the training group managed to score more than 52%. Moreover, 5 of
the test students achieved a distinction (70% or greater).


4.4   Summary

At the outset of this experiment it appeared that, if anything, the training
users might be at a slight advantage owing to their longer queries. However, the
results clearly indicate superior search performance from the test group. They
attempted more questions and they answered more of these questions correctly.
This advantage must be due to the model of social search implemented by I-SPY.
And although this is a small-scale experiment (92 academically-related students
on a limited search task), we believe that this positive outcome speaks to the
potential of social search as a valuable approach to more focused search.


5     Conclusions

In this paper we have argued for the benefits of social search and we have de-
scribed a particular approach that integrates ideas from social networking and
adaptive information retrieval to provide a personalized search service to well-
defined communities of like-minded individuals. The I-SPY system has delivered
significant performance advantages in live search scenarios, with communities
able to locate the right information faster and more reliably by leveraging the
past search behaviour of their peers.
    I-SPY delivers this level of personalization in a relatively anonymous fashion.
Individual community members are not tracked, nor are they identified.
Instead, personalization operates at the level of the community rather than the
individual. We believe that this level of personalization strikes the right balance
between accuracy and privacy: the community-based ranking of results is suffi-
ciently accurate for the individual user to benefit from the social search, but at
the same time they can be confident that their privacy and identity have been
protected.
    In recent work we have considered a number of issues arising out of this model
of social search. We have proposed the use of various strategies to protect against
fraudulent search activity. For example, we can frustrate users who attempt to
drive the promotion of certain result pages, by making repeated selections, by
discounting or filtering sequences of consecutive result selections. Similarly we
have proposed the use of decay models to reduce hit-values over time in order to
reduce the natural bias that operates in favour of older pages; older pages will
have had more of an opportunity to attract hits and may therefore be promoted
above newer but more relevant pages. Finally, we have recently explored the
possibility of leveraging the interaction histories of similar queries to the target
query—right now I-SPY operates on the basis of exact matches between the
target query and the entries of the hit-matrix—and our initial results show that
such an extension has the potential to improve the performance of I-SPY still
further by increasing its precision and recall characteristics [18].
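
Two of the safeguards mentioned above, decaying hit-values over time and discounting runs of repeated selections, can be sketched as follows. The paper does not specify the form of either mechanism, so this is a sketch under stated assumptions: the exponential half-life decay, the 30-day default and both function names are choices made purely for illustration.

```python
def decayed_hits(hits, age_days, half_life_days=30.0):
    """Discount a hit count by its age (assumed exponential decay).

    After `half_life_days` a hit contributes half of its original weight,
    which reduces the advantage that older pages accumulate over time.
    """
    return hits * 0.5 ** (age_days / half_life_days)


def filter_repeats(selections):
    """Collapse runs of identical consecutive selections into one.

    A user who selects the same page many times in a row, hoping to drive
    its promotion, is therefore credited with a single selection per run.
    """
    kept = []
    for selection in selections:
        if not kept or kept[-1] != selection:
            kept.append(selection)
    return kept


fresh = decayed_hits(8, age_days=0)
month_old = decayed_hits(8, age_days=30)
cleaned = filter_repeats(["a", "a", "a", "b", "a"])
```

In this sketch eight month-old hits count the same as four fresh ones, and the burst of three consecutive selections of `a` is credited only once.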


References
 1. Selberg, E., Etzioni, O.: The Meta-Crawler Architecture for Resource Aggregation
    on the Web. IEEE Expert Jan-Feb (1997) 11–14
 2. Dreilinger, D., Howe, A.: Experiences with Selecting Search Engines Using Meta
    Search. ACM Transactions on Information Systems 15(3) (1997) 195–222
 3. Brin, S., Page, L.: The Anatomy of A Large-Scale Web Search Engine. In: Pro-
    ceedings of the Seventh International World-Wide Web Conference. (1998)
 4. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. In: Proceed-
    ings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. (1998)
    668–677
 5. Lawrence, S.: Context in Web Search. IEEE Data Engineering Bulletin 23(3)
    (2000) 25–32
 6. Glover, E., Lawrence, S., Gordon, M.D., Birmingham, W.P., Giles, C.L.: Web
    Search - Your Way. Communications of the ACM (2000)
 7. Rhodes, B.J., Starner, T.: Remembrance Agent: A Continuously Running Auto-
    mated Information Retrieval System. In: Proceedings of the First International
    Conference on the Practical Applications of Intelligent Agents and Multi-Agent
    Technologies. (1996) 487–495
 8. Budzik, J., Hammond, K.: User Interactions with Everyday Applications as Con-
    text for Just-In-Time Information Access. In: Proceedings International Conference
    on Intelligent User Interfaces, ACM Press (2000)
 9. Bharat, K.: SearchPad: Explicit Capture of Search Context to Support Web Search.
    In: Proceedings of the Ninth International World-Wide Web Conference. (2000)
10. Raghavan, V.V., Sever, H.: On the reuse of past optimal queries. In: Proceed-
    ings of the 18th annual international ACM SIGIR conference on Research and
    development in information retrieval, ACM Press (1995) 344–350
11. Fitzpatrick, L., Dent, M.: Automatic feedback using past queries: social search-
    ing? In: Proceedings of the 20th annual international ACM SIGIR conference on
    Research and development in information retrieval, ACM Press (1997) 306–313
12. Glance, N.S.: Community search assistant. In: Proceedings of the 6th international
    conference on Intelligent user interfaces, ACM Press (2001) 91–96
13. Freyne, J., Smyth, B., Coyle, M., Briggs, P., Balfe, E.: Further experiments in
    collaborative ranking in community-based web search. AI Review: An international
    Science and Engineering Journal (In Press) (2004)
14. Freyne, J., Smyth, B.: Query based indexing in collaborative search. In: Submitted
    to 27th Annual International ACM SIGIR Conference. (2004)
15. Hill, W., Hollan, J., Wroblewski, D., McCandless, T.: Edit Wear and Read Wear. In:
    Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
    ACM Press (1992) 3–9
16. Wexelblat, A., Maes, P.: Footprints: History-Rich Web Browsing. In: Proceed-
    ings of the Third International Conference on Computer-Assisted Information Re-
    trieval. (1997) Montreal, Quebec, Canada.
17. Twidale, M., Nichols, D., Paice, C.: Browsing is a collaborative process. Information
    Processing and Management 33(6) (1997) 761–783
18. Balfe, E., Smyth, B.: Case based collaborative web search. In: Submitted to the
    7th European Conference on Case-Based Reasoning. (2004)

				