Definitions                                                      Analysis




             From Folksonomy to Collaborative Tagging
                How Social Bookmarking can improve Web Search


                               Debora Donato

                        Yahoo! Research – Barcelona, Spain


                      Seminars of Computer Networks

       1     Definitions



       2     Analysis

Taxonomy



       Tagging from taxonomies
       Taxonomy is the branch of science concerned with classification
       (for organisms, books, files, etc.).

             performed by an authority (librarian)
             derived from the author’s indications
             hierarchical and exclusive:
                 each object belongs to one unambiguous category, which is
                 within a more general one.

Taxonomy (cont.)

       Classification Systems
       Linnaean system for living things.
       Dewey Decimal Classification for libraries.
       Computer file systems for organizing electronic files.

Taxonomy (cont.)




       Limitations
             no authority that can play the “librarian” role
             too much content for a single authority to classify

                     Both of these limitations hold for the Web.

Tagging


       Tagging
       Tagging is the process of classifying shared content by assigning it
       labels or keywords. Tagging is inclusive and non-hierarchical.

       Example: in a hierarchy, “Cat species native to Africa” could be
       filed under any of
             C:/articles/cats
             C:/articles/africa
             C:/articles/africa/cats
             C:/articles/cats/africa
       With tags, the article simply carries both “cats” and “africa”.
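
The contrast with a folder hierarchy can be made concrete with a minimal sketch. It assumes a hypothetical in-memory index (the names `TagIndex`, `add`, and `find` are illustrative and do not come from any system discussed in these slides). An item receives several tags at once, and lookup intersects tag sets in any order, so no single path has to be chosen.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping each tag to the items carrying it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def add(self, item, tags):
        # Tagging is inclusive: one item may sit under many tags at once.
        for tag in tags:
            self._items_by_tag[tag.lower()].add(item)

    def find(self, *tags):
        # Non-hierarchical lookup: intersect the tag sets; order is irrelevant.
        sets = [self._items_by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.add("Cat species native to Africa", ["cats", "africa"])
index.add("Domestic cat breeds", ["cats"])

# The same item is reached whichever tag is named first.
both_orders_agree = index.find("cats", "africa") == index.find("africa", "cats")
```

A hierarchy would force one of the four paths; here the query order plays no role.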

Semantic and Cognitive Aspects of Classification




       Problems
           polysemy (for example: windows)
             synonymy
             “basic level” variation
                  superordinate
                  subordinate

       Collaborative Tagging
       “Collaborative tagging describes the process by which many users
       add metadata in the form of keywords to shared content.”
       [Golder and Huberman, 2005]

Collaborative Tagging (cont.)
       Folksonomy

Collaborative Tagging


       Social Bookmarking Systems
       Social Bookmarking websites allow users to store, organize, search,
       and manage bookmarks of web pages on the Internet with the help
       of metadata.

       Features [Miller and Feinberg, 2006]
             centrally stored bookmark collection;
             keywords explicitly entered by the user;
             multiple tags (no hierarchical limitation);
             social nature (bookmarks visible to others, “clickable” user
             names, “clickable” tags)

Tagging and Taxonomy (cont.)




       Social news
       Social News websites are communities that allow their users to
       submit news stories, articles and media (videos/pictures) and share
       them with other users or the general public. Some of these items
       are given more visibility, depending on various factors such as
       the number of user votes for each item.

Examples (screenshots not reproduced)

       del.icio.us, including its popular tags
       MyWeb
       citeulike
       digg

Problem statement [Heymann et al., 2008]




       Can social bookmarking improve web search?
             Are there enough URLs?
             Are there enough tags?
             Are the URLs valuable?
             Are the tags redundant?

Tags versus Other Content




       Data types accessed by search engines:
             1   page content
             2   link structure
             3   query or clickthrough logs
             4   user-generated content (e.g., tags, bookmarks)

             1   Definitions
             2   Analysis

Recently Modified URLs



             At the time they are posted, most URLs posted to del.icio.us
             were last modified either about a day or about ten days earlier.
             Search engines tend to return more recently modified pages.
             Furthermore, the first result returned for a query is generally
             more recently modified than the top 10 results.
             The distribution of modification times for URLs posted to
             del.icio.us is very similar to that of the URLs that search
             engines return.
             Both del.icio.us and search results tend to have much more
             recently modified URLs than ODP (the Open Directory Project).

Recently Modified URLs (cont.)




       Conclusion
       del.icio.us users post interesting pages that are actively updated or
       have been recently created.

New and Unindexed Pages



       25% of URLs posted to del.icio.us by users are new, unindexed
       pages which will later be indexed.

New and Unindexed Pages (cont.)




       Conclusion
       del.icio.us can serve as a (small) data source for new web pages
       and can help with crawl ordering.
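
One way to picture crawl ordering is the sketch below. It rests on an assumption that is not stated in the slides: that the number of bookmarks a URL has received is used as the crawl priority. The function name `order_frontier` and the `bookmark_counts` mapping are hypothetical.

```python
import heapq


def order_frontier(frontier, bookmark_counts):
    """Return frontier URLs, most-bookmarked first (ties keep input order)."""
    # heapq is a min-heap, so the count is negated; the position breaks ties.
    heap = [(-bookmark_counts.get(url, 0), pos, url)
            for pos, url in enumerate(frontier)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Illustrative data only: URLs without bookmarks fall to the back of the queue.
ordered = order_frontier(["a.example", "b.example", "c.example"],
                         {"b.example": 5, "c.example": 2})
```

A real crawler would combine such a signal with others (link structure, freshness); this sketch isolates the bookmarking signal only.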

Search Result and URL Overlap


       How do the URLs that people choose to post relate to the
       sorts of URLs that are returned as search results?

       Method:
             queries taken from the AOL query log dataset;
             queries submitted using the Yahoo! Search API;
             the top 10 and top 100 results checked for URLs that were
             present in del.icio.us.

Search Result and URL Overlap (cont.)



       Conclusion
       del.icio.us URLs are disproportionately common in search results
       compared to their coverage.

             Top 10 Search Results in del.icio.us: 19%
             Top 100 Search Results in del.icio.us: 9%

       N.B. The query set is weighted toward more popular queries.
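
The overlap figures above are fractions of result slots filled by bookmarked URLs. A minimal sketch of such a measurement follows. It is an illustration under stated assumptions, not the procedure of the cited study: the function name `overlap_at_k` is hypothetical, results are plain ranked lists per query, and URLs are compared as exact strings with no normalization.

```python
def overlap_at_k(results_by_query, bookmarked_urls, k):
    """Fraction of top-k result slots, over all queries, holding a bookmarked URL."""
    hits = 0
    total = 0
    for results in results_by_query.values():
        top_k = results[:k]
        total += len(top_k)
        hits += sum(1 for url in top_k if url in bookmarked_urls)
    return hits / total if total else 0.0


# Illustrative data only.
results = {"q1": ["a", "b", "c", "d"], "q2": ["e", "a"]}
bookmarked = {"a", "e"}
overlap_top1 = overlap_at_k(results, bookmarked, 1)
overlap_top4 = overlap_at_k(results, bookmarked, 4)
```

In this toy data the overlap is higher at rank 1 than over the top 4, the same direction as the 19% versus 9% figures.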

User Concentration
             As a general result, social websites can become highly
             dependent on a small group of users. For example, on the
             social news site Digg the top 100 users control 56% of the
             content.
             In del.icio.us, in order to cover over 50% of the content
             posted, one needs to include over 30,000 users.
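
User concentration of this kind can be computed from per-user post counts. The sketch below assumes a hypothetical mapping from user to number of posts and a hypothetical function name, `users_needed_for_coverage`. It returns how many of the most active users are needed before their posts exceed a given share of all posts.

```python
def users_needed_for_coverage(posts_per_user, fraction=0.5):
    """Smallest number of most-active users whose posts exceed `fraction` of all posts."""
    total = sum(posts_per_user.values())
    if total == 0:
        return 0
    covered = 0
    counts = sorted(posts_per_user.values(), reverse=True)
    for n, count in enumerate(counts, start=1):
        covered += count
        if covered > fraction * total:
            return n
    return len(posts_per_user)


# Illustrative data only: a concentrated community versus an even one.
concentrated = users_needed_for_coverage({"u1": 60, "u2": 20, "u3": 10, "u4": 10})
even = users_needed_for_coverage({"a": 1, "b": 1, "c": 1, "d": 1})
```

A small result relative to the user base indicates dependence on a few users; a large one indicates content spread across many.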

High Proportion of New URLs
             User actions are largely uncoordinated, so users could post
             the same URLs over and over again.
             del.icio.us has relatively little redundancy in page information
             for perhaps 50% of URLs and high redundancy for perhaps
             20%.

Query Term and Tag Overlap
             Not much correlation between the popularity of a tag and the
             popularity of a query term.
             Relatively high overlap between popular query terms and
             popular tags.

Size and Growth
             120,000 posts per day (one tenth of the blogosphere).
             There were roughly 115 million public posts, coinciding with
             about 30-50 million unique URLs in del.icio.us as of June
             2007 (about 1/1000 of the Web).

Tag significance

             Tags are present in:
                  the text of 50% of the pages they annotate
                  the titles of 16% of the pages they annotate
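
A measurement of this kind can be sketched as follows. The sketch assumes each annotation is a hypothetical `(tag, title, text)` triple and uses case-insensitive substring matching, which is a crude stand-in for whatever matching the cited study used. The function name `tag_presence` is illustrative.

```python
def tag_presence(annotations):
    """Return (share of annotations whose tag is in the text, share in the title)."""
    annotations = list(annotations)
    if not annotations:
        return 0.0, 0.0
    in_text = sum(tag.lower() in text.lower() for tag, _title, text in annotations)
    in_title = sum(tag.lower() in title.lower() for tag, title, _text in annotations)
    n = len(annotations)
    return in_text / n, in_title / n


# Illustrative data only: (tag, page title, page text).
sample = [
    ("python", "Python tips", "Learn python fast"),
    ("howto", "Python tips", "Learn python fast"),
    ("cats", "Lions", "Lions and other cats"),
    ("africa", "Lions", "Lions live in savannas"),
]
text_share, title_share = tag_presence(sample)
```

Tags found in the text or title add little beyond what page content already gives a search engine, which is why these shares bear on redundancy.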

Conclusions




             V Social bookmarking URLs are new and recent, though many
               tags may be redundant (given title, text, domains).
             X Social bookmarking is a large phenomenon, but not nearly as
               large as the web.
             V Despite this, relevant URLs are well represented, and
               popular tags overlap with popular queries.

References

             Golder, S. A. and Huberman, B. A. (2005).
             The structure of collaborative tagging systems.

             Heymann, P., Koutrika, G., and Garcia-Molina, H. (2008).
             Can social bookmarking improve web search?
             In Proceedings of WSDM 2008.

             Miller, D. R. and Feinberg, J. (2006).
             Using social tagging to improve social navigation.
             In Workshop on the Social Navigation and Community based
             Adaptation Technologies.

								