User-defined classification on the online photo sharing site Flickr


Hi, thanks so much.

I’m here to present my exploratory research on the folksonomy-based
image sharing system Flickr.

Systems that use user-defined descriptions are becoming more and
more popular, with user bases numbering in the millions, and with millions
of tags added weekly.

Because of these systems’ growing popularity, I’m making the
assumption that they are solving some user need, whether for
exploration, navigation, or retrieval.

        Or … How I learned to stop
      worrying and love Folksonomies

                     Megan Winget
                  School of Information
               University of Texas at Austin

Although they’re popular, and used by a myriad of people for a myriad
    of purposes, many in the information and library science community
    have reservations.

The most common criticism of these systems is that they lack control:
   there’s no synonym control and no ability to define hierarchical
   relationships; precision and recall are poor; and they’re
   susceptible to “gaming.”

Essentially (and I’m simplifying), critics fear that
1) user-defined cataloging represents an unacceptable level of disorder
   (users do not have rules, and they don’t tend to follow rules when they
   have them, so anarchy reigns in these systems);
2) these systems will not be able to return relevant and thorough results; and
3) users of these systems do not have the administrative /
   procedural systems in place to reliably and thoroughly describe their
   objects / collections.
My experience with these systems has been very positive, though, so I
   wanted to take a more systematic look at Flickr to see if these
   criticisms carry any weight: to see whether my experiences have been
   anomalous, or whether the criticisms leveled at the system have been unfair.
   I also wanted to gain a deeper experience with Flickr, to identify any
   fruitful avenues of research in folksonomies, tagging, and
   collaborative cataloging.

           Flickr Tagging Functionality
                                   •    User-Defined
                                   •    Different types of tags

                                   1.   Title
                                   2.   Groups (Photostream / Sets)
                                   3.   “Notes”
                                   4.   Description
                                   5.   Comments
                                   6.   Tags
                                   7.   Technical Information
                                        (date uploaded, creator,
                                        camera info…)

Before describing my “focused exploration” in more depth, I
  thought it might be helpful for those of you who might not
  be familiar with Flickr to have a brief introduction.
There are 7 descriptive methods for every image in Flickr:
1. Title
2. Groups - internal to the collection = sets / photostream
          - external to the collection = pools
3. “Notes” - a Flash-based tool that allows people to select
    parts of an image and comment on that selection
4. Description - a narrative description of the image
5. Comments - from other Flickr members
6. Tags - the one-word descriptive elements
7. Technical information (not on this screen: date uploaded,
   creator, camera information…)
For this study, I looked only at #6, tags.

                          Flickr Tags

                                               Tag: sixwordstory

                                           (title of picture must
                                           tell a story in 6 words)

Tags are user-defined and free-form, but generally fall into 5 categories:

1. Flickr-specific tags (games): “sixwordstory” - every image has a
   six-word title that tells a story; “squaredcircle” - images of
   circles in squares
2. Geographic Descriptors (PARIS, Europe, France)
3. Narrative / Description / Genre (me, volcano, paper, landscape)
4. Characterization (beautiful, happy)
5. Dates

Tags: Flickr Specific :: Games

                      Tag: squaredcircle

Tags: Geographic

               Tag: paris

Tags: Date

             Tag: 0611

Tags: Narrative

                  Tag: happy

Tags: Description

                Tag: landscape

Flickr Query Returns: Most recent

Flickr Query Returns: Most interesting

When you perform a “query,” such as it is, you can access the
  results in three different ways:
1. By “most recent” - images deposited in the system most recently
   appear at the top, so you tend to get multiple images from one
   photographer.
2. By “most interesting” - a Flickr-defined, proprietary (behind the
   scenes, “patent pending”) algorithm determines interestingness.
   This is problematic in that we have no idea how “interestingness”
   is determined. My guess is that it’s a combination of metadata and
   activity surrounding an object, but we don’t know.
3. Through an automatic clustering algorithm, which determines
   clusters within any tag. For the tag “scotland,” for example, there
   are four clusters, roughly defined as:
       1. edinburgh
       2. landscapes:highlands
       3. landscapes:coastline
       4. glasgow
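Flickr has not said how its clusters are computed. One plausible way to get groupings like the “scotland” clusters is to link tags that frequently appear together on images carrying the seed tag and treat each connected group of links as a cluster. The Python sketch below does this with a pairwise co-occurrence count and union-find; the function name, the `min_cooccur` threshold, the list-of-tag-lists input, and the toy data are all hypothetical, and this is not Flickr's algorithm.

```python
from collections import Counter
from itertools import combinations


def tag_clusters(photos, seed, min_cooccur=2):
    """Group the tags that co-occur with `seed` into clusters.

    photos: list of tag lists, one list per image (hypothetical input format).
    Two tags are linked when they appear together on at least `min_cooccur`
    images that also carry `seed`; clusters are the connected groups of links.
    """
    # Tags of every image carrying the seed tag, minus the seed itself.
    tagsets = [set(tags) - {seed} for tags in photos if seed in tags]

    # Count how often each pair of tags appears on the same image.
    pair_counts = Counter()
    for tags in tagsets:
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1

    # Union-find: merge tags joined by a frequent co-occurrence.
    parent = {}

    def find(tag):
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Collect each tag under its root; tags with no frequent link never enter `parent`.
    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data, not taken from Flickr.
photos = [
    ["scotland", "edinburgh", "castle"],
    ["scotland", "edinburgh", "castle"],
    ["scotland", "highlands", "landscape"],
    ["scotland", "highlands", "landscape"],
    ["scotland", "glasgow", "city"],
    ["scotland", "glasgow", "city"],
    ["paris", "france"],
]
print(tag_clusters(photos, "scotland"))
# [['castle', 'edinburgh'], ['city', 'glasgow'], ['highlands', 'landscape']]
```

Raising `min_cooccur` keeps only the strongest links, so fewer tags end up in any cluster; lowering it merges more tags into larger groups.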



          My Flickr Experiment…
  • Choose three arbitrary tags (Monterey,
    polarbears, volcano)
  • Top 100 “interesting” images
  • Precision
    – Number of Relevant Returns
    – Attempt to find a specific image or image type in each
  • Weak Description
    – Average number of tags per image
    – Associated tags for the tagset
    – Where appropriate, compared the Flickr tags against
      known vocabularies and thesauri.

I was interested primarily in exploring the idea that systems dependent
   on collaborative cataloging had a number of problems: 1) that the
   lack of control led to anarchy at worst, and thin, limited description at
   best; 2) in a related vein, that the specific inability of these systems to
   deal with synonyms and hierarchy in a meaningful way led to
   confusion and usability problems; and 3) that these systems have low
   precision (i.e., that people cannot find what they are looking for).
For this project I chose three arbitrary noun-tags (volcano, polarbears,
   monterey). For each tag, four pieces of specific information were recorded:
1) the number of images returned in the tag-set;

For the first one hundred “interesting” images returned for each tag:
1) To address the issue of precision, I
       1) recorded the number of relevant returns for each tag (here,
          relevance is defined in terms of common sense: an image
          tagged “volcano” contains an object that looks like a volcano);
       2) tried to find a specific image or image type in each tag-set.
2) To address the issue of thin, weak description, I recorded
       1) the average number of tags for each image,
       2) the associated tags for each image,
       3) and, where appropriate and possible, compared Flickr user
          tags against known thesauri.
Why did I choose the “interesting” images?

                      Relevant Returns
        • Relevant returns for each tag
           – Monterey = 89 relevant images
           – Volcano = 80 + 18 relevant images
           – Polarbears = 84 + 13 relevant images

Even supporters of collaborative cataloging have bought into the idea
   that these systems have low precision. Bruce Sterling of Wired Magazine
   notes, “a Folksonomy is nearly useless for searching out specific,
   accurate information, but that’s beside the point.” I was interested in
   exploring this idea in more detail.

First task: determine the relevant returns for each tag
•   Monterey: 89 relevant images / 11 not relevant: 2 were of Monterrey,
    Mexico, tagged with a misspelling / 5 were of people in hotels (they
    might be in Monterey, they might not) / 4 were mis-tagged (Santa Cruz,
    San Simeon, 2 along the Pacific Coast Highway)
•   Volcanoes: well, 80 images of either conical mountains or
    mountains with lava spewing out of them / 18 images that the
    description or title explained were taken on the slopes of a volcano,
    but were not images of the volcano itself
•   Polarbears: 84 (plus 13 of the other kind of polarbear)

This Kind of Polarbear (84)

…Or this kind of polarbear (13)

          Specific Image / Image Type
        • Monterey: Jellyfish Exhibit
        • Volcanoes: Mount Merapi
        • Polarbears: Central Park Zoo

I also wanted to address the idea that a folksonomy is “nearly useless”
for finding specific information, so I tried to find specific images or
image types in the system using “tag exploration” alone.

Using only the tag “monterey,” I wanted to see if there were any
images of the jellyfish exhibit at the Monterey Bay Aquarium.
Using only the tag “volcano,” I wanted to see if I could find an image of
Mount Merapi in Indonesia, which was erupting at the time.
Using only the tag “polarbears,” I wanted to see if I could find an image
of a polar bear at the Central Park Zoo doing a specific thing.

                         Tag: Monterey


Surprisingly, only 44% (40) of the images tagged with “Monterey” were scenic
or landscape images, taken either on the 17-mile drive, the Pacific Coast
Highway, or somewhere else on the Monterey peninsula.
The majority of the images, 58%, were taken in the aquarium: 37% (33) were
of the jellyfish exhibit specifically, and 21% (19) were taken of other exhibits
in the aquarium.

Highlights the social aspect of the Flickr system


                    Tag: Volcano

The Flickr system returned 24,623 images with the tag “volcano.” There were
four clusters: Hawaii (Volcano National Park), Washington State (Mt. Saint
Helens), Italy (Vesuvius), and Costa Rica (Arenal Volcano). Eighty of the one
hundred most interesting images were obviously images of volcanoes, either
depicting eruptions or characteristic conical mountains. Of the remaining
twenty images, eighteen were landscape images of mountainous areas, which
could have been taken in a park surrounding a volcano, or on the slopes of a
volcano, but were not obviously images of volcanoes. Two were images of
people, probably on a “volcano vacation,” as evidenced by hiking gear and
camping equipment.


          Central Park Zoo Polar Bear


For this image, I had a very specific image in mind, one that I wasn’t
sure even existed.

I had been to NYC and saw the polar bear doing something specific; I
wanted to know if that was something the polar bear did all the time, or
if I had seen something out of the ordinary.

The presence of this image would tell me whether this polar bear behavior
was typical or unique.

I did find the image, although not with the “polarbears” tag and the
“interesting” ranking alone: within the “Central Park Zoo” cluster of the
“polarbears” tag, this picture was in the top 5.

             So What? (1)
• Idea that these systems are disordered
  & chaotic is not necessarily true
  – Relatively easy to find specific (arbitrary)
    images within a general tagset
  – Precision
     • If narrow = 89% for Monterey, 80% for
       Volcano, 84% for Polar Bears
     • If broader = 89% for Monterey, 98% for
       Volcano, and 97% for Polar Bears
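Precision here is precision at k = 100: the share of the first 100 “interesting” images judged relevant. The short Python sketch below recomputes the slide's percentages from the counts reported earlier; splitting each count into a (narrow, extra) tuple and the names used are illustrative choices, not part of the original study.

```python
# Counts reported for the top 100 "interesting" images per tag:
# (clearly relevant, additional images counted only under the broader reading).
RESULTS = {
    "Monterey": (89, 0),
    "Volcano": (80, 18),
    "Polarbears": (84, 13),
}
TOP_K = 100  # only the first 100 "interesting" results were examined


def precision_at_k(relevant, k=TOP_K):
    """Fraction of the top-k returned images judged relevant."""
    return relevant / k


for tag, (narrow, extra) in RESULTS.items():
    print(f"{tag}: narrow {precision_at_k(narrow):.0%}, "
          f"broad {precision_at_k(narrow + extra):.0%}")
# Monterey: narrow 89%, broad 89%
# Volcano: narrow 80%, broad 98%
# Polarbears: narrow 84%, broad 97%
```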

             So What? (2)
• Cognitive and cultural inferences
  – Emergent structures within an arbitrary
    tagset (particularly illustrated by Monterey),
  – What that emergent structure is - difficult to
    define given this study, but worth more in-
    depth exploration
  – These emergent properties reflect and
    support “desire lines” and “serendipitous
    exploration” - two strengths of folksonomic

Addressing Weak Description
• Average number of tags per image
  within each tagset
• “Tag Cloud” = associated tags
• Where appropriate compared Flickr
  tags against known controlled
  vocabularies and thesauri

Average Number of Tags / Image

 • In top 100 Interesting Images

   – Polarbears = 11.79 tags per image
   – Monterey = 11.68 tags per image
   – Volcano = 20.89 tags per image
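Both descriptive measures on these slides are simple to compute once each image's tags are available as a list. A minimal Python sketch, assuming a hypothetical list-of-tag-lists input (how the tags are gathered from Flickr is out of scope); the sample data are made up for illustration and are not from the study:

```python
from collections import Counter


def average_tags(photos):
    """Mean number of tags per image; photos is a list of tag lists."""
    return sum(len(tags) for tags in photos) / len(photos)


def associated_tags(photos, seed, top=5):
    """Tags that most often appear alongside `seed` (a simple "tag cloud")."""
    counts = Counter(
        tag
        for tags in photos
        for tag in dict.fromkeys(tags)  # de-duplicate, keep first-seen order
        if tag != seed
    )
    return counts.most_common(top)  # ties keep first-encountered order


# Hypothetical toy tag-set, not data from the study.
photos = [
    ["volcano", "hawaii", "lava"],
    ["volcano", "hawaii", "kilauea", "lava", "night"],
    ["volcano", "italy", "vesuvius", "vesuvio"],
]
print(average_tags(photos))                       # 4.0
print(associated_tags(photos, "volcano", top=2))  # [('hawaii', 2), ('lava', 2)]
```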

                         Volcano Tags

        • 81 of 100 “interesting” images =
          specifically named volcanoes
        • Allowed Comparison with the TGN
            – Use of Preferred Names
            – Represented Hierarchical Structure of
              Place Name Designation

There was an average of 20.89 tags per “volcano” image, with three images
having fewer than six tags and seven images having between thirty-five and
forty-two tags. Eighty-one of the hundred “most interesting” images were
of specifically named volcanoes, which allowed for reliability checks of user
tagging against a known source, the Thesaurus of Geographic Names (TGN).
Two aspects of users’ tagging behaviors were compared against the TGN:
1) The degree to which Flickr users chose “preferred” names over variants:
whether, for example, users tagged their images with the preferred name,
Vesuvio, or one of the variants (Vesuvius, Mount Vesuvius, Le Vésuve,
Vesubio, Vesuv).
2) The degree to which Flickr users represented the hierarchical structure of
place-name description. TGN’s hierarchical description of Vesuvius, for
example, is: World (facet) : Europe (Continent) : Italy (nation) : Campania
(region) : Napoli (province) : Vesuvius (volcano). We were interested in
looking at how many of those terms Flickr users chose to include in their
tagging schema.

                      Volcanoes Not in TGN
      Volcano Name                 Country              Closest TGN entry
      Grimsnes Volcano             Iceland              Not in TGN
      Rangitoto Island - Volcano   Auckland, NZ         Not in TGN
      Waitopo                      Auckland, NZ         Not in TGN
      Mount Pacaya                 Guatemala            Not in TGN
      Rabaul Volcano               Papua New Guinea     Rabaul (Inhabited Place)
      Roche Tuiliere               France               La Tuiliere (Inhabited Place)
      Laguna Colorada              Bolivia              Not in TGN
      Tangkuban Perahu             Indonesia            Not in TGN
      Lago Caburga                 Chile                Not in TGN
      Parinacota                   Bolivia              Not in TGN
      Ranu Raraku                  Easter Island        Not in TGN
      Stromboli Volcano            Italy                Stromboli Isola (Island)
      Calbuco Volcano              Chile                Calbuco (Inhabited Place)
      Osorno Volcano               Chile                Osorno (Inhabited Place)
      Llanquihue Lake              Chile                Llanquihue (Inhabited Place)

14 specifically named volcanoes in Flickr were not in the TGN.
      A web search revealed that these are actual volcanoes.
      In 6 of these cases, the closest TGN entry was a nearby place
      with the same name as the volcano.

          Preferred & Alternate Names
Volcano          TGN Designations                              Flickr Names
Mount Vesuvius   Vesuvio (preferred,C,V,N)                     Vesuvio
                 Vesuvius (C,O,N,English-P)                    Vesuvius
                 Mount Vesuvius (display,C,O,N,English)
                 Vesuvius, Mount (C,O,N,English)
                 Le Vésuve (C,O,N)
                 Vesubio (C,O,N)
                 Vesuv (C,O,N)
Mount Kilauea    Kilauea Crater (preferred,C,V,N)              Kilauea
                 Kilauea (C,V,N)
                 Kilauea Caldera (C,V,N)
                 Kirauea (C,V,N)
                 Lahainaluna (C,V,N)
                 Lua Peleo Kilauea (C,V,N)
Mount Fuji       Fuji-san (preferred,C,V,N)                    Fuji
                 Fujiyama (C,V,N)                              Fujisan
                 Fuji-no-Yama (C,V,N)                          Mt.Fuji
                 Fuji San (C,V,N)
                 Fuji, Mount (C,O,N)
Mount Bromo      Bromo, Gunung (preferred,C,V,N)               Bromo
                 Bromo (C,V,N)                                 GunungBromo
Mount Rainier    Rainier, Mount (preferred,C,V,N)              Mt.Ranier
                 Mount Tacoma (C,V,N)                          MountRainier
Mount Taranaki   Taranaki, Mount (preferred,C,V,N)             Taranaki
                 Egmont, Mount (C,V,N)                         MtTaranaki

In four of the five cases (Vesuvio, Fujisan, Gunung Bromo, Mount Rainier),
Flickr users tagged the image with the preferred name of the volcano. In three
of the same cases, users also provided at least one alternate name. Only
Kilauea and Mount Rainier were not tagged with alternate names, and only the
image of Kilauea was not tagged with the preferred name.
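The Flickr tags in this table are run together and mixed-case (“GunungBromo,” “MountRainier”), while TGN names contain spaces, accents, and inverted forms (“Bromo, Gunung”), so a mechanical version of the preferred-name check has to normalize both sides before comparing them. The Python sketch below is one hypothetical way to do that; the function names and normalization rules are assumptions, not the procedure used in this study, and abbreviations such as “Mt.” would still need separate handling.

```python
import re
import unicodedata


def uninvert(name):
    """Turn a TGN-style inverted name ("Bromo, Gunung") into natural order."""
    if ", " in name:
        head, tail = name.split(", ", 1)
        return f"{tail} {head}"
    return name


def normalize(name):
    """Lowercase, strip accents, and drop spaces and punctuation so that a
    run-together Flickr tag ("GunungBromo") can match a multi-word name."""
    decomposed = unicodedata.normalize("NFKD", uninvert(name))
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^0-9a-z]", "", no_accents.lower())


def check_names(preferred, variants, flickr_tags):
    """Return (preferred name used?, list of variant names used)."""
    tags = {normalize(t) for t in flickr_tags}
    used_variants = [v for v in variants if normalize(v) in tags]
    return normalize(preferred) in tags, used_variants


# Names copied from the table above.
print(check_names(
    "Vesuvio",
    ["Vesuvius", "Mount Vesuvius", "Vesuvius, Mount", "Le Vésuve", "Vesubio", "Vesuv"],
    ["Vesuvio", "Vesuvius"],
))  # (True, ['Vesuvius'])
print(check_names("Bromo, Gunung", ["Bromo"], ["Bromo", "GunungBromo"]))
# (True, ['Bromo'])
```

Without the `uninvert` step, the preferred TGN form “Bromo, Gunung” would never match the tag “GunungBromo.”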

        Hierarchical Tags :: TGN / Flickr
     Volcano Name     # Images            TGN Hierarchy                   Flickr Geographic
    Mount Augustine      13      World
                                 :: North & Central America (continent)   America, Alaska, US, AK,
                                 :: United States (nation)                Alaskapeninsula,
                                 :: Alaska (state)                        Kenaipeninsula,
                                 :: Kenai Peninsula (national division)   CookInlet, Homer,
                                 :: Augustine Volcano (peak)              MountAugustine,

    Volcán Atitlán       2       World
                                 :: North & Central America (continent)   CentralAmerica, Central,
                                 :: Guatemala (nation)                    Guatemala, AtitlanVolcan
                                 :: Sololá (department)
                                 :: Atitlán Volcán (volcano)
    Volcán Arenál        4       World
                                 :: North & Central America (continent)   CostaRica, FortunaArea,
                                 :: Costa Rica (nation)                   ArenalVolcan, Arenal
                                 :: Alajuela (province)
                                 :: Arenal, Volcan (volcano)

Hierarchical tags: the degree to which Flickr users tagged their images
using the hierarchical terms defined by the TGN.

This table shows the preferred volcano name, the number of images of that
volcano in the Flickr set, the TGN hierarchy, and the geographic terms chosen
by Flickr users. The table shows that Flickr users almost always included all
the terms in the TGN hierarchy, except for the “world” facet designation, and
in many cases included extra information not found in the TGN. For
example, Mount Augustine, an active volcano on the Kenai Peninsula, was
described using all of the terms defined by the TGN, like “America, Alaska,
Kenai Peninsula, and Augustine Volcano,” but Flickr users also included
additional terms, like “HomerAlaska,” the nearest town, and “CookInlet,”
the closest inlet to the volcano. Flickr users describe Mount Taranaki, in New
Zealand, using both its Māori and English names: Mount Taranaki and
Egmont; and the person who tagged the images of Mount Vesuvius gave both
Italian and English versions of all the major geographical descriptors (Italy /
Italia, Vesuvio / Vesuvius, and Naples / Napoli, the province).

   Volcano       # Images            TGN Hierarchy                    Flickr Geographic
    Name                                                                    Terms
Mount Taranaki      2       World
[Egmont]                    :: Oceania (continent)                   NewZealand, NorthIsland,
                            :: New Zealand (nation)                  Egmont, MountTaranaki,
                            :: North Island (national division)      Taranaki(Mt.Egmont)
                            :: Taranaki (mount)
                            :: Mount Taranaki (mountain)
Vesuvio             2       World
                            :: Europe (continent)                    Italia, Italy, golfo, gulf,
                            :: Italy (nation)                        Europe, Naples, Napoli,
                            :: Campania (region)                     Vesuvius, Vesuvio,
                            :: Napoli (province)
                            :: Vesuvius (volcano)

Mount Rainier       3       World
                            :: North & Central America (continent)   Washington, Mt.Rainier,
                            :: United States (nation)                MtRainier, MountRainier,
                            :: Washington (state)                    MountRainierNationalPark
                            :: Pierce (county)
                            :: Rainier, Mount (mountain)
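The hierarchy comparison in these tables can be approximated by checking which TGN levels (minus the “World” facet) appear among an image's tags, and which tags match no level at all. The Python sketch below uses the Vesuvio row above; the function names are hypothetical, and exact string matching after lowercasing is a simplification: it would miss abbreviations like “US” for “United States” or “MountAugustine” for “Augustine Volcano,” so it undercounts compared with reading the tags by hand.

```python
import re


def norm(term):
    """Lowercase and drop spaces/punctuation ("Kenai Peninsula" -> "kenaipeninsula")."""
    return re.sub(r"[^0-9a-z]", "", term.lower())


def hierarchy_coverage(tgn_path, flickr_tags, skip=("World",)):
    """Share of TGN hierarchy levels (minus the skipped facet) found among the
    tags, the matched levels, and the tags that match no level ("extras")."""
    levels = [level for level in tgn_path if level not in skip]
    tags = {norm(t) for t in flickr_tags}
    matched = [level for level in levels if norm(level) in tags]
    level_keys = {norm(level) for level in levels}
    extras = [t for t in flickr_tags if norm(t) not in level_keys]
    return len(matched) / len(levels), matched, extras


# Vesuvio row: TGN hierarchy and the Flickr geographic terms from the table above.
coverage, matched, extras = hierarchy_coverage(
    ["World", "Europe", "Italy", "Campania", "Napoli", "Vesuvius"],
    ["Italia", "Italy", "golfo", "gulf", "Europe", "Naples", "Napoli", "Vesuvius", "Vesuvio"],
)
print(coverage)  # 0.8
print(matched)   # ['Europe', 'Italy', 'Napoli', 'Vesuvius']
print(extras)    # ['Italia', 'golfo', 'gulf', 'Naples', 'Vesuvio']
```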

•   User-supplied description = pretty good
•   Average number of tags per image = not low
•   General use of preferred names
•   In some cases description augments existing
    authoritative sources
    – (unrecognized volcanoes)
    – (extra geographical terms)
• Users may not have rules, but they try to do
  the right thing…
    – Try to account for synonyms
    – Try to describe hierarchy…

         Future Research (1)

• Development of a different metric for success -
  traditional IR measures and library objectives
  are not appropriate
• “user need”
  – Why are people using these systems
    (storing / exploring / ?!)
  – Why are they tagging?
  – How does reputation for reliability in tagging work?

       Future Research (2)
• Need a more extensive exploration of
  folksonomy tag-set (more than three tags,
  more than top 100 “interesting” images,
  random images, as opposed to “interesting”)
• More systematic exploration of emergent
  structures within tagsets…(?)
• Develop open, non-proprietary, transparent
  system to study these phenomena…


Megan Winget