Statistical Profiles of Highly-Rated Web Sites
Melody Y. Ivory    Marti A. Hearst
Group for User Interface Research
UC Berkeley

ACM CHI Conference
April 25, 2002
The WebTango Goal
[Diagram: Profiles derived from Quality Designs feed a Quality Checker; a Web Site Design goes in, and the checker returns:]
      • Predictions
      • Similarities
      • Differences
      • Suggestions
      • Design Modification
Talk Outline
▪ Developing statistical profiles
▪ Applying the profiles
▪ Validating the profiles
▪ Next Steps

Developing Statistical Profiles:
The WebTango Approach
Idea: Reverse engineer design patterns from high-quality sites and use them
to check the quality of other sites
[Diagram: cycle of Measures → Data → Models → Evaluate → Validate]
 1. Create a large set of measures to assess various design attributes
    (benchmark)
 2. Obtain a large set of evaluated sites
 3. Create models of good vs. avg. vs. poor sites (guidelines)
       • Take into account the context and type of site
 4. Use models to evaluate other sites (guideline review)
 5. Validate models

Step 1: Measuring Web Design Aspects
▪ Identified key aspects from the literature
      – Extensive survey of Web design literature: texts from
        recognized experts; user studies
            • The amount of text on a page, text alignment, fonts, colors, consistency
              of page layout in the site, use of frames, …
      – Example guidelines
            • Use 2–4 words in text links [Nielsen00].
            • Use links with 7–12 useful words [Sawyer & Schroeder00].
            • Consistent layout of graphical interfaces results in a 10–25% speedup in
              performance [Mahajan & Shneiderman96].
            • Use several layouts (e.g., one for each page style) for variation within
              the site [Sano96].
            • Adhere to accessibility principles in order to create sites that serve a
              broad user community [Cooper99; Nielsen00].
            • Avoid using ‘Click Here’ for link text [Nielsen00].
            • Use left-justified, ragged-right margins for text [Schriver97].
      – No established theory of what to measure
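Two of the guidelines above, link length and avoiding ‘Click Here’, are concrete enough to check automatically. The sketch below is a hypothetical checker built on Python's standard `html.parser`; it is not part of the WebTango tools, and the names `LinkTextCollector` and `check_link_text` are illustrative. The 2–4 word default follows [Nielsen00]; passing `min_words=7, max_words=12` roughly approximates the conflicting [Sawyer & Schroeder00] guideline (which counts only "useful" words).

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.link_texts = []  # one whitespace-normalized string per <a>
        self._parts = None    # text fragments of the open link, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            # Collapse runs of whitespace so word counts are stable.
            self.link_texts.append(" ".join("".join(self._parts).split()))
            self._parts = None


def check_link_text(html, min_words=2, max_words=4):
    """Return (link text, problem) pairs for text links that break the guidelines."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for text in collector.link_texts:
        if not text:
            continue  # image-only link: there is no link text to judge
        if text.lower().strip(" .!") == "click here":
            problems.append((text, "avoid 'Click Here' as link text"))
            continue
        n_words = len(text.split())
        if not min_words <= n_words <= max_words:
            problems.append((text, f"{n_words} word(s); guideline is {min_words}-{max_words}"))
    return problems


if __name__ == "__main__":
    page = ('<a href="/a">Click here</a> '
            '<a href="/b">Health training programs</a> '
            '<a href="/c">Programs</a>')
    for text, problem in check_link_text(page):
        print(f"{text!r}: {problem}")
```

Image-only links are skipped here because they have no link text; they are the subject of a separate measure.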
157 Web Design Measures
(Metrics Computation Tool)
[Diagram: measure categories stacked as a pyramid. Bottom tiers: Text Elements (TE), Link Elements (LE), Graphic Elements (GE); Text Formatting (TF), Link Formatting (LF), Graphics Formatting (GF), labeled information, navigation, & graphic design. Top tiers: Page Formatting (PF), Page Performance (PP), Site Architecture (SA), labeled experience design.]
   Text Elements (31)
       – # words, type of words
   Link Elements (6)
       – # graphic links, type of links
   Graphic Elements (6)
       – # images, type of images
   Text Formatting (24)
       – # font styles, colors, alignment, clustering
   Link Formatting (3)
       – # colors used for links, standard colors
   Graphics Formatting (7)
       – max width of images, page area
   Page Formatting (27)
       – quality of color combos, scrolling
   Page Performance (37)
       – download time, accessibility, scent quality
   Site Architecture (16)
       – consistency, breadth, depth
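As a quick consistency check, the per-category counts above add up to the 157 measures in the slide title. The sketch below tallies them. Treating Site Architecture as the only site-level category, so that the remaining 141 measures are page-level, is an assumption made for illustration; this slide does not state the split.

```python
# Measure counts per category, copied from the slide.
MEASURE_COUNTS = {
    "Text Elements": 31,
    "Link Elements": 6,
    "Graphic Elements": 6,
    "Text Formatting": 24,
    "Link Formatting": 3,
    "Graphics Formatting": 7,
    "Page Formatting": 27,
    "Page Performance": 37,
    "Site Architecture": 16,
}

# Assumption: Site Architecture is the only site-level category;
# every other category is computed per page.
SITE_LEVEL = {"Site Architecture"}

total = sum(MEASURE_COUNTS.values())
site_level = sum(n for name, n in MEASURE_COUNTS.items() if name in SITE_LEVEL)
page_level = total - site_level

if __name__ == "__main__":
    print(f"total={total}, page-level={page_level}, site-level={site_level}")
```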

Step 2: Obtaining a Sample of Evaluated Sites
▪ Webby Awards 2000
      – The only large corpus of rated Web sites
▪ 3000 sites initially
      – 27 topical categories
            • Studied sites from informational categories
                  – Finance, education, community, living, health, services
▪ 100 judges
      – International Academy of Digital Arts & Sciences
            • Internet professionals familiar with a category
      – 3 rounds of judging (only the first round used)
            • Scores are averaged from 3 or more judges
            • Converted scores into good (top 33%), average (middle
              34%), and poor (bottom 33%)
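The conversion from averaged judge scores to three quality classes is a rank-based split. A minimal sketch, assuming scores are already averaged per site and that ties at a cut point may be broken by input order (the slide does not say how ties were handled); `label_by_tercile` is a hypothetical helper, and the scores in the demo are made up.

```python
def label_by_tercile(scores, top=0.33, bottom=0.33):
    """Label each averaged judge score as 'good', 'average', or 'poor'.

    Follows the split on the slide: top 33% good, bottom 33% poor,
    the remaining ~34% average. Ties at a cut point are broken by input
    order, an assumption the slide does not address.
    """
    n = len(scores)
    n_top = round(top * n)
    n_bottom = round(bottom * n)
    # Indices ordered from highest to lowest score (sorted() is stable for ties).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n_top:
            labels[i] = "good"
        elif rank >= n - n_bottom:
            labels[i] = "poor"
        else:
            labels[i] = "average"
    return labels


if __name__ == "__main__":
    # Hypothetical averaged scores on the Webby 1-10 scale (not real data).
    print(label_by_tercile([8.1, 5.0, 6.7, 3.2, 7.4, 4.9]))
```

With 3000 sites, this split yields 990 good, 1020 average, and 990 poor sites.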
Webby Awards 2000
▪ 6 criteria
      –   Content
      –   Structure & navigation
      –   Visual design
      –   Functionality
      –   Interactivity
      –   Overall experience
▪ Scale: 1–10 (10 = highest)
▪ Scores nearly normally
   distributed

The Data Set
▪ Downloaded pages from sites using the
   Site Crawler Tool
      – Downloads informational pages at multiple
        levels of the site
▪ Used the Metrics Computation Tool to
   compute measures for the sample
      – Processes static, English-language HTML pages
            • Measures for 5346 pages
            • Measures for 333 sites
                  – Site-level models are not discussed here
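To make the idea of a page-level measure concrete, the sketch below computes simplified versions of four measures named in this talk (Word Count, Link Count, Text Link Count, and a graphic-link count) from a page's HTML using Python's standard `html.parser`. The definitions are assumptions: the Metrics Computation Tool's actual definitions (what counts as a word, how scripts and mixed image-and-text links are handled) are not given on these slides, and `PageMeasures`/`measure_page` are hypothetical names.

```python
from html.parser import HTMLParser


class PageMeasures(HTMLParser):
    """Simplified, hypothetical versions of a few page-level measures."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.text_link_count = 0
        self.graphic_link_count = 0
        self._in_link = False
        self._link_has_text = False
        self._link_has_image = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._in_link = True
            self._link_has_text = False
            self._link_has_image = False
        elif tag == "img" and self._in_link:
            self._link_has_image = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._in_link:
            self.link_count += 1
            # Assumption: a link with any visible text counts as a text link;
            # a link with only an image counts as a graphic link.
            if self._link_has_text:
                self.text_link_count += 1
            elif self._link_has_image:
                self.graphic_link_count += 1
            self._in_link = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        words = data.split()
        self.word_count += len(words)
        if words and self._in_link:
            self._link_has_text = True


def measure_page(html):
    parser = PageMeasures()
    parser.feed(html)
    parser.close()
    return {
        "Word Count": parser.word_count,
        "Link Count": parser.link_count,
        "Text Link Count": parser.text_link_count,
        "Graphic Link Count": parser.graphic_link_count,
    }


if __name__ == "__main__":
    print(measure_page('<p>Training programs on health issues.</p>'
                       '<a href="/x"><img src="x.gif" alt="Programs"></a>'
                       '<a href="/y">Nutrition course</a>'))
```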

Step 3: Creating Prediction Models
[Diagram: a site of unknown quality (?) classified as Good, Average, or Poor]
▪ Statistical analysis of quantitative measures
      – Methods
            • Classification & regression tree, linear discriminant
              classification, & K-means clustering analysis
      – Context-sensitive models
            • Content category, page style, etc.
      – Models identify a subset of measures relevant for
        each prediction

Page-Level Models (5346 Pages)

 Model                                             Method   Accuracy
                                                            Good   Avg.   Poor
 Overall page quality (~1782 pgs/class)            C&RT     96%    94%    93%
 Content category quality (~297 pgs/class & cat)   LDC      92%    91%    94%

▪ ANOVAs showed that all differences in measures were
  significant (good vs. avg., good vs. poor, etc.)

Page-Level Models (5346 Pages)
Page Type Classifier (decision tree)
      – Home page, content, form, link, other
      – 1770 manually classified pages, 84% accurate

 Model                                             Method   Accuracy
                                                            Good   Avg.   Poor
 Page type quality (~356 pgs/class & type)         LDC      84%    78%    84%
 Overall page quality                              C&RT     96%    94%    93%
 Content category quality                          LDC      92%    91%    94%

▪ ANOVAs showed that all differences in measures were
  significant (good vs. avg., good vs. poor, etc.)
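The approximate per-cell sizes in these tables follow from dividing the 5346 pages evenly across the cells each model distinguishes: 3 quality classes, 6 content categories, and 5 page types. A sketch of that arithmetic, assuming an even split; the real counts per cell will differ somewhat.

```python
TOTAL_PAGES = 5346
QUALITY_CLASSES = 3      # good, average, poor
CONTENT_CATEGORIES = 6   # finance, education, community, living, health, services
PAGE_TYPES = 5           # home, content, form, link, other

# Assumes pages are spread evenly over the cells of each model.
pages_per_class = TOTAL_PAGES / QUALITY_CLASSES
pages_per_class_and_category = TOTAL_PAGES / (QUALITY_CLASSES * CONTENT_CATEGORIES)
pages_per_class_and_type = TOTAL_PAGES / (QUALITY_CLASSES * PAGE_TYPES)

if __name__ == "__main__":
    print(round(pages_per_class),
          round(pages_per_class_and_category),
          round(pages_per_class_and_type))
```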
Characteristics of Good Pages
▪ K-means clustering to identify 3 subgroups
▪ ANOVAs revealed key differences
      – # words on page, HTML bytes, table count
▪ Characterize clusters as:
    – Small-page cluster (1008 pages)
    – Large-page cluster (364 pages)
    – Formatted-page cluster (450 pages)
▪ Use for detailed analysis of pages
[Figure: example small, large, and formatted pages]
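The clustering step can be illustrated with plain k-means (Lloyd's algorithm) on two features. This is a minimal sketch, not the original analysis: the features (words on page in hundreds, table count), the toy points, and the starting centroids are all hypothetical, and with real measures on very different scales the features would need to be standardized first.

```python
import math

# Hypothetical pages described by (words on page in hundreds, table count).
POINTS = [
    (1.0, 1), (1.2, 2), (0.8, 1),    # few words, few tables
    (8.0, 3), (9.0, 2), (8.5, 3),    # many words
    (3.0, 9), (3.5, 10), (2.5, 8),   # moderate words, many tables
]
INITIAL_CENTROIDS = [(1.0, 1), (8.0, 3), (3.0, 9)]


def kmeans(points, initial_centroids, max_iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    centroids, clusters = kmeans(POINTS, INITIAL_CENTROIDS)
    for centroid, members in zip(centroids, clusters):
        print([round(v, 2) for v in centroid], len(members), "pages")
```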

Step 4: Evaluate Other Sites
▪ Embed prediction profiles into an Analysis Tool
      – For each model
            • Prediction: good, average, poor, or mapped cluster
            • Rationale: decision tree rule, deviant measures, etc.
      – Example page-level feedback
            • Overall page quality model
                  – Predicted quality: poor
                  – Rationale: if ((Italicized Body Word Count is not missing AND
                    (Italicized Body Word Count > 2.5)))
            • Good page cluster model
                  – Mapped cluster: small-page, Cluster distance: 22.74
                  – Similar measures: Word Count; Good Word Count; …
                  – Deviant measures: Link Count [12.0] out of range (12.40–41.24);
                    Text Link Count [2.0] out of range (4.97–27.98); …
      – Limitation: no suggestions for improvement or examples
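The two kinds of rationale shown above, a decision-tree rule and out-of-range measures, can be sketched as follows. The rule and the two small-page ranges are copied from the slide; everything else is assumed for illustration, including the hypothetical helper names, the page's Italicized Body Word Count value, and returning `None` for the parts of the tree the slide does not show. How the cluster ranges were derived is not stated here.

```python
# Ranges for the small-page cluster, copied from the example feedback above.
SMALL_PAGE_RANGES = {
    "Link Count": (12.40, 41.24),
    "Text Link Count": (4.97, 27.98),
}


def overall_quality_rule(measures):
    """Apply the one decision-tree rule quoted on the slide.

    Only this branch is known; returning None for every other page is a
    placeholder for the rest of the tree, which the slide does not show.
    """
    italic = measures.get("Italicized Body Word Count")  # None means "missing"
    if italic is not None and italic > 2.5:
        return "poor"
    return None


def deviant_measures(measures, ranges):
    """Describe each measure whose value falls outside its cluster range."""
    deviations = []
    for name, (low, high) in ranges.items():
        value = measures.get(name)
        if value is not None and not low <= value <= high:
            deviations.append(f"{name} [{value}] out of range ({low:.2f}–{high:.2f})")
    return deviations


if __name__ == "__main__":
    # Link counts from the slide; the italicized word count is hypothetical.
    page = {"Italicized Body Word Count": 3, "Link Count": 12.0, "Text Link Count": 2.0}
    print(overall_quality_rule(page))
    for line in deviant_measures(page, SMALL_PAGE_RANGES):
        print(line)
```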
Example Assessment
▪ Demonstrate use of profiles to assess site
  quality and identify areas for improvement
      – Similar to the evaluation scenario presented earlier
▪ Site drawn from Yahoo Education/Health
      – Discusses training programs on numerous health
        issues
      – Not in original study
      – Chose one that looked good at first glance but, on
        further inspection, seemed to have problems
      – Only 9 pages were available, at levels 0 and 1

Sample Page (Before)
▪ Content Page

Page-Level Assessment
▪ Decision tree predicts: all 9 pages consistent
  with poor pages
      – Content page lacks an accent color and has
        colored, bolded body-text words
            • Avoid mixing text attributes (e.g., color, bolding, and size)
              [Flanders & Willis98]
            • Avoid italicizing and underlining text [Schriver97]

Page-Level Assessment
▪ Cluster mapping
      – All pages mapped into the small-page cluster
      – Deviated on key measures, including
            • text link, link cluster, interactive object, content link word, and ad measures
            • Most deviations can be attributed to using graphic links without
              corresponding text links
                – Use corresponding text links [Flanders & Willis98; Sano96]

[Chart: top deviant measures for the content page: Link Count, Text Link Count, Good Link Word Count, Font Count, Sans Serif Word Count, Display Word Count]

Page-Level Assessment
▪ Compared to models for health and
  education categories
      – All pages found to be poor for both models
▪ Compared to models for the 5 page
  styles
      – All 9 pages were considered poor pages by
        page style (after correcting predicted
        types)

Improving the Site
▪ Eventually want to automate the translation
  from differences to recommendations
▪ Revised the pages by hand as follows:
      – To improve color count and link count:
            • Added a link text cluster that mirrors the content of the
              graphic links
      – To improve text element and text formatting
        variation:
            • Added headings to break up paragraphs
            • Added font variations for body text and headings and
              made the copyright text smaller
      – Several other changes based on small-page cluster
        characteristics
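The first revision, adding text links that mirror the graphic links, is mechanical enough to sketch. The code below is a hypothetical helper, not part of the WebTango tools: it finds links whose only content is an image and emits a plain text-link cluster using each image's `alt` text (falling back to the URL). Skipping URLs that already have a text link elsewhere on the page is a design choice of this sketch.

```python
from html import escape
from html.parser import HTMLParser


class GraphicLinkFinder(HTMLParser):
    """Record image-only links and the URLs that already have a text link."""

    def __init__(self):
        super().__init__()
        self.graphic_links = []       # (href, alt) for links with an image but no text
        self.text_link_hrefs = set()  # hrefs reachable through visible link text
        self._href = None             # href of the <a> currently open, if any
        self._alt = None              # alt text of an image inside that link
        self._has_text = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
            self._alt = None
            self._has_text = False
        elif tag == "img" and self._href is not None:
            self._alt = attrs.get("alt") or ""

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._has_text = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._has_text:
                self.text_link_hrefs.add(self._href)
            elif self._alt is not None:
                self.graphic_links.append((self._href, self._alt))
            self._href = None


def mirror_text_links(html, separator=" | "):
    """Build a text-link cluster for graphic links that lack a text equivalent."""
    finder = GraphicLinkFinder()
    finder.feed(html)
    finder.close()
    items = [
        f'<a href="{escape(href)}">{escape(alt or href)}</a>'
        for href, alt in finder.graphic_links
        if href not in finder.text_link_hrefs  # already reachable via a text link
    ]
    return separator.join(items)


if __name__ == "__main__":
    page = ('<a href="/programs"><img src="p.gif" alt="Training Programs"></a>'
            '<a href="/contact"><img src="c.gif" alt="Contact Us"></a>'
            '<a href="/contact">Contact Us</a>')
    print(mirror_text_links(page))
```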
Sample Page (After)
▪ Content Page

After the Changes
▪ All pages now classified correctly by
  style
▪ All pages rated good overall
▪ All pages rated good health pages
▪ Most pages rated as average education
  pages
▪ Most pages rated as average by style

Step 5: Validating the Prediction Models
▪ Small user study
      – Hypothesis: pages and sites modified based on
        the profiles are preferred over the original versions
      – 5 sites modified based on profiles (including the
        example site)
            • Modifications by 2 undergraduates (Deep Debroy & Toni
              Wadjiji) and 1 graduate student (Wai-ling Ho-Ching)
                  – Students had little to no design experience
                  – Same procedure as in the example assessment
                  – Minimal changes based on overall page quality and good
                    page cluster models
      – 13 participants
            • 4 professional, 3 non-professional, and 6 non-Web
              designers

Profile Evaluation
▪ Small user study
      – Page-level comparisons (15 page pairs)
            • Participants preferred modified pages (57.4% vs. 42.6%
              of the time, p = .038)
      – Site-level ratings (original and modified versions of
        2 sites)
            • Participants rated modified sites higher than original sites
              (3.5 vs. 3.0, p = .025)
            • Non-Web designers had difficulty gauging Web design
              quality
      – Freeform comments
            • Subtle changes result in major improvements
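The slide does not say which test produced p = .038. One simple possibility for a preference rate is a one-sample z-test of the proportion against 50%. The sketch below implements that test; the sample size is an assumption: it supposes every one of the 13 participants judged all 15 pairs (195 choices), of which 112 (57.4%) favored the modified page, which is consistent with the reported percentages.

```python
import math


def proportion_z_test(successes, trials, p0=0.5):
    """Two-sided one-sample z-test of a proportion against p0.

    Uses the normal approximation without a continuity correction.
    Returns (z, p_value).
    """
    p_hat = successes / trials
    standard_error = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / standard_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical: assumes all 13 participants judged all 15 pairs (195 choices)
    # and that 112 of them favored the modified page (112 / 195 = 57.4%).
    z, p = proportion_z_test(112, 195)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under those assumptions the script prints z = 2.08 and p = 0.038. An exact binomial test or a continuity correction would give a somewhat different p-value, so this is a plausibility check, not a reconstruction of the original analysis.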

Summary of the WebTango Approach
[Diagram: cycle of Measures → Data → Models → Evaluate → Validate]
Advantages
 –   Derived from empirical data
 –   Context-sensitive
 –   More insight for improving designs
 –   Evolve over time
 –   Applicable to other types of UIs
Limitations
 –   Based on expert ratings
 –   Correlation, not causality
 –   Not a substitute for user studies
Next Steps
▪ Update the profiles (Webby 02 data)
▪ Develop a tool to facilitate interpretation of predictions
▪ Examine the profiles in more detail
      – Factor analysis to highlight design patterns
      – See which guidelines are valid empirically (studies)
            • Moving from predictions to recommendations
▪ Incorporate assessments of content quality (text
  analysis & studies)
▪ Improve site-level measures and models
      – Incorporate page-level predictions
▪ New page-level measures (spatial properties)
▪ Develop an interactive Web design tool
      – Early designs and implemented sites
Thank You
▪ For more information
   – http://webtango.berkeley.edu

Assessment of the CHI 2002 Home Page
Predicted page style: home
Home Page Quality: poor
    Rationale: font sizes, no graphical links, accent colors, …
Overall Quality: poor
    Rationale: no accent color, footer text not formatted with smaller font
Cluster: Small page
    Distance: 19.8
    Rationale: slightly more words, vertical scrolling, tables, text columns
Page-Level Measures

Word Count: 157

Good Word Count: 81

Body Word Count: 94

Link Count: 34

Page Title Hits: 3

Visible Link Text Hits: 25

Site-Level Measures & Model Development

Text Element Variation: 119%
 Page (grid order)   Good Word Count   Average Link Words   …
         1                   81                 3           …
         2                  733                 2           …
         3                  240                 2           …
         4                  292                 2           …
         5                  236                 2           …
         6                  142                 2           …
         7                   72                 2           …
         8                   29                 2           …
         9                  785                 2           …
        10                  294                 2           …
        11                  363                 2           …
        12                 1350                 2           …
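The slide does not define how the variation percentage is computed. A common way to express how much one measure varies across a site's pages is the coefficient of variation (standard deviation divided by the mean). The sketch below assumes that definition and applies it to Good Word Count alone, using the 12 values listed above, so its result is not expected to reproduce the 119% figure.

```python
import statistics

# Good Word Count for the 12 pages on the slide (grid order).
GOOD_WORD_COUNTS = [81, 733, 240, 292, 236, 142, 72, 29, 785, 294, 363, 1350]


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * statistics.pstdev(values) / mean


if __name__ == "__main__":
    cv = coefficient_of_variation(GOOD_WORD_COUNTS)
    print(f"Good Word Count variation across pages: {cv:.0f}%")
```

Whether the population or sample standard deviation is appropriate is itself a choice; with only a dozen pages, the two differ by a few percentage points.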
Page Title Variation: 185%
 Page (grid order)   Page Title Hits   Page Title Score
         1                  3                  3
         2                  0                  0
         3                  3                  3
         4                  2                  2
Example Webby Pages

Example Page from Good Site

Example Page from Avg. Site

Example Page from Poor Site

Example Assessment: Site-Level Analysis

Example Assessment
▪ Questions:
  – Is this a high-quality site? (not discussed)
  – Are these high-quality pages?
  – What can be done to improve the quality?
  – Is the quality improved after making these
    changes?
▪ Quality examined in a user study
  (discussed later)

Site-Level Assessment
▪ Model predicted site to be
      – Poor site overall
         • 31% variation in link element measures, but not large
           variation in other element measures
         • Major source of link variation was text link count
             – 8 pages had 2–4 links; one page had 27 links
      – Poor health site
         • Inadequate text element variation
             – Paragraphs with no headings, only one font face
      – Good education site
         • Education and health sites differ on all measures except
           graphic formatting variation
         • Need to incorporate page-level predictions into the site
           model
                  – Median page-level predictions currently reported
Analysis of Webby Awards Judging

▪ What criteria contribute most to overall rating?
[Figure 2a. Review Stage: Contribution of Specific Criteria to Overall Site Rating. Criteria shown: Content, Navigation, Visual Design, Interactivity, Functionality.]
Summary of Findings
▪ The specific ratings do explain overall
  experience.
▪ The best predictor of overall score is
  content.
▪ The second-best predictor is
  interactivity.
▪ The worst predictor is visual design.

Do criteria contributions vary across content categories?
▪ The importance of criteria varies by
  category.
▪ Content is by far the best predictor of
  overall site experience; interactivity
  comes next.
▪ Visual design does not have as much
  predictive power except in specific
  categories.

Architecture of the WebTango Tools

The WebTango Tools

Prior Model Building Studies

Prior Studies
▪ Used quantitative measures to predict Web
  page ratings
      –   HFWeb 2000, CHI 2001 [Ivory et al. 00; Ivory et al. 01]
      –   11 quantitative measures
      –   1898 pages from 400+ sites
      –   Assumed site ratings apply to all pages in sites
      –   Distinguishing pages with good expert ratings
            • Model 1: good (top 33%) vs. rest (bottom 67%)
            • Model 2: good (top 33%) vs. poor (bottom 33%)
      – Considering context (content category)
      – Accuracies ranging from 65% to 80%

Findings in Earlier Study
[Ivory et al. 01]
      – Using linear discriminant analysis
      – Model 1: For pages from good sites vs. rest
            • 67% correct when not considering content categories
            • 73% correct when taking content categories into account
      – Model 2: For pages from good sites vs. poor sites
            • 65% correct when not considering content categories
            • 80% correct when taking content categories into account

Study of Usability and Expert Ratings

Do Webby Ratings Reflect Usability?
▪ Do the profiles assess usability or something else?
▪ User study (30 participants)
      – Usability ratings (WAMMI scale) for 57 sites
            • Two conditions: actual and perceived usability
      – Contrasted with judges’ ratings
▪ Results
      – Some correlation between users’ and judges’ ratings
      – Not a strong finding
      – Virtually no difference between actual and perceived
        usability ratings
            • Participants thought it would be easier to find information in the perceived
              usability condition
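A simple way to quantify "some correlation" between users' and judges' ratings is Pearson's r over the sites rated by both groups. The sketch below computes it from scratch; the rating lists in the demo are made-up placeholders, not data from the study, and the slide does not say which correlation statistic was actually used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    # Placeholder ratings for five sites (NOT data from the study).
    judges = [7.8, 6.1, 5.0, 8.4, 4.3]  # judges' overall scores, 1-10 scale
    users = [62, 55, 58, 70, 49]        # users' usability scores (hypothetical scale)
    print(f"r = {pearson_r(judges, users):.2f}")
```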

Home Page Assessments

Assessment of GVU Home Page
Predicted page style: link (average)
Overall Quality: Average
    Rationale: min graphic width > 8.5
Cluster: Small page
    Differences: word counts
Education Quality: Average

Assessment of the School Home Page
Takeaway: example of when the system fails due to extensive use of scripts
Predicted page style: home
Home Page Quality: poor
    Rationale: too few redundant links, interactive objects; too many scripts,
    italicized body text
Overall Quality: poor
    Rationale: use of italicized body text
Cluster: Formatted page
Education Quality: poor