Quantitative Empirical User Studies

Karrie Karahalios, Eric Gilbert
6 April 2007

some slides courtesy of Brian Bailey and John Hart

 • Conduct user study to gain more precise measure of the
   usability of an interface or system
 • Complements low-fidelity techniques

 • Requires a larger investment than low-fi prototyping
 • Provide positive experience for users!
Empirical User Studies

 • Measure performance, error rate, learnability and retention,
   satisfaction, tolerable network delay…
   • adapt to your particular interface and context

 • Compare results to usability goals
 • Identify usability issues and resolve them
Overview of Doing Empirical User Studies

 • Develop materials
 • Prepare for the study

 • Conduct the study
 • Analyze results and iterate
 • Learn from the experience
Prepare for the Study

 • Identify usability goals
 • Develop experimental tasks and design

 • Recruit users
 • Instrument software/hardware
Identify Usability Goals

 • Identify questions you want answered
   • questions should be specific and measurable

 • Examples:
   • can a user perform each task in < 30s?
   • after only five minutes of instruction, can a user perform
     each task with < 2 errors?
   • are users rating the interface at least a "3" for overall
     satisfaction on a 5-point scale?
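Goals phrased this way can be checked mechanically against measured data. A minimal sketch in Python; all of the measurements below are hypothetical, invented purely to illustrate the checks:

```python
# Hypothetical measurements from eight users (not real study data)
task_times = [24.1, 28.7, 31.2, 22.5, 27.9, 29.3, 25.6, 30.4]  # seconds
error_counts = [1, 0, 2, 1, 0, 1, 3, 0]
satisfaction = [4, 3, 5, 3, 4, 4, 2, 4]  # 5-point scale

goal_time_met = all(t < 30 for t in task_times)      # each task in < 30 s?
goal_errors_met = all(e < 2 for e in error_counts)   # < 2 errors per user?
mean_satisfaction = sum(satisfaction) / len(satisfaction)
goal_satisfaction_met = mean_satisfaction >= 3       # at least a "3"?
```

In this made-up data, two users exceed 30 s and two make 2+ errors, so the first two goals fail while the satisfaction goal passes; that is exactly the kind of result that drives the next design iteration.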
Develop Experimental Design

 • Structure of experiment
   • what will users do, in what order, where, etc.

 • Between groups (randomly assigned to treatment groups)
   • Control group
   • Experimental group
 • Within groups
   • Each user performs under all conditions
   • Order randomized
   • Cheaper because it uses fewer participants
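The two designs differ in how participants map to conditions, which is easy to see in code. A small sketch; the participant IDs, group size of 12, and fixed seed are all illustrative:

```python
import random

random.seed(7)  # fixed seed only so this example is reproducible
participants = [f"P{i}" for i in range(1, 13)]  # hypothetical 12 recruits
conditions = ["control", "experimental"]

# Between groups: each participant is randomly assigned to ONE condition
shuffled = participants[:]
random.shuffle(shuffled)
between = {"control": shuffled[:6], "experimental": shuffled[6:]}

# Within groups: each participant performs under ALL conditions,
# with the order randomized per participant to wash out learning effects
within = {p: random.sample(conditions, k=len(conditions))
          for p in participants}
```

Note that the within-groups dictionary gives every participant every condition, which is why it needs fewer people for the same number of observations per condition.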
Experimental Variables

 • What gets changed and what is its effect?
 • Independent variables

   • the variables you manipulate
     e.g. # of menu items, lighting conditions, mouse vs. keys
 • Dependent variables
   • measured part
     e.g. speed of menu choice, reaction time to stimuli
 • Variable type matters
   • discrete vs. continuous
Recruit Users

 • Typically want about 8 – 12 users
   • depends on desired confidence in the results

   • 12 is the magic number for the ANOVA test (more later)
 • This could be the most challenging aspect of the study
   • expect about a 0.1% to 10% response rate
   • may need IRB approval, especially if you want to publish
 • Give users a compelling reason to participate
Demographic Diversity

 • It is important to target your user population.
   • example: if you are developing for Firefox, make sure that
     you use people already familiar with Firefox.

 • Beyond that, it is also important to recruit a diverse range of
   user types:

   •   age
   •   sex
   •   education
   •   occupation
   •   ...
   • can tell you important things about your system, and help
     you generalize
Instrument Software/Hardware

 • Log performance and errors (if possible)
 • Determine media capture needs

   • ensure that you have access to equipment
   • manage physical layout of the testing space
 • Anything else that you need?
Conduct the Study

 • Give user an overview of the study
 • Introduce your system, allow for practice

 • Have users work through the tasks
 • Collect experimental measures (e.g., performance and error data)
 • Fill out questionnaire, if any
 • Debrief the user
 • Entire session should last less than 60 minutes
Tell the User At Least:

 • Purpose of the study, but not necessarily details of what you
   are testing
 • What they will be doing (the tasks)

 • They are not being tested, the interface/system is
 • They can quit at any time, and quitting will not affect their
   relationship with you, the university, the company, etc.
 • About the equipment in the room
 • Whether their face and/or actions will be recorded
 • How to think aloud (if you are collecting verbal data)
 • If you will or will not be available to answer questions
Make Users Feel Comfortable

 • Offer breaks at boundary points
 • Offer to send results in aggregate form or allow users to see the
   improved interface

 • Develop understandable instructions
 • Do not “defend” your interface
 • Do not make subjective comments about users, ease or
   difficulty of tasks, etc.
Analyze Results and Iterate

 • Analyze data using statistical methods (ANOVAs and Chi-
   Squared tests common)
   • take a stats course, e.g., Stat 320, for more detail

   • did you meet the goals? How far from the goals are you?
t-tests and ANOVAs

 • t-tests compare two random samples and determine if the
   samples are statistically significantly different
      • e.g., are dynamic menus better than static menus?

 • ANOVAs (analysis of variance) compare n random samples
   and determine if the samples are statistically significantly
   different
      • e.g., which is best: dynamic, static, or radial menus?

 • Both assume the samples come from normal distributions
   and both produce p-values.
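A two-sample t statistic can be computed with the standard library alone. A sketch under two stated simplifications: the p-value is approximated with the normal distribution instead of the exact t distribution (close for larger samples), and the menu-selection timings are invented for illustration:

```python
import statistics
from statistics import NormalDist

def two_sample_t(a, b):
    """Pooled two-sample t statistic with a two-tailed p-value
    approximated via the normal distribution."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)  # pooled var
    t = ((statistics.fmean(a) - statistics.fmean(b))
         / (sp2 * (1 / na + 1 / nb)) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-tailed
    return t, p

# Hypothetical selection times (seconds): dynamic vs. static menus
dynamic = [1.8, 2.1, 1.9, 2.0, 1.7, 2.2, 1.9, 2.0]
static  = [2.4, 2.6, 2.3, 2.5, 2.7, 2.4, 2.6, 2.5]
t, p = two_sample_t(dynamic, static)
```

In this invented data the dynamic-menu times are clearly faster, so the test yields a negative t and a very small p, i.e., a statistically significant difference. In practice you would use a stats package (e.g., `scipy.stats.ttest_ind` or `f_oneway` for an ANOVA) rather than hand-rolling the formula.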
Normal Distributions

   • Bell curve
   • y = exp(-x²)

   • Occurs from a sum of
     independent events
     • e.g., sum of dice rolls
     • Total time = t_find + t_home + t_click
     • Total # of errors
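The "sum of independent events" point is easy to verify by simulation: the sum of ten fair dice has mean 10 × 3.5 = 35 and standard deviation √(10 × 35/12) ≈ 5.4, and a histogram of many such sums looks bell-shaped. A stdlib-only sketch:

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible
# Each observation is a sum of 10 independent dice rolls
sums = [sum(random.randint(1, 6) for _ in range(10)) for _ in range(10_000)]

mean = statistics.fmean(sums)   # theory: 10 * 3.5 = 35
sd = statistics.stdev(sums)     # theory: sqrt(10 * 35/12) ≈ 5.40
```

The same logic is why total task time (find + home + click) and total error counts tend toward a normal shape, which is what justifies t-tests and ANOVAs on such measures.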

p-values

 • p = probability value
 • The probability that the difference you observe in an
   experiment is due to random chance

 • An expression of the confidence of your result
 • Typically, a difference is called statistically significant when
   p < 0.05.
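One way to internalize what p < 0.05 means: compare two samples drawn from the *same* population many times, and "significant" differences appear by chance alone roughly 5% of the time. A stdlib-only simulation sketch (the normal approximation to the t distribution used here inflates the rate slightly for small samples):

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)

def t_stat(a, b):
    """Pooled two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return ((statistics.fmean(a) - statistics.fmean(b))
            / (sp2 * (1 / na + 1 / nb)) ** 0.5)

trials, false_positives = 2000, 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(12)]  # same population...
    b = [random.gauss(0, 1) for _ in range(12)]  # ...as this one
    p = 2 * (1 - NormalDist().cdf(abs(t_stat(a, b))))
    if p < 0.05:
        false_positives += 1

rate = false_positives / trials  # roughly 0.05, plus approximation error
```

This is also why running many uncorrected comparisons in one study is risky: each one carries its own ~5% chance of a spurious "significant" result.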
Partial eta-squared

 • Some ANOVAs produce partial eta-squared values in
   addition to p-values.
 • They are becoming widespread in HCI literature.

   • You may see them soon in a usability report.
 • Partial eta-squared values offer a practical measure of effect
   size, i.e., how large the observed difference actually is
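For a one-way ANOVA, partial eta-squared is SS_effect / (SS_effect + SS_error), computed from the sums of squares in the ANOVA table. A minimal sketch with invented values:

```python
# Hypothetical sums of squares from an ANOVA table (not real data)
ss_effect = 14.2  # variation explained by the manipulation
ss_error = 92.7   # residual (unexplained) variation

partial_eta_sq = ss_effect / (ss_effect + ss_error)  # ~0.13 here
```

Unlike a p-value, this number does not shrink just because the sample is large, which is why it complements significance testing as an effect-size report.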
Advantages of Empirical User Studies

 • Measure performance (time, error rate)
 • Measure user satisfaction

 • Give realistic experience of the interface
   • realistic system response
   • move among tasks seamlessly
   • designers not in control, the user is
 • Focus will be on the details
   • most big issues should already be resolved
Disadvantages of Empirical User Studies

 • Users typically must come to the lab
   • makes it more difficult to recruit them

   • users may have anxiety
 • Large setup effort involved
   • software instrumentation, hardware setup, questionnaire
     design, IRB approval, etc.
 • Prototype may crash
An Example of How This Gets Used in Research

 • “The Impact of Delayed Visual Feedback on Collaborative
   Performance” by Darren Gergle, presented at CHI 06.

 • What is the relationship between delayed visual feedback and
   collaboration? How much network delay can be tolerated?

   • e.g., architectural planning, telesurgery, and remote repair
The Collaborative Puzzle Task

 • The experimental task was for a helper to guide a worker
   through a visual puzzle over a network connection
Independent Variables

 • Only one: visual delay in the helper's view window

 • Delay sampled from this distribution [60 - 3300ms]:
          • T(n) = T(n-1) * e^0.05, with T(1) = 60 ms
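The recurrence generates geometrically spaced delay levels (each ~5% longer than the last). A quick sketch of how the levels grow from 60 ms up past 3300 ms:

```python
import math

# T(1) = 60 ms; each successive level multiplies by e^0.05 (~5% growth)
delays = [60.0]
while delays[-1] < 3300:
    delays.append(delays[-1] * math.exp(0.05))
```

Geometric spacing packs many levels into the short-delay range where differences matter most, while still reaching delays of several seconds.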
Dependent Variables

 • Only one: task performance time
 • Participants were asked to perform the puzzle task as quickly
   and accurately as possible.
Quantitative Analysis Using ANOVA

 • “For delays between 60ms and 939ms, we found no
   evidence to indicate any impact of delayed visual feedback
   on task performance (SE = 2.87, F(1, 610) = 0.028, p = .87)."
      • p > 0.05, so the samples are not significantly different

 • “However, for delay rates between 939ms and 1798ms there
   is a significant impact on task performance (F(1, 610) = 13.57,
   p < .001)."
      • Since p < 0.001, this result is highly significant
Graph of Delay vs. Performance
