Kirk Allen, Andrea D. Stone, Maria Cohenour, Teri Reed Rhoads, Teri J. Murphy, Robert A. Terry
The University of Oklahoma, Norman, Oklahoma

Overall Goals

How do we demonstrate that graduates have an ability to design and conduct experiments, as well as to analyze and interpret data, in combination with the additional program criteria of applying and understanding statistics?

• Develop a multiple-choice test, the Statistics Concept Inventory (SCI), that attempts to answer this question.
• Evaluate the reliability and validity of the SCI according to standards of test analysis.
• Disseminate the SCI to other universities and departments.

Background

According to ABET's EC 2000 criteria, 16 of 24 engineering disciplines directly or indirectly mention probability and statistics within their accreditation criteria.

The Force Concept Inventory (FCI) in physics has been instrumental in improving educational methods. Other concept inventories are being developed in many engineering disciplines.

Publications

All publications are available on the website:

• "The Statistics Concept Inventory": ARTIST Roundtable Conference, August 4, 2004.
• "The Statistics Concept Inventory: Developing a Valid and Reliable Instrument": ASEE 2004 Conference, Salt Lake City.
• "The Statistics Concept Inventory: A Pilot Study": FIE 2003 Conference, Colorado.
• "Progress on Concept Inventory Assessment Tools": Panel Session, FIE 2003 Conference.

Results

Average scores are typically in the low-to-mid 40% range for pre-tests and around 50% for post-tests. A summary from Fall 2003 is shown below; other semesters are similar.

Course          Mean Pre    Mean Post    Gain
Engr            42.2%       44.1%        +1.9%
Math #1         43.4%       --           --
Math #2         49.7%       49.1%        -0.6%
Outside #1      43.1%       49.5%        +6.4%
Outside #2a     48.1%       52.4%        +4.3%
Outside #2b     46.1%       49.9%        +3.8%

The lack of large gains is similar to early findings on the Force Concept Inventory for classes that used a traditional lecture format for teaching.

Validity

Content Validity (very important)
• A faculty survey at OU rated the importance of statistics topics. An on-line survey for respondents outside OU is to be conducted soon.
• Statistics textbooks and journals were searched for common topics and misconceptions.
• Focus groups helped identify additional misconceptions, as well as questions where students use test-taking tricks.

Construct Validity (very important)
• Factor analysis suggests that the subtopics of the SCI are Descriptive, Inferential, Probability, and Graphical.

Concurrent Validity
• Course grades are used as a concurrent measure of SCI post-test validity.
• Generally, the post-test has been valid by this measure for Engr courses but not for Math courses.

Predictive Validity
• SCI pre-test scores have little value in predicting final course grades.
• No long-term predictive validity data are available.

Reliability

The most common measure of reliability is coefficient alpha.
• Above 0.80 is an accepted standard for a reliable test.
• Some sources consider above 0.60 reliable for classroom tests.

The largest testing effort thus far was Fall 2003. Data from six introductory statistics courses at three four-year universities are shown below.

Course (Fall 2003)    Pre-Test alpha    Post-Test alpha
Engr                  0.6863            0.7496
Math #1               0.7122            --
Math #2               0.6715            0.7232
External #1           0.7025            0.7314
External #2a          0.5709            0.6452
External #2b          0.6648            0.5843

Reliability numbers from other semesters are very similar to those in the table above.

Sample Question #1

Which would be more likely to have 70% boys born on a given day: a small rural hospital or a large urban hospital?
a) Rural
b) Urban
c) Equally likely
d) Both are extremely unlikely

Results from 3 classes, percent of students choosing each letter (Spring 2004). A is the correct answer; the change in percent correct is given in parentheses.

                       Pre #1   Post #1   Pre #2   Post #2   Pre #3   Post #3
Choice a%                36%      17%       32%      32%       26%      23%
Choice b%                 6%       7%        5%       3%        9%      10%
Choice c%                43%      69%       45%      45%       51%      63%
Choice d%                15%       7%       18%      19%       14%       3%
Change in % correct              (-19%)             (none)             (-3%)

Misconception: students do not realize the importance of sample size.

The discrimination indices on the post-test are 0.44, 0.27, and 0.50, so the question could be considered basically "good" psychometrically, while also demonstrating the lack of knowledge gain.
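The poster reports discrimination indices without stating how they were computed. One common classical definition is the upper-lower index D = p_upper - p_lower: the proportion answering the item correctly among the highest-scoring students minus that proportion among the lowest-scoring students, often using the top and bottom 27% by total score. A minimal Python sketch under that assumption; the function name, the 27% default, and the example data are hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower.

    Assumption: the classical upper/lower-group definition; the SCI analysis
    may have used a different index.

    item_correct: 0/1 scores on one item, one per student.
    total_scores: total test scores, in the same student order.
    fraction: share of students in each extreme group (0.27 is a common choice).
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("item_correct and total_scores must have the same length")
    k = max(1, int(round(fraction * n)))
    # Rank students by total score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_lower = sum(item_correct[i] for i in lower) / k
    p_upper = sum(item_correct[i] for i in upper) / k
    return p_upper - p_lower


# Hypothetical post-test data for 10 students (not SCI data).
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [5, 8, 10, 12, 14, 15, 18, 20, 22, 25]
print(discrimination_index(item, totals))  # 1.0
```

D ranges from -1 to 1. A negative value means weaker students answered the item correctly more often than stronger ones, which usually flags a problem item.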

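Coefficient alpha, cited in the Reliability section, can be computed from a students-by-items score matrix as alpha = k/(k - 1) * (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. A minimal Python sketch, assuming items scored 0/1 and population variances throughout (the ratio is unchanged if sample variances are used consistently); the function name and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient (Cronbach's) alpha for a students-by-items score matrix.

    scores: list of rows, one per student; each row holds that student's item
    scores (assumed here to be 0/1 for right/wrong).
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores: 4 students x 3 items (not SCI data).
scores = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(cronbach_alpha(scores))  # 0.75
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as internal consistency; values can then be compared against the 0.80 and 0.60 benchmarks listed above.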
Item Response Theory

General idea: for each item, a logistic curve is fit that describes the probability of answering the item correctly as a function of the item parameters (difficulty and discrimination) and a student's latent ability.

From the SCI:

[Figure: Item characteristic curves for two SCI items, probability of a correct answer (0 to 1) vs. ability (-3 to 3); a = discrimination, b = difficulty. Left: item XP2A, a slightly difficult item with strong discrimination (a = 2.210, b = 0.746). Right: item XG5, an easy item with weak discrimination (a = 0.584, b = -1.774).]

On-Line Test

An on-line testing system was developed in Fall 2004. Some of its features:
• The interface is programmed in PHP.
• Data are stored in a MySQL database.
• Students are added to the system by an administrator (e.g., an instructor or TA) and then receive a password via email.
• Questions are presented in random order to reduce the likelihood of collaboration.
• The SCI contains four topic areas, and instructors have the option of administering only certain areas.

The system has been tested by a small group of students at the end of the Fall semester. More extensive testing is planned for the Spring semester.

Sample Question #2

An example of a question where students demonstrate gain on a topic that they will definitely cover in a statistics class but may not have been formally introduced to before it.

A scientist takes a set of 50 measurements. The standard deviation is reported as -2.30. Which of the following must be true?
a) Most of the measurements were negative
b) All of the measurements were less than the mean
c) All of the measurements were negative
d) The standard deviation was calculated incorrectly

Percent of students choosing each letter. D is the correct answer; the change in percent correct is given in parentheses.

                       Pre #1   Post #1   Pre #2   Post #2   Pre #3   Post #3
Choice a%                 9%      14%        8%       3%       26%      20%
Choice b%                21%       0%       21%      10%       17%      23%
Choice c%                 7%       3%        5%       0%        9%       3%
Choice d%                60%      79%       61%      87%       43%      53%
Change in % correct              (+19%)             (+26%)             (+10%)

Discrimination index: 0.45, 0.27, 0.79

Misconception: students do not understand how standard deviation is calculated (i.e., that it can never be negative).
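The misconception behind Sample Question #2 follows directly from the definition of the sample standard deviation, s = sqrt( sum of (x_i - mean)^2 / (n - 1) ): every squared deviation is non-negative and the square root is taken last, so s >= 0 even when every measurement is negative. A short Python sketch that computes s from the definition and cross-checks it against the standard library's `statistics.stdev`; the measurements are made up for illustration.

```python
import math
from statistics import stdev


def sample_sd(xs):
    """Sample standard deviation computed directly from its definition."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(xs) / n
    # Each squared deviation is >= 0, so their sum is >= 0 ...
    ss = sum((x - mean) ** 2 for x in xs)
    # ... and the principal square root of a non-negative number is >= 0.
    return math.sqrt(ss / (n - 1))


# Hypothetical measurements, all negative: the SD is still positive.
measurements = [-5.1, -3.2, -4.8, -6.0, -2.9]
s = sample_sd(measurements)
print(f"from definition: {s:.3f}, statistics.stdev: {stdev(measurements):.3f}")
```

A reported value of -2.30 therefore cannot come from a correct calculation, which is why choice d is the only statement that must be true.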

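The logistic curves in the Item Response Theory section match the form of a two-parameter logistic (2PL) model, P(theta) = 1 / (1 + exp(-a(theta - b))), where a is the discrimination and b the difficulty. At theta = b the probability is exactly 0.5, and a larger a gives a steeper curve. The poster does not say which IRT software or scaling was used, so the optional constant D = 1.7 (the `scale` argument below) is an assumption. A minimal Python sketch evaluating the two plotted items:

```python
import math


def icc_2pl(theta, a, b, scale=1.0):
    """Two-parameter logistic item characteristic curve.

    theta: latent ability; a: discrimination; b: difficulty.
    scale: pass 1.7 to apply the normal-ogive scaling constant D. Whether the
    SCI analysis used D is an assumption left open here.
    """
    return 1.0 / (1.0 + math.exp(-scale * a * (theta - b)))


# Parameter estimates for the two items shown in the figure.
items = {"XP2A": (2.210, 0.746), "XG5": (0.584, -1.774)}

for name, (a, b) in items.items():
    probs = [icc_2pl(t, a, b) for t in range(-3, 4)]
    print(name, " ".join(f"{p:.2f}" for p in probs))
```

The output shows XP2A rising steeply around theta = 0.746, while XG5 is already above 0.5 for a student of average ability (theta = 0) and climbs slowly, matching the "slightly difficult, strong discrimination" and "easy, weak discrimination" labels.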

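The on-line system itself is written in PHP against a MySQL database, and its code is not part of this poster. As a language-neutral illustration only, here is a Python sketch of two of the listed features: restricting a form to instructor-selected topic areas and presenting questions in a per-student random order. The `Question` type, the `build_form` helper, and the assumption that the four topic areas are the four factor-analysis subtopics (Descriptive, Inferential, Probability, Graphical) are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str
    area: str  # hypothetical: one of the four SCI topic areas


# Assumption: the four topic areas coincide with the factor-analysis subtopics.
AREAS = {"Descriptive", "Inferential", "Probability", "Graphical"}


def build_form(bank, selected_areas, student_id):
    """Return the questions for one student, in a per-student random order."""
    unknown = set(selected_areas) - AREAS
    if unknown:
        raise ValueError(f"unknown topic areas: {sorted(unknown)}")
    # Keep only the areas the instructor chose to administer.
    chosen = [q for q in bank if q.area in selected_areas]
    # Seeding with the student ID keeps one student's order stable across
    # sessions while different students see different orders.
    rng = random.Random(student_id)
    rng.shuffle(chosen)
    return chosen


# Hypothetical bank: two placeholder questions per area.
bank = [Question(f"Q{i}", area) for i, area in enumerate(sorted(AREAS) * 2)]
form = build_form(bank, {"Probability", "Graphical"}, student_id="student-1")
print([q.qid for q in form])
```

Seeding the shuffle with the student's ID is one design choice: a student who reloads the test sees the same order, while neighboring students see different ones. An unseeded shuffle would also meet the stated goal of reducing collaboration.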