Integrating a Wide Variety of Student Information
Sources to Support Institutional e-Learning Decisions: A
Stellenbosch University Case Study
Antoinette van der Merwe and Liezl van Dyk
University of Stellenbosch, South Africa
advdm@sun.ac.za
lvd@sun.ac.za
Abstract: It is common knowledge that the amount of information accessible to people has increased
exponentially. The problem is no longer the amount of information, but how to utilize the information effectively
to support decision-making. Specifically with regards to e-learning, Higher Education institutions continually
need to make decisions with respect to the development of infrastructure, processes and resources. These
decisions sometimes come at a considerable financial cost and hence need to be made carefully. In this
paper, Stellenbosch University is presented as a case study of how information can be utilized to support
decisions with respect to the development of infrastructure, processes and resources towards the
advancement of e-learning. The information included in the study is captured at different instances for
different purposes:

Feedback about the computer literacy of first-year students extracted from the ALPHA baseline questionnaire,
completed by all first-year students during their first week on campus;
Tracking data extracted from the learning management system (LMS);
Demographic data taken from the student information system;

Feedback from a custom-made questionnaire administered at the end of 2007 with regards to the student
experience of the e-learning systems, technology, processes and resources on campus.

This paper will focus on the following: (1) A discussion of the conceptual framework of the study; (2) The
design of a data warehouse from which decision support information is extracted; and (3) A discussion of
valuable decision support provided by this information.

Keywords: e-learning, information, data warehouse, institutional decision-making

1. Introduction
It is common knowledge that the amount of information accessible to people has increased
exponentially. The problem is no longer the amount of information, but how to utilize the
information effectively to support decision-making. Specifically with regards to e-learning, Higher
Education institutions continually need to make decisions with respect to the development of
infrastructure, processes and resources. These decisions sometimes come at a considerable
financial cost and hence need to be made carefully.

Stellenbosch University is no exception in this regard. In this paper, Stellenbosch University is
presented as a case study of how 2007 data, specifically about first-year students, can be utilized to
support decisions with respect to the development of infrastructure, processes and resources
towards the advancement of e-learning and to support these students more effectively as part of
the First-year Academy initiative. The information included in the study is captured at different
instances for different purposes:
•   Feedback about the computer literacy of first-year students extracted from the Alpha Baseline
    Questionnaire, completed by all first-year students during their first week on campus
    (January/February 2007);
•   Tracking data extracted from the learning management system (LMS), focusing on large
    first-year modules (2007 data);
•   Demographic and academic data taken from the student information system (2007 data
    captured during registration of students);
•   Feedback from a custom-made questionnaire administered at the end of 2007 with regard to
    the student experience of the e-learning systems, technology, processes and resources on
    campus (October 2007).





The First-year Academy was implemented in 2007 and focuses on the success of all first-year
students, taking a holistic view of student success and support.
This paper will focus on the following:
•   A discussion of the conceptual framework / context of the study that includes the institutional
    initiatives that can be identified as possible strategic drivers
•   The design of a “proof of concept” data warehouse from which decision support information is
    extracted including the model and data sources used
•   A discussion of valuable decision support provided by this information to inform decision-making
    related to the support of first-year students, planning about infrastructure, and advising
    lecturers with regards to e-learning activities.
The concluding comments focus on some issues that still need to be addressed for this “proof of
concept” to deliver the desired results.

2. Conceptual framework (Context)
Strategic initiatives such as the First-year Academy as well as the e-learning activities at
Stellenbosch University increasingly heighten the need to integrate the different data sources to
assist decision-making on infrastructure for students as well as the type of support the students
and lecturers need.

2.1 Motivation for focusing on first-year students
This paper started out as focusing on the e-learning needs of all students, but we realised that we
have a wealth of information about specifically first-year students. Although we are only focusing
on first-year data, we believe that if we are able to cater for these students’ needs, we would also
be able to cater for the more senior students. The first-year group is traditionally the most
vulnerable to failure, the least computer literate and the least well-informed of the student year groups.
By focusing on their specific issues, we are not only addressing their immediate needs but also
making an investment in their subsequent studies by ensuring that they get the relevant
information and support to enable them to continue with their studies.

The focus on first-year students should further be placed within the context of the First-year
Academy that was fully implemented in 2007. The First-year Academy is the coordination of a
range of activities focused on first-year success by taking a holistic view of what this success
entails. The initiative takes into account the wide range of in- and out-of-class activities that might
have an influence on the success of first-year students. Research has shown that the first-year
students are the most vulnerable group at a higher education institution and timely and adequate
support can increase their chances of success dramatically. To provide support, however, accurate
information about their activities and progress needs to be provided. As already mentioned, there is
certainly not a lack of information available on the first-year students. The challenge is rather to
integrate all these data sources to provide meaningful information to support decision-making.

2.2 e-Learning issues
Stellenbosch University has been using WebCT (now Blackboard) since 1999. The Centre for
Teaching and Learning (CTL) provides the technical and educational support and training. The
Information Technology Division provides the technical infrastructure (maintenance of server and
software). The students can access computers in the faculty based computer user areas or from
their residence rooms where they have network points.

There has been an exponential increase in the number of WebCT modules over the past ten years.
Lecturers are also using WebCT more and more for assessment as part of the early assessment
requirement of the First-year Academy. According to this requirement, a mark needs to be loaded
for all first-year modules after the first six weeks of classes. Because of the large first-year classes,
lecturers are opting more and more to use e-assessment.

This increase, especially in e-assessment, is putting more and more strain on the computer
infrastructure available to students. Although the computer:student ratio is on average 1:10 in the
computer user areas with the areas open 24 hours a day, seven days a week, the perception does
exist that this is no longer adequate. With the increase in e-assessment, lecturers also need more
dedicated computer classrooms between 8h00 and 16h30.

Increasingly questions are asked regarding infrastructure for students as well as how to advise
lecturers with regard to e-assessment. WebCT collects valuable information with regards to what
types of tools students are using as well as the time they spend online. The CTL also did a survey
with questions about both the infrastructure as well as the students’ technology use in 2007.

The University is also conscious of the fact that students are increasingly coming to University with
expectations that technology will be used in their learning activities. It is however also true that
many students from disadvantaged backgrounds did not have access to computers at school. To
assess what percentage of students would need extra help in terms of computer training as well as
to advise lecturers on the types of technologies students use, questions were included in the e-
learning survey and Alpha Baseline Questionnaire with regards to computer ownership, length of
computer use as well as the types of applications students frequently use.

2.3 Data warehouse
Business intelligence entails the gathering of data from internal and external data sources, as well
as the storing and analysis thereof to make it measurable, so as to assist and sustain more
efficient and longitudinal decision-making (Kimball and Ross, 2002; Inmon et al., 2001). The business
intelligence approach was followed in the design of a data warehouse concept to provide decision
support within the conceptual framework described in the previous section.

2.4 Design of data warehouse concept
Figure 1 is the roadmap for the design of this data warehouse and is based on the data warehouse
approach by Kimball and Ross (2002). Data that already exists within information system sources
are extracted, transformed and loaded (ETL) into a data warehouse that consists of one or
more data marts. From this data warehouse ad hoc queries and longitudinal business measures
can be drawn as needed.

2.5 Data mart
A data mart is described in the DMReview Magazine Glossary (2008) as “a subset of an
organization data warehouse that is usually orientated to a specific purpose of major data subject.”
According to the EDUCAUSE Higher Education data warehouse directory (Heise, 2007) Higher
Education institutions typically draw data from data marts such as Alumni, Prospective students,
modules and facilities. In the case of Alumni and Prospective Students the decision support
provided is typically in line with customer relationship management (CRM) principles. Some
institutions use for example information about prospective students to provide targeted career
advice.

Within the conceptual framework provided as introduction to this paper, it is clear that the primary
data subject is the student. Valuable data about prospective students is also available within this
data mart. Once the student becomes an alumnus, his or her data can be transferred to the
Alumni data mart, if the appropriate data structures exist. The University takes a holistic view of the
student and the support of the student life-cycle from potential prospective to prospective, to first-
year, final year and alumnus. The data about these different stages of the student life-cycle could
potentially also be integrated at a later stage.

2.6 Data sources
The data sources from which data are drawn for purposes of this data warehouse, are indicated on
the left hand side of Figure 1 as the Student Information System (SIS), the LMS Tracking Data as
well as data from two student surveys.

2.6.1 Student information system (SIS)
A SIS is defined by Gartner’s e-learning glossary (Lundy et al., 2002) as "...the system used to enroll and
register students, track curricula, courses and students. Transcripts, administrative details of
courses taken, progress towards a degree and grades for evaluative information are typically
included." For purposes of this case study, student information from 4 525 first-year students of
2007 is sourced to provide historical context (e.g. secondary school, computer usage at school and
race) as well as current context and academic data (programme enrolled for, distance from main
campus, type of accommodation, current access to accommodation and aggregate results).

2.6.2 LMS tracking data
Each time a lecturer or student logs into a Learning Management System (LMS), participates in
online discussions, completes an electronic quiz or reads an electronic document, an electronic
transaction is performed. With each transaction performed, data is captured by the LMS. As a
result a significant amount of data is created, which is most often only archived for record keeping
purposes and not used to support decision-making (Conradie and Van Dyk, 2007). For this study,
an effort is made to aggregate this data into a format so that it can be associated in a useful way
with other sources. Ten first-year modules were selected for this purpose. These modules are
representative across faculties and are modules with fairly large numbers of students. These
modules have a total of 10 246 LMS seats. A total of 4 073 first year students are associated with
one or more of these seats.
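
As an illustration of this aggregation step, the following Python sketch collapses raw transaction
lines into one summary record per student. The record layout (student number, tool, date, seconds
online) and all sample values are hypothetical; the actual LMS export format is not described in
this paper, and counting distinct active days is only one possible stand-in for a consistency measure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction lines: (student_number, tool, day, seconds_online).
# The field names and layout are assumptions, not the real LMS export format.
transactions = [
    ("S001", "assessment", date(2007, 3, 5), 600),
    ("S001", "discussion", date(2007, 3, 5), 120),
    ("S001", "assessment", date(2007, 3, 12), 300),
    ("S002", "announcement", date(2007, 3, 6), 30),
]


def aggregate_tracking(rows):
    """Collapse transaction lines into one summary record per student."""
    summary = defaultdict(
        lambda: {
            "total_hits": 0,
            "total_seconds": 0,
            "hits_per_tool": defaultdict(int),
            "active_days": set(),
        }
    )
    for student, tool, day, seconds in rows:
        record = summary[student]
        record["total_hits"] += 1          # every transaction line counts as one hit
        record["total_seconds"] += seconds
        record["hits_per_tool"][tool] += 1
        record["active_days"].add(day)     # distinct days with any activity
    return summary


per_student = aggregate_tracking(transactions)
for student, record in sorted(per_student.items()):
    print(student, record["total_hits"], record["total_seconds"],
          dict(record["hits_per_tool"]), len(record["active_days"]))
```

Summaries of this kind are small enough to be associated with the other sources through the
student number, whereas the raw transaction lines are not.
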
Figure 1: The business intelligence framework for this context
[Figure 1 shows a flow from left to right. Data sources: “manual” input (e-learning survey, N=1224;
Alpha Baseline, N=3256), LMS tracking data (N=4073), SIS demographics and results (N=4525) and
other sources (e.g. ERP: Finance; Library Records). These pass through staging and conversion
(Extract-Transform-Load) into the data marts: Module Data Mart, (Prospective) Student Data Mart,
Facilities and Alumni Data Mart. Examples of data warehouse dimensions: Student Demographics,
LMS tools, Modules, Results and Lecturer Demographics. A Meta Data Repository holds the definition
of sources, formulas, etc. Outputs: ad hoc analysis (strategic): infrastructure planning, student
support and training; ad hoc analysis (tactical): faculty / departmental planning, early
identification of students at risk; ad hoc analysis (operational): case specific; and a standard
measures data mart for longitudinal measures: e-learning adoption rates, e-learning trends and
demands on IT infrastructure.]

2.6.3 Manual input
An e-learning impact survey was distributed to all students registered at Stellenbosch University in
November 2007. In this survey students were asked about their perceived computer competency,
computer usage patterns in terms of time of day, place and type of activity as well as their
experience concerning the use of technology to facilitate learning. A total of 1 254 responses were
received.

The Alpha Baseline Questionnaire is completed by all first-year students during their first week at
University. This Questionnaire is aimed at gathering data about the students’ needs, uncertainties
and perceptions about studying at the University before they start attending lectures.

2.6.4 Longitudinal data
This Stellenbosch University case study is currently only based on 2007 data. However, the
warehouse is designed in such a way as to make provision for the capturing of the same data for
2008 and subsequent years. In some instances longitudinal data can be traced back a few years
and added to the data warehouse. The Alpha baseline questionnaire was used, for example, for
the first time in 2003. In 2005 it was for the first time administered via WebCT during the orientation
week and a marked increase in the response rate can be seen in Figure 2 below.

[Bar chart of the response rate (%) per year]
Year                 2003    2004    2005    2006    2007    2008
Response rate (%)   33.54   53.04   73.37   76.12   72.71   76.36

Figure 2: Alpha baseline questionnaire response rates (2003-2008)

Furthermore, the questions asked for the e-learning impact study conducted in 2003 were used as
the point of departure for the design of the survey that was distributed in 2007 to enable longitudinal
comparisons as far as possible. The current LMS was piloted in 2005. Hence the 2005 (for the pilot
modules) and the 2006 tracking data are available in the same format as the 2007 data. This will
also be the case for the 2008 data.

3. Extract, transform and load (ETL)
ETL is a frequently used acronym to refer to the process of extracting data from the respective
data sources, cleaning (transforming) the data into an appropriate format and loading the data into
a data warehouse. Advances in technology and standardization make the routine ETL of data from
LMSs more feasible. About 90% of the higher education institutions (HEIs) on the data warehouse directory (Heise, 2007)
indicated that they make use of an Oracle database. Stellenbosch University’s LMS (Blackboard
Vista 4) is based on ORACLE technology. The “PowerSight Kit” that is available as part of this
LMS is a database that provides access to the finest detail of module, user and tracking data.

MS Access is used as the database for this proof of concept. A screenshot of the design
of the (Prospective) Student Data Mart is shown in Figure 3. MS Access is definitely not a
feasible option for a full-scale data warehouse. For purposes of this case study only the tracking
data of the 11 largest first-year modules were used. This alone amounted to more than 2 million
tracking transaction lines, which had to be aggregated with effort outside MS Access before they
could be loaded into the database. Ideally, the data warehouse described in this section would be
built up and held together with SQL code within this database.

The other data sources (SIS data, Alpha Baseline Questionnaire and e-learning survey data) are
simply extracted as flat files from their original data sources and loaded into the Access database,
where they are associated (through the student number) with the other data sources.
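
The association through the student number can be sketched as follows. This is a minimal
illustration that uses Python's built-in sqlite3 module in place of MS Access; the table names,
column names and sample rows are hypothetical and do not reflect the actual extracts.

```python
import csv
import io
import sqlite3

# Stand-ins for flat-file extracts; column names and values are hypothetical.
SIS_CSV = "student_number,faculty,mid_year_result\nS001,Engineering,62\nS002,Arts,55\n"
LMS_CSV = "student_number,total_hits,total_minutes\nS001,340,410\nS002,120,95\n"
SURVEY_CSV = "student_number,own_computer\nS001,yes\n"


def load_flat_file(conn, table, text):
    """Create `table` from CSV text and load every row into it."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    # Columns are created without types, so values stay text; a fuller
    # transform step would cast them to numbers and dates.
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
load_flat_file(conn, "sis", SIS_CSV)
load_flat_file(conn, "lms", LMS_CSV)
load_flat_file(conn, "survey", SURVEY_CSV)

# LEFT JOIN keeps every SIS student, even one without a survey response.
rows = conn.execute(
    """
    SELECT sis.student_number, sis.faculty, lms.total_hits, survey.own_computer
    FROM sis
    LEFT JOIN lms ON lms.student_number = sis.student_number
    LEFT JOIN survey ON survey.student_number = sis.student_number
    ORDER BY sis.student_number
    """
).fetchall()
print(rows)
```

A left join from the SIS table is used here because the surveys cover only some of the students;
an inner join would silently drop every student without a survey response.
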

3.1 Analyses to support decision-making
The analyses required from the data warehouse (right hand side of Figure 1) dictate the design of
the data marts, as do the data sources that feed into the data marts (left hand side of Figure 1).
Institutions such as Stellenbosch University typically face ad hoc strategic, tactical and
operational decisions, as well as the need for standard longitudinal impact measurement.

Examples of ad hoc strategic decisions include infrastructure planning in terms of institutional
provision of computer infrastructure in the faculty-based computer user areas. In this regard the
infrastructure index that can be generated from the data (see section 4.2 below) can provide
valuable information when correlated per faculty-based computer user area and even the student’s
place of residence. The question is often asked whether more computers should be provided in
central computer user areas or whether the provision of computers should be decentralised further.
Similarly, the WebCT and computer literacy indices (see section 4.2 below) can be correlated with faculty and
other demographic variables, to gauge what type of faculty-specific support first-year students
need with regard to general computer and WebCT literacy. Students entering University are often
referred to as “digital natives”, often ignoring the needs of the students who have never used a
computer in learning activities. Based on this type of analysis, information on how to provide
support to (at-risk) first-year students with regards to e-learning can then be fed back to the
faculties and departments via the Teaching and Learning coordination points established within
each faculty as part of the First-year Academy.




Figure 3: A screenshot of the (Prospective) Student Data Mart
Examples of ad hoc tactical and operational decisions include how the Centre for Teaching and
Learning advises lecturers on the use of e-learning based on the analysis of a combination of the
types of activities students engage in on the LMS (hits, time spent etc.) as well as the types of non-
LMS activities (e.g. use of blogs and wikis). This type of ad hoc decision support can in many
cases be provided without a formal data warehouse. However, the value of a data warehouse
approach for ad hoc querying lies in the ability to include disparate data sources as well as the
efficiency of this approach. Once data are cleaned and integrated they are much easier to use. A
business intelligence approach towards the building of a data warehouse becomes an absolute
prerequisite when the purpose is to do longitudinal impact measurement. Ideally, a Standard
Business Measure Data Mart should also be part of the data warehouse design to enable
longitudinal comparisons. To enable comparisons across years, the data and measurements from
year to year must be in the same format and based on the same assumptions. Typical questions
that could be asked at Stellenbosch University for a specific time period could focus on e-learning
adoption rates, e-learning trends as well as how these adoption rates and trends affect the
demands on IT infrastructure.
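
The requirement that measurements be based on the same assumptions from year to year can be
illustrated with a short Python sketch in which one function definition is applied unchanged to
each year's data. The definition of adoption rate used here (the share of modules with at least
one LMS hit), the module codes and the hit counts are all hypothetical.

```python
def adoption_rate(modules):
    """Share of modules with at least one LMS hit (a hypothetical definition)."""
    if not modules:
        return 0.0
    active = sum(1 for hits in modules.values() if hits > 0)
    return active / len(modules)


# Hypothetical LMS hit counts per module, stored in the same format for every year.
yearly_modules = {
    2006: {"MOD101": 0, "MOD102": 1500, "MOD103": 0, "MOD104": 320},
    2007: {"MOD101": 240, "MOD102": 1800, "MOD103": 0, "MOD104": 410},
}

# The same measure is applied to every year, so the results are comparable.
rates = {year: adoption_rate(modules) for year, modules in sorted(yearly_modules.items())}
for year, rate in rates.items():
    print(year, f"{rate:.0%}")
```

Keeping the definition in one place is what a standard business measure data mart would do at
warehouse scale: a change of definition then applies to all years at once.
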

4. Application
A great portion of this paper is devoted to a discussion of the context and design of a data
warehouse framework to leverage existing data to support decision-making towards the increase of
student success. In this section the application of this data warehouse concept is demonstrated
through some examples.

4.1 Correlation between different student attributes
The correlation matrix (Table 1) includes attributes that were drawn from the SIS (Age, Mid-Year
Results, Aggregate School Score), the Alpha Baseline Questionnaire (“On a scale of 1-7 indicate
the extent to which you were challenged by school exams”) as well as the LMS Hit counter
(Announcement, Assessment, Assignment, Discussion, My-Grades, Consistency of Hits, Total
LMS Time and Total LMS Hits). The Spearman-Pearson correlation coefficient is measured pair-wise,
and the significance of each correlation is indicated by its p-value. When p<0.05, it is
generally assumed that the statistical correlation between two attributes is significant.
Table 1: Correlation Matrix (N=573). An asterisk (*) marks a pair of attributes with p<0.05; columns are numbered as the rows.
                                 (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10) (11) (12)
(1)  Age (days)
(2)  Mid-Year Results
(3)  Year Weight                       *
(4)  School Score                 *    *    *
(5)  Extent exams challenge you                  *
(6)  LMS Hits: Announcement       *    *    *
(7)  LMS Hits: Assessment              *    *    *         *
(8)  LMS Hits: Assignments                  *              *
(9)  LMS Hits: Discussion              *         *         *         *
(10) Consistency of LMS Hits           *    *    *         *    *    *    *
(11) Total LMS Time               *    *    *    *              *    *    *    *
(12) Total LMS Hits                    *                   *    *    *    *    *    *

Some interesting conclusions can be drawn from Table 1. It is important to note that these
conclusions may be case specific, since the tracking data of only a limited number of modules is
included. A full scale data warehouse concept should enable case specific, faculty specific as well
as institution specific data. Taking this into account, the following conclusions can be made from
the available data analysed in Table 1:
•   The older a student the more time is spent on the LMS. However, age does not correlate
    significantly with the total number of hits or hits consistency.
•   The number of times students use assignments correlates significantly with the Year Weight
    (portion of modules passed), but not with the Mid-Year Results.
•   The total LMS time correlates significantly with Mid-Year Results, Year Weight as well as
    School Score, whilst LMS hits correlate significantly with Mid-Year Results only.
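
A pair-wise significance check of this kind can be sketched in Python using only the standard
library. The sketch computes the Pearson coefficient and approximates the two-sided p-value with
the Fisher z-transformation, which is a large-sample approximation and not necessarily the
procedure used for Table 1; a rank-based (Spearman) variant would apply the same function to
ranks instead of raw values. The sample values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transformation (large-sample approximation).

    Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values for eight students: total LMS minutes and mid-year result (%).
lms_minutes = [120, 340, 90, 410, 260, 150, 380, 220]
mid_year = [48, 66, 52, 71, 58, 55, 63, 60]

r = pearson_r(lms_minutes, mid_year)
p = approx_p_value(r, len(lms_minutes))
print(f"r = {r:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With only eight observations the normal approximation is rough; statistical packages typically
use the exact t-distribution, which matters less at a sample size such as N=573.
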

4.2 Use of Indices
Other interesting results can be obtained by combining the different variables to construct indices
in SPSS. Table 2 gives an example of the computer, WebCT and Infrastructure indices that can be
constructed as well as examples of the variables that can be included in each index. These
variables are taken from three of the four main data sources: The Alpha Baseline questionnaire,
the student survey and the LMS tracking data. Once constructed these indices can then be
correlated with demographic variables such as age groups, race, gender and faculty and the
academic variables such as Academic programme, June university average, % credits passed and
M-count taken from the SIS data. The results of these correlations can provide valuable
information for ad hoc strategic, tactical and operational decisions with regard to
student training and support as well as computer infrastructure provision on faculty level (see
section 3.1 above).
Table 2: Possible Indices
Computer literacy Index:
•   Length of computer use
•   Rating of computer skills
•   Own computer
•   Enjoy working with computers
•   Computer can help me academically
•   How often do you use these computer applications (e-mail, Internet, Turnitin, Microsoft Word,
    Excel, Powerpoint, blogs, Wikis etc.)
WebCT literacy Index:
•   Length of LMS use
•   Rating of LMS use
•   Hits consistency
•   Total time spent on LMS
•   Total LMS hits
•   How often do you use these computer applications (WebStudies)
Infrastructure Index:
•   Where do you access computers?
•   How many hours do you spend in CUA (e-mail, Internet, class notes, assessments,
    assignments etc.)
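
One simple way in which variables such as those in Table 2 could be combined into an index is
sketched below in Python. It rescales each variable to the range 0 to 1 and takes the unweighted
mean per student. This construction is an assumption made for illustration: the weighting and
scaling actually applied in SPSS are not specified in this paper, and the item names and
responses are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range 0..1 (a constant list maps to 0)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def build_index(items):
    """Average the rescaled items per student to form a composite index."""
    scaled = [min_max_scale(column) for column in items.values()]
    return [sum(student) / len(student) for student in zip(*scaled)]


# Hypothetical responses for four students; one list per questionnaire item.
computer_literacy_items = {
    "years_of_computer_use": [0, 2, 5, 10],
    "self_rated_skill_1_to_7": [1, 3, 5, 7],
    "owns_computer": [0, 0, 1, 1],
}

computer_literacy_index = build_index(computer_literacy_items)
print([round(score, 2) for score in computer_literacy_index])
```

Rescaling before averaging prevents an item with a wide numeric range, such as years of computer
use, from dominating items measured on a short scale.
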





5. Concluding remarks
The design of a data warehouse concept to support institutional decision-making concerning
first-year success was proposed in this paper. To prove the concept, data on the 2007 first-year
students of Stellenbosch University were extracted from the SIS, LMS tracking sources as well as
two questionnaires. The data were then transformed and loaded into a data warehouse. The purpose of a
data warehouse is not to provide the answers before the questions are asked (although it can provide
useful sources for data mining), but rather to have data available in such a way as to enable the
extraction of useful information once the questions are asked. The purpose of the paper was not to
answer specific questions, but rather to prove the concept of the data warehouse. For this purpose,
interesting information drawn from this warehouse was presented and its usefulness in
supporting decision-making discussed.

Further plans include:
•   Integrating early assessment data of first-year students. Lecturers have to load a mark for all
    first-year modules within the first six weeks of the first semester.
•   Expanding and refining the LMS tracking data.
•   Using the data warehouse for longitudinal studies.
•   Integrating data from the Alpha progress questionnaire that all first-year students complete at
    the end of their first year.
•   Further analyses of the data in SPSS, specifically using indices.
On a more technical level, we are also considering other types of technologies and,
more specifically, an ORACLE database to allow for scalability. Most importantly though, we feel
that we have a proof of concept that we would like to share with a wider audience at Stellenbosch
University to determine what types of questions need to be asked. As alluded to in this paper, the
three questions that possibly need urgent attention are issues around infrastructure, training and
support of first-year students, and advising lecturers with regards to teaching and learning
activities.

References
Conradie, P.J. and Van Dyk, L. (2007). “Creating Business Intelligence from Course Management Systems”,
         Campus-Wide Information Systems, Vol 24, Issue 2, pp. 120 – 133.
DM Review Magazine Glossary. “Data Management Review”, SourceMedia, Inc. 2008
Heise, D. (2007). “EDUCAUSE Decision Support Data Warehouse Constituent Group”. Downloaded from
         dheise.andrews.edu/dw/DWData.html on [26 March 2007].
Kimball, R. and Ross, M. (2002). The data warehouse toolkit. John Wiley and Sons, 2nd edition. ISBN 0-471-
         20042-7.
Lundy, J., Harris, K., Igou, B., and Zastrocky, M. (2002). “Gartner's e-learning glossary”. Research Note M-14-
         9025, Gartner Research.



