Integrating a Wide Variety of Student Information Sources to Support Institutional e-Learning Decisions: A Stellenbosch University Case Study
Antoinette van der Merwe and Liezl van Dyk
University of Stellenbosch, South Africa
email@example.com
firstname.lastname@example.org
Abstract: It is common knowledge that the amount of information accessible to people has increased exponentially. The problem is no longer the amount of information, but how to utilize the information effectively to support decision-making. Specifically with regard to e-learning, Higher Education institutions continually need to make decisions about the development of infrastructure, processes and resources. These decisions sometimes come at a considerable financial cost and hence need to be made carefully. In this paper, Stellenbosch University is presented as a case study of how information can be utilized to support decisions about the development of infrastructure, processes and resources towards the advancement of e-learning. The information included in the study is captured at different instances for different purposes: feedback about the computer literacy of first-year students extracted from the Alpha Baseline Questionnaire, completed by all first-year students during their first week on campus; tracking data extracted from the learning management system (LMS); demographic data taken from the student information system; and feedback from a custom-made questionnaire administered at the end of 2007 on the student experience of the e-learning systems, technology, processes and resources on campus. This paper will focus on the following: (1) A discussion of the conceptual framework of the study; (2) The design of a data warehouse from which decision support information is extracted; and (3) A discussion of valuable decision support provided by this information.
Keywords: e-learning, information, data warehouse, institutional decision-making
1. Introduction
It is common knowledge that the amount of information accessible to people has increased exponentially. The problem is no longer the amount of information, but how to utilize the information effectively to support decision-making. Specifically with regard to e-learning, Higher Education institutions continually need to make decisions about the development of infrastructure, processes and resources. These decisions sometimes come at a considerable financial cost and hence need to be made carefully. Stellenbosch University is no exception in this regard.
In this paper, Stellenbosch University is presented as a case study of how 2007 data about first-year students specifically can be utilized to support decisions about the development of infrastructure, processes and resources towards the advancement of e-learning, and to support these students more effectively as part of the First-year Academy initiative. The information included in the study is captured at different instances for different purposes:
- Feedback about the computer literacy of first-year students extracted from the Alpha Baseline Questionnaire, completed by all first-year students during their first week on campus (January/February 2007);
- Tracking data extracted from the learning management system (LMS), focusing on large first-year modules (2007 data);
- Demographic and academic data taken from the student information system (2007 data captured during registration of students);
- Feedback from a custom-made questionnaire administered at the end of 2007 on the student experience of the e-learning systems, technology, processes and resources on campus (October 2007).
The First-year Academy was implemented in 2007 and focuses on the success of all first-year students, taking a holistic view of student success and support.
This paper will focus on the following:
(1) A discussion of the conceptual framework and context of the study, including the institutional initiatives that can be identified as possible strategic drivers;
(2) The design of a “proof of concept” data warehouse from which decision support information is extracted, including the model and data sources used;
(3) A discussion of valuable decision support provided by this information to inform decision-making related to the support of first-year students, planning about infrastructure, and advising lecturers with regard to e-learning activities.
The concluding comments focus on some issues that still need to be addressed for this “proof of concept” to deliver the desired results.
2. Conceptual framework (Context)
Strategic initiatives such as the First-year Academy, as well as the e-learning activities at Stellenbosch University, increase the need to integrate the different data sources to assist decision-making on infrastructure for students and on the type of support that students and lecturers need.
2.1 Motivation for focusing on first-year students
This paper started out by focusing on the e-learning needs of all students, but we realised that we have a wealth of information about first-year students specifically. Although we focus only on first-year data, we believe that if we are able to cater for these students’ needs, we will also be able to cater for the more senior students. The first-year group is traditionally the most vulnerable to failure, the least computer literate and the least well-informed of the student year groups. By focusing on their specific issues, we are not only addressing their immediate needs but also making an investment in their subsequent studies by ensuring that they get the relevant information and support to enable them to continue with their studies.
The focus on first-year students should further be placed within the context of the First-year Academy, which was fully implemented in 2007. The First-year Academy is the coordination of a range of activities focused on first-year success by taking a holistic view of what this success entails. The initiative takes into account the wide range of in- and out-of-class activities that might influence the success of first-year students. Research has shown that first-year students are the most vulnerable group at a higher education institution and that timely and adequate support can increase their chances of success dramatically. To provide support, however, accurate information about their activities and progress needs to be available. As already mentioned, there is certainly no lack of information about the first-year students. The challenge is rather to integrate all these data sources to provide meaningful information to support decision-making.
2.2 e-Learning issues
Stellenbosch University has been using WebCT (now Blackboard) since 1999. The Centre for Teaching and Learning (CTL) provides the technical and educational support and training. The Information Technology Division provides the technical infrastructure (maintenance of server and software). Students can access computers in the faculty-based computer user areas or from their residence rooms, where they have network points.
There has been an exponential increase in the number of WebCT modules over the past ten years. Lecturers are also using WebCT more and more for assessment as part of the early assessment requirement of the First-year Academy. According to this requirement, a mark needs to be loaded for all first-year modules after the first six weeks of classes. Because of the large first-year classes, lecturers are increasingly opting to use e-assessment. This increase, especially in e-assessment, is putting more and more strain on the computer infrastructure available to students.
Although the computer:student ratio is on average 1:10 in the computer user areas, which are open 24 hours a day, seven days a week, the perception exists that this is no longer adequate. With the increase in e-assessment, lecturers also need more dedicated computer classrooms between 8h00 and 16h30. Questions are increasingly asked about infrastructure for students as well as how to advise lecturers with regard to e-assessment. WebCT collects valuable information about the types of tools students are using as well as the time they spend online. In 2007 the CTL also conducted a survey with questions about both the infrastructure and the students’ technology use.
The University is also conscious of the fact that students are increasingly coming to University with expectations that technology will be used in their learning activities. It is, however, also true that many students from disadvantaged backgrounds did not have access to computers at school. To assess what percentage of students would need extra help in terms of computer training, and to advise lecturers on the types of technologies students use, questions about computer ownership, length of computer use and the types of applications students frequently use were included in the e-learning survey and the Alpha Baseline Questionnaire.
2.3 Data warehouse
Business intelligence entails the gathering of data from internal and external data sources, as well as the storing and analysis thereof to make it measurable, so as to assist and sustain more efficient and longitudinal decision-making (Kimball and Ross, 2002; Inmon et al., 2001). The business intelligence approach was followed in the design of a data warehouse concept to provide decision support within the conceptual framework described in the previous section.
2.4 Design of data warehouse concept
Figure 1 is the roadmap for the design of this data warehouse and is based on the data warehouse approach by Kimball and Ross (2002). Data that already exist within information system sources are extracted, transformed and loaded (ETL) into a data warehouse that consists of one or more data marts. From this data warehouse, ad hoc queries and longitudinal business measures can be drawn as needed.
2.5 Data mart
A data mart is described in the DM Review Magazine Glossary (2008) as “a subset of an organization data warehouse that is usually orientated to a specific purpose of major data subject.” According to the EDUCAUSE Higher Education data warehouse directory (Heise, 2007), Higher Education institutions typically draw data from data marts such as Alumni, Prospective Students, Modules and Facilities. In the case of Alumni and Prospective Students, the decision support provided is typically in line with customer relationship management (CRM) principles. Some institutions, for example, use information about prospective students to provide targeted career advice.
Within the conceptual framework provided as introduction to this paper, it is clear that the primary data subject is the student. Valuable data about prospective students are also available within this data mart. Once the student becomes an alumnus, his or her data can be transferred to the Alumni data mart, if the appropriate data structures exist. The University takes a holistic view of the student and of the support of the student life-cycle, from potential prospective student to prospective student, first-year, final-year and alumnus. The data about these different stages of the student life-cycle could potentially also be integrated at a later stage.
2.6 Data sources
The data sources from which data are drawn for purposes of this data warehouse are indicated on the left-hand side of Figure 1 as the Student Information System (SIS), the LMS tracking data and data from two student surveys.
2.6.1 Student information system (SIS)
A SIS is defined by Gartner’s e-learning glossary (Lundy et al., 2002) as "...the system used to enroll and register students, track curricula, courses and students. Transcripts, administrative details of courses taken, progress towards a degree and grades for evaluative information are typically included." For purposes of this case study, student information from 4 525 first-year students of 2007 is sourced to provide historical context (e.g. secondary school, computer usage at school and race) as well as current context and academic data (programme enrolled for, distance from main campus, type of accommodation, current access to accommodation and aggregate results).
2.6.2 LMS tracking data
Each time a lecturer or student logs into a Learning Management System (LMS), participates in online discussions, completes an electronic quiz or reads an electronic document, an electronic transaction is performed. With each transaction performed, data are captured by the LMS. As a result a significant amount of data is created, which is most often only archived for record-keeping purposes and not used to support decision-making (Conradie and Van Dyk, 2007). For this study, an effort is made to aggregate these data into a format in which they can be associated in a useful way with other sources. Ten first-year modules were selected for this purpose. These modules are representative across faculties and have fairly large numbers of students. Together they have a total of 10 246 LMS seats. A total of 4 073 first-year students are associated with one or more of these seats.
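The aggregation step described in section 2.6.2 can be illustrated with a minimal sketch. The record layout below (student number, module, tool, seconds online), the student numbers and the module codes are all hypothetical, since the paper does not describe the LMS export format; the sketch only shows the general idea of collapsing transaction-level rows into one summary record per student so that it can later be associated with other sources.

```python
from collections import defaultdict

# Hypothetical tracking rows: (student_number, module, tool, seconds_online).
# The real LMS export layout is not given in the paper; these values are made up.
transactions = [
    ("S001", "MATH114", "assessment", 300),
    ("S001", "MATH114", "discussion", 120),
    ("S002", "MATH114", "assessment", 60),
    ("S001", "ECON114", "announcement", 30),
]


def aggregate_tracking(rows):
    """Collapse transaction-level rows into one summary record per student."""
    summary = defaultdict(
        lambda: {"total_hits": 0, "total_seconds": 0, "hits_by_tool": defaultdict(int)}
    )
    for student, _module, tool, seconds in rows:
        record = summary[student]
        record["total_hits"] += 1            # every transaction counts as one hit
        record["total_seconds"] += seconds   # accumulate time spent online
        record["hits_by_tool"][tool] += 1    # hits per LMS tool
    # Convert the nested defaultdicts to plain dicts for easier export.
    return {
        student: {
            "total_hits": rec["total_hits"],
            "total_seconds": rec["total_seconds"],
            "hits_by_tool": dict(rec["hits_by_tool"]),
        }
        for student, rec in summary.items()
    }


per_student = aggregate_tracking(transactions)
```

Keeping the student number as the key of the summary is a deliberate choice in this sketch: it is the attribute through which the paper associates the tracking data with the SIS and survey data.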
Figure 1: The business intelligence framework for this context
[Figure not reproduced. Labels recoverable from the diagram: data sources: SIS (N=4525), LMS tracking data (N=4073), “manual” input from the e-learning survey (N=1224) and the Alpha Baseline Questionnaire (N=3256), and other sources (e.g. ERP: finance; library records); staging and conversion (Extract-Transform-Load); data marts: (Prospective) Student, Module, Facilities, Alumni and Standard measures; examples of data warehouse dimensions: demographics, LMS tools, lecturer, results, modules; metadata repository (definition of sources, formulas, etc.); outputs: ad hoc analysis (strategic: infrastructure planning, student support and training; tactical: faculty/departmental planning, early identification of students at risk; operational: case specific) and longitudinal measures (e-learning adoption rates, e-learning trends, demands on IT infrastructure).]
2.6.3 Manual input
An e-learning impact survey was distributed to all students registered at Stellenbosch University in November 2007. In this survey students were asked about their perceived computer competency, their computer usage patterns in terms of time of day, place and type of activity, as well as their experience of the use of technology to facilitate learning. A total of 1 254 responses were received.
The Alpha Baseline Questionnaire is completed by all first-year students during their first week at University. This questionnaire is aimed at gathering data about the students’ needs, uncertainties and perceptions about studying at the University before they start attending lectures.
2.6.4 Longitudinal data
This Stellenbosch University case study is currently based only on 2007 data. However, the warehouse is designed in such a way as to make provision for the capturing of the same data for 2008 and subsequent years. In some instances longitudinal data can be traced back a few years and added to the data warehouse.
The Alpha Baseline Questionnaire, for example, was used for the first time in 2003. In 2005 it was administered via WebCT for the first time, during the orientation week, and a marked increase in the response rate can be seen in Figure 2 below.
Figure 2: Alpha baseline questionnaire response rates (2003-2008)
[Chart not reproduced. Vertical axis 0 to 100, horizontal axis 2003 to 2008; data labels in the source: 33.54, 53.04, 72.71, 73.37, 76.12, 76.36.]
Furthermore, the questions asked in the e-learning impact study conducted in 2003 were used as point of departure for the design of the survey that was distributed in 2007, to enable longitudinal comparisons as far as possible.
The current LMS was piloted in 2005. Hence the 2005 (for the pilot modules) and the 2006 tracking data are available in the same format as the 2007 data. This will also be the case for the 2008 data.
3. Extract, transform and load (ETL)
ETL is a frequently used acronym for the process of extracting data from the respective data sources, cleaning (transforming) the data into an appropriate format and loading the data into a data warehouse. Advances in technology and standardization make the routine ETL of data from LMSs more feasible. About 90% of the higher education institutions on the data warehouse directory (Heise, 2007) indicated that they make use of an Oracle database. Stellenbosch University’s LMS (Blackboard Vista 4) is based on Oracle technology. The “PowerSight Kit” that is available as part of this LMS is a database that provides access to the finest detail of module, user and tracking data.
MS Access is used as database for purposes of this proof of concept. A screenshot of the design of the (Prospective) Student Data Mart is shown in Figure 3. MS Access is definitely not a feasible option for a full-scale data warehouse. For purposes of this case study only the tracking data of the 11 largest first-year modules were used.
This alone amounted to more than 2 million tracking transaction lines, which had to be aggregated, with some effort, outside MS Access before they could be loaded into the database. Ideally, the data warehouse described in this section could be built up and held together with SQL code within this database. The other data sources (SIS data, Alpha Baseline Questionnaire and e-learning survey data) are simply extracted as flat files from their original data sources and loaded into the Access database, where they are associated (through the student number) with the other data sources.
3.1 Analyses to support decision-making
The analyses required from the data warehouse (right-hand side of Figure 1) dictate the design of the data marts, as do the data sources that feed into the data marts (left-hand side of Figure 1). The following are typical decisions faced by institutions such as Stellenbosch University: ad hoc strategic, tactical and operational decisions, as well as standard longitudinal impact measurement.
Examples of ad hoc strategic decisions include infrastructure planning in terms of institutional provision of computer infrastructure in the faculty-based computer user areas. In this regard the infrastructure index that can be generated from the data (see section 4.2 below) can provide valuable information when correlated per faculty-based computer user area and even with the student’s place of residence. The question is often asked whether more computers should be provided in central computer user areas or whether the provision of computers should be decentralised further. Similarly, the WebCT and computer literacy indices (see section 4.2 below) can be correlated with faculty and other demographic variables, to gauge what type of faculty-specific support first-year students need with regard to general computer and WebCT literacy.
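As a rough illustration of how sources associated through the student number might be held together with SQL and broken down per faculty, the sketch below uses an in-memory SQLite database. This is not the Access design used in the study: the table names, column names and sample rows are hypothetical and chosen only for the example.

```python
import sqlite3

# In-memory database standing in for the proof-of-concept warehouse.
# All table names, column names and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sis (student_number TEXT PRIMARY KEY, faculty TEXT);
    CREATE TABLE lms_summary (student_number TEXT PRIMARY KEY, total_hits INTEGER);
    CREATE TABLE survey (student_number TEXT PRIMARY KEY, computer_skill_rating INTEGER);
    """
)
conn.executemany(
    "INSERT INTO sis VALUES (?, ?)",
    [("S001", "Science"), ("S002", "Science"), ("S003", "Arts")],
)
conn.executemany(
    "INSERT INTO lms_summary VALUES (?, ?)",
    [("S001", 40), ("S002", 20), ("S003", 10)],
)
# Not every student answers the survey, so this table has fewer rows.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("S001", 6), ("S003", 3)],
)

# LEFT JOINs keep every SIS student even when a source has no row for them;
# AVG ignores the resulting NULLs.
query = """
    SELECT s.faculty,
           COUNT(*)                     AS students,
           AVG(l.total_hits)            AS mean_hits,
           AVG(q.computer_skill_rating) AS mean_skill_rating
    FROM sis AS s
    LEFT JOIN lms_summary AS l ON l.student_number = s.student_number
    LEFT JOIN survey      AS q ON q.student_number = s.student_number
    GROUP BY s.faculty
    ORDER BY s.faculty
"""
faculty_rows = conn.execute(query).fetchall()
conn.close()
```

Using the SIS table as the left side of the joins reflects the assumption that it is the most complete source; students without survey responses then still appear in the per-faculty counts.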
Students entering University are often referred to as “digital natives”, a label that often ignores the needs of the students who have never used a computer in learning activities. Based on this type of analysis, information on how to provide support to (at-risk) first-year students with regard to e-learning can then be fed back to the faculties and departments via the Teaching and Learning coordination points established within each faculty as part of the First-year Academy.
Figure 3: A screenshot of the (Prospective) Student Data Mart
Examples of ad hoc tactical and operational decisions include how the Centre for Teaching and Learning advises lecturers on the use of e-learning, based on the analysis of a combination of the types of activities students engage in on the LMS (hits, time spent etc.) as well as the types of non-LMS activities (e.g. use of blogs and wikis).
This type of ad hoc decision support can in many cases be provided without a formal data warehouse. However, the value of a data warehouse approach for ad hoc querying lies in the ability to include disparate data sources as well as in the efficiency of this approach. Once data are cleaned and integrated, they are much easier to use.
A business intelligence approach towards the building of a data warehouse becomes an absolute prerequisite when the purpose is to do longitudinal impact measurement. Ideally, a Standard Business Measure Data Mart should also be part of the data warehouse design to enable longitudinal comparisons. To enable comparisons across years, the data and measurements from year to year must be in the same format and based on the same assumptions. Typical questions that could be asked at Stellenbosch University for a specific time period could focus on e-learning adoption rates, e-learning trends and how these adoption rates and trends affect the demands on IT infrastructure.
4. Application
A large portion of this paper is devoted to a discussion of the context and design of a data warehouse framework to leverage existing data to support decision-making towards the increase of student success. In this section the application of this data warehouse concept is demonstrated through some examples.
4.1 Correlation between different student attributes
The correlation matrix (Table 1) includes attributes that were drawn from the SIS (Age, Mid-Year Results, Aggregate School Score), the Alpha Baseline Questionnaire (“On a scale of 1-7 indicate the extent to which you were challenged by school exams”) as well as the LMS hit counter (Announcement, Assessment, Assignment, Discussion, My-Grades, Consistency of Hits, Total LMS Time and Total LMS Hits). The Spearman-Pearson correlation coefficient was measured pairwise, and the p-value indicates whether each correlation is likely to be significant. When p<0.05, the statistical correlation between two attributes is generally assumed to be significant.
Table 1: Correlation matrix (N=573)
Attributes (rows and columns, in order): Age (days); Mid-Year Results; Year Weight; School Score; Extent exams challenge you; LMS Hits: Announcement; LMS Hits: Assessment; LMS Hits: Assignments; LMS Hits: Discussion; Consistency of LMS Hits; Total LMS Time; Total LMS Hits.
Number of cells marked p<0.05 in each row: Age (days) 0; Mid-Year Results 0; Year Weight 1; School Score 3; Extent exams challenge you 1; LMS Hits: Announcement 3; LMS Hits: Assessment 4; LMS Hits: Assignments 2; LMS Hits: Discussion 4; Consistency of LMS Hits 7; Total LMS Time 8; Total LMS Hits 7.
[Matrix layout not reproduced; the column position of each marked cell is not recoverable from the source text.]
Some interesting conclusions can be drawn from Table 1.
It is important to note that these conclusions may be case specific, since the tracking data of only a limited number of modules are included. A full-scale data warehouse concept should enable case-specific, faculty-specific and institution-specific analyses. Taking this into account, the following conclusions can be drawn from the available data analysed in Table 1:
- The older a student, the more time is spent on the LMS. However, age does not correlate significantly with the total number of hits or with hits consistency.
- The number of times students use assignments correlates significantly with the Year Weight (portion of modules passed), but not with the Mid-Year Results.
- The total LMS time correlates significantly with Mid-Year Results, Year Weight and School Score, whilst LMS hits correlate significantly with Mid-Year Results only.
4.2 Use of indices
Other interesting results can be obtained by combining the different variables to construct indices in SPSS. Table 2 gives an example of the computer literacy, WebCT literacy and infrastructure indices that can be constructed, as well as examples of the variables that can be included in each index. These variables are taken from three of the four main data sources: the Alpha Baseline Questionnaire, the student survey and the LMS tracking data.
Once constructed, these indices can be correlated with demographic variables such as age group, race, gender and faculty, and with academic variables such as academic programme, June university average, % credits passed and M-count, taken from the SIS data. The results of these correlations can provide valuable information for ad hoc strategic, tactical and operational decisions with regard to student training and support as well as computer infrastructure provision at faculty level (see section 3.1 above).
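To make the idea of an index more concrete, the following is a minimal sketch of one possible construction: an unweighted mean of z-scored variables, followed by a rank correlation with an academic variable. The paper does not say how the indices were built in SPSS, so the method, the helper names and the five made-up student records are all assumptions for illustration only.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a variable (assumes the values are not all identical)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_index(columns):
    """Unweighted mean of z-scored variables: one index value per student."""
    standardised = [z_scores(col) for col in columns]
    return [mean(vals) for vals in zip(*standardised)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data for five students (not taken from the study).
years_of_computer_use = [1, 3, 5, 8, 10]
computer_skill_rating = [2, 3, 4, 6, 7]
june_average = [48, 55, 52, 66, 71]

computer_literacy_index = composite_index([years_of_computer_use, computer_skill_rating])
rho = spearman(computer_literacy_index, june_average)
```

Standardising before averaging keeps a variable measured in years from dominating one measured on a rating scale; whether equal weights are appropriate would depend on the questions the index is meant to answer.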
Table 2: Possible indices
Computer literacy index: Length of computer use; Rating of computer skills; Own computer; Enjoy working with computers; Computer can help me academically; How often do you use these computer applications (e-mail, Internet, Turnitin, Microsoft Word, Excel, PowerPoint, blogs, wikis etc.)
WebCT literacy index: Length of LMS use; Rating of LMS use; Hits consistency; Total time spent on LMS; Total LMS hits; How often do you use these computer applications (WebStudies)
Infrastructure index: Where do you access computers?; How many hours do you spend in CUA (e-mail, Internet, class notes, assessments, assignments etc.)
5. Concluding remarks
The design of a data warehouse concept to provide institutional decision support concerning first-year success was proposed in this paper. To prove the concept, data on the 2007 first-year students of Stellenbosch University were extracted from the SIS, the LMS tracking sources and two questionnaires. The data were then transformed and loaded into a data warehouse.
The purpose of a data warehouse is not to provide the answers before the questions are asked (although it can provide useful sources for data mining), but rather to have data available in such a way as to enable the extraction of useful information once the questions are asked. The purpose of the paper was not to answer specific questions, but rather to prove the concept of the data warehouse. For this purpose, interesting information drawn from this warehouse was presented and its usefulness in supporting decision-making discussed.
Further plans include:
- Integrating early assessment data of first-year students. Lecturers have to load a mark for all first-year modules within the first six weeks of the first semester.
- Expanding and refining the LMS tracking data.
- Using the data warehouse for longitudinal studies.
- Integrating data from the Alpha progress questionnaire that all first-year students complete at the end of their first year.
- Further analyses of the data in SPSS, specifically using indices.
On a more technical level, we are also considering other types of technologies and, more specifically, an Oracle database to allow for scalability.
Most importantly though, we feel that we have a proof of concept that we would like to share with a wider audience at Stellenbosch University to determine what types of questions need to be asked. As alluded to in this paper, the three issues that possibly need urgent attention are infrastructure, training and support of first-year students, and advising lecturers with regard to teaching and learning activities.
References
Conradie, P.J. and Van Dyk, L. (2007). “Creating Business Intelligence from Course Management Systems”, Campus-Wide Information Systems, Vol 24, Issue 2, pp. 120-133.
DM Review Magazine Glossary (2008). “Data Management Review”, SourceMedia, Inc.
Heise, D. (2007). “EDUCAUSE Decision Support Data Warehouse Constituent Group”. Downloaded from dheise.andrews.edu/dw/DWData.html on [26 March 2007].
Kimball, R. and Ross, M. (2002). The data warehouse toolkit. John Wiley and Sons, 2nd edition. ISBN 0-471-20042-7.
Lundy, J., Harris, K., Igou, B., and Zastrocky, M. (2002). “Gartner's e-learning glossary”. Research Note M-14-9025, Gartner Research.