					Special Issue on ICIT 2009 Conference - Applied Computing

THE INFLUENCE OF BLENDED LEARNING MODEL ON DEVELOPING LEADERSHIP SKILLS OF SCHOOL ADMINISTRATORS
Tufan AYTAÇ

The Ministry of National Education, Ankara, TURKEY taytac1@yahoo.com
ABSTRACT
The use of the b-learning approach in in-service education activities in the Turkish education system is becoming more and more important. Generally, traditional education and computer-based education applications are used in in-service education activities. Blended learning (b-learning) combines online learning with face-to-face learning. The goal of blended learning is to provide the most efficient and effective learning experience by combining learning environments. The purpose of this research is to find out the effect of the b-learning approach on developing administrators' leadership skills. To identify the school administrators' educational needs and their existing leadership skills, a needs assessment questionnaire was applied to 72 school administrators selected from 33 primary schools in 11 regions of the capital city, Ankara. According to the descriptive statistical analysis of the questionnaire, an in-service training programme was prepared for the development of school administrators' leadership skills. The school administrators were separated into three groups: computer based learning (CBL) (25 participants), blended learning (BL) (23 participants) and traditional learning (TL) (24 participants). These groups were trained separately in these three different learning environments using the in-service training programme. According to the pre-test, post-test and achievement score means, the BL group's scores were the highest when compared to the TL and CBL groups. As a result of this research, in terms of achievement and effectiveness, b-learning was found to be the most effective learning environment compared to the others. Findings from both learners and tutors strongly suggest that blended learning is a viable alternative delivery method for in-service education activities.1

Keywords: Blended Learning, e-Learning, Information Technology, In-service Education

1 INTRODUCTION
Blended learning (b-learning or hybrid learning) consists of the combination of e-learning and the traditional education approach. Blended learning combines online learning with face-to-face learning. The goal of blended learning is to provide the most efficient and effective learning experience by combining different learning environments. b-Learning stands at the forefront in terms of interactivity with the target learner group, enriching the learning process and integrating technology into education [1, 2, 3, 16, 21]. E-learning has had an interesting impact on the learning environment. Blended learning is the most logical and natural evolution of our learning agenda. It suggests an elegant solution to the challenges of tailoring learning and development. It represents an opportunity to integrate the innovative and technological advances offered by online learning with the interaction and participation offered in the best of traditional learning [20].

The basis of the blended learning approach is to use the powerful sides of traditional education and computer-based education together, instead of using one or the other on its own. The basic characteristics of blended learning, which reflect the values of 21st-century education, are [2]:
- providing a new way of learning and teaching,
- teaching how to learn,
- creating digital learners,
- being more economical,
- focusing on technology and communication,
- improving project-based learning,
- and improving the teaching process.

1 This research project article has been supported by The Scientific and Technological Research Council of Turkey (TÜBİTAK) (SOBAG 1001 Programme).


Blended learning practices provide project-based learning opportunities for active learning and interaction among learners, and especially provide a way to meet the educational needs of the learners. Blended learning programs may include several forms of learning tools, such as real-time virtual/collaboration software, self-paced web-based courses, electronic performance support systems (EPSS) embedded within the learning-task environment, and knowledge management systems. Blended learning contains various event-based activities, including face-to-face learning, e-learning, and self-paced learning activities. Blended learning often occurs as a mixture of traditional instructor-led training, synchronous online training, asynchronous self-paced study, and structured task-based training from a teacher or mentor. The aim of blended learning is to combine the best of classroom face-to-face learning experiences with the best of online learning experiences. Overall, blended learning refers to the integration (or the so-called blending) of e-learning tools and techniques with traditional face-to-face teaching delivery methods. The two important factors here are the time spent on online activities and the amount of technology utilized (see the concept of blended learning in Figure 1 below) [3, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 19].

Fig. 1: Concept of Blended Learning

If two or more of the learning environments stated above are used to teach an educational objective, it can be said that blended learning is realized. However, blended learning means more than showing a web page during a lesson in the classroom and immediately using the information on that page to explain the lesson. Blended learning is a learning environment that combines face-to-face learning and web-based distance learning, and it overcomes the limitations of an e-learning-only approach [12]. Today blended learning primarily functions as a replacement for, or extension of, face-to-face environments. For instance, it might be used to foster learning communities, extend training events, offer follow-up resources in a community of practice, access guest experts, provide timely mentoring or coaching, present online lab or simulation activities, and deliver pre-work or supplemental course materials. While such uses may be unique and engaging, they are not exactly novel [13].

Figure 2: A Blend of Learning Theories

By applying the learning theories of Keller, Gagné, Bloom, Merrill, Clark and Gery (see Figure 2), five key ingredients emerge as important elements of a blended learning process:
1. Live Events: Synchronous, instructor-led learning events in which all learners participate at the same time, such as in a live "virtual classroom."
2. Self-Paced Learning: Learning experiences that the learner completes individually, at his or her own speed and on his or her own time, such as interactive, Internet-based or CD-ROM training.
3. Collaboration: Environments in which learners communicate with others, for example, e-mail, threaded discussions or online chat.
4. Assessment: A measure of learners' knowledge. Pre-assessments can come before live or self-paced events, to determine prior knowledge, and post-assessments can occur following live or self-paced learning events, to measure learning transfer.
5. Performance Support Materials: On-the-job reference materials that enhance learning retention and transfer, including PDA downloads, and printable references, summaries, and job aids.

2 PURPOSE

The purpose of this research is to find out the effects of the b-learning approach on developing school administrators' leadership skills.

3 RESEARCH DESIGN
To determine the school administrators' educational needs regarding leadership skills, a needs assessment questionnaire was applied to 72 school administrators selected from 33 primary schools in 11 regions of the capital city, Ankara. According to the results of this questionnaire, an in-service training programme for developing school administrators' leadership skills was prepared.


According to the results of the needs assessment, the most needed leadership skills of school administrators were determined to be human relations in administration, basic management skills for school principals, job satisfaction in organizations, and motivation. After that, the content and learning activities of the "School Administrators Leadership Skills Development In-service Programme" were prepared. Besides that, course notes were prepared as training materials to be distributed to the participants in the form of CD-ROMs and printed documents. The school administrators were separated into three groups: Computer Based Learning (CBL) (25 participants), Blended Learning (BL) (23 participants) and Traditional Learning (TL) (24 participants). These groups were trained according to three different methods using the prepared education programme. The groups were given a two-day course. Before the in-service training, the school administrators in the BL group accessed the digital content and studied the learning activities included in the "School Administrators Leadership Skills Development In-service Programme", which was prepared using the Moodle Learning Management System software and published on the http://beg.meb.gov.tr:8088/ website. The school administrators in the BL group entered the http://beg.meb.gov.tr:8088/ webpage using usernames and passwords given to them three weeks before the in-service training. The interface of the website is shown in Fig. 3. The school administrators in this group shared information, chatted, and studied activities with their colleagues and a subject area specialist about the content and learning activities included in the site whenever they wanted. As online learners, the school administrators built their confidence and learning processes as they got used to working independently online. Blended learning activities included online knowledge gathering and construction in teams or groups, publishing of electronic content, interactive elements like online brainstorming, discussion, several forms of feedback, evaluation and assessment, as well as other blended learning techniques. Lecturers posted messages to the BL group as a whole and to each administrator individually to meet their need for support. They posted explanations to guide learners in more complex tasks and encouraged them to communicate, to do their individual assignments, and to use the Moodle platform tools they had at their disposal to facilitate their work. Tutors controlled and marked the online assignments, filled in learners' performance reports, and wrote feedback on their performance in their online portfolios. Lecturers followed the school administrators' learning improvements and gave encouragement when motivation levels began to falter. After that, this group was trained by a lecturer who was a subject area specialist. The lecturer trained this group using face-to-face education, computer-based education and the online training website prepared with Moodle software.

Figure 3: The Moodle interface

On the other hand, all the in-service training content and activities were taught to the CBL group by a lecturer with the aid of a computer and projector. Finally, the TL group was trained in a traditional way using a blackboard. A multiple-choice test made up of 20 questions was applied to the groups to investigate their achievements in leadership skills. This test was shown to content experts to verify its content validity. To find out the statistically significant differences among the three groups' score means, one-way ANOVA and the Scheffe test were used. The test was applied to all groups as a pre-test at the beginning and as a post-test at the end of the in-service training [5]. The blended learning model used in the research process is shown in Figure 4.

Figure 4: The Process of the Blended Learning Model
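As an illustration of the statistical procedure described above, the short Python sketch below computes a one-way ANOVA across three groups of test scores. The score lists are hypothetical placeholders, not the study's data, and the sketch is offered only to make the analysis step concrete.

# Minimal sketch of the analysis described above (one-way ANOVA across the
# three training groups), using hypothetical score lists, not the study data.
from scipy import stats

bl_scores  = [17, 18, 16, 17, 19]   # hypothetical Blended Learning post-test scores
cbl_scores = [12, 13, 11, 12, 14]   # hypothetical Computer Based Learning scores
tl_scores  = [10, 11, 10, 12, 11]   # hypothetical Traditional Learning scores

# One-way ANOVA: tests whether at least one group mean differs significantly.
f_value, p_value = stats.f_oneway(bl_scores, cbl_scores, tl_scores)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")

# A post-hoc comparison (the paper uses the Scheffe test) would then identify
# which pairs of groups differ significantly.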


4 RESULTS
When the three groups' pre-test score means were compared, significant differences were found among them (F(2, 69) = 53.350, p < .01) (Table 1).

Table 1: The One-Way ANOVA Results on the Difference Between Groups According to Pre-Test Scores
Source of Variance   Sum of Squares   df   Mean Square   F        Sig.   Mean Difference
Between groups       278.668           2   139.334       53.350   .000   CBL-BL, TL-BL
Within groups        180.207          69     2.612
Total                458.875          71

When the three groups were statistically compared, the BL group's pre-test score mean (X̄ = 12.87) was found to be statistically higher than those of the other two groups (CBL X̄ = 9.12, TL X̄ = 8.29). The reason for this might be that the BL group was more ready and successful than the others, since they had studied in advance all the content and activities prepared with Moodle software and published on the internet.

When the school administrators' post-test score means were compared among the three groups, a significant difference was found between the post-test score means (F(2, 69) = 90.610, p < .01) (Table 2). There were also statistically significant differences among the BL group's post-test score mean (X̄ = 17.35), the CBL group's post-test score mean (X̄ = 12.44) and the TL group's post-test score mean (X̄ = 10.79). In particular, the BL group administrators' post-test score mean was the highest of all.

Table 2: The One-Way ANOVA Results on the Difference Between Groups According to Post-Test Scores
Source of Variance   Sum of Squares   df   Mean Square   F        Sig.   Mean Difference
Between groups       544.539           2   272.270       90.610   .000   CBL-BL, CBL-TL, TL-BL
Within groups        207.336          69     3.005
Total                751.875          71

The difference between the school administrators' pre-test and post-test scores was calculated to identify their achievement scores. A meaningful difference was found among the groups' achievement score means (F(2, 69) = 18.086, p < .01) (Table 3). The BL group's achievement mean (X̄ = 4.48) was higher than the CBL (X̄ = 3.28) and TL (X̄ = 2.50) groups' achievement means.

Table 3: The One-Way ANOVA Results on the Difference Between Groups According to Achievement Scores (Difference between pre-test and post-test)
Source of Variance   Sum of Squares   df   Mean Square   F        Sig.   Mean Difference
Between groups        46.540           2    23.270       18.086   .000   CBL-BL, TL-BL
Within groups         88.779          69     1.287
Total                135.319          71

The BL group's pre-test, post-test and achievement score means were the highest when compared to the TL and CBL groups. The reason for this might be that the BL group was more ready than the others, since they studied the content and activities published with Moodle software before the other groups. They also experienced both face-to-face and computer-based learning environments.

5 CONCLUSION
The influence of the b-learning model on developing the leadership skills of school administrators was more effective than computer-based education and traditional learning. As a result of this research, in terms of time, cost and effectiveness, b-learning was found to be the most effective method in comparison with the other approaches. In particular, it appeared necessary to use the b-learning approach more in the in-service training of administrators and teachers. Effective usage of b-learning approaches is required for integrating education with information technologies, enriching the learning-teaching process, implementing face-to-face education, providing computer-based learning, realizing hands-on learning and individualizing learning. In this research, the blended learning arrangements involved e-mentoring and e-tutoring. The role of the e-mentor/tutor is critical, as it requires a transformation to the role of learning facilitator. Being teachers and online tutors has introduced beneficial qualitative changes in teachers' roles, but it has also meant a quantitative increase in the number of hours dedicated to learners. Lecturers spent less time in face-to-face classes than in the online environment (the Moodle platform). The Moodle platform used in the blended learning approach has great potential to create a successful blended learning experience by providing a plethora of excellent tools that can be used to enhance conventional classroom instruction, hybrid courses, or any distance learning arrangement [18]. Finally, lecturers identified learners who might be experiencing particular problems and helped them address their weaknesses in remedial work sessions when necessary.


We observed that using b-learning opportunities for teaching objectives makes learning engaging, enjoyable, lasting and economical in an effective way. In this sense, we believe trainers should use b-learning environments for the effective integration of ICT into learning and teaching. Last year, the Turkish Ministry of National Education In-service Training Department implemented more than 700 in-service training courses. The use of the b-learning methodology, especially in these in-service trainings, will enrich and support their learning-teaching processes. More projects on the use of b-learning in in-service training should be supported and performed. In particular, the initiatives of the Turkish Ministry of National Education for improving schools' information technology and internet infrastructure, distributing authoring software to teachers, developing the education portal and its content, and Moodle and similar learning management system software should be used to support b-learning in in-service training. School administrators stated that b-learning approaches can be used more effectively in the classroom. All the school administrators' comments regarding the blended course were positive. The positives of the blended learning course activities used in this research are cited below:
- Improvement in the quantity and/or quality of communication among the school administrators on the discussion board, in online groups and in face-to-face activities in the classroom.
- Good cooperative learning activities.
- Blended learning was more effective than the classroom alone, with higher learner value and impact; its effectiveness was greater than for non-blended approaches, and learners liked b-learning approaches.
- Rapid accessibility to b-learning content and activities (any time, anywhere).
- Improved relationships between tutors and students.
- The immediate feedback that could be given to school administrators.
- Flexibility in the scheduling and timetabling of course work.
- An increase in the time actually spent face-to-face in the classroom.
- Cost effectiveness for both the accrediting learning institution and the learner.
The reduced cost, reduced training time, and the ability to easily update training materials offer additional compelling reasons for educators to embrace blended learning [22]. According to the school administrators' opinions, there were also some problems in this research, cited below:
- Some technical web and internet access problems with the Moodle platform.
- The failure of the online PowerPoint presentation of lecture material to meet some school administrators' expectations.
- Some school administrators' lack of enthusiasm for being in a blended learning course.
- Limited knowledge in the use of technology.

- Blended learning takes time for both the instructor and the learner to adapt to, as a relatively new concept in delivering instruction.
In particular, it can be concluded that all in-service training could be taught more effectively by using the b-learning approach. The technological leadership role of school administrators is very important for the success of the b-learning approach. The features of blended learning models are of vital importance for applying individual learning and active learning. According to some authors, "a blend is an integrated strategy for delivering on promises about learning and performance" [17]. In sum, findings from both learners and tutors strongly suggest that blended learning is a viable alternative delivery method for courses. In supporting blended learning, especially in in-service education courses, the aim is to remain both a national leader in the effective use of technology for teaching and learning and a pioneer in identifying the right mix of face-to-face and online communication practices that will enhance learning effectiveness [19]. The results of this research back up all of these points. To develop the technological leadership of school administrators, b-learning approaches should be used effectively. Blended learning offers opportunities for in-service school administrators, in-service teachers and their learners.

REFERENCES
[1] Aytaç, T. Eğitimde Bilişim Teknolojileri. Asil Yayın Dağıtım, pp. 48-53, (2006).
[2] Aytaç, T. The Influence of B-Learning Model on Developing Leadership Skills of Education Administrators Research Education Programme, pp. 48-53, (2006).
[3] Singh, H. "Building Effective Blended Learning Programs", Educational Technology, Vol. 43, No. 6, pp. 51-54, November-December, (2003).
[4] Oliver, M. and Trigwell, K. "Can 'Blended Learning' Be Redeemed?", E-Learning, Vol. 2, No. 1, pp. 17, (2005).
[5] Büyüköztürk, Ş. Sosyal Bilimler İçin Veri Analizi El Kitabı: İstatistik, Araştırma Deseni, SPSS Uygulamaları ve Yorum, 8. Baskı, PegemA Yayıncılık, pp. 40-53, Ankara, (2007).
[6] Bonk, C. J., Olson, T. M., Wisher, R. A. and Orvis, K. L. "Learning from Focus Groups: An Examination of Blended Learning", Journal of Distance Education, Vol. 17, No. 3, pp. 100, (2002).
[7] Marsh, J. How to Design Effective Blended Learning. www.brandon-hall.com. Accessed: 15 February 2009.
[8] Orhan, F., Altınışık, S. A. and Kablan, Z. "Karma Öğrenme (Blended Learning) Ortamına Dayalı Bir Uygulama: Yıldız Teknik Üniversitesi Örneği", IV. Uluslararası Eğitim Teknolojileri Sempozyumu, 24-26 Kasım 2004, Sakarya, Vol. 1, pp. 646-651, (2004).
[9] Dracup, Mary. "Role Play in Blended Learning: A Case Study Exploring the Impact of Story and Other Elements", Australasian Journal of Educational Technology, 24(3), pp. 294-310, (2008).
[10] Cooper, G. and Heinze, A. "Centralization of Assessment: Meeting the Challenges of Multi-year Team Projects in Information Systems Education", Journal of Information Systems Education, 18(3), pp. 345-356, (2007).
[11] Heinze, A. Lecturer in Information Systems, http://www.aheinze.me.uk/Blended_Learning_Higher_Education.html. Accessed: 15 February 2009.
[12] Langley, Amanda. "Experiential Learning, E-Learning and Social Learning: The EES Approach to Developing Blended Learning", The Fourth Education in a Changing Environment Conference Book, edited by Eamon O'Doherty, Informing Science Press, pp. 171-172, (2007).
[13] Bonk, C. J. and Graham, C. R. (Eds.). "Future Directions of Blended Learning in Higher Education and Workplace Learning Settings", Handbook of Blended Learning: Global Perspectives, Local Designs. San Francisco, CA: Pfeiffer Publishing, (2004).
[14] Carman, Jared M. Blended Learning Design: Five Key Ingredients. Director, Product Development, KnowledgeNet, October 2002, www.brandon-hall.com. Accessed: 15 February 2009.
[15] Derntl, M. and Motschnig-Pitrik, R. A Layered Blended Learning Systems Structure, Proceedings of I-KNOW '04, Graz, Austria, June 30 - July 2, (2004).
[16] Bañados, Emerita. "A Blended-learning Pedagogical Model for Teaching and Learning EFL Successfully Through an Online Interactive Multimedia Environment", CALICO Journal, Vol. 23, No. 3, pp. 533-550, (2006).
[17] Rossett, A., Douglis, F. and Frazee, R. V. Strategies for Building Blended Learning. Learning Circuits. Retrieved August 13, 2007, from http://www.learningcircuits.org/2003/jul2003/rossett.htm.
[18] Brandl, K. "Are You Ready to Moodle?", Language Learning & Technology, Vol. 9, No. 2, pp. 16-23, May (2005).
[19] Rochester Institute of Technology. Blended Learning Pilot Project: Final Report for 2003-2004 and 2004-2005. Retrieved 5 February 2009, from http://distancelearning.rit.edu/blended/Files/BlendedPilotFinalReport2003_04.pdf.
[20] Thorne, K. Blended Learning: How to Integrate Online and Traditional Learning. United States, Kogan Page, (2004).
[21] Rovai, Alfred P. and Jordan, Hope M. "Blended Learning with Traditional and Fully Online Graduate Courses", International Review of Research in Open and Distance Learning, 2004. Retrieved 27 September 2008, from http://www.irrodl.org/content/v5.2/rovaijordan.html.
[22] Thorsteinsson, G. and Page, T. "Blended Learning Approach to Improve In-Service Teacher Education in Europe Through the FISTE Comenius 2.1 Project", ICT in Education: Reflections and Perspectives, Bucharest, June 14-16, (2007).


TOWARDS THE IMPLEMENTATION OF TEMPORAL-BASED SOFTWARE VERSION MANAGEMENT AT UNIVERSITI DARUL IMAN MALAYSIA
M Nordin A Rahman, Azrul Amri Jamal and W Dagang W Ali
Faculty of Informatics, Universiti Darul Iman Malaysia, KUSZA Campus, 21300 K Terengganu, Malaysia
mohdnabd@udm.edu.my, azrulamri@udm.edu.my, wan@udm.edu.my

ABSTRACT
Integrated software is very important for a university to manage its day-to-day operations. This integrated software goes through an evolution process when changes are requested by the users, and finally new versions are created. Software version management is the process of identifying and keeping track of different versions of software. The complexity of this process increases when the software is distributed across many places. This paper presents a temporal-based software version management model. The model is implemented for managing software versions at the Information Technology Centre, Universiti Darul Iman Malaysia. Temporal elements such as valid time and transaction time are the main attributes considered for insertion into the software version management database. Having these two attributes helps the people involved in the software process to organize data and perform monitoring activities more efficiently.

Keywords: version management, temporal database, valid time, transaction time.

1. INTRODUCTION
Software evolution is concerned with modifying software once it is delivered to a customer. Software managers must devise a systematic procedure to ensure that different software versions may be retrieved when required and are not accidentally changed. Controlling the development of different software versions can be a complex task, even for a single author to handle. This task is likely to become more complex as the number of software authors increases, and more complex still if those software authors are distributed geographically with only limited means of communication, such as electronic mail, to connect them. Temporal-based data management has been a hot topic in the database research community over the last couple of decades. Due to this effort, a large infrastructure such as data models, query languages and index structures has been developed for the management of data involving time [11]. Nowadays, a number of software systems have adopted the concepts of temporal database management, such as artificial intelligence software, geographic information systems and robotics. Temporal management aspects of any object could include:
- The capability to detect change, such as the amount of change in a specific project or object over a certain period of time.
- The use of data to conduct analysis of past events, e.g., the change of valid time for a project or version due to some event.
- Keeping track of all transaction statuses over the project or object life cycle.

Universiti Darul Iman Malaysia (UDM) is the first full university on the East Coast of Malaysia, located in the state of Terengganu. It was set up on 1st January 2006. UDM has two campuses, known as KUSZA Campus and City Campus. Another new campus, known as Besut Campus, will be in operation soon. To date, KUSZA Campus has six faculties and City Campus has three faculties. The university also has an Information Technology Centre (ITC-UDM), whose purpose is to develop and maintain the university's information systems and information technology infrastructure. In this paper, we concentrate on the modelling of temporal-based software version management. Based on the model, a simple web-based application has been developed and suggested for use by ITC-UDM. The rest of the paper is organized as follows: the next section reviews the concept of temporal data management; Section 3 discusses the current techniques in software version management; current issues in software version management at ITC-UDM are discussed in Section 4; the specifications of the proposed temporal-based software version management model are explained in Section 5; and the conclusion is given in Section 6.


2. TEMPORAL DATA CONCEPT
To date, transaction time and valid time are the two well-known notions of time usually considered in the temporal database management literature [2, 4, 6, 9, 10, 11, 12]. The valid time of a database fact is the time when the fact is true in the mini-world [2, 6, 9, 10]. In other words, valid time concerns the evaluation of data with respect to the application reality that the data describe. Valid time can be represented with single chronon identifiers (e.g., event time-stamps), with intervals (e.g., interval time-stamps), or as valid-time elements, which are finite sets of intervals [9]. Meanwhile, the transaction time of a database fact is the time when the fact is current in the database and may be retrieved [2, 6, 9, 10]. This means that the transaction time is the evaluation time of data with respect to the system where the data are stored. Supporting transaction time is necessary when one would like to roll back the state of the database to a previous point in time. [9] proposed four implicit times that could be derived from valid time and transaction time: for valid time, valid-from and valid-to; for transaction time, transaction-start and transaction-stop.

Temporal information can be classified into two divisions: absolute temporal and relative temporal [9]. Most of the research in temporal databases has concentrated on temporal models with absolute temporal information. To extend the scope of the temporal dimension, [12] presented a model which allows relative temporal information, e.g., "event A happened before event B and after January 01, 2003". [12] suggests several temporal operators that could be used for describing relative temporal information: {equal, before, after, meets, overlaps, starts, during, finishes, finished-by, contains, started-by, overlapped-by, met-by}. In various temporal research papers, the theory of time-elements divides them into two categories: intervals and points [6, 9, 11]. If T denotes a nonempty set of time-elements and d denotes a function from T to R+, the set of nonnegative real numbers, then:

    time_element, t = interval, if d(t) > 0; point, otherwise.

According to this classification, the set of time-elements, T, may be expressed as T = I ∪ P, where I is the set of intervals and P is the set of points.

3. RELATED TOOLS IN SOFTWARE VERSION MANAGEMENT
In a distributed software process, good version management combines systematic procedures and automated tools to manage different versions in many locations. Most version naming methods use a numeric structure [5]. Identifying versions of the system appears to be straightforward: the first version and release of a system is simply called 1.0, and subsequent versions are 1.1, 1.2 and so on. Meanwhile, [3] suggests that every new version produced should be placed in a different directory or location from the old version, so that the version accessing process becomes easier and more effective. Besides that, should this method be implemented using a suitable database management system, the concept of lock access could be used to prevent the occurrence of overlapping updates. At present, there are many software evolution management tools available in the market. Selected tools are described as follows:
- Software Release Manager (SRM) – SRM is free software supported on most UNIX and Linux platforms. It supports software version management for distributed organizations. In particular, SRM tracks dependency information to automate and optimize the retrieval of system components as well as versions.
- Revision Control System (RCS) – RCS uses the concept of tree structures. Each branch in the tree represents a variant of a version. These branches are numbered by their entering sequence into a system database. RCS records details of any transaction made, such as the author, date and reason for the update.
- Change and Configuration Control (CCC) – CCC is one of the most complete tools for software configuration management. It provides a good platform for identification, change control and status accounting. CCC allows simultaneous working on the same version via virtual copies; these can be merged and changes can be applied across configurations.
- Software Management System (SMS) – SMS covers all aspects of software configuration management, such as version control, workspace management, system modelling, derived object management, change detection in the repository, etc. SMS possesses the desired characteristics, providing resources for version control of systems and having a good user interface.
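To make the point/interval distinction and the relative temporal operators concrete, the sketch below models time-elements in Python and implements a few of the operators listed above. It is an illustrative reading of the definitions in this section, not code taken from the paper.

# Illustrative sketch (not from the paper): time-elements as intervals or points,
# with a few of the relative temporal operators described above.
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeElement:
    start: float
    end: float                 # for a point, start == end, so duration d(t) = 0

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def is_point(self) -> bool:
        return self.duration == 0   # d(t) = 0 means a point, otherwise an interval

def equal(a: TimeElement, b: TimeElement) -> bool:
    return a.start == b.start and a.end == b.end

def before(a: TimeElement, b: TimeElement) -> bool:
    return a.end < b.start

def after(a: TimeElement, b: TimeElement) -> bool:
    return a.start > b.end

def meets(a: TimeElement, b: TimeElement) -> bool:
    return a.end == b.start          # a ends exactly where b starts

january = TimeElement(1, 31)         # an interval
deadline = TimeElement(31, 31)       # a point
print(before(january, TimeElement(40, 45)), meets(january, deadline))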


4. THE SOFTWARE VERSION MANAGEMENT ISSUES IN ITC-UDM
Three divisions have been formed at ITC-UDM. These divisions and their functions are as follows:
- Infrastructure and Application Systems (AIS) – to develop and maintain the university software and to maintain the university computer network;
- Technical and Services (TS) – to support the maintenance of information technology hardware, training, multimedia services and the help desk;
- Administration and Procurement (AP) – to manage the daily operation of ITC-UDM, such as administration, procurement, etc.

Each division is headed by a division leader and supported by several information technology officers, assistant information technology officers and technicians. All the university software modules are developed and maintained by the AIS Division. Figure 1 depicts the main software modules managed by ITC-UDM. There are over a thousand source code files produced by the division; therefore, it is not easy for the division to manage all those artefacts.

Figure 1: University Software Modules (Academic, Human Resource, Student Affairs, Finance and Department of Development modules)

From the study done by the authors, the following main weaknesses have been found in the current approach used by ITC-UDM to manage all versions of the source code produced:
- A non-systematic procedure is used for managing software versions, and it is difficult to recognize the valid time for each version.
- The current approach does not consider the relative temporal aspect in representing the valid time for each version.
- The current approach maintains only the concept of a current-view version, in which an existing version is overwritten by a new incoming version during an update.

Based on the mentioned problems, we strongly believe that the development of a temporal-based software version management tool for ITC-UDM could provide the following benefits:
- Support for project and software managers in planning, managing and evaluating version management.
- Assigning timestamps (absolute and relative) to each transaction will provide transaction-time database functionality, meaning that all previously current database states are retained and made available for time-based queries.
- Increased effectiveness and efficiency of the collaborative software version management process.

5. THE MODEL
Version control is one of the main tasks in software configuration management. Any software version has its own valid time. The collection of software versions should be organized in a systematic way for the purpose of retrieval efficiency and to recognize the valid time of those versions. Besides the use of a unique sign for the associated version, a method of time-stamping also needs to be embedded into the version management database.

5.1 The Temporal-Based Version Management Specifications
The temporal elements involved in the model are transaction time (tt), absolute valid time (avt) and relative valid time (rvt), which can be denoted as TE = {tt, avt, rvt}. Transaction time is a date-stamp and represents the transaction when a new valid time for a version is recorded in the application database. Absolute valid time is represented by two attributes, valid-from and valid-until, and also uses a date-stamping approach. Meanwhile, relative valid time, which involves a time interval, is represented by a combination of temporal operators, OPERATORs = {op1, op2, op3, …, opn}, and one or more defined event(s), signed as EVENTs = {event1, event2, event3, …, eventn}. This model considers only five temporal operators, denoted as OPERATORs = {equal, before, after, meets, met_by}. Table 1 illustrates the general definitions of the temporal operators based on time intervals and time points. Figure 2 shows the organization of the temporal elements involved in software version management. If we have a software system with a set of versions signed as V = {v1, v2, v3, …, vn}, then the model is:

TEMPORAL(vi ∈ V) → (tt, avt, rvt)

where avt = [avt-from, avt-until], rvt = [rvt-from, rvt-until], rvt-from = {{opi ∈ OPERATORs}, {eventi ∈ EVENTs}} and rvt-until = {{opi ∈ OPERATORs}, {eventi ∈ EVENTs}}. Thus, if the software has a set of feature attributes Ai, then a complete scheme for temporal-based software version management can be signed as:

S = {A1, A2, A3, …, An, tt, avt-from, avt-until, rvt-from, rvt-until}

where Ai is an attribute name of a version, tt ∈ P, and avt-from, avt-until, rvt-from and rvt-until ∈ T. Table 2 exhibits the temporal-based version-record management representing the KEWNET software's version history. For example, KEWNET Ver. 1.1 has been updated three times. The first time, the version was recorded at tt3 with absolute valid time from avf2 to avu3 and relative valid time from rvf2 to rvu3. For the second update, at tt4, the absolute valid time is from avf2 to avu4 and the relative valid time is from rvf2 to rvu4. The version had another change request, and therefore it was given a new absolute valid time from avf2 to avu5 and a relative valid time from rvf2 to rvu5; this transaction was recorded at tt5.

Table 1: The definitions of the temporal operators based on time points and time intervals

Temporal Operator   Time Point (t, ti ∈ T)            Time Interval (i, ii ∈ T)
equal               t = ti                            i = ii
before              t < ti                            i ends before ii starts
after               t > ti                            i starts after ii ends
meets               t immediately precedes ti         i ends exactly where ii starts
met_by              t immediately follows ti          i starts exactly where ii ends

Figure 2: Temporal elements in software version management (a software version carries a transaction time and a valid time; the valid time is split into absolute and relative parts, each with 'from' and 'until' bounds)


Table 2: Version-Record for KEWNET software

Ver #   tt     avt-from   avt-until   rvt-from   rvt-until
1.0     tt1    avf1       avu1        rvf1       rvu1
1.0     tt2    avf1       avu2        rvf1       rvu2
1.1     tt3    avf2       avu3        rvf2       rvu3
1.1     tt4    avf2       avu4        rvf2       rvu4
1.1     tt5    avf2       avu5        rvf2       rvu5
1.2     tt6    avf3       avu6        rvf3       rvu6
1.2     tt7    avf3       avu7        rvf3       rvu7
2.0     tt8    avf4       avu8        rvf4       rvu8
2.0     tt9    avf4       avu9        rvf4       rvu9
2.1     tt10   avf5       avu10       rvf5       rvu10
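As an illustration of how such an append-only version record could be held in code, the following sketch models the scheme S and the rule that a new valid time is recorded as a new row with its own transaction time rather than overwriting the old one, as in the KEWNET history of Table 2. The class and field names are illustrative assumptions, not the prototype's actual code.

# Illustrative sketch of the scheme S = {A1..An, tt, avt-from, avt-until,
# rvt-from, rvt-until}: valid-time updates are appended, never overwritten.
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class VersionRecord:
    version_code: str      # attribute Ai: the version sign, e.g. "1.1"
    tt: date               # transaction time (when this row was recorded)
    avt_from: date         # absolute valid time - from
    avt_until: date        # absolute valid time - until
    rvt_from: str          # relative valid time, e.g. "after module_X_release"
    rvt_until: str         # relative valid time, e.g. "before audit_2009"

history: List[VersionRecord] = []

def update_valid_time(version_code, avt_from, avt_until, rvt_from, rvt_until):
    """Record a new valid time for a version as a new row (append-only)."""
    history.append(VersionRecord(version_code, date.today(),
                                 avt_from, avt_until, rvt_from, rvt_until))

# e.g. version 1.1 updated again keeps its earlier rows, as in Table 2
update_valid_time("1.1", date(2008, 1, 1), date(2008, 6, 1),
                  "after pilot_rollout", "before audit_2008")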

5.2 The Temporal-Based Version Management Functionality
To carry out experiments validating the proposed model, a client-server prototype has been developed. The prototype has three main modules: register version, update the version valid time, and queries.

During the register version process, the software manager needs to record the basic information of the software version. The attributes that need to be keyed in by the software manager can be signed as Av = {version code, date release, version description, origin version code, version id}. Figure 3 illustrates the screen used to register the basic information of the software version.

Figure 3: Register the software version

On completion of a new software version registration, the software manager needs to update its valid time, and this can be done by using the update the version valid time module, illustrated in Figure 4. The attributes for this module are formed as AT = {version code, transaction date, description, date start, date end, time start, time end, update by, position}. The attribute transaction date is the current date and is auto-generated by the server. For any change to a software version's valid time, the software manager needs to perform an update using this form. The tool also allows the user to make queries on the database. Users can browse the valid time and status of any registered software version, as shown in Figure 5. Meanwhile, Figure 6 shows the output of a query for the full valid-time and status history of a software version.
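The query module described above can be sketched in the same hypothetical setting as the previous listing: the current valid time of each version is the row with the latest transaction time, and a version's full history is simply all of its rows ordered by transaction time.

# Sketch of the "browse current valid time" query (cf. Figure 5) and the
# per-version history query (cf. Figure 6), reusing the hypothetical
# VersionRecord/history structures from the previous sketch.
def current_valid_times(history):
    """Return, for every version, the record with the most recent transaction time."""
    latest = {}
    for rec in history:
        if rec.version_code not in latest or rec.tt > latest[rec.version_code].tt:
            latest[rec.version_code] = rec
    return latest

def version_history(history, version_code):
    """All valid-time rows ever recorded for one version, in transaction order."""
    return sorted((r for r in history if r.version_code == version_code),
                  key=lambda r: r.tt)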


Figure 4: Update the software version valid time

Figure 5: The software version valid time report

Figure 6: The transaction records of a version


6. CONCLUSION
In practical software version management, it is frequently important to retain a perfect record of the past and current valid times of a version's states. The record of an old valid time of a software version must not be replaced or overwritten during the updating process. Hence, this paper introduces a new model for software version management based on temporal elements. An important issue discussed here is that temporal aspects such as valid time and transaction time are stamped on each software version so that the monitoring and conflict management processes can be carried out easily. Based on the proposed model, a prototype has been developed. The prototype will be trialled at ITC-UDM. It will be used to monitor and keep track of the evolution of the software versions, system modules and software documents in the university's software. For further improvement, we are currently investigating related issues, including combining the model with change request management, considering more temporal operators and developing a standard temporal model for all configuration items in software configuration management.

References:
[1] E. Bertino, C. Bettini, E. Ferrari and P. Samarati. "A Temporal Access Control Mechanism for Database Systems", IEEE Trans. on Knowledge and Data Engineering, 8, 1996, 67-79.
[2] C. E. Dyreson, W. S. Evans, H. Lin and R. T. Snodgrass. "Efficiently Supporting Temporal Granularities", IEEE Trans. on Knowledge and Data Engineering, Vol. 12(4), 2000, 568-587.
[3] G. M. Clemm. "Replacing Version Control With Job Control", ACM - Proc. 2nd Intl. Workshop on Software Configuration Management, 1989, 162-169.
[4] D. Gao, C. S. Jensen, R. T. Snodgrass and M. D. Soo. "Join Operations in Temporal Databases", The Very Large Database Journal, Vol. 14, 2005, 2-29.
[5] A. Dix, T. Rodden and I. Sommerville. "Modelling Versions in Collaborative Work", IEE - Proc. Software Engineering, 1997, 195-206.
[6] H. Gregersen and C. S. Jensen. "Temporal Entity-Relationship Models - A Survey", IEEE Trans. on Knowledge and Data Engineering, 11, 1999, 464-497.
[7] A. Gustavsson. "Maintaining the Evaluation of Software Objects in an Integrated Environment", ACM - Proc. 2nd Intl. Workshop on Software Configuration Management, 1989, 114-117.
[8] A. Havewala. "The Version Control Process: How and Why it can save your project", Dr. Dobb's Journal, 24, 1999, 100-111.
[9] C. S. Jensen and R. T. Snodgrass. "Temporal Data Management", IEEE Trans. on Knowledge and Data Engineering, 11, 1999, 36-44.
[10] K. Torp, C. S. Jensen and R. T. Snodgrass. "Effective Timestamping in Databases", The Very Large Database Journal, Vol. 8, 1999, 267-288.
[11] B. Knight and J. Ma. "A General Temporal Theory", The Computer Journal, 37, 1994, 114-123.
[12] B. Knight and J. Ma. "A Temporal Database Model Supporting Relative and Absolute Time", The Computer Journal, 37, 1994, 588-597.
[13] A. Lie. "Change Oriented Versioning in a Software Engineering Database", ACM - Proc. 2nd Intl. Workshop on Software Configuration Management, 1989, 56-65.
[14] H. Mary. "Beyond Version Control", Software Magazine, 16, 1996, 45-47.


EFFECTIVE DIGITAL FORENSIC ANALYSIS OF THE NTFS DISK IMAGE
Mamoun Alazab, Sitalakshmi Venkatraman, Paul Watters
University of Ballarat, Australia
{m.alazab, s.venkatraman, p.watters}@ballarat.edu.au

ABSTRACT
Forensic analysis of the Windows NT File System (NTFS) could provide useful information leading towards malware detection and the presentation of digital evidence in a court of law. Since NTFS records every event of the system, forensic tools are required to process an enormous amount of information related to the user / kernel environment, buffer overflows, trace conditions, the network stack, etc. This has led to forensic tools that are practical to implement and hence popular, but that are neither comprehensive nor fully effective. Many existing techniques have failed to identify malicious code in the hidden data of the NTFS disk image. This research discusses the analysis technique we have adopted to successfully detect maliciousness in hidden data by investigating the NTFS boot sector. We have conducted experimental studies with some of the existing popular forensic tools and have identified their limitations. Further, through our proposed three-stage forensic analysis process, our experimental investigation attempts to unearth the vulnerabilities of the NTFS disk image and the weaknesses of current forensic techniques.

Keywords: NTFS, forensics, disk image, data hiding.

1 INTRODUCTION
Digital forensics is the science of identifying, extracting, analyzing and presenting the digital evidence that has been stored in digital electronic storage devices to be used in a court of law [1, 2, 3]. While forensic investigation attempts to provide full descriptions of a digital crime scene, in computer systems the primary goals of digital forensic analysis are fivefold: i) to identify all the unwanted events that have taken place, ii) to ascertain their effect on the system, iii) to acquire the necessary evidence to support a lawsuit, iv) to prevent future incidents by detecting the malicious techniques used and v) to recognize the incitement reasons and intent of the attacker for future predictions [2, 4]. The general components of the digital forensic process are acquisition, preservation, and analysis [5]. Digital electronic evidence could be described as information and data of investigative value that are stored by an electronic device [6]. This research focuses on the above-mentioned third goal of acquiring the necessary evidence of intrusions that take place on a computer system. In particular, this paper investigates the digital forensic techniques that could be used to analyze and acquire evidence from the most commonly used file system on computers, namely, the Windows NT File System (NTFS). Today, the NTFS file system is the basis of the predominant operating systems in use, such as Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, Windows Vista, Windows 7 and even most free UNIX distributions [7, 8, 9]. Hence, malware writers try to target NTFS, as this could result in affecting more computer users. Another compelling reason for witnessing a strong relationship between computer crime and the NTFS file system is the lack of literature that unearths the vulnerabilities of NTFS and the weaknesses of present digital forensic techniques [10]. This paper attempts to fill this gap by studying the techniques used in the analysis of the NTFS disk image. Our objectives are i) to explore the NTFS disk image structure and its vulnerabilities, ii) to investigate different commonly used digital forensic techniques, such as signatures, data hiding, timestamps, etc., and their weaknesses, and iii) finally, to suggest improvements in the static analysis of the NTFS disk image.

2 FORENSIC ANALYSIS PROCESS
In this section, we describe the forensic analysis process we adopted to achieve the above-mentioned objectives of this research work. We conducted an empirical study using selected digital forensic tools that are predominantly used in practice. Several factors, such as effectiveness, uniqueness and robustness in analyzing the NTFS disk image, were considered in selecting the tools / utilities required for this empirical study.


Since each utility provides some specific functionality, a collection of such tools was necessary to perform a comprehensive set of functions. Hence, the following forensic utilities / tools were adopted to conduct the experimental investigation in this research work:
i) disk imaging utilities such as dd [11] or dcfldd V1.3.4-1 [12] for obtaining a sector-by-sector mirror image of the disk;
ii) evidence collection utilities such as Hexedit [13], Frhed 1.4.0 [14] and Strings V2.41 [15] to introspect the binary code of the NTFS disk image;
iii) NTFS disk analysis software tools such as The Sleuth Kit (TSK) 3.01 [16], Autopsy [17] and NTFSINFO v1.0 [18] to explore and extract intruded data as well as hidden data for performing forensic analysis.
For the experimental investigation of the effectiveness of the above tools, we created test data on a Pentium(R) Core(TM) 2 Duo CPU, 2.19 GHz, with 2.98 GB of RAM, running Windows XP Professional with an NTFS file system partition. In this pilot empirical study, we focused on the boot sector of the NTFS disk image. We adopted the following three stages to perform digital forensic analysis in a comprehensive manner: Stage 1: hard disk data acquisition, Stage 2: evidence searching, and Stage 3: analysis of the NTFS file system.

2.1 Stage 1 - Hard Disk Data Acquisition
As the first stage in forensic analysis, we used dcfldd, developed by Nicholas Harbour, and the dd utility from George Garner to acquire the NTFS disk image from the digital electronic storage device. These utilities were selected for the investigation since they provide simple and flexible acquisition tools. The main advantage of using these tools is that we can extract the data in or between partitions to a separate file for further analysis. In addition, dcfldd provides built-in MD5 hashing features, allowing the analyst to calculate, save, and verify MD5 hash values. In digital forensic analysis, using a hashing technique is important to ensure data integrity, to identify which data values have been maliciously changed, and to explore known data objects [19].

2.2 Stage 2 - Evidence Searching
The next stage involved searching for evidence of system tampering. Evidence of intrusion could be gained by looking for known signatures and timestamps, as well as by searching for hidden data [20]. In this stage, we used the Strings command by Mark Russinovich, the Frhed hex-editor tool by Rihan Kibria and the WinHex hex-editor tool by X-Ways Software Technology AG

to detect a keyword or phrase in the disk image.

2.3 Stage 3 - Analysis of the NTFS File System
In the final stage of the experimental study, we analyzed the data obtained from the NTFS disk image, which contributed towards meaningful conclusions of the forensic investigation. We adopted a collection of tools, such as The Sleuth Kit (TSK) and Autopsy by Brian Carrier and NTFSINFO v1.0 from Microsoft Sysinternals by Mark Russinovich, to perform different aspects of the NTFS file system analysis.
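As a concrete illustration of the integrity check in Stage 1, the short Python sketch below hashes an acquired image with MD5 so it can be compared against the hash reported by the imaging tool. The file name and expected hash are placeholders, not values from the study.

# Sketch: verify the MD5 of an acquired disk image (Stage 1 integrity check).
# "image.dd" and the expected hash are placeholders, not values from the study.
import hashlib

def md5_of_image(path: str, chunk_size: int = 1024 * 1024) -> str:
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)          # hash the image in chunks to bound memory use
    return md5.hexdigest()

acquired = md5_of_image("image.dd")
expected = "d41d8cd98f00b204e9800998ecf8427e"   # hash logged by the imaging tool
print("integrity OK" if acquired == expected else "image has been altered")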

3 FORENSIC INVESTIGATION STEPS
Many aspects must be taken into consideration when conducting a computer forensic investigation, and different approaches are adopted by investigators while examining a crime scene. From the literature, we find five commonly adopted steps: policy and procedure development, evidence assessment, evidence acquisition, evidence examination, and documenting and reporting [26]. In our proposed approach for the digital forensic investigation, we adopted the following nine steps, as shown in Figure 1:
Step 1: Policy and Procedure Development - In this step, suitable tools that are needed at the digital scene are determined as part of administrative considerations. All aspects of policy and procedure development are considered to determine the mission statement, skills and knowledge, funding, personnel requirements, evidence handling and support from management.
Step 2: Hard Disk Acquisition - This step involves forensic duplication, which is achieved by obtaining an NTFS image of the original disk using the dd tool. This step obtains a sector-by-sector mirror image of the disk, and the output image file is created as Image.dd.
Step 3: Check the Data Integrity - This step ensures the integrity of the acquired data through the reporting of a hash function. We used the MD5 tool to guarantee the integrity of the original media and the resulting image file.
Step 4: Extract the MFT in the Boot Sector - In this step, the MFT is extracted from the boot sector. We analyzed the MFT using the WinHex hex-editor tool and checked the number of sectors allocated to the NTFS file system using NTFSINFO.
Step 5: Extract the $Boot File and Backup Boot Sector - In this step, the $Boot file is extracted to investigate hidden data. We analyzed the hidden data in the $Boot metadata file using the WinHex, TSK and Autopsy tools.
Step 6: Compare the Boot Sector and Backup - A comparison of the original and backup boot sectors is performed in this step. We obtained another two images from the original image using the dd tool, resulting in two image files named backupbootsector.dd and bootsector.dd. We analyzed these two image files using the WinHex hex-editor tool, TSK and Autopsy.


Step 7: Check the Data Integrity - In this step the integrity of the data is verified again as a test of congruence. We adopted the hashing technique using the MD5 tool for the two created image files to check data integrity.
Step 8: Extract the ASCII and UNICODE Strings - This step involves extracting the ASCII and UNICODE characters from the binary files in the disk image. We used the Strings command tool and keyword searches to match text or hexadecimal values recorded on the disk. Through keyword search, we could even find files that contain specific words.
Step 9: Physical Presentation - In this final step, all the findings from the forensic investigation are documented. It involves presenting the digital evidence through documentation and reporting procedures.

Figure 1: Forensic investigation steps
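Step 8 above can be approximated with a few lines of Python that pull printable ASCII runs out of the image and match them against a keyword list, similar in spirit to what the Strings utility does. The keywords and file name shown are hypothetical placeholders.

# Sketch of Step 8: extract printable ASCII runs from the image and search
# for keywords, similar in spirit to the Strings utility. Keywords are placeholders.
import re

def ascii_strings(path: str, min_len: int = 4):
    with open(path, "rb") as f:
        data = f.read()
    # yield (offset, text) for runs of printable ASCII of at least min_len bytes
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield match.start(), match.group().decode("ascii")

keywords = ("cmd.exe", "http://")        # hypothetical search terms
for offset, text in ascii_strings("image.dd"):
    if any(k in text for k in keywords):
        print(f"offset {offset:#010x}: {text}")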

4 BOOT SECTOR ANALYSIS OF NTFS

4.1 NTFS Disk Image
As mentioned in the previous section, the first step adopted by a digital forensic investigator is to acquire a duplicate copy of the NTFS disk image before beginning the analysis. This is to ensure that the data on the original devices have not been changed during the analysis. Therefore, it is necessary to isolate the original infected computer from the disk image in order to extract the evidence that could be found on the electronic storage devices. By conducting investigations on the disk image, we could unearth any hidden intrusions, since the image captures the invisible information as well [21]. The advantages of analyzing disk images are that the investigators can: a) preserve the digital crime scene, b) obtain the information in slack space, c) access unallocated space, free space, and used space, d) recover file fragments, hidden or deleted files and directories, e) view the partition structure and f) get the date-stamps and ownership of files and folders [3, 22].

4.2 Master File Table
To investigate how intrusions result in data hiding, data deletion and other obfuscations, it is essential to understand the physical characteristics of the Microsoft NTFS file system. The Master File Table (MFT) is the core of NTFS, since it contains details of every file and folder on the volume and allocates two sectors for every MFT entry [23]. Hence, a good knowledge of the MFT layout structure also facilitates the disk recovery process. Each MFT entry has a fixed size of 1 KB (at byte offset 64 in the boot sector one can identify the MFT record size). We provide the MFT layout and represent the plan of the NTFS file system in Figure 2. The main purpose of NTFS is to facilitate reading and writing of the file attributes, and the MFT enables a forensic analyst to examine in some detail the structure and working of the NTFS volume. Therefore, it is important to understand how the attributes are stored in an MFT entry. The key feature to note is that an MFT entry contains attributes that can have any format and any size. Further, as shown in Figure 2, every attribute contains an entry header, which is allocated in the first 42 bytes of a file record, and it contains an attribute header and attribute content. The attribute header is used to identify the size, name and flag value. The attribute content can reside in the MFT following the attribute header if its size is less than 700 bytes (known as a resident attribute); otherwise the attribute content is stored in an external cluster called a cluster run (known as a non-resident attribute). This is because the MFT entry is 1 KB in size and hence cannot fit anything that occupies more than 700 bytes.


Figure 2: MFT layout structure

4.3 Boot Sector Analysis and Results

We performed the boot sector analysis by investigating the metadata files that are used to describe the file system. We followed the steps described in the previous section (Figure 1), first creating an NTFS disk image of the test computer with the dd utility in order to investigate the boot sector. We then ran the NTFSINFO tool on the disk image; its output, shown in Table 1, describes the boot sector of the test device and the on-disk structure. Such a data structure examination lets us view the MFT information, allocation sizes, volume size and metadata files. We extracted useful information such as the size of clusters, the number of sectors in the file system, the starting cluster address of the MFT, the size of each MFT entry and the serial number of the file system.

Table 1: NTFS Information Details.

Volume Size
-----------
Volume size             : 483 MB
Total sectors           : 991199
Total clusters          : 123899
Free clusters           : 106696
Free space              : 416 MB (86% of drive)

Allocation Size
---------------
Bytes per sector        : 512
Bytes per cluster       : 4096
Bytes per MFT record    : 1024
Clusters per MFT record : 0

MFT Information
---------------
MFT size                : 0 MB (0% of drive)
MFT start cluster       : 41300
MFT zone clusters       : 41344 - 56800
MFT zone size           : 60 MB (12% of drive)
MFT mirror start        : 61949

Meta-Data files

From the information gained above, we followed the steps in Figure 1 to analyze the boot sector image. As shown in Figure 3, we performed an analysis of the data structure of this boot sector, and the results of the investigation conducted with the existing forensic tools are summarized in Table 2. From these results, we conclude that the existing forensic tools do not check possible infections that could take place in certain hidden data of the boot sector. Hence, we describe the hidden data analysis technique that we adopted in the next section.

5 HIDDEN DATA ANALYSIS AND RESULTS

The recent cyber crime trend is to use obfuscation techniques such as disguising file names, hiding attributes and deleting files to intrude into the computer system. Since the Windows operating system does not zero the slack space, it becomes a vehicle for hiding data, especially in the $Boot file. Hence, in this study we have analyzed the hidden data in the $Boot file structure. The $Boot entry is stored in a metadata file at the first cluster, in sector 0 of the file system, called $Boot, from where the system boots. It is the only metadata file that has a static location, so it cannot be relocated. Microsoft allocates the first 16 sectors of the file system to $Boot, and only half of these sectors contain non-zero values [3]. In order to investigate the NTFS file system, one must possess substantial knowledge and experience to analyze the data structure and the hidden data [24]. The $Boot metadata file structure is located in MFT entry 7 and contains the boot sector of the file system. It holds information about the size of the volume, the clusters and the MFT. The $Boot metadata file structure has four attributes, namely $STANDARD_INFORMATION, $FILE_NAME, $SECURITY_DESCRIPTOR and $DATA. The $STANDARD_INFORMATION attribute contains temporal information such as flags, the owner, the security ID and the last accessed, written, and created times. The $FILE_NAME attribute contains the file name in UNICODE, the size and temporal information as well. The $SECURITY_DESCRIPTOR attribute contains information about the access control and security properties. Finally, the $DATA attribute contains the file contents. These attribute values for the test sample are shown in Table 2 as an illustration. To obtain them, we used the following TSK command:

istat -f ntfs c:\image.dd 7

From our investigation of the resulting attribute values, we find that the $Boot data structure of the NTFS file system could be used to hide data. Analyzing the hidden data in the boot sector can therefore provide useful information for digital forensics. The size of the data that can be hidden in the boot sector is limited by the number of non-zero sectors that Microsoft allocated in the first 16 sectors of the file system.
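To see how much room the $Boot file leaves for hidden data, one can check which of its 16 sectors are entirely zero-filled. The sketch below is a minimal illustration we provide for this check (it is not one of the tools used in the study); it simply reads the first 8192 bytes of the volume image, which hold the $Boot content.

SECTOR = 512
NUM_SECTORS = 16   # Microsoft allocates the first 16 sectors of the file system to $Boot

with open("image.dd", "rb") as f:
    boot_file = f.read(SECTOR * NUM_SECTORS)

for i in range(NUM_SECTORS):
    sector = boot_file[i * SECTOR:(i + 1) * SECTOR]
    state = "all zero (possible hiding place)" if not any(sector) else "non-zero"
    print(f"sector {i:2d}: {state}")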


The data could be hidden in the $Boot metadata file without raising suspicion and without affecting the functionality of the system [25].

Table 2: Results of $Boot Analysis

MFT Entry Header Values:
Entry: 7        Sequence: 7
$LogFile Sequence Number: 0
Allocated File
Links: 1

$STANDARD_INFORMATION Attribute Values:
Flags: Hidden, System
Owner ID: 0
Created:       Mon Feb 09 12:09:06 2009
File Modified: Mon Feb 09 12:09:06 2009
MFT Modified:  Mon Feb 09 12:09:06 2009
Accessed:      Mon Feb 09 12:09:06 2009

$FILE_NAME Attribute Values:
Flags: Hidden, System
Name: $Boot
Parent MFT Entry: 5    Sequence: 5
Allocated Size: 8192   Actual Size: 8192
Created:       Mon Feb 09 12:09:06 2009
File Modified: Mon Feb 09 12:09:06 2009
MFT Modified:  Mon Feb 09 12:09:06 2009
Accessed:      Mon Feb 09 12:09:06 2009

Attributes:
Type: $STANDARD_INFORMATION (16-0)   Name: N/A     Resident     size: 48
Type: $FILE_NAME (48-2)              Name: N/A     Resident     size: 76
Type: $SECURITY_DESCRIPTOR (80-3)    Name: N/A     Resident     size: 116
Type: $DATA (128-1)                  Name: $Data   NonResident  size: 8192
0 1

Analysis of the $Boot data structure of the NTFS file system will identify any hidden data. The analyst should start by comparing the boot sector with the backup boot sector. The boot sector and the backup boot sector are supposed to be identical; otherwise there is some data hidden in the $Boot data structure. One method is to check the integrity of the two sectors by calculating the MD5 hash of both of them: a difference in the checksums indicates that there is some hidden data. We performed this comparison by applying the following commands to the boot sector image and the backup boot sector image:

dd if=image.dd bs=512 count=1 skip=61949 of=c:\backupbootsector.dd --md5sum --verifymd5 --md5out=c:\hash1.md5
dd if=image.dd bs=512 count=1 of=c:\bootsector.dd --md5sum --verifymd5 --md5out=c:\hash2.md5

We found that hidden data in the $Boot data structure could not be detected directly by the existing tools used in this study, and manual inspections were required alongside these forensic tools. Hence, through the analysis conducted with the various existing utilities and tools, we arrived at the following results:

i) Since NTFS stores all events that take place on a computer system, a huge amount of data analysis is required when scanning an entire NTFS disk image for forensic purposes. In this empirical study, by merely focusing on the hidden data of the $Boot file, we have shown that a variety of tools and utilities had to be adopted along with manual inspections. Hence, it takes an enormous amount of time to analyze the data derived with such tools.

ii) The existing forensic tools are not comprehensive and effective in identifying recent computer threats. Not all computer infections are detected by forensic tools; in particular, intrusions in the form of hidden data in the $Boot file go unchecked.

iii) It was mandatory to perform manual investigations alongside the existing tools. By adopting a manual introspection of the $Boot file using the three-stage approach of i) hard disk acquisition, ii) evidence searching and iii) analysis of the NTFS file system, we could successfully identify hidden data in the $Boot file.

iv) Intelligent search techniques could be adopted to extract the ASCII and UNICODE characters from binary files in the disk image, on either the full file system image or just the unallocated space, which could speed up the process of identifying hidden data.

v) One of the main reasons for needing a variety of tools is that there are several versions of the NTFS file system to be catered for. While Windows XP and Windows Server 2003 use the same NTFS version, Windows Vista uses NTFS 3.1 [7]. The new NTFS 3.1 has changed the on-disk structure; for example, the location of the volume boot record is at physical sector 2,048. Most of the existing tools do not work with all the different versions of the NTFS file system, and hence a comprehensive tool is warranted to cope with these changes.
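Beyond the MD5 check, the manual comparison referred to in finding iii) can be narrowed down to the exact offsets at which the boot sector and its backup disagree. The following sketch is our illustration, using the two image files created by the dd commands above; it reports every differing byte, which points directly at candidate hidden data.

def diff_sectors(path_a="bootsector.dd", path_b="backupbootsector.dd"):
    """Compare two 512-byte sector images and list the offsets where they differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(512), fb.read(512)
    mismatches = [(off, a[off], b[off])
                  for off in range(min(len(a), len(b))) if a[off] != b[off]]
    for off, x, y in mismatches:
        print(f"offset {off:3d}: boot={x:#04x} backup={y:#04x}")
    return mismatches

if not diff_sectors():
    print("boot sector and backup boot sector are identical")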


Figure 3: Analysis of the test boot sector

Table 2: Results from the analysis of the test boot sector.

Byte Range   Size   Description                                               Value
0 -- 2         3    Jump to boot code                                         9458411
3 -- 10        8    OEM Name – System ID                                      NTFS
11 -- 12       2    Bytes per sector                                          512
13 -- 13       1    Sectors per cluster                                       8
14 -- 15       2    Reserved sectors                                          0
16 -- 20       5    Unused                                                    0
21 -- 21       1    Media descriptor                                          0
22 -- 23       2    Unused                                                    0
24 -- 25       2    Sectors per track                                         63
26 -- 27       2    Number of heads                                           255
28 -- 31       4    Unused                                                    32
32 -- 35       4    Unused                                                    0
36 -- 39       4    Drive type check                                          80 00 00 00
40 -- 47       8    Number of sectors in file system (volume)                 0.47264 GB
48 -- 55       8    Starting cluster address of $MFT                          4*8=32
56 -- 63       8    Starting cluster address of MFT Mirror $DATA attribute    61949
64 -- 64       1    Size of record - MFT entry                                2^10 = 1024
65 -- 67       3    Unused                                                    0
68 -- 68       1    Size of index record                                      01h
69 -- 71       3    Unused                                                    0
72 -- 79       8    Serial number                                             C87C8h
80 -- 83       4    Unused                                                    0
84 -- 509    426    Boot code                                                 ~
510 -- 511     2    Boot signature                                            0xAA55

Action / Result annotations in the table: the jump field is used, if the volume is non-bootable, to store an error message ("If bootable, jump. If non-bootable, used to store error message"); the fields marked "Unused" are each flagged "Unused – Possible Infection"; three further fields are flagged "No Check – Possible Infection"; and one field is noted as specific to the USB thumb drive used as the test device.
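The layout in Table 2 can be turned directly into a small parser that dumps the documented fields and flags non-zero bytes in the regions the table marks as unused. This is an illustrative sketch based only on the byte ranges listed above, not a replacement for the forensic tools used in the study.

import struct

# inclusive byte ranges marked "Unused" in Table 2, expressed as half-open slices
UNUSED_RANGES = [(16, 21), (22, 24), (28, 32), (32, 36), (65, 68), (69, 72), (80, 84)]

def parse_boot_sector(sector: bytes):
    """Decode the NTFS boot sector fields listed in Table 2."""
    fields = {
        "oem_name":            sector[3:11].decode("ascii", "replace"),
        "bytes_per_sector":    struct.unpack_from("<H", sector, 11)[0],
        "sectors_per_cluster": sector[13],
        "sectors_per_track":   struct.unpack_from("<H", sector, 24)[0],
        "number_of_heads":     struct.unpack_from("<H", sector, 26)[0],
        "total_sectors":       struct.unpack_from("<Q", sector, 40)[0],
        "mft_start_cluster":   struct.unpack_from("<Q", sector, 48)[0],
        "mft_mirror_cluster":  struct.unpack_from("<Q", sector, 56)[0],
        "serial_number":       hex(struct.unpack_from("<Q", sector, 72)[0]),
        "boot_signature":      hex(struct.unpack_from("<H", sector, 510)[0]),
    }
    # any non-zero byte inside an "Unused" range is a candidate for hidden data
    suspicious = [(lo, hi - 1) for lo, hi in UNUSED_RANGES if any(sector[lo:hi])]
    return fields, suspicious

with open("bootsector.dd", "rb") as f:
    info, flagged = parse_boot_sector(f.read(512))
print(info)
print("non-zero 'Unused' ranges:", flagged)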


6 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

Recent methods adopted by computer intruders, attackers and malware target hidden and deleted data so that they can evade virus scanners and become even more difficult to identify with existing digital forensic tools. This paper has attempted to explore the difficulties involved in digital forensics, especially in conducting NTFS disk image analysis, and to propose an effective digital forensic analysis process. In this empirical study, we found that the boot sector of the NTFS file system can be used by computer attackers as a vehicle to hide data, as it contains a potential weakness. We have emphasized the importance of file system knowledge for digital forensics, since techniques for hiding data, such as the use of slack space and hidden attributes, have recently been adopted by attackers. This is an important NTFS file system weakness to be addressed, and research in this domain could lead to an effective solution to the open problem of detecting new malicious code that makes use of such an obfuscated mode of attack. We have shown that the existing forensic software tools are not competent enough to comprehensively detect all hidden data in boot sectors. As a first step to address this problem, we have proposed a three-stage forensic analysis process consisting of nine steps to facilitate the experimental study, and we have reported the results gathered by following these steps. By adopting effective search techniques, we were able to identify unknown malicious hidden data in the $Boot file that went undetected by current forensic tools. In this pilot study we adopted a few forensic techniques and effective manual inspections of the NTFS file image. Our future research direction is to automate the proposed process so as to facilitate forensic analysis of the NTFS disk image in an efficient and comprehensive manner. We plan to extract and extrapolate malware signatures effectively and intelligently for existing and new malware that use hidden and obfuscated modes of attack. We will also automate the knowledge of how to extract data from hidden data structures and how to reclaim deleted data, and we believe this would extensively benefit the digital evidence collection and recovery process.

7 REFERENCES

[1] M. Reith, C. Carr & G. Gunsch: An examination of digital forensic models, International Journal of Digital Evidence, 1, pp. 1-12 (2002).
[2] M. Alazab, S. Venkatraman & P. Watters: Digital forensic techniques for static analysis of NTFS images, Proceedings of ICIT2009, Fourth International Conference on Information Technology, IEEE Xplore (2009).
[3] B. Carrier: File system forensic analysis, Addison-Wesley Professional, USA (2008).
[4] S. Ardisson: Producing a Forensic Image of Your Client's Hard Drive? What You Need to Know, Qubit, 1, pp. 1-2 (2007).
[5] M. Andrew: Defining a Process Model for Forensic Analysis of Digital Devices and Storage Media, Proceedings of SADFE2007, Second International Workshop on Systematic Approaches to Digital Forensic Engineering, pp. 16-30 (2007).
[6] Electronic Crime Scene Investigation: A Guide for First Responders, US Department of Justice, NCJ (2001).
[7] A. Svensson: Computer Forensic Applied to Windows NTFS Computers, Stockholm University, Royal Institute of Technology (2005).
[8] NTFS, http://www.ntfs.com, accessed on 22/2/2009.
[9] D. Purcell & S. Lang: Forensic Artifacts of Microsoft Windows Vista System, Lecture Notes in Computer Science, Springer, 5075, pp. 304-319 (2008).
[10] T. Newsham, C. Palmer, A. Stamos & J. Burns: Breaking forensics software: Weaknesses in critical evidence collection, Proceedings of the 2007 Black Hat Conference (2007).
[11] DD tool, George Garner's site, retrieved January 2009 from http://users.erols.com/gmgarner/forensics/.
[12] DCFLDD tool, Nicholas Harbour, http://dcfldd.sourceforge.net/, accessed on 14/1/2009.
[13] WinHex tool, X-Ways Software Technology AG, retrieved January 2009 from http://www.x-ways.net/winhex/.
[14] FRHED tool, Raihan Kibria site, http://frhed.sourceforge.net/, accessed on 14/1/2009.
[15] STRINGS, Mark Russinovich, retrieved January 2009 from http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx.
[16] TSK tools, Brian Carrier site, http://www.sleuthkit.org/sleuthkit/, accessed on 14/1/2009.
[17] Autopsy tools, Brian Carrier site, retrieved January 2009 from http://www.sleuthkit.org/autopsy/.
[18] NTFSINFO tool, Mark Russinovich, retrieved January 2009 from http://technet.microsoft.com/en-au/sysinternals/bb897424.aspx.
[19] V. Roussev, Y. Chen, T. Bourg & G. Richard: Forensic file system hashing revisited, Digital Investigation, Elsevier, 3, pp. 82-90 (2006).
[20] K. Chow, F. Law, M. Kwan & K. Lai: The Rules of Time on NTFS File System, Proceedings of the Second International Workshop on Systematic Approaches to Digital Forensic Engineering, pp. 71-85 (2007).
[21] K. Jones, R. Bejtlich & C. Rose: Real digital forensics: computer security and incident response, Addison-Wesley Professional, USA (2008).
[22] H. Carvey: Windows Forensic Analysis DVD Toolkit, Syngress Press, USA (2007).
[23] L. Naiqi, W. Yujie & H. QinKe: Computer Forensics Research and Implementation Based on NTFS File System, CCCM'08, ISECS International Colloquium on Computing, Communication, Control, and Management (2008).
[24] J. Aquilina, E. Casey & C. Malin: Malware Forensics: Investigating and Analyzing Malicious Code, Syngress Publishing, USA (2008).
[25] E. Huebner, D. Bem & C. Wee: Data hiding in the NTFS file system, Digital Investigation, Elsevier, 3, pp. 211-226 (2006).
[26] S. Hart, J. Ashcroft & D. Daniels: Forensic examination of digital evidence: a guide for law enforcement, National Institute of Justice NIJ-US, Washington DC, USA, Tech. Rep. NCJ (2004).


JOB AND APPLICATION-LEVEL SCHEDULING IN DISTRIBUTED COMPUTING
Victor V. Toporkov
Computer Science Department, Moscow Power Engineering Institute,
ul. Krasnokazarmennaya 14, Moscow, 111250 Russia
ToporkovVV@mpei.ru

ABSTRACT
This paper presents an integrated approach to scheduling in distributed computing, with strategies as sets of job supporting schedules generated by a critical works method. The strategies are implemented using a combination of job-flow and application-level scheduling techniques within virtual organizations of the Grid. Applications are regarded as compound jobs with a complex structure containing several tasks co-allocated to processor nodes. The choice of the specific schedule depends on the load level of the resource dynamics and is formed as a resource request, which is sent to a local batch-job management system. We propose a scheduling framework and compare diverse types of scheduling strategies using simulation studies.

Keywords: distributed computing, scheduling, application level, job flow, metascheduler, strategy, supporting schedules, task, critical work.

1 INTRODUCTION

The fact that a distributed computational environment is heterogeneous and dynamic, along with the autonomy of processor nodes, makes it much more difficult to manage and assign resources for job execution at the required quality level [1]. When constructing a computing environment based on the available resources, e.g. in the model used in the X-Com system [2], one normally does not create a set of rules for resource allocation, as opposed to constructing clusters or Grid-based virtual organizations. This is reminiscent of some techniques implemented in the Condor project [3, 4]. Non-clustered Grid resource computing environments use a similar approach. For example, @Home projects based on the BOINC system realize cycle stealing, i.e. they use either idle computers or idle cycles of a specific computer. Another, similar approach is related to the management of distributed computing based on resource broker assignment [5-11]. Besides the Condor project [3, 4], one can also mention several application-level scheduling projects: AppLeS [6], APST [7], Legion [8], DRM [9], Condor-G [10], and Nimrod/G [11]. It is known that scheduling jobs with independent brokers, or application-level scheduling,

allows resource usage to be adapted and the schedule to be optimized for the specific job, for example by decreasing its completion time. Such approaches are important because they take into account details of the job structure and the users' resource load preferences [5]. However, when independent users apply totally different criteria for application optimization, along with job-flow competition, resource usage and integral performance can degrade, e.g. system throughput, processor node load balance, and job completion time. An alternative way of scheduling in distributed computing, based on virtual organizations, includes a set of specific rules for resource use and assignment that regulates the mutual relations between users and resource owners [1]. In this case only job-flow level scheduling and allocation efficiency can be increased. Grid dispatchers [12] or metaschedulers act as managing centres, as in the GrADS project [13]. However, the joint computing nature of virtual organizations creates a number of serious challenges. Under such conditions, when different applications are not isolated, it is difficult to achieve the desirable resource performance: execution of a user's processes can have an unpredictable impact on the execution time of other, neighbouring processes. Therefore, some researchers pay attention to the creation of virtual machine based virtual Grid


workspaces by means of specialized operating systems, e.g., in the new European project XtreemOS (http://www.xtreemos.eu). Inseparability of the resources makes it much more complicated to manage jobs in a virtual organization, because the presence of local job-flows launched by the owners of processor nodes has to be taken into account. Dynamic load balancing of different job-flows can be based on economic principles [14] that support a fair-share division model for users and owners. The presence of actual job-flows requires resource state forecasting and advance reservation [15], for example by means of the Maui cluster scheduler simulation approach or the methods implemented in systems such as GARA, Ursala, and Silver [16]. The above-mentioned works are related to either job-flow scheduling problems or application-level scheduling. The fundamental difference between them and the approach described here is that the resultant dispatching strategies are based on the integration of job-flow management methods and compound job scheduling methods on processor nodes. This allows the quality of service for the jobs and the resource usage efficiency of the distributed environment to be increased. It is assumed that a job can be compound (multiprocessor) and that the tasks included in the job are heterogeneous in terms of computation volume and resource needs. In order to complete the job, one has to co-allocate the tasks to different nodes. Each task is executed on a single node, and it is supposed that the local management system interprets it as a job accompanied by a resource request. On the one hand, the structure of the job is usually not taken into account. A rare exception is the Maui cluster scheduler [16], which allows a single job to contain several parallel, but homogeneous (in terms of resource requirements), tasks. On the other hand, there are several resource-query languages. Thus, JDL from WLMS (http://edms.cern.ch) defines alternatives and preferences when making a resource query, and the ClassAds extensions in Condor-G [10] allow resource queries for dependent jobs to be formed. The execution of compound jobs is also supported by the WLMS scheduling system of the gLite platform (http://www.glite.org), though the resource requirements of specific components are not taken into account. What sets our work apart from other scheduling research is that we consider coordinated application-level and job-flow management as a fundamental part of an effective scheduling strategy within the virtual organization. The distributed state of the environment, the dynamics of its configuration, and users' and owners' preferences cause the need for building multifactor and multicriteria job

managing strategies [17-20]. Availability of heterogeneous resources, data replication policies [12, 21, 22] and the multiprocessor job structure required for efficient co-allocation between several processor nodes should be taken into account. In this work, the multicriteria strategy is regarded as a set of supporting schedules intended to cover possible events related to resource availability. The outline of the paper is as follows. In Section 2, we provide details of application-level and job-flow scheduling with a critical works method and strategies as sets of possible supporting schedules. Section 3 presents a framework for integrated job-flow and application-level scheduling. Simulation studies of coordinated scheduling techniques and results are discussed in Section 4. We conclude and point to future directions in Section 5.

2 APPLICATION-LEVEL AND JOB-FLOW SCHEDULING STRATEGIES

2.1 Application-Level Scheduling Strategy

The application-level scheduling strategy is a set of possible resource allocations and supporting schedules (distributions) for all N tasks in the job [18]:

Distribution := < <Task 1 / Allocation i [Start 1, End 1]>, …, <Task N / Allocation j [Start N, End N]> >,

where Allocation i, …, j is the processor node assigned to Task 1, …, N, and Start 1, …, N and End 1, …, N are the run and stop times for the execution of Task 1, …, N. The time interval [Start, End] is treated as the so-called walltime (WT), defined at resource reservation time [15] in the local batch-job management system. Figure 1 shows some examples of job graphs in strategies with different degrees of distribution, task detail, and data replication policies [19]. The first type of strategy, S1, allows scheduling with fine-grain computations and multiple data replicas; the second type, S2, has fine-grain computations and a bounded number of data replicas; and the third type, S3, implies coarse-grain computations and constrained data replication. The vertices P1, …, P6, P23, and P45 correspond to tasks, while D1, …, D8, D12, D36, and D78 correspond to data transmissions. The transition from graph G1 to graphs G2 and G3 is performed through lumping of tasks and reduction of the parallelism level. The job graph is parameterized by prior estimates of the duration Tij of execution of a task Pi for a


processor node nj of the type j, of relative volumes Vij of computations on a processor (CPU) of the type j, etc. (Table 1). It should be mentioned that such estimations are also necessary in several methods of priority scheduling, including backfilling in the Maui cluster scheduler.

Figure 1: Examples of job graphs.

Figure 2 shows fragments of strategies of types S1, S2, and S3 for the jobs in Fig. 1. The duration of all data transmissions is equal to one unit of time for G1, while the transmissions D12 and D78 require two units of time and the transmission D36 requires four units of time for G2 and G3. We assume that the lumping of tasks is characterized by summing the values of the corresponding parameters of the constituent subtasks (see Table 1).

Table 1: User's task estimations.

Task   Ti1   Ti2   Ti3   Ti4   Vij
P1      2     4     6     8    20
P2      3     6     9    12    30
P3      1     2     3     4    10
P4      2     4     6     8    20
P5      1     2     3     4    10
P6      2     4     6     8    20

Supporting schedules in Fig. 2, a present a subset of a Pareto-optimal strategy of the type S1 for tasks Pi, i = 1, …, 6 in G1. The Pareto relation is generated by a vector of criteria CF, LLj, j = 1, …, 4. The job execution cost function CF is equal to the sum of Vij/Ti, where Ti is the real load time of processor node j by task Pi, rounded to the nearest not-smaller integer. Obviously, the actual solving time Ti for a task can differ from the user estimation Tij (see Table 1). The processor node load level LLj is the ratio of the total time of usage of the node of the type j to the job run time. The schedules in Fig. 2, b and Fig. 2, c are related to strategies S2 and S3.

Figure 2: Fragments of scheduling strategies S1 (a), S2 (b), S3 (c).

2.2 Critical Works Method

Strategies are generated with a critical works method [20]. The gist of the method is a multiphase procedure. The first step of any phase is the scheduling of a critical work – the longest (in terms of estimated execution time Tij for task Pi) chain of unassigned tasks, along with the best combination of available resources. The second step is resolving the collisions caused by conflicts between tasks of different critical works competing for the same resource. For example, there are four critical works 12, 11, 10, and 9 time units long (including data transfer time) on the fastest processor nodes of the type 1 for the job graph G1 in Fig. 1 (see Table 1): (P1, P2, P4, P6), (P1, P2, P5, P6), (P1, P3, P4, P6), (P1, P3, P5, P6). The schedule with CF=37 has a collision (see Fig. 2, a), which occurred due to simultaneous attempts of


tasks P4 and P5 to occupy processor node n4. This collision is further resolved by allocating P4 to the processor node n3 and P5 to the node n4. Such reallocations can be based on virtual organization economics: in order to take a higher-performance processor node, the user should "pay" more. Cost functions can be used in economic models [14] of resource distribution in virtual organizations. It is worth noting that the full cost in CF is not calculated in real money, but in some conventional units (quotas), for example as in corporate non-commercial virtual organizations. The essential point is different: the user should pay an additional cost in order to use a more powerful resource or to start the task faster. The choice of a specific schedule from the strategy depends on the state and load level of processor nodes, and on the data storage policies.

2.3 Examples of Scheduling Strategies

Let us assume that we need to construct a conditionally optimal strategy of processor distribution according to the main scheme of the critical works method from [20] for a job represented by the information graph G1 (see Fig. 1). Prior estimates for the duration Tij of processing tasks P1, …, P6 and the relative computing volumes Vij for four types of processors are shown in Table 1, where i = 1, …, 6; j = 1, …, 4. The number of processors of each type is equal to 1. The duration of all data exchanges D1, …, D8 is equal to one unit of time. The walltime is given to be WT = 20. The criterion of resource-use efficiency is the cost function CF. We take a prior estimate for the duration Tij that is the nearest to the limit time Ti for the execution of task Pi on a processor of type j, which determines the type j of the processor used. The conflicts between competing tasks are resolved through unused processors which, being used as resources, are accompanied by a minimum value of the penalty cost function equal to the sum of Vij/Tij (see Table 1) for the competing tasks. It is required to construct a strategy that is conditionally minimal in terms of the cost function CF for the upper and lower boundaries of the maximum range for the duration Tij of the execution of each task Pi (see Table 1). This is a modification of the strategy S1 with fine-grain computations, an active data replication policy, and best- and worst-case execution time estimations.

Table 2: The strategy of the type MS1.

Schedule | T1 T2 T3 T4 T5 T6 | A1 A2 A3 A4 A5 A6 | CF | LL1  LL2  LL3  LL4
1        |  2  3  3  2  2 10 |  1  1  3  1  2  4 | 41 | 0.35 0.10 0.15 0.50
2        |  2  3  3 10 10  2 |  1  1  3  3  4  1 | 37 | 0.35 0    0.65 0.50
3        | 10  3  3  2  2  2 |  4  1  3  1  2  1 | 41 | 0.35 0.10 0.15 0.50
4        |  2  3  3  2  2 10 |  1  1  3  1  2  1 | 41 | 0.85 0.10 0.15 0
5        |  2  3  3 10 10  2 |  1  1  3  4  1  1 | 38 | 0.85 0    0.15 0.50
6        |  2 11 11  2  2  2 |  1  4  1  1  2  1 | 39 | 0.85 0.10 0    0.55
7        | 10  3  3  2  2  2 |  1  1  3  1  2  1 | 41 | 0.85 0.10 0.15 0
8        |  2 11 11  2  2  2 |  1  4  2  1  2  1 | 39 | 0.30 0.65 0    0.55
9        | 10  3  3  2  2  2 |  2  1  2  1  2  1 | 41 | 0.35 0.75 0    0
10       |  2 11 11  2  2  2 |  1  3  4  1  2  1 | 41 | 0.30 0.10 0.55 0.55
11       | 10  3  3  2  2  2 |  3  1  3  1  2  1 | 41 | 0.35 0.10 0.60 0
12       |  2  3  3  2  2 10 |  1  1  3  1  2  4 | 41 | 0.35 0.10 0.15 0.50
13       |  2  3  3 10 10  2 |  1  1  3  3  4  1 | 39 | 0.35 0    0.65 0.50
14       | 10  3  3  2  2  2 |  4  1  3  1  2  1 | 41 | 0.35 0.10 0.15 0.50

The strategy with a conditional minimum with respect to CF is shown in Table 2 by schedules 1, 2, and 3 (Ai is the allocation of task Pi, i = 1, …, 6), and the scheduling diagrams are demonstrated in Fig. 2, a. The strategies that are conditionally maximal with respect to the criteria LL1, LL2, LL3, and LL4 are given in Table 2 by the cases 4-7; 8, 9; 10, 11; and 12-14, respectively. Since there are no conditional branches in the job graph (see Fig. 1), LLj is the ratio of the total time of usage of a processor of type j to the walltime WT of the job completion. The Pareto-optimal strategy involves all the schedules in Table 2. The schedules 2, 5, and 13 have resolved collisions between tasks P4 and P5. Let us assume that the load of the processors is such that the tasks P1, P2, and P3 can be assigned with no more than three units of time on the first and third processors (see Table 2). The metascheduler runs through the set of supporting schedules and chooses a concrete variant of resource distribution that depends on the actual load of processor nodes (a sketch of this selection step is given below).
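As an illustration (our own sketch, not the authors' metascheduler code), the selection under the load assumption just stated can be reproduced directly from Table 2: keep only the supporting schedules in which P1, P2 and P3 run on processors of type 1 or 3 for at most three time units.

# Supporting schedules copied from Table 2: per-task durations T and allocations A for P1..P6.
schedules = {
    1:  {"T": [2, 3, 3, 2, 2, 10],  "A": [1, 1, 3, 1, 2, 4]},
    2:  {"T": [2, 3, 3, 10, 10, 2], "A": [1, 1, 3, 3, 4, 1]},
    3:  {"T": [10, 3, 3, 2, 2, 2],  "A": [4, 1, 3, 1, 2, 1]},
    4:  {"T": [2, 3, 3, 2, 2, 10],  "A": [1, 1, 3, 1, 2, 1]},
    5:  {"T": [2, 3, 3, 10, 10, 2], "A": [1, 1, 3, 4, 1, 1]},
    6:  {"T": [2, 11, 11, 2, 2, 2], "A": [1, 4, 1, 1, 2, 1]},
    7:  {"T": [10, 3, 3, 2, 2, 2],  "A": [1, 1, 3, 1, 2, 1]},
    8:  {"T": [2, 11, 11, 2, 2, 2], "A": [1, 4, 2, 1, 2, 1]},
    9:  {"T": [10, 3, 3, 2, 2, 2],  "A": [2, 1, 2, 1, 2, 1]},
    10: {"T": [2, 11, 11, 2, 2, 2], "A": [1, 3, 4, 1, 2, 1]},
    11: {"T": [10, 3, 3, 2, 2, 2],  "A": [3, 1, 3, 1, 2, 1]},
    12: {"T": [2, 3, 3, 2, 2, 10],  "A": [1, 1, 3, 1, 2, 4]},
    13: {"T": [2, 3, 3, 10, 10, 2], "A": [1, 1, 3, 3, 4, 1]},
    14: {"T": [10, 3, 3, 2, 2, 2],  "A": [4, 1, 3, 1, 2, 1]},
}

def admissible(s, max_time=3, allowed_nodes=(1, 3), tasks=(0, 1, 2)):
    """P1, P2, P3 (indices 0..2) must run on processors of type 1 or 3 within 3 time units."""
    return all(s["T"][i] <= max_time and s["A"][i] in allowed_nodes for i in tasks)

chosen = [k for k, s in schedules.items() if admissible(s)]
print(chosen)   # -> [1, 2, 4, 5, 12, 13], matching the schedules named in the text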


Then, the metascheduler should choose the schedules 1, 2, 4, 5, 12, and 13 as possible variants of resource distribution. However, the concrete schedule should be formulated as a resource request and implemented by the batch processing system, subject to the state of all four processors and the possible run times of tasks P4, P5, and P6 (see Table 2). Suppose that we need to generate a Pareto-optimal strategy for the job graph G2 (see Fig. 1) in the whole range of the duration Ti of each task Pi, while the step of change is taken to be no less than the lower boundary of the range for the highest-performance processor. The Pareto relation is generated by the vector of criteria CF, LL1, …, LL4. The remaining initial conditions are the same as in the previous example. The strategies that are conditionally optimal with respect to the criteria CF, LL1, LL2, LL3, and LL4 are presented in Table 3 by the schedules 1-4, 5-12, 13-17, 18-25, and 26-33, respectively. The Pareto-optimal strategy does not include the schedules 2, 5, 12, 14, 16, 17, 22, and 30.

Table 3: The strategy of the type S2 (values are listed column by column for schedules 1-33).

Schedule: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
T1:  2 4 4 5 2 2 2 2 4 4 4 5 2 4 4 4 5 2 2 2 2 4 4 4 5 2 2 2 2 4 4 4 5
T2:  3 3 3 3 3 3 3 6 3 3 4 3 6 3 3 4 3 3 3 3 6 3 3 4 3 3 3 3 6 3 3 4 3
T3:  3 3 3 3 3 3 3 6 3 3 4 3 6 3 3 4 3 3 3 3 6 3 3 4 3 3 3 3 6 3 3 4 3
T4:  4 2 3 2 2 4 5 2 2 3 2 2 2 2 3 2 2 2 4 5 2 2 3 2 2 2 4 5 2 2 3 2 2
T5:  4 2 3 2 2 4 5 2 2 3 2 2 2 2 3 2 2 2 4 5 2 2 3 2 2 2 4 5 2 2 3 2 2
T6:  3 3 2 2 5 3 2 2 3 2 2 2 2 3 2 2 2 5 3 2 2 3 2 2 2 5 3 2 2 3 2 2 2
A1:  1 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
A2:  1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1
A3:  3 3 3 3 3 3 3 1 3 3 4 3 4 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 4 3 3 4 3
A4:  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1 1 1
A5:  4 2 3 2 2 4 4 2 2 3 2 2 2 2 2 2 2 2 3 3 2 2 3 2 2 2 4 4 2 2 3 2 2
A6:  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1
CF:  39 41 40 43 43 39 41 42 41 40 41 43 43 41 40 41 43 43 39 40 42 41 40 41 43 43 39 40 42 41 40 41 43
LL1: 0.40 0.40 0.40 0.35 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.30 0.40 0.40 0.40 0.35 0.35 0.40 0.35 0.30 0.40 0.40 0.40 0.35 0.35 0.40 0.35 0.30 0.40 0.40 0.40 0.35
LL2: 0.20 0.30 0.20 0.35 0.10 0 0 0.30 0.10 0 0.10 0.10 0.40 0.45 0.50 0.50 0.50 0.35 0.20 0.25 0.40 0.30 0.20 0.30 0.35 0.35 0.20 0.25 0.40 0.30 0.20 0.30 0.35
LL3: 0.15 0.15 0.30 0.15 0.15 0.15 0.15 0 0.15 0.30 0 0.15 0 0 0 0 0 0.15 0.35 0.40 0.30 0.15 0.30 0.20 0.15 0.15 0.15 0.15 0 0.15 0.30 0 0.15
LL4: 0.20 0 0 0 0 0.20 0.25 0 0 0 0.20 0 0.30 0 0 0 0 0 0 0 0 0 0 0 0 0 0.20 0.25 0.30 0 0 0.20 0

Let us now consider the generation of a strategy for the job represented structurally by the graph G3 in Fig. 1, obtained by summing the values of the parameters given in Table 1 for the tasks P2, P3 and P4, P5. As a result of the resource distribution for the model G3, the tasks P1, P23, P45, and P6 turn out to be assigned to one and the same processor of the first type. Consequently, the costs of the data exchanges D12, D36, and D78 can be excluded. Because in this case there can be no conflicts between the processing tasks (see Fig. 1), the scheduling obtained before the exclusion of the exchange procedures can be revised.


Table 4: The strategy of the type S3.

Schedule | T1 T23 T45 T6 | A1 A23 A45 A6 | CF | LL1 LL2 LL3 LL4
1        |  2   8   6  4 |  1   1   1  1 | 25 |  1   0   0   0
2        |  4   8   3  5 |  1   1   1  1 | 24 |  1   0   0   0
3        |  6   4   6  4 |  1   1   1  1 | 24 |  1   0   0   0
4        |  8   4   3  5 |  1   1   1  1 | 27 |  1   0   0   0
5        | 10   4   3  3 |  1   1   1  1 | 29 |  1   0   0   0
6        | 11   4   3  2 |  1   1   1  1 | 32 |  1   0   0   0

The results of the distribution of processors are presented in Table 4 (A23, A45 are the allocations and T23, T45 the run times for tasks P23 and P45). Schedules 1-6 in Table 4 correspond to the strategy that is conditionally minimal with respect to CF, with LL1 = 1. Consequently, there is no sense in generating conditionally maximal schedules with respect to the criteria LL1, …, LL4.

2.4 Coordinated Scheduling with the Critical Works Method

The critical works method was developed for application-level scheduling [19, 20]. However, it can be further refined to build multifactor and multicriteria strategies for job-flow distribution in virtual organizations. The method is based on dynamic programming and therefore uses some integral characteristics, for example the total resource usage cost for the tasks that compose the job. At the same time, the critical works method can be placed in the priority scheduling class. There is no conflict between these two facts, because the method is dedicated to task co-allocation for compound jobs. Let us consider a simple example. Fig. 3 represents two jobs with walltimes WT1 = 110 and WT2 = 140 that are submitted to a distributed environment with 8 CPUs. If the jobs are submitted one by one, the metascheduler (Section 3) will also schedule them one by one and will guarantee that every job is scheduled within the defined time interval and in the most efficient way in terms of the selected cost function, maximizing the average load balance of the CPUs on a single-job scale (Fig. 4). Job-flow execution will then finish at WT3 = 250. This is an example of application-level scheduling, and no integral job-flow characteristics are optimized in this case. To combine application-level scheduling and job-flow scheduling and to fully exploit the advantages of the proposed approach, one can submit both jobs simultaneously, or store them in a buffer and execute the scheduling for all jobs in the buffer after a certain amount of time (the buffer time). If the metascheduler gets more than one job to

schedule, it runs the developed mechanisms that optimize the whole job-flow (two jobs in this example). In that case the metascheduler will still try to find an optimal schedule for each single job, as described above, and, at the same time, it will try to find the job assignment for which the average load of the CPUs is maximized on a job-flow scale.
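A minimal sketch of the buffering policy described above is given below; it is our illustration, and the function schedule_batch stands in for the metascheduler's batch entry point, which is not specified at this level of detail in the paper.

import queue
import time

def collect_and_schedule(job_queue: "queue.Queue", schedule_batch, buffer_time: float = 60.0):
    """Accumulate jobs arriving on job_queue for `buffer_time` seconds, then pass the whole
    batch to the metascheduler so that job-flow level criteria (e.g. the average CPU load
    across all jobs) can be optimized instead of scheduling each job in isolation."""
    batch = []
    deadline = time.time() + buffer_time
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        try:
            batch.append(job_queue.get(timeout=remaining))
        except queue.Empty:
            break
    if batch:
        schedule_batch(batch)   # one job: application-level only; several jobs: job-flow optimization too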

Figure 3: Sample jobs.

Fig. 5 shows that both jobs are executed within WT4 = WT2 = 140, every data dependency is taken into account (e.g. for the second job, task P2 is executed only after tasks P0, P4, and P1 are ready), and the final schedule is chosen from the generated strategy with the lowest cost function. In our opinion, priority scheduling based on queues is not an efficient way of co-allocating multiprocessor jobs. Besides, there are several well-known side effects of this approach in cluster systems such as LL, NQE, LSF, PBS and others.


For example, the traditional First-Come-First-Served (FCFS) strategy leads to resources standing idle. Another strategy, which ranks jobs according to specific properties such as computational complexity, for example Least-Work-First (LWF), leads to severe resource fragmentation and often makes it impossible to execute some jobs because of the absence of free resources. In distributed environments these effects can lead to unpredictable job execution times and thereby to an unsatisfactory quality of service.

Figure 4: Consequential scheduling.

In order to avoid these effects, many projects include components that build schedules supported by preliminary resource reservation mechanisms [15, 16].

One example is the Maui cluster scheduler, in which a backfilling algorithm is implemented. A remote Grid resource reservation mechanism is also supported in the GARA, Ursala and Silver projects [16]. Here, only one variant of the final schedule is built, and it can become irrelevant because of changes in the local job queue, transport delays, etc. The strategy is a kind of preparation for possible activities in distributed computing, based on supporting schedules (see Fig. 2, Tables 2, 3 and 4) and on reactions to the events connected with resource assignment and advance reservations [15, 16]. The more factors, considered as formalized criteria, are taken into account in strategy generation, the more complete the strategy is in the sense of coverage of possible events [18, 19]. The choice of the supporting schedule [20] depends on the utilization state of processor nodes, the data storage and relocation policies specific to the environment, the structure of the jobs themselves and the user estimations of completion time and resource requirements. It is important to mention that users can submit jobs without information about the task execution order, as is required by existing schedulers like the Maui cluster scheduler, where only queues are supported. The mechanisms implemented in our approach support a complex structure for the job, which is represented as a directed graph, so the user only has to provide the data dependencies between tasks (i.e. the structure of the job). The metascheduler will generate the schedules to satisfy both the needs of the users, by providing optimal plans for jobs (application-level scheduling), and the needs of the resource owners, by optimizing the defined characteristics of the job-flow for the distributed system (job-flow scheduling).

3 METASCHEDULING FRAMEWORK

Figure 5: Job-flow scheduling.

In order to implement effective scheduling and allocation to heterogeneous resources, it is very important to group user jobs into flows according to the selected strategy type and to coordinate job-flow and application-level scheduling. A hierarchical structure (Fig. 6), composed of a job-flow metascheduler and subsidiary job managers that cooperate with local batch-job management systems, is the core of the scheduling framework proposed in this paper. It is assumed that the specific supporting schedule is realized, and the actual allocation of resources performed, by the batch job processing system. This schedule is implemented on the basis of a user resource request specifying the required types and characteristics of resources (memory and processors) and the system software, generated, for example, by the script of the job entry instruction qsub. Therefore, the formation and support of scheduling


strategies should be conducted by the metascheduler, an intermediary link between the job flow and the system of batch processing.
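For illustration only, a chosen supporting schedule could be turned into a qsub-style resource request along the following lines. The PBS options used here (-l nodes/ppn/walltime), the node count heuristic and the walltime value are assumptions of this sketch; the exact attributes depend on the local batch system, and this is not the framework's actual interface.

def resource_request(schedule_entry, walltime="00:20:00", script="job_script.sh"):
    """Build a PBS-style qsub command line from one supporting schedule entry.
    schedule_entry["A"] is the list of processor node types assigned to the tasks, as in Table 2."""
    node_count = len(set(schedule_entry["A"]))
    return "qsub -l nodes={0}:ppn=1,walltime={1} {2}".format(node_count, walltime, script)

# Supporting schedule 1 from Table 2 uses node types {1, 2, 3, 4}:
print(resource_request({"A": [1, 1, 3, 1, 2, 4]}))
# -> qsub -l nodes=4:ppn=1,walltime=00:20:00 job_script.sh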
Figure 6: Components of metascheduling framework (job-flows, metascheduler, job managers for strategies Si, Sj, Sk, computer nodes, and computer node domains).

The advantages of hierarchically organized resource managers are obvious; e.g., the hierarchical job-queue-control model is used in the GrADS metascheduler [13] and in the X-Com system [2]. A hierarchy of intermediate servers decreases the idle time of processor nodes that can be caused by transport delays or by unavailability of the managing server while it is dealing with other processor nodes. A tree-like manager structure in the network environment of distributed computing makes it possible to avoid deadlocks when accessing resources. Another important aspect of computing in heterogeneous environments is that processor nodes with similar architecture, content and administration policy are grouped together under the control of a job manager. Users submit jobs to the metascheduler (see Fig. 6), which distributes the job-flows between processor node domains according to the selected scheduling and resource co-allocation strategy Si, Sj or Sk. This does not mean that these flows cannot "intersect" each other on nodes: a special reallocation mechanism is provided, executed by the higher-level manager or at the metascheduler level. Job managers support and update the strategies, based on cooperation with local managers and on a simulation approach to job execution on processor nodes. The innovation of our approach consists in the mechanisms of dynamic job-flow reallocation in the environment based on scheduling strategies. The nature of distributed computational environments itself demands the development of multicriteria and multifactor strategies [17, 18] of coordinated scheduling and resource allocation. The dynamic configuration of the environment, the large number of resource reallocation events, the users'

and resource owner’s needs as well as virtual organization policy of resource assignment should be taken into account. The scheduling strategy is formed on a basis of formalized efficiency criteria, which sufficiently allow reflecting economical principles [14] of resource allocation by using relevant cost functions and solving the load balance problem for heterogeneous processor nodes. The strategy is built by using methods of dynamic programming [20] in a way that allows optimizing scheduling and resource allocation for a set of tasks, comprising the compound job. In contrast to previous works, we consider the scheduling strategy as a set of admissible supporting schedules (see Fig. 2, Tables 2 and 3). The choice of the specific variant depends on the load level of the resource dynamics and is formed as a resource query, which is sent to a local batch-job processing system. One of the important features of our approach is resource state forecasting for timely updates of the strategies. It allows implementing mechanisms of adaptive job-flow reallocation between processor nodes and domains, and also means that there is no more fixed task assignment on a particular processor node. While one part of the job can be sent for execution, the other tasks, comprising the job, can migrate to the other processor nodes according to the updated co-allocation strategy. The similar schedule correction procedure is also supported in the GrADS project [13], where multistage job control procedure is implemented: making initial schedule, its correction during the job execution, metascheduling for a set of applications. Downside of this approach is the fact, that it is based on the creation of a single schedule, so the metascheduler stops working when no additional resources are available and job-queue is then set to waiting mode. The possibility of strategy updates allows user, being integrated into economical conditions of virtual organization, to affect job start time by changing resource usage costs. In fact it means that the job-flow dispatching strategy is modified according to new priorities and this provides competitive functioning and dynamic jobflow balance in virtual organization with inseparable resources. 4 SIMULATIONS RESULTS STUDIES AND

4.1 Simulation System

We have implemented an original simulation environment (Fig. 7) of the metascheduling framework (see Fig. 6) to evaluate efficiency indices of different scheduling and co-allocation strategies. In contrast to well-known Grid simulation systems such as ChicSim [12] or OptorSim [23], our simulator MetaSim generates


multicriteria strategies as a number of supporting schedules for metascheduler reactions to the events connected with resource assignment and advance reservations. Strategies for more than 12000 jobs with a fixed completion time were studied. Every task of a job had randomized completion time estimations, computation volumes, data transfer times and volumes. These parameters differed between tasks by a factor of about 2 to 3. Processor nodes were selected in accordance with their relative performance. For the first group of "fast" nodes the relative performance was equal to 0.66, …, 1; for the second and the third groups it was 0.33, …, 0.66 and 0.33 ("slow" nodes), respectively. The number of nodes conformed to the job structure, i.e. the task parallelism degree, and varied from 20 to 30.

4.2 Types of Strategies

We have studied strategies of the following types: S1 – with fine-grain computations and an active data replication policy; S2 – with fine-grain computations and remote data access; S3 – with coarse-grain computations and static data storage; MS1 – with fine-grain computations, an active data replication policy, and best- and worst-case execution time estimations (a modification of the strategy S1). The strategy MS1 is less complete than the strategies S1 or S2 in the sense of coverage of events in the distributed environment (see Tables 2 and 3). However, the important point is the generation of a strategy by efficient and economical computational procedures of the metascheduler. The type S1 has

more computational expense than MS1, especially for simulation studies of integrated job-flow and application-level scheduling. Therefore, in some experiments with integrated scheduling we compared the strategies MS1, S2, and S3.

4.3 Application-Level Scheduling Study

We have conducted a statistical study of the critical works method for application-level scheduling with the above-mentioned types of strategies S1, S2, S3. The main goal of the study was to estimate the possibility of making application-level schedules with the critical works method without taking independent job-flows into account. For 12000 randomly generated jobs there were 38% admissible solutions for the S1 strategy, 37% for S2, and 33% for S3 (Fig. 8). This result is to be expected: the application-level schedules implemented by the critical works method were constructed for available resources not assigned to other, independent jobs. Along with this, there is a distribution of conflicts over processor nodes of different performance ("fast" nodes are 2-3 times faster than "slow" ones): 32% for "fast" nodes and 68% for "slow" ones in S1, 56% and 44% in S2, and 74% and 26% for S3 (Fig. 9). This may be explained as follows: the higher the degree of task distribution in an environment with an active data transfer policy, the lower the probability of a collision between tasks on a specific resource. In order to implement an effective scheduling and resource allocation policy in the virtual organization, we should coordinate the application and job-flow levels of scheduling.
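A minimal sketch of how test jobs with the parameters described in Section 4.1 could be generated is given below. This is our illustration of the experimental setup, not MetaSim code; the distributions, base values and field names are assumptions.

import random

FAST, MEDIUM, SLOW = (0.66, 1.0), (0.33, 0.66), (0.33, 0.33)   # relative node performance groups

def random_job(num_tasks=None, spread=3.0):
    """Generate one randomized test job: per-task time estimate, computation volume
    and data transfer volume differ between tasks by up to a factor of `spread`."""
    num_tasks = num_tasks or random.randint(20, 30)      # matches the 20-30 node range
    base_time, base_volume, base_data = 10.0, 20.0, 5.0
    return [{
        "time_estimate": base_time * random.uniform(1.0, spread),
        "volume":        base_volume * random.uniform(1.0, spread),
        "data_transfer": base_data * random.uniform(1.0, spread),
    } for _ in range(num_tasks)]

def random_node():
    group = random.choice([FAST, MEDIUM, SLOW])
    return {"relative_performance": random.uniform(*group)}

jobs = [random_job() for _ in range(12000)]    # the study used more than 12000 jobs
print(len(jobs), jobs[0][0])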

Figure 7: Simulation environment of hierarchical scheduling framework based on strategies.


Figure 8: Percentage of admissible application-level schedules.

Figure 9: Percentage of collisions for "fast" processor nodes in application-level scheduling.

4.4 Job-Flow and Application-Level Scheduling Study

For each simulation experiment, factors such as the job completion "cost", the task execution time, the scheduling forecast errors (start time estimation), the strategy time-to-live (the time interval over which the schedules remain acceptable in a dynamic environment), and the average load level were studied for the strategies S1, MS1, S2, and S3. Figure 10 shows the load level statistics of processor nodes of variable performance, which reveal the pattern of specific resource usage when the strategies S1, S2, and S3 are used with coordinated job-flow and application-level scheduling. The strategy S2 performs best in terms of load balancing for the different groups of processor nodes, while the strategy S1 tends to occupy the "slow" nodes and the strategy S3 the processors with the highest performance (see Fig. 10).

Figure 10: Processor node load level in strategies S1, S2, and S3 (average node load level, %, for the relative processor node performance groups 0.66-1, 0.33-0.66, and 0.33).

Factor quality analysis of the S2 and S3 strategies for the whole range of execution time estimations for the


selected processor nodes as well as modification MS1, when best- and worst-case execution time estimations were taken, is shown in Figures 11 and 12.

Figure 11: Job completion cost and task execution time in strategies MS1, S2, and S3.

The lowest-cost strategies are the "slowest" ones, like S3 (see Fig. 11); they are also the most persistent in terms of time-to-live (see Fig. 12).

Figure 12: Time-to-live and start time deviation in strategies MS1, S2, and S3.

The strategies of the type S3 try to monopolize the processor resources with the highest performance and to minimize data exchanges. At the same time, the least persistent are the "fastest", most expensive and most accurate strategies, like S2. Less accurate strategies like MS1 (see Fig. 12) provide a longer task completion time than more accurate ones like S2 (Fig. 11), which include more of the possible events associated with the processor node load level dynamics.

5 CONCLUSIONS AND FUTURE WORK

The related works in scheduling are devoted to either job-flow scheduling problems or application-level scheduling. The gist of the approach described here is that the resultant dispatching strategies are based on the integration of job-flow and application-level techniques. This allows the quality of service for the jobs and the resource usage efficiency of the distributed environment to be increased. Our results are promising, but we have to bear in mind that they are based on simplified computation scenarios; e.g., in our experiments we use the FCFS management policy in the local batch-job management systems. The above research results on strategy characteristics were obtained by simulation of the global job-flow in a virtual organization. The inseparability condition for the resources requires additional advanced research and a simulation approach for local job passing, as well as the development of methods for forecasting the load level of local processor nodes. Different job-queue management models and scheduling algorithms (FCFS modifications, LWF, backfilling, gang scheduling, etc.) can be used here. Along with this, local administration rules can be implemented. One of the most important aspects here is that advance reservations have an impact on the quality of service. Some studies (particularly one at Argonne National Laboratory) show that preliminary reservation nearly always increases the queue waiting time. Backfilling decreases this time. With the FCFS strategy the waiting time is shorter than with LWF; on the other hand, the estimation error for the start time forecast is larger with FCFS than with LWF. The backfilling implemented in the Maui cluster scheduler includes an advanced resource reservation mechanism and guarantees resource allocation; this leads to an increasing difference between the desired reservation time and the actual job start time when the local request flow grows. Some of the quality-of-service aspects and the job-flow load balance problem are associated with dynamic priority changes, when a virtual organization user changes the execution cost for a specific resource. All of these problems require further research.

ACKNOWLEDGEMENT. This work was supported by the Russian Foundation for Basic Research (grant no. 09-01-00095) and by the State Analytical Program "The higher school scientific potential development" (project no. 2.1.2/6718).

6 REFERENCES

[1] I. Foster, C. Kesselman, and S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations, Int. J. of High Performance Computing Applications, Vol. 15, No. 3, pp. 200 – 222 (2001)
[2] V.V. Voevodin: The Solution of Large Problems in Distributed Computational Media, Automation and Remote Control, Vol. 68, No. 5, pp. 32 – 45 (2007)
[3] D. Thain, T. Tannenbaum, and M. Livny: Distributed Computing in Practice: the Condor Experience, Concurrency and Computation: Practice and Experience, Vol. 17, No. 2-4, pp. 323 – 356 (2004)
[4] A. Roy and M. Livny: Condor and Preemptive Resume Scheduling, In: J. Nabrzyski, J.M. Schopf, and J. Weglarz (eds.): Grid resource management. State of the art and future trends, Kluwer Academic Publishers, pp. 135 – 144 (2003)
[5] V.V. Krzhizhanovskaya and V. Korkhov: Dynamic Load Balancing of Black-Box Applications with a Resource Selection Mechanism on Heterogeneous Resources of Grid, In: 9th International Conference on Parallel Computing Technologies, Springer, Heidelberg, LNCS, Vol. 4671, pp. 245 – 260 (2007)
[6] F. Berman: High-performance Schedulers, In: I. Foster and C. Kesselman (eds.): The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, pp. 279 – 309 (1999)
[7] Y. Yang, K. Raadt, and H. Casanova: Multiround Algorithms for Scheduling Divisible Loads, IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 8, pp. 1092 – 1102 (2005)
[8] A. Natrajan, M.A. Humphrey, and A.S. Grimshaw: Grid Resource Management in Legion, In: J. Nabrzyski, J.M. Schopf, and J. Weglarz (eds.): Grid resource management. State of the art and future trends, Kluwer Academic Publishers, pp. 145 – 160 (2003)
[9] J. Beiriger, W. Johnson, H. Bivens et al.: Constructing the ASCI Grid, In: 9th IEEE Symposium on High Performance Distributed Computing, IEEE Press, New York, pp. 193 – 200 (2000)
[10] J. Frey, I. Foster, M. Livny et al.: Condor-G: a Computation Management Agent for Multi-institutional Grids, In: 10th International Symposium on High-Performance Distributed Computing, IEEE Press, New York, pp. 55 – 66 (2001)
[11] D. Abramson, J. Giddy, and L. Kotler: High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?, In: International Parallel and Distributed Processing Symposium, IEEE Press, New York, pp. 520 – 528 (2000)
[12] K. Ranganathan and I. Foster: Decoupling Computation and Data Scheduling in Distributed Data-intensive Applications, In: 11th IEEE International Symposium on High Performance Distributed Computing, IEEE Press, New York, pp. 376 – 381 (2002)
[13] H. Dail, O. Sievert, F. Berman et al.: Scheduling in the Grid Application Development Software project, In: J. Nabrzyski, J.M. Schopf, and J. Weglarz (eds.): Grid resource management. State of the art and future trends, Kluwer Academic Publishers, pp. 73 – 98 (2003)
[14] R. Buyya, D. Abramson, J. Giddy et al.: Economic Models for Resource Management and Scheduling in Grid Computing, J. of Concurrency and Computation: Practice and Experience, Vol. 14, No. 5, pp. 1507 – 1542 (2002)
[15] K. Aida and H. Casanova: Scheduling Mixed-parallel Applications with Advance Reservations, In: 17th IEEE International Symposium on High-Performance Distributed Computing, IEEE Press, New York, pp. 65 – 74 (2008)
[16] D.B. Jackson: GRID Scheduling with Maui/Silver, In: J. Nabrzyski, J.M. Schopf, and J. Weglarz (eds.): Grid resource management. State of the art and future trends, Kluwer Academic Publishers, pp. 161 – 170 (2003)
[17] K. Kurowski, J. Nabrzyski, A. Oleksiak, and J. Weglarz: Multicriteria Aspects of Grid Resource Management, In: J. Nabrzyski, J.M. Schopf, and J. Weglarz (eds.): Grid resource management. State of the art and future trends, Kluwer Academic Publishers, pp. 271 – 293 (2003)
[18] V. Toporkov: Multicriteria Scheduling Strategies in Scalable Computing Systems, In: 9th International Conference on Parallel Computing Technologies, Springer, Heidelberg, LNCS, Vol. 4671, pp. 313 – 317 (2007)
[19] V.V. Toporkov and A.S. Tselishchev: Safety Strategies of Scheduling and Resource Co-allocation in Distributed Computing, In: 3rd International Conference on Dependability of Computer Systems, IEEE CS Press, pp. 152 – 159 (2008)
[20] V.V. Toporkov: Supporting Schedules of Resource Co-Allocation for Distributed Computing in Scalable Systems, Programming and Computer Software, Vol. 34, No. 3, pp. 160 – 172 (2008)
[21] M. Tang, B.S. Lee, X. Tang et al.: The Impact of Data Replication on Job Scheduling Performance in the Data Grid, Future Generation Computing Systems, Vol. 22, No. 3, pp. 254 – 268 (2006)
[22] N.N. Dang, S.B. Lim, and C.K. Yeo: Combination of Replication and Scheduling in Data Grids, Int. J. of Computer Science and Network Security, Vol. 7, No. 3, pp. 304 – 308 (2007)
[23] W.H. Bell, D.G. Cameron, L. Capozza et al.: OptorSim – A Grid Simulator for Studying Dynamic Data Replication Strategies, Int. J. of High Performance Computing Applications, Vol. 17, No. 4, pp. 403 – 416 (2003)


Least and greatest fixed points of a while semantics function
Fairouz Tchier Mathematics department, King Saud University P.O.Box 22452 Riyadh 11495, Saudi Arabia ftchier@hotmail.com May 1, 2009

Abstract
The meaning of a program is given by specifying the function (from input to output) that corresponds to the program. A denotational semantic definition thus maps syntactic objects to functions. A relational semantics is a mapping of programs to relations: we consider that the input-output semantics of a program is given by a relation on its set of states. In a nondeterministic context, this relation is calculated by considering the worst behavior of the program (demonic relational semantics). In this paper, we concentrate on while loops. We present some results about the fixed points of the while semantics function f(X) = Q ∨ P 2 X, where P< ∧ Q< = Ø; by taking P := t 2 B and Q := t∼, one gets the demonic semantics we have assigned to while loops in previous papers. We show that the least angelic fixed point is equal to the greatest demonic fixed point of this semantics function.

Keywords: Angelic fixed points, demonic fixed points, demonic functions, while loops, relational demonic semantics.

1 Relation Algebras

Both homogeneous and heterogeneous relation algebras are employed in computer science. In this paper, we use heterogeneous relation algebras whose definition is taken from [8, 27, 28].

(1) Definition. A relation algebra A is a structure (B, ∨, ∧, −, ◦, ˘) over a non-empty set B of elements, called relations. The unary operations −, ˘ are total whereas the binary operations ∨, ∧, ◦ are partial. We denote by B∨R the set of those elements Q ∈ B for which the union R ∨ Q is defined and we require that R ∈ B∨R for every R ∈ B. If Q ∈ B∨R, we say that Q has the same type as R. The following conditions are satisfied. (a) (B∨R, ∨, ∧, −) is a Boolean algebra, with zero element 0R and universal element 1R. The elements of B∨R are ordered by inclusion, denoted by ≤. (b) If the products P ◦ R and Q ◦ R are defined, so is P ◦ Q˘. If the products P ◦ Q and P ◦ R are defined, so is Q˘ ◦ R. If Q ◦ R exists, so does Q ◦ P for every P ∈ B∨R. (c) Composition is associative: P ◦ (Q ◦ R) = (P ◦ Q) ◦ R.


(d) There are elements idR and Rid associated to every relation R ∈ B: Rid behaves as a right identity and idR as a left identity for B∨R. (e) The Schröder rule P ◦ Q ≤ R ⇔ P˘ ◦ −R ≤ −Q ⇔ −R ◦ Q˘ ≤ −P holds whenever one of the three expressions is defined. (f) 1 ◦ R ◦ 1 = 1 iff R ≠ 0 (Tarski rule).

If R ∈ B∨R, then R is said to be homogeneous. If all R ∈ A have the same type, the operations are all total and A itself is said to be homogeneous. For simplicity, the universal, zero, and identity elements are all denoted by 1, 0, id, respectively. Another operation that occurs in this article is the reflexive transitive closure R∗. It satisfies the well-known laws

R∗ = ∨i≥0 Ri and R∗ = id ∨ R ◦ R∗ = id ∨ R∗ ◦ R,

where R0 = id and Ri+1 = R ◦ Ri. From Definition 1, the usual rules of the calculus of relations can be derived (see, e.g., [8, 10, 28]). The notion of Galois connection is very important in what follows; there are many definitions of Galois connections, and we choose the following one [2].

(2) Definition. Let (S, ≤S) and (S′, ≤S′) be two preordered sets. A pair (f, g) of functions, where f : S → S′ and g : S′ → S, forms a Galois connection iff the following formula holds for all x ∈ S and y ∈ S′: f(x) ≤S′ y ⇔ x ≤S g(y). The function f is called the lower adjoint and g the upper adjoint.

2 Monotypes and Related Operators

In the calculus of relations, there are two ways of viewing sets as relations; each of them has its own advantages. The first is via vectors: a relation x is a vector [28] iff x = x ◦ 1. The second way is via monotypes [2]: a relation a is a monotype iff a ≤ id. The set of monotypes {a | a ∈ B∨R}, for a given R, is a complete Boolean lattice. We denote by a∼ the monotype complement of a. The domain and codomain of a relation R can be characterized by the vectors R ◦ 1 and R˘ ◦ 1, respectively [15, 28]. They can also be characterized by the corresponding monotypes; in this paper, we take the latter approach. In what follows we formally define these operators and give some of their properties.

(3) Definition. The domain and codomain operators of a relation R, denoted respectively by R< and R>, are the monotypes defined by the equations (a) R< = id ∧ R ◦ 1, (b) R> = id ∧ 1 ◦ R.

These operators can also be characterized by Galois connections (see [2]): for each relation R and each monotype a, R< ≤ a ⇔ R ≤ a ◦ 1 and R> ≤ a ⇔ R ≤ 1 ◦ a. The domain and codomain operators are linked by the equation R> = (R˘)<, as is easily checked.

(4) Definition. Let R be a relation and a be a monotype. The monotype right residual and monotype left residual of a by R (called factors in [5]) are defined respectively by (a) a/R := ((1 ◦ a)/R)>, (b) R\a := (R\(a 2 1))<.

An alternative characterization of residuals can also be given by means of a Galois connection as follows [1]: b ≤ a/R ⇔ (b 2 R)> ≤ a and b ≤ R\a ⇔ (R ◦ b)< ≤ a.
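Because several operator symbols in this section suffered in the text extraction, a small executable illustration may help. The Python sketch below models finite relations as sets of pairs and checks, on random examples, the Schröder equivalences of Definition 1(e) (stated with the standard converse, which appears to have been dropped from the printed formula) and the Galois characterization of the domain operator from Definition 3. It is an illustration of these standard laws, not part of the paper.

```python
from itertools import product
import random

U = range(3)                         # a small common carrier set
ALL = set(product(U, U))             # the universal relation 1

def comp(p, q):  return {(x, z) for (x, y) in p for (y2, z) in q if y == y2}
def conv(r):     return {(y, x) for (x, y) in r}
def neg(r):      return ALL - r
def dom(r):      return {(x, x) for (x, _) in r}     # the monotype R<

random.seed(1)
rand_rel = lambda: {pr for pr in ALL if random.random() < 0.4}

for _ in range(200):
    P, Q, R = rand_rel(), rand_rel(), rand_rel()
    # Schröder rule: the three containments are equivalent
    a = comp(P, Q) <= R
    b = comp(conv(P), neg(R)) <= neg(Q)
    c = comp(neg(R), conv(Q)) <= neg(P)
    assert a == b == c
    # Galois connection for the domain: R< <= a  iff  R <= a◦1
    monotype = {(x, x) for x in U if random.random() < 0.5}
    assert (dom(R) <= monotype) == (R <= comp(monotype, ALL))
print("Schröder and domain laws hold on all sampled relations")
```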


We make extensive use of the complement of the domain of a relation R, i.e. the monotype a such that a = (R<)∼; to avoid the notation (R<)∼, we adopt the notation R := R<∼. Because we assume our relation algebra to be complete, least and greatest fixed points of monotonic functions exist. We cite [12] as a general reference on fixed points. Let f be a monotonic function. The following properties of fixed points are used below:

(a) µf = ∧{X | f(X) = X} = ∧{X | f(X) ≤ X},
(b) νf = ∨{X | f(X) = X} = ∨{X | X ≤ f(X)},
(c) µf ≤ νf,
(d) f(Y) ≤ Y ⇒ µf ≤ Y,
(e) Y ≤ f(Y) ⇒ Y ≤ νf.

In what follows, we describe notions that are useful for characterizing the set of initial states of a program for which termination is guaranteed: progressive finiteness and the initial part of a relation. A relation R is progressively finite, in terms of points, iff there are no infinite chains s0, s1, ... such that si R si+1 for all i ≥ 0; in other words, there is no non-empty set of points y consisting of starting points of paths of infinite length, i.e. for every point set y, y ≤ R ◦ y ⇒ y = 0. The least set of points from which only paths of finite length start (i.e. from which we can proceed only finitely many steps) is called the initial part of R, denoted by I(R). This topic is of interest in many areas of computer science and mathematics, and is related to recursion and the induction principle.

(5) Definition. (a) The initial part of a relation R, denoted I(R), is given by I(R) = ∧{a | a ≤ id : a/R = a} = ∧{a | a ≤ id : a/R ≤ a} = µ(a : a ≤ id : a/R), where a is a monotype. (b) A relation R is said to be progressively finite [28] iff I(R) = id.

The description of I(R) by the formulation a/R = a shows that I(R) exists: the function (a : a ≤ id : a/R) is monotonic in its argument and the set of monotypes is a complete lattice, so it follows from the fixed point theorem of Knaster and Tarski that this function has a least fixed point. Progressive finiteness of a relation R is the same as well-foundedness of R˘. Then, I(R) is a monotype. In a concrete setting, I(R) is the monotype of the states that are not the origins of infinite paths (by R). A relation R is progressively finite iff, for a monotype a, a ≤ (R ◦ a)< ⇒ a = 0, equivalently ν(a : a ≤ id : (R ◦ a)<) = 0, equivalently µ(a : a ≤ id : a/R) = id.

The next theorem involves the function wa(X) := Q ∨ P ◦ X, which is closely related to the description of iterations. The theorem highlights the importance of progressive finiteness in the simplification of fixed-point-related properties.

(6) Theorem. Let f(X) := Q ∨ P ◦ X. If P is progressively finite, then f has a unique fixed point, that is, ν(f) = µ(f) = P∗ ◦ Q [1].

As the demonic calculus will serve as an algebraic apparatus for defining the denotational semantics of nondeterministic programs, we define these operators in what follows.
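As a concrete illustration of Theorem 6, the sketch below computes the least fixed point of f(X) = Q ∨ P ◦ X by Kleene iteration on finite relations (represented as Python sets of pairs) and checks that it coincides with P∗ ◦ Q for a progressively finite P. The example data are hypothetical; the code is only an executable reading of the theorem, not part of the paper.

```python
def comp(r, s):
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def lfp(f):
    """Least fixed point by Kleene iteration from the empty relation
    (sufficient here because the state space is finite)."""
    x = set()
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Hypothetical example: P is progressively finite on {0,1,2,3}, Q is the exit step.
states = {0, 1, 2, 3}
P = {(0, 1), (1, 2), (2, 3)}          # no infinite chains
Q = {(3, 3)}
f = lambda X: Q | comp(P, X)
mu = lfp(f)

# Reflexive-transitive closure P*, computed iteratively, then compared with mu.
star = {(x, x) for x in states}
frontier = set(star)
while frontier:
    frontier = comp(frontier, P) - star
    star |= frontier
assert mu == comp(star, Q)            # Theorem 6: the unique fixed point is P*◦Q
print(sorted(mu))                      # [(0, 3), (1, 3), (2, 3), (3, 3)]
```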

3 Demonic refinement ordering

We now define the refinement ordering (demonic inclusion) we will be using in the sequel. This ordering induces a complete join semilattice, called a demonic semilattice. The associated operations are demonic join ( ), demonic meet ( ) and demonic composition ( 2 ). We give the definitions and needed properties of these operations, and illustrate them with simple examples. For more details on relational demonic semantics and demonic operators, see [5, 8, 6, 7, 14]. (7) Definition. We say that a relation Q refines a relation R [23], denoted by Q R, iff R< ◦ Q ≤ < < R and R ≤ Q . (8) Proposition. Let Q and R be relations, then (a) The greatest lower (wrt ) of Q and R is, Q R = Q< ◦ R< ◦ (Q ∨ R), If Q< = R< then we have i.e Q R = Q ∨ R. 3 and ∨ coincide
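Definition 7 has a direct set-level reading for finite relations, sketched below: Q refines R when, on the domain of R, every transition of Q is allowed by R, and Q is defined at least wherever R is. The relations used are hypothetical examples, and the code is only an illustration of the definition, not the paper's formal development.

```python
def dom(r):
    """Domain of a finite relation, read as a set of states (the monotype R<)."""
    return {x for (x, _) in r}

def refines(q, r):
    """Q refines R (Definition 7): restricted to dom(R), Q stays inside R,
    and Q terminates at least wherever R does (dom(R) <= dom(Q))."""
    restricted = {(x, y) for (x, y) in q if x in dom(r)}
    return restricted <= r and dom(r) <= dom(q)

# Hypothetical example: R allows two outcomes from state 0; Q resolves the choice
# and is defined on an extra state, so Q refines R but not conversely.
R = {(0, 1), (0, 2), (1, 1)}
Q = {(0, 1), (1, 1), (2, 2)}
print(refines(Q, R), refines(R, Q))   # True False
```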


(b) If Q and R satisfy the condition Q< ∧ R< = (Q ∧ R)< , their least upper bound is Q R = Q ∧ R ∨ Q ◦ R ∨ R ◦ Q, otherwise, the least upper bound does not exist. If Q< ∧ R< = 0 then we have and ∧ coincide i.e Q R = Q ∧ R. For the proofs see [9, 14]. (9) Definition. The demonic composition of rela• tions Q and R [5] is Q 2 R = (R< /Q) ◦ Q ◦ R. In what follows we present some properties of (10) Theorem. (a) (P
2 2

• (a) S(R) = I(P ) ◦ [(P ∨ Q)< /P ∗ ] ◦ P ∗ ◦ Q., with the restriction

(b) P < ∧ Q< = 0 Our goal is to show that the operational semantics a is equal to the denotational one which is given as the greatest fixed point of the semantic function Q ∨ P 2 X in the demonic semilattice. In other words, we have to prove the next equation: (a) S(R) = {X|X Q∨P
2

.

X};

Q) 2 R = P

2

(Q 2 R),

(b) R total ⇒ Q 2 R = Q ◦ R, (c) Q function ⇒ Q 2 R = Q ◦ R. See [5, 6, 7, 14, 35]. Monotypes have very simple and convenient properties. Some of them are presented in the following proposition. (11) Proposition. Let a and b be monotypes. We have (a) a = a = a2 ,

by taking P := t 2 B and Q := t∼ , one gets the demonic semantics we have assigned to while loops in previous papers [14, 35]. Other similar definitions of while loops can be found in [19, 25, 29]. Let us introduce the following abbreviations: (12) Abbreviation. Let P , Q and X be relations subject to the restriction P < ∧ Q< = 0 (b) and x a monotype. The Abbreviations wd , wa , w< , a and l are defined as follows: wd (X) := Q ∨ P 2 X, • a := (P ∨ Q)< /P ∗ , wa (X) := Q ∨ P ◦ X, l := I(P ). w< (x) := Q< ∨ (P 2 x)< = Q ∨ (P 2 x)< (Mnemonics: the subscripts a and d stand for angelic and demonic, respectively; the subscript < refers to the fact that w< is obtained from wd by composition with <; the monotype a stands for abnormal, since it represents states from which abnormal termination is not possible; finally, l stands for loop, since it represents states from which no infinite loop is possible.) In what follows we will be concerned about the fixed point of wa , w< and wd . (13) Theorem. Every fixed point Y of wa (Abbreviation 12) verifies P ∗ ◦ Q ≤ Y ≤ P ∗ ◦ Q ∨ l∼ 2 1, and the bounds are tight (i.e. the extremal values are fixed points). The next lemma investigates the relationship between fixed points of w< and those of wd (cf. Abbreviation 12). 4

(b) a 2 b = a ∧ b = b 2 a, (c) a ∨ a∼ = id and a ∧ a∼ = 0, (d) a ≤ b ⇔ b∼ ≤ a∼, (e) a∼ 2 b∼ = (a ∨ b)∼, (f) (a ∧ b)∼ = (a 2 b)∼ = a∼ ∨ b∼, (g) a 2 b∼ ∨ b = a ∨ b, (h) a ≤ b ⇔ a 2 1 ≤ b 2 1.

In previous papers [14, 13, 31, 35], we found the semantics of the while loop given by the graph below.

[Figure: while-loop transition graph — a state e with a loop edge labelled P and an exit edge labelled Q to a final state s.]
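The monotype identities of Proposition 11 are easy to check concretely. In the sketch below, monotypes are subsets of the identity relation on a small hypothetical state space, and the demonic composition of monotypes is computed as ordinary composition (legitimate here by Theorem 10(c), since monotypes are functions). This is an illustration only.

```python
# Monotypes over a small hypothetical state space, represented as sub-identities.
S = {0, 1, 2}
idr = {(x, x) for x in S}

def comp(p, q): return {(x, z) for (x, y) in p for (y2, z) in q if y == y2}
def cmpl(a):    return idr - a                    # monotype complement a~

a = {(0, 0), (1, 1)}
b = {(1, 1), (2, 2)}

assert comp(a, b) == (a & b) == comp(b, a)               # Proposition 11(b)
assert (a | cmpl(a)) == idr and (a & cmpl(a)) == set()   # Proposition 11(c)
assert cmpl(a & b) == cmpl(a) | cmpl(b)                  # Proposition 11(f)
print("monotype identities hold on the example")
```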


(14) Lemma. Let h(X) := (P ∨ Q) ∨ (P ◦ X)< and h1 (x) := (P ∨ Q) 2 1 ∨ P ◦ x. (a) Y = wd (Y ) ⇒ w< (Y < ) = Y < , (b) w< (Y < ) = Y < ⇒ h(Y ) = Y , (c) h(Y ) = Y ⇒ h1 (Y
2

4

Application

1) = Y

2

1,

(15) Lemma. Let Y be a fixed point of wd and b be a fixed point of w< (Abbreviation 12). The relation b 2 Y is a fixed point of wd . (16) Lemma. If Y and Y are two fixed points of wd (Abbreviation 12) such that Y < = Y < and Y < ◦P is progressively finite, then Y = Y . The next theorem characterizes the domain of the greatest fixed point, wrt , of function wd . This domain is the set of points for which normal termination is guaranteed (no possibility of abnormal termination or infinite loop). (17) Theorem. Let W be the greatest fixed point, wrt to , of wd (Abbreviation 12). We have W < = a 2 l. The following theorem is a generalization to a nondeterministic context of the while statement verification rule of Mills [24]. It shows that the greatest fixed point W of wd is uniquely characterized by conditions (a) and (b), that is, by the fact that W is a fixed point of wd and by the fact that no infinite loop is possible when the execution is started in a state that belongs to the domain of W . Note that we also have W < ≤ a (see Theorem 17), but this condition is implicitly enforced by condition (a). Half of this theorem (the ⇐ direction) is also proved by Sekerinski (the main iteration theorem [29]) in a predicative programming set-up. (18) Theorem. A relation W is the greatest fixed point, wrt , of function wd (Abbreviation 12), iff the following two conditions hold: (a) W = wd (W ), (b) W < ≤ l. In what follows we give some applications of our results. 5

In [6, 7], Berghammer and Schmidt propose abstract relation algebra as a practical means for the specification of data types and programs. Often, in these specifications, a relation is characterized as a fixed point of some function. Can demonic operators be used in the definition of such a function? Let us now show with a simple example that the concepts presented in this paper give useful insights for answering this question. In [6, 7], it is shown that the natural numbers can be characterized by the relations z and S (zero and successeur ) the laws (a) Ø = z = zL ∧ zz ⊆ I (z is a point), SS = I ∧ S S ⊆ I (S is a one to one application.), Sz = Ø (z has a predecessor), = L = {x|z ∪ S x x} (generation principle). For the rest of this section, assume that we are given a relation algebra satisfying these laws. In this algebra, because of the last axiom, the inequation (a) z ∪ S X ⊆ X obviously has a unique solution for X, namely, X = L. Because the functiong(X) := z ∪ S X is ∪continuous, this solution can be expressed as (a) L =
n≥0

g n (Ø) =

n≥0

S

n

z,

where g 0 (Ø) = Ø, g n+1 (Ø) = g(g n (Ø)), S 0 = I and S n+1 = S S n . However, it is shown in [6, 7] that z S 2 X ⊆ X, obtained by replacing the join and composition operators in a by their demonic counterparts, has infinitely many solutions. Indeed, from Sz = Ø and the Schr¨der rule, it follows that o (a) z ∩ S L = Ø, so that, by definition of demonic join (8(a)) and demonic composition (9), z S 2 X = (z ∪ S 2 X) ∩ z ∩ (S 2 X)L ⊆ z ∩ S L = Ø. Hence, any relation R is a solution to z S 2 X ⊆ X. Looking at previous papers [14, 32, 33, 34, 31], one


immediately sees why it is impossible to reach L by joining anything to z (which is a point and hence is an immediate predecessor of Ø), since this can only lead to z or to Ø. Let us now go ‘fully demonic’ and ask what is a solution to z S 2 X X. By the discussion above, this is equivalent to Ø X, which has a unique solution, X = Ø. This raises the question whether it is possible to find some fully demonic inequation similar to (a), whose solution is X = L. Because L is in the middle of the demonic semilattice, there are in fact two possibilities: either approach L from above or from below. For the approach from above, consider the inequation X z S
2

how the universal relationL arises as the greatest lower bound n≥0 S n 2 z of this set of points. Note that, whereas there is a unique solution to a, there are infinitely many solutions to 4 (equivalently, to a), for example n≥k S n (= n≥k S n ), for any k. For the upward approach, consider z X 2S X.

X.

Using Theorem 10(c), we have z S 2X = z S X, since S is deterministic (axiom a(b)). From a, z ⊆ S L; this implies z ⊆ S XL and S X ⊆ z, so that, by definition of , z S X = z ∩ S X ∪ z ∩ S XL ∪ z ∩ S X = z ∪ S X. This means that 4 reduces to (a) X z ∪ S X.

By definition of refinement (7), this implies that z ∪ S XL ⊆ XL; this is a variant of (a), thus having XL = L as only solution. This means that any solution to 4 must be a total relation. But L is total and in fact is the largest (by ) total relation. It is also a solution to 4 (since by axiom a(d), z ∪ S L = L) so that L = {X|X z S 2 X}; that is, L is the greatest fixed point in (BL , ) of n2 f (X) := z S 2 X. Now consider z, n≥0 S n where S is a n-fold demonic composition defined by S 0 = I and S n+1 = S 2 S n . By axiom a(b), S is deterministic, so that, by 10(c) and associativity of demonic composition, conS n 2 z = S n z. Hence, It is easy to show that for any n ≥ 0, S n z is a point (it is the n-th successor of zero) and that m = n ⇒ S m z = S n z. Hence, in (BL , ), {S n z|n ≥ 0} (i.e. {S n 2 z|n ≥ 0}) is the set of immediate predecessors of Ø; looking at [31] shows 6

Here also there are infinitely many solutions to this inequation; in particular, any vector v, including Ø and L, is a solution to 4. Because (BL , ) is only a join semilattice, it is not at all obvious that the least fixed point of h(X) := z X 2 S exists. It does, however, since the following derivan tion shows that n≥0 z 2 S n (= n≥0 h (z ), 0 where h (z ) = z ) is a fixed point of h and hence is obviously the least solution of 4: Because z and S are mappings, property 10(c) implies that z 2 S n = z S n , for any n ≥ 0. But z S n is also a mapping (it is the inverse of the point S n z) and hence is total, from which, by Proposition 8(a) n and equation a, n≥0 z 2 S n = = n≥0 z S n n z S = ( n≥0 S z)˘ = L = L. This n≥0 means that L is the least upper bound of the set of mappings {z 2 S n |n ≥ 0}. Again, a look at [31] gives some intuition to understand this result, after recalling that mappings are minimal elements in (BL , ) (though not all mappings have the form z 2 S n ). Thus, building L from below using the set of mappings {z 2 S n |n ≥ 0} is symmetric to building it from above using the set of points {S n 2 z|n ≥ 0}.

5 Conclusion

We presented a theorem that can be also used to find the fixed points of functions of the form f (X) := Q ∨ P 2 X (no restriction on the domains of P and Q). This theorem can be applied also to the program verification and construction (as in the precedent example). Half of this theorem (the ⇐ direction) is also proved by Sekerinski (the main iteration theorem [29]) in a predicative programming set-up. Our theorem is more general because there is no restriction on the domains of the relations P and Q.


The approach to demonic input-output relation presented here is not the only possible one. In [19, 20, 21], the infinite looping has been treated by adding to the state space a fictitious state ⊥ to denote nontermination. In [8, 18, 22, 26], the demonic input-output relation is given as a pair (relation,set). The relation describes the input-output behavior of the program, whereas the set component represents the domain of guaranteed termination. We note that the preponderant formalism employed until now for the description of demonic input-output relation is the wp-calculus. For more details see [3, 4, 17].

[6] Berghammer, R.: Relational Specification of Data Types and Programs. Technical report 9109, Fakult¨t f¨r Informatik, Universit¨t der a u a Bundeswehr M¨nchen, Germany, Sept. 1991. u [7] Berghammer, R. and Schmidt, G.: Relational Specifications. In C. Rauszer, editor, Algebraic Logic, 28 of Banach Center Publications. Polish Academy of Sciences, 1993. [8] Berghammer, R. and Zierer, H.: Relational Algebraic Semantics of Deterministic and Nondeterministic Programs. Theoretical Comput. Sci., 43, 123–147 (1986). [9] Boudriga, N., Elloumi, F. and Mili, A.: On the Lattice of Specifications: Applications to a Specification Methodology. Formal Aspects of Computing, 4, 544–571 (1992). [10] Chin, L. H. and Tarski, A.: Distributive and Modular Laws in the Arithmetic of Relation Algebras. University of California Publications, 1, 341–384 (1951). [11] Conway, J. H.: Regular Algebra and Finite Machines. Chapman and Hall, London, 1971. [12] Davey, B. A. and Priestley, H. A.: Introduction to Lattices and Order. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge, 1990. [13] J. Desharnais, B. M¨ller, and F. Tchier. Kleene o under a demonic star. 8th International Conference on Algebraic Methodology And Software Technology (AMAST 2000), May 2000, Iowa City, Iowa, USA, Lecture Notes in Computer Science, Vol. 1816, pages 355–370, SpringerVerlag, 2000. [14] Desharnais, J., Belkhiter, N., Ben Mohamed Sghaier, S., Tchier, F., Jaoua, A., Mili, A. and Zaguia, N.: Embedding a Demonic Semilattice in a Relation Algebra. Theoretical Computer Science, 149(2):333–360, 1995. 7

References
[1] Backhouse, R. C., and Doombos, H.: Mathematical Induction Made Calculational. Computing science note 94/16, Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands, 1994. [2] Backhouse, R. C., Hoogendijk, P., Voermans, E. and van der Woude, J.:. A Relational Theory of Datatypes. Research report, Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands, 1992. [3] R. J. R. Back. : On the correctness of refinement in program development. Thesis, Department of Computer Science, University of Helsinki, 1978. [4] R. J. R. Back and J. von Wright.: Combining angels, demons and miracles in program specifications. Theoretical Computer Science,100, 1992, 365–383. [5] Backhouse, R. C. and van der Woude, J.: Demonic Operators and Monotype Factors. Mathematical Structures in Comput. Sci., 3(4), 417– 433, Dec. (1993). Also: Computing Science Note 92/11, Department of Mathematics and Computer Science, Eindhoven University of Technology, The Netherlands, 1992.


[15] Desharnais, J., Jaoua, A., Mili, F., Boudriga, N. and Mili, A.: A Relational Division Operator: The Conjugate Kernel. Theoretical Comput. Sci., 114, 247–272 (1993). [16] Dilworth, R. P.: Non-commutative Residuated Lattices. Trans. Amer. Math. Sci., 46, 426–444 (1939). [17] E. W. Dijkstra. : A Discipline of Programming. Prentice-Hall, Englewood Cliffs, N.J., 1976. [18] H. Doornbos. : A relational model of programs without the restriction to Egli-Milner monotone constructs. IFIP Transactions, A-56:363–382. North-Holland, 1994. [19] C. A. R. Hoare and J. He. : The weakest prespecification. Fundamenta Informaticae IX, 1986, Part I: 51–84, 1986. [20] C. A. R. Hoare and J. He. : The weakest prespecification. Fundamenta Informaticae IX, 1986, Part II: 217–252, 1986. [21] C. A. R. Hoare and al. : Laws of programming. Communications of the ACM, 30:672–686, 1986. [22] R. D. Maddux. : Relation-algebraic semantics. Theoretical Computer Science, 160:1–85, 1996. [23] Mili, A., Desharnais, J. and Mili, F.: Relational Heuristics for the Design of Deterministic Programs. Acta Inf., 24(3), 239–276 (1987). [24] Mills, H. D., Basili, V. R., Gannon, J. D. and Hamlet,R. G.: Principles of Computer Programming. A Mathematical Approach. Allyn and Bacon, Inc., 1987. [25] Nguyen, T. T.: A Relational Model of Demonic Nondeterministic Programs. Int. J. Foundations Comput. Sci., 2(2), 101–131 (1991). [26] D. L. Parnas. A Generalized Control Structure and its Formal Definition. Communications of the ACM, 26:572–581, 1983 [27] Schmidt, G.: Programs as Partial Graphs I: Flow Equivalence and Correctness. Theoretical Comput. Sci., 15, 1–25 (1981). 8

[28] Schmidt, G. and Str¨hlein, T.: Relations and o Graphs. EATCS Monographs in Computer Science. Springer-Verlag, Berlin, 1993. [29] Sekerinski, E.: A Calculus for Predicative Programming. In R. S. Bird, C. C. Morgan, and J. C. P. Woodcock, editors, Second International Conference on the Mathematics of Program Construction, volume 669 of Lecture Notes in Comput. Sci. Springer-Verlag, 1993. [30] Tarski, A.: On the calculus of relations. J. Symb. Log. 6, 3, 1941, 73–89. [31] F. Tchier.: S´mantiques relationnelles e d´moniaques et v´rification de boucles non e e d´terministes. Theses of doctorat, D´partement e e de Math´matiques et de statistique, Universit´ e e Laval, Canada, 1996. [32] F. Tchier.: Demonic semantics by monotypes. International Arab conference on Information Technology (Acit2002),University of Qatar, Qatar, 16-19 December 2002. [33] F. Tchier.: Demonic relational semantics of compound diagrams. In: Jules Desharnais, Marc Frappier and Wendy MacCaull, editors. Relational Methods in computer Science: The Qu´bec seminar, pages 117-140, Methods Pube lishers 2002. [34] F. Tchier.: While loop d demonic relational semantics monotype/residual style. 2003 International Conference on Software Engineering Research and Practice (SERP03), Las Vegas, Nevada, USA, 23-26, June 2003. [35] F. Tchier.: Demonic Semantics: using monotypes and residuals. IJMMS 2004:3 (2004) 135160. (International Journal of Mathematics and Mathematical Sciences) [36] M. Walicki and S. Medal.: Algebraic approches to nondeterminism: An overview. ACM computong Surveys,29(1), 1997, 30-81. [37] L.Xu, M. Takeichi and H. Iwasaki.: Relational semantics for locally nondeterministic


programs. New Generation Computing 15, 1997, 339-362.


CASE STUDIES IN THIN CLIENT ACCEPTANCE
Paul Doyle, Mark Deegan, David Markey, Rose Tinabo, Bossi Masamila, David Tracey School of Computing, Dublin Institute of Technology, Ireland WiSAR Lab, Letterkenny Institute of Technology {paul.doyle, mark.deegan, david.markey}@dit.ie,{rose.tinabo, bossi.masamila}@student.dit.ie david.tracey@lyit.ie

ABSTRACT Thin Client technology boasts an impressive range of financial, technical and administrative benefits. Combined with virtualisation technology, higher bandwidth availability and cheaper high performance processors, many believe that Thin Clients have come of age. But despite a growing body of literature documenting successful Thin Client deployments there remains an undercurrent of concern regarding user acceptance of this technology and a belief that greater efforts are required to understand how to integrate Thin Clients into existing, predominantly PC-based, deployments. It would be more accurate to state that the challenge facing the acceptance of Thin Clients is a combination of architectural design and integration strategy rather than a purely technical issue. Careful selection of services to be offered over Thin Clients is essential to their acceptance. Through an evolution of three case studies the user acceptance issues were reviewed and resolved resulting in a 92% acceptance rate of the final Thin Client deployment. No significant bias was evident in our comparison of user attitudes towards desktop services delivered over PCs and Thin Clients. Keywords: Thin Clients, Acceptance, Virtualisation, RDP, Terminal Services.

1 INTRODUCTION

It is generally accepted that in 1993 Tim Negris coined the phrase “Thin Client” in response to Larry Ellison’s request to differentiate the server centric model of Oracle from the desktop centric model prevalent at the time. Since then the technology has evolved from a concept to a reality with the introduction of a variety of hardware devices, network protocols and server centric virtualised environments. The Thin Client model offers users the ability to access centralised resources using full graphical desktops from remotely located, low cost, stateless devices. While there is sufficient literature in support of Thin Clients and their deployment, the strategies employed are not often well documented. To demonstrate the critical importance of how Thin Clients perform in relation to user acceptance we present a series of case studies highlighting key points to be addressed in order to ensure a successful deployment. 1.1 Research Aim The aim of this research has been to identify a successful strategy for Thin Client acceptance within an educational institute. There is sufficient literature which discusses the benefits of Thin Client adoption, and while this was referenced it was not central to the aims of this research as the barrier to obtaining these benefits was seen to be acceptance of the

technology. Over a four-year period, three Thin Client case studies were run within the Dublin Institute of Technology with the explicit aim of determining the success factors in obtaining user satisfaction. The following data criteria were used to evaluate each case study, in addition to referencing the Unified Theory of Acceptance and Use of Technology (UTAUT) [1]: 1) Login events on the Thin Clients. 2) Reservation of the Thin Client facility. 3) The cost of maintaining the service. 1.2 Paper Structure In section 2 we review the historical background and trends of Thin Client technology to provide an understanding of what the technology entails. Section 3 discusses the case for Thin Clients within existing literature, including a review of deployments within industry and other educational institutes. Section 4 provides details of the three case studies, discussing their design, evaluating the results, and providing critical analysis. Section 5 takes a critical look at all of the data, and sections 6 and 7 provide conclusions and identify future work. This paper is aimed at professionals within educational institutes seeking ways to realize the benefits of Thin Client computing while maintaining the support and acceptance of users. It provides a balance between

the hype of Thin Clients and the reality of their deployment.

2 THIN CLIENT EVOLUTION

The history of Thin Clients is marked by a number of overly optimistic predictions that it was about to become the dominant model of desktop computing. In spite of this there have been a number of marked developments in this history, along with those of desktop computing in general, which are worth reviewing to set the context for examining the user acceptance of this technology. Thin Clients have established a role in desktop computing although not quite the dominant one initially predicted. These developments have usually been driven by increases in processing power (and reductions in processor costs) in line with Moore's law, but the improvements in bandwidth and storage capacity are having an increasing effect on desktop computing and on Thin Client computing [2], driving the move towards more powerful lower cost desktops but also the possibilities of server virtualisation and Thin Client computing with the ability to run Thin Clients over WANs.

The first wave of computing was one where centralised mainframe computers provided the computing power as a shared resource which users accessed using dumb terminals, which provided basic text based input and output and then limited graphics as they became graphics terminals. These mainframes were expensive to purchase and were administered by specialists in managed environments and mostly used for specific tasks such as performing scientific calculations and running highly specialised bespoke payroll systems. The next wave was that of personal computing, whereby users administered their own systems which provided a platform for their personal applications, such as games, word-processor, mail and personal data. Since then the personal computer has undergone a number of significant changes, but the one of most interest was the nature of the interface provided to the user, which has grown into a rich Graphical User Interface where the Personal Computer became a gateway to the Internet with the Web browser evolving into a platform for delivery of rich media content, such as audio and video. This move from a mainframe centralised computing model to a PC distributed one resulted in a number of cost issues related to administration. This issue was of particular concern for corporate organizations, in relation to licensing, data security, maintenance and system upgrades. For these cost reasons and the potential for greater mobility for users, the use of Thin Clients is often put forward as a way to reduce costs using the centralised model of the Thin Client architecture. This also offers lower purchase costs and reduces the consumption of energy [3].

The challenge faced by Thin Client technology is to deliver on these lower costs and mobility, while continuing to provide a similarly rich GUI user experience to that provided by the desktop machine (a challenge helped by improved bandwidth, but latency is still often a limiting factor [4]) and the flexibility with regard to applications they have on their desktop. Typically, current Thin Client systems have an application on a server (generally Windows or Linux) which encodes the data to be rendered into a remote display protocol. This encoded data is sent over a network to a Thin Client application running on a PC or a dedicated Thin Client device to be decoded and displayed. The Thin Client will send user input such as keystrokes to the application on the server. The key point is that the Thin Client does not run the code for the user's application, but only the code required to support the remote display protocol. While the term Thin Client was not used for dumb terminals attached to mainframes in the 1970's, the mainframe model shared many of the attributes of Thin Client computing. It was centralised, the mainframe ran the software application and held the data (or was attached to the data storage) and the terminal could be shared by users as it did not retain personal data or applications, but displayed content on the screen as sent to it by the mainframe. From a desktop point of view, the 1980's were dominated by the introduction and adoption of the Personal Computer. Other users requiring higher performance and graphics used Unix Workstations from companies like Apollo and Sun Microsystems. The X Window System [5] was used on many Workstations, and X terminals were developed as a display and input terminal and provided a lower cost alternative to a Unix Workstation, with the X terminal connecting to a central machine running an X display manager. As such, they shared some of the characteristics of a Thin Client system, although the X terminal ran an X Server making it more complicated than Thin Client devices. The 1990's saw the introduction of several remote display protocols, such as Citrix's ICA [6], Microsoft's RDP [7] and AT&T's VNC [8] for Unix, that took advantage of the increasing bandwidth available on a LAN to provide a remote desktop to users. Terminal Services was introduced as part of Windows NT 4.0 in 1996 and it offered support for the Remote Desktop Protocol (RDP), allowing access to Windows applications running on the Server, giving users access to a desktop on the Server using an RDP client on their PC. RDP is now offered on a range of Windows platforms [9]. Wyse and vendors such as Ncomputing launched terminals, which didn't run the Windows operating system, but accessed Windows applications on a Windows Server using RDP, which is probably still the

dominant role of dedicated hardware Thin Clients. Similarly VNC is available on many Linux and Unix distributions and is commonly used to provide remote access to a user's desktop. These remote display protocols face increasing demands for more desktop functionality and richer media content, with ongoing work required in how, where and when display updates are encoded, compressed or cached [10]. Newer remote display protocols such as THINC have been designed with the aim of improving these capabilities [11]. In 1999, Sun Microsystems took the Thin Client model further with the SunRay, which was a simple network appliance, using its own remote display protocol called ALP. Unlike some of the other Thin Clients which ran their own operating system, SunRay emphasized its completely stateless nature [12]. This stateless nature meant that no session information or data was held or even cached (not even fonts) on the appliance itself and enabled its session mobility feature, whereby a smart card was used to identify a user with a session so that with the smartcard the user could login from any SunRay connected to the session's server and receive the desktop as it was previously. Many of these existing players have since focused on improving their remote desktop protocols and support for multimedia or creating new hardware platforms. There have also been some newer arrivals like Pano Logic and Teradici who have developed specific client hardware to create “zero” clients, with supporting server virtualisation to render the remote display protocols. Also, there are a number of managed virtual desktops hosted in a data centre now being offered. One of the drivers behind Thin Client Technology, particularly when combined with a dedicated hardware device, is to reduce the cost of the client by reducing the processing requirement to that of simply rendering content, but a second driver (and arguably more important one) is to gain a level of universality by simplifying the variations in the client side environment. This has been met in a number of new ways using Virtual Machine players and USB memory in Microsoft's research project “Desktop on a Keychain” (DOK) [13] and also the Moka5 product [14], allowing the mobility (and security) benefits attributed to Thin Clients. This can be enhanced with the use of network storage to cache session information [15]. It can be seen that Thin Clients have evolved along with other desktop computing approaches, often driven by the same factors of increasing processing power, storage capacity and bandwidth. However, newer trends that are emerging with regard to virtualisation, internet and browser technologies, together with local storage, present new challenges and opportunities for Thin Client technology to win user acceptance. As Weiser said in 1999 in this new era, “hundreds or thousands of computers do our bidding. The relationship is the inverse of the mainframe era: the people get the air conditioning now, and the nice floors, and the computers live out in cyberspace and sit there waiting eagerly to do something for us”. [16] 3 THE CASE FOR THIN CLIENTS

There are many stated benefits for Thin Clients, all of which are well documented [17][18]. While there is no single definitive list, and potential system designers may have different aims when considering Thin Clients, these benefits should be clearly understood prior to embarking on any deployment; they are discussed below.

3.1 Reduced cost of software maintenance
The administrative cost benefit of the Thin Client model, according to Jern [19], is based on the simple observation that there are fewer desktop images to manage. With the combination of virtualisation environments and Windows Terminal Service (WTS) systems it would not be uncommon for twenty five or more desktop environments to be supported from a single installation and configuration. This reduces the number of upgrades and customizations required for desktop images in computer laboratories where the aim is to provide a consistent service from all systems. Kissler and Hoyt [20] remind us that the “creative use of Thin Client technology can decrease both management complexity and IT staff time.” In particular they chose Thin Client technology to reduce the complexity of managing a large number of kiosks and quick-access stations in their new thirty three million dollar library. They have also deployed Thin Client devices in a range of other roles throughout Valparaiso University in Indiana. Golick [21] on the other hand suggests that the potential benefits of a Thin Client approach include the lower mean time to repair (MTTR) and lower distribution costs. It is interesting to note that he does suggest that the potential cost savings for hardware are a myth, but that administration savings still make a compelling case for using Thin Client technology.

3.2 Enhanced Security
Speer and Angelucci [22] suggest that security concerns should be a major factor in the decision to adopt Thin Client systems, and this becomes more apparent when referencing the Gartner Thin Client classification model. The Thin Client approach ensures that data is stored and controlled at the datacentre hosting the Thin Client devices. It is easy to argue that the user can retain the mobility of laptops but with enhanced security: the data is not mobile, just the access point. The argument is even easier to make when we consider recent high-profile cases of the theft of unencrypted laptops containing sensitive medical or financial records. The freedom

conferred on users of corporate desktop and laptop PCs undermines the corporation’s obligations in relation to data privacy and security. Steps taken to protect sensitive data on user devices are often too little and too late. Strassmann [23] states that the most frequent use of a personal computer is for accessing web applications and that the Thin Client model demonstrates significantly lower security risks for the corporation. Five security justifications for adopting the Thin Client model were proposed:
1) Zombie Prevention
2) Theft Dodging
3) File Management
4) Software Control
5) Personal Use Limitations

Strassmann concedes that Thin Clients are not necessarily best for every enterprise and every class of user, but for enterprises with a large number of stationary “non-power” users, “Thin Clients may present the best option in terms of security, cost effectiveness and ease of management.”

3.3 User Mobility
User mobility can refer to the ability of a user to use any device, typically within the corporation’s intranet, as a desktop where the user will see a consistent view of the system, for example, SunRay hot-desking. While user profiles in Microsoft Windows support this, it is often only partially implemented. Session mobility can be viewed as the facility for users to temporarily suspend or disconnect their desktop session and to have it reappear, at their request, on a different device at a later time. This facility removes the need for users to log-out or to boot-up a desktop system each time they wish to log-in. Both of these potential features of Thin Client technologies help to break the sense of personal ownership that users often feel for their desktop or laptop computers. It is this sense of personal ownership which makes the maintenance and replacement of corporate PCs a difficult task, and this feeling of ownership and control is often a reason why users resist the adoption of a centrally controlled Thin Client to replace their desktop, whereas this is exactly why IT management may want to adopt it.

3.4 Environmental Costs
In the article “An Inefficient Truth” Plan [24] reveals a series of “truths” supported by a number of case studies directed at the growing costs of Information and Communication Technologies. One such case study is of Reed Managed Services, where 4,500 PCs were replaced with Thin Clients and a centralised blade server providing server based virtualised desktops. Savings are reported as follows:
1) 5.4 million kWh reduction
2) 2,800 tonnes of CO2 saved annually
3) Servers reduced by a factor of 20
4) IT budget cut by a fifth

Indeed there are many deployments focused on obtaining energy savings through the use of Thin Clients. In a case study where SunRay systems were introduced into Sparkasse, a public German bank, Bruno-Britz [25] reports that the savings in electricity costs alone were enormous. The University of Oxford has deployed SunRay Thin Client devices in their libraries, citing the cooler and quieter operation as factors in their decision. These devices, having no local hard disk and no fan, operate at a lower temperature and more quietly than traditional PCs. This characteristic has environmental implications from noise, cooling and power consumption perspectives.

3.5 Summary of Benefits
In summary, we can extract the benefits observed within literature and case studies as follows:
1) Increased security as data maintained centrally
2) Reduced cost of hardware deployment and management and faster MTTR
3) Reduced administration support costs
4) Environmental costs savings
5) Reduced cost of software maintenance
6) Reduced cost of software distribution
7) Zero cost of local software support
8) The ability to leverage existing desktop hardware and software
9) Interface portability and session mobility
10) Enhanced Capacity planning
11) Centralised Usage Tracking and Capacity Planning

3.6 Thin Clients vs. Fat Clients
Thin Client technology has evolved in sophistication and capability since the middle of the 1990s; however, the “thickness” (the amount of software and administration required on the access device) of the client is a source of distinction for many vendors [26][27]. Regardless of “thickness”, Thin Clients require less configuration and support when compared to Fat Clients (your typical PC). In the early 1990s Gartner provided a client-server reference design shown in Figure 1. This design provides clarity for the terms “thin” and “fat” clients by viewing applications in terms of the degree of data access, application and presentation logic present on the server and client sides of the network. The demand for network based services such as email, social networking and the World Wide Web has driven bandwidth and connectivity requirements to higher and higher levels of reliability and performance [28]. As we progress to an “always on”

network infrastructure the arguments focused against Thin Clients based on requiring an offline mode of usage are less relevant. The move from Fat Client to Thin Client is however often resisted as individuals find themselves uncomfortable with the lack of choice provided when the transition is made, as observed by Wong et al.[29]. incomplete and flawed technology. In the case of Thin Clients, it should be accepted that there are tradeoffs to be made. One of the appealing aspects of the Fat client is its ability to be highly flexible which facilitates extensive customization. However not every user will require that flexibility and customization. Thin Clients are not going to be a silver bullet addressing all users needs all of the time. All three case studies were evaluated under the following headings in order to allow a direct comparison between each. These criteria were selected to ensure that there was a balance between the user acceptance of the technology and the technical success of each deployment. 1) Login events on the Thin Clients 2) Reservation of the Thin Client facility 3) The cost of maintaining the service

Figure 1: Gartner Group Client/Server Reference Design

4 CASE STUDIES

No matter how well documented the benefits of Thin Clients may be, there is still an issue of acceptance to be addressed. While it may be tempting to assume that the implementation of technology is a technical issue and that simply by building solutions a problem is effectively solved, evidence would point to the contrary. As there can often be a disparity between what is built and what is required or needed. Too often requirements gathering, specification definition and user consultation are forgotten in the rush to provide new services which are believed to be essential. In essence the notion of “if we build it they will come” is adopted, inevitably causing confusion and frustration for both service provider and the user. For example, during Sun Microsystems’ internal deployment of its own SunRay Thin Client solution many groups and functions sought exemptions from the deployment as they believed that their requirements were sufficiently different to the “generic user” to warrant exclusion from the project. The same arguments still exist today and it is often those with a more technical understanding of the technology who are the agents of that technology’s demise. By providing interesting and often creative edge cases which identify the limitations of a technology, they can, by implication, tarnish it as an

Figure 2: Case Study 1

4.1

DIT Case Study 1 In 2005 the DIT introduced the SunRay Thin Client technology into the School of Computing. In a similar approach to many other technology deployments the strengths of the technology were reviewed and seen as the major selling points of the deployment. In the case of SunRay there was a cheap appliance available which would provide the service of graphical based Unix desktops. Centralised administration ensured that the support costs would be low and the replacement requirements for systems for the next five years would be negligible. In essence the technological and administrative advantages were the focus of this deployment. Few of the services offered within the existing PC infrastructure were included in the deployment. This deployment sought to offer new services to students and introduced Thin Clients for the first time to both students and staff.

4.1.1 Design A single laboratory was identified for deploying the SunRay systems and all PC in that lab were replaced with SunRay 150 devices. A private network interconnect was built which ensured that all data sent from the clients traversed a private network to the SunRay server. The initial design of this case study is shown in Figure 2 and it allowed students within this new Thin Client lab access to the latest version of Solaris using a full screen graphical environment as opposed to an SSH command-line Unix shell which was the traditional method still used from existing computing laboratories. A new authentication system was introduced based on LDAP which required students to have a new username and password combination which was different to the credentials already in use within the Active Directory domain used for the existing PC network. The reason for this alternative authentication process was due to the difficulty of authenticating on a Unix system using Active Directory. Once the server was running, the Thin Client laboratory was ready to provide graphical based Unix login sessions at a considerable reduced price when compared to an investment of Unix workstations for each desk. In total 25 Thin Client devices were installed which were all connected to a single Solaris server. In summary the key components within the design were as follows: 1) 2) 3) 4) 5) 6) The service was on a private network New authentication process was introduced New storage mechanism was introduced Devices were all in the same location Service provided was a CDE desktop on Solaris Graphical desktops running on Linux servers also accessible Given that the nature of the service did not significantly change over the course of the three years that the system was in place with the exception of semester activity in line with student presence in the institute, it is clear that there was low utilization of the service. The graph shows raw data plotted, where login events were less than 10 per day.

[Figure 3 plot: login events per day on the y-axis (0–14), February 2005 to February 2008 on the x-axis.]

Figure 3: User Login Events

Reservation of the Thin Client Facility: Each laboratory may be reserved by staff for the delivery of tutorial sessions and exercises. The hourly reservations for this laboratory were reduced as a result of the introduction of Thin Clients with only 1 to 2 hours being reserved per day. One of the primary reasons for the reduction in the use of this facility was the fact that it had now become special purpose and the bookings for the room were limited to the courses which could be taught within it. The Cost of Maintaining the Service: A detailed analysis of cost savings associated with the introduction of Thin Clients within our institute and specifically the costs associated with this case study was performed by Reynolds and Gleeson, [30]. In their study they presented evidence of savings in relation to the cost of support, the cost of deployment and a basic analysis of the power consumption costs. They review both the system and the software distribution steps associated with Thin Clients and PC systems and present a point of quantifiable comparison between the two. Key findings of this analysis were as follows: 1) Time spent performing system upgrades and hardware maintenance was reduced to virtually zero as no hardware or software upgrades were required. 2) A single software image was maintained at the central server location and changes were made available instantly to all users. 3) No upgrade costs were incurred on the Thin Clients or server hardware. All systems have

4.1.2 Results The login events are a measure of the general activity of the devices themselves and were considered to be a reasonable benchmark for comparison with existing laboratories within the institute. One interesting point is that the comparison of facilities is not necessarily relevant when the facilities provide different services. Because Unix was provided instead of Windows, the majority of students, with the exception of those taking courses involving Unix, were unfamiliar with the technology and did not seek to use the systems. Login events on the Thin Clients: The login events were extracted from the Solaris server by parsing the output of the last command, which displays the login and logout information for users that it extracts from the /var/adm/wtmpx file. The number of login events per day was calculated and plotted in the graph shown in Fig. 3. Immediately obvious was the low use of the system.
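A minimal sketch of this extraction step is shown below; it counts login events per calendar day by parsing the output of last. The exact column layout of last differs between Solaris and other systems, so the field positions and wtmpx path are assumptions to adapt rather than the authors' actual script.

```python
import subprocess
from collections import Counter

def logins_per_day(wtmp_file="/var/adm/wtmpx"):
    """Count login events per calendar day from `last` output.
    Column layout differs between Solaris and Linux, so the field
    positions below are an assumption to adapt, not a fixed format."""
    out = subprocess.run(["last", "-f", wtmp_file],
                         capture_output=True, text=True).stdout
    counts = Counter()
    for line in out.splitlines():
        parts = line.split()
        if len(parts) < 7 or parts[0] in ("reboot", "wtmp", "wtmpx"):
            continue
        # e.g. "jsmith pts/3 host Mon Feb 5 09:12 - 10:01 (00:48)"
        counts[" ".join(parts[3:6])] += 1      # weekday, month, day-of-month
    return counts

if __name__ == "__main__":
    for day, n in sorted(logins_per_day().items()):
        print(day, n)
```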

4) The Thin Client lab is a low power consumption environment due to the inherent energy efficiency of the Thin Client hardware over existing PCs. This can provide up to 95% energy savings when compared to traditional PCs [24].

4.1.3 Analysis
There has been extensive research in the area of user acceptance of technology, but perhaps the most relevant work in this area is the Unified Theory of Acceptance and Use of Technology (UTAUT) [1], which identifies four primary constructs or factors:
a) Performance Expectancy
b) Effort Expectancy
c) Social Influence
d) Facilitating Conditions

While there are additional factors such as Gender, Age and Experience, within the student population these are for the most part reasonably consistent and are ignored here. It should be stressed that although the UTAUT was developed for an industry-based environment, it is easily adapted for our purposes, and this model serves as a relevant reference point when discussing the performance of the case studies. Clearly Case Study 1 failed to gain acceptance, despite the belief at its inception that it would be highly successful. We review the case study under the four UTAUT headings to identify the source of the user rejection of the Thin Clients.

a) Performance Expectancy
This factor is concerned with the degree to which the technology will assist in enhancing a user's own performance. The services provided an advantage only to those students who wished to use Unix systems. Since the majority of courses are based on the Windows operating system, it is reasonable to assume that there was no perceived advantage in using a system which was not fully compatible with the productivity applications used in the majority of courses.

b) Effort Expectancy
This factor is concerned with the degree of ease associated with the use of the system. One of the clear outcomes of Case Study 1 was that students rejected the Unix systems because they were seen as highly complex, requiring additional authentication beyond what was used in the traditional laboratories.

c) Social Influence
This is defined as the degree to which users perceive that others will view or judge them based on their use of the system. By isolating the devices and associating them with specialised courses, there was no social imperative to use the lab. Unix as a desktop was relatively uncommon in the School at the time of the case study, and there would have been a moderate to strong elitist view of those who were technical enough to use the systems.

d) Facilitating Conditions
This is defined as the degree to which an individual believes in the support for a system. At first glance this does not appear to be a significant factor, considering that the services were created by the support team and there was considerable vested interest in seeing them succeed. However, additional questions asked by the UTAUT include the issue of compatibility with the systems primarily used by the individual.

Each of the UTAUT factors can be considered significant for Case Study 1. Many of the issues raised hang on the fundamental point that the new services offered on the Thin Clients were different to existing services and, for all practical purposes, were seen as incompatible with the majority of systems available to students elsewhere. The fact that the technology itself may have worked flawlessly, and may have delivered reduced costs, was irrelevant as long as the service remained under-utilised. Given that the reason for this lack of acceptance was potentially inherent in the implementation of the services and not due to failings in the technology itself, it was clear that a second case study was required which would address the issue of service.

Figure 4: Case Study 2

4.2 Case Study 2
The second case study is a modification of the basic implementation of the first case study, with the changes focused on increasing student acceptance of the Thin Client facility. Removing the Unix-centric nature of the existing service was central to the system redesign. It was decided that additional services could be easily and cheaply offered within the Thin Client environment, providing users with the ability to access more compatible services from within it. Figure 4 identifies the key components of the design.
Figure 5: User Login Event Comparison (login events per day, 08 Feb to 05 Apr; Case Study 1 vs Case Study 2)

4.2.1 Design
The most important addition in the second case study was the provision of additional services similar to those available in the PC labs. This was to ensure that students could use this facility and have an experience on a par with the PC labs. A new domain was created in which Unix and Windows shared a common authentication process. Due to difficulties integrating Unix with the existing Windows authentication process, the new domain was built on the LDAP system, with SAMBA providing the link between the new Windows Terminal Servers and the LDAP system. While students could now use the same username and password combination for Windows and Unix systems, this was not integrated into the existing Windows authentication process. Students were still required to have two sets of credentials: the first for the existing PC labs, and the second for access to a new domain containing a number of Windows Terminal Servers and the original graphical Unix desktop. While the Thin Clients now provided Windows and Unix graphical desktops, the new Windows domain was also accessible from existing PC labs via RDP connections to the Terminal Servers. This allowed classes to be scheduled either inside or outside of the Thin Client laboratory. In addition to providing Windows Terminal Services (WTS), student-owned virtual machines were now also available. Because most services were now available from all locations, ease of access to the services from within the Thin Client lab was improved by providing users with a menu of destinations upon login. This new login script effectively provided a configurable redirection service to the WTS and virtualisation destinations using the rdesktop utility [31], which performed a full-screen RDP connection to the specified destination (a sketch of such a chooser script is given after the service list below). An interesting outcome of this destination chooser was that any RDP-based destination could be included, regardless of the authentication process used; this would, however, require a second authentication with the connecting service. The new services provided were as follows:
a) A general purpose Windows Terminal Server with mounted storage for all students and staff.
b) Course-specific Windows Terminal Servers for courses with specific software requirements not common to all students.
c) Individual virtualised desktops for students in specific modules where administration rights were required.
d) All services were made available from both the Thin Client and PC labs, as they were accessible over the Remote Desktop Protocol (RDP).
e) Provision of an easy access point to all services from within the Thin Client environment, which was not available from the PC systems.
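The login script itself is not reproduced in the paper; as a minimal sketch of how such a destination chooser could work, the following Python fragment presents a menu and launches a full-screen rdesktop session. The host names are hypothetical, and the real script may well have been a shell script rather than Python.

```python
#!/usr/bin/env python
# Hypothetical destination chooser run at login on a Thin Client session.
import subprocess
import sys

# Hypothetical destinations; the actual WTS/virtualisation hosts are not named in the paper.
DESTINATIONS = {
    "1": ("General purpose Windows Terminal Server", "wts-general.example.edu"),
    "2": ("Course-specific Terminal Server",         "wts-course.example.edu"),
    "3": ("Personal virtual desktop",                "vdi-broker.example.edu"),
}

def main():
    print("Select a destination:")
    for key, (label, host) in sorted(DESTINATIONS.items()):
        print("  %s) %s" % (key, label))
    choice = input("> ").strip()
    if choice not in DESTINATIONS:
        sys.exit("Unknown choice")
    host = DESTINATIONS[choice][1]
    # -f requests a full-screen RDP session, as described for the rdesktop utility [31].
    subprocess.call(["rdesktop", "-f", host])

if __name__ == "__main__":
    main()
```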

4.2.2 Results
The data gathered for Case Study 2 was evaluated under the same three headings as for Case Study 1:
1) Login events on the Thin Clients
2) Reservation of the Thin Client facility
3) The cost of maintaining the service

Login events on the Thin Clients: Figure 5 shows a comparison of activity during the same time period for the two case studies. To identify trends in the data, a displacement forward moving average was applied to the data, as shown in Eq. (1).
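The body of Eq. (1) did not survive extraction; a k-point forward (displaced) moving average of the daily login counts y_t, which is presumably what is meant, has the general form below, with the window length k not stated in the source:

\bar{y}_t = \frac{1}{k} \sum_{i=0}^{k-1} y_{t+i}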

It is clear that for the same time period there was a significant increase in the use of the system, with the number of login events increasing by a factor of 4. Once again the login events were extracted from the Solaris server by parsing the output of the last command.

Reservation of the Thin Client Facility: The changes to the Thin Client facility were announced at the start of the second academic semester as a PC upgrade, and the number of room bookings increased, as shown in Figure 6, from 6 hours a week to 20 hours a week.


This was due to the use of the room as a Windows-based laboratory using the new WTS and virtualisation services.

Figure 6: Thin Client Room Reservations (hours per day, Monday to Friday; Case Study 1 vs Case Study 2)

The Cost of Maintaining the Service: All of the benefits observed from the first case study were retained within this case study. The addition of terminal services reduced the reliance of students on Fat Client installations. Students are now using virtual machines and terminal servers on a regular basis from all labs.

4.2.3 Analysis
This second case study certainly saw an improvement over its earlier counterpart, and students and staff could now access more familiar services from the Thin Client lab. Given the dramatic increase relative to the earlier results, it can be stated that the introduction of the more familiar services increased the acceptance of the facility. Both case studies demonstrated equally well that it is possible to obtain the total cost of ownership benefits of a Thin Client model, but the services offered have a dramatic effect on user acceptance. It is useful to review the outcome in relation to the UTAUT.

a) Performance Expectancy
Given that new services such as personalised virtual machines were now available, staff and students could identify a clear advantage to the system: administration rights could be provided in a safe manner, allowing more complex and previously unsupported activities to take place. For example, students on the Advanced Internet module of the MSc could now build and administer full web servers which remained private to the student, ensuring that no other student could access or modify a project that was a work in progress.

b) Effort Expectancy
Considerable improvements were made in this case study to allow users to access well-known environments from both the Thin Clients and the PC systems. Students who were taught modules using the new WTS or virtual environments were trained in how to access the systems, and once they had used them they continued to do so throughout the year.

Those who did not have modules taught using these new services were still required to go through a new login/access process which was not well documented. For example, within the Thin Client labs the new username/password combination was required to access the choice of destinations from the devices. This acted as a barrier to use, even though emails were sent to students and information on how to access these accounts was posted in the labs. Usernames were based on existing student ID numbers.

c) Social Influence
Little changed in this case study for those who did not have a teaching requirement based on the new services.

d) Facilitating Conditions
With the provision of WTS services and virtual machines providing Windows environments, the issue of compatibility was reduced. However, two issues remained which were not addressed. Firstly, while users could now share a common data store between systems on the new domain, there was no pre-packaged access to the data store on the existing PC domain; although it was technically possible to combine both under a single view, this required user intervention and additional training which was not provided. Secondly, the sequence of steps required to access the choices from the Thin Clients was a non-standard login process which required a second login, the first of which was at a Unix graphical login screen. For many, this initial login step remained a barrier to using the system.

The most striking result from this case study is that, while the second case study demonstrated a significant increase in acceptance and use, the PC environments remained the system of choice for students, as shown in Figure 7. This graph shows the typical use of a PC laboratory within the same faculty. Thin Client use remained less than one third of that of the busiest computer laboratory. Thin Clients were shown to be capable of providing services equally well to both Windows and Unix users, by allowing students to access their own private desktop from many locations; however, this feature alone was not enough to entice users away from the existing PC infrastructure. Clearly the introduction of virtualisation to the infrastructure allowed new services to be developed and used from Thin and Fat Clients, which could be seen as a route for migrating users to a Thin Client/virtualisation model, and this is indeed a planned future initiative. The results show a definite increase in the use of the Thin Client facilities, with data being gathered from the same period in both case studies to eliminate any bias due to module schedule differences at different times of the year.


The timing and method used to announce the changes were critical to the increase in acceptance. Announcing the systems as a PC upgrade removed some of the barriers for users who did not feel comfortable with a Unix environment, but still failed to attract a majority of the students.

Figure 7: Comparison with PC Computer Labs (login events per day, 08 Feb to 05 Apr; PC Lab 1 vs Case Study 1 and Case Study 2)

4.3 Case Study 3

The third case study was designed using the experience of the first two case studies and was extended beyond the School of Computing. It was aimed at demonstrating the capability of Thin Client technology in two different demographic environments: the first was one of the Institute libraries, where PCs are used by students from many different faculties, and the second was within the Business faculty, where computer systems are provided in support of the modules taught within that faculty. This case study expressed the following aims at the outset:
1) To demonstrate the use of Thin Client technology within the student population and determine the level of student acceptance of that technology.
2) To implement a number of alternative technologies in order to provide a point of comparison with respect to their overall performance and acceptance.
3) To determine the capability of the existing network infrastructure to support Thin Clients.

4.3.1 Design
Unlike the previous case studies, the aim was to insert Thin Clients into the existing environment as invisibly as possible. This meant that the existing authentication processes were to be maintained. There were two different authentication processes in place which needed to be supported: Novell Client for the Business faculty and Active Directory for the Library. In both cases a WTS system was built which joined the respective domain. Applications were installed to mirror those present on the existing PCs in the chosen locations.

It was essential that the Thin Clients not be identifiable by students if at all possible, and that they be co-located with existing PC systems. To ensure that all devices behaved in a manner consistent with PCs, they had to boot and present the same login screen as would be expected on a PC in the same location. To achieve this, all Thin Client devices, with the exception of the SunRay systems, used a Preboot Execution Environment (PXE) [32] boot process to connect to a Linux Terminal Server Project (LTSP) server. The server redirected the user session to the correct WTS using rdesktop, where the user was presented with a Windows login screen identical to those on adjacent PC systems. The SunRay systems were run in Kiosk mode, which allowed the boot sequence to redirect the session to a WTS, also via the rdesktop utility. The WTS were installed on a VMware ESX Server to allow rollback and recovery of the servers. This, however, was not central to the design of the case study and only served as a convenience in sharing hardware resources between multiple servers. The only concern was the potential performance of the WTS under a virtualised model. Given that the applications were primarily productivity applications such as word processing and browsing, and that the maximum number of users allowed on any WTS was 25 (based on the number of devices directly connected to it), this was considered to be within the acceptable performance range of the architecture. This assumption was tested prior to the case study being made accessible to students, with no specific issues raised that would warrant further restructuring of the architecture. Seventy-five Thin Clients were deployed in six locations. The Thin Client devices used are shown in Figure 8 and Table 1.


Figure 8: Case Study 3


Table 1: Thin Clients deployed
Device            Boot Mode      Quantity
Dell GX260        PXE Boot PC    15
Dell FX 160       PXE Boot TC    25
HP T5730          PXE Boot TC    8
Fujitsu FUTRO S   PXE Boot TC    2
SunRay 270        SunRay         25

4.3.2 Linux Terminal Server Project
LTSP works by configuring PCs or suitable Thin Clients to use PXE boot to obtain the kernel and the RDP client used as part of this project. These are obtained from a TFTP server whose IP address is provided as a DHCP parameter when the client PXE-boots. As part of the DHCP dialogue, devices configured to PXE-boot are given settings by the DHCP server, including the TFTP Boot Server Host Name and the Bootfile Name. The necessary settings were configured on each of the DHCP servers serving the relevant locations within the DIT so as to point any PXE-boot devices to the relevant LTSP boot server and to specify the kernel to be loaded by the PXE-boot client. Using these settings the PXE-boot clients load a Linux kernel and then an RDP client which connects to one of the three WTS used as part of this case study.

4.3.3 Results
Use of the Thin Clients was recorded using login scripts on the Windows Terminal Servers which recorded login and logout events. As expected, the use of the Library systems exceeded the use of the laboratories, but both were in line with the typical use patterns expected for each location. What was immediately obvious was that each location had a higher utilization than in the previous two case studies, comparable with the PC labs, as shown in Figure 9. One difficulty with this comparison, however, is that the final case study was performed at a different point in the teaching semester, and use of the systems declined as students prepared for examinations. Lab 1 was a "quiet lab" located remotely from the primary labs within the Business faculty and traditionally did not have high use. Lab 2 was a more central location and, as expected, exhibited greater user activity. The systems remained in operation continually for the period of the case study, which ran for over one month, during which data was collected from the three WTS systems.
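The recording scripts themselves are not given in the paper; purely as an illustration of the idea (not the authors' actual script, which may well have been a batch file), a session event recorder run at logon and logoff could look like the following, with the log location being a hypothetical shared path:

```python
#!/usr/bin/env python
# Illustrative logon/logoff recording script for a Windows Terminal Server session.
# Invoked at session start/end, e.g. "record_event.py logon" or "record_event.py logoff".
import csv
import datetime
import getpass
import socket
import sys

LOG_FILE = r"\\fileserver\logs\wts_sessions.csv"  # hypothetical shared location

def record(event):
    row = [datetime.datetime.now().isoformat(timespec="seconds"),
           getpass.getuser(), socket.gethostname(), event]
    with open(LOG_FILE, "a", newline="") as handle:
        csv.writer(handle).writerow(row)

if __name__ == "__main__":
    record(sys.argv[1] if len(sys.argv) > 1 else "logon")
```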
Figure 9: User Login Event Comparison (login events per day, 17 Apr to 15 May; Library, Lab 2, Lab 1)

4.3.4 User Survey
Once the case study was running, a desktop satisfaction survey employing the Likert scale [33] was conducted to obtain feedback from students using the Thin Client systems. The questionnaire was designed so that students identified their desktop using a colour-coding scheme known only to the authors. Each of the Thin Clients and a selection of PC systems (which were not PXE booted) were targeted for the survey to allow a comparative analysis between all Thin Clients and the existing PC systems. The survey did not reference Thin Clients in any of the questions but rather sought feedback on application use and overall satisfaction with the performance of the system through a series of questions. There were 234 responses recorded for the survey. The key questions in the survey were as follows:
1) Please rate the overall performance of the machine you are currently using.
2) Please identify the primary reason you used this computer.
3) How would you rate your overall satisfaction with this desktop?
4) Would you use this desktop computer again?
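As an illustration of the kind of per-device aggregation behind Figures 10 to 12, the following sketch computes a mean Likert rating per device type from a survey export. The column names and file name are assumptions; the survey data itself is not published with the paper.

```python
#!/usr/bin/env python
# Sketch: average Likert satisfaction per device type from a survey export.
import csv
import collections

def mean_rating_by_device(path):
    totals = collections.defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed columns: "device" (decoded from the colour code) and
            # "satisfaction" (Likert response, e.g. 1-5).
            totals[row["device"]].append(int(row["satisfaction"]))
    return {device: sum(vals) / len(vals) for device, vals in totals.items()}

if __name__ == "__main__":
    for device, mean in sorted(mean_rating_by_device("survey.csv").items()):
        print(f"{device}: {mean:.2f}")
```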

Figure 10: User satisfaction rating of desktop performance (PC fat client, SunRay, PXE Boot PC, HP TC, Dell TC; all applications vs. browsers)

The issue of overall performance was broken down by the device used, which was identified using the colour-coding scheme described earlier. Figure 10 represents the average satisfaction rating reported by users, broken down by device and by the primary application in use. Since over 50% of responses identified "Browsing" as the primary reason for using the machine, two satisfaction ratings are provided as a point of comparison.



Figure 11 shows the combined rating of users' responses to overall satisfaction with the desktop, desktop performance and application performance.

Figure 11: Combined rating of desktop performance (user satisfaction ratings for PC, SunRay, PXE Boot, HP and Dell devices)

Figure 13: Storage Satisfaction Rating (PC, SunRay, PXE Boot, HP and Dell; USB-only vs. non-USB storage)

4.3.5 Analysis
This final case study, while shorter than the others, demonstrated significant progress in user acceptance. As part of the survey users were asked if they would consider reusing the system and, as can be seen in Figure 12, there was significant support for the systems. The small number of respondents who did not wish to reuse the system cited USB performance as the primary cause of their dissatisfaction. It had been identified early in the testing of the Thin Clients that all systems performed noticeably slower than the PC systems in this respect. Questions regarding the primary storage method used by students were added to the survey, as was a satisfaction rating. From the results in Figure 13 it is clear that while the PC systems did perform better when users primarily used USB storage, satisfaction with storage performance for all other options was comparable. The HP figures had a low survey response rate and hence were not considered significant in our analysis given the small number of data points.

By making the Thin Clients as invisible as possible and comparing satisfaction and user access with the existing PC systems, it was clear that for the majority of users there was no apparent change to the services provided. Integrating into the existing authentication process was an essential feature of this case study, as was presenting a single authentication process at the WTS login screen. Efforts were also made to ensure that the applications installed on the WTS were configured to look and feel the same as those on the standard PC. As with the previous case studies, it is useful to review the case study in relation to the UTAUT.

a) Performance Expectancy
With the exception of increasing the number of desktops in the Library, the deployment mainly replaced existing systems, so users were not given any reminders that they were using a different system. In effect there was no new decision or evaluation by the user to address the questions which were relevant in the previous case studies.

b) Effort Expectancy
The reuse of the existing login/access procedure, which was well known and part of the normal process for students using the existing PC systems, again made this factor largely irrelevant. Usernames, passwords, applications and system behaviour were identical to those on the PCs.

c) Social Influence
Without a perceived difference in service, social influence as a factor was also eliminated. Only the SunRay systems had different keyboards and screens, and as these screens were of higher resolution than the existing PCs they were, if anything, seen as the more desirable systems.

d) Facilitating Conditions
Unlike the previous case studies, support for the facility was more complex. Different levels of expertise and engagement were required.

Figure 12: User response to "Would you use this system again?" (YES 92%, NO 8%)


Thin Clients were now part of a larger support structure in which many individuals were not core members of the technical team who built the systems. However, given that only three support calls were raised during the case study, there was little pressure on this factor either. The calls raised were not in fact directly related to the Thin Client devices, but rather to the network and the virtual environments used to host the centralised servers.

5 CRITICAL ANALYSIS

The UTAUT provides a useful reference point for understanding some of the factors affecting acceptance of the Thin Clients. In the first case study the primary barrier to acceptance was the incompatibility of the new system with the existing system. Students were not motivated to use the new system, as there were few advantages to doing so and considerable effort in learning how to use it. The second case study, while more successful, still failed to gain full acceptance despite the expanded services offered being comparable with the existing Windows services. The session mobility and access-from-anywhere features, while useful, did not overcome the resistance of users to migrating to the Thin Clients: the Thin Clients still required separate credentials, and the login process was still different to that of the PC systems. The third and final case study was designed to provide the same services as the existing PCs, only using a centralised server and Thin Client model. No new services were provided for the user. The primary aim was to make the systems indistinguishable from the existing installation of PCs, effectively running a blind test of user acceptance. Once the users accepted the new systems, further machines could be deployed quickly and cheaply. The total cost of ownership and centralised support savings demonstrated in the first two case studies were just as relevant in the third case study.

6 CONCLUSION
There is considerable literature in support of Thin Client technology, and while there may be debate regarding the finer points of its advantages, the issue has been, and continues to be, one of acceptance. Acceptance of Thin Clients as a technology is often confused with the non-technical issues arising from the deployment. The UTAUT helps distinguish between technical and non-technical issues and, as shown in our case studies, the way in which the technology was presented to the user had a greater impact on acceptance than the technology itself. This point is highlighted by the fact that the Thin Client devices which were not widely used in the first case study were integrated seamlessly into the third case study.

These three case studies provide a data-centric analysis of user acceptance and identify the evolving designs of our deployments. To gain acceptance of Thin Clients within an educational institute, our case studies identified the following key factors:
1) Locate the Thin Clients among the existing PC systems; do not separate or isolate them.
2) Ensure that the login process and user credentials are identical to those of the existing PC systems.
3) Ensure that the storage options are identical to those of the existing PC systems.
4) Focus on providing exactly the same services that already exist, as opposed to focusing on new services.
By running a blind test on the user population, in which Thin Clients co-existed with PC systems and the services offered were indistinguishable to the user, we were able to show a user satisfaction rating of 92%. No significant bias was evident in our comparison of user attitudes towards desktop services delivered over PCs and Thin Clients.

7 FUTURE WORK
Additional case studies are planned which will focus on the acceptance of Thin Clients within the academic staff population and will evaluate the relevance of some of the proposed core technological advantages within that environment, such as session mobility, Desktop as a Service, dynamic lab reconfiguration, and remote access over WAN rather than just LAN environments.

8 REFERENCES
[1] V. Venkatesh, M.G. Morris, G.B. Davis, and F.D. Davis, "User acceptance of information technology: Toward a unified view," MIS Quarterly, 2003, pp. 425-478.
[2] J.D. Northcutt, "CYB Newslog - Toward Virtual Computing Environments."
[3] D. Tynan, "Think thin," InfoWorld, Jul. 2005.
[4] S.J. Yang, J. Nieh, M. Selsky, and N. Tiwari, "The Performance of Remote Display Mechanisms for Thin-Client Computing," in Proceedings of the 2002 USENIX Annual Technical Conference, 2002.
[5] T. Richardson, F. Bennett, G. Mapp, and A. Hopper, "Teleporting in an X window system environment," IEEE Personal Communications Magazine, vol. 1, 1994, pp. 6-13.
[6] Citrix Systems, "Citrix MetaFrame 1.8 Backgrounder," Jun. 1998.


[7] Microsoft Corporation, "Remote Desktop Protocol: Basic Connectivity and Graphics Remoting Specification," Technical White Paper, Redmond, 2000.
[8] T. Richardson, Q. Stafford-Fraser, K. Wood, and A. Hopper, "Virtual network computing," IEEE Internet Computing, vol. 2, 1998, pp. 33-38.
[9] Microsoft Corporation, "Microsoft Windows NT Server 4.0, Terminal Server Edition: An Architectural Overview," Jun. 1998.
[10] J. Nieh, S.J. Yang, and N. Novik, "A comparison of thin-client computing architectures," Network Computing Laboratory, Columbia University, Technical Report CUCS-022-00, 2000.
[11] R.A. Baratto, L.N. Kim, and J. Nieh, "THINC: A virtual display architecture for thin-client computing," Proceedings of the Twentieth ACM Symposium on Operating Systems Principles, ACM, New York, NY, USA, 2005, pp. 277-290.
[12] B.K. Schmidt, M.S. Lam, and J.D. Northcutt, "The interactive performance of SLIM: a stateless, thin-client architecture," Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, Charleston, South Carolina, United States: ACM, 1999, pp. 32-47.
[13] M. Annamalai, A. Birrell, D. Fetterly, and T. Wobber, Implementing Portable Desktops: A New Option and Comparisons, Microsoft Corporation, 2006.
[14] "MokaFive, Virtual Desktops," http://www.mokafive.com/.
[15] R. Chandra, N. Zeldovich, C. Sapuntzakis, and M.S. Lam, "The Collective: A cache-based system management architecture," Proceedings of the 2nd USENIX Symposium on Networked Systems Design and Implementation (NSDI '05).
[16] M. Weiser, "How computers will be used differently in the next twenty years," Proceedings of the 1999 IEEE Symposium on Security and Privacy, 1999, pp. 234-235.
[17] M. Jern, ""Thin" vs. "fat" visualization clients," Proceedings of the Working Conference on Advanced Visual Interfaces, L'Aquila, Italy: ACM, 1998, pp. 270-273.
[18] S. Kissler and O. Hoyt, "Using thin client technology to reduce complexity and cost," Proceedings of the 33rd Annual ACM SIGUCCS Conference on User Services, ACM, New York, NY, USA, 2005, pp. 138-140.
[19] M. Jern, ""Thin" vs. "fat" visualization clients," L'Aquila, Italy: ACM, 1998, pp. 270-273.
[20] S. Kissler and O. Hoyt, "Using thin client technology to reduce complexity and cost," New York, NY, USA: ACM, 2005, pp. 138-140.

[21] J. Golick, "Network computing in the new thin-client age," netWorker, vol. 3, 1999, pp. 30-40.
[22] S.C. Speer and D. Angelucci, "Extending the Reach of the Thin Client," Computers in Libraries, vol. 21, 2001, pp. 46-.
[23] P.A. Strassmann, "5 Secure Reasons for Thin Clients," Baseline, 2008, p. 27.
[24] G.A. Plan, "An inefficient truth," PC World, 2007.
[25] M. Bruno-Britz, "Bank Sheds Pounds," Bank Systems & Technology, vol. 42, 2005, p. 39.
[26] "Sun Ray White Papers," http://www.sun.com/sunray/whitepapers.xml.
[27] B.K. Schmidt, M.S. Lam, and J.D. Northcutt, "The interactive performance of SLIM: a stateless, thin-client architecture," Charleston, South Carolina, United States: ACM, 1999, pp. 32-47.
[28] S. Potter and J. Nieh, "Reducing downtime due to system maintenance and upgrades," San Diego, CA: USENIX Association, 2005, pp. 66.
[29] I. Wong-Bushby, R. Egan, and C. Isaacson, "A Case Study in SOA and Re-architecture at Company ABC," 2006, p. 179b.
[30] G. Reynolds and M. Gleeson, "Towards the Deployment of Flexible and Efficient Learning Tools: The Thin Client," Proceedings of the 4th China-Europe International Symposium on Software, Sun Yat-Sen University, Guangzhou, China, 2008.
[31] "rdesktop: A Remote Desktop Protocol client."
[32] B. Childers, "PXE: not just for server networks anymore!," Linux Journal, vol. 2009, 2009, p. 1.
[33] R. Likert, "A Technique for the Measurement of Attitudes," Archives of Psychology, vol. 140, 1932, pp. 1-55.


AN INTERACTIVE COMPOSITION OF WORKFLOW APPLICATIONS BASED ON UML ACTIVITY DIAGRAM
Yousra Bendaly Hlaoui, Leila Jemni Ben Ayed Research Unit of Technologies of Information and Communication Tunis, Tunisia Yousra.bendalyhlaoui@esstt.rnu.tn Leila.jemni@fsgt.rnu.tn

ABSTRACT
In today's distributed applications, the semi-automatic and semantic composition of workflows from Grid services is becoming an important challenge. We focus in this paper on how to model and interactively compose workflow applications from Grid services without considering the lower-level description of the Grid environment. To reach this objective, we propose a Model-Driven Approach for developing such applications, based on semantic and syntactic descriptions of the services available on the Grid as well as on the abstract description provided by the UML activity diagram language. As there are particular needs for interactively modelling workflows composed from Grid services, we propose to extend the UML activity diagram notation. These extensions carry additional information allowing an interactive and semi-automatic composition of workflows. In addition, this domain-specific language contains appropriate data to describe the matched Grid services that are useful for the execution of the obtained workflows.
Keywords: Grid services, Interactive, semantic, composition, Workflow application, UML activity diagrams.

1 INTRODUCTION

Today's distributed applications [23] are developed by integrating web or Grid services [13, 14] into a workflow. Due to the very large number of available services and the existence of different possibilities for constructing a workflow from matching services, building such applications is usually a non-trivial task for a developer. The problem requires finding and orchestrating appropriate Grid services in a workflow. Therefore, we propose an approach that allows the semi-automatic and interactive composition of workflow applications from Grid services. To describe and model workflow applications we use UML [25] activity diagrams. Recently, several solutions have been proposed to compose applications from Grid services, such as the works presented in [8, 17, 18]. However, the proposed solutions require user interaction and guidelines or rules in the design of the composed applications. Consequently, the resulting source code is neither reusable nor does it promote dynamic adaptation facilities as it should. For applications composed of Grid services, we need an abstract view not only of the offered services but also of the resulting application [31].

This abstraction allows, on the one hand, the reuse of the elaborated application and, on the other, reduces the complexity of the composed applications. There are several architectural approaches for distributed computing applications [22] which ease the development process. However, these approaches need rigorous development methods to promote the reuse of components in future Grid application development [16]. It has been proven from past experience that using structured engineering methods eases the development process of any computing system and reduces the complexity when building large Grid applications [22]. To reduce this complexity and allow the reuse of Grid service applications, we adopt a model-driven approach [24]. Thus we introduce in this paper a new approach to build workflow applications interactively by following the OMG's principles of MDA in the development process [2, 3, 4]. In this approach [2, 3, 4], our focus is to compose and model workflows from existing Grid services, which represents the main aspect of the development of Grid service applications. The workflow modelling identifies the control and data flows from one depicted Grid service operation to the next in order to build and compose the whole application.


To model and express the composed workflow of Grid services, we use the activity diagrams of UML [25] as the abstract language. The provided model forms the Platform Independent Model (PIM) of the proposed MDA approach. This model is more understandable for the user than XML-based [35] workflow description languages such as BPEL4WS [15], which represent the Platform Specific Model (PSM). This paper is organised as follows. Section 2 presents the related work. Section 3 introduces the different components of the composition system; Section 4 specifies our proposed UML profile, composition patterns and the different steps of the interactive composition process. Finally, Section 5 concludes the paper and proposes areas for further research.

2 RELATED WORK

Many works have been carried out in the field of Grid and Web service composition, such as those presented in [8, 17, 18, 19, 20, 28, 29, 30]. In [28] the authors were interested in the semi-automatic composition of web services and proposed a validation approach based on the semantic descriptions of services and on a logic-based language to describe and validate the resulting composite Web services. However, the resulting composed web service is not clear for a user who is not familiar with logic-based languages. In our contribution, we propose a solution not only to compose workflows from available Grid services, but also to provide graphical and comprehensible models of the resulting workflows. In the same framework, the authors in [29] proposed a composition approach for Web services based on Symbolic Transition Systems (STS). They developed a sound and complete logical approach for identifying the existence of an available composition, and they emphasised the abstract representation of the composition request (the goal of the composition) and the representation of the resulting composite Web service. For the representation, the authors used UML state machine diagrams [25], which are suitable only for describing a sequence of component services, without addressing the other forms of matching services in a workflow such as parallel branches or and-branches. On the other hand, the UML activity diagrams that we use in our modelling approach support all kinds of workflow composition patterns [10], such as parallelism, split and fork. The authors in [19, 20, 30] proposed a Model-Driven Approach for composing Web services manually. They based their work on UML activity diagrams to describe the composite Web service and on UML class diagrams to describe each available Web service. The user depicts the suitable Web service and matches it into the workflow representing the composite Web service using UML activity diagrams.

This approach would have been better if the composition were elaborated automatically, since the number of available services keeps increasing and there are several forms and manners in which to compose such services. Based on a domain ontology description, we lead the user through the composition process. We also provide the user with a graphical interface based on a domain-specific UML language for automatic Grid service composition. This UML profile [5] is based on stereotypes, tagged values and workflow patterns [5] that we propose to ensure the automatic composition. In the field of Grid service composition, the most closely related work is that presented by Gubala et al. in [8, 17, 18]. In this work, the authors developed a tool for the semi-automatic and assisted composition of scientific Grid application workflows. The tool uses domain-specific knowledge and employs several levels of workflow abstraction in order to provide a comprehensive representation of the workflow for the user and to lead him through the process of possible solution construction, dynamic refinement and execution. The originality of our contribution is that, firstly, we spare the user the effort of dynamic refinement and execution, as we propose a Model-Driven Approach which separates the platform-specific model from the platform-independent model. Secondly, we use UML activity diagrams to deliver the functionality in a more natural way for the human user. The use of UML activity diagrams for the description of workflow applications is advocated in several works, such as those presented in [1, 10, 12, 27]; their advantage is that they provide an effective visual notation and facilitate the analysis of workflow composition. In our approach, we propose a UML profile for systematically composing a workflow application from Grid services [5].

3 THE INTERACTIVE WORKFLOW COMPOSITION SYSTEM

The system allows an interactive and semantic composition of workflows from Grid services. As shown in Figure 1, the system is composed of three components: a Grid services workflow composer, an ontological Grid services registry and a workflow execution system, which we also call the activity machine.

3.1 The Grid services workflow composer
This component is itself composed of three tools: the composition tool, the transformation tool and the verification tool.

3.1.1 The composition tool
It provides a graphical interface in the form of a UML activity diagram editor, allowing the user an interactive, systematic and semantic workflow composition [6].


Figure 1: Different components of the workflow composition system

This composition is based on the composition process detailed in Section 4.3. In the Grid registry, services are described in an ontological form with statements regarding a service operation's inputs, outputs, pre-conditions and effects (the IOPE set) [26]. Through these notions, the composition system is able to match different Grid service operations into a workflow following a reverse traversal approach. Thus, by associating the required data with the produced output, the composer constructs a data flow between Grid service operations using our workflow composition patterns and UML profile [5]. The composer may also use a specific notion of effect that can bind two operations together with a non-data dependency. If the Grid registry fails to find the right operation, the composition process stops; otherwise, the composition process stops when all workflow dependencies are resolved. The request is sent to the ontological Grid registry in the form of a SPARQL query [34] (an illustrative query is sketched at the end of Section 3.1). This language provides higher-level access to the knowledge transcribed in the ontology for the automatic discovery and semantic matching of services. Once the workflow model is built, it should be validated and verified to ensure its reliability before being executed and reused as a sub-workflow.

3.1.2 The transformation tool
To support the verification and the execution of workflow models described in UML activity diagrams (UML-AD), the transformation tool translates the activity diagram into a Hyper-Graph (HG). This HG is in turn translated by the transformation tool into a NuSMV input file according to the relevant semantics. The details of these semantics are not central to this paper, but could be made available.

3.1.3 The verification tool
Checking errors in design models like UML activity diagrams is essential, since correcting an error while the system is live and running is usually very costly [21]. Consequently, errors in workflow activity diagram models should be spotted and corrected as early as possible [6]. Several techniques are used in the field of behavioural design verification, such as theorem proving and model checking [11]. The latter is the most useful because it is fully automatic and gives feedback when errors are detected: it verifies whether a given finite state machine satisfies a given property specified in temporal logic [9]. For activity diagrams, symbolic model checking has proven to be an efficient verification technique [11]. Thus, our verification tool is based on the NuSMV symbolic model checker [9], which supports the strong fairness property that must be verified in a workflow model to obtain realistic results. With the model checker, arbitrary propositional requirements can be checked against the input model. If a requirement fails to hold, an error trace is returned by the model checker.
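As referenced above, the composer queries the registry with SPARQL. The query text is not listed in the paper; purely as an illustration, a request for operations producing a required output concept might be issued with the rdflib library as follows. The property and class names are assumptions, loosely modelled on OWL-S, and are not the authors' actual ontology terms.

```python
#!/usr/bin/env python
# Sketch: query an RDF service registry for operations producing a given output type.
from rdflib import Graph

REQUIRED_OUTPUT = "http://example.org/ontology#PollutionEmission"  # hypothetical concept

QUERY = """
PREFIX svc: <http://example.org/service#>
SELECT ?service ?operation WHERE {
    ?service   svc:hasOperation ?operation .
    ?operation svc:hasOutput    ?output .
    ?output    svc:parameterType <%s> .
}
""" % REQUIRED_OUTPUT

graph = Graph()
graph.parse("registry.rdf")   # RDF descriptions of the available services (hypothetical file)
for service, operation in graph.query(QUERY):
    print(service, operation)
```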


The transformation tool systematically translates the error trace into an activity diagram trace by highlighting the corresponding path in the given activity diagram.

3.2 The Grid registry
During the workflow composition process, the Grid registry provides the composer with descriptions of the services available at that moment and provides reasoning capabilities to enable proper matchmaking of service inputs and outputs. The Grid registry [6] is an ontological distributed repository of Grid services and sub-workflows. This registry is responsible for storing and managing documents containing descriptions of the syntax and semantics of services and their operations, expressed in RDF files [33]. The semantic Web provides technologies which support automated knowledge sharing; in particular, several existing initiatives such as OWL-S [26] show that ontologies have a key role in automating service discovery and composition. That knowledge is based on semantic descriptions of service classes published by the service developers and provided in the Grid environment [16]. Our Grid registry is based on an ontological description of services and workflows. The service ontology [7] provides concepts and properties that allow the description and matchmaking of available services. A part of this ontology is common to all services and is based on the standard semantic web service description ontology OWL-S [26], which enables interoperability with existing services. Apart from the common ontology, there is a domain-specific part of the ontology. The domain service ontology [7] allows users to extend the common ontology schema in order to provide a better specification of services as well as of their inputs and outputs. For these we define a data ontology [7] which provides concepts and properties for describing service inputs and outputs. Ontology alignment [7] is a process for finding semantic relationships among the entities of ontologies. Its main activity is to find similar concepts in the ontologies being aligned, in order to map them. The measures for similarity computation can be divided into two general groups, namely lexical measures and structural measures. Lexical measures are based on surface similarity, such as the title or label of entities. In contrast, structural measures try to recognise similarities by considering the structure of the ontology graphs. The most advanced similarity algorithms use a combination of multiple similarity measures to obtain more information about concept similarity. In our Grid registry, we adopt an approach using a combination of lexical and structural similarity [7]: we use lexical similarity measures for mapping the domain ontology as an initial selection, and the selection is then refined using a structural similarity method [7].
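A toy sketch of the combined lexical/structural scoring idea follows; the weights and the structural measure are invented for illustration only, and the registry's actual algorithm is the one described in [7].

```python
#!/usr/bin/env python
# Sketch: combine a lexical and a structural similarity score for two ontology concepts.
from difflib import SequenceMatcher

def lexical_similarity(label_a, label_b):
    # Surface similarity of the concept labels/titles.
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def structural_similarity(neighbours_a, neighbours_b):
    # Jaccard overlap of neighbouring concepts in the two ontology graphs.
    a, b = set(neighbours_a), set(neighbours_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(concept_a, concept_b, w_lex=0.6, w_struct=0.4):
    return (w_lex * lexical_similarity(concept_a["label"], concept_b["label"]) +
            w_struct * structural_similarity(concept_a["neighbours"], concept_b["neighbours"]))

if __name__ == "__main__":
    a = {"label": "TrafficFlowFile", "neighbours": ["Traffic", "File"]}
    b = {"label": "TrafficFlowData", "neighbours": ["Traffic", "Data"]}
    print(round(combined_similarity(a, b), 3))
```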

3.3 The workflow execution system
The reliable workflow model is sent to the workflow execution system [6], which produces implementation code for handling control flow and data flow. The activity diagram describing the workflow model is translated into a specific XML file which becomes the input of the execution system. The workflow execution system executes the different workflow activities specified in the workflow XML document in the correct order and with their required input and output data. The execution of an activity corresponds to the invocation of a Grid service operation. The workflow execution system monitors these activities using the tagged-value information expressed in the activities, but does not perform them. An activity of the activity diagram modelling the workflow represents a state of the workflow execution system in which the system waits for an invoked Grid service operation to complete its work. Hence, the semantics of activity diagrams defined for the verification describe the behaviour of the execution system. When the system enters a state relative to an invoked Grid service node or activity ai, it invokes a piece of behaviour that is executed by the service or the system environment. While activity ai is active, the system waits for the termination event of the invoked piece of behaviour. When the termination event occurs, the system reacts by executing the outgoing edge E: it leaves E's sources and enters E's targets, and the execution process continues for the other activity nodes until the final node is reached.
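Read operationally, the behaviour described above amounts to a simple wait-and-advance loop over the activity graph. The following is a highly simplified sketch of that idea only; the real execution system also handles data flow, joins and monitoring, which are omitted here.

```python
#!/usr/bin/env python
# Simplified sketch of the execution semantics: invoke, wait for termination, follow the edge.

def execute_workflow(initial_activity, outgoing_edge, invoke_operation):
    """initial_activity: first activity node; outgoing_edge(activity) -> successor
    (or None at the final node); invoke_operation(activity) calls the Grid service
    operation bound to the activity and blocks until its termination event."""
    activity = initial_activity
    while activity is not None:
        invoke_operation(activity)          # the system waits while the activity is active
        activity = outgoing_edge(activity)  # leave the edge's source, enter its target

if __name__ == "__main__":
    # Toy run over a two-step sequence.
    successors = {"GridService1Operation1": "GridService2Operation1",
                  "GridService2Operation1": None}
    execute_workflow("GridService1Operation1",
                     successors.get,
                     lambda a: print("invoking", a))
```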


4 UML-BASED INTERACTIVE COMPOSITION OF WORKFLOWS FROM GRID SERVICES

In order to match and compose different Grid service operations, we need to analyse the constructs of workflow models at a higher abstraction level. Since UML [25] is the core of the MDA [24], we use its activity diagram language to model composed workflows. The composition system provides the user with a graphical interface to compose a request using a UML profile specific to the domain of systematically composing workflows from Grid services.

4.1 UML Profile for composing workflows
In this section, we present our UML profile, which is based on a Domain Specific Language (DSL) customising UML activity diagrams for the systematic composition of workflows from Grid services [5]. In our DSL (see Figure 2), an activity of a UML activity diagram represents a Grid service operation, while object flows represent the types of results which flow from one activity to another.


Effects binding two operations are represented with control flows [5]. The name of an activity in the diagram represents the name of the Grid service operation. This name must be specified, as a Grid service can have more than one operation, often called an interface, specified in its relative WSDL file [32]. There are two different types of activities: yet-unresolved activities and established activities of the composed workflow. The former represent the need for a Grid service operation to be inserted in order to complete the workflow, while the latter represent abstract operations that are already included in the workflow. As there are two different activity types in a Grid service workflow model, an activity needs to be typed and specified. To fulfil this, we propose to use the DSL modelling element invoke to stereotype an established activity, which is used to invoke an external Grid service operation, and yet-unresolved to stereotype activities which are not yet resolved. Object nodes of an established activity are stereotyped data. Unknown inputs and outputs of a yet-unresolved activity are stereotyped unknown. In our UML profile, an object node can be related to a final node, as a composed workflow of a Grid application should always deliver a result.

4.2 UML-AD composition patterns
We identify, in this section, how UML activity diagrams support some of the basic Grid service composition patterns [5].

These patterns are essential in the systematic building of workflow applications from Grid services. The use of these patterns depends on the number of the depicted Grid service operations and their inputs and outputs [5]. These operations are the results of the semantic search elaborated by the ontological Grid services registry. This search is invoked by a request issued by the composition system in order to complete an unresolved activity in the workflow. The Grid service registry provides zero, one or more operations producing the intended output, and the depicted operations are inserted in the workflow interactively with the user.

4.2.1 Sequence Pattern
When the Grid registry provides one Grid service operation that is able to produce the required result, or the user selects one operation from the provided operation set, the composition system uses the sequence pattern to insert the operation in the workflow. In this case, and as illustrated in Figure 3, a single abstract operation or activity (e.g. GridService1Operation1) is inserted in the workflow model described in the UML-AD language. This operation may also require some data for itself (e.g. GridService1Operation1Input) and thus may introduce a new unresolved dependency (e.g. the yet-unresolved stereotyped activity). We therefore use a follow-up method to build a simple pipeline-type sequential workflow: a sequence pattern.

Figure 2: Meta-model of Grid service workflow composition specific UML activity diagram language
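Informally, the meta-model of Figure 2 can be read as a small data model. The following sketch uses class and field names of our own choosing to mirror the stereotypes described in Section 4.1; it is not an excerpt from the profile itself.

```python
#!/usr/bin/env python
# Informal sketch of the DSL elements: activities stereotyped <<invoke>> or
# <<yet-unresolved>>, object nodes stereotyped <<data>> or <<unknown>>.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ObjectNode:
    name: str
    stereotype: str = "data"          # "data" for established, "unknown" otherwise

@dataclass
class Activity:
    operation_name: str               # the Grid service operation (WSDL interface) name
    stereotype: str = "invoke"        # "invoke" or "yet-unresolved"
    inputs: List[ObjectNode] = field(default_factory=list)
    outputs: List[ObjectNode] = field(default_factory=list)

@dataclass
class Flow:
    source: Activity
    target: Activity
    carries: Optional[ObjectNode] = None   # object flow if set, control flow otherwise

if __name__ == "__main__":
    op = Activity("GridService1Operation1",
                  inputs=[ObjectNode("GridService1Operation1Input", "unknown")],
                  outputs=[ObjectNode("DataOutput")])
    print(op)
```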


A sequence pattern is composed of sequential activities related by control flow (non-data operation dependency) or object flow (data operation dependency).

Figure 3: The sequence pattern

4.2.2 And-branches pattern
The and-branches pattern is introduced when the inserted operation, represented by an abstract UML activity, has more than one input. This pattern is based on the Synchronization pattern presented in [9]. The pattern starts with object nodes, representing the operation inputs, which flow to a join node; the join node is linked to the abstract Grid service operation. This operation introduces some unresolved dependencies into the workflow. Semantically, several service instances are invoked in parallel threads and the join waits for all flows to finish. As illustrated in Figure 4, the operation of the Grid service GridService1Operation1 needs two input data items, GridService1Operation1Input1 and GridService1Operation1Input2. The relative pattern produces two parallel threads in the workflow.

Figure 4: And-branches pattern

4.2.3 Alternative branches pattern
When the Grid registry provides more than one operation able to produce the required result, and the user does not select one of them, the composition requires a specific pattern: the alternative branches pattern. This pattern combines the Exclusive Choice and Simple Merge patterns presented in [9]. In this pattern, each alternative service operation is linked to an object node representing the required input, and both of them flow to a merge construct. Semantically, several service instances are invoked in parallel threads and the merge waits only for the first flow to finish. We distinguish, in Figure 5, two different Grid service operations, GridService1Operation1 and GridService2Operation1, providing the same output data DataOutput.

Figure 5: Alternative branches pattern

4.2.4 Alternative services pattern
When composing workflows from Grid services, a specific matching based on semantic comparison may provide two or more different Grid services, each performing the required operation. In such a case, and when the user does not choose one of the depicted Grid service operations, the composition system uses the alternative services pattern to include the operations in the workflow model. In this pattern, the Grid service operation to insert is modelled by a composed super-activity with a specified input data object and a specified output data object (Figure 6). The super-activity is stereotyped AlternativeServiceInstance to indicate that its task may be accomplished by a set of alternative service instances. These alternative service instances are described with sub-activities; the sub-activities are Grid service instances and are thus stereotyped invoke. It is up to the decision mechanism of the workflow execution engine to choose which service instance in a given workflow node is to be invoked and executed. In Figure 6, the data DataOutput is provided by the GridServiceOperation operation, which could be the GridService1Operation1 provider or the GridService2Operation2 provider.

Figure 4: And-branches pattern 4.2.3 Alternative branches pattern When the Grid registry provides more than one operation able to produce the required result, and the user do not select one of them, the composition requires a specific pattern: the alternative branches pattern. This pattern combines the Exclusive Choice and Simple Merge patterns presented in [9]. In this pattern, each alternative service's operation is linked Figure 6: Alternative services pattern
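To make the pattern choice concrete, the following minimal sketch (in Python, not the authors' implementation) shows how a composer could select among the four patterns of Section 4.2, using only the number of candidate operations returned by the registry, the user's optional choice, and the number of inputs of the retained operation. The class and function names (Operation, choose_pattern) are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the authors' implementation) of selecting a
# composition pattern from the number of candidate operations and their inputs.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operation:
    name: str                                        # e.g. "GridService1Operation1"
    inputs: List[str] = field(default_factory=list)  # required input data items
    output: str = ""                                 # produced output data item


def choose_pattern(candidates: List[Operation],
                   user_choice: Optional[Operation] = None) -> str:
    """Name of the composition pattern to apply for one unresolved activity."""
    if not candidates:
        raise ValueError("the registry returned no operation for the required output")
    if user_choice is not None or len(candidates) == 1:
        # A single operation is retained: sequence pattern; if it needs several
        # inputs, its insertion is combined with the and-branches pattern.
        op = user_choice if user_choice is not None else candidates[0]
        return "and-branches" if len(op.inputs) > 1 else "sequence"
    # Several operations can produce the required output and the user did not
    # decide: branch over alternative operations (4.2.3) or defer the choice of
    # the service instance to the execution engine (4.2.4).
    return "alternative-branches or alternative-services"


# Example: two services produce DataOutput and the user does not choose.
ops = [Operation("GridService1Operation1", ["Input1"], "DataOutput"),
       Operation("GridService2Operation1", ["Input2"], "DataOutput")]
print(choose_pattern(ops))
```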


4.3 The composition process
Figure 7 illustrates the scenario of the composition process of workflows from the available Grid services. The composition is based on the domain-specific UML activity diagram language presented in Section 4.1. In the following, we comment on the process steps of the scenario presented in Figure 7.
Step 1: The user builds the composition request by specifying the kind of outcome or result expected from the workflow application execution.
Step 2: The composition system analyses the desired output and sends a SPARQL query to the ontologies of the Grid registry describing the available Grid services. The composer asks the registry for a Grid service operation having the specified result as output.
Step 3: If no such operation is found and all unknown results are resolved, the composition process stops.

Step 4: If the required operation is found, the system displays its characteristics to the user to confirm the choice. The registry may provide more than one operation; in that case the user can choose the operation to insert into the workflow model from the given list. If the user does not select an operation, the system inserts all the returned operations using one of the composition patterns presented in Section 4.2. Depending on the number of retrieved operations and on their inputs and outputs, the composer chooses the appropriate composition pattern.
Step 5: For each input of the inserted operation, the system defines one unresolved dependency, i.e. a workflow activity which is not yet established and which depends on some Grid service operation. For each unresolved dependency the composer asks the user whether to continue the composition process. If the response is positive, the composer re-executes the process from Step 2 to resolve the current unresolved dependency.
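As a reading aid, the steps above can be summarised by the following sketch. It is only an illustration under assumed names: the registry endpoint, the ontology properties (ex:producesOutput, ex:requiresInput) and the helpers execute_sparql, insert_with_pattern and user.select are placeholders, not the interfaces of our prototype.

```python
# Illustrative sketch of Steps 1-5 of the composition process; all identifiers
# (endpoint, ontology properties, helper objects) are assumptions for the example.
from typing import List

REGISTRY_ENDPOINT = "http://registry.example.org/sparql"    # hypothetical endpoint


def execute_sparql(endpoint: str, query: str) -> List[dict]:
    # Stand-in for an HTTP SPARQL client; it returns an empty result set so that
    # the sketch stays self-contained.
    return []


def find_operations(required_output: str) -> List[dict]:
    """Step 2: query the ontological Grid registry for operations producing the output."""
    query = f"""
    PREFIX ex: <http://example.org/grid-ontology#>
    SELECT ?operation ?input WHERE {{
        ?operation ex:producesOutput ex:{required_output} .
        OPTIONAL {{ ?operation ex:requiresInput ?input . }}
    }}
    """
    return execute_sparql(REGISTRY_ENDPOINT, query)


def compose(workflow, desired_output: str, user) -> None:
    unresolved = [desired_output]                 # Step 1: the composition request
    while unresolved:
        target = unresolved.pop()
        candidates = find_operations(target)      # Step 2
        if not candidates:
            continue                              # Step 3: nothing found for this branch
        chosen = user.select(candidates)          # Step 4: confirm, or keep all alternatives
        new_inputs = workflow.insert_with_pattern(candidates, chosen)
        for data_item in new_inputs:              # Step 5: new unresolved dependencies
            if user.wants_to_continue(data_item):
                unresolved.append(data_item)
```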

Figure 7: Scenario of the interactive composition of Grid service workflows based on UML activity diagrams


5 Illustration of the interactive workflow composition

In the following, we illustrate the composition process with an example from the domain of city traffic pollution analysis. This application, as presented in [23], targets the computation of traffic air pollutant emissions in an urban area.
Step 1: Figure 8 shows an example of an initial workflow that represents a composition request for the results of the pollutant emission due to city traffic. The desired result, PollutionEmission, is described by the rectangle representing the object node in the corresponding activity diagram.

Figure 8: Initial workflow as a composition request

Step 2: Figure 9 represents the workflow of the computation of traffic air pollution analysis after one composition step. The service operation delivering the PollutionEmission result is AirPollutionEmissionCalculator. This operation is the result of the composer's query to the ontological Grid registry. The operation requires two inputs, TrafficFlowFile and PathsLengthFile, and thus introduces two unresolved dependencies into the activity diagram modelling the composed workflow.

Figure 9: An example of workflow after one step of composition

Step 3: For every dependency that needs to be resolved, i.e. a yet-unresolved activity, the composer contacts the ontological registry in order to find suitable service operations that may produce the required result. The services are described in an ontological form with statements regarding the service operations' inputs, outputs, preconditions and effects (the IOPE set) [26]. Through these notions, the composer system is able to match different operations into a workflow following a reverse traversal approach. Thus, by associating the required data with the produced output, the composer constructs a data flow between operations using workflow patterns and our UML profile [5]. The composer may also use a specific notion of effect that binds two operations together with a non-data dependency. In [10], five basic control patterns were defined to be supported by all workflow languages and workflow products: the Sequence, Parallel Split, Synchronization, Exclusive Choice and Simple Merge patterns. Figure 10 represents the city traffic analysis workflow after the full composition activity. It involves several Grid service operations, sequence branches, parallel split branches, simple merge branches and a loop [5]. The loop is involved in the workflow diagram because the application iterates in order to analyze the possible traffic. The figure also shows how UML activity diagrams support the five basic patterns in the specific domain of composing Grid service workflows [5]. In the example, some object nodes or input data, such as VehiculeType and StartZoneId, are given by the user of the application; they do not have an operation provider. This illustrates the interaction between our composition system and the user.
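The reverse traversal can be pictured with a toy example; the dictionary below is an assumed, simplified stand-in for the IOPE descriptions of the registry (inputs and outputs only), not our ontology.

```python
# Toy illustration (assumed data, simplified to inputs/outputs only) of the
# reverse-traversal matching: start from the required data, look up operations
# whose declared output produces it, then recurse on their inputs.
from typing import Dict, List, Tuple

# operation name -> (required inputs, produced output)
REGISTRY: Dict[str, Tuple[List[str], str]] = {
    "AirPollutionEmissionCalculator": (["TrafficFlowFile", "PathsLengthFile"], "PollutionEmission"),
    "TrafficFlowSimulator": (["VehiculeType", "StartZoneId"], "TrafficFlowFile"),
}


def resolve(required: str, given_by_user: List[str]) -> List[str]:
    """Operations needed to produce `required`, found by traversing backwards."""
    if required in given_by_user:
        return []                                 # the user supplies this data directly
    for op, (inputs, output) in REGISTRY.items():
        if output == required:
            chain = [op]
            for needed in inputs:                 # recurse on the operation's own inputs
                chain += resolve(needed, given_by_user)
            return chain
    return []                                     # left unresolved, i.e. to the user


print(resolve("PollutionEmission", ["VehiculeType", "StartZoneId", "PathsLengthFile"]))
# -> ['AirPollutionEmissionCalculator', 'TrafficFlowSimulator']
```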

Figure 10: The workflow application after the full composition


6 CONCLUSION

In this paper, we have presented an approach for interactively composing workflows from Grid services [2, 3, 4, 6]. This composition is based on a UML profile that customizes UML activity diagrams for composing and modelling workflows [5], as well as on composition patterns [5]. The interactive composition process was illustrated with an example from the domain of city traffic pollution analysis [23]. We have developed and implemented most of the presented components of the composition system. We are currently working on the implementation of the workflow execution system, which invokes and executes the selected Grid service instances and manages the control and data flows in a run-time environment according to the semantics of our proposed activity diagrams.

7 REFERENCES

[1] R. Bastos, D. Dubugras, A. Ruiz: Extending UML Activity Diagram for Workflow Modelling in Production Systems, In Proc. of the 35th Annual Hawaii International Conference on System Sciences, HICSS'02, IEEE CS Press (2002).
[2] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: Toward an UML-based composition of Grid services workflows, In Proc. of the 2nd International Workshop on Agent-Oriented Software Engineering Challenges for Ubiquitous and Pervasive Computing, AUPC'08, ACM Digital Library, pp. 21-28 (2008).
[3] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: An extended UML activity diagram for composing Grid services workflows, In Proc. of the IEEE International Conference on Risks and Security of Internet and Systems, CRiSIS'08, Tozeur, Tunisia, pp. 207-212 (2008).
[4] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: A MDA approach for semi-automatic Grid services workflows composition, In Proc. of the IEEE International Conference on Industrial Engineering and Engineering Management, IEEM'08, pp. 1433-1437 (2008).
[5] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: Patterns for Modeling and Composing Workflows from Grid Services, In Proc. of the 11th International Conference on Enterprise Information Systems, ICEIS'2009, Milan, Italy, LNBIP, Springer-Verlag, Vol. 24, pp. 615-626 (2009).
[6] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: An Interactive Composition of UML-AD for the Modelling of Workflow Applications, In Proc. of the 4th International Conference on Information Technology, ICIT'2009, Amman, Jordan (2009).
[7] Y. Bendaly Hlaoui, L. Jemni Ben Ayed: Ontological Description of Grid Services Supporting Automatic Workflow Composition, In Proc. of the International Conference on Web and Information Technologies, ICWIT'2009, Kerkennah, Tunisia, ACM SIGAPP.fr, IHE éditions, pp. 233-243 (2009).
[8] M. Bubak, R. Guballa, M. Kapalka, M. Malawski, K. Rycerz: Workflow Composer and service registry for grid applications, Journal of Future Generation Computer Systems, Vol. 21, pp. 79-86 (2005).
[9] A. Cimatti, E. Clarke, A. Tacchella: NuSMV version 2: An open-source tool for symbolic model checking, In Proc. of the International Conference on Computer-Aided Verification, CAV'02, Lecture Notes in Computer Science, Springer-Verlag (2002).
[10] M. Dumas and A. H. M. ter Hofstede: UML Activity Diagrams as a Workflow Specification Language, In UML'2001 Conference, Toronto, Ontario, Canada, Lecture Notes in Computer Science (LNCS), Springer-Verlag, Heidelberg, Germany (2001).
[11] R. Eshuis: Semantics and Verification of UML Activity Diagrams for Workflow Modelling, PhD thesis, University of Twente (2002).
[12] R. Eshuis and R. Wieringa: Comparing Petri net and activity diagram variants for workflow modelling: A quest for reactive Petri nets, Petri Net Technology for Communication-Based Systems, LNCS, Springer-Verlag (2003).
[13] I. Foster, D. Berry, A. Djaoui, A. Grimshaw, B. Horn, H. Kishimoto, F. Maciel, A. Savy, F. Siebenlist, R. Subramaniam, J. Treadwell, J. Von Reich: The Open Grid Services Architecture, Version 1.0 (2004).
[14] I. Foster, C. Kesselman: Grid Services for Distributed System Integration, Journal of IEEE Computer, Vol. 35, No. 6, pp. 37-46 (2004).
[15] T. Gardner: UML Modelling of Automated Business Processes with a Mapping to BPEL4WS, In Proc. of the European Conference on Object-Oriented Programming (2003).
[16] C. Goble, D. De Roure: The Grid: an application of the semantic web, ACM SIGMOD Record, Special section on semantic web and data management, Vol. 31, No. 4, pp. 65-70 (2002).
[17] T. Gubala, D. Herezlak, M. Bubak, M. Malawski: Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism, In Proc. of the Second IEEE International Conference on e-Science and Grid Computing, e-Science'06 (2006).
[18] R. Guballa, A. Hoheisel, F. First: Highly Dynamic Workflow Orchestration for Scientific Applications, CoreGRID Technical Report TR-0101 (2007).
[19] R. Gronomo, I. Solheim: Towards Modelling Web Service Composition in UML, In Proc. of the 2nd International Workshop on Web Services: Modelling, Architecture and Infrastructure, Porto, Portugal (2004).
[20] R. Gronomo, M. C. Jaeger: Model-Driven Semantic Web Service Composition, In Proc. of the 12th Asia-Pacific Software Engineering Conference, APSEC'05 (2005).
[21] M. Laclavik, E. Gatial, Z. Balogh, O. Habala, G. Nguyen, L. Hluchy: Experience Management Based on Text Notes, In Proc. of the e-Challenges 2005 Conference (2005).
[22] W. Li, C. Huang, Q. Chen, H. Bian: A Model-Driven Aspect Framework for Grid Service Development, In Proc. of the IEEE International Conference on Internet and Web Applications and Services, ICIW'06, pp. 139-146 (2006).
[23] M. Masetti, S. Bianchi, G. Viano: Application of K-Wf Grid Technology to Coordinated Traffic Management, http://grid02.softeco.it/site/projectinfo.html
[24] Model Driven Architecture (MDA), OMG document number omrsc/2001-07-01 (2001).
[25] Object Management Group: UML 2.0 Superstructure Specification, July (2005).
[26] OWL-S: Semantic Markup for Web Services, The OWL Services Coalition, OWL-S version 2.0.
[27] S. Pllana, T. Fahringer, J. Testori, S. Benkner, I. Brandic: Towards an UML Based Graphical Representation of Grid Workflow Applications, In Proc. of the 2nd European Across Grids Conference, Nicosia, Cyprus, Springer-Verlag (2004).
[28] J. Rao, P. Kungas, M. Matskin: Logic-based web service composition: from service description to process model, In Proc. of the IEEE International Conference on Web Services, ICWS 2004, San Diego, California, USA (2004).
[29] E. Sirin, J. Hendler, B. Parsia: Semi-automatic composition of web services using semantic descriptions, In Proc. of the ICEIS-2003 Workshop on Web Services: Modeling, Architecture and Infrastructure, Angers, France (2003).
[30] D. Skogan, R. Gronomo, I. Solheim: Web Service Composition in UML, In Proc. of the 8th International Enterprise Distributed Object Computing Conference, EDOC'04 (2004).
[31] M. Smith, T. Friese, B. Freisleben: Model-Driven Development of Service-Oriented Grid Applications, In Proc. of the IEEE Asia-Pacific Conference on Services Computing, APSCC'06 (2006).
[32] Web Services Description Language (WSDL) 1.1, W3C Note, 15 March (2001).
[33] W3C: Resource Description Framework (RDF) Model and Syntax Specification, report TR/1999/REC-rdf-syntax-19990222 (1999).
[34] W3C: SPARQL Query Language for RDF, W3C report (2008).
[35] M. J. Young: XML Step by Step, Microsoft Press, ISBN 2-84082-812-X (2001).


HOW TO MAP PERSPECTIVES
Gilbert Ahamer, Adrijana Car, Robert Marschallinger, Gudrun Wallentin, Fritz Zobl Institute for Geographic Information Science at the Austrian Academy of Sciences ÖAW/GIScience, Schillerstraße 30, A-5020 Salzburg, Austria gilbert.ahamer@oeaw.ac.at, adrijana.car@oeaw.ac.at, robert.marschallinger@oeaw.ac.at, gudrun.wallentin@oeaw.ac.at, fritz.zobl@oeaw.ac.at

ABSTRACT
"Perspectives" are seen as the basic element of realities. We propose different methods to "map" time, space, economic levels and other perceptions of reality. IT allows views on new worlds. These worlds arise by applying new perspectives to known reality. IT helps to organise the complexity of the resulting views.
Key Words: Geographic Information Science, mapping, time, space, perception.

0. LET'S START TO THINK
0.1 Our world is the entirety of perceptions. (Our world is not the entirety of facts.)

Figure 0: The human being perceives the world.

Hence, every individual lives in a different world (Fig. 0).
0.2 The "indivisible unit", the atom (ατομος1) of reality, is equal to one (human) perspective. Our world is made up of a multitude of perceptions, not of a multitude of realities and not of a multitude of atoms (Fig. 1).

Figure 1: The "primordial soup" of living, before the advent of (social) organisms: uncoordinated perspectives, uncoordinated world views.

0.3 In order to share one's own conception with others, "writing" was invented. Similarly, complex structures, such as landscapes, are "mapped". To map means to write structures.

1. WRITING HELPS TO BECOME AWARE
We ask: Is it possible to map = write
1. the distribution of material facts and elements in geometric space? (physics)
2. the distribution of factual events in global time? (history)
3. the distribution of real-world objects across the Earth? (geography)
4. the distribution of elements along material properties? (chemistry)
5. the distribution of growth within surrounding living conditions2? (biology)
6. the distribution of persons acting in relationships? (sociology)
7. the distribution of individuals between advantage and disadvantage? (economics)
8. the distribution of perspectives within feasible mindsets? (psychology)
9. the distribution of living constructs along selectable senses? (theology)
We see: awareness results from reflection (Fig. 2).

Figure 2: Fundamental dimensions, along which to coordinate individual world views when reflecting (axes: elements, objects, events, matter, living conditions, personalities, advantages, perspectives, sense – against x, y, z space, themes and time = t).

                                                            
Footnote 1: what cannot be split any further (Greek).
Footnote 2: životné prostredie (Slovak): living environment.


2. TIME CAN BE
1. an attribute of space (a very simple historic GISystem)
2. an independent entity (Einstein's physics)
3. the source of space (cosmology).
In terms of GIS, item 2.1 is expressed as "t is one of the components of geo-data"i (Fig. 3).

Figure 3: The where-what-when components of geo-data, also known as the triad (Peuquet 2002: 203).

Time can be understood as
• establishing an ordinal scale for events,
• driving changes (= Δ) of realities,
• something that unfortunately does not appear on paper.
A proposed solution is to map changing realities (Δ) instead of mapping time. Time is replaced by what it produces. This is indicated in Fig. 4.
[Figure 4 axes: Δ elements, Δ objects, Δ events, Δ matter (e.g. its path), Δ living conditions, Δ personalities, Δ advantages, Δ perspectives, Δ sense – projected from x, y, z space, themes and time = t.]

Figure 4: The projection of time (t) onto the effects of time (the changes Δ) can apply to any science.

This idea flips (= projects) the t axis onto one of the vertical axes. Time then means: how maps are changed by the envisaged procedures. Such procedures modify the variables along the axes, be they of physical (gravity force) or of social nature (war). A classical example is Minard's map of Napoleon's 1812 campaign into Russia3 (Fig. 5a, b).

Figure 5: Notions of path in a geo-space: (a) Minard's map of human losses during Napoleon's 1812 campaign into Russia; and (b) its geovisualisation in a time cube (Kraak, 2009).

Further examples such as landslides in geology, growth of plants, energy economics and economics will be shown in chapter 7. For implementing the idea of projecting the t axis onto the Δ axis, we need clear insight into how time quantitatively changes reality. In other words, we need a model which (explaining how processes occur) determines the representation of time (Fig. 6). Examples are sliding geology, ΔGDP/cap and plant growth. One cannot perceive time itself (never!), only its effects: what was perceived in this time span (duration)4? This is why the t axis is projected onto another axis denoting the effect of elapsed time; what this means for the individual sciences is shown in Fig. 4. Very similarly, in physics nobody can feel a force, only its effect (deformation, acceleration), and still forces have been an undisputed key concept for centuries. What is time? Just a substrate for procedures. What is space? Just hooks into perceived reality. We retain from this chapter 2 that we need a clear model of how elapsing time changes reality. Then we can map time as suggested: by its effects.
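As a minimal numerical illustration of this chapter's claim (with invented toy values, and with any theme one likes), the "map of the effect of time" is simply the difference of two snapshots of the mapped variable:

```python
# Minimal sketch: mapping the changes Δ instead of mapping time itself.
# The grid values are invented; the theme (height, land cover, GDP, ...) is irrelevant.
import numpy as np

snapshot_t1 = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
snapshot_t2 = np.array([[1.0, 2.5],
                        [2.0, 4.0]])

delta_map = snapshot_t2 - snapshot_t1   # the "effect of time", plotted instead of t
print(delta_map)
```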

                                                            
Footnote 3: Patriotic War (in Russian): Отечественная война.
Footnote 4: T. de Chardin's (1950) concept of durée (French).


3. HOW TO WRITE TIME?
The big picture shows us various examples:
1. as a wheel (see the Indian flag): revolving zodiacs, rounds in stadiums, economic cycles, Kondratieff's waves
2. as an arrow (see Cartesian coordinates): directed processes, causal determinism, d/dt, d²/dt²
3. as the engine for further improvement (evolutionary economics): decrease vs. increase in global income gaps, autopoietic systems, self-organisation
4. as the generator of new structures (institution building, political integration, progressive didactics): new global collaborative institutions, peer review, culture of understanding, self-responsible learning, interculturality
5. as evolving construct (music).
From this chapter 3 we only keep in mind that the concepts to understand and represent time are fundamentally and culturally different.

Figure 6: All data representations require models.

4. HOW TO WRITE SPACE?
The big picture shows us various examples:
1. as a container of any fact and any process (geography and GIS)
2. as a result of human action (landscape planning)
3. as an evolving construct (architecture).
Examples span space as
• received and prefabricated versus
• the final product of one's actions, namely:
1. space as the key notion for one's own science: everything that can be georeferenced means GIS
2. space as the product of human activity
3. expanding space into state space: the entirety of possible situations is represented by the space of all "state vectors", which is suitable only if procedures are smooth.
The main thesis here is: the "effects of time" are structurally similar in many scientific disciplines, and they often imply "changes in structures" too. Information Technology (IT) is already providing scientific tools to visualise such structures.

5. HOW TO MAP SPACE AND TIME?
The detailed picture: it is obvious that a choice must be made for one mode of representation and for one view of one scientific discipline:
1. (x, y; t): cartography, GIS (Fig. 7)
2. (x, y, z; t): geology
3. (x, y, z; vx, vy, vz; t): landslides
4. (x, y, z; biospheric attributes; t): ecology, tree-line modelling
5. (countries; economic attributes; GDP/cap) or (social attributes; structural shifts; elapsing evolutionary time): economic and social facts in the "Global Change Data Base"6 (Fig. 8)
6. perceiving rhythms and structures: (only) these are "worth recognising": music, architecture, fine arts.

Figure 7: Harmonising world views: GIS reunites world views by relating everything to its location (objects seen by geographers in the common x, y, z space–themes–time frame).

Different sciences may have considerably different outlooks on reality (Fig. 8). A humble attitude of recognising facts5 instead of believing in the theories one's own discipline offers can empower people to survive even in the midst of other scientific specialties: Galileo's (1632) spirit: give priority to observation, not to theories! This is the essential advantage of geography as a science: geographers describe realities just as they appear. Such a model-free concept of science has promoted the usefulness of GIS tools to people independent of personal convictions, scientific models or theories.
Figure 8: Different but again internally harmonised world views: explain facts from another angle (objects seen by economists, in the same x, y, z space–themes–time frame).

                                                            
Footnote 5: datum (Latin): what is given (unquestionable).
Footnote 6: This GCDB is described in Ahamer (2001).


6. WHAT IT DOES, DID, AND COULD DO
6.1 IT helps to organise the multitude of views (= perceptions) onto data that are generated by humans:
• IT constructs world views, such as GIS, history, economics, geology, ecology etc.
• IT has already largely contributed to demolishing traditional limitations of space and time:
  o Space: tele(-phone, -fax, -vision), virtual globes (Longley et al., 2001)
  o Time: e-learning, asynchronous web-based communication, online film storage (Andrienko & Andrienko 2006).
6.2 This paper investigates non-classical modes of geo-representation. We would like to point out that there are two already well-established fields that offer solutions for mapping views of space and time (Fig. 9): scientific and information visualisation are branches of computer graphics and user interface design which focus on presenting data to users by means of interactive or animated digital images. The goal of this field7 is usually to improve the understanding of the data presented. If the data presented refer to human and physical environments, at geographic scales of measurement, then we talk about Geovisualisation, e.g. (MacEachren, Gahegan et al. 2004; Dykes, MacEachren et al. 2005; Dodge et al., 2008).

Figure 9: Time series and three spatio-temporal data types (http://www.crwr.utexas.edu/gis/gishydro05/).

6.3 IT could develop tools that are then interchangeable across scientific disciplines, e.g. landslides that may structurally resemble institutional and economic shifts (see 7.1). IT could prompt scientists to also look at data structures from other disciplines. Whatever the disciplines may be, the issues are structures and structural change!

7. EXAMPLES
The authors are members of the "Time and Space" project at their institution named "Geographic Information Science"8, a part of which explores the cognitive, social, and operational aspects of space & time in GIScience. This includes models of both social and physical space and the consequences thereof for e.g. spatial analysis and spatial data infrastructures. We investigate how space and time are considered in these application areas, and how well the existing models of space and time meet their specified needs (see e.g. Fig. 9). This investigation is expected to identify gaps. Analysis of these gaps will result in improved or new spatio-temporal concepts, particularly in support of the above-mentioned application areas.

7.1 Sliding realities: geology
The notion of the path in geography (x, y, t) is extended by the z axis (see item 5.2), which produces a map of "time": Fig. 9 (Zobl, 2009).

Figure 9: Geology takes the (x, y, z; t) world view.

The "effect of time" is sliding (luckily in the same spatial dimensions x, y, z): we take the red axis in Fig. 10. Space itself is sufficiently characteristic for denoting the effects of time.

Figure 10: These effects of time occur in space, most helpfully. Source: Brunner et al. (2003).

                                                            
Footnote 7: http://en.wikipedia.org/wiki/Scientific_Visualization
Footnote 8: The overarching aim of the GIScience Research Unit is to integrate the "G" into Information Sciences (GIScience, 2009).


7.2 Slices of realities: geology
Despite the lucky coincidence that the effect of time (Δx, Δy, Δz) occurs in the same space (x, y, z), we try to produce slices carrying more information (item 5.3) and hence resort to the so-called attributes mentioned in Fig. 9, such as grey shades or colours. The speed of sliding (dx/dt, dy/dt, dz/dt) is denoted both by horizontal offsets and by whitish colours in the "spaghetti" representation (Marschallinger, 2009) of Fig. 11.

Figure 11: The (x, y, z; vx, vy, vz; t) view of a landslide process (shades of grey mean speed v).

7.3 Slide shows
How to map spatial realities that are no longer isotropic displacement vectors of space itself? For the example of changing tree lines in the Alps (Wallentin, 2009), a slide show is used to present the change of growth patterns made up of a multitude of individual agents (= trees = dots in Fig. 12). Moving spatial structures are depicted as a film of structures (item 5.4).

Figure 12: The (x, y, z; biospheric attributes; t) view of the Alpine tree line (above) and its shift induced by climate change as a slide show (below).

In such processes, which involve independent behaviour of autonomous agents (here: trees), it becomes seemingly difficult to apply a transformation of space itself, e.g. d/dt(x, y, z).

7.5 Global deforestation
One key driver of global change is deforestation; it is easy to map as a change of the land use category of a given area (Fig. 13); a minimal sketch follows at the end of this group of subsections.

Figure 13: The (x, y, z; Δ biospheric attributes; t) view of the global deforestation process in megatons of carbon. Above: map of carbon flow; below: time series of GCDB data per nation, symbolically geo-referenced by the location of their capitals.

This representation is analogous to Fig. 11. In both, the focus shifts from maps(t) to maps(t, Δt). Interest includes temporal dynamics: t = colour (above); Δt = height + colour (below), enriching the purely spatial interest. Even if the aim is to enlarge the scope of the information delivered from the static map (Fig. 13 above) to the "dynamic map" (Fig. 13 below), readers will remain unsatisfied because no insight into the dynamic properties of deforestation is provided (Fig. 18). Increasingly, the viewer's focus turns further from "facts" to "changes of facts", to "relationships with driving parameters9" and to (complex social and political) "patterns10".

7.6 Realities beyond slides
But what if the information belongs to the social or economic realm (Fig. 14)? How to depict economic levels, education or policies?

Figure 14: Example for graphic notation: one (hypothesised) parameter per nation.
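Returning to section 7.5, the sketch announced there shows how such a change of land-use category can be derived from two classified rasters; the category codes and arrays are invented for the illustration.

```python
# Deforestation mapped as a change of land-use category between two dates
# (toy rasters and category codes; 1 = forest).
import numpy as np

FOREST = 1
land_use_1990 = np.array([[1, 1, 2],
                          [1, 3, 2],
                          [1, 1, 1]])
land_use_2000 = np.array([[1, 2, 2],
                          [3, 3, 2],
                          [1, 2, 1]])

# True wherever a cell left the forest category between the two dates.
deforested = (land_use_1990 == FOREST) & (land_use_2000 != FOREST)
print(int(deforested.sum()), "cells deforested")
```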

                                                            
Footnote 9: See the suggested scenarios for water demand, water supply and water quality (Ahamer, 2008).
Footnote 10: Patterns: name of the journal of the American Society for Cybernetics (ASC).


7.7 Mapping social processes
Social processes in social organisms can be described by the intensity of four different communicational dimensions (Fig. 15) along time: S = info, A = team, T = debate, B = integration. This type of writing (Fig. 16) resembles a score in musical notation11 and was invented for the web-based negotiation game "Surfing Global Change" (SGC), whose rules are published in Ahamer (2004). The elementary particle of humanity's progress – consensus building – is trained by SGC. In this case, IT contributed to making communication independent of space and time: a web platform enables asynchronous worldwide interaction of participants.

Figure 15: Four basic components of any social procedure: learning information (Soprano S), forming a team (Alto A), debating (Tenor T), and integrating opposing views (Bass B).

Figure 16: A map of social processes in 4 dimensions during a negotiation procedure in a university course: participants show varying activity levels.

8. TRANSFORMATION OF COORDINATES
8.1 All the above examples have shown that
• various "spaces" can be thought of
• it would be suitable to enlarge the notion of "time".
8.2 Suitably, a transformation of coordinates from time to "functional time" may be thought of.
8.3 In chapter 2 we already suggested regarding time as the substrate for procedures. Consequently, different "times" can be applied to different procedures. As an example, in theoretical physics the notion of "Eigentime"12 is common and means the system's own time.
8.4 Similar to the fall line in the example of landslides in chapter 7.1 (red in Fig. 10), the direction of the functional time is the highest gradient of the envisaged process. This (any!) time axis is just a mental, cultural construction.
8.5 According to chapter 2 (Fig. 6), a clear understanding (mental model) is necessary to identify the main "effect of time". We see that such an understanding can be culturally most diverse. Just consider the example of economic change:
• optimists think that the global income gap decreases with development
• pessimists believe that it increases, hampering global equity.
8.6 Therefore, any transformation of coordinates bears in itself the imponderability of complex social assumptions about future global development and includes a hypothesis on the global future.
8.7 Still, a very suitable transformation is t → GDP/capita (Fig. 17), both because of good data availability and because of the increased visibility of paths of development. GDP/cap resembles evolutionary time.

Figure 17: A suitable transformation of time uses the economic level, measured as GDP per capita (real time t → GDP/cap ≈ evolutionary time of development: a complex graphic structure becomes a simpler graphical structure).
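A minimal sketch of this transformation (with invented numbers) re-indexes a national indicator from calendar years to the GDP per capita reached in those years, so that "evolutionary time" replaces real time on the horizontal axis:

```python
# Sketch of the coordinate transformation t -> GDP/cap of section 8.7
# (all values invented for illustration).
import pandas as pd

series = pd.DataFrame({
    "year":        [1980, 1990, 2000, 2010],
    "gdp_per_cap": [1500, 2500, 4200, 7000],   # US$ per capita (hypothetical)
    "fuel_share":  [0.60, 0.48, 0.35, 0.25],   # share of one fuel in energy demand
})

# Replace the time axis by the economic level: analyse fuel_share against
# GDP per capita instead of against the calendar year.
transformed = series.set_index("gdp_per_cap")["fuel_share"]
print(transformed)
```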

                                                            
Footnote 11: partitura (Italian): score (in music).
Footnote 12: literally (German): the own time (of the system).


8.8 The strategic interest of such a transformation is "pattern recognition", namely to perceive more easily the structures in data on development processes. Examples of such "paths of development" are shown in Fig. 18 for fuel shares in energy economics.

Figure 18: Structural shift of the percentages of various fuels in all nations' energy demand, 1961–91. Data source: GCDB (Ahamer, 2001).

8.9 It is suggested here that such a transformation occurs implicitly during many mapping endeavours. This is legitimate, but care must be taken to take into account the (silently) underlying model of human development.
8.10 A suitable transformation of coordinates can make it easier to see and communicate evolutionary structures, as it enables common views among humans and is therefore helpful for global consensus building.
8.11 Also the "effects of time" are projected into a common system of understanding, which might give hope of facilitating common thinking independently of pre-conceived ideologies. This plan creates the "common reference system of objects".
8.12 This paper suggests enlarging the concept of
• "globally universal geo-referencing" (one of the legacies of IT) to
• "globally universal view-referencing"
• or "globally universal referencing of perspectives"13.
Fig. 19 illustrates this step symbolically.

Figure 19: The global society perceives the world.

9. A FUTURISTIC VISION
9.1 Building on the vision of "Digital Earth" (Gore, 1998), the deliberations in this paper might eventually lead to the vision of "Digital Awareness": the common perspective on realities valid for the global population, aided by (geo)graphic means.
9.2 The primordial element of (human and societal) evolution is consensus building. Without the ongoing creation of consensus, global "evolutionary time" is likely to fall back. The futuristic vision is to map global awareness.
9.3 Much like the georeferenced satellites which circulate around the world produce a "Google, Virtual [or similar] Earth", the individual spectators in Fig. 19 circle around the facts – and they create a "common virtual perception": an IIS = Interperspective Information System.
[Figure 20 content: the entirety seen by all global citizens – x, y, z, the space of themes, time = t – the entirety of world views.]

                                                            
Footnote 13: The facts themselves may well be delivered by endeavours such as Wikipedia, but here it refers to the perspective on facts! A huge, voluntarily generated database on people's perceptions, views and opinions would be needed.

Figure 20: Divergent perceptions circulate around earthen realities. The entirety of world views creates the IIS (Interperspective Information System).


9.4 Do we just mean interdisciplinarity? No. Nor do we simply refer to people looking in just any direction. Fig. 21 shows the difference from IIS.

10. CONCLUSION
Sciences are similar to "languages" spoken by people; they differ globally. Understanding others' languages is essential for global sustainable peace. Human perceptions are also strongly influenced by underlying models, assumptions and preconceived understandings. Studying geo-referenced data sets (GIS) can help to bridge interperceptional gaps.
For the transformation of world views – to make them understandable – it is necessary to know about
• the "effect of time", namely the "path along the continuum of time" which a variable is expected to take
• the speakers' underlying model of a complex techno-socio-economic nature
• the resulting perception of other humans.
A future task and purpose of IT could be to combine the multitude of (e.g. geo-referenced) data and to rearrange it in an easily understandable manner for the viewpoints and perspectives of another scientific discipline or just another human being. Such a system is called an Interperspective Information System (IIS). Merging a multitude of perspectives to form a common view of the entire global population is the target of an IIS.
Symbolically, a "Google Earth"-like tool would eventually develop into a "Google World Perspective"-like tool, or a "Virtual Earth"-like tool would become a "Virtual Perspective" tool encompassing all (scientific, social, personal, political, etc.) views in an easily and graphically understandable manner.
In the above futuristic vision, IT can and should(!) become a tool to facilitate consensus finding. It can rearrange the same data for a new view. Symbolically speaking: similar to Google Earth, which allows one to view the same landscape from different angles, a future tool would help to navigate the world concepts, the world views and the world perspectives of the global population. IT can reorganise extremely large data volumes (if technological growth rates continue) and could eventually share these according to the viewpoint of the viewer.
Such a step of generalisation would lead from "Geographic Information Science" to "Interperspective Information Science", implying the change of angles of perception according to one's own discipline.

Figure 21: This is not IIS. 9.5 The science of the third millennium will allow dealing with a multitude of world views and world perspectives (see Tab. 1) with an emphasis on consensus building. When learning, the emphasis lies on social learning and may also make use of game-based learning (such as the web-based negotiation game “Surfing Global Change”) which allows to experimentally experiment with world views without any risk involved. Table 1: The science of the third millennium encompasses multiple perspectives element interaction perspective single ones manifold Mechanics Thermodynamics Logics Systems analysis Teaching

19th cent.

20th cent.

Social learning gaming, IIS

9.6 A suitable peaceful “common effort14” for a peaceful future of humankind would involve developing tools and visual aids in order to understand the opinions of other citizens of the globe. The future is dialogue. Or else there will be no future.

                                                            
(jihad in Arabic) also means: common effort of a society
14

UbiCC Journal – Volume 4 No. 3

21st cent.

616

Special Issue on ICIT 2009 Conference - Applied Computing

REFERENCES
Ahamer, G. (2001), A Structured Basket of Models for Global Change. In: Environmental Information Systems in Industry and Public Administration (EnvIS), ed. by C. Rautenstrauch and S. Patig, Idea Group Publishing, Hershey, 101-136, http://www.oeawgiscience.org/ProjectFactSheets/ProjectFactSheet_GlobalChange.pdf.
Ahamer, G., Wahliss, W. (2008), Baseline Scenarios for the Water Framework Directive. Ljubljana, WFD Twinning Project in Slovenia, http://www.oeaw-giscience.org/ProjectFactSheets/ProjectFactSheet_EU_SDI.pdf.
Andrienko, N., Andrienko, G. (2006), Exploratory Spatial Analysis, Springer.
Brunner, F.K., Zobl, F., Gassner, G. (2003), On the Capability of GPS for Landslide Monitoring. Felsbau 2/2003, 51-54.
de Chardin, T. (1950), La condition humaine [Der Mensch im Kosmos]. Beck, Stuttgart.
Dodge, M., McDerby, M., Turner, M. (eds.) (2008), Geographic Visualisation, Wiley.
Dykes, J., MacEachren, A., et al. (2005), Exploring Geovisualization. Oxford, Elsevier.
Galileo, G. (1632), Dialogo sopra i due massimi sistemi del mondo, tolemaico, e copernicano. Fiorenza.
GIScience (2008), Connecting Real and Virtual Worlds. Poster at AGIT'08, http://www.oeawgiscience.org/index.php?option=com_content&task=blogcategory&id=43&Itemid=29.
Gore, A. (1998), Vision of Digital Earth, http://www.isde5.org/al_gore_speech.htm.
Kraak (2009), Minard's map, www.itc.nl/personal/kraak/1812/3dnap.swf.
Longley, P.A. et al. (2001), Geographic Information. Science and Systems, Wiley.
MacEachren, A. M., Gahegan, M., et al. (2004), Geovisualization for Knowledge Construction and Decision Support. IEEE Computer Graphics & Applications 2004 (1/2): 13-17.
Marschallinger, R. (2009), Analysis and Integration of Geo-Data, http://www.oeaw-giscience.org/.
Peuquet, D. J. (2002), Representations of Space and Time. New York, The Guilford Press.
Wallentin, G. (2009), Ecology & GIS. Spatio-temporal modelling of reforestation processes, http://www.oeaw-giscience.org/images/stories/Downloads/pecha%20kucha%20technoz%20day.pdf.
Zobl, F. (2009), Mapping, Modelling and Visualisation of georelevant processes, http://www.oeaw-giscience.org/.

                                                            
Endnote i: GIScience goes way beyond this view of time and space (considering time as function) because it allows for much more complex queries and analyses.
