Overclocked Load Scheduling in Large Clustered Reservation Systems
The International Journal of Computer Science and Information Security is a monthly periodical on research articles in general computer science and information security which provides a distinctive technical perspective on novel technical research work, whether theoretical, applicable, or related to implementation. Target Audience: IT academics, university IT faculties; and business people concerned with computer science and security; industry IT departments; government departments; the financial industry; the mobile industry and the computing industry. Coverage includes: security infrastructures, network security: Internet security, content protection, cryptography, steganography and formal methods in information security; multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. Thanks for your contributions in July 2010 issue and we are grateful to the reviewers for providing valuable comments. IJCSIS July 2010 Issue (Vol. 8, No. 4) has an acceptance rate of 36 %.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Tania Taami, Islamic Azad University, Science and Research Branch, Tehran, Iran
Amir Masoud Rahmani, Islamic Azad University, Science and Research Branch, Tehran, Iran
Ahmad Khademzade, Islamic Azad University, Science and Research Branch, Tehran, Iran
Ismail Ataie, Jam Petro. Complex, Tehran, Iran
email@example.com, firstname.lastname@example.org, email@example.com, Zadeh@itrc.ac.ir

Abstract: Advance resource reservation plays an important role in maintaining the QoS of requests. Allocating and managing resources for reservation requests so that utilization is optimal and quality of service is guaranteed is a challenging task. When a reservation request for a resource type fails even though enough free capacity might be available, there is no chance of resolving the conflict. The inflexibility of reservation requests with respect to relocation on the time axis results in rigid resource utilization and even poor QoS of the system. With the help of new overclocking technologies, however, some currently scheduled reservation chunks can be overclocked, and new chances emerge to overcome these restrictions. Using a strict overclocking schema with traditional processors for a limited time in a cluster of servers, simulation results show that the QoS of reservations can be improved. The improvement comes from better utilization of resources and a larger number of accepted reservations, without side effects on the processing and reliability of computations.

Keywords: scheduling; overclocking; thermal behaviour; advance reservation; cluster; QoS

I. INTRODUCTION

At the center of any collection of systems there should be a scheduler that manages resources and allocates them to clients at the appropriate time. One of the most essential resources in any system, whether a single or an orchestrated system, is the processing unit. Accepting requests and scheduling them at the appropriate time on the appropriate nodes is the challenging task of the scheduler. In this paper we concentrate on overclocking the computing resource in order to overcome underutilization of resources and to improve the QoS of reservations.

Previously, much work has been done on scheduling in clusters or grid systems [2, 6, 7, 8, 9, 10, 11] and also on scheduling with overclocking capabilities in single-node systems for real-time (periodic and aperiodic) jobs [1, 5], but there are no studies yet on the integration of the two.

In reliable overclocking, the computing resource should be controlled so that it does not exceed the thermal threshold of the equipment. This paper introduces a simple model for reliable overclocking of processors. It both avoids the complexity of the real thermal model of processors, which affects any algorithm running in real time, and reduces the cost of computing the heat dissipated by the processors, which in turn reduces the computation time of every stage of the algorithm.

The physical architectural model of the computing nodes is a cluster of nodes connected by a shared backbone. Any workload is divided into two parts. In the first part the workload is deployed to one or more nodes, and in the second part the workload is started and runs to its end. That is, after the workload has been transferred to its targets, computation starts and continues until the workload is finished. Two constraints exist in this model: the computation capacity of the nodes and the bandwidth capacity of the network infrastructure.

With overclocking, any reservation or allocation on the computing nodes can be relocated in time by moving its finish time earlier. Overclocking computing resources requires awareness of the problems it might introduce for the reliability of results and for the hardware chips. On the other hand, solving the thermal equations of the node material is costly in a real-time scheduler. To improve schedulers we therefore need a simple and dependable model for exploiting the capabilities of the resources.

The layout of this paper is as follows: Section II describes the system model, the reservation model, overclocking concepts and the strict overclocking schema. In Section III we propose an algorithm that combines overclocking and scheduling mechanisms. We evaluate the performance of the proposed algorithm by simulation and present the results in Section IV. Finally, in Section V we present our conclusions on the algorithms and the proposed overclocking schema.

II. MODELS AND OVERCLOCKING CONCEPTS

A. System Model

In this paper we adopt a system model from the literature, which we briefly describe here.

In this model there is one type of request: the reservation request. By definition, any reservation request R has five parameters: Rc, Rs, Re, n, Rio, where Rc is the arrival time of the reservation request, Rs is the start time of the reservation, Re is the end time of the reservation, n is the number of processing units that must be provided for the reservation, and Rio is the share of time required to transfer the reservation request to the processing units. In this model requests must be guaranteed service with n processing units in the interval between Rs and Re. A reservation cannot enter the system earlier than Rc, but it can leave the system earlier than Re if all of its work has been completed on the computing nodes.

The system considered in this paper is a cluster of nodes connected by a single shared-media backbone, similar to a LAN. A cluster consists of one coordinator node and n agent nodes A1, A2, .., An. The coordinator node receives the reservation requests and plans, through its scheduler module, how to schedule them on the agent nodes. Each agent node in turn has two major parts: a local scheduler and a processor frequency controller. The coordinator's scheduler dispatches to the agent schedulers the scheduling timetables and the requests that should run on each node. According to the received timetables, the local scheduler gives control of the processing unit to the request, i.e. the reservation. Figure 1 shows the structure of a cluster with a master (coordinator) node managing several agent nodes, all connected to a single backbone.

[Figure 1 diagram: a coordinator node containing the global scheduler, connected to agent nodes 1 to n, each containing a local scheduler and a frequency controller.]
Figure 1. Topology of cluster of nodes with a coordinator and many agent nodes.

In this model of computation there are two resources, the computing resource and the network resource, and conflicts arise in accessing and using each of them. The first conflict appears when two or more requests want exclusive access to the network medium to communicate and deploy their workload to a destination node. Only one of them can access the network and transfer its data to the destination node. The other resource is the computing power of the nodes. When a request wants full access to a node in order to use it for processing during some time interval, other requests cannot access that node until the processing time of the current request on it has ended.

B. Thermal Model

The relation between processor speed and the thermal behavior of any chip can be approximated by the following equation:

$$T'(t) = \frac{\kappa\, s^{\alpha}(t)}{C} - \frac{T(t)}{R\,C} \tag{1}$$

where T(t) is the temperature at time t and s(t) is the speed of the processor at time t. The parameters R and C are the thermal resistance and capacitance of the chip, respectively (including a fan or any peripheral attached to the chip, such as a heat sink). The parameters α and κ relate the power consumption of the processor to its speed. The parameter α has a value of roughly 3.0 [1, 3]. For the safety of the system, the processor temperature should not reach the critical temperature, because of the damaging effects on chip operation.

From the thermal model in (1) we can derive (2) for calculating the temperature at any point in time [1, 3]:

$$T_E = T_F + (T_0 - T_F)\, e^{-t/\tau} \tag{2}$$

where, in general, $T_F = R\kappa s_F^{\alpha}$ is the steady-state temperature at the overclocking speed $s_F$, $T_E = R\kappa s_E^{\alpha}$ is the temperature associated with speed $s_E$ that is reached after t units of time have elapsed, and $T_0$ is the temperature at its lowest level at the start time. The parameter τ is equal to R·C, and t is the time elapsed since the temperature was $T_0$.

From this equation we can calculate the value of t:

$$t = \tau \ln\!\left(\frac{T_0 - T_F}{T_E - T_F}\right) \tag{3}$$

To avoid complex and time-consuming computations in the scheduler at run time, we use a simple and effective strict overclocking schema. In this schema we use three phases of CPU frequency scaling: an under-clocked phase, a normal-clocked phase and an overclocked phase. In the under-clocking phase (i.e. idle mode) the frequency of the processor is reduced to the minimum available value, which lowers the temperature to near the minimum possible value. In the overclocking phase the frequency of the processor is temporarily increased to its maximum value until the temperature reaches the normal point. Finally, in the normal-clocking phase the frequency returns to its nominal value to continue any remaining workload of the request. Because the temperature never rises above normal, the reliability and continuity of computing operations are preserved. The schema also covers two working modes, normal load mode and idle load mode. To reduce the temperature more quickly, in idle mode we never deploy any workload to the processor, which keeps temperature and frequency at their lowest limits, i.e. in the under-clocking phase. We exploit this situation to extend the succeeding overclocking interval to the maximum possible length. Using (3) we can calculate t and the ratio of under-clocking to overclocking periods.

III. ALGORITHM

In this section we introduce a scheduling algorithm that uses the strict overclocking schema described above in situations where a conflict is discovered between the current reservation request and previously guaranteed and scheduled reservations.

As previously described, to overclock any time period of a processor we apply the three-step strict overclocking schema: in the first step the node processor runs at the under-clocking frequency with an idle workload, in the second step the node runs at the overclocking frequency, and in the last step the node runs at the normal clocking frequency. A timeslot of a processor can be overclocked only if there is a sufficiently long timeslot before it that has not been allocated to any request.

The following algorithms use two overclocking approaches: other-overclocking and self-overclocking. In the other-overclocking approach, a processor timeslot belonging to other, previously accepted reservations is overclocked. In the self-overclocking approach, the current request itself is overclocked on the nodes.

The doReserve algorithm (Fig. 2) first tries to schedule reservation R in the cluster of nodes without overclocking. If it cannot, it tries to apply the overclocking techniques. The doReserveWithOverClock algorithm (Fig. 3) implements the strict overclocking schema explained previously. First it finds the eligible nodes, i.e. the nodes that can be overclocked during the period of some scheduled job or reservation. If the request can be scheduled using the available nodes with normal clocking together with other nodes that can be overclocked, by either self-overclocking or other-overclocking, it proceeds; otherwise it fails. The value Δ is the amount of time by which the end of a request moves back because of overclocking. The Tidle parameter is the time required for the under-clocking period with idle workload.

boolean doReserve (R)
    if (isFreeIO(R.Rs, (R.Re - R.Rs)·R.Rio) == false)
        return false;
    AvailableNodes ← findAvailableNodes(R.Rs, R.Re);
    if (#AvailableNodes < R.n)
        return doReserveWithOverClock(R);
    else reserveNodes(AvailableNodes, R.Rs, R.Re, R.n);
    return true;

Figure 2. Top level of reservation algorithm

boolean doReserveWithOverClock (R)
    // find and set eligible allocations: scheduled slots of nodes
    // that can be overclocked
    EligibleAllocs ← ∅
    for i = 1 to n
        Alloc_i = null;
        Δ = min((R.Re - R.Rs - R.Rio), maxOCTime) * OCRate;
        if ((Rid = cpuOverlap(node_i, R.Rs, R.Re)) != null and
            isFree(node_i, Rid.Rs - Tidle - Rid.Rio, Rid.Rs) and
            (Rid.Re - Δ) ≤ R.Rs and
            isFree(node_i, Rid.Re, R.Re))
            TimeInterval_{node_i, R} ← (Rid.Rs - Tidle, R.Re - Δ);
            Alloc_i = (node_i, Rid, TimeInterval_{node_i, R});
        end if;
        if (Alloc_i != null)
            EligibleAllocs ← EligibleAllocs + Alloc_i;
    end for
    AvailableNodes ← findAvailableNodes(R.Rs, R.Re);
    if (#EligibleAllocs + #AvailableNodes ≥ R.n)
        reserveNodes(AvailableNodes, R, #AvailableNodes);
        // reserve nodes with overclocking
        for i = 1 to R.n - #AvailableNodes
            RE = EligibleAllocs_i.R
            Δ = min((RE.Re - RE.Rs - RE.Rio), maxOCTime) * OCRate;
            EligibleAllocs_i.interval.start -= Tidle;
            EligibleAllocs_i.interval.end -= Δ;
            updateAllocOnNode(EligibleAllocs_i.node, EligibleAllocs_i);
            allocateNode(EligibleAllocs_i.node, R.Rs, R.Re, R);
        end for;
        return true;
    else
        // find nodes that satisfy the self-overclocking condition
        // for reservation R
        selfOCNodes ← ∅
        for i = 1 to n
            if (isFree(node_i, R.Rs - Tidle, R.Re - Δ))
                selfOCNodes += node_i;
        end for
        if (#EligibleAllocs + #selfOCNodes + #AvailableNodes ≥ R.n)
            reserveNodes(AvailableNodes, R, #AvailableNodes);
            // reserve nodes for R by overclocking other scheduled requests
            for i = 1 to R.n - #AvailableNodes
                RE = EligibleAllocs_i.R
                Δ = min((RE.Re - RE.Rs - RE.Rio), maxOCTime) * OCRate;
                EligibleAllocs_i.interval.start -= Tidle;
                EligibleAllocs_i.interval.end -= Δ;
                updateAllocOnNode(EligibleAllocs_i.node, EligibleAllocs_i);
                allocateNode(EligibleAllocs_i.node, R.Rs, R.Re, R);
            end for;
            // reserve nodes for R by overclocking R itself
            for i = 1 to R.n - (#EligibleAllocs + #AvailableNodes)
                Δ = min((R.Re - R.Rs - R.Rio), maxOCTime) * OCRate;
                allocStartTime = R.Rs - Tidle;
                allocEndTime = R.Re - Δ;
                allocateNode(node_i, allocStartTime, allocEndTime, R);
            end for;
            return true;
        end if;
    end if;
    return false;

Figure 3. Strict over-clocking scheduler algorithm

The overclocking schema can be applied from the start time of the computation until its end time. That is to say, overclocking cannot be applied to the communication part of a request, because the communication time of any request depends on the network specification of the cluster (i.e. bandwidth) and cannot be changed without changing the physical characteristics of the underlying network components.

IV. PERFORMANCE EVALUATION

To analyze the strict overclocking schema, we simulate a cluster with varying numbers of processing nodes and reservation requests. In all simulations, the maximum number of nodes requested by any reservation request is the number of nodes in the cluster. Reservation requests deploy their workload to the nodes using a multicasting approach, which aims to maximize bandwidth utilization.

To simulate the preceding algorithms we use the following parameters. Arrival times of reservation requests follow a Poisson distribution with an average of 50 units of time. Initially, the length of requests is chosen close to the overclocking period, namely uniformly distributed in the interval [40 .. 50]; this is the base workload length. It is nearly double the length of the overclocking time.
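The arrival and length parameters above can be turned into a small workload generator. The following is a minimal sketch, not the simulator used in the paper: the names Reservation and generate_requests are hypothetical, the "average of 50 units of time" is read as the mean gap between arrivals of a Poisson process, the requested start time is assumed equal to the arrival time, and Rio is fixed at 0.1 of the reservation length.

```python
import random
from dataclasses import dataclass


@dataclass
class Reservation:
    rc: float   # arrival time of the request (Rc)
    rs: float   # requested start time (Rs)
    re: float   # requested end time (Re)
    n: int      # number of processing units requested
    rio: float  # share of the reservation spent transferring the workload (Rio)


def generate_requests(count, nodes, mean_gap=50.0,
                      length_range=(40.0, 50.0), rio=0.1, seed=0):
    """Generate `count` reservation requests for a cluster of `nodes` nodes.

    Assumptions (not stated in this form in the paper): gaps between
    arrivals are exponential with mean `mean_gap`, and each reservation
    asks to start at its arrival time.
    """
    rng = random.Random(seed)
    now = 0.0
    requests = []
    for _ in range(count):
        now += rng.expovariate(1.0 / mean_gap)   # next Poisson arrival
        length = rng.uniform(*length_range)      # base workload length
        requests.append(Reservation(rc=now, rs=now, re=now + length,
                                    n=rng.randint(1, nodes), rio=rio))
    return requests


if __name__ == "__main__":
    for r in generate_requests(3, nodes=100, seed=1):
        print(r)
```

Multiplying both ends of length_range by 2 or 3 would give the longer workloads studied later in this section.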
Secondly, we studied the effect of multiples of the base workload length on system utilization and acceptance ratio.

To compute the ratio of idle time to overclocking time, we used a Dell Latitude D810 with a Centrino processor and (3). On this basis the ratio was calculated as 3 to 2: 3 units of idle time for every 2 units of overclocking time. As mentioned previously, the number of nodes requested in each reservation lies in the interval [1 .. number of nodes], i.e. as the number of nodes increases, the number of nodes requested by each reservation rises as well. The total simulation time considered was 11 hours. The yield of overclocking relative to normal operation of the processor is 0.5 (the OCRate in the algorithm of Fig. 3). The communication time ratio, Rio, is 0.1 of the total workload. Although advance reservation is used to guarantee QoS for a mix of typical jobs and reservation requests, in this model we decouple the start of service from the start of the request, both to accommodate future advance reservation models and for simulation purposes (FIFO model).

[Figure 4 plots: utilization (%) and acceptance (%) versus number of requests (×10^3), overclocked versus normal, Nodes=100, base workload.]
Figure 4. Acceptance and utilization in 100 nodes.

The results (Fig. 4) show that the strict overclocking schema improves the utilization of resources and the acceptance ratio of reservation requests in a scalable way.

Overall, multi-node reservation requests can be served thanks to the flexibility and elasticity that overclocking provides, which results in higher utilization under the overclocked schema than under the normal clocking schema, even though the gap narrows and the two schemas converge.

A comparison of Fig. 5 with Fig. 4 indicates that increasing the number of nodes has no impact on the improvement in utilization and acceptance ratio, as is also the case with normal clocking.
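The idle-to-overclocking ratio in this setup is obtained from (3). The following is a minimal sketch of that calculation under stated assumptions: the function names and every numeric constant are hypothetical (normalized R, κ, τ and speeds, not measurements of the processor used in the paper), and because cooling only approaches the idle steady state asymptotically, idling is assumed to stop at a small margin above it. It is not expected to reproduce the 3 to 2 ratio reported above.

```python
import math


def steady_state(R, kappa, s, alpha=3.0):
    """Steady-state temperature at constant speed s (set T'(t) = 0 in (1))."""
    return R * kappa * s ** alpha


def temperature_after(t, T0, T_target, tau):
    """Eq. (2): temperature after t time units, starting at T0 and
    heading toward the steady-state temperature T_target."""
    return T_target + (T0 - T_target) * math.exp(-t / tau)


def time_to_reach(T_end, T0, T_target, tau):
    """Eq. (3): time needed to move from T0 to T_end while heading
    toward the steady-state temperature T_target."""
    return tau * math.log((T0 - T_target) / (T_end - T_target))


def phase_lengths(R, kappa, tau, s_idle, s_normal, s_over, margin, alpha=3.0):
    """Lengths of the under-clocking and overclocking phases.

    Idle phase: cool from the normal steady state to `margin` above the
    idle steady state. Overclocking phase: heat from there back up to
    the normal steady state while running at s_over.
    """
    T_idle = steady_state(R, kappa, s_idle, alpha)
    T_norm = steady_state(R, kappa, s_normal, alpha)
    T_over = steady_state(R, kappa, s_over, alpha)
    T_cool = T_idle + margin
    t_idle = time_to_reach(T_cool, T_norm, T_idle, tau)
    t_over = time_to_reach(T_norm, T_cool, T_over, tau)
    return t_idle, t_over


if __name__ == "__main__":
    # Hypothetical, normalized constants chosen only for illustration.
    t_idle, t_over = phase_lengths(R=1.0, kappa=1.0, tau=10.0,
                                   s_idle=0.6, s_normal=1.0, s_over=1.2,
                                   margin=0.034)
    print(f"idle {t_idle:.2f}, overclock {t_over:.2f}, "
          f"ratio {t_idle / t_over:.2f}")
```

With measured values of R, C and the steady-state temperatures of a real processor, the same two calls to time_to_reach would give the ratio that a scheduler could fix in advance instead of solving the thermal equation at run time.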
On the other hand, as the average length of reservation workloads increases, the overall utilization improvement of overclocking over normal clocking increases. The reason is that with a longer workload, the side effect of the idle time slice that precedes any overclocked part of the workload becomes smaller. With a growing number of requests at a constant workload rate, however, this gain starts to decrease, because the side effects of the underutilized idle times before the overclocked time slices grow.

[Figure 5 plots: acceptance (%) and utilization (%) versus number of requests (×10^3), overclocked versus normal, Nodes=500, base workload.]
Figure 5. Acceptance and utilization in 500 nodes

In all cases, for both the normal and the overclocked schema, increasing the average length of reservations causes the acceptance ratio of reservation requests to drop. This result is expected: as the length of reservations increases, the probability that they collide with each other increases as well.

[Figure 6 plots: acceptance (%) and utilization (%) versus number of requests (×10^3), overclocked versus normal, Nodes=500, for workloads of 2 and 3 times the base length.]
Figure 6. Acceptance and utilization in 500 nodes with workloads of 2 and 3 times the base length.

As the workload length of reservations increases, both the normal and the overclocked schema improve more quickly than before until they reach a saturation point.
At this point, as the number of requests increases further, overclocking has no additional influence. Fig. 5 together with Fig. 6 shows this.

For the base workload length and for 2 and 3 times that length, the graphs in Fig. 7 show that as the average workload of requests increases, the peak of the improvement shifts to the left, i.e. towards smaller numbers of reservation requests. This means that with increasing workload, collisions between the end time of requests and the idle time intervals required before the overclocking time of the processor happen sooner.

[Figure 7 plots: utilization improvement (%) and acceptance improvement (%) versus number of requests (×10^3), Nodes=500.]
Figure 7. Acceptance and utilization improvement in 500 nodes with workloads of 1, 2 and 3 times the base length

V. CONCLUSIONS

The results show that with the proposed strict overclocking schema, applied within controlled bounds, utilization increases compared with normal clocking. The acceptance rate of the system also increases under limited conditions. In addition, because the temperature of the processing nodes cannot reach the critical point, the reliability of computation is preserved. Because the power of the processor is kept in check, the economic and commercial aspects of power consumption are unchanged. By expanding networks and resources, the schema can be used in grid networks larger than clusters. Since resources are provided exclusively to requests, this model and its algorithms are well suited to private grids in which all resources are available for commercial purposes.

ACKNOWLEDGMENT

This work was supported by Iran Telecommunication Research Center (ITRC).

REFERENCES

[1] Y. Ahn, R. Bettati, "Transient Overclocking for Aperiodic Task Execution in Hard Real-Time Systems", Euromicro Conference on Real-Time Systems (ECRTS '08), Prague, p. 102, 2008.
[2] D. G. Feitelson, "Scheduling parallel jobs on clusters", High Performance Cluster Computing, vol. 1, Architectures and Systems, pp. 519-533, 1999.
[3] N. Bansal and K. Pruhs, "Speed scaling to manage temperature", in Symposium on Theoretical Aspects of Computer Science, 2005.
[4] N. Bansal, T. Kimbrel, and K. Pruhs, "Dynamic speed scaling to manage energy and temperature", IEEE Symposium on Foundations of Computer Science, 2004.
[5] S. Wang, R. Bettati, "Reactive Speed Control in Temperature-Constrained Real-Time Systems", Proceedings of the 18th Euromicro Conference on Real-Time Systems (ECRTS 06), Dresden, Germany, pp. 161-170, July 2006.
[6] L. Eyraud-Dubois, G. Mounié, D. Trystram, "Analysis of Scheduling Algorithms with Reservations", Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, USA, 2007.
[7] J. Blazewicz, P. Dell'Olmo, M. Drozdowski, P. Maczka, "Scheduling multiprocessor tasks on parallel processors with limited availability", European Journal of Operational Research, vol. 149, pp. 377-389, 2003.
[8] J. Blazewicz, M. Machowiak, J. Weglarz, M. Kovalyov, D. Trystram, "Scheduling malleable tasks on parallel processors to minimize the makespan", Annals of Operations Research, vol. 129, pp. 65-80, 2004.
[9] K. Jansen, "Scheduling malleable parallel tasks: An asymptotic fully polynomial time approximation scheme", Algorithmica, vol. 39, pp. 59-81, 2004.
[10] O.H. Kwon, K.Y. Chwa, "Scheduling parallel tasks with individual deadlines", 6th International Symposium on Algorithms and Computation, Springer-Verlag, vol. 215, pp. 198-207, 1995.
[11] V. Subramani, R. Kettimuthu, S. Srinivasan, P. Sadayappan, "Distributed Job Scheduling on Computational Grids Using Multiple Simultaneous Requests", IEEE Computer Society, p. 359, 2002.
[12] A. Mamat, Y. Lu, J. Deogun, S. Goddard, "Real-Time Divisible Load Scheduling with Advance Reservation", Euromicro Conference on Real-Time Systems (ECRTS '08), Prague, pp. 37-46, 2008.