Overclocked Load Scheduling in Large Clustered Reservation Systems
The International Journal of Computer Science and Information Security is a monthly periodical on research articles in general computer science and information security which provides a distinctive technical perspective on novel technical research work, whether theoretical, applicable, or related to implementation. Target Audience: IT academics, university IT faculties; and business people concerned with computer science and security; industry IT departments; government departments; the financial industry; the mobile industry and the computing industry. Coverage includes: security infrastructures, network security: Internet security, content protection, cryptography, steganography and formal methods in information security; multimedia systems, software, information systems, intelligent systems, web services, data mining, wireless communication, networking and technologies, innovation technology and management. Thanks for your contributions in July 2010 issue and we are grateful to the reviewers for providing valuable comments. IJCSIS July 2010 Issue (Vol. 8, No. 4) has an acceptance rate of 36 %.

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 8, No.4, 2010
Tania Taami, Islamic Azad University, Science and Research Branch, Tehran, Iran, t.taami@srbiau.ac.ir
Amir Masoud Rahmani, Islamic Azad University, Science and Research Branch, Tehran, Iran, rahmani@sr.iau.ac.ir
Ahmad Khademzade, Islamic Azad University, Science and Research Branch, Tehran, Iran, Zadeh@itrc.ac.ir
Ismail Ataie, Jam Petro. Complex, Tehran, Iran, ataie.ismail@gmail.com
Abstract-Advance resource reservation plays a major role in maintaining the QoS of requests. Allocating and managing resources for reservation requests so that utilization is optimal and quality of service is guaranteed is a challenging task. When a reservation request for a resource type fails although enough free capacity might be available, there is no chance to resolve the conflict. The inflexibility of a reservation request, which cannot be moved along the time axis, results in rigid resource utilization and even poor QoS of the system. With new overclocking technologies, however, some of the currently scheduled reservation chunks can be overclocked, and new chances emerge to beat these restrictions [1]. Using a strict overclocking schema with traditional processors for a limited time in a cluster of servers, simulation results show that the QoS of reservations can be improved. The improvement comes from better resource utilization and a larger number of accepted reservations, without side effects on processing or on the reliability of computations.

Keywords-scheduling; overclocking; thermal behaviour; advance reservation; cluster; QoS

I. INTRODUCTION

At the center of any collection of systems there should be a scheduler that manages resources and allocates them to clients at the appropriate time. One of the most essential resources in any system, whether a single system or an orchestrated one, is the processing unit. Accepting requests and scheduling them at the appropriate time on the appropriate nodes is the challenging task of the scheduler. In this paper we concentrate on overclocking computing resources to overcome resource underutilization and to improve the QoS of reservations.

Much previous work has addressed scheduling in clusters or grid systems [2, 6, 7, 8, 9, 10, 11], as well as scheduling with overclocking capabilities in single-node systems for real-time (periodic and aperiodic) jobs [1, 5], but the integration of the two has not yet been studied.

In reliable overclocking, the computing resource must be controlled so that it does not exceed the thermal threshold of the equipment [1]. This paper introduces a simple model of reliable processor overclocking. The model avoids the complexity of the real thermal model of processors, which burdens any algorithm running in real time, and it reduces the complexity of computing the heat radiated by the processors, which shortens the computation time of every stage of the algorithm.

The physical architectural model of the computing nodes is a cluster of nodes connected by a shared backbone [12]. Every workload is divided into two parts. In the first part the workload is deployed to a node or nodes, and in the second part the workload is started and runs to its end. That is, after the workload has been transferred to its targets, computation starts and continues until the workload is finished. Two constraints exist in this model: the computation capacity of the nodes and the bandwidth capacity of the network infrastructure.

With overclocking, the finish time of any reservation or allocation on the computing nodes can be relocated. Overclocking computing resources requires awareness of the problems it may introduce for the reliability of results and for the hardware chips. On the other hand, solving the thermal equations of the node material is costly in a real-time scheduler [1]. To improve schedulers we therefore need a simple and dependable model for exploiting the capabilities of the resources.

The rest of this letter is organized as follows. Section II describes the system model, the reservation model, the overclocking concepts and the strict overclocking schema. Section III proposes an algorithm that combines the overclocking and scheduling mechanisms. Section IV evaluates the performance of the proposed algorithm by simulation and presents the results. Finally, Section V presents our conclusions on the algorithms and the proposed overclocking schema.

II. MODELS AND OVERCLOCKING CONCEPTS

A. System Model

In this paper we adopt the system model of [12], which we describe briefly here.

The model has one type of request: the reservation request. By definition, a reservation request R has five parameters, Rc, Rs, Re, n and Rio, where Rc is the arrival time of the reservation request, Rs is the start time of the reservation, Re is the end time of the reservation, n is the number of processing units that must serve the reservation, and Rio is the portion of time required to transfer the reservation request to the processing units. In this model a request must be guaranteed service by n processing units in the interval from Rs to Re. A reservation cannot enter the system earlier than Rc
320 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
but it may leave the system earlier than Re if all of its work on the computing nodes is finished.

The system considered in this paper is a cluster of nodes connected by a single shared media backbone, similar to a LAN. A cluster consists of one coordinator node and n agent nodes A1, A2, .., An. The coordinator node receives requests (reservations) and uses its scheduler module to plan how to schedule them on the agent nodes. Each agent node, in turn, has two major parts: a local scheduler and a processor frequency controller. The coordinator's scheduler dispatches to the agent schedulers the scheduling timetables and the requests that should run on each node. According to the received timetable, the local scheduler gives control of the processing unit to the request (the reservation). Figure 1 shows the structure of a cluster of nodes in which a master, or coordinator, manages several agent nodes, all connected to a single backbone.

In this model of computation there are two resources: the computing resource and the network resource. Both give rise to conflicts of access and utilization. The first conflict appears when two or more requests want exclusive access to the network media in order to communicate and deploy their workloads to destination nodes. Only one of them can access the network and transfer its data to its destination node. The other resource is the computing power of the nodes. When a request needs exclusive access to a node for processing during some time interval, other requests cannot access the node until the processing time of the current request on it ends.

[Figure 1 diagram: a coordinator node containing the global scheduler, connected to agent nodes 1 to n, each containing a local scheduler and a frequency controller.]
Figure 1. Topology of cluster of nodes with a coordinator and many agent nodes.

B. Thermal Model

The relation between processor speed and the thermal behavior of a chip can be approximated by the following equation [1]:

$$T'(t) = \frac{\kappa\, s^{\alpha}(t)}{C} - \frac{T(t)}{R\,C} \qquad (1)$$

where T(t) is the temperature at time t and s(t) is the speed of the processor at time t. The parameters R and C are the thermal resistance and thermal capacitance of the chip, respectively (including a fan or any peripheral attached to the chip, such as a heat sink). The parameters $\kappa$ and $\alpha$ relate the power consumption of the processor to its speed. The parameter $\alpha$ has a value of roughly 3.0 [1,3]. For the safety of the system, the processor temperature must not reach the critical temperature, because of the damaging effects on chip operation.

From the thermal model in (1) we can derive (2), which gives the temperature at any point in time [1,3]:

$$T_E = T_F + (T_0 - T_F)\, e^{-t/\tau} \qquad (2)$$

where, in general, $T_F = R\kappa s_F^{\alpha}$ is the steady-state temperature at the overclocking speed $s_F$, $T_E = R\kappa s_E^{\alpha}$ is the intermediate temperature, at speed $s_E$, after t units of time have elapsed, and $T_0$ is the temperature at its lowest level, at the start time. The parameter $\tau$ is equal to R·C, and t is the time elapsed since the temperature was $T_0$.

From this equation we can calculate the value of t:

$$t = \tau \ln\left(\frac{T_0 - T_H}{T_E - T_H}\right) \qquad (3)$$

Here $T_H$ takes the place of the steady-state temperature written $T_F$ in (2), since (3) is (2) solved for t.

To avoid complex and time-consuming computations in the scheduler at run time, we use a simple and effective strict overclocking schema. The schema uses three phases of CPU frequency scaling: an under-clocked phase, a normal-clocked phase and an overclocked phase. In the under-clocking phase (i.e. idle mode) the processor frequency is reduced to the minimum available value, which lowers the temperature to near its minimum possible value. In the overclocking phase the processor frequency is transiently increased to its maximum value until the temperature reaches the normal point. Finally, in the normal-clocking phase the frequency returns to its nominal value so that any remaining workload of the request can continue. Because the temperature never rises above normal, the reliability and continuity of computing operations are preserved. The schema also covers two working modes, normal load mode and idle load mode. To reduce the temperature more quickly in idle mode, we never deploy any workload to the processor, which keeps temperature and frequency at their lowest limit, i.e. in the under-clocking phase. We exploit this situation to extend the succeeding overclocking interval to the maximum possible value. Using (3) we can calculate t and the ratio of the under-clocking period to the overclocking period.

III. ALGORITHM

In this section we introduce a scheduling algorithm that uses the strict overclocking schema described above whenever a conflict is discovered between the current reservation request and previously guaranteed and scheduled requests (reservation parts).

As described previously, to overclock any time period of the processors we apply the three-step strict overclocking schema: in the first step the node processor runs at the under-clocked frequency with an idle workload, in the second step the node runs at the overclocked frequency, and in the last step the node runs at the normal clock frequency. Only those timeslots of a processor that are preceded by enough free time can be overclocked.
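The timing of the idle and overclocked steps follows from (2) and (3) of Section II. The sketch below is a minimal numerical illustration of those two relations. The function names and every numeric value in it (temperatures, time constant) are hypothetical and chosen only for illustration. They are not measurements from the paper.

```python
import math


def temperature_after(t, t0_temp, steady_temp, tau):
    """Equation (2): temperature after t time units, starting at t0_temp and
    moving toward the steady-state temperature of the current speed."""
    return steady_temp + (t0_temp - steady_temp) * math.exp(-t / tau)


def time_to_reach(target_temp, t0_temp, steady_temp, tau):
    """Equation (3): time needed to go from t0_temp to target_temp while the
    steady-state temperature of the current speed is steady_temp."""
    return tau * math.log((t0_temp - steady_temp) / (target_temp - steady_temp))


# Hypothetical values (assumptions, not taken from the paper).
TAU = 10.0            # thermal time constant R*C
T_NORMAL = 60.0       # "normal point" that must not be exceeded
T_LOW = 45.0          # temperature reached at the end of the idle step
T_STEADY_IDLE = 40.0  # steady-state temperature at minimum frequency
T_STEADY_OC = 80.0    # steady-state temperature at maximum frequency

# Step 1 (under-clocked, idle): cool down from the normal point to T_LOW.
idle_time = time_to_reach(T_LOW, T_NORMAL, T_STEADY_IDLE, TAU)
# Step 2 (overclocked): heat up from T_LOW until the normal point is reached.
overclock_time = time_to_reach(T_NORMAL, T_LOW, T_STEADY_OC, TAU)

print("idle time:", idle_time)
print("overclock time:", overclock_time)
print("idle / overclock ratio:", idle_time / overclock_time)
```

Under these assumed values the idle step is longer than the overclocked step it enables, which is why the scheduler needs free time in front of any timeslot it wants to overclock.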
The timeslot to be overclocked must be preceded by enough time that has not been allocated to any request.

The following algorithms use two overclocking approaches: other-overclocking and self-overclocking. In the other-overclocking approach, a processor timeslot that belongs to other, previously accepted requests (reservations) is overclocked. In the self-overclocking approach, the current request itself is overclocked on the nodes.

The doReserve algorithm (Fig. 2) first tries to schedule reservation R in the cluster of nodes without overclocking. If that fails, it tries to apply the overclocking techniques. The doReserveWithOverClock algorithm (Fig. 3) implements the strict overclocking schema explained previously. First it finds the eligible nodes, that is, the nodes that can be overclocked during the period of some already scheduled jobs or reservations. If the request can be scheduled using the nodes available at normal clocking together with nodes obtained by overclocking, whether self-overclocking or other-overclocking, the algorithm proceeds. Otherwise it fails. The value Δ is the amount of time by which the end of a request moves back because of overclocking. The Tidle parameter is the time required for the under-clocking period with an idle workload.

    boolean doReserve(R)
        if (isFreeIO(R.Rs, (R.Re - R.Rs)·R.Rio) == false)
            return false;
        AvailableNodes ← findAvailableNodes(R.Rs, R.Re);
        if (#AvailableNodes < R.n)
            return doReserveWithOverClock(R);
        else
            reserveNodes(AvailableNodes, R.Rs, R.Re, R.n);
        return true;

Figure 2. Top level of reservation algorithm

    boolean doReserveWithOverClock(R)
        // find the scheduled allocation slots on the nodes that are
        // eligible for overclocking
        EligibleAllocs ← ∅
        for i = 1 to n
            Alloc_i = null;
            Δ = min((R.Re - R.Rs - R.Rio), maxOCTime) * OCRate;
            if ((Rid = cpuOverlap(node_i, R.Rs, R.Re)) != null and
                isFree(node_i, Rid.Rs - Tidle - Rid.Rio, Rid.Rs) and
                (Rid.Re - Δ) ≤ R.Rs and
                isFree(node_i, Rid.Re, R.Re))
                TimeInterval_{node_i,R} ← (Rid.Rs - Tidle, R.Re - Δ);
                Alloc_i = (node_i, Rid, TimeInterval_{node_i,R});
            end if;
            if (Alloc_i != null)
                EligibleAllocs ← EligibleAllocs + Alloc_i;
        end for
        AvailableNodes ← findAvailableNodes(R.Rs, R.Re);
        if (#EligibleAllocs + #AvailableNodes ≥ R.n)
            reserveNodes(AvailableNodes, R, #AvailableNodes);
            // reserve nodes with overclocking
            for i = 1 to R.n - #AvailableNodes
                RE = EligibleAllocs_i.R
                Δ = min((RE.Re - RE.Rs - RE.Rio), maxOCTime) * OCRate;
                EligibleAllocs_i.interval.start -= Tidle;
                EligibleAllocs_i.interval.end -= Δ;
                updateAllocOnNode(EligibleAllocs_i.node, EligibleAllocs_i);
                allocateNode(EligibleAllocs_i.node, R.Rs, R.Re, R);
            end for;
            return true;
        else
            // find nodes that satisfy the self-overclocking condition
            // for reservation R
            selfOCNodes ← ∅
            for i = 1 to n
                if (isFree(node_i, R.Rs - Tidle, R.Re - Δ))
                    selfOCNodes += node_i;
            end for
            if (#EligibleAllocs + #selfOCNodes + #AvailableNodes ≥ R.n)
                reserveNodes(AvailableNodes, R, #AvailableNodes);
                // reserve nodes for reservation R by overclocking other
                // scheduled requests
                for i = 1 to R.n - #AvailableNodes
                    RE = EligibleAllocs_i.R
                    Δ = min((RE.Re - RE.Rs - RE.Rio), maxOCTime) * OCRate;
                    EligibleAllocs_i.interval.start -= Tidle;
                    EligibleAllocs_i.interval.end -= Δ;
                    updateAllocOnNode(EligibleAllocs_i.node, EligibleAllocs_i);
                    allocateNode(EligibleAllocs_i.node, R.Rs, R.Re, R);
                end for;
                // reserve nodes for reservation R by overclocking R itself
                for i = 1 to R.n - (#EligibleAllocs + #AvailableNodes)
                    Δ = min((R.Re - R.Rs - R.Rio), maxOCTime) * OCRate;
                    allocStartTime = R.Rs - Tidle;
                    allocEndTime = R.Re - Δ;
                    allocateNode(node_i, allocStartTime, allocEndTime, R);
                end for;
                return true;
            end if;
        end if;
        return false;

Figure 3. Strict over-clocking scheduler algorithm

The overclocking schema can be applied from the start time of a computation until its end time. In other words, overclocking cannot be applied to the communication part of a request, because the communication time of a request depends on the network specification of the cluster (i.e. bandwidth) and cannot be changed without changing the physical characteristics of the underlying network components.

IV. PERFORMANCE EVALUATION

To analyze the strict overclocking schema, we simulate a cluster with varying numbers of processing nodes and reservation requests. In all simulations, the maximum number of nodes requested by any reservation request is the number of nodes in the cluster. Each reservation request deploys
its workload to the nodes by multicasting, in order to maximize bandwidth utilization.

To simulate the algorithms above, we use the following parameters. The arrival times of reservation requests follow a Poisson distribution with an average of 50 units of time. Initially we take the length of requests to be close to the overclocking period, i.e. uniformly distributed in the interval [40 .. 50]. We call this the base workload. Its value is nearly double the length of the overclocking time. Secondly, we study the effect of multiples of the base workload on the utilization and acceptance ratio of the system. To compute the ratio of idle time to overclocking time, we used a Dell Latitude D810 with a Centrino processor together with (3). On this basis the ratio was calculated as 3 to 2: 3 units of idle time for every 2 units of overclocking time. As mentioned previously, the number of nodes requested by each reservation lies in the interval [1 .. number of nodes], so as the number of nodes increases, the number of nodes requested per reservation rises as well. The total simulated time was 11 hours. The yield of overclocking relative to normal processor operation is 0.5 (OCRate in the algorithm of Fig. 3). The communication time ratio, Rio, is 0.1 of the total workload. Although advance reservation is used to guarantee QoS for reservation requests in a mix of ordinary jobs and reservations, in this model we decouple the start of service from the start of the request, both to remain compatible with future advance reservation models and for simulation purposes (FIFO model).

The results (Fig. 4) show that the strict overclocking schema improves the utilization of resources and the acceptance ratio of reservation requests in a scalable way.

[Figure 4 plots: utilization (%) and acceptance (%) versus number of requests (×10^3), Nodes = 100, base workload; series: Overclocked, Normal.]
Figure 4. Acceptance and utilization in 100 nodes.

Overall, because multi-node reservation requests are served through the dynamic and elastic behaviour of overclocking, utilization is higher in the overclocked schema than in the normal clocking schema, even though the gap shrinks and the two schemas converge.

Comparing Fig. 5 with Fig. 4 shows that, as with normal clocking, increasing the number of nodes has no impact on the improvement in utilization and acceptance ratio.

[Figure 5 plots: acceptance (%) and utilization (%) versus number of requests (×10^3), Nodes = 500, base workload; series: Overclocked, Normal.]
Figure 5. Acceptance and utilization in 500 nodes

On the other hand, as the average length of reservation workloads increases, the overall utilization improvement of the overclocked schema over normal clocking increases. The reason is that with a longer workload, the side effect of the idle time slice that precedes every overclocked part of the workload becomes smaller. As the number of requests grows at a constant workload rate, however, this gain starts to decrease, because the side effects of the underutilized idle times before the overclocked time slices grow.

In all cases, for both the normal and the overclocked schema, increasing the average length of reservations lowers the acceptance ratio of reservation requests. This result is to be expected: as reservations become longer, the probability that they collide with each other increases.

[Figure 6 plots: utilization (%) and acceptance (%) versus number of requests (×10^3), Nodes = 500, workloads of 2 and 3 times the base workload; series: Overclocked, Normal.]
Figure 6. Acceptance and utilization in 500 nodes with workloads of 2 and 3 times the base workload.

As the workload length of reservations increases, both the normal and the overclocked schema improve more quickly than before until they reach a saturation point. Beyond this point, as the number of requests increases, overclocking has no further influence. Fig. 5 and Fig. 6 together show this. For workloads of 1, 2 and 3 times the base workload, the graphs in Fig. 7 show that as the average workload of requests increases, the peak of the improvement shifts to the left, i.e. towards smaller numbers of reservation requests. This means that with increasing workload, collisions between the end times of requests and the idle time intervals required before the processor's overclocking time happen sooner.
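The role of the idle interval in the discussion above can be illustrated with the simulation parameters of this section. The sketch below is only an illustration of the arithmetic, not the simulator used for the figures. It reuses the Δ expression of Fig. 3 with OCRate = 0.5 and Rio = 0.1, and it assumes hypothetical values for maxOCTime (20 time units) and Tidle (30 time units, following the 3 to 2 ratio of idle time to overclocking time). It also assumes that Rio is a fraction of the reservation length. The function names are hypothetical as well.

```python
def overclock_shift(length, rio=0.1, max_oc_time=20.0, oc_rate=0.5):
    """Delta as in Fig. 3: the time by which the end of a reservation of the
    given length moves back when its computation part is overclocked."""
    io_time = rio * length  # assumption: Rio is a fraction of the length
    return min(length - io_time, max_oc_time) * oc_rate


def idle_overhead(length, t_idle=30.0):
    """Idle (under-clocked) time needed before overclocking, expressed as a
    fraction of the reservation length."""
    return t_idle / length


# Mid-range base workload (45) and its multiples, as in Figs. 5 to 7.
for multiple in (1, 2, 3):
    length = 45.0 * multiple
    print(multiple, overclock_shift(length), idle_overhead(length))
```

Under these assumptions the shift Δ stays the same once the computation part exceeds maxOCTime, while the idle time becomes a smaller fraction of each reservation as the workload grows, which is consistent with the smaller side effect of idle slices described for longer workloads.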
[Figure 7 plots: utilization improvement (%) and acceptance improvement (%) versus number of requests (×10^3), Nodes = 500.]
Figure 7. Acceptance and utilization improvement in 500 nodes with workloads of 1, 2 and 3 times the base workload

V. CONCLUSIONS

The results show that, with the proposed strict overclocking schema applied within controlled bounds, utilization increases compared with normal clocking. The acceptance rate of the system also increases under limited conditions. In addition, because the temperature of the processing nodes cannot reach the critical point, the reliability of computation is preserved. Because the power of the processor is preserved, the economic and commercial aspects of power consumption are maintained.

With larger networks and more resources, this schema can also be used in grid networks larger than clusters. Since resources are provided exclusively to requests, the model and algorithms are well suited to private grids whose total resources are available for commercial purposes.

ACKNOWLEDGMENT

This work was supported by Iran Telecommunication Research Center (ITRC).

REFERENCES

[1] Y. Ahn, R. Bettati, “Transient Overclocking for Aperiodic Task Execution in Hard Real-Time Systems”, Euromicro Conference on Real-Time Systems (ECRTS '08), Prague, p. 102, 2008.
[2] D. G. Feitelson, “Scheduling parallel jobs on clusters”, High Performance Cluster Computing, vol. 1, Architectures and Systems, pp. 519–533, 1999.
[3] N. Bansal and K. Pruhs, “Speed scaling to manage temperature”, in Symposium on Theoretical Aspects of Computer Science, 2005.
[4] N. Bansal, T. Kimbrel, and K. Pruhs, “Dynamic speed scaling to manage energy and temperature”, IEEE Symposium on Foundations of Computer Science, 2004.
[5] S. Wang, R. Bettati, “Reactive Speed Control in Temperature-Constrained Real-Time Systems”, Proceedings of the 18th Euromicro Conference on Real-Time Systems (ECRTS 06), Dresden, Germany, pp. 161-170, July 2006.
[6] L. Eyraud-Dubois, G. Mounié, D. Trystram, “Analysis of Scheduling Algorithms with Reservations”, Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, USA, 2007.
[7] J. Blazewicz, P. Dell’Olmo, M. Drozdowski, P. Maczka, “Scheduling multiprocessor tasks on parallel processors with limited availability”, European Journal of Operational Research, vol. 149, pp. 377–389, 2003.
[8] J. Blazewicz, M. Machowiak, J. Weglarz, M. Kovalyov, D. Trystram, “Scheduling malleable tasks on parallel processors to minimize the makespan”, Annals of Operations Research, vol. 129, pp. 65–80, 2004.
[9] K. Jansen, “Scheduling malleable parallel tasks: An asymptotic fully polynomial time approximation scheme”, Algorithmica, vol. 39, pp. 59–81, 2004.
[10] O.H. Kwon, K.Y. Chwa, “Scheduling parallel tasks with individual deadlines”, 6th International Symposium on Algorithms and Computation, Springer-Verlag, vol. 215, pp. 198–207, 1995.
[11] V. Subramani, R. Kettimuthu, S. Srinivasan, P. Sadayappan, “Distributed Job Scheduling on Computational Grids Using Multiple Simultaneous Requests”, IEEE Computer Society, p. 359, 2002.
[12] A. Mamat, Y. Lu, J. Deogun, S. Goddard, “Real-Time Divisible Load Scheduling with Advance Reservation”, Euromicro Conference on Real-Time Systems (ECRTS '08), Prague, pp. 37-46, 2008.