Performance Prediction of Single Static String Algorithms on Cluster Configurations
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 6, 2011
Performance Prediction of Single Static String
Algorithms on Cluster Configurations
Prasad J. C.
Research Scholar, Dept. of CSE, Dr.MG.R University, Chennai
cum Asst. Professor, Dept. of CSE, FISAT, Angamaly, India
cheeeran.prasad@gmail.com

K. S. M. Panicker
Professor, Dept. of CSE
Federal Institute of Science and Technology [FISAT]
Angamaly, India

Abstract: This paper studies various factors in the performance of static single pattern searching algorithms and predicts the searching time for the operation to be performed in a cluster computing environment. A master-worker model of parallel computation and communication is designed for the searching algorithms with MPI technologies. Prediction and analysis are based on the KMP, BM, BMH, ZT, QS, BR, FS, SSABS, TVBS, ZTBMH and BRFS algorithms. Performances are compared and the results are discussed. Theoretical and practical results are presented. This work implements the same algorithms in two different cluster architecture environments, Dakshina-I and Dakshina-II, which improves the reliability of the prediction method. The results of the algorithms help us to predict and implement this technology in the areas of text search, DNA search and Web related fields in a cost effective manner.

Keywords- Static String Searching, Beowulf Cluster, Parallel programming

This work is sponsored by Centre for High Performance Computing, FISAT.
http://sites.google.com/site/ijcsis/
ISSN 1947-5500

I. INTRODUCTION
The importance of pattern searching and its prediction has grown with the latest advancements in DNA sequencing, web search engines, database operations, signal processing, error detection, and speech and pattern recognition, all of which require pattern searching over terabytes of data. Most researchers focus on achieving high throughput with expensive software and hardware resources. The cluster architecture considered here is based on the Beowulf architecture [1], with open source technologies and a 1 Gbps network. During the High-Performance Linpack (HPL) benchmark test [2], the system reached a speed of 80 Gigaflops with 32 nodes. Therefore, this research paper focuses on providing a high-speed but low-cost string matching implementation with limited resources. The implementation is based on parallel programming with the Message Passing Interface (MPI) standard [3,4,5].

II. MASTER WORKER MODEL STATIC PATTERN SEARCHING ENVIRONMENT
A single string pattern matching problem can be defined as follows. Let T be a large text of n characters and P be a pattern of length m, with text T = T[1] T[2] ... T[n] and pattern P = P[1] P[2] ... P[m]. The characters of both T and P belong to a finite set S, and m << n. The search process identifies all occurrences of the pattern P in the text T. Two types of input data are considered for the evaluation of the algorithms (natural language input strings and DNA sequence strings). The actual task of searching is done in parallel among the processors 0 to p-1. The master node decomposes the text into r subtexts and distributes them to the available workers (nodes) [6]. Each subtext contains k = [(n-m+1) / r] + m-1 successive characters of the complete text. There is an overlap of m-1 characters between successive subtexts, so there is a redundancy of r(m-1) characters for processing. The objective is to compare the results of searching with different algorithms, so this redundancy has no relevance in the system [7].

III. RELATED WORKS
Smith [8] compared the theoretical running time and the practical running time of the Knuth-Morris-Pratt [KMP] algorithm, the Boyer-Moore [BM] algorithm and the Brute-Force [BF] algorithm. Hume and Sunday [9] constructed a taxonomy and explored the performance of most existing versions of the single keyword Boyer-Moore pattern matching algorithm. Their experiments selected efficient versions for use in practical applications. Pirklbauer [10] compares several versions of the Knuth-Morris-Pratt algorithm, several versions of the BM algorithm [11] and the Brute-Force algorithm. Since Pirklbauer did not classify the algorithms, it is difficult to compare them, and the testing of the Boyer-Moore variants is not quite as extensive as in the Hume and Sunday taxonomy.

A comparative study of the performance of today's most prominent retrieval models is presented below to identify the parameters involved in searching. Two Beowulf cluster configurations [12] are considered, in which the second configuration is a software upgrade with better speed and better resource management of the hardware. The second configuration uses heterogeneous operating systems: customized Debian 5 Linux, with BSD for the firewall systems. Sun Grid Engine acts as the master node in the second configuration. Beowulf architecture is used with both configurations.
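
The subtext decomposition defined in Section II can be illustrated with a short sketch. It assumes that the square brackets in k = [(n-m+1) / r] + m-1 denote rounding up, and it uses a naive counter as a stand-in for the real searching algorithms; the function names `split_with_overlap` and `count_occurrences` are hypothetical and are not taken from the paper's implementation.

```python
import math

def split_with_overlap(text, m, r):
    """Split text into at most r subtexts of k characters that overlap by m-1."""
    n = len(text)
    step = math.ceil((n - m + 1) / r)   # match start positions owned by one worker
    k = step + m - 1                    # subtext length, as in Section II
    return [text[i * step: i * step + k]
            for i in range(r) if i * step < n - m + 1]

def count_occurrences(text, pattern):
    """Naive count of occurrences; a placeholder for KMP, BM and the others."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)

text = "abracadabra abracadabra"
pattern = "abra"
parts = split_with_overlap(text, len(pattern), 3)
# Each start position belongs to exactly one subtext, so the partial
# counts can simply be added, as the master node does with worker results.
total = sum(count_occurrences(part, pattern) for part in parts)
```

Because every subtext carries m-1 extra characters, a match that straddles a boundary is still found by exactly one worker, which is why the redundancy does not affect the counts.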
This work compares the results of searching with sequential and two Beowulf cluster configuration (Dakshina-I and Dakshina-II) implementations to identify the parameters involved in searching. To get a reliable and consistent performance result, the average of ten executions for different patterns of constant length is given in the tables as the result values.

TABLE 1: SEQUENTIAL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB (N=12661224)
m       5      10     15     20     25     50
KMP     1.47   1.414  1.401  1.381  1.372  1.394
BM      0.783  0.754  0.703  0.644  0.536  0.257
BMH     0.786  0.746  0.712  0.664  0.576  0.294
ZT      1.296  1.243  1.156  1.074  1.007  0.989
QS      0.523  0.48   0.452  0.435  0.413  0.244
BR      0.673  0.664  0.631  0.602  0.574  0.307
FS      0.523  0.512  0.481  0.463  0.433  0.283
SSABS   0.796  0.723  0.694  0.662  0.614  0.507
TVSBS   0.667  0.654  0.642  0.605  0.576  0.521
ZTMBH   0.963  0.921  0.809  0.653  0.462  0.186
BRBMH   0.482  0.456  0.423  0.397  0.368  0.214

TABLE 2: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I
m       5      10     15     20     25     50
KMP     0.531  0.51   0.475  0.464  0.45   0.444
BM      0.284  0.272  0.254  0.232  0.194  0.093
BMH     0.284  0.269  0.257  0.24   0.208  0.106
ZT      0.469  0.449  0.417  0.388  0.364  0.357
QS      0.189  0.175  0.163  0.158  0.15   0.089
BR      0.244  0.24   0.228  0.217  0.208  0.112
FS      0.189  0.183  0.174  0.167  0.156  0.103
SSABS   0.287  0.261  0.251  0.239  0.222  0.183
TVSBS   0.241  0.236  0.232  0.218  0.208  0.188
ZTMBH   0.348  0.322  0.292  0.236  0.167  0.067
BRBMH   0.174  0.166  0.154  0.144  0.134  0.078

TABLE 3: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I
m       5      10     15     20     25     50
KMP     0.411  0.395  0.376  0.368  0.361  0.355
BM      0.219  0.212  0.199  0.18   0.15   0.074
BMH     0.221  0.21   0.199  0.187  0.167  0.082
ZT      0.398  0.369  0.351  0.345  0.334  0.327
QS      0.146  0.135  0.128  0.123  0.117  0.069
BR      0.189  0.187  0.178  0.169  0.16   0.085
FS      0.146  0.143  0.134  0.131  0.122  0.08
SSABS   0.223  0.203  0.194  0.187  0.172  0.144
TVSBS   0.187  0.183  0.182  0.169  0.162  0.146
ZTMBH   0.27   0.258  0.228  0.183  0.129  0.053
BRBMH   0.136  0.129  0.119  0.11   0.104  0.061

TABLE 4: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II
m       5      10     15     20     25     50
KMP     0.304  0.289  0.278  0.264  0.261  0.257
BM      0.161  0.156  0.144  0.132  0.107  0.054
BMH     0.162  0.153  0.145  0.138  0.122  0.061
ZT      0.292  0.268  0.254  0.253  0.247  0.236
QS      0.106  0.097  0.093  0.09   0.084  0.052
BR      0.138  0.135  0.131  0.124  0.118  0.061
FS      0.106  0.103  0.097  0.096  0.088  0.057
SSABS   0.161  0.149  0.142  0.136  0.124  0.104
TVSBS   0.137  0.134  0.132  0.122  0.117  0.105
ZTMBH   0.196  0.189  0.165  0.132  0.093  0.037
BRBMH   0.098  0.093  0.086  0.081  0.075  0.043

TABLE 5: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II
m       5      10     15     20     25     50
KMP     0.231  0.22   0.212  0.207  0.204  0.194
BM      0.128  0.117  0.114  0.109  0.086  0.045
BMH     0.126  0.114  0.111  0.106  0.093  0.049
ZT      0.225  0.207  0.195  0.195  0.183  0.182
QS      0.081  0.075  0.071  0.068  0.065  0.038
BR      0.105  0.104  0.099  0.093  0.089  0.047
FS      0.081  0.079  0.074  0.072  0.069  0.044
SSABS   0.124  0.113  0.108  0.103  0.086  0.082
TVSBS   0.104  0.101  0.101  0.094  0.09   0.083
ZTMBH   0.152  0.145  0.129  0.103  0.072  0.029
BRBMH   0.075  0.072  0.068  0.061  0.057  0.033

IV. FACTORS AFFECTING THE SEARCHING PROCESS
There are both hardware and software factors [13] which affect the results of the searching process. These factors can be classified into algorithmic factors, architectural factors, network factors and I/O factors.
Only one user is logged in to the cluster and uses the interconnection network during the searching process. The getrusage system call was used as the timing function to find the running time of the algorithms. Data are loaded into main memory before the timing function begins. In a string matching algorithm, the length of the searched pattern (m), the length of the text being searched (n) and the size of the alphabet (σ) are the direct parameters affecting the performance [14] of searching operations.
In a cluster computing environment with a large number of compute nodes, the result does not become attractive because of the network traffic in the interconnection network. The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program, as per Amdahl's law [15]. Amdahl's law states that if P is the proportion of the program that can be made parallel (i.e. benefit from parallelization),
and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is S = N/(N-NP+P).
The reverse factor (against Amdahl's law) says [16, 17] that the problem size scales with the number of processors. When more powerful processors are given, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference operator complexity, and other parameters that are usually adjusted to allow the program to be run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.
The resource manager used with Dakshina-I is Torque, whereas the Dakshina-II configuration uses Sun Grid Engine (SGE). SGE needs "tight integration", whereas the TORQUE [18] Resource Manager is a distributed resource manager providing control over batch jobs and distributed compute nodes. The Parallel Virtual Machine (PVM) [19] is a software tool for parallel networking of heterogeneous computers in Dakshina-I. Dakshina-II uses Sun Grid Engine, which performs scalable parallel job startup with qrsh as a replacement for rsh or ssh for starting the remote tasks. It is possible to start binaries with qrsh. If the special command line option -inherit is specified, qrsh will start a subtask in an existing Grid Engine parallel job.
The size of RAM in Dakshina-I is 256 MB, whereas in Dakshina-II it is 1.23 GB. RAM does not make the processor itself work faster. In our worker nodes, it takes the CPU approximately 200 ns (nanoseconds) to access RAM compared to 12,000,000 ns to access the hard drive [18]. When there is more RAM in the computer, the probability of "running out" of RAM and having to fall back on the hard disk swap file is smaller, and thus the computer performance increases.

The MPI implementation used with Dakshina-I is LAM MPI, whereas Dakshina-II uses Open MPI. LAM MPI implements the MPI-1.2 standard and much of MPI-2; Open MPI implements the MPI-2 standard [20]. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers. The file system used with Dakshina-I is PVFS, whereas Dakshina-II uses NFS. PVFS consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and the pvfs-client process allow the file system to be mounted and used with standard utilities. The Network File System (NFS), originally developed by Sun, allows a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS builds on the Open Network Computing Remote Procedure Call (ONC RPC) system.
The authentication system used with configuration 1 is NIS, whereas configuration 2 uses LDAP. NIS does not provide a distributed administration capability or hierarchical data administration. LDAP organizes data into a hierarchy, and allows distributed administration [21].

The Linux kernel used with configuration 1 is the customised Debian kernel 2.6.18, whereas configuration 2 uses version 2.6.26. During HPL benchmarking [2], the speed of Dakshina-II has recorded 9600 cores floating point operations per second, whereas Dakshina-I performed 7000 cores floating point operations per second.

V. SEARCHING TIME PREDICTION
Based on the above mentioned factors, a theoretical prediction method for the string searching process can be formed. This formula will help to decide the optimum number of processors to be used for the searching process in a cluster computing environment, because of the computing-communication ratio limit of Amdahl's law.

The string matching problem can achieve data parallelism with the following simple data partitioning technique: the text string is decomposed into a number of subtexts according to the number of processors allocated. These subtexts are stored on the local disks of the processors. According to this partitioning approach, the static master-worker model is followed. First, the master distributes the pattern string to all the workers. In the second step, each worker reads its subtext from the local disk into main memory and searches it using the string matching algorithms. Finally, in the third step, the workers send the number of occurrences back to the master.

Let T1 be the time required for the initial setup [22] of the master node. The master node reads the text of length n and the search pattern of length m. The master node then divides the text into subparts, based on the availability of worker nodes, for broadcasting. Thus reading of the text requires n accesses. Let tavg be the average time to perform one I/O step. However, Open MPI does not always read all the values from these files during startup; on Dakshina-II it reads the text when division of the text becomes a requirement and then sends it to all nodes in the job, whereas Dakshina-I used LAM MPI, which reads both text and pattern initially. In the case of the pattern, the master node does not spend time on a reading operation. The files are read on each node during each process' startup. This is intended behavior: it allows for per-node customization, which is especially relevant in heterogeneous environments. Thus for configuration I,
T1 = tavg * (n + m) + Є1. Equation (5.1).
and for configuration II,
T1 = tavg * n + Є1. Equation (5.2).
Let T2 be the time required for communication from the master node to the slave nodes. The cluster nodes in configurations I and II are interconnected using a Gigabit Ethernet LAN, with a Realtek 8169 Gigabit Network Card on each compute node and two HP ProCurve 1400-24G (39078A) switches. Performance of the switch is less than 4.7 μs (64-byte packets) and less than 3.0 μs (64-byte packets) [18]. The switch has a maximum throughput of 35.7 million packets per second (pps). The data transfer takes place at a speed of around 990 Megabit per second among the servers and around 665 Mb per second among the
client nodes. In the ideal multicomputer architecture, the cost of sending a message between two tasks located on different processors can be represented by two parameters: the message startup time, which is the time required to initiate the communication, and the transfer time per (typically four-byte) word, which is 32 bit in both configurations. The time [23] required to send a message of size m includes the communication time to broadcast the pattern string to all processors involved in the processing of the string matching.
Let l1 be the latency time, a fixed startup overhead time needed to prepare sending a message from one processor to the other.
Let us assume that the function MPI_Bcast is completed in log2 p steps. In each step, one or two parallel send operations per processor are performed. The size of a pattern string of length m is m bytes. The transmission time is proportional to the size of the pattern string. Let c1 be the incremental communication time per byte. The broadcast therefore transfers m bytes to the other p-1 processors. The expression for this amount of time is given by [25, 26]:

T2 = log2 p (l1 + m c1). Equation (5.3).

Let T3 be the average I/O time for reading the subtext from the local disk of a slave processor, an Intel Pentium 4 (3.6 GHz) with 1.23 GB RAM and an 80 GB SATA HDD. I/O time is proportional to the length of the text. Each node processor has to read the subtext from the local disk into a buffer in main memory with size [(n - m + 1) / p] + m - 1 characters, so the I/O time on each client is given by:

T3 = ([(n - m + 1) / p] + m - 1) tavg. Equation (5.4).

Let T4 be the maximum computation time; then

T4 = (Max. computation step) x s. Equation (5.5).

The searching steps required by the different algorithms for string matching of a pattern string of length m with a text string of length n are given in the following table.

TABLE 6: SEARCHING STEPS REQUIRED FOR STRING MATCHING FOR DIFFERENT ALGORITHMS
Algorithms   Maximum computation step required
KMP          2n
BM           3n
BMH          m((n-m)+1)
ZT           (n^2 + m^2 + nm)
QS           nm+(n-m)
BR           [(n+m)^2 + (n-m)]/2
FS           nm
SSABS        (n+m)+[(n-m)(m-2)]
TVSBS        m(n-m+1)
ZTMBH        ½[n^2 + nm + m]
BRBMH        m(n+1)-2

Equation (5.5) takes the maximum computation step from the above table, and s is the average time to perform one search (computation) step. The value of n in each computation node will be the length of the subtext, with size [(n-m+1)/p]+m-1 characters.
Let T5 be the result gathering time from the slave nodes to the master node. The master node, SGE server and LDAP servers are IBM X series machines with Quad Core Xeon processors, SAS HDD, 2 GB RAM and NIC 2G. T5 is the time for the master node to consolidate the results and compute the final result. This IBM server collects the result (the count of pattern matches) from all p nodes simultaneously [24]. The function MPI_Reduce is completed in log2 p steps. Thus the communication time to gather the results [22] is

T5 = log2 p (l1 + c1). Equation (5.6).

Thus the maximum searching time, predicted after considering all the above factors, is the sum T1 + T2 + T3 + T4 + T5. To determine this result, the following parameters have to be found: tavg, the average time to perform one I/O step; s, the average time to perform one search (computation) step; Є1, the time required for division of the subtext; l1, the latency time, a fixed startup overhead time needed to prepare sending a message from one processor to the other; and c1, the incremental communication time per byte. tavg, s and Є1 are found by measuring the time taken to perform n steps [27].

tavg = Time taken by the processor to read k characters / k. Equation (5.7).
The average tavg value for different character lengths is 4.7E-10 seconds.
s = Time taken by searching the text of length n / n. Equation (5.8).

The following results show that the algorithms have different s values for the various pattern lengths and node counts in Configuration I and Configuration II.

TABLE 7: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-I WHEN P=5
m       5         10        15        20        25        50
KMP     2.15E-08  2.06E-08  1.92E-08  1.89E-08  1.82E-08  1.80E-08
BM      7.58E-09  7.33E-09  6.91E-09  6.25E-09  6.49E-09  2.68E-09
BMH     4.54E-09  2.16E-09  1.37E-09  9.77E-10  6.68E-10  1.73E-10
ZT      3.01E-15  2.95E-15  2.68E-15  2.48E-15  2.35E-15  2.30E-15
QS      2.51E-09  2.28E-09  2.05E-09  1.98E-09  1.87E-09  1.07E-09
BR      2.95E-15  2.90E-15  2.94E-15  2.81E-15  2.70E-15  1.46E-15
FS      3.09E-09  1.50E-09  9.50E-10  6.77E-10  5.10E-10  1.76E-10
SSABS   5.76E-09  2.41E-09  1.52E-09  1.02E-09  7.49E-10  3.07E-10
TVSBS   3.92E-09  1.89E-09  1.26E-09  8.74E-10  6.71E-10  2.94E-10
ZTMBH   4.42E-15  4.16E-15  3.80E-15  3.01E-15  2.06E-15  8.28E-16
BRBMH   2.84E-09  1.34E-09  8.50E-10  5.98E-10  4.37E-10  1.25E-10
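
The five terms of Equations (5.1) to (5.6) can be combined in a few lines. The following is a minimal sketch, not the authors' implementation: the function name `predict_time` and its parameter names are hypothetical, the square brackets in the subtext size are assumed to mean rounding up, and Є1 is set to zero. The sample values are taken from this paper (n from Table 1, s for KMP at m=5 from Table 7, tavg from Equation (5.7), and the l1 and c1 values reported with the ping-pong measurements).

```python
import math

def predict_time(n, m, p, s, max_steps, t_avg, l1, c1, eps1=0.0, reads_pattern=True):
    """Return T1 + T2 + T3 + T4 + T5 in seconds, following Equations (5.1)-(5.6)."""
    t1 = t_avg * ((n + m) if reads_pattern else n) + eps1   # (5.1) or (5.2)
    t2 = math.log2(p) * (l1 + m * c1)                       # (5.3) pattern broadcast
    subtext_len = math.ceil((n - m + 1) / p) + m - 1
    t3 = subtext_len * t_avg                                # (5.4) local subtext read
    t4 = max_steps * s                                      # (5.5) search steps times s
    t5 = math.log2(p) * (l1 + c1)                           # (5.6) gathering the counts
    return t1 + t2 + t3 + t4 + t5

# KMP, Configuration I, 5 nodes, m = 5.
# Assumption: the step count 2n uses the full text length n.
n, m, p = 12661224, 5, 5
total = predict_time(n, m, p, s=2.15e-8, max_steps=2 * n,
                     t_avg=4.7e-10, l1=95.72e-6, c1=0.02731e-6)
print(round(total, 3))
```

With the step count evaluated on the full text length, the sum comes to about 0.55 s, which is close to the theoretical KMP entry for 5 nodes on Dakshina-I in Table 11; evaluating the step count on the subtext length instead gives a much smaller figure, so the choice of n here is an inference from the tabulated values rather than something the text states.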
TABLE 8: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-I WHEN P=10
m       5         10        15        20        25        50
KMP     1.77E-08  1.67E-08  1.57E-08  1.54E-08  1.52E-08  1.50E-08
BM      6.52E-09  6.50E-09  6.10E-09  5.68E-09  4.92E-09  2.79E-09
BMH     4.04E-09  1.87E-09  1.24E-09  8.64E-10  6.31E-10  1.83E-10
ZT      2.70E-15  2.40E-15  2.33E-15  2.31E-15  2.27E-15  2.21E-15
QS      2.18E-09  1.19E-09  7.74E-10  6.31E-10  4.52E-10  1.51E-10
BR      2.67E-15  2.58E-15  2.47E-15  2.54E-15  2.31E-15  1.47E-15
FS      2.78E-09  1.35E-09  8.68E-10  6.19E-10  4.99E-10  1.75E-10
SSABS   4.87E-09  2.05E-09  1.25E-09  8.64E-10  6.74E-10  2.75E-10
TVSBS   3.47E-09  1.70E-09  1.11E-09  7.65E-10  5.93E-10  2.75E-10
ZTMBH   3.80E-15  3.57E-15  3.29E-15  2.83E-15  1.97E-15  1.13E-15
BRBMH   2.68E-09  1.33E-09  8.89E-10  5.96E-10  4.61E-10  1.42E-10

TABLE 9: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-II WHEN P=5
m       5         10        15        20        25        50
KMP     1.29E-08  1.14E-08  1.11E-08  1.05E-08  1.04E-08  1.03E-08
BM      4.47E-09  4.25E-09  3.94E-09  3.81E-09  3.15E-09  1.57E-09
BMH     2.74E-09  1.26E-09  7.98E-10  5.59E-10  4.00E-10  1.10E-10
ZT      1.84E-15  1.68E-15  1.59E-15  1.62E-15  1.56E-15  1.49E-15
QS      1.52E-09  7.15E-10  5.21E-10  3.56E-10  2.69E-10  8.76E-11
BR      1.82E-15  1.77E-15  1.74E-15  1.62E-15  1.48E-15  8.31E-16
FS      1.89E-09  8.66E-10  5.45E-10  3.97E-10  2.89E-10  1.00E-10
SSABS   3.35E-09  1.37E-09  8.21E-10  5.89E-10  4.26E-10  1.69E-10
TVSBS   2.27E-09  1.12E-09  7.19E-10  5.04E-10  3.84E-10  1.67E-10
ZTMBH   2.50E-15  2.38E-15  2.04E-15  1.70E-15  1.19E-15  5.19E-16
BRBMH   1.73E-09  7.79E-10  4.82E-10  3.62E-10  2.48E-10  6.89E-11

TABLE 10: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-II WHEN P=10
m       5         10        15        20        25        50
KMP     9.90E-09  9.43E-09  9.07E-09  8.60E-09  8.32E-09  7.69E-09
BM      3.78E-09  3.65E-09  3.36E-09  3.20E-09  2.49E-09  1.57E-09
BMH     2.22E-09  1.06E-09  6.57E-10  4.69E-10  3.02E-10  9.59E-11
ZT      1.51E-15  1.38E-15  1.23E-15  1.21E-15  1.18E-15  1.16E-15
QS      1.26E-09  6.15E-10  3.84E-10  2.77E-10  2.18E-10  6.77E-11
BR      1.51E-15  1.44E-15  1.42E-15  1.37E-15  1.29E-15  7.15E-15
FS      1.53E-09  7.40E-10  4.72E-10  3.31E-10  2.49E-10  7.06E-11
SSABS   2.78E-09  1.17E-09  7.15E-10  5.02E-10  3.12E-10  1.45E-10
TVSBS   1.89E-09  9.06E-10  5.99E-10  4.14E-10  3.21E-10  1.48E-10
ZTMBH   2.13E-15  2.07E-15  1.81E-15  1.51E-15  1.08E-15  5.58E-16
BRBMH   1.43E-09  6.69E-10  4.14E-10  2.91E-10  2.20E-10  9.59E-11

The time taken to search a character in a large text of length n depends on the searching algorithm and the number of nodes used [28]. The above tables give the average value of s for 5 nodes and 10 nodes. The theoretical (predicted) searching time for the same 12 MB text file used with Configuration I and Configuration II is given below. The following results are obtained by adding T1, T2, T3, T4 and T5.

TABLE 11: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I
m       5      10     15     20     25     50
KMP     0.551  0.53   0.495  0.485  0.468  0.464
BM      0.295  0.286  0.27   0.245  0.254  0.109
BMH     0.295  0.281  0.268  0.255  0.219  0.117
ZT      0.49   0.48   0.438  0.405  0.385  0.377
QS      0.198  0.325  0.423  0.534  0.623  0.699
BR      0.244  0.24   0.243  0.233  0.224  0.125
FS      0.203  0.198  0.188  0.179  0.169  0.119
SSABS   0.299  0.283  0.276  0.252  0.235  0.198
TVSBS   0.256  0.248  0.248  0.229  0.22   0.194
ZTMBH   0.362  0.341  0.312  0.249  0.173  0.074
BRBMH   0.187  0.178  0.169  0.159  0.146  0.087

TABLE 12: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I
m       5      10     15     20     25     50
KMP     0.455  0.43   0.406  0.397  0.393  0.388
BM      0.255  0.254  0.239  0.223  0.194  0.113
BMH     0.263  0.244  0.243  0.226  0.207  0.123
ZT      0.44   0.392  0.38   0.378  0.371  0.361
QS      0.173  0.173  0.164  0.175  0.156  0.105
BR      0.221  0.214  0.205  0.211  0.192  0.125
FS      0.183  0.178  0.172  0.164  0.165  0.118
SSABS   0.254  0.241  0.228  0.215  0.212  0.178
TVSBS   0.227  0.223  0.218  0.201  0.195  0.181
ZTMBH   0.312  0.293  0.271  0.234  0.165  0.098
BRBMH   0.177  0.175  0.176  0.158  0.153  0.097

TABLE 13: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II
m       5      10     15     20     25     50
KMP     0.335  0.297  0.289  0.273  0.271  0.268
BM      0.177  0.169  0.157  0.152  0.127  0.067
BMH     0.181  0.167  0.159  0.149  0.134  0.077
ZT      0.302  0.277  0.263  0.267  0.258  0.247
QS      0.123  0.107  0.113  0.102  0.096  0.064
BR      0.153  0.149  0.147  0.137  0.126  0.074
FS      0.127  0.117  0.111  0.108  0.099  0.071
SSABS   0.177  0.163  0.153  0.149  0.137  0.112
TVSBS   0.151  0.149  0.144  0.135  0.129  0.113
ZTMBH   0.208  0.198  0.171  0.144  0.103  0.049
BRBMH   0.117  0.106  0.099  0.099  0.086  0.051
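
One way to compare a measured table with its theoretical counterpart is to compute the relative deviation row by row. The sketch below does this for the KMP row on Dakshina-I with 5 nodes, using the values printed in Table 2 (measured) and Table 11 (theoretical); the helper name `relative_deviation` is hypothetical and the choice of the measured value as the denominator is an assumption.

```python
# KMP on Dakshina-I with 5 nodes, for m = 5, 10, 15, 20, 25, 50.
measured = [0.531, 0.51, 0.475, 0.464, 0.45, 0.444]    # Table 2
predicted = [0.551, 0.53, 0.495, 0.485, 0.468, 0.464]  # Table 11

def relative_deviation(pred, meas):
    """Return (predicted - measured) / measured for each pattern length."""
    return [(p - m) / m for p, m in zip(pred, meas)]

dev = relative_deviation(predicted, measured)
mean_dev = sum(dev) / len(dev)
for m_len, d in zip([5, 10, 15, 20, 25, 50], dev):
    print(f"m={m_len:2d}  deviation={d:.1%}")
```

For this row every predicted value lies above the measured one, by a few percent of the measured time. The same loop can be run over any other algorithm row or cluster configuration.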
TABLE 14: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II
m       5      10     15     20     25     50
KMP     0.258  0.246  0.237  0.225  0.218  0.202
BM      0.151  0.146  0.135  0.129  0.102  0.067
BMH     0.148  0.142  0.132  0.126  0.103  0.068
ZT      0.249  0.228  0.204  0.202  0.196  0.194
QS      0.103  0.093  0.085  0.081  0.079  0.051
BR      0.128  0.123  0.121  0.117  0.111  0.58
FS      0.104  0.101  0.097  0.091  0.086  0.052
SSABS   0.148  0.141  0.134  0.128  0.102  0.097
TVSBS   0.127  0.122  0.121  0.112  0.109  0.101
ZTMBH   0.178  0.173  0.152  0.128  0.094  0.052
BRBMH   0.098  0.092  0.086  0.081  0.077  0.068

l1 is the latency time, a fixed startup overhead time needed to prepare sending a message from one processor to the other, and c1 is the incremental communication time per byte. In order to find the values of l1 and c1, a ping-pong test [29] is performed between two processors, which sends and receives a number of messages between two processes on the same processor. Both processes do nothing but send and receive messages. All timings are average times over 100 separate rounds. The values of l1 and c1 can be found using the linear regression method to fit a straight line to the curve of the communication [31]. Gigabit Ethernet provides a reasonably high bandwidth given its low price, but suffers from relatively high latency. The average latency value l1 is 95.72 μs (95.72 x 10^-6 seconds) [30]. The average incremental communication time c1 is 0.02731 μs (0.02731 x 10^-6 seconds).

VI. PERFORMANCE ANALYSIS
Experimental results and estimates for the running time of string searching algorithms are presented by assuming that both the text and the patterns are random strings with uniform distribution. In practice, texts and patterns are not random, but this estimate gives a rough idea about the performance of these algorithms. The time required for division of the subtext, Є1, is negligibly small when compared to the other factors. The reverse factor law is not relevant to this experiment because of collisions in the network traffic. The expected running time is higher than in the experimental case. This is because worst case computation values are considered in the formula (Table 6). Fixing the value of s based on the lengths m and n and the number of computations for the different algorithms is the significant part of this linear approach to estimating static search time. Each experiment was performed 10 times and the average value was taken (deviations were very small).

VII. CONCLUSION
The above mentioned method is a general method to find the maximum searching time. Different algorithms work in different ways. The running time of certain algorithms is independent of the keyword set. The experimental results show a close relationship with the theoretically analyzed searching time. More than 98% accuracy for searching prediction can be achieved with this method on this particular environment. We have not considered DNA sequence string or multiple string pattern searching operations. Algorithms like ZT, BMH, ZTMBH, BR, and BRBMH are designed for DNA sequence strings or multiple pattern searching. However, these algorithms also satisfy this formula for natural language strings. With the DNA sequence searching mechanism, the result is satisfactory. The next level of work is to make an estimate for DNA sequence searching. The effective application of this formula is for natural language searching and network security applications. The experimental results show that the proposed prediction method works effectively by considering the worst case of these algorithms, especially in the case of alphabets [32] of ASCII codes, and thus the proposed method is quite applicable for exact pattern matching of natural language. The results of the algorithms help us to implement this technology in the areas of text search, DNA search and Web related fields in a cost effective manner.

REFERENCES
[1] Thomas Sterling [2006], Beowulf Cluster Computing with Linux, MIT Press. Page 25.
[2] A. Petitet, R. C. Whaley, J. Dongarra, A. Cleary [Sept 2008], HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, Innovative Computing Laboratory, University of Tennessee. http://www.netlib.org/benchmark/hpl/
[3] Peter S. Pacheco [1997], Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc. Page 28.
[4] Al Geist, Adam Beguelin, Jack Dongarra [1996], PVM: a users' guide and tutorial for networked parallel computing, MIT Press. Page 103-107.
[5] William Gropp, Ewing Lusk, Anthony Skjellum [1996], Using MPI: Portable parallel programming with the message-passing interface, MIT Press. Page 124.
[6] Markus Weinhardt and Wayne Luk, PACT GmbH [2002], 'Task-Parallel Programming of Reconfigurable Systems', Springer Berlin / Heidelberg, ISBN 978-3-540-42499-4, Page 172-181.
[7] Michailidis, P.D. and Margaritis, K.G. [2005], 'Parallel text searching applications on a heterogeneous cluster architecture', Int. J. Computational Science and Engineering, Vol. 1, No. 1, pp. 45–59.
[8] D.E. Knuth, J.H. Morris and V.R. Pratt [1977], Fast pattern matching in strings, SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350.
[9] Hume S.C. and D. Sunday [1991], "Fast string searching", Software Practice and Experience, 1221-1248.
[10] Pirklbauer K. [1992], "A study of pattern-matching algorithms", Structured Programming, 89-98.
[11] Richard Cole [1991], "Tight bounds on the complexity of the Boyer-Moore algorithm", Proceedings of the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 224–233. http://portal.acm.org/citation.cfm?id=127830
[12] Prasad J.C., K.S.M. Panicker [2009], 'Beowulf Dakshina Cluster Architecture with Linux Debian Operating system for MPI Programming', Proceedings of International Conference on Information Processing, Bangalore, India, ISBN: 978-93-80026-75-2, Page 350.
[13] P.D. Michailidis, K.G. Margaritis [2000], Parallel String Matching Algorithm: A bibliographical review, Technical Report, Dept. of Applied Informatics, University of Macedonia.
[14] Panagiotis D. Michailidis and Konstantinos G. Margaritis [2001], ‘Parallel Text Searching Application on a Heterogeneous Cluster of Workstations’, IEEE Computer Society Proceedings of the International Conference on Parallel Processing Workshops (ICPPW’01), Page 153.
[15] Amdahl, Gene [1967], ‘Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities’, AFIPS Conference Proceedings, pp. 483–485. www.inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf
[16] John L. Gustafson, Reevaluating Amdahl's Law, Ames Laboratory, Department of Energy, ISU, Ames, Iowa. www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html
[17] Benner, R.E., Gustafson, J.L., Montry, G.R. [1988], “Development and analysis of scientific application programs on a 1024-processor hypercube”, Sandia National Laboratories, Page 0317.
[18] Prasad J.C., K.S.M. Panicker [2010], ‘String Searching Algorithm Implementation-Performance Study With Two Cluster Configurations’, International Journal of Computer Science and Communication, ISSN: 0973-7391, Page 551-555.
[19] Recent Advances in Parallel Virtual Machine and Message Passing Interface [2005], 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, Springer, Volume 3666.
[20] http://www.open-mpi.org
[21] Gerald Carter [2003], LDAP System Administration, O'Reilly, page 2.
[22] Panagiotis D. Michailidis, Konstantinos G. Margaritis [2002], Parallel implementations for string matching problem on a cluster of distributed workstations, Neural, Parallel & Scientific Computations, Volume 10, Issue 3, Pages 287–312.
[23] Ahmad Fadel Klaib, Hugh Osborne [Dec 2008], ‘Searching Protein Sequence Databases Using BRBMH Matching Algorithm’, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 12, Page 59.
[24] Garth A. Gibson, Rodney Van Meter [2000], ‘Network attached storage architecture’, Communications of the ACM, Volume 43, Issue 11, Pages 37–45.
[25] Yang Yu, Bhaskar Krishnamachari, Prasanna, V.K. [2004], ‘Issues in designing middleware for wireless sensor networks’, Network, IEEE, Volume 18, Issue 1, Pages 15-21.
[26] F Ashiya, M Matsumoto, S Nagasawa, S Tomita [2007], Loop-network configuration and single-mode optical fiber cable technologies for subscriber network, International Journal of Digital & Analog Communication Systems, Volume 3, Issue 1, John Wiley & Sons, Ltd, Pages 77–83.
[27] J. Rodriguez de Souza, E. Argollo, A. Duarte, D. Rexachs, E. Luque [2007], Fault Tolerant Master-Worker over a Multi-Cluster Architecture, Proceedings of the International Conference Parallel Computing: Current & Future Issues of High-End Computing, NIC Series, Vol. 33, ISBN 3-00-017352-8, Pages 465-472.
[28] Smit, G. de V. [1982], “A comparison of three string matching algorithms”, Software Practice and Experience, Page 57-66.
[29] Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, ‘Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services’, Department of Computer Science, Rutgers University, NJ 08854, http://dark-panic.rutgers.edu/Research/mendosus/publications/commimpactHPCA-03.pdf
[30] Anderson, T., D. Culler, D. Patterson [1995], A Case for NOW (Network of Workstations), IEEE Micro, Vol. 15, pp. 54-64.
[31] Panagiotis D. Michailidis and Konstantinos G. Margaritis [2002], “String Matching Problem On A Cluster of Personal Computers: Performance Modeling”, Intern. J. Computer Math., Taylor & Francis, Vol. 79(8), pp. 867–888.
[32] Foster, Ian [1995], Designing and Building Parallel Programs, Addison-Wesley, ISBN 0201575949, Message Passing Interface, page 245.

AUTHORS PROFILE

K S Mohanachandra Panicker graduated in Electrical Engineering in 1971 from REC Calicut, completed his post-graduation at College of Engineering, Trivandrum in 1973, and received his Ph D from I I T, New Delhi in 1986. He joined N S S College as a lecturer in Electrical Engineering in 1974 and served at different levels up to 2005 as Assistant Professor, Professor and Principal. He was principal of Govt. Model Engineering College Ernakulam during 1994-2001 and of Federal Institute of Science and Technology (FISAT) during 2005-2008. As a Professor, Dr Panicker worked in European University of Lefke, Cyprus. Currently he is the Dean, Planning and Research, in FISAT, Angamaly. At national and international level, he has published over 40 papers in conferences and journals. He is also guiding three students registered for Ph D.

Prasad J.C graduated in Mathematics from University of Calicut in 1998, completed his post graduation in Computer Applications from Bharathiar University in 2001, and his second post graduation in Computer Science and Engineering in 2006. He joined as lecturer in the Department of Computer Applications at Union Christian College. Presently he is working as Asst. Professor, Dept of Computer Science and Engineering, Federal Institute of Science and Technology [FISAT], Angamaly. At national and international level, he has published 12 papers in conferences and 2 papers in international journals. Prasad is currently doing his Ph.D work in Dr.M.G.R. University, Chennai.