Performance Prediction of Single Static String Algorithms on Cluster Configurations

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 6, 2011

Prasad J. C., Research Scholar, Dept. of CSE, Dr. M.G.R University, Chennai, cum Asst. Professor, Dept. of CSE, FISAT, Angamaly, India
K. S. M. Panicker, Professor, Dept. of CSE, Federal Institute of Science and Technology [FISAT], Angamaly, India
email@example.com

Abstract—This paper studies the factors that govern the performance of static single-pattern searching algorithms and predicts the searching time for the operation when it is performed in a cluster computing environment. A master-worker model of parallel computation and communication is designed for the searching algorithms with MPI technology. Prediction and analysis are based on the KMP, BM, BMH, ZT, QS, BR, FS, SSABS, TVSBS, ZTBMH and BRBMH algorithms. The performances are compared and the results discussed; both theoretical and practical results are presented. The work includes implementation of the same algorithms on two different cluster architectures, Dakshina-I and Dakshina-II, which improves the reliability of the prediction method. The results of the algorithms help to predict and deploy this technology in text search, DNA search and Web-related fields in a cost-effective manner.

Keywords- Static String Searching, Beowulf Cluster, Parallel programming

I. INTRODUCTION

Pattern searching and its prediction have become more relevant with the latest advancements in DNA sequencing, web search engines, database operations, signal processing, error detection, and speech and pattern recognition, all of which must process terabytes of data. Most researchers focus on achieving high throughput with expensive software and hardware resources. The cluster architecture considered here is based on the Beowulf architecture with open-source technologies and 1 Gbps networks. During the HPL (High-Performance Linpack) benchmark test, the system ran at a speed of 80 gigaflops with 32 nodes. Therefore, this research paper focuses on the implementation details of high-speed but low-cost string matching with limited resources. The implementation is based on the Message Passing Interface (MPI) standard [3,4,5] for parallel programming.

II. MASTER-WORKER MODEL STATIC PATTERN SEARCHING ENVIRONMENT

A single-string pattern matching problem can be defined as follows. Let T be a large text of n characters and P a pattern of length m: T = T[1]T[2]…T[n] and P = P[1]P[2]…P[m]. The characters of both T and P belong to a finite alphabet S, and m << n. The search process identifies all occurrences of the pattern P in the text T. Two types of input data are considered for the evaluation of the algorithms: natural-language strings and DNA sequence strings. The actual task of searching is done in parallel among the processors 0 to p-1.

The master node decomposes the text into r subtexts distributed to the available workers (nodes). Each subtext contains k = [(n-m+1)/r] + m-1 successive characters of the complete text, with an overlap of m-1 characters between successive subtexts, so there is a redundancy of r(m-1) characters for processing. The overlap guarantees that an occurrence straddling a subtext boundary is still found. Since the objective is to compare the results of searching with different algorithms, this redundancy has no relevance for the comparison.

III. RELATED WORKS

Smit compared the theoretical and practical running times of the Knuth-Morris-Pratt [KMP], Boyer-Moore [BM] and Brute-Force [BF] algorithms. Hume and Sunday constructed a taxonomy and explored the performance of most existing versions of the single-keyword Boyer-Moore pattern matching algorithm; their experiments selected efficient versions for use in practical applications. Pirklbauer compared several versions of the Knuth-Morris-Pratt algorithm, several versions of the BM algorithm, and the Brute-Force algorithm. Since Pirklbauer did not classify the algorithms, they are difficult to compare, and his testing of the Boyer-Moore variants is not as extensive as the Hume and Sunday taxonomy.
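The subtext decomposition of Section II can be illustrated with a short sketch (illustrative Python, not the authors' MPI code; the function names are ours). Each of the r subtexts carries m-1 extra characters of overlap, and a worker counts only the matches that start in the region it owns, so no occurrence is lost or double-counted:

```python
def partition(text: str, m: int, r: int) -> list[str]:
    """Split text into r subtexts of about (n-m+1)/r + m-1 characters,
    overlapping by m-1 characters so no cross-boundary match is lost."""
    n = len(text)
    base = (n - m + 1) // r              # starting positions each worker owns
    parts = []
    for i in range(r):
        start = i * base
        end = n if i == r - 1 else start + base + m - 1
        parts.append(text[start:end])
    return parts

def count_occurrences(text: str, pattern: str, r: int) -> int:
    """Master-worker style count: each worker reports only matches that
    start inside its owned region, so overlap is not double-counted."""
    m = len(pattern)
    base = (len(text) - m + 1) // r
    total = 0
    for i, sub in enumerate(partition(text, m, r)):
        owned = len(sub) if i == r - 1 else base
        total += sum(1 for j in range(min(owned, len(sub) - m + 1))
                     if sub[j:j + m] == pattern)
    return total
```

Counting a pattern this way gives the same total as a sequential scan for any worker count with r << n, which is the property the redundancy argument above relies on.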
A comparative study of the performance of today's most prominent retrieval models is presented below to identify the parameters involved in searching. Two Beowulf cluster configurations are considered; the second configuration is a software upgrade of the first, with better speed and better resource management of the hardware systems. The second configuration utilizes heterogeneous operating systems: a customized Debian 5 Linux with BSD as the firewall system, and Sun Grid Engine acting on the master node. The Beowulf architecture is used with both configurations.

This work is sponsored by Centre for High Performance Computing, FISAT.
http://sites.google.com/site/ijcsis/  ISSN 1947-5500

This work compares the results of sequential searching and of the two Beowulf cluster implementations (Dakshina-I and Dakshina-II) to identify the parameters involved in searching. To obtain reliable and consistent performance results, each value reported in the tables is the average of ten executions with different patterns of constant length.

TABLE 1: SEQUENTIAL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB (N=12661224)

m        5       10      15      20      25      50
KMP      1.47    1.414   1.401   1.381   1.372   1.394
BM       0.783   0.754   0.703   0.644   0.536   0.257
BMH      0.786   0.746   0.712   0.664   0.576   0.294
ZT       1.296   1.243   1.156   1.074   1.007   0.989
QS       0.523   0.48    0.452   0.435   0.413   0.244
BR       0.673   0.664   0.631   0.602   0.574   0.307
FS       0.523   0.512   0.481   0.463   0.433   0.283
SSABS    0.796   0.723   0.694   0.662   0.614   0.507
TVSBS    0.667   0.654   0.642   0.605   0.576   0.521
ZTBMH    0.963   0.921   0.809   0.653   0.462   0.186
BRBMH    0.482   0.456   0.423   0.397   0.368   0.214

TABLE 2: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5       10      15      20      25      50
KMP      0.531   0.51    0.475   0.464   0.45    0.444
BM       0.284   0.272   0.254   0.232   0.194   0.093
BMH      0.284   0.269   0.257   0.24    0.208   0.106
ZT       0.469   0.449   0.417   0.388   0.364   0.357
QS       0.189   0.175   0.163   0.158   0.15    0.089
BR       0.244   0.24    0.228   0.217   0.208   0.112
FS       0.189   0.183   0.174   0.167   0.156   0.103
SSABS    0.287   0.261   0.251   0.239   0.222   0.183
TVSBS    0.241   0.236   0.232   0.218   0.208   0.188
ZTBMH    0.348   0.322   0.292   0.236   0.167   0.067
BRBMH    0.174   0.166   0.154   0.144   0.134   0.078

TABLE 3: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5       10      15      20      25      50
KMP      0.411   0.395   0.376   0.368   0.361   0.355
BM       0.219   0.212   0.199   0.18    0.15    0.074
BMH      0.221   0.21    0.199   0.187   0.167   0.082
ZT       0.398   0.369   0.351   0.345   0.334   0.327
QS       0.146   0.135   0.128   0.123   0.117   0.069
BR       0.189   0.187   0.178   0.169   0.16    0.085
FS       0.146   0.143   0.134   0.131   0.122   0.08
SSABS    0.223   0.203   0.194   0.187   0.172   0.144
TVSBS    0.187   0.183   0.182   0.169   0.162   0.146
ZTBMH    0.27    0.258   0.228   0.183   0.129   0.053
BRBMH    0.136   0.129   0.119   0.11    0.104   0.061

TABLE 4: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5       10      15      20      25      50
KMP      0.304   0.289   0.278   0.264   0.261   0.257
BM       0.161   0.156   0.144   0.132   0.107   0.054
BMH      0.162   0.153   0.145   0.138   0.122   0.061
ZT       0.292   0.268   0.254   0.253   0.247   0.236
QS       0.106   0.097   0.093   0.09    0.084   0.052
BR       0.138   0.135   0.131   0.124   0.118   0.061
FS       0.106   0.103   0.097   0.096   0.088   0.057
SSABS    0.161   0.149   0.142   0.136   0.124   0.104
TVSBS    0.137   0.134   0.132   0.122   0.117   0.105
ZTBMH    0.196   0.189   0.165   0.132   0.093   0.037
BRBMH    0.098   0.093   0.086   0.081   0.075   0.043

TABLE 5: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5       10      15      20      25      50
KMP      0.231   0.22    0.212   0.207   0.204   0.194
BM       0.128   0.117   0.114   0.109   0.086   0.045
BMH      0.126   0.114   0.111   0.106   0.093   0.049
ZT       0.225   0.207   0.195   0.195   0.183   0.182
QS       0.081   0.075   0.071   0.068   0.065   0.038
BR       0.105   0.104   0.099   0.093   0.089   0.047
FS       0.081   0.079   0.074   0.072   0.069   0.044
SSABS    0.124   0.113   0.108   0.103   0.086   0.082
TVSBS    0.104   0.101   0.101   0.094   0.09    0.083
ZTBMH    0.152   0.145   0.129   0.103   0.072   0.029
BRBMH    0.075   0.072   0.068   0.061   0.057   0.033

IV. FACTORS AFFECTING THE SEARCHING PROCESS

Both hardware and software factors affect the results of the searching process. These factors can be classified into algorithmic factors, architectural factors, network factors and I/O factors. Only one user was logged in to the cluster and using the interconnection network during the searching process. The getrusage system call was used as the timing function to measure the running time of the algorithms, and the data were loaded into main memory before the timing function began. In a string matching algorithm, the length of the searched pattern (m), the length of the text being searched (n) and the size of the alphabet (σ) are the direct parameters affecting the performance of searching operations.

In a cluster computing environment with a large number of compute nodes, the results cease to be attractive because of traffic in the interconnection network. The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program, as per Amdahl's law. Amdahl's law states that if P is the proportion of the program that can be made parallel (i.e.
benefit from parallelization) and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is S = N / (N − NP + P).

The reverse factor (against Amdahl's law) [16, 17] holds that problem size scales with the number of processors: when more powerful processors are given, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference-operator complexity, and other parameters that are usually adjusted to allow the program to run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.

The Linux kernel used with configuration 1 is a customized Debian 2.6.18, whereas configuration 2 uses version 2.6.26. During HPL benchmarking, Dakshina-II recorded 9600 core floating-point operations per second, whereas Dakshina-I performed 7000. The resource manager used with Dakshina-I is Torque, whereas Dakshina-II uses Sun Grid Engine (SGE). SGE needs tight integration, whereas the TORQUE Resource Manager is a distributed resource manager providing control over batch jobs and distributed compute nodes.

V. SEARCHING TIME PREDICTION

Based on the factors mentioned above, a theoretical prediction method for the string searching process can be formed. The resulting formula helps to decide the optimum number of processors to use for a searching process in a cluster computing environment, because of the computing-to-communication ratio limit of Amdahl's law.
The Parallel Virtual Machine (PVM) is the software tool for parallel networking of heterogeneous computers in Dakshina-I. Dakshina-II uses Sun Grid Engine, which performs scalable parallel job startup with qrsh as a replacement for rsh or ssh when starting remote tasks. It is possible to start binaries with qrsh; if the special command-line option -inherit is specified, qrsh starts a subtask within an existing Grid Engine parallel job.

The RAM size of Dakshina-I is 256 MB, whereas that of Dakshina-II is 1.23 GB. RAM does not itself make the processor work faster; in our work nodes it takes the CPU approximately 200 ns to access RAM compared to 12,000,000 ns to access the hard drive. With more RAM, the probability of running out of memory and having to fall back on the hard-disk swap file is smaller, and thus performance increases.

The MPI used with Dakshina-I is LAM MPI, whereas Dakshina-II uses Open MPI. LAM MPI implements the MPI-1.2 standard and much of MPI-2; Open MPI implements MPI-2 and is therefore able to combine the expertise, technologies, and resources from across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer-science researchers.

The file system used with Dakshina-I is PVFS, whereas Dakshina-II uses NFS. PVFS consists of a server process and a client library, both written entirely as user-level code; a Linux kernel module and a pvfs-client process allow the file system to be mounted and used with standard utilities. The Network File System (NFS), originally developed by Sun, allows a user on a client computer to access files over a network in a manner similar to how local storage is accessed; NFS builds on the Open Network Computing Remote Procedure Call (ONC RPC) system.

The authentication system used with configuration 1 is NIS, whereas configuration 2 uses LDAP. NIS provides neither a distributed administration capability nor hierarchical data administration; LDAP organizes data into a hierarchy and allows distributed administration.

The string matching problem can achieve data parallelism with the following simple data-partitioning technique: the text string is decomposed into a number of subtexts according to the number of processors allocated, and these subtexts are stored on the local disks of the processors. With this partitioning approach, the static master-worker model is followed. First, the master distributes the pattern string to all the workers. In the second step, each worker reads its subtext from the local disk into main memory and searches it using the string matching algorithm. Finally, in the third step, the workers send the number of occurrences back to the master.

Let T1 be the time required for the initial setup of the master node. The master node reads the text of length n and the pattern of length m, and then divides the text into subparts, based on the availability of work nodes, for broadcasting. Reading the text thus requires n accesses; let tavg be the average time to perform one I/O step. Open MPI does not always read all values from these files during startup: for Dakshina-II it reads the text only when the division of the text becomes a requirement and then sends it to all nodes in the job, so the master node does not spend time on the pattern-reading operation. Dakshina-I, using LAM MPI, reads both text and pattern initially. The files are read on each node during each process's startup; this is intended behavior, since it allows per-node customization, which is especially relevant in heterogeneous environments. Thus, for configuration I,

T1 = tavg * (n + m) + Є1    Equation (5.1)

and for configuration II,

T1 = tavg * n + Є1    Equation (5.2)

Let T2 be the time required for communication from the master node to the slave nodes. The cluster nodes in configurations I and II are interconnected by a Gigabit Ethernet LAN, with a Realtek 8169 Gigabit network card on each compute node and two HP ProCurve 1400-24G (39078A) switches. The latency of the switch is less than 4.7 μs and less than 3.0 μs (64-byte packets), and the switch has a maximum throughput of 35.7 million packets per second (pps). Data transfer takes place at around 990 Mbit/s among the servers and around 665 Mbit/s among the client nodes. In the ideal multicomputer architecture, the cost of sending a message between two tasks located on different processors can be represented by two parameters: the message startup time, which is the time required to initiate the communication, and the transfer time per (typically four-byte) word, which is 32 bits in both configurations. The time required to send a message of size m includes the communication time to broadcast the pattern string to all processors involved in the processing of the string matching. Let l1 be the latency time, a fixed startup overhead needed to prepare sending a message from one processor to another, and assume that the function MPI_Bcast is completed in log2 p steps, with one or two parallel send operations per processor performed in each step. The size of an m-character pattern string is m bytes, and the transmission time is proportional to this size. Let c1 be the incremental communication time per byte. The broadcast therefore transfers m bytes to the other p − 1 processors, and the expression for this amount of time is given by [25, 26]:

T2 = log2 p (l1 + m c1)    Equation (5.3)

Let T3 be the average I/O time for reading the subtext from the local disk of a slave processor (Intel Pentium 4, 3.6 GHz, with 1.23 GB RAM and 80 GB SATA HDD). I/O time is proportional to the length of the text. Each node processor has to read its subtext from the local disk into a main-memory buffer of size [(n − m + 1) / p] + m − 1 characters, so the I/O time on each client is given by

T3 = ([(n − m + 1) / p] + m − 1) tavg    Equation (5.4)

Let T4 be the maximum computation time; then

T4 = (maximum computation steps) × s    Equation (5.5)

The numbers of searching steps required for matching a pattern of length m against a text of length n with the different algorithms are given in the following table.

TABLE 6: SEARCHING STEPS REQUIRED FOR STRING MATCHING FOR DIFFERENT ALGORITHMS

Algorithm   Maximum computation steps
KMP         2n
BM          3n
BMH         m((n−m)+1)
ZT          n² + m² + nm
QS          nm + (n−m)
BR          [(n+m)² + (n−m)] / 2
FS          nm
SSABS       (n+m) + [(n−m)(m−2)]
TVSBS       m(n − m + 1)
ZTBMH       ½[n² + nm + m]
BRBMH       m(n+1) − 2

The maximum computation step from this table is substituted in Equation (5.5), and s is the average time to perform one search (computation) step. The value of n used in each computation node is the length of its subtext, [(n − m + 1) / p] + m − 1 characters.

Let T5 be the result-gathering time from the slave nodes to the master node, i.e., the time for the master node to consolidate and compute the final result. The master node / SGE server / LDAP server is an IBM X-series quad-core Xeon machine with SAS HDD, 2 GB RAM and a 2G NIC. This IBM server collects the results (the count of pattern matches) from all p nodes, and the function MPI_Reduce is completed in log2 p steps. Thus the communication time to gather the results is

T5 = log2 p (l1 + c1)    Equation (5.6)

The maximum searching time predicted after considering all the above factors is the sum T1 + T2 + T3 + T4 + T5. To determine this result, the following parameters have to be found: tavg, the average time to perform one I/O step; s, the average time to perform one search (computation) step; Є1, the time required for division of the subtext; l1, the latency time, a fixed startup overhead needed to prepare sending a message from one processor to another; and c1, the incremental communication time per byte. tavg, s and Є1 are found by measuring the time taken to perform n steps:

tavg = (time taken by the processor to read k characters) / k    Equation (5.7)

The average tavg value over different character lengths is 4.7E-10 seconds.

s = (time taken to search a text of length n) / n    Equation (5.8)

The measured s values for several pattern lengths and node counts differ between Configuration I and Configuration II, and are given in the following tables.

TABLE 7: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-I WHEN P=5

m        5          10         15         20         25         50
KMP      2.15E-08   2.06E-08   1.92E-08   1.89E-08   1.82E-08   1.80E-08
BM       7.58E-09   7.33E-09   6.91E-09   6.25E-09   6.49E-09   2.68E-09
BMH      4.54E-09   2.16E-09   1.37E-09   9.77E-10   6.68E-10   1.73E-10
ZT       3.01E-15   2.95E-15   2.68E-15   2.48E-15   2.35E-15   2.30E-15
QS       2.51E-09   2.28E-09   2.05E-09   1.98E-09   1.87E-09   1.07E-09
BR       2.95E-15   2.90E-15   2.94E-15   2.81E-15   2.70E-15   1.46E-15
FS       3.09E-09   1.50E-09   9.50E-10   6.77E-10   5.10E-10   1.76E-10
SSABS    5.76E-09   2.41E-09   1.52E-09   1.02E-09   7.49E-10   3.07E-10
TVSBS    3.92E-09   1.89E-09   1.26E-09   8.74E-10   6.71E-10   2.94E-10
ZTBMH    4.42E-15   4.16E-15   3.80E-15   3.01E-15   2.06E-15   8.28E-16
BRBMH    2.84E-09   1.34E-09   8.50E-10   5.98E-10   4.37E-10   1.25E-10
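Equations (5.1)-(5.6) can be combined into a small calculator (an illustrative Python sketch, not the authors' program; the `steps` argument stands for a Table 6 entry such as 2n for KMP, and tavg, s, l1, c1 and Є1 must be replaced by the measured values):

```python
from math import log2

def predicted_time(n, m, p, tavg, s, steps, l1, c1, eps1=0.0,
                   reads_pattern=True):
    """Predicted maximum searching time T1 + T2 + T3 + T4 + T5.

    n, m, p  : text length, pattern length, number of worker nodes
    tavg, s  : average time per I/O step and per computation step
    steps    : function (n, m) -> worst-case computation steps (Table 6)
    l1, c1   : latency and incremental communication time per byte
    """
    t1 = tavg * ((n + m) if reads_pattern else n) + eps1   # Eq. 5.1 / 5.2
    t2 = log2(p) * (l1 + m * c1)                           # Eq. 5.3
    sub = (n - m + 1) // p + m - 1                         # subtext length
    t3 = sub * tavg                                        # Eq. 5.4
    t4 = steps(sub, m) * s                                 # Eq. 5.5
    t5 = log2(p) * (l1 + c1)                               # Eq. 5.6
    return t1 + t2 + t3 + t4 + t5
```

Plugging in the measured tavg = 4.7E-10 s, l1 = 95.72 μs and c1 = 0.02731 μs reported in the text, together with an s value from Tables 7-10, yields predictions of the kind tabulated in Tables 11-14.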
TABLE 8: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-I WHEN P=10

m        5          10         15         20         25         50
KMP      1.77E-08   1.67E-08   1.57E-08   1.54E-08   1.52E-08   1.50E-08
BM       6.52E-09   6.50E-09   6.10E-09   5.68E-09   4.92E-09   2.79E-09
BMH      4.04E-09   1.87E-09   1.24E-09   8.64E-10   6.31E-10   1.83E-10
ZT       2.70E-15   2.40E-15   2.33E-15   2.31E-15   2.27E-15   2.21E-15
QS       2.18E-09   1.19E-09   7.74E-10   6.31E-10   4.52E-10   1.51E-10
BR       2.67E-15   2.58E-15   2.47E-15   2.54E-15   2.31E-15   1.47E-15
FS       2.78E-09   1.35E-09   8.68E-10   6.19E-10   4.99E-10   1.75E-10
SSABS    4.87E-09   2.05E-09   1.25E-09   8.64E-10   6.74E-10   2.75E-10
TVSBS    3.47E-09   1.70E-09   1.11E-09   7.65E-10   5.93E-10   2.75E-10
ZTBMH    3.80E-15   3.57E-15   3.29E-15   2.83E-15   1.97E-15   1.13E-15
BRBMH    2.68E-09   1.33E-09   8.89E-10   5.96E-10   4.61E-10   1.42E-10

TABLE 9: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-II WHEN P=5

m        5          10         15         20         25         50
KMP      1.29E-08   1.14E-08   1.11E-08   1.05E-08   1.04E-08   1.03E-08
BM       4.47E-09   4.25E-09   3.94E-09   3.81E-09   3.15E-09   1.57E-09
BMH      2.74E-09   1.26E-09   7.98E-10   5.59E-10   4.00E-10   1.10E-10
ZT       1.84E-15   1.68E-15   1.59E-15   1.62E-15   1.56E-15   1.49E-15
QS       1.52E-09   7.15E-10   5.21E-10   3.56E-10   2.69E-10   8.76E-11
BR       1.82E-15   1.77E-15   1.74E-15   1.62E-15   1.48E-15   8.31E-16
FS       1.89E-09   8.66E-10   5.45E-10   3.97E-10   2.89E-10   1.00E-10
SSABS    3.35E-09   1.37E-09   8.21E-10   5.89E-10   4.26E-10   1.69E-10
TVSBS    2.27E-09   1.12E-09   7.19E-10   5.04E-10   3.84E-10   1.67E-10
ZTBMH    2.50E-15   2.38E-15   2.04E-15   1.70E-15   1.19E-15   5.19E-16
BRBMH    1.73E-09   7.79E-10   4.82E-10   3.62E-10   2.48E-10   6.89E-11

TABLE 10: AVERAGE TIME TAKEN FOR SEARCHING A CHARACTER (s) FOR CONFIGURATION-II WHEN P=10

m        5          10         15         20         25         50
KMP      9.90E-09   9.43E-09   9.07E-09   8.60E-09   8.32E-09   7.69E-09
BM       3.78E-09   3.65E-09   3.36E-09   3.20E-09   2.49E-09   1.57E-09
BMH      2.22E-09   1.06E-09   6.57E-10   4.69E-10   3.02E-10   9.59E-11
ZT       1.51E-15   1.38E-15   1.23E-15   1.21E-15   1.18E-15   1.16E-15
QS       1.26E-09   6.15E-10   3.84E-10   2.77E-10   2.18E-10   6.77E-11
BR       1.51E-15   1.44E-15   1.42E-15   1.37E-15   1.29E-15   7.15E-15
FS       1.53E-09   7.40E-10   4.72E-10   3.31E-10   2.49E-10   7.06E-11
SSABS    2.78E-09   1.17E-09   7.15E-10   5.02E-10   3.12E-10   1.45E-10
TVSBS    1.89E-09   9.06E-10   5.99E-10   4.14E-10   3.21E-10   1.48E-10
ZTBMH    2.13E-15   2.07E-15   1.81E-15   1.51E-15   1.08E-15   5.58E-16
BRBMH    1.43E-09   6.69E-10   4.14E-10   2.91E-10   2.20E-10   9.59E-11

The time taken to search a character in a large text of length n depends on the searching algorithm and the number of nodes used; the tables above give the average value of s for 5 and 10 nodes. The theoretical, predicted searching times for the same 12 MB text file on Configuration I and Configuration II, obtained by adding T1, T2, T3, T4 and T5, are given below.

TABLE 11: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5       10      15      20      25      50
KMP      0.551   0.53    0.495   0.485   0.468   0.464
BM       0.295   0.286   0.27    0.245   0.254   0.109
BMH      0.295   0.281   0.268   0.255   0.219   0.117
ZT       0.49    0.48    0.438   0.405   0.385   0.377
QS       0.198   0.325   0.423   0.534   0.623   0.699
BR       0.244   0.24    0.243   0.233   0.224   0.125
FS       0.203   0.198   0.188   0.179   0.169   0.119
SSABS    0.299   0.283   0.276   0.252   0.235   0.198
TVSBS    0.256   0.248   0.248   0.229   0.22    0.194
ZTBMH    0.362   0.341   0.312   0.249   0.173   0.074
BRBMH    0.187   0.178   0.169   0.159   0.146   0.087

TABLE 12: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5       10      15      20      25      50
KMP      0.455   0.43    0.406   0.397   0.393   0.388
BM       0.255   0.254   0.239   0.223   0.194   0.113
BMH      0.263   0.244   0.243   0.226   0.207   0.123
ZT       0.44    0.392   0.38    0.378   0.371   0.361
QS       0.173   0.173   0.164   0.175   0.156   0.105
BR       0.221   0.214   0.205   0.211   0.192   0.125
FS       0.183   0.178   0.172   0.164   0.165   0.118
SSABS    0.254   0.241   0.228   0.215   0.212   0.178
TVSBS    0.227   0.223   0.218   0.201   0.195   0.181
ZTBMH    0.312   0.293   0.271   0.234   0.165   0.098
BRBMH    0.177   0.175   0.176   0.158   0.153   0.097

TABLE 13: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5       10      15      20      25      50
KMP      0.335   0.297   0.289   0.273   0.271   0.268
BM       0.177   0.169   0.157   0.152   0.127   0.067
BMH      0.181   0.167   0.159   0.149   0.134   0.077
ZT       0.302   0.277   0.263   0.267   0.258   0.247
QS       0.123   0.107   0.113   0.102   0.096   0.064
BR       0.153   0.149   0.147   0.137   0.126   0.074
FS       0.127   0.117   0.111   0.108   0.099   0.071
SSABS    0.177   0.163   0.153   0.149   0.137   0.112
TVSBS    0.151   0.149   0.144   0.135   0.129   0.113
ZTBMH    0.208   0.198   0.171   0.144   0.103   0.049
BRBMH    0.117   0.106   0.099   0.099   0.086   0.051

TABLE 14: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5       10      15      20      25      50
KMP      0.258   0.246   0.237   0.225   0.218   0.202
BM       0.151   0.146   0.135   0.129   0.102   0.067
BMH      0.148   0.142   0.132   0.126   0.103   0.068
ZT       0.249   0.228   0.204   0.202   0.196   0.194
QS       0.103   0.093   0.085   0.081   0.079   0.051
BR       0.128   0.123   0.121   0.117   0.111   0.58
FS       0.104   0.101   0.097   0.091   0.086   0.052
SSABS    0.148   0.141   0.134   0.128   0.102   0.097
TVSBS    0.127   0.122   0.121   0.112   0.109   0.101
ZTBMH    0.178   0.173   0.152   0.128   0.094   0.052
BRBMH    0.098   0.092   0.086   0.081   0.077   0.068

Recall that l1 is the latency time, the fixed startup overhead needed to prepare sending a message from one processor to another, and c1 is the incremental communication time per byte. To find the values of l1 and c1, a ping-pong test is performed between two processes, which do nothing but send and receive a number of messages between each other. All timings are average times over 100 separate rounds. The values of l1 and c1 are then found by using the linear regression method to fit a straight line to the communication curve. Gigabit Ethernet provides a reasonably high bandwidth given its low price, but suffers from relatively high latency. The average latency value l1 is 95.72 μs (95.72 × 10⁻⁶ seconds), and the average incremental communication time c1 is 0.02731 μs (0.02731 × 10⁻⁶ seconds).

VI. PERFORMANCE ANALYSIS

Experimental results and estimates for the running time of the string searching algorithms are presented by assuming that both the text and the patterns are random strings with uniform distribution. In practice, texts and patterns are not random, but this estimate gives a rough idea of the performance of these algorithms. The time required for the division of the subtext, Є1, is negligibly small compared to the other factors. The reverse-factor law is not relevant in this experiment because of collisions during network traffic. The expected running time is higher than the experimental case because the worst-case computation values are used in the formula (Table 6). Fixing the value of s based on the lengths m and n and the number of computations for the different algorithms is the significant part of this linear approach to estimating static search time. Each experiment was performed 10 times and the average value taken (the deviations were very small).

VII. CONCLUSION

The method presented above is a general method to find the maximum searching time. Different algorithms work in different ways, and the running time of certain algorithms is independent of the keyword set. The experimental results show a close relationship with the theoretically analysed searching time: more than 98% accuracy of searching prediction can be achieved with this method in this particular environment. We have not considered DNA sequence strings or multiple-string pattern searching operations. Algorithms like ZT, BMH, ZTBMH, BR and BRBMH are designed for DNA sequence strings or multiple pattern searching; however, these algorithms also satisfy this formula for natural-language strings, and with the DNA sequence searching mechanism the result is satisfactory. The next level of work is to make an estimate for DNA sequence searching. The effective application of this formula is natural-language searching and network-security applications. The experimental results show that the proposed prediction method works effectively by considering the worst case of these algorithms, especially for alphabets of ASCII codes, and thus the proposed method is quite applicable to exact pattern matching of natural language. The results of the algorithms help us to implement this technology in the areas of text search, DNA search and Web-related fields in a cost-effective manner.

REFERENCES

Thomas Sterling, Beowulf Cluster Computing with Linux, MIT Press, p. 25.
A. Petitet, R. C. Whaley, J. Dongarra, A. Cleary (Sept 2008), 'HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers', Innovative Computing Laboratory, University of Tennessee. http://www.netlib.org/benchmark/hpl/
Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, p. 28.
Al Geist, Adam Beguelin, Jack Dongarra, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, pp. 103-107.
William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, p. 124.
Markus Weinhardt and Wayne Luk, PACT GmbH, 'Task-Parallel Programming of Reconfigurable Systems', Springer Berlin/Heidelberg, ISBN 978-3-540-42499-4, pp. 172-181.
P. D. Michailidis and K. G. Margaritis, 'Parallel text searching applications on a heterogeneous cluster architecture', Int. J. Computational Science and Engineering, Vol. 1, No. 1, pp. 45-59.
D. E. Knuth, J. H. Morris and V. R. Pratt, 'Fast pattern matching in strings', SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350.
A. Hume and D. Sunday, 'Fast string searching', Software Practice and Experience, pp. 1221-1248.
K. Pirklbauer, 'A study of pattern-matching algorithms', Structured Programming, pp. 89-98.
Richard Cole, 'Tight bounds on the complexity of the Boyer-Moore algorithm', Proceedings of the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 224-233. http://portal.acm.org/citation.cfm?id=127830
Prasad J. C. and K. S. M. Panicker, 'Beowulf Dakshina Cluster Architecture with Linux Debian Operating System for MPI Programming', Proceedings of the International Conference on Information Processing, Bangalore, India, ISBN 978-93-80026-75-2, p. 350.
P. D. Michailidis and K. G. Margaritis, 'Parallel String Matching Algorithm: A Bibliographical Review', Technical Report, Dept. of Applied Informatics, University of Macedonia.
Panagiotis D. Michailidis and Konstantinos G. Margaritis, 'Parallel Text Searching Application on a Heterogeneous Cluster of Workstations', Proceedings of the International Conference on Parallel Processing Workshops (ICPPW'01), IEEE Computer Society, p. 153.
Gene Amdahl, 'Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities', AFIPS Conference Proceedings, pp. 483-485. www.inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf
John L. Gustafson, 'Reevaluating Amdahl's Law', Ames Laboratory, Department of Energy, ISU, Ames, Iowa. www.scl.ameslab.gov/Publications/Gus/AmdahlsLaw/Amdahls.html
R. E. Benner, J. L. Gustafson and G. R. Montry, 'Development and analysis of scientific application programs on a 1024-processor hypercube', Sandia National Laboratories, Page 0317.
Panagiotis D. Michailidis and Konstantinos G. Margaritis, 'String Matching Problem on a Cluster of Personal Computers: Performance Modeling', Intern. J. Computer Math., Taylor & Francis, Vol. 79(8), pp. 867-888.
Prasad J. C. and K. S. M. Panicker, 'String Searching Algorithm Implementation - Performance Study with Two Cluster Configurations', International Journal of Computer Science and Communication, ISSN 0973-7391, pp. 551-555.
Ian Foster, Designing and Building Parallel Programs, Addison-Wesley, ISBN 0201575949, Message Passing Interface, p. 245.
Recent Advances in Parallel Virtual Machine and Message Passing Interface, 12th European PVM/MPI Users' Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, Springer, Volume 3666.
http://www.open-mpi.org
Gerald Carter, LDAP System Administration, O'Reilly, p. 2.
Panagiotis D. Michailidis and Konstantinos G. Margaritis, 'Parallel implementations for the string matching problem on a cluster of distributed workstations', Neural, Parallel & Scientific Computations, Vol. 10, Issue 3, pp. 287-312.
Ahmad Fadel Klaib and Hugh Osborne (Dec 2008), 'Searching Protein Sequence Databases Using BRBMH Matching Algorithm', IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 12, p. 59.
Garth A. Gibson and Rodney Van Meter, 'Network attached storage architecture', Communications of the ACM, Vol. 43, Issue 11, pp. 37-45.
Yang Yu, Bhaskar Krishnamachari and V. K. Prasanna, 'Issues in designing middleware for wireless sensor networks', IEEE Network, Vol. 18, Issue 1, pp. 15-21.
J. Rodriguez de Souza, E. Argollo, A. Duarte, D. Rexachs and E. Luque, 'Fault Tolerant Master-Worker over a Multi-Cluster Architecture', Proceedings of the International Conference Parallel Computing: Current & Future Issues of High-End Computing, NIC Series, Vol. 33, ISBN 3-00-017352-8, pp. 465-472.
G. de V. Smit, 'A comparison of three string matching algorithms', Software Practice and Experience, pp. 57-66.
Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin and Thu D. Nguyen, 'Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services', Department of Computer Science, Rutgers University, NJ 08854. http://dark-www.panic.rutgers.edu/Research/mendosus/publications/commimpactHPCA-03.pdf
T. Anderson, D. Culler and D. Patterson, 'A Case for NOW (Networks of Workstations)', IEEE Micro, Vol. 15, pp. 54-64.

AUTHORS PROFILE

K. S. Mohanachandra Panicker graduated in Electrical Engineering in 1971 from REC Calicut, completed his post-graduation at the College of Engineering, Trivandrum in 1973, and received his Ph.D. from IIT New Delhi in 1986. He joined N S S College as a lecturer in Electrical Engineering in 1974 and served at different levels up to 2005 as Assistant Professor, Professor and Principal. He was Principal of Govt. Model Engineering College, Ernakulam, during 1994-2001 and of the Federal Institute of Science and Technology (FISAT) during 2005-2008. As a professor, Dr. Panicker worked at the European University of Lefke, Cyprus. Currently he is the Dean, Planning and Research, at FISAT, Angamaly. He has published over 40 papers in national and international conferences and journals, and he is guiding three students registered for Ph.D.

Prasad J. C. graduated in Mathematics from the University of Calicut in 1998, completed a post-graduation in Computer Applications at Bharathiar University in 2001, and a second post-graduation in Computer Science and Engineering in 2006. He joined as a lecturer in the Department of Computer Applications at Union Christian College.
Presently he is working as Asst.Professor, Dept of Computer Science and Engineering, Federal Institute of Science and  F Ashiya, M Matsumoto, S Nagasawa, S Tomita, Loop-network configuration and single-mode optical fiber cable technologies for Technology[FISAT], Angamaly. At national and international level, he has subscriber network, International Journal of Digital & Analog published 12 papers in conferences and 2 papers in international journals. Communication Systems, Volume 3. Issue 1, , John Wiley & Sons, Ltd, Prasad is currently doing his Ph.D work in Dr.M.G.R.University, Chennai. Pages 77 – 83. 306 http://sites.google.com/site/ijcsis/ ISSN 1947-5500