

									                                                                    (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                                                              Vol. 9, No. 6, 2011

          Performance Prediction of Single Static String
              Algorithms on Cluster Configurations
                          Prasad J. C.                                                             K. S. M. Panicker
             Research Scholar, Dept. of CSE,                                                      Professor, Dept of CSE
              Dr.MG.R University, Chennai                                          Federal Institute of Science and Technology[FISAT]
          cum Asst.Professor, Dept of CSE, FISAT,                                                     Angamaly, India
                     Angamaly, India

Abstract—This paper studies the factors that affect the performance of static single-pattern searching algorithms and predicts the searching time when the operation is performed in a cluster computing environment. A master-worker model of parallel computation and communication is designed for the searching algorithms using MPI technologies. Prediction and analysis are based on the KMP, BM, BMH, ZT, QS, BR, FS, SSABS, TVBS, ZTBMH and BRFS algorithms. The performances are compared and the results are discussed; both theoretical and practical results are presented. This work implements the same algorithms in two different cluster architecture environments, Dakshina-I and Dakshina-II, which improves the reliability of the prediction method. The results of the algorithms help to predict and implement this technology in the areas of text search, DNA search and Web-related fields in a cost-effective manner.

Keywords- Static String Searching, Beowulf Cluster, Parallel programming

                         I.    INTRODUCTION
    Pattern searching and the prediction of its performance have become more important with recent advances in DNA sequencing, web search engines, database operations, signal processing, error detection, and speech and pattern recognition, all of which require pattern searching over terabytes of data. Most researchers focus on achieving high throughput with expensive software and hardware resources. The cluster architecture considered here is based on the Beowulf architecture [1] with open source technologies and a 1 Gbps network. During the HP Linpack benchmark test [2], the system reached a speed of 80 Gigaflops with 32 nodes. Therefore, this paper focuses on the implementation details of high-speed but low-cost string matching with limited resources. The implementation is based on parallel programming with the Message Passing Interface (MPI) standard [3,4,5].

A single string pattern matching problem can be defined as follows. Let T be a large text of n characters and P be a pattern of length m, where T = T[1] T[2] ... T[n] and P = P[1] P[2] ... P[m]. The characters of both T and P belong to a finite set of elements S, and m << n. The search process identifies all occurrences of the pattern P in the text T. Two types of input data (natural language input strings and DNA sequence strings) are considered for the evaluation of the algorithms. The actual task of searching is done in parallel among the processors 0 to p-1.

The master node decomposes the text into r subtexts and distributes them to the available workers (nodes) [6]. Each subtext contains k = [(n-m+1) / r] + m-1 successive characters of the complete text. There is an overlap of m-1 successive characters between successive subtexts, so there is a redundancy of r(m-1) characters for processing. The objective is to compare the results of searching with different algorithms, so this redundancy has no relevance in the system [7].

                         III. RELATED WORKS
Smith [8] compared the theoretical and practical running times of the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore (BM) algorithm and the Brute-Force (BF) algorithm. Hume and Sunday [9] constructed a taxonomy and explored the performance of most existing versions of the single-keyword Boyer-Moore pattern matching algorithm. Their experiments selected efficient versions for use in practical applications. Pirklbauer [10] compared several versions of the Knuth-Morris-Pratt algorithm, several versions of the BM algorithm [11] and the Brute-Force algorithm. Since Pirklbauer did not classify the algorithms, it is difficult to compare them, and the testing of the Boyer-Moore variants is not as extensive as in the Hume and Sunday taxonomy.

A comparative study of the performance of today's most prominent retrieval models is presented below to identify the parameters involved in searching. Two Beowulf cluster configurations [12] are considered; the second configuration is a software upgrade with better speed and better resource management of the hardware systems. The second configuration uses heterogeneous operating systems: customized Debian 5 Linux, with BSD for the firewall systems. Sun Grid Engine acts as the master node in the second configuration. Beowulf architecture is used with both configurations.
This work is sponsored by Centre for High Performance Computing, FISAT
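The subtext decomposition described in the problem definition can be illustrated with a minimal Python sketch. It is not the authors' MPI implementation. It assumes that the square brackets in k = [(n-m+1) / r] + m-1 denote rounding up, and the function names `partition_text` and `count_occurrences` are hypothetical. Each worker owns a disjoint range of starting positions, and the m-1 overlapping characters let a match that begins near the end of one subtext be completed inside it.

```python
def partition_text(text: str, m: int, r: int) -> list[str]:
    """Split text into at most r subtexts that overlap by m-1 characters.

    Assumption: the bracket in k = [(n-m+1)/r] + m-1 is a ceiling.
    """
    n = len(text)
    if r < 1 or not 0 < m <= n:
        raise ValueError("need r >= 1 and 0 < m <= len(text)")
    positions = n - m + 1              # candidate starting positions of P in T
    per_worker = -(-positions // r)    # ceiling division
    subtexts = []
    for i in range(r):
        start = i * per_worker
        if start >= positions:         # no starting positions left to assign
            break
        # k = per_worker + m - 1 characters; the last subtext may be shorter
        subtexts.append(text[start:start + per_worker + m - 1])
    return subtexts


def count_occurrences(text: str, pattern: str) -> int:
    """Count all (possibly overlapping) occurrences; stands in for any matcher."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


if __name__ == "__main__":
    text = "abracadabra" * 3
    pattern = "abra"
    parts = partition_text(text, len(pattern), 4)
    # The per-subtext counts add up to the count over the whole text.
    print(sum(count_occurrences(p, pattern) for p in parts),
          count_occurrences(text, pattern))
```

Because every starting position is assigned to exactly one subtext, summing the per-worker counts gives the same total as a sequential search, which is what the master node needs when it gathers the results.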

                                                                                                           ISSN 1947-5500
This work compares the results of searching with a sequential implementation and with two Beowulf cluster configurations (Dakshina-I and Dakshina-II) to identify the parameters involved in searching. To get reliable and consistent performance results, each value in the tables is the average of ten executions with different patterns of the same length.

    m          5        10       15       20       25       50
  KMP        1.47     1.414    1.401    1.381    1.372    1.394
  BM         0.783    0.754    0.703    0.644    0.536    0.257
  BMH        0.786    0.746    0.712    0.664    0.576    0.294
  ZT         1.296    1.243    1.156    1.074    1.007    0.989
  QS         0.523    0.48     0.452    0.435    0.413    0.244
  BR         0.673    0.664    0.631    0.602    0.574    0.307
  FS         0.523    0.512    0.481    0.463    0.433    0.283
  SSABS      0.796    0.723    0.694    0.662    0.614    0.507
  TVSBS      0.667    0.654    0.642    0.605    0.576    0.521
  ZTMBH      0.963    0.921    0.809    0.653    0.462    0.186
  BRBMH      0.482    0.456    0.423    0.397    0.368    0.214

TABLE2: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB

    m          5        10       15       20       25       50
  KMP        0.531    0.51     0.475    0.464    0.45     0.444
  BM         0.284    0.272    0.254    0.232    0.194    0.093
  BMH        0.284    0.269    0.257    0.24     0.208    0.106
  ZT         0.469    0.449    0.417    0.388    0.364    0.357
  QS         0.189    0.175    0.163    0.158    0.15     0.089
  BR         0.244    0.24     0.228    0.217    0.208    0.112
  FS         0.189    0.183    0.174    0.167    0.156    0.103
  SSABS      0.287    0.261    0.251    0.239    0.222    0.183
  TVSBS      0.241    0.236    0.232    0.218    0.208    0.188
  ZTMBH      0.348    0.322    0.292    0.236    0.167    0.067
  BRBMH      0.174    0.166    0.154    0.144    0.134    0.078

WITH 10 NODES BEOWULF CLUSTER DAKSHINA-I

    m          5        10       15       20       25       50
  KMP        0.411    0.395    0.376    0.368    0.361    0.355
  BM         0.219    0.212    0.199    0.18     0.15     0.074
  BMH        0.221    0.21     0.199    0.187    0.167    0.082
  ZT         0.398    0.369    0.351    0.345    0.334    0.327
  QS         0.146    0.135    0.128    0.123    0.117    0.069
  BR         0.189    0.187    0.178    0.169    0.16     0.085
  FS         0.146    0.143    0.134    0.131    0.122    0.08
  SSABS      0.223    0.203    0.194    0.187    0.172    0.144
  TVSBS      0.187    0.183    0.182    0.169    0.162    0.146
  ZTMBH      0.27     0.258    0.228    0.183    0.129    0.053
  BRBMH      0.136    0.129    0.119    0.11     0.104    0.061

TABLE4: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II

    m          5        10       15       20       25       50
  KMP        0.304    0.289    0.278    0.264    0.261    0.257
  BM         0.161    0.156    0.144    0.132    0.107    0.054
  BMH        0.162    0.153    0.145    0.138    0.122    0.061
  ZT         0.292    0.268    0.254    0.253    0.247    0.236
  QS         0.106    0.097    0.093    0.09     0.084    0.052
  BR         0.138    0.135    0.131    0.124    0.118    0.061
  FS         0.106    0.103    0.097    0.096    0.088    0.057
  SSABS      0.161    0.149    0.142    0.136    0.124    0.104
  TVSBS      0.137    0.134    0.132    0.122    0.117    0.105
  ZTMBH      0.196    0.189    0.165    0.132    0.093    0.037
  BRBMH      0.098    0.093    0.086    0.081    0.075    0.043

TABLE5: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES BEOWULF CLUSTER DAKSHINA-II

    m          5        10       15       20       25       50
  KMP        0.231    0.22     0.212    0.207    0.204    0.194
  BM         0.128    0.117    0.114    0.109    0.086    0.045
  BMH        0.126    0.114    0.111    0.106    0.093    0.049
  ZT         0.225    0.207    0.195    0.195    0.183    0.182
  QS         0.081    0.075    0.071    0.068    0.065    0.038
  BR         0.105    0.104    0.099    0.093    0.089    0.047
  FS         0.081    0.079    0.074    0.072    0.069    0.044
  SSABS      0.124    0.113    0.108    0.103    0.086    0.082
  TVSBS      0.104    0.101    0.101    0.094    0.09     0.083
  ZTMBH      0.152    0.145    0.129    0.103    0.072    0.029
  BRBMH      0.075    0.072    0.068    0.061    0.057    0.033

              IV. FACTORS AFFECTING THE SEARCHING PROCESS
Both hardware and software factors [13] affect the results of the searching process. These factors can be classified into algorithmic factors, architectural factors, network factors and I/O factors.

Only one user is logged in to the cluster and uses the interconnection network during the searching process. The getrusage system call was used as the timing function to find the running time of the algorithms. Data are loaded into main memory before the timing function begins. In a string matching algorithm, the length of the searched pattern (m), the length of the text being searched (n) and the size of the alphabet (o) are the direct parameters affecting the performance [14] of searching operations. In a cluster computing environment with a large number of compute nodes, the results become less attractive because of the network traffic in the interconnection network. The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program, as per Amdahl's law [15]. Amdahl's law states that if P is the proportion of the program that can be made parallel (i.e. benefit from parallelization),

and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is S = N/(N-NP+P). The reverse argument against Amdahl's law [16, 17] says that the problem size scales with the number of processors. When more powerful processors are given, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference operator complexity, and other parameters that are usually adjusted to allow the program to be run in some desired amount of time. Hence, it may be most realistic to assume that run time, not problem size, is constant.

The resource manager used with Dakshina-I is TORQUE, whereas Dakshina-II uses Sun Grid Engine (SGE). SGE needs "tight integration", whereas the TORQUE [18] Resource Manager is a distributed resource manager providing control over batch jobs and distributed compute nodes. The Parallel Virtual Machine (PVM) [19] is a software tool for parallel networking of heterogeneous computers and is used in Dakshina-I. Dakshina-II uses Sun Grid Engine, which performs scalable parallel job startup with qrsh as a replacement for rsh or ssh for starting the remote tasks. It is possible to start binaries with qrsh. If the special command line option -inherit is specified, qrsh starts a subtask in an existing Grid Engine parallel job.

The RAM size of Dakshina-I is 256 MB, whereas that of Dakshina-II is 1.23 GB. RAM does not by itself make the processor work faster. On our work nodes, it takes the CPU approximately 200 ns (nanoseconds) to access RAM compared to 12,000,000 ns to access the hard drive [18]. When there is more RAM in the computer, the probability of running out of RAM and having to use the hard disk swap file is smaller, and thus the computer performance increases.

The MPI implementation used with Dakshina-I is LAM MPI, whereas Dakshina-II uses Open MPI. LAM MPI implements the MPI-1.2 standard and much of MPI-2; Open MPI implements the MPI-2 standard [20]. Open MPI is able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

The file system used with Dakshina-I is PVFS, whereas Dakshina-II uses NFS. PVFS consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and the pvfs-client process allow the file system to be mounted and used with standard utilities. The Network File System (NFS), originally developed by Sun, allows a user on a client computer to access files over a network in a manner similar to how local storage is accessed. NFS builds on the Open Network Computing Remote Procedure Call (ONC RPC) system.

The authentication system used with configuration 1 is NIS, whereas configuration 2 uses LDAP. NIS provides neither a distributed administration capability nor hierarchical data administration. LDAP organizes data into a hierarchy and allows distributed administration [21].

The Linux kernel used with configuration 1 is a customised Debian kernel of version 2.6.18, whereas configuration 2 uses version 2.6.26. During HPL benchmarking [2], Dakshina-II recorded 9600 cores floating point operations per second, whereas Dakshina-I performed 7000 cores floating point operations per second.

                    V. SEARCHING TIME PREDICTION
Based on the above-mentioned factors, a theoretical prediction method for the string searching process can be formed. This formula helps to decide the optimum number of processors to be used for the searching process in a cluster computing environment, given the computing-communication ratio limit of Amdahl's law.

The string matching problem can achieve data parallelism with the following simple data partitioning technique. The text string is decomposed into a number of subtexts according to the number of processors allocated. These subtexts are stored on the local disks of the processors. With this partitioning approach, the static master-worker model is followed. First, the master distributes the pattern string to all the workers. In the second step, each worker reads its subtext from the local disk into main memory and searches it using the string matching algorithms. Finally, in the third step, the workers send the number of occurrences back to the master.

Let T1 be the time required for the initial setup [22] of the master node. The master node reads the text of length n and the search pattern of length m. The master node then divides the text into subparts, based on the availability of work nodes, for broadcasting. Reading the text thus requires n accesses. Let tavg be the average time to perform one I/O step. In Dakshina-II, Open MPI does not always read all the values from these files during startup; it reads the text when division of the text becomes necessary and then sends the subparts to all nodes in the job. Dakshina-I used LAM MPI, which reads both text and pattern initially. In the case of the pattern, the master node does not spend time on the reading operation. The files are read on each node during each process's startup. This is intended behavior: it allows for per-node customization, which is especially relevant in heterogeneous environments. Thus, for configuration I,
            T1 = tavg * (n + m) + Є1          Equation (5.1)
and for configuration II,
            T1 = tavg * n + Є1                Equation (5.2)

Let T2 be the time required for communication from the master node to the slave nodes. The cluster nodes in configurations I and II are interconnected using Gigabit Ethernet LAN, with a Realtek 8169 Gigabit network card on each compute node and two HP ProCurve 1400-24G (39078A) switches. The latency of the switch is less than 4.7 μs (64-byte packets) and less than 3.0 μs (64-byte packets) [18]. The switch has a maximum throughput of 35.7 million packets per second (pps). Data transfer takes place at around 990 Megabits per second among the servers and around 665 Mb per second among the

client nodes. In the ideal multicomputer architecture, the cost of sending a message between two tasks located on different processors can be represented by two parameters: the message startup time, which is the time required to initiate the communication, and the transfer time per (typically four-byte) word, which is 32 bits in both configurations. The time [23] required to send a message of size m includes the communication time to broadcast the pattern string to all processors involved in the string matching. Let l1 be the latency time, a fixed startup overhead needed to prepare sending a message from one processor to another. Assume that the function MPI_Bcast is completed in log2 p steps. In each step, one or two parallel send operations per processor are performed. The size of a pattern string of length m is m bytes. The transmission time is proportional to the size of the pattern string. Let c1 be the incremental communication time per byte. The broadcast therefore transfers m bytes to the other p-1 processors. The expression for this time is given by [25, 26]:

            T2 = log2 p * (l1 + m c1)                      Equation (5.3)

Let T3 be the average I/O time for reading the subtext from the local disk of a slave processor, an Intel Pentium 4 (3.6 GHz) with 1.23 GB RAM and an 80 GB SATA HDD. The I/O time is proportional to the length of the text. Each node has to read its subtext of [(n-m+1)/p] + m-1 characters from the local disk into a buffer in main memory, so the I/O time on each client is given by:

            T3 = ([(n-m+1)/p] + m-1) * tavg                Equation (5.4)

Let T4 be the maximum computation time. Then

            T4 = (Max. computation step) × s               Equation (5.5)

The searching steps required by the different algorithms to match a pattern string of length m against a text string of length n are given in the following table.

  Algorithm     Maximum computation steps required
  KMP           2n
  BM            3n
  BMH           m((n-m)+1)
  ZT            n^2 + m^2 + nm
  QS            nm + (n-m)
  BR            [(n+m)^2 + (n-m)]/2
  FS            nm
  SSABS         (n+m) + [(n-m)(m-2)]
  TVSBS         m(n-m+1)
  ZTMBH         ½[n^2 + nm + m]
  BRBMH         m(n+1) - 2

The maximum computation step is substituted from the above table, and s is the average time to perform one search (computation) step. The value of n on each computation node is the length of its subtext, [(n-m+1)/p] + m-1 characters.

Let T5 be the time for gathering the results from the slave nodes at the master node. The master node, SGE server and LDAP servers are IBM X series machines with Quad Core Xeon processors, SAS HDD, 2 GB RAM and NIC 2G. T5 also covers the time the master node needs to consolidate the partial results and compute the final result. This IBM server collects the results (the count of pattern matches) from all p nodes simultaneously [24]. The function MPI_Reduce is completed in log2 p steps. Thus the communication time to gather the results [22] is

            T5 = log2 p * (l1 + c1)                        Equation (5.6)

The maximum searching time predicted after considering all the above factors is therefore the sum T1 + T2 + T3 + T4 + T5. To determine this result, the following parameters have to be found: tavg, the average time to perform one I/O step; s, the average time to perform one search (computation) step; Є1, the time required for dividing the text into subtexts; l1, the latency time, a fixed startup overhead needed to prepare sending a message from one processor to another; and c1, the incremental communication time per byte. tavg, s and Є1 are found by measuring the time taken to perform n steps [27].

            tavg = (time taken by the processor to read k characters) / k      Equation (5.7)

The average tavg value for different character lengths is 4.7E-10 seconds.

            s = (time taken to search the text of length n) / n                Equation (5.8)

The following results show that s takes different values for different algorithms, pattern lengths, node counts and configurations.

TABLE 7: AVERAGE TIME TAKEN FOR SEARCHING CHARACTER (S) FOR THE CONFIGURATION-I WHEN P=5

    m           5          10         15         20         25         50
  KMP        2.15E-08   2.06E-08   1.92E-08   1.89E-08   1.82E-08   1.80E-08
  BM         7.58E-09   7.33E-09   6.91E-09   6.25E-09   6.49E-09   2.68E-09
  BMH        4.54E-09   2.16E-09   1.37E-09   9.77E-10   6.68E-10   1.73E-10
  ZT         3.01E-15   2.95E-15   2.68E-15   2.48E-15   2.35E-15   2.30E-15
  QS         2.51E-09   2.28E-09   2.05E-09   1.98E-09   1.87E-09   1.07E-09
  BR         2.95E-15   2.90E-15   2.94E-15   2.81E-15   2.70E-15   1.46E-15
  FS         3.09E-09   1.50E-09   9.50E-10   6.77E-10   5.10E-10   1.76E-10
  SSABS      5.76E-09   2.41E-09   1.52E-09   1.02E-09   7.49E-10   3.07E-10
  TVSBS      3.92E-09   1.89E-09   1.26E-09   8.74E-10   6.71E-10   2.94E-10
  ZTMBH      4.42E-15   4.16E-15   3.80E-15   3.01E-15   2.06E-15   8.28E-16
  BRBMH      2.84E-09   1.34E-09   8.50E-10   5.98E-10   4.37E-10   1.25E-10
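The prediction method of Equations (5.1) to (5.6) can be collected into a short Python sketch. It is an illustration rather than the authors' code: the names `ClusterParams`, `predict_search_time` and `best_processor_count` are hypothetical, the square brackets in the subtext length are assumed to mean rounding up, and the step-count function is evaluated on the subtext length, following the statement that n on each node is the length of its subtext. The parameter values in the usage part are placeholders, not measurements from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ClusterParams:
    t_avg: float  # average time for one I/O step (tavg)
    s: float      # average time for one search (computation) step
    eps1: float   # time required for dividing the text (Є1)
    l1: float     # latency, the fixed startup overhead per message
    c1: float     # incremental communication time per byte


def subtext_length(n: int, m: int, p: int) -> int:
    # [(n-m+1)/p] + m-1, with the bracket assumed to be a ceiling
    return math.ceil((n - m + 1) / p) + m - 1


def predict_search_time(n: int, m: int, p: int, params: ClusterParams,
                        max_steps: Callable[[int, int], float],
                        master_reads_pattern: bool = True) -> float:
    """Return T1 + T2 + T3 + T4 + T5 for text length n, pattern length m, p nodes."""
    log_p = math.log2(p)
    read = n + m if master_reads_pattern else n   # Eq. (5.1) vs. Eq. (5.2)
    t1 = params.t_avg * read + params.eps1
    t2 = log_p * (params.l1 + m * params.c1)      # Eq. (5.3), broadcast of pattern
    k = subtext_length(n, m, p)
    t3 = k * params.t_avg                         # Eq. (5.4), local subtext read
    t4 = max_steps(k, m) * params.s               # Eq. (5.5), search on subtext
    t5 = log_p * (params.l1 + params.c1)          # Eq. (5.6), result gathering
    return t1 + t2 + t3 + t4 + t5


def best_processor_count(n: int, m: int, params: ClusterParams,
                         max_steps: Callable[[int, int], float],
                         candidates: Iterable[int]) -> int:
    """Pick the candidate p with the smallest predicted time."""
    return min(candidates,
               key=lambda p: predict_search_time(n, m, p, params, max_steps))


def kmp_steps(n: int, m: int) -> float:
    return 2 * n  # maximum computation steps for KMP from the table above


if __name__ == "__main__":
    # Placeholder parameter values chosen only to exercise the formulas.
    params = ClusterParams(t_avg=0.5, s=0.25, eps1=1.0, l1=2.0, c1=0.125)
    for p in (1, 2, 4, 8):
        print(p, predict_search_time(1000, 5, p, params, kmp_steps))
    print(best_processor_count(1000, 5, params, kmp_steps, (1, 2, 4, 8)))
```

Passing the step count as a function keeps the sketch independent of any one algorithm; the other rows of the computation-step table can be supplied in the same way.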

TABLE 8: AVERAGE TIME TAKEN FOR SEARCHING CHARACTER (S)

    m         5          10         15         20         25         50
   KMP     1.77E-08   1.67E-08   1.57E-08   1.54E-08   1.52E-08   1.50E-08
   BM      6.52E-09   6.50E-09   6.10E-09   5.68E-09   4.92E-09   2.79E-09
   BMH     4.04E-09   1.87E-09   1.24E-09   8.64E-10   6.31E-10   1.83E-10
   ZT      2.70E-15   2.40E-15   2.33E-15   2.31E-15   2.27E-15   2.21E-15
   QS      2.18E-09   1.19E-09   7.74E-10   6.31E-10   4.52E-10   1.51E-10
   BR      2.67E-15   2.58E-15   2.47E-15   2.54E-15   2.31E-15   1.47E-15
   FS      2.78E-09   1.35E-09   8.68E-10   6.19E-10   4.99E-10   1.75E-10
   SSABS   4.87E-09   2.05E-09   1.25E-09   8.64E-10   6.74E-10   2.75E-10
   TVSBS   3.47E-09   1.70E-09   1.11E-09   7.65E-10   5.93E-10   2.75E-10
   ZTMBH   3.80E-15   3.57E-15   3.29E-15   2.83E-15   1.97E-15   1.13E-15
   BRBMH   2.68E-09   1.33E-09   8.89E-10   5.96E-10   4.61E-10   1.42E-10

TABLE 9: AVERAGE TIME TAKEN FOR SEARCHING CHARACTER (S)

    m         5          10         15         20         25         50
   KMP     1.29E-08   1.14E-08   1.11E-08   1.05E-08   1.04E-08   1.03E-08
   BM      4.47E-09   4.25E-09   3.94E-09   3.81E-09   3.15E-09   1.57E-09
   BMH     2.74E-09   1.26E-09   7.98E-10   5.59E-10   4.00E-10   1.10E-10
   ZT      1.84E-15   1.68E-15   1.59E-15   1.62E-15   1.56E-15   1.49E-15
   QS      1.52E-09   7.15E-10   5.21E-10   3.56E-10   2.69E-10   8.76E-11
   BR      1.82E-15   1.77E-15   1.74E-15   1.62E-15   1.48E-15   8.31E-16
   FS      1.89E-09   8.66E-10   5.45E-10   3.97E-10   2.89E-10   1.00E-10
   SSABS   3.35E-09   1.37E-09   8.21E-10   5.89E-10   4.26E-10   1.69E-10
   TVSBS   2.27E-09   1.12E-09   7.19E-10   5.04E-10   3.84E-10   1.67E-10
   ZTMBH   2.50E-15   2.38E-15   2.04E-15   1.70E-15   1.19E-15   5.19E-16
   BRBMH   1.73E-09   7.79E-10   4.82E-10   3.62E-10   2.48E-10   6.89E-11

TABLE 10: AVERAGE TIME TAKEN FOR SEARCHING CHARACTER (S) FOR THE CONFIGURATION-II WHEN P=10

    m         5          10         15         20         25         50
   KMP     9.90E-09   9.43E-09   9.07E-09   8.60E-09   8.32E-09   7.69E-09
   BM      3.78E-09   3.65E-09   3.36E-09   3.20E-09   2.49E-09   1.57E-09
   BMH     2.22E-09   1.06E-09   6.57E-10   4.69E-10   3.02E-10   9.59E-11
   ZT      1.51E-15   1.38E-15   1.23E-15   1.21E-15   1.18E-15   1.16E-15
   QS      1.26E-09   6.15E-10   3.84E-10   2.77E-10   2.18E-10   6.77E-11
   BR      1.51E-15   1.44E-15   1.42E-15   1.37E-15   1.29E-15   7.15E-15
   FS      1.53E-09   7.40E-10   4.72E-10   3.31E-10   2.49E-10   7.06E-11
   SSABS   2.78E-09   1.17E-09   7.15E-10   5.02E-10   3.12E-10   1.45E-10
   TVSBS   1.89E-09   9.06E-10   5.99E-10   4.14E-10   3.21E-10   1.48E-10
   ZTMBH   2.13E-15   2.07E-15   1.81E-15   1.51E-15   1.08E-15   5.58E-16
   BRBMH   1.43E-09   6.69E-10   4.14E-10   2.91E-10   2.20E-10   9.59E-11

The time taken to search a character in a large text of length n depends on the searching algorithm and on the number of nodes used. The tables above give the average value of s for 5 nodes and 10 nodes. The theoretical (predicted) searching times for the same 12 MB text file on Configuration I and Configuration II are given below. Each result is obtained by adding T1, T2, T3, T4 and T5.

TABLE 11: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I

    m        5        10       15       20       25       50
   KMP     0.551    0.53     0.495    0.485    0.468    0.464
   BM      0.295    0.286    0.27     0.245    0.254    0.109
   BMH     0.295    0.281    0.268    0.255    0.219    0.117
   ZT      0.49     0.48     0.438    0.405    0.385    0.377
   QS      0.198    0.325    0.423    0.534    0.623    0.699
   BR      0.244    0.24     0.243    0.233    0.224    0.125
   FS      0.203    0.198    0.188    0.179    0.169    0.119
   SSABS   0.299    0.283    0.276    0.252    0.235    0.198
   TVSBS   0.256    0.248    0.248    0.229    0.22     0.194
   ZTMBH   0.362    0.341    0.312    0.249    0.173    0.074
   BRBMH   0.187    0.178    0.169    0.159    0.146    0.087

TABLE 12: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I

    m        5        10       15       20       25       50
   KMP     0.455    0.43     0.406    0.397    0.393    0.388
   BM      0.255    0.254    0.239    0.223    0.194    0.113
   BMH     0.263    0.244    0.243    0.226    0.207    0.123
   ZT      0.44     0.392    0.38     0.378    0.371    0.361
   QS      0.173    0.173    0.164    0.175    0.156    0.105
   BR      0.221    0.214    0.205    0.211    0.192    0.125
   FS      0.183    0.178    0.172    0.164    0.165    0.118
   SSABS   0.254    0.241    0.228    0.215    0.212    0.178
   TVSBS   0.227    0.223    0.218    0.201    0.195    0.181
   ZTMBH   0.312    0.293    0.271    0.234    0.165    0.098
   BRBMH   0.177    0.175    0.176    0.158    0.153    0.097

TABLE 13: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II

    m        5        10       15       20       25       50
   KMP     0.335    0.297    0.289    0.273    0.271    0.268
   BM      0.177    0.169    0.157    0.152    0.127    0.067
   BMH     0.181    0.167    0.159    0.149    0.134    0.077
   ZT      0.302    0.277    0.263    0.267    0.258    0.247
   QS      0.123    0.107    0.113    0.102    0.096    0.064
   BR      0.153    0.149    0.147    0.137    0.126    0.074
   FS      0.127    0.117    0.111    0.108    0.099    0.071
   SSABS   0.177    0.163    0.153    0.149    0.137    0.112
   TVSBS   0.151    0.149    0.144    0.135    0.129    0.113
   ZTMBH   0.208    0.198    0.171    0.144    0.103    0.049
   BRBMH   0.117    0.106    0.099    0.099    0.086    0.051
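The prediction rests on pairing the measured per-character time s with a worst-case comparison count. As a rough illustration only, the sketch below evaluates the product of the two for the three algorithms whose worst-case expressions are fully legible in Table 6 (TVSBS, ZTMBH and BRBMH), using the Table 10 values of s for m = 5. It assumes that the 12 MB file holds n = 12 × 2^20 characters, takes the ZTMBH entry as ½[n² + nm + m], and treats the product as the search-time part of the prediction. The names `search_term`, `WORST_CASE` and `S_M5` are hypothetical, and the other components of the T1 to T5 sum are not modeled.

```python
# Sketch: search-time term = s * worst-case comparison count.
# Assumptions (not spelled out in this form in the paper):
#   - the 12 MB text holds n = 12 * 2**20 one-byte characters
#   - s * W(n, m) is the search-time part of the predicted time

N_CHARS = 12 * 2**20  # assumed text length n
M = 5                 # pattern length used for the s values below

# Worst-case comparison counts from Table 6 (legible entries only).
WORST_CASE = {
    "TVSBS": lambda n, m: m * (n - m + 1),
    "ZTMBH": lambda n, m: 0.5 * (n**2 + n * m + m),
    "BRBMH": lambda n, m: m * (n + 1) - 2,
}

# Average time per character s in seconds for m = 5,
# taken from Table 10 (Configuration II, p = 10).
S_M5 = {"TVSBS": 1.89e-09, "ZTMBH": 2.13e-15, "BRBMH": 1.43e-09}


def search_term(algorithm, n=N_CHARS):
    """Return s * W(n, M) in seconds for one algorithm (hypothetical helper)."""
    return S_M5[algorithm] * WORST_CASE[algorithm](n, M)


if __name__ == "__main__":
    for name in WORST_CASE:
        print(f"{name}: {search_term(name):.3f} s")
```

Under these assumptions the three products come out at roughly 0.12 s, 0.17 s and 0.09 s. That is the same order as the m = 5 entries for TVSBS, ZTMBH and BRBMH in the 10-node DAKSHINA-II table (0.127, 0.178 and 0.098); the small remainder is presumably what the other terms contribute.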


TABLE 14: THEORETICAL PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II

    m        5        10       15       20       25       50
   KMP     0.258    0.246    0.237    0.225    0.218    0.202
   BM      0.151    0.146    0.135    0.129    0.102    0.067
   BMH     0.148    0.142    0.132    0.126    0.103    0.068
   ZT      0.249    0.228    0.204    0.202    0.196    0.194
   QS      0.103    0.093    0.085    0.081    0.079    0.051
   BR      0.128    0.123    0.121    0.117    0.111    0.58
   FS      0.104    0.101    0.097    0.091    0.086    0.052
   SSABS   0.148    0.141    0.134    0.128    0.102    0.097
   TVSBS   0.127    0.122    0.121    0.112    0.109    0.101
   ZTMBH   0.178    0.173    0.152    0.128    0.094    0.052
   BRBMH   0.098    0.092    0.086    0.081    0.077    0.068

Here l1 is the latency, a fixed startup overhead needed to prepare sending a message from one processor to another, and c1 is the incremental communication time per byte, so that a message of b bytes costs approximately l1 + c1·b. To find the values of l1 and c1, a ping-pong test is performed [29] between two processors, which send and receive a number of messages between two processes on the same processor. Both processes do nothing but send and receive messages. All timings are averages over 100 separate rounds. The values of l1 and c1 are found by using linear regression to fit a straight line to the measured communication-time curve [31]. Gigabit Ethernet provides reasonably high bandwidth given its low price, but suffers from relatively high latency. The average latency l1 is 95.72 μs (95.72 × 10⁻⁶ seconds) [30]. The average incremental communication time c1 is 0.02731 μs (0.02731 × 10⁻⁶ seconds).

              VI. PERFORMANCE ANALYSIS

Experimental results and estimates for the running time of string searching algorithms are presented under the assumption that both the text and the patterns are random strings with uniform distribution. In practice, texts and patterns are not random, but this estimate gives a rough idea of the performance of these algorithms. The time required for dividing the text into subtexts, Є1, is negligibly small compared with the other factors. Reverse factor low is not relevant to this experiment because of collisions in the network traffic. The expected running time is higher than the experimentally measured time, because worst-case computation values are used in the formula (Table 6). Fixing the value of s from the lengths m and n and from the number of computations for each algorithm is the significant part of this linear approach to estimating static search time. Each experiment was performed 10 times and the average value was taken (deviations were very small).

              VII. CONCLUSION

The method described above is a general method for finding the maximum searching time. Different algorithms work in different ways. The running time of certain algorithms is independent of the keyword set. The experimental results show a close relationship with the theoretically analyzed searching time. More than 98% accuracy in predicting searching time can be achieved with this method in this particular environment. We have not considered DNA sequence string or multiple string pattern searching operations. Algorithms like ZT, BMH, ZTMBH, BR, and BRBMH are designed for DNA sequence string or multiple pattern searching. However, these algorithms also satisfy this formula for natural language strings. With DNA sequence searching mechanism, the result is satisfactory. The next level of work is to make an estimate for DNA sequence searching. The effective application of this formula is for natural language searching and network security applications. The experimental results show that the proposed prediction method works effectively by considering the worst case of these algorithms, especially in the case of alphabets [32] of ASCII codes, and thus the proposed method is quite applicable to exact pattern matching of natural language. The results for these algorithms help us to implement these technologies in the areas of text search, DNA search and Web-related fields in a cost-effective manner.

              REFERENCES

[1]  Thomas Sterling [2006], Beowulf Cluster Computing with Linux, MIT Press, page 25.
[2]  A. Petitet, R. C. Whaley, J. Dongarra, A. Cleary [Sept 2008], HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, Innovative Computing Laboratory, University of Tennessee. doi:
[3]  Peter S. Pacheco [1997], Parallel Programming with MPI, Morgan Kaufmann Publishers, Inc., page 28.
[4]  Al Geist, Adam Beguelin, Jack Dongarra [1996], PVM: a users' guide and tutorial for networked parallel computing, MIT Press, pages 103-107.
[5]  William Gropp, Ewing Lusk, Anthony Skjellum [1996], Using MPI: Portable parallel programming with the message-passing interface, MIT Press, page 124.
[6]  Markus Weinhardt and Wayne Luk, PACT GmbH [2002], ‘Task-Parallel Programming of Reconfigurable Systems’, Springer Berlin / Heidelberg, ISBN 978-3-540-42499-4, pages 172-181.
[7]  Michailidis, P.D. and Margaritis, K.G. [2005], ‘Parallel text searching applications on a heterogeneous cluster architecture’, Int. J. Computational Science and Engineering, Vol. 1, No. 1, pp. 45–59.
[8]  D.E. Knuth, J.H. Morris and V.R. Pratt [1977], Fast pattern matching in strings, SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350.
[9]  Hume S.C. and D. Sunday [1991], “Fast string searching”, Software Practice and Experience, 1221-1248.
[10] Pirklbauer K. [1992], “A study of pattern-matching algorithms”, Structured Programming, 89-98.
[11] Richard Cole [1991], “Tight bounds on the complexity of the Boyer-Moore algorithm”, Proceedings of the 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 224–233.
[12] Prasad J.C., K.S.M. Panicker [2009], ‘Beowulf Dakshina Cluster Architecture with Linux Debian Operating system for MPI Programming’, Proceedings of International Conference on Information Processing, Bangalore, India, ISBN: 978-93-80026-75-2, page 350.

[13] P.D. Michailidis, K.G. Margaritis [2000], Parallel String Matching Algorithm: A bibliographical review, Technical Report, Dept. of Applied Informatics, University of Macedonia.
[14] Panagiotis D. Michailidis and Konstantinos G. Margaritis [2001], ‘Parallel Text Searching Application on a Heterogeneous Cluster of Workstations’, IEEE Computer Society Proceedings of the International Conference on Parallel Processing Workshops (ICPPW’01), page 153.
[15] Amdahl, Gene [1967], ‘Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities’, AFIPS Conference Proceedings, pp. 483–485.
[16] John L. Gustafson, Reevaluating Amdahl's Law, Ames Laboratory, Department of Energy, ISU, Ames, Iowa.
[17] Benner, R.E., Gustafson, J.L., Montry, G.R. [1988], “Development and analysis of scientific application programs on a 1024-processor hypercube”, Sandia National Laboratories, page 0317.
[18] Prasad J.C., K.S.M. Panicker [2010], ‘String Searching Algorithm Implementation-Performance Study With Two Cluster Configurations’, International Journal of Computer Science and Communication, ISSN: 0973-7391, pages 551-555.
[19] Recent Advances in Parallel Virtual Machine and Message Passing Interface [2005], 12th European PVM/MPI Users’ Group Meeting, Sorrento, Italy, September 18-21, 2005, Proceedings, Springer, Volume 3666.
[20]
[21] Gerald Carter [2003], LDAP System Administration, O'Reilly, page 2.
[22] Panagiotis D. Michailidis, Konstantinos G. Margaritis [2002], Parallel implementations for string matching problem on a cluster of distributed workstations, Neural, Parallel & Scientific Computations, Volume 10, Issue 3, pages 287–312.
[23] Ahmad Fadel Klaib, Hugh Osborne [Dec 2008], ‘Searching Protein Sequence Databases Using BRBMH Matching Algorithm’, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 12, page 59.
[24] Garth A. Gibson, Rodney Van Meter [2000], ‘Network attached storage architecture’, Communications of the ACM, Volume 43, Issue 11, pages 37–45.
[25] Yang Yu, Bhaskar Krishnamachari, Prasanna, V.K. [2004], ‘Issues in designing middleware for wireless sensor networks’, IEEE Network, Volume 18, Issue 1, pages 15-21.
[26] F. Ashiya, M. Matsumoto, S. Nagasawa, S. Tomita [2007], Loop-network configuration and single-mode optical fiber cable technologies for subscriber network, International Journal of Digital & Analog Communication Systems, Volume 3, Issue 1, John Wiley & Sons, Ltd, pages 77–83.
[27] J. Rodriguez de Souza, E. Argollo, A. Duarte, D. Rexachs, E. Luque [2007], Fault Tolerant Master-Worker over a Multi-Cluster Architecture, Proceedings of the International Conference Parallel Computing: Current & Future Issues of High-End Computing, NIC Series, Vol. 33, ISBN 3-00-017352-8, pages 465-472.
[28] Smit, G. de V. [1982], “A comparison of three string matching algorithms”, Software Practice and Experience, pages 57-66.
[29] Kiran Nagaraja, Neeraj Krishnan, Ricardo Bianchini, Richard P. Martin, Thu D. Nguyen, ‘Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services’, Department of Computer Science, Rutgers University, NJ 08854, http://dark- 03.pdf
[30] Anderson, T., D. Culler, D. Patterson [1995], A Case for NOW (Network of Workstations), IEEE Micro, Vol. 15, pp. 54-64.
[31] Panagiotis D. Michailidis and Konstantinos G. Margaritis [2002], “String Matching Problem On A Cluster of Personal Computers: Performance Modeling”, Intern. J. Computer Math., Taylor & Francis, Vol. 79(8), pp. 867–888.
[32] Foster, Ian [1995], Designing and Building Parallel Programs, Addison-Wesley, ISBN 0201575949, Message Passing Interface, page 245.

              AUTHORS PROFILE

K S Mohanachandra Panicker graduated in Electrical Engineering in 1971 from REC Calicut, completed his post-graduation at the College of Engineering, Trivandrum in 1973, and received his Ph D from I I T, New Delhi in 1986. He joined N S S College as a lecturer in Electrical Engineering in 1974 and served at different levels up to 2005 as Assistant Professor, Professor and Principal. He was principal of Govt. Model Engineering College Ernakulam during 1994-2001 and of Federal Institute of Science and Technology (FISAT) during 2005-2008. As a Professor, Dr Panicker worked at the European University of Lefke, Cyprus. Currently he is the Dean, Planning and Research, at FISAT, Angamaly. At national and international level, he has published over 40 papers in conferences and journals. He is also guiding three students registered for Ph D.

Prasad J.C graduated in Mathematics from the University of Calicut in 1998, completed his post-graduation in Computer Applications at Bharathiar University in 2001, and completed his second post-graduation in Computer Science and Engineering in 2006. He joined as a lecturer in the Department of Computer Applications at Union Christian College. Presently he is working as Asst. Professor, Dept of Computer Science and Engineering, Federal Institute of Science and Technology [FISAT], Angamaly. At national and international level, he has published 12 papers in conferences and 2 papers in international journals. Prasad is currently doing his Ph.D work at Dr.M.G.R. University, Chennai.

                                                                                                                      ISSN 1947-5500
