(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 5, May 2011


A New Dynamic Data Allocation Algorithm for Distributed Database

Fardin Esmaili Sangari
Sama Technical and vocational training college, Islamic Azad university, Urmia branch, Urmia, Iran
Fardin_e_s@yahoo.com

Seyed Mostafa Mansourfar
Sama Technical and vocational training college, Islamic Azad university, Sahand branch, Sahand, Iran
Mostafa.mansourfar@gmail.com



Abstract: Data and fragment allocation is an important issue in distributed database systems. Data allocation is carried out based on dynamic or static data access patterns. This paper proposes a new data allocation strategy for non-replicated distributed databases, named the Relative Threshold Algorithm (RTA). The proposed algorithm reallocates data fragments as the access pattern to those fragments changes, migrating each fragment to the site that accesses it most. Simulation results show that RTA performs better than existing algorithms in terms of hit ratio. It also reduces the required storage space. We believe the reduction of storage overhead makes RTA more attractive for distributed database systems.

Keywords: Distributed Database; Dynamic Data Allocation

I. Introduction

Database and network technologies have been the most important issues in building distributed database systems over the past decade. A distributed database system consists of a collection of sites connected by a communication network, in which each site is a database system in its own right, but the sites have agreed to work together, so a user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site [1]. Distributed database systems use data allocation to achieve two aims. The first is to minimize the total data transmission cost of processing (i.e., the maximum number of fragments that can be allocated at a site), and the second is to unify the implementation strategy. The main concern in a distributed database system is the design of the fragmentation and allocation of the underlying data. The unit of fragmentation can be a file, in which case the allocation issue becomes the file allocation problem [2]. However, data allocation is an NP-complete problem [3], so fast allocation requires an efficient solution. Moreover, an optimal allocation is rarely employed by a distributed database system in its query strategy.

Several papers have recently addressed the data allocation problem. Chu considered this problem in [4]. Replicated and non-replicated models are studied in [5][6], and [7][8] address dynamic file allocation. Various solutions for data allocation in distributed systems are presented in [6][7][8] and [9]. These papers perform data allocation based on static data access patterns or query access patterns. In a static environment, the probability that a node accesses a data fragment is stable. In dynamic environments, however, these probabilities change, and using static methods often reduces database performance. A dynamic algorithm for data allocation in non-replicated database systems, called the threshold algorithm, was presented in [7]. The threshold algorithm transfers data fragments among sites as the data access pattern changes. It focuses on load balance. This algorithm provides data allocation with a low hit ratio. In other words, the probability that the site holding a fragment actually requires it is low, and the algorithm does not fully consider the number of accesses made by other sites: when a fragment is transferred, only the last site that accessed it is taken into account. We focus on these disadvantages and attempt to eliminate them. The rest of the paper is organized as follows. Section II reviews the threshold algorithm. Section III presents the proposed algorithm. Section IV shows the simulation results for the proposed algorithm. Finally, Section V concludes.

II. Threshold algorithm

The threshold algorithm is one of the dynamic allocation algorithms that transfer data fragments among sites according to changing access patterns [7][10][11]. It stores only one counter for each fragment. Figure 1 shows fragment i with its associated counter.

Figure 1. Any fragment i in threshold algorithm

In the threshold algorithm, the initial value of the counter is zero. The counter value is increased by one for each remote access to the fragment. It is reset to zero for a local access. Whenever the counter exceeds a predetermined threshold value, the ownership of the fragment is transferred to another node. At this point, the critical question is: which node will be the new owner of the fragment? The algorithm




                                                                        138                              http://sites.google.com/site/ijcsis/
                                                                                                         ISSN 1947-5500
gives very little information about the past accesses to the fragment. In fact, throughout the entire access history, only the last node that accessed the fragment is known. Two strategies are available for choosing the new owner: either the new owner is selected randomly, or the last accessing node is selected. With the first strategy, the randomly chosen node could be one that has never accessed the fragment before. Therefore, the latter strategy is heuristically better. Initially, all fragments are distributed to the nodes randomly, and a threshold value δ is set. At every node j, the threshold algorithm is executed for every fragment i stored there. It reduces traffic between two nodes when the threshold value exceeds one (δ > 1). One of the important problems in the threshold algorithm is choosing the threshold value, because this value directly affects the mobility of the fragments. If the threshold value increases, a fragment tends to remain longer at its current node. Conversely, as the threshold value decreases, a fragment tends to visit more sites.

In the threshold algorithm, if n fragments reside at a site, then n distinct counters are required. If site B consecutively accesses a fragment at site A, the counter increases by one with each access and approaches the threshold value. If site A then happens to access the fragment after site B's consecutive accesses, the counter is reset to zero.

If site B consecutively accesses a fragment at site A, and site C then accesses this fragment for the first time, and with this access the counter value reaches the threshold value, the fragment is transferred to site C, because site C performed the last access. Such events increase response time.

III. Proposed Algorithm

Our proposed algorithm uses two fields for every site. The number of fields does not depend on the number of fragments residing at the site. One field counts the number of accesses, and the other records the last fragment accessed by the current site. The fragment tends to stay at the node with higher access probability. As the access probability of the node increases, the tendency to remain at this node also increases. It is also shown that as the threshold value increases, the fragment will tend to stay more at the node with higher access probability. At every access, the name of the requested fragment is compared with the identifier field; if they match, the counter is increased by one. The counter is set to zero when the site accesses a fragment for the first time, and the name of the fragment is then recorded in the identifier field. Our algorithm counts all accesses, whether local or remote. What matters is the number of accesses in an interval. This algorithm increases the probability that a fragment resides at the site that requests it. As a result, response time decreases, because no data has to be fetched from a remote site. The threshold algorithm is a centralized algorithm: if the site fails, all of its information is lost. Our proposed algorithm is distributed, which eliminates the single point of failure. If a site crashes, the other sites can still access their information, and only the crashed site's information is lost. Our proposed algorithm raises the hit ratio and, owing to locality, reduces data movement. This is shown as follows.

We make the following assumptions:

• Initially, fragments are randomly distributed among the sites.
• An incremental counter is used. The initial value of the counter is zero.
• For each access to a fragment, if the name of the accessed fragment is the same as the name in the identifier field, the counter value increases by one.

    Fragment | counter

Figure 2. The fragment and counter fields at each site in the proposed algorithm

Relative threshold algorithm:

Step 1. Set the initial counter value to zero at every site (counter = 0) and distribute the fragments randomly among the sites.
Step 2. Process the access request for a stored fragment.
Step 3. For each request (local or remote), if the access is repetitive, increase the counter value by one and go to step 5.
Step 4. If the name of the requested fragment is not the same as the fragment field, set the counter to zero and replace the identifier field with the new fragment name.
Step 5. If the counter value exceeds the threshold value (counter > δ), then: if the fragment is already at the accessing site, reset the counter to zero; otherwise, transfer the fragment to the accessing site and reset the counter to zero.
Step 6. Return to step 2.

We assume the site topology shown in Figure 3. Site 2 wants to access a fragment at site 1, so it increases its counter by one and sets its fragment field to A. Each subsequent access in which site 2 again requests A increases the counter value. If this value becomes higher than the threshold value, the data moves to site 2. If site 2 accesses data other than A, the counter value is reset to zero and the fragment field is replaced by the new fragment name.
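The bookkeeping of the two algorithms can be compared in a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation. The class and method names are hypothetical. A local request is counted as a hit. The baseline counter is assumed to restart from zero after a transfer. In the RTA sketch, the first access to a fragment is handled as in Step 4 (counter set to zero), and the threshold check of Step 5 is applied after every counter update.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ThresholdAllocator:
    """Baseline of Section II: one counter per fragment."""

    def __init__(self, placement: Dict[str, int], delta: int) -> None:
        self.owner = dict(placement)            # fragment name -> owning site
        self.counter = {name: 0 for name in placement}
        self.delta = delta                      # threshold value

    def access(self, site: int, fragment: str) -> bool:
        """Serve one request; return True when it is local (counted as a hit)."""
        if self.owner[fragment] == site:
            self.counter[fragment] = 0          # a local access resets the counter
            return True
        self.counter[fragment] += 1             # a remote access increments it
        if self.counter[fragment] > self.delta:
            self.owner[fragment] = site         # last accessing node becomes owner
            self.counter[fragment] = 0          # assumption: counting restarts
        return False


@dataclass
class SiteState:
    """The two per-site fields of Figure 2."""
    counter: int = 0
    fragment: Optional[str] = None


class RelativeThresholdAllocator:
    """Sketch of RTA: one counter and one fragment field per site."""

    def __init__(self, placement: Dict[str, int], num_sites: int, delta: int) -> None:
        self.owner = dict(placement)            # fragment name -> owning site
        self.sites = [SiteState() for _ in range(num_sites)]
        self.delta = delta

    def access(self, site: int, fragment: str) -> bool:
        """Serve one request; return True when it is local (counted as a hit)."""
        state = self.sites[site]
        hit = self.owner[fragment] == site
        if state.fragment == fragment:          # Step 3: repeated access
            state.counter += 1
        else:                                   # Step 4: a different fragment
            state.counter = 0
            state.fragment = fragment
        if state.counter > self.delta:          # Step 5: threshold exceeded
            if not hit:
                self.owner[fragment] = site     # migrate to the accessing site
            state.counter = 0
        return hit
```

With δ = 2 and fragment A initially at site 0, two remote accesses by site 1 followed by a single access by site 2 move A to site 2 in the baseline, which is the weakness described in Section II. In the RTA sketch, the access by site 2 does not affect the counter of site 1, so A migrates to site 1 once the counter of site 1 exceeds δ.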




Figure 3. Example of sites topology

IV. Simulation Results

In this section, we evaluate the proposed algorithm, compare it with the threshold algorithm, and show that our algorithm performs better. In this simulation, the number of fragments is between 100 and 9000. Initially, these fragments are randomly distributed among the sites. Experiments were run in different environments.

In the first scenario, we vary the number of sites and keep the threshold value fixed (figure 2).

[Plot, site = 5, value = 5: Hit Rate versus Number of Access, for Threshold and RTA]

Figure 4. Comparison of Hit Rate in RTA & Threshold algorithm with different access numbers

In this experiment, the hit ratio factor of data fragment length is 2500, the threshold value is 5, and the number of sites is 5. The simulation results in Figure 4 show that the proposed algorithm increases the fragment hit ratio when the requested fragment exists at the current site.

[Plot, site = 5, value = 10: Hit Rate versus Number of Access, for Threshold and RTA]

Figure 5. Comparison of Hit Rate in RTA & Threshold algorithm with different access numbers

The experiment is repeated with 5 sites and a threshold of 10, and almost the same results are achieved. The more intense the environment, the higher the hit ratio achieved.

[Plot, site = 9, value = 10: Hit Rate versus Number of access, for Threshold and RTA]

Figure 6. Comparison of Hit Rate in RTA & Threshold algorithm with different access numbers and different sites & value numbers

V. Conclusion

In this article, we introduce a new method for distributing the data fragments of a distributed database system. RTA is based on the threshold algorithm but uses a different strategy for data transmission. In our experiments, we consider the hit ratio. The simulation is configurable for testing different network topologies and different data request and/or allocation conditions. The experimental results show that the RTA hit rate is better than that of the threshold algorithm. We use a non-replicated distributed database. In future work, RTA can be considered for replicated distributed databases.

References

[1] R. Basseda, S. Tasharofi, M. Rahgozar, "Near Neighborhood Allocation: A Novel Dynamic Data Allocation Algorithm in DDB", CSICC 2006.
[2] S. B. Navathe, S. Ceri, G. Wiederhold and J. Dou, "Vertical Partitioning Algorithms for Database Design", ACM Transactions on Database Systems, 1984.
[3] Y. F. Huang and J. H. Chen, "Fragment Allocation in Distributed Database Design", Journal of Information Science and Engineering 17, 491-506, 2001.
[4] I. Ahmad, K. Karlapalem, Y. K. Kwok and S. K., "Evolutionary Algorithms for Allocating Data in Distributed Database Systems", International Journal of Distributed and Parallel Databases, 11: 5-32, The Netherlands, 2002.
[5] A. Brunstrom, S. T. Leutenegger and R. Simha, "Experimental Evaluation of Dynamic Data Allocation Strategies in a Distributed Database with changing Workloads", ACM Transactions on Database Systems, 1995.
[6] A. G. Chin, "Incremental Data Allocation and ReAllocation in Distributed Database Systems", Journal of Database Management; Jan-Mar 2001; 12, 1; ABI/INFORM Global pg. 35.
[7] T. Ulus and M. Uysal, "Heuristic Approach to Dynamic Data Allocation in Distributed Database Systems", Pakistan Journal of Information and Technology 2 (3): 231-239, 2003.
[8] S. Voulgaris, M. V. Steen, A. Baggio, and G. Ballintijn, "Transparent Data Relocation in Highly Available Distributed Systems", Studia Informatica Universalis, 2002.




[9] L. C. John, "A Generic Algorithm for Fragment Allocation in Distributed Database Systems", ACM, 1994.
[10] R. Basseda, "Fragment Allocation in Distributed Database Systems", Database Research Group, 2006.
[11] R. Basseda, "Data Allocation In Distributed Database Systems", Technical Report No. DBRG.RB-ST.A50715, 2005.



