distributed+database+systems

Document Sample
distributed+database+systems Powered By Docstoc
					Distributed Database Systems

  Benchmark Specification of Course Project



               (Student Edition)

                  2008-12-30




   Database Research Group Tsinghua University
                                        Table of Contents

1   Data Specification ............................................................................................ 3

     1.1 Global Tables ......................................................................................... 3

     1.2 Table Description................................................................................... 3

     1.3 Fragmentation ........................................................................................ 4

     1.4 Allocation .............................................................................................. 6

     1.5 Data Format ........................................................................................... 7

2   Benchmark Procedure ...................................................................................... 7

     2.1 Database Initialization ........................................................................... 7

     2.2 Inserts & Deletes ................................................................................... 8

     2.3 Data Import............................................................................................ 8

     2.4 Queries................................................................................................... 8

             2.4.1 Basic Test Cases ........................................................................... 9

             2.4.2 Hybrid Test Cases ....................................................................... 10

     2.5 P2P Test (Optional) ............................................................................. 10




                                                        2
 1 Data Specification

       1.1 Global Tables

     You have to create four global tables for the benchmark. They are Customer,
 Producer, Product and Purchase described as follows.
       Table 1 the global tables
                                               Global Tables
 Customer (id int key, name char(25), gender char(1), rank int)
 Producer (id int key, name char(50), location char(2))
 Product (id int key, name char(25), producer_id int, stocks int)
 Purchase(customer_id int, product_id int, number int)


       1.2 Table Description

      Customer (id int key, name char(25), gender char(1), rank int)


       Table 2 Description of Table “Customer”
Attribute    Description                   Type                Range of Values
id (key)     The customer‟s ID             Integer             100001-115000, no duplicate
name         The customer‟s name           String              -
gender       The customer‟s gender         Character           „M‟: Male (Proportion: 50%)
                                                               „F‟: Female (Proportion: 50%)
rank         The rank of the customer      Integer             1: Golden Customer (Proportion: 40%)
                                                               2: Silver Customer (Proportion: 30%)
                                                               3: Bronze Customer (Proportion: 30%)

                                                                           Number of records: 15000

      Producer (id int key, name char(50), location char(2))

       Table 3 Description of Table “Producer”
Attribute    Description                Type               Range of Values
id (key)     The producer‟s ID          Integer            200001-200500, no duplicate
name         The producer‟s name        String             -
location     The location of the String                    “BJ”: Beijing (Proportion: 50%)
             producer                                      “SH”: Shanghai (Proportion: 50%)

                                                                             Number of records: 500


                                                       3
      Product (id int key, name char(25), producer_id int, stocks int)

       Table 4 Description of Table “Product”
Attribute       Description              Type       Range of Values
id (key)        The product‟s ID         Integer    300001-320000, no duplicate
name            The product‟s name       String     -
producer_id     The ID of producer       Integer    Producer.id
                who produces the
                product
stocks          The      number    of    Integer   The distribution follows U[0, 6000]
                products which the                 Explanation:
                producer stocks                                             ���� + 0.5 − ���� − 0.5    1
                                                   ���� ������������������������ = ���� =                       =
                                                                                    6000          6000
                                                         where �������� 0,6000 and ���� is an integer

                                                                            Number of records: 20000

      Purchase (customer_id int, product_id int, number int)

       Table 5 Description of Table “Purchase”
Attribute      Description               Type      Range of Values
customer_id    The ID of customer        Integer   Customer.id
               who     purchase    the
               product
product_id     The ID of the product     Integer   Product.id
number         number of products        Integer   (number − 1)~Exp(1)
                                                   Explanation:
                                                                                         −����+0.5
                                                   ���� ������������������������ = ���� = ���� −0.5− ���� −1
                                                                            −����+1.5
                                                                                                     ���� > 1
                                                                              ����      − ����       ���� = 1
                                                              where ���� > 0 ������������ ���� is an integer

                                                                            Number of records: 60000

       1.3 Fragmentation

      Customer (id int key, name char(25), gender char(1), rank int)
       Table 6 Horizontal Fragmentation of Table “Customer”
 Fragmentation Name                                  Fragmentation Condition
 Customer.1                                          id < 110000
 Customer.2                                          id >= 110000 and id < 112500
 Customer.3                                          id >= 112500 and id <= 115000


                                                     4
   Producer (id int key, name char(50), location char(2))

    Table 7 Horizontal Fragmentation of Table “Producer”
Fragmentation Name                                  Fragmentation Condition
Producer.1                                          id < 200200 and location=”BJ”
Producer.2                                          id < 200200 and location=”SH”
Producer.3                                          id >= 200200 and location=”BJ”
Pruducer.4                                          id >= 200200 and location=”SH”
   Product (id int key, name char(25), producer_id int, stocks int)



 Scheme 1: Vertical Fragmentation



Table 8 Vertical Fragmentation of Table “Product”
Fragmentation Name                                  Fragmentation Condition
Product.1                                           (id, name)
Product.2                                           (id, producer_id, stocks)



 Scheme 2: Hybrid Fragmentation



     Step 1: Vertical Fragmentation (identical to Scheme 1)



Table 9 Step 1/2: Vertical Fragmentation of Table “Product”
Fragmentation Name                                  Fragmentation Condition
Product.1                                           (id, name)
Product.2                                           (id, producer_id, stocks)



     Step2: Fragment Product.2 Horizontally



Table 10 Step 2/2: Horizontal Fragmentation of Table “Product.2”
Fragmentation Name                                  Fragmentation Condition
Product.2.1                                         stocks < 2000
Product.2.2                                         stocks >= 2000 and stocks < 4000
Product.2.3                                         stocks >= 4000

                                                    5
   Purchase (customer_id int, product_id int, number int)



Table 11 Horizontal Fragmentation of Table “Purchase”
Fragmentation Name                         Fragmentation Condition
Purchase.1                                 customer_id < 107000 and product_id < 310000
Purchase.2                                 customer_id < 107000 and product_id >= 310000
Purchase.3                                 customer_id >= 107000 and product_id < 310000
Purchase.4                                 customer_id >= 107000 and product_id >= 310000




    1.4 Allocation



        Site Configuration: 4 sites (Site1 ~ Site4) deployed at 3 computers

   Scheme1: Basic Fragmentation



Table 12 Allocation Scheme of Basic Fragmentation
Site Name                          Fragmentation Name
Site1                              Customer.1, Producer.1, Product.1, Purchase.1
Site2                              Customer.2, Producer.2, Product.2, Purchase.2
Site3                              Customer.3, Producer.3, Purchase.3
Site4                              Producer.4, Purchase.4



   Scheme2: Hybrid Fragmentation



Table 13 Allocation Scheme of Hybrid Fragmentation
Site Name                          Fragmentation Name
Site1                              Customer.1, Producer.1, Product.1, Purchase.1
Site2                              Customer.2, Producer.2, Product.2.1, Purchase.2
Site3                              Customer.3, Producer.3, Product.2.2, Purchase.3
Site4                              Producer.4, Product.2.3, Purchase.4




                                                 6
    1.5 Data Format

                                Table 14 the format of data file

                Table1_Name (field1, field2, …, filed n) 100
                value1,value2,…,valuen
                ……
                value1,value2,…,valuen
                Table1_Name (field1, field2, …, filed m) 200
                value1,„value2‟, …,valuem
                ……
                value1,‟value2‟,…,valuem
                …….

    Notes:

   Tablei_Name denotes the name of the ith table.
   fieldi denotes the name of the ith attribute.
   valuei denots the value of the ith field.
   Each value is separated by a comma.
   If a value is of String type, a pair of single quotation marks will be added to it.

2 Benchmark Procedure

    There are 6 steps in the benchmark procedure.

        Database Initialization
        Inserts & Deletes
        Data Import
        Queries
        P2P Test (Optional), and
        Demonstration of Additional Features (Optional)

     The time for benchmark is about 1 hour. Each team should finish the compulsory
requirements and is encouraged to show additional features or highlights of its
system.

    2.1 Database Initialization

    The system should be able to initialize the database according to the description

                                                7
in section 1. In general, the process includes

        Define sites
        Create database
        Use database
        Create tables
        Fragment tables
        Allocate tables

    Note that a command script should be used to initialize the database
automatically.

    2.2 Inserts & Deletes

     There are 5 insert statements and 5 delete statements. Four of them are released
in advance, while 6 of them will be released in the benchmark.

     Note that the system should be able to point out the involved site(s).

        insert into Customer(id, name, gender, rank) values(100010, 'Xiaoming',
         'M',1)

         The involved site: Site1

        insert into Producer(id, name, location) values(200201,'TCL','SH')

         The involved site: Site4

        delete from Producer where location = 'SH'

         The involved sites: Site2, Site4

        delete from Customer where gender = 'M' and rank = 1

         The involved sites: Site1, Site2, Site3

    2.3 Data Import

     The requirement of the import is the time efficiency. In general, the time of the
whole process should be within 5 minutes (Actually, some teams of ddb course 2007
can import the data within 1 minute or less).

    2.4 Queries

     We have 10 SELECT statements. Only eight of them are released in advance,

                                                 8
while two of them will be told to you when the final benchmark.

    2.4.1    Basic Test Cases

1) Get the information of all customers
    select * from Customer
2) Get the names of all products
    select Product.name from Product

3) Get names of products such that the stocks of the products are lower than 4000

    select Product.name from Product where stocks < 4000

4) Get the cosumer_id and number of purchases such that the number is not greater
   than 3

    select customer_id, number from Purchase where number <= 3

5) Get the product’s name, product’s stocks and producer’s name such that the
   location of the producer is BJ and the stocks of the product is greater than 4000

    select        Product.name,Product.stocks,Producer.name
    from          Product,Producer
    where         Product.producer_id=Producer.id and
                  Producer.location='BJ' and
                  Product.stocks > 4000

6) Get the customer’s name and purchase’s number such that the customer issues at
   least a purchase

    select        Customer.name,Purchase.number
    from          Customer,Purchase
    where         Customer.id=Purchase.customer_id

7) Get customer’s id, customer’s name, product’s name and purchase’s number such
   that the rank of customer is 1 and stocks of the product is greater than 2000

    select        Customer.id,Customer.name,Product.name,Purchase.number
    from          Customer,Product,Purchase
    where         Customer.id=Purchase.customer_id and
                  Product.id=Purchase.product_id and
                  Customer.rank = 1 and
                  Product.stocks > 2000




                                           9
    2.4.2    Hybrid Test Cases

    There are 2 cases for hybrid fragmentation. And one of them is released here.

8) Get the information of all products

    select * from Product

    2.5 P2P Test (Optional)

    Use the query 2) in section 2.4.1 to do the P2P test. There are two test cases:

        Site1 is on, while Site2 is off.

        Site2 is on, while Site1 is off.




                                            10

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:5/19/2010
language:English
pages:10