test case of grid cluster arc

					    Test Case of Grid Cluster ARC:
    Author/Tester: Yang Zhao
    Date: Apr-May 2006

    Project Name: Grid Cluster ARC
    Project Version: V_01
    Level of Testing: Functional Test/ Load Test
    Areas of Testing: Harvest/Index

    Installation/Environment:
        Index Cluster: 7 Linux nodes, c21.seven.research.odu.edu to
            c27.seven.research.odu.edu
        Harvester: Linux, dbwebdev2.seven.research.odu.edu
        Web Server: Tomcat 5, dbwebdev2.seven.research.odu.edu
        Database Servers: MySQL 5.0, c21.seven.research.odu.edu,
            dbwebdev2.seven.research.odu.edu


Test Cases
    Fields per test case: Test Case ID; Test Case Description; Test Procedure
    (Step, Expected Result, Actual Result); Observation; Defect
Test Case ID: arc_04_03_2006
    Description: Indexing with 2 cluster nodes; harvest 3 small archives to check the
        functional correctness of the harvester and the distribution of data on the
        indexing cluster.
    Step 1: Start the indexing service on c27 and cash.cs.odu.edu. Start Tomcat. Add 3
        archives through web administration.
        Expected Result: No error.
        Actual Result: No error.
    Step 2: Run the harvester.
        Expected Result: Data evenly distributed on the cluster.
        Actual Result: Harvest completed without error. 7.267 MB of data was populated
            on cash.cs.odu.edu and 7.248 MB on c27.
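
The "evenly distributed" expectation can be made measurable. The following is a minimal sketch, not part of ARC: the class name `DistributionCheck` and the idea of a tolerance threshold are assumptions introduced here for illustration. It computes the largest relative deviation of any node's data size from the mean.

```java
import java.util.Map;

/** Hypothetical helper (not part of ARC): measures how evenly data is spread over index nodes. */
final class DistributionCheck {

    /** Returns the largest relative deviation from the mean size, e.g. 0.01 means 1%. */
    static double maxRelativeDeviation(Map<String, Double> sizeMbByNode) {
        if (sizeMbByNode.isEmpty()) {
            throw new IllegalArgumentException("no nodes given");
        }
        double sum = 0.0;
        for (double size : sizeMbByNode.values()) {
            sum += size;
        }
        double mean = sum / sizeMbByNode.size();
        if (mean <= 0.0) {
            throw new IllegalArgumentException("mean size must be positive");
        }
        double worst = 0.0;
        for (double size : sizeMbByNode.values()) {
            worst = Math.max(worst, Math.abs(size - mean) / mean);
        }
        return worst;
    }

    /** True if no node deviates from the mean by more than the given tolerance (an assumed threshold). */
    static boolean isEven(Map<String, Double> sizeMbByNode, double tolerance) {
        return maxRelativeDeviation(sizeMbByNode) <= tolerance;
    }
}
```

For the sizes reported above (7.267 MB and 7.248 MB), the largest deviation from the mean is about 0.13%, so the data would count as evenly distributed under any reasonable tolerance.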

Test Case ID: arc_04_08_2006
    Description: Indexing with 3 cluster nodes; harvest over 100K records to test the
        performance of harvest/index and search/browse, the distribution of data on
        the cluster, and the parallelism of the harvester.
    Step 1: Start the indexing service on c27, c26, and c23. Start Tomcat on dbwebdev2.
        Add 4 archives through web administration.
        Expected Result: No error.
        Actual Result: No error.
    Step 2: Run the harvester on dbwebdev2. (Interrupted the harvest after 1 day.)
        Expected Result: Data evenly distributed on the cluster.
        Actual Result: It takes 107550 seconds (30 hours).
            Data distribution:
                44.5 MB on c23
                43.5 MB on c26
                45.0 MB on c27
        Observation: Sometimes the indexing process's CPU usage is high (> 90%) when
            the harvester slows down.
    Step 3: Start all services on c27, c26, and c23, go to the search interface in a
        browser, and click "browse".
        Actual Result: The browsing result is displayed instantly. 119147 records in
            total.

Test Case ID: arc_04_10_2006
    Description: Indexing with 5 cluster nodes; harvest over 100K records to test the
        performance of harvest/index and search/browse, the distribution of data on
        the cluster, and the parallelism of the harvester.
    Step: Same as above.
        Expected Result: Same as above.
        Actual Result: Same as above.
    Defect: Serious performance problem.


Follow-up: recode the cluster service module to index in batches and to optimize the index only once per harvest run.
Then run a performance test and a stress test on harvest/index.
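
The batching idea above can be sketched as follows. This is an illustration under stated assumptions, not the ARC cluster service code: `IndexSink`, `BatchIndexer`, and `CountingSink` are hypothetical names, and records are modeled as plain strings. The point is only the control flow: records are buffered, uploaded one batch at a time, and the expensive optimize call happens exactly once at the end of the run.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a node's index service; the real ARC interface is not shown in this report. */
interface IndexSink {
    void addAll(List<String> records); // upload one batch of records
    void optimize();                   // expensive index optimization
}

/** Buffers records and forwards them in batches; optimizes only when the run finishes. */
final class BatchIndexer {
    private final IndexSink sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchIndexer(IndexSink sink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Queues one record and uploads the buffer once it reaches the batch size. */
    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.addAll(new ArrayList<>(buffer)); // pass a copy so the buffer can be reused
        buffer.clear();
    }

    /** Call once at the end of a harvest run: uploads the remainder, then optimizes a single time. */
    void finish() {
        flush();
        sink.optimize();
    }
}

/** Test double that only counts calls, to make the batching behavior visible. */
final class CountingSink implements IndexSink {
    int batches;
    int records;
    int optimizeCalls;

    @Override
    public void addAll(List<String> batch) {
        batches++;
        records += batch.size();
    }

    @Override
    public void optimize() {
        optimizeCalls++;
    }
}
```

With a batch size of 100, adding 250 records and then calling `finish()` results in three uploads and one optimize call, instead of 250 of each.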

Test Case ID: arc_04_25_2006
    Description: Indexing with 7 cluster nodes; harvest from the ARC production server
        to test the performance of harvest/index. (Performance Test)
    Step 1: Start the indexing service on c27-c21. Start Tomcat on dbwebdev2. Add ARC
        (http://arc.cs.odu.edu:8080/oai/oai20) through web administration.
        Expected Result: No error.
        Actual Result: No error.
    Step 2: Run the harvest on dbwebdev2.
        Expected Result: Data evenly distributed on the cluster.
        Actual Result: It takes 131,879 seconds (36 hours) to get 3,014,112 records
            from ARC.
        Observation: No performance degradation.

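
To compare runs of different sizes, the elapsed time and record count can be reduced to a rate. A minimal sketch (the class name `HarvestRate` is an assumption made for illustration):

```java
/** Hypothetical helper: converts a harvest's record count and duration into comparable figures. */
final class HarvestRate {

    /** Average number of records harvested per second. */
    static double recordsPerSecond(long records, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return (double) records / elapsedSeconds;
    }

    /** Elapsed time expressed in hours. */
    static double hours(long elapsedSeconds) {
        return elapsedSeconds / 3600.0;
    }
}
```

For the ARC harvest above, `HarvestRate.recordsPerSecond(3_014_112L, 131_879L)` gives about 22.9 records per second.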
Test Case ID: arc_04_27_2006
    Description: Indexing with 7 cluster nodes; harvest from RePEc
        (http://oai.repec.openlib.org) to test the performance of harvest/index with a
        data provider that returns a large chunk of records per OAI response.
        (Stress Test)
    Step 1: Start the indexing service on c27-c21. Start Tomcat on dbwebdev2. Add RePEc
        (http://oai.repec.openlib.org) through web administration.
        Expected Result: No error.
        Actual Result: No error.
    Step 2: Run the harvester on dbwebdev2.
        Expected Result: Data evenly distributed on the cluster. Large pages of XML
            lead to a small number of OAI queries. With batch indexing, uploading a
            list of records is fast, so performance is OAI-request bound instead of
            metadata-distribution bound.
        Actual Result: On the first run, the harvester ran out of heap memory while
            doing an OAI request. I increased the JVM heap size with the command-line
            option "java -Xmx1024m -Xms1024m.." and tested the harvest again.
            It takes 4,356 seconds (1 hour) to get > 2,000,000 records from RePEc.
        Observation: Some sets of RePEc return large XML chunks of 1000, 2000, 4000,
            5000, or 8000 records.
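
When a heap option such as -Xmx is passed on the command line, it is worth confirming that the running JVM actually picked it up. A small sketch using the standard `Runtime` API (the class name `HeapCheck` is an assumption for illustration):

```java
/** Hypothetical diagnostic: reports the maximum heap the running JVM will attempt to use. */
final class HeapCheck {

    /** Maximum heap in MiB, as reported by the JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value can be somewhat lower than the -Xmx setting, depending on the garbage collector, so it should be read as a sanity check rather than an exact echo of the option.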

Next, try the database version of ARC on the same harvest from RePEc. Install MySQL on c21 and dbwebdev2.
Use the optimized version of the ARC harvester from our NASA project (11/2004).

There are 3 steps: (1) OAI harvest, (2) parse and (3) re-index.
Test Case ID: arc_05_06_2006
    Description: Harvest from RePEc (http://oai.repec.openlib.org), using the MySQL
        database on dbwebdev2, to test the performance of harvest/
    Step: Run the harvester on dbwebdev2 (database on the same machine).
        Actual Result:
            1. OAI harvest took 4087 sec
            2. Parse halted after 42988 sec
            3. Reindex takes ?? sec
        Observation: Database reached its storage limit.

Test Case ID: arc_05_07_2006
    Description: Harvest from RePEc (http://oai.repec.openlib.org), using the MySQL
        database on c21.seven.research.odu.edu, to test the performance of harvest/
    Step: Run the harvester on dbwebdev2 (database on the same machine).
        Actual Result:
            1. OAI harvest took 3629 sec
            2. Parse took 18705 sec
            3. Re-index took 4 sec
            Size = 377,242 records
        Observation: Database is good.

Test Case ID: arc_05_08_2006
    Description: Same as above.
    Step 1: Harvest twice with the database version of ARC.
        Actual Result: The total number of records stabilized at 377,242, so this
            number is correct. (Demo: http://128.82.7.73:8080/dbarc)
    Step 2: Harvest twice with the Lucene version.
        Actual Result: The first run gave about 640,000 records, which is much higher
            than it is supposed to be. After the second run, the total number doubled
            (1,299,000). (Demo: http://128.82.7.73:8080/oai_arc/)
        Observation: The harvester is not working correctly.

After diagnosing the code, I found that only one IndexReader object was created for deleting records during the
whole harvest process; the IndexReader object has to be recreated for every deletion. Another error was in the SAX
parser of the OAI request component, which mixed up the OAI identifier and the DC identifier.

I fixed both bugs and retested. The harvester then worked correctly.
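
The identifier mix-up described above is a common pitfall, because an OAI-PMH record contains an `identifier` element in its header and may contain a `dc:identifier` element in its metadata. The following is a sketch of one way to keep them apart, not the actual ARC fix: it uses the JDK's namespace-aware SAX parser and tells the two elements apart by namespace URI and by whether the parser is inside the record header. The class name `OaiIdentifierHandler` is an assumption for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects OAI header identifiers and DC identifiers into separate lists. */
final class OaiIdentifierHandler extends DefaultHandler {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    final List<String> oaiIdentifiers = new ArrayList<>();
    final List<String> dcIdentifiers = new ArrayList<>();

    private boolean inHeader;
    private StringBuilder text; // non-null only while inside an identifier element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (OAI_NS.equals(uri) && "header".equals(localName)) {
            inHeader = true;
        } else if ("identifier".equals(localName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (OAI_NS.equals(uri) && "header".equals(localName)) {
            inHeader = false;
        } else if ("identifier".equals(localName) && text != null) {
            String value = text.toString().trim();
            if (OAI_NS.equals(uri) && inHeader) {
                oaiIdentifiers.add(value);  // the record's OAI identifier
            } else if (DC_NS.equals(uri)) {
                dcIdentifiers.add(value);   // a Dublin Core identifier from the metadata
            }
            text = null;
        }
    }

    /** Parses an OAI-PMH response held in memory and returns the populated handler. */
    static OaiIdentifierHandler parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for uri/localName to be filled in
            OaiIdentifierHandler handler = new OaiIdentifierHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse OAI response", e);
        }
    }
}
```

Only the value collected in `oaiIdentifiers` should be used as the key when deleting or replacing a record in the index.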

I also implemented error handling for the web component, so that the web interface gives proper messages when
there is no RMI service or no index in the cluster store.
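
As an illustration of the RMI part of this error handling, the sketch below turns each failure mode of a registry lookup into a readable message. It is not the ARC web component: the class name `IndexServiceProbe`, the message texts, and any service URL passed in are assumptions. Only the standard `java.rmi.Naming.lookup` call and its declared exceptions are used.

```java
import java.net.MalformedURLException;
import java.rmi.Naming;
import java.rmi.NotBoundException;
import java.rmi.RemoteException;

/** Hypothetical probe: maps RMI lookup failures to messages a web page could display. */
final class IndexServiceProbe {

    static String statusMessage(String rmiUrl) {
        try {
            Naming.lookup(rmiUrl);
            return "Index service is available.";
        } catch (MalformedURLException e) {
            return "Configuration error: invalid RMI address " + rmiUrl;
        } catch (NotBoundException e) {
            return "The RMI registry is running, but no index service is bound at " + rmiUrl;
        } catch (RemoteException e) {
            return "Cannot reach the RMI service at " + rmiUrl + ": " + e.getMessage();
        }
    }
}
```

A call such as `IndexServiceProbe.statusMessage("rmi://localhost/IndexService")` (a made-up service name) would return the "Cannot reach" message when no registry is listening. The "no index in the cluster store" case is not covered by this sketch.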

A security constraint module was added for web administration of the harvester. (Default login: maly/maly or
yang/yang)


Test Case ID: arc_05_10_2006
    Description: Harvest from RePEc as above with the Lucene ARC harvester.
    Step: Same as before.
        Actual Result: The harvest run takes 3764 seconds.
            Total size = 404,350 records.
            The second run takes 3800 seconds with 41,000 records.
        Observation: There are many records with duplicate IDs from RePEc. In general,
            the harvester is working well.

Test Case ID: arc_05_11_2006
    Description: Harvest from Caltech_Lib
        (http://caltechlib.library.caltech.edu/perl/oai2).
    Step: Same as above.
        Actual Result: With the Lucene harvester, 41 records are retrieved in total.
            Through web browse, I found some duplicate records, such as the record
            with ID oai:caltechlib.library.caltech.edu:91.
            With the database harvester, 36 records are retrieved.
        Observation: It seems that, at some point, the Lucene harvester failed to
            delete the existing record that has the same ID as a new record.
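
The behavior expected from the harvester here is an "upsert": a new record replaces any existing record with the same ID, so repeated harvests never create duplicates. The sketch below shows that invariant with an in-memory map. It is a simplified stand-in, not the Lucene-based store used by ARC, and the class name `UpsertIndex` is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory index keyed by OAI identifier; illustrates replace-on-same-ID semantics. */
final class UpsertIndex {
    private final Map<String, String> recordsById = new LinkedHashMap<>();

    /**
     * Stores the record under its ID, replacing any existing record with that ID.
     * Returns true if an older record was replaced.
     */
    boolean upsert(String oaiId, String metadata) {
        Objects.requireNonNull(oaiId, "oaiId");
        Objects.requireNonNull(metadata, "metadata");
        return recordsById.put(oaiId, metadata) != null;
    }

    /** Current metadata for the given ID, or null if the ID is unknown. */
    String get(String oaiId) {
        return recordsById.get(oaiId);
    }

    /** Number of distinct records held. */
    int size() {
        return recordsById.size();
    }
}
```

A useful regression check follows directly from this: harvesting the same archive twice must leave the record count unchanged.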

				