Test Case of Grid Cluster ARC

Author/Tester: Yang Zhao
Date: Apr-May 2006
Project Name: Grid Cluster ARC
Project Version: V_01
Level of Testing: Functional Test / Load Test
Areas of Testing: Harvest/Index
Installation/Environment:
  Index Cluster: 7 Linux nodes, c21.seven.research.odu.edu - c27.seven.research.odu.edu
  Harvester: Linux, dbwebdev2.seven.research.odu.edu
  Web Server: Tomcat 5, dbwebdev2.seven.research.odu.edu
  Database Servers: MySQL 5.0, c21.seven.research.odu.edu and dbwebdev2.seven.research.odu.edu

Each test case below lists its ID, description, test procedure steps, expected result, actual (observed) result, and any defect found.

Test Case ID: arc_04_03_2006
Description: Indexing with 2 cluster nodes; harvest 3 small archives to check the functional correctness of the harvester and the distribution of data on the indexing cluster.
Test Procedure:
  1. Start the indexing service on c27 and cash.cs.odu.edu.
  2. Start Tomcat.
  3. Add 3 archives through the web administration interface.
  4. Run the harvester.
Expected Result: No error; data evenly distributed on the cluster.
Actual Result: No error. The harvest completed without error; 7.267 MB of data was populated on cash.cs.odu.edu and 7.248 MB on c27.
Defect: None.

Test Case ID: arc_04_08_2006
Description: Indexing with 3 cluster nodes; harvest over 100K records to test the performance of harvest/index and search/browse, the distribution of data on the cluster, and the parallelism of the harvester.
Test Procedure:
  1. Start the indexing service on c27, c26, and c23.
  2. Start Tomcat on dbwebdev2.
  3. Add 4 archives through the web administration interface.
  4. Run the harvester on dbwebdev2 (the harvest was interrupted after 1 day).
  5. Start all services on c27, c26, and c23; go to the search interface in a browser and click "browse".
Expected Result: No error; data evenly distributed on the cluster; browse results displayed instantly.
Actual Result: No error. The harvest took 107,550 seconds (30 hours). Data distribution: 44.5 MB on c23, 43.5 MB on c26, 45.0 MB on c27. Browsing displayed results instantly; 119,147 records in total.
Defect: Sometimes the indexing process's CPU usage is high (> 90%), and the harvester slows down.

Test Case ID: arc_04_10_2006
Description: Indexing with 5 cluster nodes; harvest over 100K records to test the performance of harvest/index and search/browse, the distribution of data on the cluster, and the parallelism of the harvester.
Test Procedure: Same as above.
Expected Result: Same as above.
Actual Result: Same as above.
Defect: Serious performance problem.
Follow-up action: Recode the cluster service module to use batch indexing and to optimize the index only once per harvest run; then repeat the performance test and stress test on harvest/index.

Test Case ID: arc_04_25_2006
Description: Indexing with 7 cluster nodes; harvest from the ARC production server to test the performance of harvest/index (performance test).
Test Procedure:
  1. Start the indexing service on c27-c21.
  2. Start Tomcat on dbwebdev2.
  3. Add ARC (http://arc.cs.odu.edu:8080/oai/oai20) through the web administration interface.
  4. Run the harvest on dbwebdev2.
Expected Result: No error; data evenly distributed on the cluster.
Actual Result: No error. It took 131,879 seconds (36 hours) to get 3,014,112 records from ARC, with no performance degradation.
Defect: None.

Test Case ID: arc_04_27_2006
Description: Indexing with 7 cluster nodes; harvest from RePEc (http://oai.repec.openlib.org) to test the performance of harvest/index against a data provider that returns large chunks of XML per OAI response (stress test). Large page sizes of 1000, 2000, 4000, 5000, and 8000 records per OAI query were used.
Test Procedure:
  1. Start the indexing service on c27-c21.
  2. Start Tomcat on dbwebdev2.
  3. Add RePEc (http://oai.repec.openlib.org) through the web administration interface.
  4. Run the harvester on dbwebdev2.
Expected Result: No error; data evenly distributed on the cluster.
Actual Result: On the first run, the harvester ran out of heap memory while issuing OAI requests. I increased the JVM heap size with the command-line options "java -Xmx1024m -Xms1024m .." and ran the harvest again. It then took 4,356 seconds (1 hour) to get more than 2,000,000 records from RePEc. With batch indexing, uploading a list of records is fast, so performance is OAI-request bound rather than metadata-distribution bound.
Defect: Some sets of RePEc return very large XML trunks in response; the large size of the XML records leads to a small number of records per response.
Follow-up action: Try the database version of ARC on the same RePEc harvest. Install the MySQL database on c21 and dbwebdev2, and use the optimized version of the ARC harvester from our NASA project (11/2004). It has 3 steps: (1) OAI harvest, (2) parse, and (3) re-index.

Test Case ID: arc_05_06_2006
Description: Harvest from RePEc (http://oai.repec.openlib.org), using the MySQL database on dbwebdev2, to test the performance of the harvest.
Test Procedure: Run the harvester on dbwebdev2 (database on the same machine).
Actual Result:
  1. OAI harvest took 4,087 sec.
  2. Parse halted after 42,988 sec.
  3. Re-index took ?? sec.
Defect: The database reached its storage limit.

Test Case ID: arc_05_07_2006
Description: Harvest from RePEc (http://oai.repec.openlib.org), using the MySQL database on c21.seven.research.odu.edu, to test the performance of the harvest.
Test Procedure: Run the harvester on dbwebdev2.
Actual Result:
  1. OAI harvest took 3,629 sec.
  2. Parse took 18,705 sec.
  3. Re-index took 4 sec.
  Size = 377,242 records.
Defect: None; the database performance is good.

Test Case ID: arc_05_08_2006
Description: Same as above.
Test Procedure: I tried harvesting with the database ARC 2 times, and tested the Lucene ARC 2 times.
Actual Result: With the database ARC, the total number of records stabilized at 377,242 across both runs, so this number is correct (demo: http://220.127.116.11:8080/dbarc). With the Lucene ARC, the first run produced about 640,000 records, much higher than it is supposed to be; on the second run the total doubled (1,299,000) (demo: http://18.104.22.168:8080/oai_arc/). After diagnosing the code, I found that only one IndexReader object was created for deleting records during the whole harvest process; the IndexReader object has to be recreated for every deletion. Another error was in the OAI request component's SAX parser, which mixed up the OAI identifier and DC's identifier. I fixed the bugs and retested; the harvester then worked correctly. I also implemented error handling for the web component, so that the web interface gives proper messages when there is no RMI service or no index at the cluster store, and added a security constraint module for web administration of the harvester (default logins: maly/maly or yang/yang).
Defect: The Lucene harvester was not working correctly (record counts inflated by failed deletions and mixed-up identifiers).

Test Case ID: arc_05_10_2006
Description: Harvest from RePEc, as above, with the Lucene ARC harvester.
Test Procedure: Same as before.
Actual Result: The run of the harvest took 3,764 seconds; total size = 404,350 records. The second run took 3,800 seconds with 41,000 records. In general, the harvester is working well.
Defect: There are many records with duplicate IDs from RePEc.

Test Case ID: arc_05_11_2006
Description: Harvest from Caltech_Lib (http://caltechlib.library.caltech.edu/perl/oai2).
Test Procedure: Same as above.
Actual Result: With the Lucene harvester, 41 records were retrieved in total. Through the web browse interface, I found some duplicate records, such as the record with ID oai:caltechlib.library.caltech.edu:91. With the database harvester, 36 records were retrieved.
Defect: It seems that, at some point, the Lucene harvester failed to delete existing records whose ID is identical to that of a new record.
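Several of the cases above check that harvested data is "evenly distributed on cluster". The document does not describe the actual routing scheme, so the following Python sketch of hash partitioning across the seven index nodes is purely an assumption for illustration (the node names match the test environment; `node_for` is a hypothetical helper, not part of the ARC code).

```python
# Hypothetical sketch: spread records evenly over the index-cluster nodes by
# hashing the OAI identifier. This is NOT the documented ARC routing scheme,
# only one plausible way to achieve the even distribution the tests verify.
import hashlib

NODES = ["c21", "c22", "c23", "c24", "c25", "c26", "c27"]

def node_for(identifier, nodes=NODES):
    """Deterministically map an OAI identifier to one cluster node."""
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Quick check of determinism and rough uniformity over 7,000 synthetic ids.
counts = {name: 0 for name in NODES}
for i in range(7000):
    counts[node_for("oai:RePEc:%d" % i)] += 1
```

Because the mapping depends only on the identifier, a re-harvested record always lands on the same node, which is what makes delete-before-add possible on a per-node basis.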
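The follow-up to arc_04_10_2006 recodes the cluster service for batch indexing with a single index optimization per harvest run. A minimal Python sketch of that pattern is below; the class and method names are illustrative, not the actual cluster-service API.

```python
# Hedged sketch of batch indexing: buffer records, flush them in batches,
# and optimize exactly once at the end of the harvest run (instead of
# optimizing after every record, which caused the serious performance
# problem seen in arc_04_10_2006).
class BatchIndexer:
    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self._buffer = []
        self.flushes = 0      # uploads sent to the cluster
        self.optimizes = 0    # index optimizations performed
        self.indexed = 0      # total records indexed

    def add(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        # One upload per batch rather than one per record.
        self.indexed += len(self._buffer)
        self._buffer.clear()
        self.flushes += 1

    def close(self):
        if self._buffer:
            self._flush()
        # Optimize the index only once per harvest run.
        self.optimizes += 1

indexer = BatchIndexer(batch_size=1000)
for i in range(2500):
    indexer.add({"id": i})
indexer.close()
```

This matches the observation in arc_04_27_2006 that, with batch indexing, uploading a list of records is fast, leaving the OAI request as the bottleneck.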
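One of the two bugs found in arc_05_08_2006 was a SAX parser that mixed up the OAI identifier (from the record `<header>`) with the Dublin Core `<dc:identifier>` in the payload. The Python sketch below (not the project's Java SAX code) shows the correct behavior on a simplified response; real OAI-PMH responses additionally wrap the Dublin Core in `oai_dc` and carry datestamps and resumption tokens.

```python
# Illustrative sketch of the identifier fix: key each record by the OAI
# identifier in <header>, never by <dc:identifier> in the metadata payload.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:RePEc:wop:repec:1</identifier></header>
      <metadata>
        <dc xmlns="http://purl.org/dc/elements/1.1/">
          <identifier>http://example.org/paper1.pdf</identifier>
          <title>Paper One</title>
        </dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return {oai_identifier: title}, keyed by the header identifier."""
    root = ET.fromstring(xml_text)
    out = {}
    for rec in root.iter(OAI + "record"):
        # Correct: the record key is the OAI identifier from the header.
        oai_id = rec.find(OAI + "header/" + OAI + "identifier").text
        # <dc:identifier> (here, a URL) is payload metadata, not the key.
        title_el = rec.find(".//" + DC + "title")
        out[oai_id] = title_el.text if title_el is not None else None
    return out
```

Mixing the two up makes deletion by identifier miss existing records, which is consistent with the inflated record counts the test observed.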
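The other bug in arc_05_08_2006, and the duplicate-ID symptom in arc_05_11_2006, came from deletions that consulted a single stale IndexReader for the whole harvest. The in-memory Python sketch below models the fixed delete-before-add behavior (the `MiniIndex` class is a stand-in for the Lucene index, not real Lucene API):

```python
# Minimal model of the fix: before adding a record, delete any existing
# record with the same ID using the *current* index state - the equivalent
# of recreating the IndexReader for every deletion. With one stale reader
# snapshot, re-harvested records were never deleted and counts doubled.
class MiniIndex:
    def __init__(self):
        self._docs = {}  # id -> record; stands in for the Lucene index

    def upsert(self, rec_id, record):
        # Fixed behavior: consult fresh state on every deletion.
        self._docs.pop(rec_id, None)  # delete the existing record, if any
        self._docs[rec_id] = record

    def count(self):
        return len(self._docs)

def harvest(index, records):
    for rec_id, rec in records:
        index.upsert(rec_id, rec)

batch = [("oai:x:1", "a"), ("oai:x:2", "b"), ("oai:x:3", "c")]
idx = MiniIndex()
harvest(idx, batch)  # first harvest
harvest(idx, batch)  # re-harvest: the count must not double
```

With this behavior, repeating a harvest leaves the record count unchanged, matching the stabilized totals the database ARC showed in arc_05_08_2006.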