					     UCLA - PoliMi


Schema Evolution in Wikipedia
 Toward a Web Information System Benchmark


             Carlo A. Curino
              Hyun J. Moon
              Letizia Tanca
              Carlo Zaniolo




             Pantha Rei Team        Idm-UCLA 04/18/2008
  Motivations
• Understand the role of Schema Evolution (SE) in Web Information
Systems (WIS). Our guess was:
        “on the web everything evolves faster -> SE should be relevant!”



• Compare the evolution in Traditional IS (known) and WIS
• Obtain an in-depth understanding of Wikipedia DB backend
• Lay the foundations of a “Benchmark for Schema Evolution”


• Why MediaWiki (the software platform behind Wikipedia)?
    • Popular (used by more than 30,000 websites, including Wikipedia)
    • Open-source and well-documented software
    • Wikipedia DATA and QUERIES are also available under an open-source license


What we did…

• We developed a tool-suite to analyze Web Information System DB
backends


• We collected and dissected the MediaWiki schema history (170+ schema
versions in 4.5 years)


• We released the tool-suite and data as a first step “towards a Benchmark
for Schema Evolution”




    MediaWiki Architecture




• Classical Web architecture based on Linux, Apache, MySQL, PHP (LAMP)
• Big scalability issues:
    • Wikipedia is one of the 10 most popular websites on the WWW (about
    29k requests/sec on average, with peaks up to 85k requests/sec)
    • Several layers of caching (both explicit and implicit, at DBMS and WS level)
    • According to the developers, DBMS performance is the major
    bottleneck: DB size > 700 GB, not counting the multimedia content!
         • plus poor load partitioning (one language per server)!

     The Schema



• Tables can be grouped in:
    • article and content
    • links and structure
    • users and permissions
    • performance and caching
    • statistics and special features
    • history and archival (a big portion of the schema, ~1/3:
    they don’t know it, but they need a temporal DB!)



        Basic Statistics 1

• Schema Evolution:
    • 170+ versions in 4.5 years
    • almost 250% increase




       Basic Statistics 2

• Schema changes are more frequent far away from releases
• Schema Elements Lifetime:
    • a group of stable relations
    • young tables and columns




Type of Changes




 • NOTE: the percentages do not add up to 100%, since several changes
 might coexist in one evolution step
 • total lack of integrity constraints (apart from primary
 keys)!
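A minimal sketch of why the percentages overlap, assuming each schema version is reduced to a mapping from table names to column names. The function names, the change-type labels, and the two toy schema versions are hypothetical illustrations, not the tool-suite's actual code or MediaWiki's actual schema:

```python
from typing import Dict, List, Set

# Hypothetical representation: table name -> set of column names.
Schema = Dict[str, Set[str]]


def diff_schemas(old: Schema, new: Schema) -> Dict[str, List[str]]:
    """List the elementary changes between two schema versions."""
    changes: Dict[str, List[str]] = {
        "create_table": [],
        "drop_table": [],
        "add_column": [],
        "drop_column": [],
    }
    # Tables that exist only in the new or only in the old version.
    changes["create_table"] = sorted(new.keys() - old.keys())
    changes["drop_table"] = sorted(old.keys() - new.keys())
    # Column-level changes in tables present in both versions.
    for table in sorted(old.keys() & new.keys()):
        for col in sorted(new[table] - old[table]):
            changes["add_column"].append(f"{table}.{col}")
        for col in sorted(old[table] - new[table]):
            changes["drop_column"].append(f"{table}.{col}")
    return changes


def change_types(changes: Dict[str, List[str]]) -> Set[str]:
    """Return the kinds of change that occur at least once in a step."""
    return {kind for kind, items in changes.items() if items}


# Toy versions (made-up tables and columns).
v1: Schema = {"article": {"id", "title"}, "old_text": {"id", "body"}}
v2: Schema = {"article": {"id", "title", "touched"}, "history": {"id", "body"}}

step = diff_schemas(v1, v2)
print(change_types(step))
```

Here a single evolution step contains three kinds of change at once, so counting "steps that contain change type X" for each X gives shares that sum to more than 100%.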

Type of Changes




• NOTE: simple schema modifications are the most common

Schema Changes per Version




 • NOTE: the step from version 41 to 42 is a MAJOR evolution step,
 in which article versioning management was heavily modified!



Impact on the Applications




• NOTE: based on roughly 4,000 queries, from which we extracted 75 templates
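One plausible way to go from query instances to templates is to replace literal values with placeholders and group the results. The sketch below makes that assumption; the regular expressions, function names, and sample queries are hypothetical and are not taken from the released tool-suite:

```python
import re
from collections import Counter
from typing import Iterable

# Single-quoted SQL string literals, allowing backslash escapes.
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")
# Standalone numeric literals (digits inside identifiers are left alone).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def to_template(query: str) -> str:
    """Replace literals with '?' and normalize whitespace."""
    q = _STRING.sub("?", query)   # strings first, so digits inside them vanish
    q = _NUMBER.sub("?", q)
    return _SPACE.sub(" ", q).strip()


def count_templates(queries: Iterable[str]) -> Counter:
    """Count how many query instances fall under each template."""
    return Counter(to_template(q) for q in queries)


# Made-up query instances for illustration.
queries = [
    "SELECT * FROM page WHERE page_id = 12",
    "SELECT * FROM page  WHERE page_id = 7",
    "SELECT * FROM user WHERE user_name = 'Alice'",
]
templates = count_templates(queries)
for template, n in templates.most_common():
    print(n, template)
```

Queries that differ only in their constants collapse into one template, which is what makes it feasible to check a small set of templates against each schema version instead of every query instance.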

Wikipedia Profiler Queries




• NOTE: the 500 most common templates out of 2k, extracted from over
780 million query instances from the On-Line Wikipedia Profiler
http://noc.wikimedia.org/cgi-bin/report.py
Traditional vs Web IS




• Comparing our results with existing analyses for Traditional IS:
    • WIS evolve faster: by 38% (w.r.t. Sjoberg) and by 539% (w.r.t. Marche)
    • Collaborative WISs embrace information sharing, thus we got far
    better data: 170 versions vs 2 and 9 respectively. And we can share
    them (benchmark).
    • More in-depth analysis by means of SMOs

Towards a Unified Benchmark




• We share the schema history we collected, the analysis data (raw
and stats), the queries, and the tool-suite:
    http://yellowstone.cs.ucla.edu/schema-evolution/index.php

• Goal: create a benchmark for schema evolution and, more generally,
a standard relational DB dataset

  Conclusion
So far we:
• Created a tool-suite for schema evolution analysis
• Dissected the Wikipedia Schema Evolution history
• Established the core of a DB Schema Evolution Benchmark (released)
• Developed tools to support Graceful Schema Evolution (PRISM)


We plan to:
• Extend the analysis to several other Open-Source WIS (Joomla!,
TikiWiki, Slashcode, Zen-Cart)
• Extend the analysis towards public scientific DBs (Genome, HGVS)
• Involve other research groups to define a commonly agreed
Benchmark
• Improve the tool-suite and integrate it in PRISM
