What is new in the cloud?
Donald Kossmann
ETH Zurich
http://systems.ethz.ch
Acknowledgments
Questions?
Agenda
• Why?
• How?
• What?
Simple Truths
• „Power of data“
– the more data the merrier (GB -> TB -> PB)
– data comes from everywhere in all shapes
– value of data often discovered later
– data has no owner within an organization (no silos!)
• Services turn data into $
– the more services the merrier (10s -> 1000s -> Ms)
– need to adapt quickly
• Examples: Google, FB, Amadeus, Walmart, BMW, ...
• Platforms: Oracle, MS, SAP, Google, ..., 28msec
Promises of cloud computing?
• Cost
– „pay as you go“ for HW and SW
• no upfront cost / investment: CapEx vs. OpEx
• scale down if service becomes less popular
– utilization: statistical allocation of resources
– out-source and commoditize computing
• HW automatically gets cheaper and faster
• economy of scale for admin: patches, backups, etc.
– failures: cost of preventing and having failures
• Time to market
– avoid unnecessary steps
• HW provisioning, purchasing, test
What to optimize?
Feature                    Traditional   Cloud
Cost [$]                   fixed         optimize
Performance [tps, secs]    optimize      fixed
Scale-out [#cores]         optimize      fixed
Predictability [s($)]      -             fixed
Consistency [%]            fixed         ???
Flexibility [#variants]    -             optimize
Put $ on the y-axis of your graphs!!!
[Florescu & Kossmann, SIGMOD Record 2009]
Misconceptions
• Variable Cost -> Unpredictable Cost
– pay-as-you-go and predictability can be combined
– IT department needs to rethink „budget models“
• Performance is more fundamental than $
– at that scale, prices must be honest
– how relevant are your perf. numbers of 1992 today?
– technology follows business; business follows technology
• Time is money („secs“ ~ „$“ in my graphs)
– often true; often enough not true:
• Put computing where the energy is (ocean, desert, ...)
• Writing inner track of disk consumes 2x energy
[Source: SIGMOD, VLDB, ICDE Reviews]
Problem: Vendor Lock-In
• Hardware
– no standard APIs for IaaS
– expensive to move TBs of data between clouds
– this was actually a solved problem before the cloud
• Platform
– PaaS makes it neither better nor worse
– (situation is very bad as is)
• Apps and Devices
– iTunes, Google Docs, Amazon Kindle, iPhone Apps, ...
– they own your data; you don't own their (paid for) data
Agenda
• Why?
• How?
• What?
Teach your DBMS to swim
+
Industry: Add a layer to your favorite DBMS
Research Perspective ...
It is time to start from scratch!
Scope of this talk
• Workloads: Focus on OLTP
– OLAP under heavy debate by others
– streaming not addressed yet (~ OLTP)
– testing, archiving, etc. is boring
• Types of clouds: Any type
– private, public, and hybrid
• only difference: private clouds have planned downtime
– cloud on the chip
– swarms: ad-hoc private clouds
• IaaS vs. PaaS vs. SaaS: Focus on PaaS
Game Changers
• OLTP: „Key-value Store“ vs. „DBMS“ [No-SQL]
– virtually infinite scale-out
– fault-tolerance
– (OLAP: „Hadoop“ vs. „DBMS“)
• Virtualization
– transparent use of resources (computers + humans)
• hide heterogeneity of resources
• 100Ks of machines are a reality
– problems that need 100Ks of machines are a reality
Reference Architecture
Client
    ↓ HTTP          ↑ XML, JSON, HTML
Web Server
    ↓ FCGI, ...     ↑ XML, JSON, HTML
App Server
    ↓ SQL           ↑ records
DB Server
    ↓ get/put       ↑ block
Store
Open Questions
[Figure: reference architecture stack (Client, Web Server, App Server, DB Server, Store)]
• How to map stack to IaaS?
• How to implement store layer?
• What consistency model?
• What programming model?
• Whether and how to cache?
Variant I: Partition Workload by „Request“
[Figure: reference architecture stack; four clients reach a workload splitter over HTTP (XML, JSON, HTML back), which sends each request to one of two complete stacks: Server-A or Server-B at the web, app, and DB server layers, backed by Store-A or Store-B respectively]
Partition Workload by „Request“
• Principle
– partition data by „tenant“
– route request to DB of that tenant
• Advantages
– reuse existing database stack (RDBMS)
• Disadvantages
– multi-tenant problem [Salesforce], [Jacobs]
• optimization, migration, load balancing, fix cost
– need DB federator for inter-tenant requests
– expensive HW and SW for high availability
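The routing principle of Variant I can be sketched in a few lines. This is only an illustration: `TenantDatabase` and `WorkloadSplitter` are hypothetical names, the "database" is an in-memory dict standing in for a full RDBMS stack, and the tenant names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDatabase:
    """Stand-in for one complete RDBMS stack (hypothetical, in-memory)."""
    name: str
    rows: dict = field(default_factory=dict)

    def execute(self, key, value=None):
        # A write if a value is given, otherwise a read.
        if value is not None:
            self.rows[key] = value
        return self.rows.get(key)


class WorkloadSplitter:
    """Routes each request to the database that owns the tenant's data."""

    def __init__(self, placement):
        # placement: tenant id -> TenantDatabase (several tenants may share one)
        self.placement = placement

    def route(self, tenant_id, key, value=None):
        db = self.placement[tenant_id]  # raises KeyError for unknown tenants
        return db.execute((tenant_id, key), value)


server_a = TenantDatabase("Server-A")
server_b = TenantDatabase("Server-B")
splitter = WorkloadSplitter(
    {"acme": server_a, "globex": server_b, "initech": server_a}
)

splitter.route("acme", "order:1", "3 widgets")
splitter.route("globex", "order:1", "7 gadgets")
```

A request that touches both "acme" and "globex" cannot be served by either database alone, which is where the DB federator from the list of disadvantages would come in.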
Variant II: Partition Workload by „Load“
[Figure: reference architecture stack; four clients reach a workload splitter over HTTP (XML, JSON, HTML back), which spreads requests over Server-A and Server-B; all servers share the same stores (e.g., S3) through get/put of blocks; the layer between app server and store is marked „???“]
Partition Workload by „Load“
• Principle
– fine-grained data partitioning by page or object
– any server can handle any request
– implement DBMS as a library (not server)
• Advantages
– avoids disadvantages of Variant I
• Disadvantages
– new synchronization problem (CAP theorem)
– whole new breed of systems
– caching not effective (see later)
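A minimal sketch of the Variant II idea, under stated assumptions: `KeyValueStore` is an in-memory stand-in for a shared put/get store such as S3, `RecordLibrary` is a hypothetical name for the DBMS-as-a-library layer, and hashing record ids onto a fixed number of pages is just one possible partitioning scheme.

```python
import hashlib


class KeyValueStore:
    """Stand-in for a shared put/get store such as S3 (in-memory here)."""

    def __init__(self):
        self._blocks = {}

    def get(self, key):
        return self._blocks.get(key)

    def put(self, key, block):
        self._blocks[key] = block


class RecordLibrary:
    """DBMS logic linked into every server; it keeps no data of its own."""

    def __init__(self, store, num_pages=8):
        self.store = store
        self.num_pages = num_pages

    def _page_key(self, record_id):
        # Fine-grained partitioning: each record lives on exactly one page.
        digest = hashlib.sha256(record_id.encode()).hexdigest()
        return f"page-{int(digest, 16) % self.num_pages}"

    def read(self, record_id):
        page = self.store.get(self._page_key(record_id)) or {}
        return page.get(record_id)

    def write(self, record_id, value):
        key = self._page_key(record_id)
        page = dict(self.store.get(key) or {})
        page[record_id] = value
        # Read-modify-write without locking: two servers updating the same
        # page at the same time can overwrite each other's changes.
        self.store.put(key, page)


store = KeyValueStore()
server_a = RecordLibrary(store)
server_b = RecordLibrary(store)

server_a.write("customer:42", {"name": "Ada"})
result = server_b.read("customer:42")
```

Because the servers hold no state, either one can answer any request; the unprotected read-modify-write in `write` is a small instance of the synchronization problem listed under the disadvantages.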
Experiments [Loesing et al. 2010]
• TPC-W Benchmark
– throughput: WIPS
– latency: fixed depending on request type
– cost: cost / WIPS, total cost, predictability
• Players
– Amazon RDS, SimpleDB
– 28msec [Brantner et al. 2008]
– Google AppEngine
– Microsoft Azure
Scale-up Experiments
Cost / WIPS (m$)
                     Low Load   Peak Load
Amazon RDS (V1)      1.212      0.005
Amazon S3 (V2)       -          0.007
Google AE/C (V2)     0.002      0.028
MS Azure (V1)        0.775      0.005
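One way to read the table is to compare, per system, the cost per WIPS at low load with the cost at peak load. The sketch below copies the numbers from the table; the ratio itself is an illustration added here, not a metric reported in the experiments, and the missing low-load value for Amazon S3 is represented as `None`.

```python
# Cost / WIPS in the table's unit (m$): (low load, peak load).
cost_per_wips = {
    "Amazon RDS (V1)": (1.212, 0.005),
    "Amazon S3 (V2)": (None, 0.007),
    "Google AE/C (V2)": (0.002, 0.028),
    "MS Azure (V1)": (0.775, 0.005),
}


def low_to_peak_ratio(low, peak):
    """How many times more one WIPS costs at low load than at peak load."""
    if low is None or peak is None:
        return None
    return low / peak


ratios = {
    name: low_to_peak_ratio(low, peak)
    for name, (low, peak) in cost_per_wips.items()
}

for name, ratio in ratios.items():
    print(name, "n/a" if ratio is None else f"{ratio:.2f}x")
```

A ratio far above 1 means an idle system is expensive per unit of work, which is consistent with the fixed cost listed as a disadvantage of Variant I; a ratio below 1 means the system gets more expensive per WIPS as load grows.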
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
Store Variants
• Traditional (e.g., Amazon EBS)
– local disks with physically exclusive access
– put/get interface; no synchronization
– only works for V1
• Key-value stores (e.g., Amazon S3)
– DHTs with concurrent access
– put/get interface; no synchronization
– works for V1 and V2; makes more sense for V2
• ClockScan [Unterbrunner et al. 2009]
– massively shared scans in a distributed system
– push down predicates + simple aggr; write monotonicity
– works well for both variants
ClockScan
• Key ideas
– each core continuously scans one partition in MM
– while scanning, it executes queries/updates on the fly
– queries and updates are indexed; tuples probed
• just as in the stream processing world
• but queries are short-lived
– updates are processed before reads
• Properties
– very high query and update throughput (1000s / sec)
– predictable and guaranteed response times
• good enough, but not optimal
– write monotonicity at store level (more than disk)
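The key ideas can be sketched as a single scan cycle over one in-memory partition. This is a heavily simplified illustration, not the ClockScan implementation: it is single-threaded, supports only equality predicates, and the function name, tuple layouts, and flight data are assumptions made for the example.

```python
from collections import defaultdict


def clock_scan_cycle(partition, updates, queries):
    """One full scan over an in-memory partition, serving a whole batch.

    partition: list of dict records (mutated in place)
    updates:   list of (attr, value, new_fields): equality predicate + changes
    queries:   list of (query_id, attr, value): equality predicate
    """
    # Index the pending operations, not the data: (attr, value) -> operations.
    update_index = defaultdict(list)
    for attr, value, new_fields in updates:
        update_index[(attr, value)].append(new_fields)
    query_index = defaultdict(list)
    for query_id, attr, value in queries:
        query_index[(attr, value)].append(query_id)

    update_attrs = {attr for attr, _ in update_index}
    query_attrs = {attr for attr, _ in query_index}
    results = {query_id: [] for query_id, _, _ in queries}

    for record in partition:
        # Updates first, so every query in the batch sees their effect.
        matched = []
        for attr in update_attrs:
            matched.extend(update_index.get((attr, record.get(attr)), []))
        for new_fields in matched:
            record.update(new_fields)
        # Then the record probes the query index.
        for attr in query_attrs:
            for query_id in query_index.get((attr, record.get(attr)), []):
                results[query_id].append(dict(record))
    return results


flights = [
    {"flight": "LX8", "dest": "ORD", "seats": 5},
    {"flight": "LX14", "dest": "JFK", "seats": 0},
]
results = clock_scan_cycle(
    flights,
    updates=[("flight", "LX14", {"seats": 2})],
    queries=[("q1", "dest", "JFK"), ("q2", "seats", 0)],
)
```

Each record is touched once per cycle regardless of how many queries are pending, so the response time is bounded by the time of one scan. That is the sense in which the approach is predictable but not optimal for any single query.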
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
CAP Theorem
• Three properties of distributed systems
– Consistency (ACID transactions w. serializability)
– Availability (nobody is ever blocked)
– resilience to network Partitioning
• Result
– it is trivial to achieve 2 out of 3
– it is impossible to have all three
• Two schools
– Databases: sacrifice availability
– Distributed systems: sacrifice consistency
Why sacrifice Consistency?
• It is a simple solution
– nobody understands what sacrificing „P“ means
– sacrificing „A“ is unacceptable in the Web
– possible to push the problem to app developer
• „C“ not needed in many applications
– Banks do not implement ACID (classic example wrong)
– Airline reservation only transacts reads (Huh?)
– MySQL et al. ship by default in lower isolation level
• Data is noisy and inconsistent anyway
– making it, say, 1% worse does not matter
[Vogels, VLDB 2007]
What have people done?
• Client-side Consistency Models [Tanenbaum],[PNUTS08]
• New DB transaction models
– Escrow, Reservation Pattern [O'Neil 86], [Gawlick 09]
– SAGAs and compensation; e.g., in BPEL [G.-Molina,Salem]
– SAP, Amadeus et al. [Buck-Emden], [Kemper et al. 98]
• Limit the size of transacted data
– E.g., Microsoft Azure
• Levels of Consistency, Consistency-Cost Tradeoffs
– read/write monotonicity + „A“ + „P“ [Brantner08]
– economic models for consistency [Amadeus], [Kraska09]
• Educate Application Developers [Helland 2009]
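As an illustration of the escrow / reservation idea from the list above, here is a small sketch written loosely in the spirit of those techniques rather than following any of the cited papers. The `EscrowCounter` class and its method names are hypothetical, and everything runs in a single process.

```python
class EscrowCounter:
    """A hot aggregate (e.g., seats left) updated through reservations."""

    def __init__(self, quantity):
        self.available = quantity  # not yet promised to any transaction
        self.reserved = {}         # transaction id -> quantity held in escrow

    def reserve(self, txn_id, amount):
        # Succeeds only if the amount can be guaranteed even when every
        # other open reservation eventually commits.
        if amount > self.available:
            return False
        self.available -= amount
        self.reserved[txn_id] = self.reserved.get(txn_id, 0) + amount
        return True

    def commit(self, txn_id):
        # The escrowed quantity is consumed for good.
        return self.reserved.pop(txn_id, 0)

    def abort(self, txn_id):
        # The escrowed quantity goes back to the pool.
        self.available += self.reserved.pop(txn_id, 0)


seats = EscrowCounter(3)
ok_a = seats.reserve("txn-a", 2)  # holds 2 of the 3 seats
ok_b = seats.reserve("txn-b", 2)  # only 1 seat is unpromised, so this fails
seats.abort("txn-a")              # txn-a gives its seats back
ok_c = seats.reserve("txn-b", 2)  # now succeeds
sold = seats.commit("txn-b")
```

The point of the pattern is that transactions only hold a claim on a quantity, not a lock on the record, so many of them can proceed concurrently on the same hot item.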
Does it matter?
• How far do traditional (monolithic) DBMSes go?
– unlimited scalability for all practical matters
– high availability for all practical matters
– monolithic DBMSes still hold records in all regards
• That is why we focus on the $ tradeoffs
– it is not a principle / religious matter
– it is a $ optimization problem
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
Programming Model
• Properties of a programming language for the cloud
– support DB-style + OO-style + CEP-style
– avoid keeping state at servers for V2 architecture
• Many languages will work in the cloud
– SQL, XQuery, Ruby, ...; we have shown it for XQuery
– J2EE will not work
• Open (research) questions
– do OLAP on the OLTP data: My guess is yes!
– rewrite your apps: My guess is yes!
Caching
• Many Variants Possible
– this is just one
– V1 caching mandatory
– V2 caching prohibitive
• TPC-W Experiments
– marginal improvements for Google AppEngine
• No low hanging fruit
Agenda
• Why?
• How?
• What?
What is Sausalito?
• Application Server + Web Server + Database
– keeps any kind of data
– runs services
• Fully cloud-enabled
– full elasticity (cost and throughput)
– full fault-tolerance
– runs on cheap hardware (private and public clouds)
• Fully Web Standard compliant
– Web Services, REST
– XML, JSON, CSV, ...
– XML Schema, XQuery, XPath
Sausalito in the Cloud (V2)
Sausalito in the Cloud (offline)
Bets Made
• How to map traditional DB stack to IaaS?
– implemented both architectures (V1 + V2)
– V1 only in a single server variant for low end
• How to implement the storage layer?
– EBS for V1; KVS for V2
• What is the right consistency model?
– ACID for V1; configurable for V2
• What is the right data + programming model?
– XML & XQuery
• Whether and how to make use of caching?
– No! (Only for code / precompiled query plans)
Demo
• Getting started guide
– http://sausalito.28msec.com
• Example applications
– http://www.28msec.com/community
Cloud: Fans and Skeptics
• Fans
– VCs: low CapEx, Gartner hype
– USA Government: lack of alternative
– Departments: time-to-market, by-pass IT dept.
– USA Researchers: next big thing
– IT start-ups: levels the field
• Skeptics
– EU Government: next big USA thing
– EU Researchers: burnt by Grid Computing
– IT department: lock-in, become irrelevant
– Big enterprise IT vendors: low margins, forced to adapt
XML & XQuery: Fans and Skeptics
• Fans
– Large enterprises: reduces cost, helps abandon silos
– EU Research: scientific challenge in PL, type theory, ...
– Government: lack of alternatives, standards, complete
• Skeptics
– VCs: do not understand the market
– Web 2.0: hard and boring, expensive
– USA Database Research: religion
Need intersection of fans for the bets made
Conclusion
• Researchers study tradeoffs
– Key-value stores are game changers
– Measuring $ is a game changer
– MMDBs (ClockScan) could be a game changer
• Entrepreneurs make bets
– Pay per use is a game changer
– XML & XQuery could be game changers
• Personal experience: You cannot do both!
– You cannot play and observe at the same time
[Heisenberg]