Startup Scalability Strategies
Frank Mashraqi
Hello!
Agenda
• Keeping score: What to measure?
• Discerning the difference: What to focus on?
• Finding your path: Which way to go?
• Choosing your architecture: How to partition?
• Walking the line: How to balance?
• Building your team: Who to hire?
• Thinking ahead: What about the future?
• Offloading scalability: Is it for me?
Keeping Score: What to Measure?
distribution of data
disk response time
IO wait
threads created
CPU Saturation
writes per second
transactions per second
failure rates
threads running
threshold exceptions
queries per shard
disk utilization
client response time
cache utilization
disk saturation
thread thrashing
memory utilization
resource utilization
memory/IO contention
cache hit ratio
connections per shard
cache prunes
exceeding high or low water marks
locking statistics
swap utilization
reads per second
throughput
growth rate
connections usage
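Several of the metrics above only become actionable once they are compared against limits, as in "exceeding high or low water marks". The following is a minimal sketch of that comparison, not a monitoring tool. The metric names, the `WaterMark` type, and the threshold values are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class WaterMark:
    """Acceptable band for one metric (illustrative values only)."""
    low: float
    high: float


def check_water_marks(samples, marks):
    """Return (metric, value, side) for every sample outside its band.

    Metrics that have no configured water mark are ignored.
    """
    breaches = []
    for metric, value in samples.items():
        mark = marks.get(metric)
        if mark is None:
            continue
        if value < mark.low:
            breaches.append((metric, value, "low"))
        elif value > mark.high:
            breaches.append((metric, value, "high"))
    return breaches


# Hypothetical thresholds: tune these per system.
marks = {
    "cpu_utilization": WaterMark(low=0.05, high=0.80),
    "cache_hit_ratio": WaterMark(low=0.90, high=1.00),
}
samples = {"cpu_utilization": 0.93, "cache_hit_ratio": 0.85, "threads_running": 12}

for metric, value, side in check_water_marks(samples, marks):
    print(f"{metric}={value} crossed its {side} water mark")
```

A low water mark matters as much as a high one: a cache hit ratio or throughput figure that drops below its usual band is often the first visible sign of trouble.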
Performance != High Availability != Scalability
Performance
Ability to process or execute a task compared to time and resources used
High Availability
Ability of a system to ensure a certain degree of operational continuity
Scalability
Ability to handle growing amounts of traffic in a graceful manner or ability to be readily enlarged
Pick any two!!
• Consistency
Choose if you don’t care about 24/7 availability or accommodating high traffic.
• Availability
Choose for a site that must be available 24/7
• Partition-Tolerance
Choose for a high traffic website
Vertical or Horizontal?
Vertical
• Aka scaling up
• Adding resources to a node
– Getting a bigger server
– Using faster CPUs
• Servers that are twice as fast can cost more than twice as much
Horizontal
• Aka scaling out
• Adding more nodes
• Cost efficient
– Commodity hardware
– Increased management complexity
• “More complex” programming model
– Right foundation
• Throughput and latency between nodes
How to Partition?
• Functional partitioning
• Key based partitioning
– (users ending in 01 go to server 1)
• Range based partitioning
– (records ranging from 2M to 4M go to server 8)
• Directory server based partitioning
– (no pre-defined partitioning scheme, instead a lookup is required)
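The key based, range based, and directory based schemes above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a production router: the function and class names, the server labels, and the choice of modulo for key based routing are all hypothetical.

```python
import bisect


def key_based_shard(user_id: int, num_shards: int) -> int:
    """Key based: derive the shard from the key itself.

    With num_shards=100, a user id ending in 01 maps to shard 1.
    """
    return user_id % num_shards


class RangePartitioner:
    """Range based: each server owns a contiguous range of record ids."""

    def __init__(self, upper_bounds, servers):
        # upper_bounds[i] is the exclusive upper bound of servers[i],
        # and the list must be sorted in ascending order.
        if len(upper_bounds) != len(servers):
            raise ValueError("need exactly one upper bound per server")
        self.upper_bounds = list(upper_bounds)
        self.servers = list(servers)

    def lookup(self, record_id: int) -> str:
        index = bisect.bisect_right(self.upper_bounds, record_id)
        if index == len(self.servers):
            raise KeyError(f"record {record_id} is outside every range")
        return self.servers[index]


class DirectoryPartitioner:
    """Directory based: no fixed scheme, every key is looked up."""

    def __init__(self):
        self._directory = {}

    def assign(self, key, server):
        self._directory[key] = server

    def lookup(self, key):
        return self._directory[key]


# Records below 2M go to server7, records from 2M up to 4M go to server8.
ranges = RangePartitioner([2_000_000, 4_000_000], ["server7", "server8"])
directory = DirectoryPartitioner()
directory.assign("alice", "server3")

print(key_based_shard(4201, 100))   # user id ending in 01
print(ranges.lookup(3_000_000))
print(directory.lookup("alice"))
```

The trade-off shows up in the code: key based and range based routing need no extra lookup but bake the scheme into every client, while the directory adds a lookup on every request in exchange for being able to move a key by updating one entry.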
How to balance?
• Balance is easier if the foundation is right
• Use agile methodologies
• Technical debt is expensive
• Technical mortgage is a KILLER!
Before and After
• What to do before you get big?
– Lay the right foundation
– Ability to shard / partition
– Decouple components
– Effectively cache
– Have a plan in place
• What can wait until after things start to grow?
– Now focus on micro optimizations
– Acquiring and upgrading hardware
– Performance optimizations and OS tuning
– Implementing high availability and disaster recovery
– Use a CDN
What skills / hires are most crucial to dealing with scalability?
Best Practices
• Go asynchronous
• Go stateless
• “Best IO is no IO”
– Cache effectively using shared cache and monitor utilization
• Decouple as much as possible
• Build using APIs
– Easy to scale development and deployment and open up your service
• Virtualize/Abstract everything
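To make "cache effectively using shared cache and monitor utilization" concrete, here is a minimal cache-aside sketch that counts hits and misses. It rests on assumptions: a plain dictionary stands in for a real shared cache, and the `MonitoredCache` class and `slow_lookup` function are hypothetical names used only for illustration.

```python
class MonitoredCache:
    """Cache-aside wrapper that records hits and misses."""

    def __init__(self, loader):
        self._store = {}       # stand-in for a shared cache (assumption)
        self._loader = loader  # called only on a miss, e.g. a database read
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: do the expensive IO once, then keep the result.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


io_calls = []


def slow_lookup(key):
    """Hypothetical expensive read; records each call so IO can be counted."""
    io_calls.append(key)
    return key.upper()


cache = MonitoredCache(slow_lookup)
for key in ["a", "a", "b"]:
    cache.get(key)

print(f"IO calls: {len(io_calls)}, hit ratio: {cache.hit_ratio():.2f}")
```

Tracking the ratio alongside the cache is what makes the "monitor utilization" half of the bullet possible: a cache whose hit ratio is never measured cannot be shown to be saving any IO.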
How to blow up?
Can scalability be outsourced? aka Can Cloud fix Twitter?
• Amazon
• Google AppEngine
• Rackspace
• AppNexus
• 10Gen
• Other providers?
Things to take away
• Focus on scalability, the rest will follow
• Horizontal is better, vertical is costly
• Go asynchronous
• Architect so you don’t have to rearchitect
• Choose two out of Consistency, Availability and Partition-Tolerance
• Measure utilization first, then performance
• Choose the right infrastructure & invest in the right skills
Appendix
• Notes/Tips: http://mashraqi.com/2008/09/startonomicsstartup-scalability.html
• Personal blog: http://mashraqi.com
• Twitter: http://twitter.com/mashraqi
• MySQL Blog: http://mysqldatabaseadministration.blogspot.com
• Email: fmashraqi@yahoo.com