NoSQL: The Quest for Extreme Scalability by bestt571


NoSQL refers to non-relational databases. With the rise of Web 2.0 sites, traditional relational databases have proved inadequate, especially for large-scale, highly concurrent, purely dynamic SNS-type sites, and have exposed many problems that are difficult to overcome. Non-relational databases, thanks to their own characteristics, have developed very rapidly.



In times of a growing audience, very successful internet applications have all been facing the
same database issue: while web servers can be multiplied without too many problems (scale out),
this is not the case for relational databases. Sustaining a growing database workload requires
either buying more powerful hardware (scale up) or relying on clustering abilities. Both solutions
lead to increased complexity and costs. In this context, developers realized that relational
databases and query languages might be the bottlenecks.

Relational databases and query languages are fundamentally designed for stable workloads and
complex data extraction, which are not as common with modern applications, where the ability to
handle very large data sets while maintaining speed and scalability is actually more important.
This realization led to the creation of the NoSQL movement, based on innovative, open source,
non-relational database systems designed to achieve specific requirements and manage extreme
scalability on very large data sets.


Build to scale
«NoSQL» is a label for database management systems that enable the implementation of databases
that are not only based on SQL. NoSQL is usually associated with extreme performance and/or the
ability to manage extremely large data sets.

NoSQL emerged as a movement in early 2009 during a meet up organized in San Francisco to discuss
the growing number of open source distributed database management systems that do not attempt to
comply with ACID guarantees (atomicity, consistency, isolation, durability).

There are close to 100 NoSQL open source projects being implemented today. Many NoSQL projects
start by implementing a specific data structure to solve a specific problem that SQL databases
cannot solve. Despite this common starting point, NoSQL databases vary quite a bit, reflecting the
fact that NoSQL is more a label that qualifies a variety of atypical databases than a uniform set
of equivalent solutions.

What are the main categories of NoSQL databases?


Extreme and affordable
There are four major types of NoSQL data models:
- Key-value stores, which provide a giant hash table to store data, very useful for high-audience
  applications constantly broadcasting data to their users: Memcached, Redis.
- Bigtable clones, which store data in a large, multi-dimensional sorted map, very useful to
  store, analyze and retrieve large amounts of data: HBase, Cassandra.
- Document stores, designed for semi-structured information: CouchDB, MongoDB.
- Graph databases, probably the most experimental type, designed for graph-like data: Neo4J.
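
As a rough illustration of the four data models listed above, the sketch below mimics each one with plain Python structures. It does not use the API of any of the products named; every name and sample value (`kv_store`, `put_cell`, `scan_row`, `find`, `friends_of_friends`, the user data) is hypothetical and chosen only for this example.

```python
from bisect import insort

# 1. Key-value store: one big hash table, opaque values looked up by key.
kv_store = {}
kv_store["session:42"] = b"serialized-session-data"

# 2. Bigtable-style store: a sorted map keyed by (row, column, timestamp).
wide_rows = []  # kept sorted so the cells of a row come back in order

def put_cell(row, column, timestamp, value):
    insort(wide_rows, (row, column, timestamp, value))

def scan_row(row):
    """Return every cell of a row, in (column, timestamp) order."""
    return [cell for cell in wide_rows if cell[0] == row]

put_cell("user:1", "profile:name", 1, "Alice")
put_cell("user:1", "profile:city", 1, "Paris")
put_cell("user:2", "profile:name", 1, "Bob")

# 3. Document store: semi-structured records; fields may differ per document.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["food", "travel"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Lyon"}},
]

def find(collection, **criteria):
    """Return the documents whose fields match all given criteria."""
    return [d for d in collection
            if all(d.get(k) == v for k, v in criteria.items())]

# 4. Graph database: nodes plus explicit relationships between them.
follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}

def friends_of_friends(graph, start):
    """Nodes two hops away from start, excluding direct neighbours."""
    direct = graph[start]
    return {fof for friend in direct for fof in graph[friend]} - direct - {start}
```

The point of the comparison is that each model makes one access pattern cheap (lookup by key, ordered scan of a row, query on document fields, traversal of relationships) rather than offering a general-purpose query language.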

According to Bruno Michel, lead developer of af83's R&D department, who has taken part in all of
af83's NoSQL-related projects, «Solutions like Cassandra are designed as an effective answer to
a specific problem, which is scalability with large data sets. Others, like MongoDB, are designed
for Web application development in general, allowing more flexibility and performance. Others
are meant to be solutions to very specific types of data, like graph databases or projects specialized
in geographic data».

What is the real life impact of this variety of approaches?


The right tool for the right
Bruno Michel says «Benefits of NoSQL solutions depend on use cases». When considering NoSQL, users
need to select the database that will be the best fit for their applications. The proper choice
leads to higher performance or lower cost. Olivier Desmoulin, founder of a geolocalized social
network for foodies, certainly understands that: «We are serving up to 60,000 customers per day,
using just one low-cost server which not only serves the Rails applications, but also the whole
MongoDB database. MongoDB’s ability to handle geolocation was also very helpful. For a small
start-up like us, NoSQL was critical to ensure scalability». There are many examples such as
these. Bruno Michel: «MongoDB is used by very popular websites like Foursquare or Disqus for its
ability to deliver performance and scalability by using sharding.»
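
Sharding, mentioned in the quote above, means splitting a data set across several servers according to a shard key. The sketch below is a minimal hash-based router written for illustration only; it is not MongoDB's actual mechanism, and the shard names and the functions `shard_for`, `put` and `get` are assumptions.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names

def shard_for(key, shards=SHARDS):
    """Map a shard key to one shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Each shard holds only its own slice of the data.
cluster = {name: {} for name in SHARDS}

def put(key, value):
    cluster[shard_for(key)][key] = value

def get(key):
    return cluster[shard_for(key)].get(key)

# Spread 1,000 sample records over the three shards.
for user_id in range(1000):
    put(f"user:{user_id}", {"id": user_id})
```

A plain modulo scheme like this one moves most keys when a shard is added or removed, which is why real systems tend to rely on range-based partitioning or consistent hashing instead.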

Despite the good news, choosing a NoSQL database is tough. According to Ori Pekelman, CTO of af83,
who has extensive experience with NoSQL technologies and has spearheaded the use of NoSQL on
numerous projects with af83 customers, «There are more than a hundred NoSQL projects going on.
Most solutions are very new to the market, mature projects are only two or three years old, and
the hierarchy is constantly moving. This is a time of better opportunities for customers, but the
choice is tough». Bruno Michel: «In the last three years, a lot of very promising solutions have
appeared. Some of them proved to be extremely hard to sustain, due to very slow development or
project instability».

What could be the approach to select and leverage the power of NoSQL?


Our Recommendations

1. Assess your situation
NoSQL databases were designed to handle very specific tasks: MongoDB was meant to
be the database engine of a cloud-based application platform, Cassandra was designed
to manage inbox search at Facebook, Memcached was designed to improve caching
at LiveJournal. As long as you are not going to implement your own NoSQL database
management system, it is recommended that you clearly define your functional
requirements first, and then proceed with finding the appropriate solution - using NoSQL
or not: NoSQL should be used when scalability is a requirement, and avoided when
running complex queries is the requirement.

2. Be realistic with your scalability requirements
NoSQL is not appropriate for every application and project. Typical use cases are public-
facing internet applications with a really large audience, and very large data sets. Typical
NoSQL databases range from dozens of gigabytes to petabytes. Very few applications
fit that definition. Alternatively, smaller applications will benefit from the sometimes
comparatively lower requirements of NoSQL databases, but should assess if they are
able to sustain the NoSQL choice: NoSQL expertise is more difficult to find.

3. Check the performance thoroughly
Ori Pekelman: «We have been testing dozens of solutions; performance can vary tenfold from
one solution to another. Many benchmarks are available online, but due to the variety of
approaches, it is sometimes very difficult to find relevant field experience on specific
data volumes, read/write ratios or query distributions».
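
One way to act on this advice is to benchmark candidates with your own data volumes and read/write ratio rather than relying on published figures. The sketch below is a minimal harness, assuming each candidate is wrapped in an object exposing `get` and `put`. `DictStore`, `run_benchmark` and their parameters are hypothetical names, and the in-memory dictionary merely stands in for a real database client.

```python
import random
import time

class DictStore:
    """Stand-in for a real database client (an assumption of this sketch)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

def run_benchmark(store, operations=10_000, read_ratio=0.9,
                  key_space=1_000, seed=0):
    """Run a mixed read/write workload and report throughput."""
    rng = random.Random(seed)  # fixed seed so runs stay comparable
    reads = writes = 0
    start = time.perf_counter()
    for _ in range(operations):
        key = f"key:{rng.randrange(key_space)}"
        if rng.random() < read_ratio:
            store.get(key)
            reads += 1
        else:
            store.put(key, "x" * 100)  # 100-byte sample payload
            writes += 1
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {"reads": reads, "writes": writes,
            "ops_per_sec": operations / elapsed}

result = run_benchmark(DictStore())
```

Running the same function against each candidate, with `read_ratio`, `key_space` and the payload size set to match the target application, gives figures that are at least comparable with one another.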

4. Pay attention to the Open Source project that is tied to the
Bruno Michel: «NoSQL solutions are only starting to be mature and production-ready,
but customers need to be cautious as solutions and projects are not equal and evolve
very quickly. Some solutions received a lot of visibility without the ability to deliver».

5. Check the availability of proper tools and documentation
Bruno Michel: «An issue that may be underestimated is the status of tools and
documentation related to your solutions. This sort of shortcoming leaves your project at
risk, whatever the performance levels of the database».


Famous NoSQL users
Facebook, Foursquare, Google and Yahoo are all NoSQL users. A quick search on Google will
provide you with a lot of coverage of their trials, errors and successes with NoSQL.

NoSQL is not always a one-stop data solution
There are cases where NoSQL will be the data solution that will solve all your data requirements,
but more often than not, NoSQL will only solve part of your requirements and you will need to
implement a combination of solutions including NoSQL.
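
A common combination keeps a relational database as the system of record and puts a key-value store in front of it as a cache. The sketch below illustrates this cache-aside pattern, assuming an in-memory SQLite database and a plain dictionary standing in for a store such as Memcached. The table, the sample row and the names `get_user` and `stats` are hypothetical.

```python
import sqlite3

# Relational side: the system of record.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
db.commit()

cache = {}                # stand-in for a key-value store
stats = {"db_reads": 0}   # counts how often the relational database is hit

def get_user(user_id):
    """Return a user, reading the cache first and the database on a miss."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1]}
    cache[key] = user     # fill the cache for the next reader
    return user

first = get_user(1)   # miss: goes to SQLite, then fills the cache
second = get_user(1)  # hit: served from the cache
```

The relational database still answers complex queries, while the repeated simple lookups that dominate a high-audience site are absorbed by the cache.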

Innovative features
Depending on the implementation, NoSQL databases often include non-traditional features
such as the ability to run in memory (also known as «NoDisk»), sharding or optimized
mechanisms for geolocation.
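
To make the geolocation feature concrete, the sketch below shows the kind of query such mechanisms answer: "which records lie within a given distance of a point?". It is a brute-force scan using the haversine formula, written for illustration only; databases with geolocation support rely on spatial indexes instead. The `places` list (approximate city coordinates) and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Sample documents with approximate coordinates.
places = [
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
    {"name": "Lyon", "lat": 45.7640, "lon": 4.8357},
    {"name": "Marseille", "lat": 43.2965, "lon": 5.3698},
]

def near(lat, lon, max_km):
    """Names of the places within max_km of a point, closest first."""
    hits = []
    for place in places:
        distance = haversine_km(lat, lon, place["lat"], place["lon"])
        if distance <= max_km:
            hits.append((distance, place["name"]))
    return [name for _, name in sorted(hits)]
```

For example, `near(48.86, 2.35, 10)` keeps only the record located in Paris, since the other two lie hundreds of kilometres away.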

Required reading
The Dynamo paper, about Amazon’s own highly scalable data store
The Bigtable paper, about Google’s own DBMS
SQL Databases Don’t Scale
The slideshow from the June 11, 2009 NoSQL meet up in San Francisco

