Hadoop Hbase-0.20.2 Performance Evaluation
D. Carstoiu, A. Cernian, A. Olteanu
“Politehnica” University of Bucharest
Spl. Independentei 313, 060042
Bucharest, Romania
Abstract- Hbase is the open source version of BigTable, the distributed storage system developed
by Google for the management of large volumes of structured data. Hbase emulates most of the
functionalities provided by BigTable. Like most non SQL database systems, Hbase is written in
Java. The purpose of the current work is to evaluate the performance of the Hbase-0.20.2
implementation in comparison with that of the Hbase-0.20.0 implementation and, of course, with
the performance offered by BigTable. The tests evaluate the performance of random writing and
random reading of rows, and how it is affected by the number of servers, by the number of column
families and by the system configuration parameters.

Keywords: Hadoop, Hbase, non SQL database, key, value, Distributed Hash Table (DHT)

I. INTRODUCTION

Many modern applications include a database server serving multiple web servers accessed by many
clients. In this case, one may often find that performance is below expectations. In this
situation, many consider upgrading the hardware, without taking into consideration the database
server. If we take the example of Amazon.com, which runs on an optimized and extended Oracle
database, we can consider that SQL technology has reached its maximum point of scalability.

It is natural to take other approaches into account. One good example is Google's approach, which
uses BigTable as a semi-structured database that keeps most information from the Internet in
cache. Comparing the two approaches leads to the conclusion that traditional SQL database systems,
such as Oracle, DB2 and other implementations, are not suitable for a certain class of
applications. An approach similar to Google's BigTable was introduced around the 1980s in
operating systems through the so-called "hierarchical file system".

Currently, there is a multitude of database systems and many applications using them. Many
bottlenecks of these applications are due to the SQL component, which performs very simple tasks
in a very complex manner, one that suited 1980s computers but no longer fits current
architectures. Large companies developing SQL-based database management systems rely mainly on
hardware to ensure the desired performance. A solution may be to distribute the software on
multiple machines, in which case the licensing costs become prohibitive. There is a need for a new
approach in which a large increase in performance requires insignificant costs and provides good
scalability.

II. RELATED WORK

Several reasons justify the choice of storage solutions based on key-value pairs [1]. Some of them
are:
• Many RDBMSs do not ensure decent replication, and the acquisition of a powerful RDBMS leads to
excessive licensing costs;
• It is necessary to store large volumes of semi-structured data;
• It is a pretext to deal with new languages such as Erlang;
• Data is most often stored and accessed based on a primary key;
• Complex join operations are not necessary in processing data;
• The volume of data is very large, and managing the error scenarios caused by replication becomes
very difficult.

For example, Facebook uses Haystack, which stores a lot of data in one file with an independent
index, requiring 1 MB of metadata for 1 GB of data [2]. A number of projects have been developed
as alternatives to RDBMS, some of them more than a key-value store. Each of them has a number of
main defining characteristics:
• Implementation language: Java for Voldemort, Cassandra and Hbase; Erlang for Ringo, Kai,
Scalaris and Dynomite; C or C++ for Hypertable, ThruDB and MemcacheDB;
• Data model: mostly blob, document oriented or BigTable;
• Fault tolerance: based mostly on replication and partitioning.

Some of them offer distributed storage facilities based on key-value pairs with replication
facilities. An important issue is the latency with which data is served to populate dynamic pages,
especially for web applications. Latency depends on the environment and on whether the required
data is in cache. Generally, we expect data to be available in no more than 10 ms; otherwise, a
cost analysis is needed to improve performance.

III. NON SQL DATABASE - HBASE

Implementing a secure distributed storage system for large amounts of data must meet some
important requirements:
• Data placement algorithms;
• Cache management policies to ensure rapid access to data;
• A high degree of reliability in the context of data distributed over hundreds or thousands of
nodes;
• Scalability and adequate security measures.

It is known that classical database design involves first defining the schema, and if the
application requires any modification during its evolution, the entire database schema must be
redesigned. Data is said to be stored in a database in a structured manner, while a distributed
storage system similar to the one proposed by Google through BigTable [3] can store large amounts
of semi-structured data without having to redesign the entire schema. In this paper, we try to
assess the performance of an open source implementation of BigTable, named Hbase, developed using
the Java programming language.

Hbase is an Apache open source project that aims to provide a storage system similar to BigTable
in the Hadoop distributed computing environment. The Hadoop Distributed File System (HDFS) is a
distributed file system designed to operate on common hardware (commodity computers),
characterized by low implementation cost. Through HDFS, applications that handle large volumes of
data can access them rapidly. An HDFS instance may consist of hundreds or thousands of machines,
each keeping parts of the data files. In case of failure, it can recover automatically. HDFS
supports even millions of files in one instance, aggregating a scalable multitude of nodes in the
same cluster. The simple consistency model implemented is write-once-read-many. Processing in an
application with large amounts of data is more efficient if executed near where the data is
stored. This minimizes network congestion and increases system performance. HDFS provides
interfaces for moving applications closer to where the data is stored. It can be easily ported
from one platform to another.

From a logical point of view, data in Hbase is organized in tables, rows and columns. Each column
can have several versions for the same row key. The data model used by Hbase is similar to the one
used by BigTable. Applications keep rows of data in labeled tables, each row having a sorting key
and an arbitrary number of columns. Tables are sparse, so the rows of one table can have a
variable number of columns. A column name has the form "<family>:<label>", where <family> and
<label> are arbitrary strings of bytes [4,5]. Upon creation, a table is specified by its set of
<family>s, also called "column families". Updates to this set are performed through administrative
operations. However, a new <label> can be used in any write operation, without any previous
specification.

Hbase stores "column families" physically grouped on the disk, so the items in a given column
family have roughly the same read/write characteristics and contain similar data. By default, only
one row may be locked at a given moment. Row writes are always atomic, but a single row may be
locked so that both read and write operations are performed on it atomically. Recent versions
allow locking several rows, if the option has been explicitly activated.

Conceptually, a table in Hbase can be thought of as a collection of rows identified by the row key
and optionally by timestamp. A column may not have a value for a particular row key.

HDFS has a master/slave architecture. A cluster is composed of a single NameNode, a server that
manages the namespace, files and client access to files [5]. A cluster contains multiple
DataNodes, usually one per physical node in the cluster, each managing the storage space attached
to that node. Through HDFS, user data is stored in files, which are divided into one or more
blocks stored in a set of DataNodes. The NameNode performs operations on the file system, such as
opening, closing and renaming files and folders, and mapping blocks to DataNodes.

A DataNode is responsible for serving read and write requests from the file system's clients. It
is also in charge of creating, deleting and replicating blocks according to the instructions
received from the NameNode [5]. Both DataNode and NameNode are software components designed to run
on commodity computers running a Linux operating system. HDFS is written in Java, and any machine
supporting Java can run the NameNode and DataNode software. The architecture allows more than one
DataNode to run on a machine, but in practice this is rarely used. The existence of a single
NameNode simplifies the architecture. It retains all HDFS metadata, and the system is constructed
so that user data never passes through the NameNode. Decisions related to the replication of
blocks are always taken by the NameNode, which regularly receives from each DataNode in the
cluster a Heartbeat and a report on the blocks it holds. The information about a file has the
form [5]:

    NameNode (File_name, replica_numbers, Id_blocks, ...)

For example, the following entries:

    /path/part-0,r:2,{1,3},...
    /path/part-0,r:3,{2,6,34},...

mean that, in the first entry, the file is stored with 2 replicas in blocks 1 and 3 and, in the
second entry, with 3 replicas in blocks 2, 6 and 34. Replica placement strategies are critical for
the performance and availability of HDFS.

The usual replication policy places two replicas on machines in the same rack and one replica on a
node located in another rack. This policy limits write traffic between racks, and the chance of an
entire rack failing is much lower than the chance of a single node failing. To minimize read
latency, HDFS tries to read data from the nearest replica, so if there is a replica hosted in the
same rack, it will be preferred. A snapshot of the entire file system namespace and block map is
kept in memory. The format is compact enough that a machine with 4 GB of RAM supports a large
number of files and directories. Even in applications with a large volume of data, the volume of
metadata is not very large, so performance can be very high.

A major drawback of the implementation is that the NameNode is a single point of failure in the
cluster, and if the machine running the NameNode breaks down, data recovery is difficult. Upon
creation, a file stores its data locally until its size exceeds the size of a block. At this point,
the NameNode is contacted to insert the file into the file system hierarchy and to allocate data
blocks for it. The NameNode answers the client's request with the identity of the DataNode and the
destination of the data block, and the client sends the data to the specified DataNode. When the
file is closed, the remaining untransferred data in the local temporary file is sent to the
destination node, and the client informs the NameNode that the file is closed, which completes the
file creation transaction.

A possible improvement is to include in the cluster a secondary NameNode to take over tasks when
the primary node has failed. The basic principle is that the secondary node captures a snapshot of
the directory structure information, which it can use together with the EditLog file to restore
the data structure.

IV. TEST SYSTEM ARCHITECTURE

Tests were performed on Hbase 0.20.2 and Hadoop 0.20.0, using Java 1.6.0.x, with ssh to remotely
manage the Hadoop daemons. All tables are stored using HDFS. For the tests, a cluster composed of
4 slaves and 1 master was established, to keep compatibility with the tests described in [7]. Each
machine has 4 CPU cores at 2 GHz, two 300 GB 7200 RPM SATA drives, 4 GB of RAM and a 1 Gbps
network connection, with all nodes under the same switch. Tests were performed on tables with more
than 4 million rows, in which keys are 10 digits long and values are generated randomly with a
length of 1000 bytes each. The total volume of data depends on the number of column families.
MapReduce was used for the assessment.

From the implementation point of view, the Hbase architecture has the following major components
[6, 10]:

1. HBaseMaster. HBaseMaster is responsible for assigning regions to HRegionServers. The first
region assigned is the ROOT region, which locates all META regions to be assigned. Each META
region maps a number of user regions, which comprise the multiple tables that a particular Hbase
instance serves. A row in the ROOT and META tables has a size of about 1 KB. By default, the
region size is 256 MB, so the ROOT region can map 2.6 x 10^5 META regions, which map a total of
6.9 x 10^10 user regions, or approximately 1.8 x 10^19 (2^64) bytes of data. Once all META regions
have been assigned, HBaseMaster assigns user regions to HRegionServers, balancing the number of
regions served by each.

2. HRegionServer. HRegionServer is responsible for managing client requests for reading and
writing.

3. Hbase Client. The Hbase client is responsible for finding the particular HRegionServers serving
the sets of rows of interest. Upon initialization, the Hbase client communicates with HBaseMaster
to find the location of the ROOT region. This is the only communication between the client and the
master. After the ROOT region is located, the client contacts that region server and scans the
ROOT region to find the META region, which contains the location of the user region holding the
desired range of rows. After locating the user region, the client contacts the region server
serving that particular region and issues read or write requests. The client caches this
information, so that subsequent requests do not have to go through the entire process. When a
region is reassigned, as a result of server failure or in order to load balance, the client
rescans the META table to determine the new location of the user region. If the META region was
reassigned, the client rescans the ROOT region to determine the new location of the META region.
If the ROOT region was reassigned, the client contacts the master to determine the new location of
the ROOT region and locates the user region by repeating the process described for the
initialization.

V. RESULTS

First, tests similar to those presented in [7] were performed. In the performance analysis
presented in [7], it seems that a single column family was used for one row. We perform this test
as well. The performance obtained is slightly higher, probably due to improvements of the 0.20.2
version compared to the 0.20.0 version used by Zhang [7]. In addition, we tested with a single
region server and with 4 region servers. Table I presents the comparison between our results and
the results of the tests conducted by Zhang, first for one region server, then for 4 region
servers.

The analysis of the experimental results presented in Table I reveals that, under the same test
conditions, the performance of Hbase-0.20.0 and Hbase-0.20.2 is roughly similar. A comparison
between Hbase-0.20.0 and BigTable is made in [7, 9]. The number of random reads for one region
server and for 4 region servers is very close when we refer to values per node.

The number of rows written randomly per second increases only insignificantly as the number of
nodes increases. The first question is why random reads and random writes do not scale in a
similar way as the number of nodes increases. Note that random reads increase approximately
proportionally to the number of nodes, while random writes remain approximately the same with a
single node as with 4 nodes.

A possible explanation is that, by increasing the number of region servers, more RAM is available
for the block cache, so each region server reads blocks from the disk more rarely.

TABLE I. SINGLE COLUMN FAMILY

Type of experiment      One region server   Four region servers                Zhang
                        Number of rows/s    Number of   Average number of      Number of rows/s
                        per node            rows/s      rows/s per node        per node
Random reads            1296                4608        1152                   1106
Random writes           9570                11696       2924                   2834
Initial random writes   8480                10884       2721                   2689
Scan                    48270               62520       15630                  15420
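The per-node averages and the scaling behaviour read off Table I can be re-derived with a few lines of arithmetic. The following is only a minimal sketch in Python: the dictionary layout and the names table_one, per_node and scaling are illustrative assumptions and not part of the original test harness; only the throughput figures are taken from Table I.

```python
# Throughput figures (rows/s) copied from Table I.
# The structure and names are hypothetical; only the numbers come from the paper.
table_one = {
    "random reads":          {"one_server": 1296,  "four_servers": 4608},
    "random writes":         {"one_server": 9570,  "four_servers": 11696},
    "initial random writes": {"one_server": 8480,  "four_servers": 10884},
    "scan":                  {"one_server": 48270, "four_servers": 62520},
}

REGION_SERVERS = 4  # size of the larger test cluster


def per_node(total_rows_per_s, servers=REGION_SERVERS):
    """Average rows/s handled by each region server."""
    return total_rows_per_s / servers


def scaling(one_server, four_servers):
    """Speed-up obtained when going from 1 to 4 region servers."""
    return four_servers / one_server


for name, row in table_one.items():
    avg = per_node(row["four_servers"])
    factor = scaling(row["one_server"], row["four_servers"])
    print(f"{name:22s} per node: {avg:8.0f} rows/s   speed-up: {factor:.2f}x")
```

Under these assumptions, the sketch gives a speed-up of about 3.6x for random reads but only about 1.2x for random writes when moving from 1 to 4 region servers, which is the asymmetry between reads and writes that the text sets out to explain.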
The average read time for a row is better than what the hardware alone would offer, which is
explained by the Block Cache design and the HFile implementation. In contrast, a write operation
requires a disk access every time, to append to the WAL (write-ahead log). This is why random
write performance increases only slightly compared to that of random reads.

It should also be noted that each random read operation requires the transfer of an HFile block
from HDFS to a region server, even if only a small part of the transferred information is used
[7].

Performance depends heavily on the value of the configuration variable "rows per fetch" [6].
Performance increases as the value of "rows per fetch" increases. This value should be correlated
with the internal memory capacity. The performance gain comes from a significant reduction in the
number of RPC calls.

Another question we have raised is how performance is affected by changes in the number of "column
families" for Hbase-0.20.2, and what the effect of changing the number of region servers is
(Table II). A similar test was made by Dana [8], without specifying the Hbase version. As known
since Hbase-0.20.0, performance improved with the introduction of HFile (a new file format similar
to SSTable), new scanners, a new Block Cache and new compression methods [7].

The tests were performed with a reasonable number of column families. Although the documentation
specifies that Hbase can manage a large number of column families, the test conducted by Dana [8]
shows that a number of column families close to 1000 leads to an RPC timeout before the operation
is performed. On the other hand, in most practical cases, having a few hundred column families is
reasonable. Experimentally, we have observed that performance is not significantly affected by
increasing the number of region servers. This can be explained by the fact that a single row is
read at a time, so increasing the number of servers does not affect performance.

TABLE II. MULTIPLE COLUMN FAMILIES

Type of experiment      1 Region Server                4 Region Servers
                        (Number of column families)    (Number of column families)
                        1       10      100            1       10      100
Random reads            1296    1628    1676           4608    4569    4621
Random writes           9570    14272   16452          11696   17429   19782
Initial random writes   8480    6428    4572           10884   7846    5674
Scan                    48270   52736   53274          62520   67420   68742

All values are in rows/s.

A test that measures the number of rows read per second as the number of rows stored in the table
grows shows that performance decreases when the number of rows per table increases. This behavior
can be explained by the fact that, in tables with many rows, there is a smaller chance of finding
an arbitrary record in memory without accessing the disk. The performance degradation is less
pronounced when the number of servers increases, because more internal memory is available.

VI. CONCLUSIONS

Hbase-0.20.2 performance is substantially improved over previous versions. Random reads have the
worst performance, because each operation requires the transfer of an HFile block to a region
server and only a small part of that information is used. Thus, if each randomly read row requires
an HFile block transfer, the ratio between relevant and read information is given by
no_bytes_value / no_bytes_block. Consequently, random read performance increases when the number
of bytes per value is higher. Another way to increase read performance is to increase the file
system cache. Similarly, sequential writes are faster than random writes, because fewer RPC
packets are used.

Last, but not least, configuration issues are extremely important in terms of performance [7]. We
can conclude that Hbase, as an open source alternative to BigTable, is designed for operating
clusters with a reasonable number of servers and reasonable data volumes. Performance does not
change substantially when increasing the number of servers. In the future, we would like to test
on a larger number of servers. At this point, we did not have the possibility to perform the tests
with dozens of servers.

It is hard to predict the future of distributed databases at this time, but we believe that
research will focus on guaranteeing consistency, improving data distribution strategies, maturing
failover and recovery algorithms and optimizing data storage.

REFERENCES

[1] R. Jones, Anti-RDBMS: A list of distributed key-value stores,
http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/.
[2] J. Sobel, Needle in a Haystack: Efficient Storage of Billions of Photos,
http://perspectives.mvdirona.com/2008/06/30/FacebookNeedleInAHaystackEfficientStorageOfBillionsOfPhotos.aspx.
[3] F. Chang, J. Dean, S. Ghemawat et al., Bigtable: A Distributed Storage System for Structured
Data, OSDI 2006.
[4] R. Rawson, HBase committer, Hbase, http://docs.thinkfree.com/docs/view.php?dsn=858186
[5] Hbase, www.apache.org/hadoop/Hbase/HbaseArchitecture
[6] A. Khetrapal, V. Ganesh, HBase and Hypertable for large scale distributed storage systems: A
Performance evaluation for Open Source BigTable Implementations,
http://www.ankurkhetrapal.com/downloads/HypertableHBaseEval2.pdf
[7] A. Rao, S. Zhang, Hbase-0.20.0 Performance Evaluation,
http://cloudepr.blogspot.com/2009_08_01_archive.html
[8] K. Dana, Hadoop HBase Performance Evaluation,
http://www.cs.duke.edu/~kcd/hadoop/kcd-hadoop-report.pdf
[9] J. Gray, J. D. Cryans, Hbase goes Realtime, The Hbase presentation at Hadoop Summit 2009.
[10] Hbase-0.20.2 Documentation, http://hadoop.apache.org/hbase/docs/r0.20.2.