Conceptos by pengxiang


A high-level Cassandra Java client.

Cassandra is a highly available, column-oriented database.

Hector is the greatest warrior in Greek mythology, Troy's builder and brother of Cassandra.

This client provides:

o high-level, simple object-oriented interface to Cassandra

o failover behavior on the client side

o connection pooling for improved performance and scalability

o JMX counters for monitoring and management

o load balancing

The work was initially inspired by cassandra-java-client but has since taken off in different directions.

 /**
  * Insert a new value keyed by key
  * @param key Key for the value
  * @param value the String value to insert
  */
 public void insert(final String key, final String value) throws Exception {
     execute(new Command<Void>() {
         public Void execute(final Keyspace ks) throws Exception {
             ks.insert(key, createColumnPath(COLUMN_NAME), bytes(value));
             return null;
         }
     });
 }

 /**
  * Get a string value.
  * @return The string value; null if no value exists for the given key.
  */
 public String get(final String key) throws Exception {
     return execute(new Command<String>() {
         public String execute(final Keyspace ks) throws Exception {
             try {
                 return string(ks.getColumn(key, createColumnPath(COLUMN_NAME)).getValue());
             } catch (NotFoundException e) {
                 return null;
             }
         }
     });
 }

 /**
  * Delete a key from cassandra
  */
 public void delete(final String key) throws Exception {
     execute(new Command<Void>() {
         public Void execute(final Keyspace ks) throws Exception {
             ks.remove(key, createColumnPath(COLUMN_NAME));
             return null;
         }
     });
 }

Here are the high-level features of Hector, currently hosted on GitHub.
      A high-level object-oriented interface to Cassandra. As noted before, Cassandra’s out-of-the-box client is a Thrift
         client, which isn’t always that nice and clean to work with. I wanted to provide a higher-level and cleaner API.
         This part was mainly inspired by the mentioned cassandra-java-client. The API is defined in the Keyspace
         interface. See, for example, methods such as Keyspace.insert() and Keyspace.getColumn().
      Failover support. Cassandra is a distributed data store and copes very well with one or several hosts going down.
         However, out of the box, Thrift provides no failover support for clients. What if the client is configured to
         connect to a Cassandra host that just happens to be down right now? In Hector, if a client is connected to one
         host in the ring and this host goes down, the client will automatically and transparently search for other
         available hosts to perform its operation before giving up and returning an error to its user. There are currently
         three ways to configure the failover policy: FAIL_FAST (no retry, just fail if there are errors, nothing smart),
         ON_FAIL_TRY_ONE_NEXT_AVAILABLE (try one more host before giving up) and ON_FAIL_TRY_ALL_AVAILABLE (try all
         available hosts before giving up). See CassandraClient.FailoverPolicy.
      Connection pooling. This is a real necessity for high-scale applications. The usual pattern for DAOs (Data Access
         Objects) is a large number of small reads and writes. Clients cannot afford to open a new connection with each
         and every request, not only because of the overhead of the TCP handshake (Thrift uses TCP), but also because
         sockets remain in TIME_WAIT, so a client may easily run out of available sockets if it operates fast enough. This
         part was also inspired by cassandra-java-client but was improved in my version. Hector provides connection
         pooling and a nice framework that manages all its gory details.
      JMX support. It’s a widely known fact that applications have a life of their own. You built it to do X but it does Y
         because you didn’t expect Z to happen. Running an application without the ability to monitor it is like walking
         blindfolded on a dark highway; sooner or later you’ll get hit by something. Hector exposes many important
         runtime metrics over JMX, such as the number of available connections, idle connections, error statistics and more.
      Support for the Command design pattern, which allows clients to concentrate on their business logic and lets Hector
         take care of the required plumbing. This is demonstrated in the code above.
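
The three failover policies described above can be illustrated with a small, self-contained sketch. This is not Hector's implementation: the class FailoverSketch, its execute method and the host names are hypothetical, and each policy is reduced to "how many additional hosts may be tried after the first failure".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the three failover policies; not Hector's actual code. */
final class FailoverSketch {

    /** Each policy is modelled as the number of extra hosts to try after the first one fails. */
    enum FailoverPolicy {
        FAIL_FAST(0),
        ON_FAIL_TRY_ONE_NEXT_AVAILABLE(1),
        ON_FAIL_TRY_ALL_AVAILABLE(Integer.MAX_VALUE);

        final int extraHosts;

        FailoverPolicy(int extraHosts) {
            this.extraHosts = extraHosts;
        }
    }

    /**
     * Runs the operation against the hosts in list order and moves on to the
     * next host only while the policy still allows another attempt.
     */
    static <T> T execute(List<String> hosts, FailoverPolicy policy, Function<String, T> operation) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured");
        }
        int maxAttempts = (int) Math.min((long) hosts.size(), 1L + policy.extraHosts);
        RuntimeException lastError = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return operation.apply(hosts.get(i));
            } catch (RuntimeException e) {
                lastError = e; // treat the host as down and try the next one if allowed
            }
        }
        throw new IllegalStateException("all attempted hosts failed", lastError);
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("host-a", "host-b", "host-c");
        // Simulated operation: only host-c is up.
        Function<String, String> op = host -> {
            if (!host.equals("host-c")) {
                throw new RuntimeException(host + " is down");
            }
            return "served by " + host;
        };
        // prints "served by host-c"
        System.out.println(execute(ring, FailoverPolicy.ON_FAIL_TRY_ALL_AVAILABLE, op));
    }
}
```

With the same simulated ring, FAIL_FAST and ON_FAIL_TRY_ONE_NEXT_AVAILABLE both end in an error, because the only live host is the third one.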

In Greek mythology, Cassandra is captured after the fall of Troy by the triumphant king Agamemnon, with whom she has
two sons, Pelops and Teledamus. This Java client library is Pelops’s namesake, nicknamed “Cassandra’s beautiful son”
because it offers a beautiful way to code against the Cassandra database. This is a quick introduction to the library.
You can find the open source code here

Pelops was born to improve the quality of Cassandra code across a complex commercial project that makes extensive use of
the database. The main objectives of the library are:
   To faithfully expose Cassandra’s API in a manner that is immediately understandable to anyone:
      simple, but beautiful
   To completely separate low-level concerns such as connection pooling from data processing code
   To eliminate “dressing code”, so that the semantics of data processing stand clear and obvious
   To accelerate development through intellisense, function overloading and powerful high-level methods
   To implement strategies like load balancing based upon the per-node count of running operations
   To include robust error handling and recovery that does not mask application-level logic problems
   To track the latest Cassandra releases and features without causing breaking changes
   To define a long-lasting paradigm for those writing client code

Up and running in 5 minutes
To start working with Pelops and Cassandra, you need to know three things:
     1. How to create a connection pool, typically once at startup
     2. How to write data using the Mutator class
     3. How to read data using the Selector class.
It’s that easy!

Creating a connection pool
To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the
startup code of your application. Sometimes you will define more than one connection pool. For example, in our project, we
use two Cassandra database clusters, one which uses random partitioning for data storage, and one which uses order
preserving partitioning for indexes. You can create as many connection pools as you need.
To create a pool, you need to specify a name, a list of known contact nodes (the library can automatically detect further
nodes in the cluster, but see notes at the end), the network port that the nodes are listening on, and a policy which controls
things like the number of connections in your pool.

Here a pool is created with default policies:
  new String[] { "", "", ""},
  new Policy());

Using a Mutator
The Mutator class is used to make mutations to a keyspace (which in SQL speak translates as making changes to a
database). You ask Pelops for a new mutator, and then specify the mutations you wish to make. These are sent to Cassandra
in a single batch when you call its execute method.
To create a mutator, you must specify the name of the connection pool you will use and the name of the keyspace you wish
to mutate. Note that the pool determines which database cluster you are talking to.

Mutator mutator = Pelops.createMutator("Main", "SupportTickets");

Once you have the mutator, you start specifying changes.
/**
 * Write multiple sub-column values to a super column...
 * @param rowKey      The key of the row to modify
 * @param colFamily   The name of the super column family to operate on
 * @param colName     The name of the super column
 * @param subColumns  A list of the sub-columns to write
 */
mutator.writeSubColumns(
   UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
     mutator.newColumn("category", "videoPhone"),
     mutator.newColumn("reportType", "POOR_PICTURE"),
     mutator.newColumn("createdDate", NumberHelper.toBytes(System.currentTimeMillis())),
     mutator.newColumn("capture", jpegBytes),
     mutator.newColumn("comment") ));

/**
 * Delete a list of columns or super columns...
 * @param rowKey     The key of the row to modify
 * @param colFamily  The name of the column family to operate on
 * @param colNames   The column and/or super column names to delete
 */

After specifying the changes, you send them to Cassandra in a single batch by calling execute. This takes the Cassandra
consistency level as a parameter.


Note that if you need to know a particular mutation operation has completed successfully before initiating some subsequent
operation, then you should not batch your mutations together. Since you cannot re-use a mutator after it has been executed,
you should create two or more mutators, and execute them with at least a QUORUM consistency level.
Browse the Mutator class to see the methods and overloads that are available here
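
One way to see why QUORUM is the suggested minimum for dependent operations is the replica arithmetic behind it. The sketch below is only an illustration: the class name QuorumSketch and the replication factor of 3 are assumptions, and the formulas are the usual Cassandra ones (a quorum is a majority of replicas, and a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replication factor).

```java
/** Illustrative arithmetic only; the replication factor used in main is an assumption. */
final class QuorumSketch {

    /** Number of replicas that must acknowledge an operation at QUORUM. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when any set of read replicas must share at least one replica with any set of write replicas. */
    static boolean readOverlapsWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF=3: " + q);                                      // prints 2
        System.out.println("QUORUM write, QUORUM read: " + readOverlapsWrite(rf, q, q));  // prints true
        System.out.println("ONE write, ONE read: " + readOverlapsWrite(rf, 1, 1));        // prints false
    }
}
```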

Using a Selector
The Selector class is used to read data from a keyspace. You ask Pelops for a new selector, and then read data by calling its
methods.

Selector selector = Pelops.createSelector("Main", "SupportTickets");

Once you have a selector instance, you can start reading data using its many overloads.
/**
 * Retrieve a super column from a row...
 * @param rowKey        The key of the row
 * @param columnFamily  The name of the column family containing the super column
 * @param superColName  The name of the super column to retrieve
 * @param cLevel        The Cassandra consistency level with which to perform the operation
 * @return              The requested SuperColumn
 */
SuperColumn ticket = selector.getSuperColumnFromRow(

assert ticketId.equals(

// enumerate sub-columns
for (Column data : ticket.columns) {
   String name =;
   byte[] value = data.value;

/**
 * Retrieve super columns from a row
 * @param rowKey        The key of the row
 * @param columnFamily  The name of the column family containing the super columns
 * @param colPredicate  The super column selector predicate
 * @param cLevel        The Cassandra consistency level with which to perform the operation
 * @return              A list of matching columns
 */
List<SuperColumn> allTickets = selector.getSuperColumnsFromRow(
   Selector.newColumnsPredicateAll(true, 10000),

/**
 * Retrieve super columns from a set of rows.
 * @param rowKeys       The keys of the rows
 * @param columnFamily  The name of the column family containing the super columns
 * @param colPredicate  The super column selector predicate
 * @param cLevel        The Cassandra consistency level with which to perform the operation
 * @return              A map from row keys to the matching lists of super columns
 */
Map<String, List<SuperColumn>> allTicketsForFriends = selector.getSuperColumnsFromRows(
   Arrays.asList(new String[] { "matt", "james", "dom" }), // the friends
   Selector.newColumnsPredicateAll(true, 10000),

/**
 * Retrieve a page of super columns composed from a segment of the sequence of super columns in a row.
 * @param rowKey           The key of the row
 * @param columnFamily     The name of the column family containing the super columns
 * @param startBeyondName  The sequence of super columns must begin with the smallest super column name
 *                         greater than this value. Pass null to start at the beginning of the sequence.
 * @param orderType        The scheme used to determine how the column names are ordered
 * @param reversed         Whether the scan should proceed in descending super column name order
 * @param count            The maximum number of super columns that can be retrieved by the scan
 * @param cLevel           The Cassandra consistency level with which to perform the operation
 * @return                 A page of super columns
 */
List<SuperColumn> pageTickets = selector.getPageOfSuperColumnsFromRow(
  lastIdOfPrevPage, // null for first page
  Selector.OrderType.TimeUUIDType, // ordering defined in this super column family
  true, // blog order
  10, // count shown per page

There is a huge number of selector methods and overloads that expose the full power of Cassandra, and others, like the
paginator methods, that make otherwise complex tasks simple. Browse the Selector class to see what is available here
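
The paging semantics described for startBeyondName, reversed and count can be sketched with plain JDK collections. This is not Pelops code: PagingSketch and pageOfNames are hypothetical names, super column names are simplified to sortable strings, and it is an assumption that in a reversed scan "beyond" means "smaller than" the previous page's last name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch of "start beyond the last name seen" paging over sorted column names. */
final class PagingSketch {

    /**
     * Returns up to count names that come strictly after startBeyondName in scan order.
     * Passing null for startBeyondName starts at the beginning of the scan.
     */
    static List<String> pageOfNames(NavigableSet<String> names, String startBeyondName,
                                    boolean reversed, int count) {
        NavigableSet<String> ordered = reversed ? names.descendingSet() : names;
        NavigableSet<String> remaining =
                startBeyondName == null ? ordered : ordered.tailSet(startBeyondName, false);
        List<String> page = new ArrayList<>();
        for (String name : remaining) {
            if (page.size() == count) {
                break;
            }
            page.add(name);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>(Arrays.asList("t01", "t02", "t03", "t04", "t05"));
        String last = null; // null for the first page
        List<String> page;
        // Walk the row newest-first ("blog order"), two names per page.
        while (!(page = pageOfNames(names, last, true, 2)).isEmpty()) {
            System.out.println(page); // prints [t05, t04], then [t03, t02], then [t01]
            last = page.get(page.size() - 1);
        }
    }
}
```

The caller only has to remember the last name of the previous page, which is what the lastIdOfPrevPage argument in the example above stands for.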

Other stuff
All the main things you need to start using Pelops have been covered, and with your current knowledge you can easily feel
your way around Pelops inside your IDE using intellisense. Here are some final points that are useful to keep in mind if
you want to work with Pelops:
   If you need to perform deletions at the row key level, use an instance of the KeyDeletor class (call
   If you need metrics from a Cassandra cluster, use an instance of the Metrics class (call Pelops.createMetrics).
   To work with Time UUIDs, which are globally unique identifiers that can be sorted by time – which you will find to
      be very useful throughout your Cassandra code – use the UuidHelper class.
   To work with numbers stored as binary values, use the NumberHelper class.
   To work with strings stored as binary values, use the StringHelper class.
   Methods in the Pelops library that cause interaction with Cassandra throw the standard Cassandra exceptions defined
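
To make "numbers and strings stored as binary values" concrete, here is a minimal sketch that uses only the JDK. It is not the NumberHelper or StringHelper source: BinaryValuesSketch and its method names are hypothetical, and the big-endian and UTF-8 encodings are assumptions about a reasonable layout, not a statement of what Pelops does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers; the encodings (big-endian long, UTF-8 string) are assumptions. */
final class BinaryValuesSketch {

    /** Encodes a long as 8 bytes, most significant byte first (ByteBuffer's default order). */
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes 8 bytes written by longToBytes. */
    static long bytesToLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Encodes a string as UTF-8 bytes. */
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a string. */
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long created = 1234567890123L;
        byte[] stored = longToBytes(created);
        System.out.println(stored.length);                             // prints 8
        System.out.println(bytesToLong(stored) == created);            // prints true
        System.out.println(bytesToString(stringToBytes("videoPhone"))); // prints videoPhone
    }
}
```

Whatever encoding is chosen, the writer and every reader of a column must agree on it, since Cassandra itself only sees the raw bytes.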

The Pelops design secret
One of the key design decisions that, at the time of writing, distinguishes Pelops is that the data processing code written
by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow
connections to Cassandra from a Pelops pool for just the periods in which they need to read from or write to the
underlying Thrift API. This has two advantages.
Firstly, and most obviously, code becomes cleaner and developers are freed from connection management concerns.
Secondly, and more subtly, this enables the Pelops library to manage connection pooling completely by itself and, for
example, to keep track of how many outstanding operations are currently running against each cluster node.
This in turn enables Pelops to perform more effective client-side load balancing by ensuring that new operations are
performed against the node on which it currently has the fewest outstanding operations. Because of this architectural
choice, it will even be possible in the future to offer strategies where, for example, nodes are queried to determine
their actual load.
To see how the library abstracts connection pooling away from the semantics of data processing, take a look at the execute
method of Mutator and the tryOperation method of Operand. This is the foundation upon which Pelops greatly improves
over existing libraries that have modelled connection management on pre-existing SQL database client libraries.
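
The idea of borrowing a node only for the duration of one operation, and using the per-node count of outstanding operations to pick the next node, can be sketched as follows. This is not the Pelops implementation: LeastLoadedSketch, acquire, release and run are hypothetical names, nodes are plain strings rather than real connections, and ties are broken by configuration order as an arbitrary choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of least-outstanding-operations node selection. */
final class LeastLoadedSketch {
    private final Map<String, Integer> outstanding = new LinkedHashMap<>();

    LeastLoadedSketch(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        for (String node : nodes) {
            outstanding.put(node, 0);
        }
    }

    /** Picks the node with the fewest running operations and counts the new operation against it. */
    synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> entry : outstanding.entrySet()) {
            if (best == null || entry.getValue() < outstanding.get(best)) {
                best = entry.getKey();
            }
        }
        outstanding.merge(best, 1, Integer::sum);
        return best;
    }

    /** Marks one operation on the node as finished. */
    synchronized void release(String node) {
        outstanding.merge(node, -1, Integer::sum);
    }

    synchronized int outstandingOn(String node) {
        return outstanding.get(node);
    }

    /** Borrows a node for exactly one operation, so calling code never manages connections itself. */
    <T> T run(Function<String, T> operation) {
        String node = acquire();
        try {
            return operation.apply(node);
        } finally {
            release(node);
        }
    }

    public static void main(String[] args) {
        LeastLoadedSketch pool = new LeastLoadedSketch(Arrays.asList("node-a", "node-b", "node-c"));
        String first = pool.acquire();      // node-a
        String second = pool.acquire();     // node-b
        String third = pool.acquire();      // node-c
        pool.release(second);               // node-b is idle again
        System.out.println(pool.acquire()); // prints node-b
        System.out.println(first + " " + third);
    }
}
```

Because the count is adjusted inside run, data processing code only supplies the operation, which mirrors the separation of concerns described above.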

Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code
generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang,
Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
