Tuesday, April 24, 2012

Gemcached - Improved Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data. memcached is distributed, but its servers know nothing about each other, so it is the client's responsibility to put and retrieve data from the correct server. Most memcached clients use some form of hashing to decide which server a key lives on. While deploying a memcached cluster (if you can call it that), there are a lot of things that can go wrong.

Configuring Servers

To determine the server where a particular key should reside, most memcached clients take the key's hash code and mod it by the number of servers (this changes slightly when the servers are weighted). For all the clients to do this consistently, you have to configure every one of them with the same server list. Not only that, you also have to make sure the order of the servers is the same, because some clients may not sort the server list provided to them.
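
As a rough illustration of modulo-based server selection (a sketch, not the exact hash any particular client uses; the class and the server list below are made up for the example):

 import java.util.Arrays;
 import java.util.List;

 public class ModuloServerSelector {
     // Every client must see this list with the same contents and the same order,
     // otherwise the same key maps to different servers on different clients.
     private final List<String> servers;

     public ModuloServerSelector(List<String> servers) {
         this.servers = servers;
     }

     // Hash the key and mod by the number of servers to pick one.
     public String serverFor(String key) {
         int index = Math.abs(key.hashCode() % servers.size());
         return servers.get(index);
     }

     public static void main(String[] args) {
         ModuloServerSelector selector = new ModuloServerSelector(
                 Arrays.asList("cache1:11211", "cache2:11211", "cache3:11211"));
         System.out.println("user:42 -> " + selector.serverFor("user:42"));
     }
 }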

Mixed memcached Clients

If you want to use a mix of clients, say a PHP client and a Java client, you could be in for trouble. These clients may not hash keys in a uniform manner, so you could end up storing the same key on more than one server, leading to stale data, lost updates and all sorts of confusion.

High Availability

If one of the memcached servers crashes, a get request for a key on that server simply results in a cache miss. The application then has to go back to its datastore and fetch the entry again. If many concurrent requests arrive for keys that lived on the crashed server, serving them all can be severely delayed.
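
The fallback path usually looks something like this read-through sketch. It assumes the xmemcached client API and a made-up loadFromDatastore() helper standing in for the real datastore; the host names, key and expiry are invented for the example:

 import net.rubyeye.xmemcached.MemcachedClient;
 import net.rubyeye.xmemcached.XMemcachedClientBuilder;
 import net.rubyeye.xmemcached.utils.AddrUtil;

 public class ReadThroughExample {
     public static void main(String[] args) throws Exception {
         MemcachedClient cache = new XMemcachedClientBuilder(
                 AddrUtil.getAddresses("cache1:11211 cache2:11211")).build();

         String key = "user:42";
         String value = cache.get(key);
         if (value == null) {
             // Cache miss: possibly because the server owning this key crashed.
             // Fall back to the datastore and repopulate the cache.
             value = loadFromDatastore(key);
             cache.set(key, 300, value); // cache for 5 minutes
         }
         System.out.println(value);

         cache.shutdown();
     }

     // Stand-in for a real datastore lookup.
     private static String loadFromDatastore(String key) {
         return "value-for-" + key;
     }
 }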

Dynamic Scaling

When you want to add capacity to your system, you start a new memcached server. But that in itself does not let the clients use the new server. You have to let each client know that there is a new server, either by stopping and re-configuring it or by invoking an API (like addServer() in xmemcached). To add capacity, I don't think stopping your client (which may well be your application server) is really an option, so you will have to build a mechanism that notifies an already running client about a server that was added.
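
A sketch of that notification with xmemcached, assuming its addServer(host, port) call and made-up host names; the addServer line is what such a mechanism would invoke on the already running client:

 import net.rubyeye.xmemcached.MemcachedClient;
 import net.rubyeye.xmemcached.XMemcachedClientBuilder;
 import net.rubyeye.xmemcached.utils.AddrUtil;

 public class AddServerExample {
     public static void main(String[] args) throws Exception {
         // The client starts out configured with the original servers.
         MemcachedClient client = new XMemcachedClientBuilder(
                 AddrUtil.getAddresses("cache1:11211 cache2:11211")).build();

         // Later, when a new memcached server comes online, the running client
         // has to be told about it explicitly.
         client.addServer("cache3.example.com", 11211);

         client.shutdown();
     }
 }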

Gemcached


I recently worked on gemcached, which solves all of the problems with memcached listed above. gemcached uses VMware GemFire to provide clustered caching. You can download gemcached from VMware labs.

Starting gemcached

After adding gemfire.jar and gemcached-X.X-RELEASE.jar to your CLASSPATH, you can start gemcached from the command line using:
 java com.gemstone.memcached.GemFireMemcachedServer -port=[port]

If a port is not specified, 11212 is used by default. You can start as many gemcached servers as you want and point your clients to one, some or all of them, without worrying about the order of the servers. When you want to scale the system, just start another gemcached server. The new server discovers the running servers by default and shares their load. Clients do not have to be restarted; they simply start using the new server while continuing to talk to their configured set of servers.
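
Because gemcached speaks the memcached protocol, an ordinary memcached client can be pointed at it unchanged. A minimal sketch with xmemcached, assuming two gemcached servers on the default port 11212 and made-up host names:

 import net.rubyeye.xmemcached.MemcachedClient;
 import net.rubyeye.xmemcached.XMemcachedClientBuilder;
 import net.rubyeye.xmemcached.utils.AddrUtil;

 public class GemcachedClientExample {
     public static void main(String[] args) throws Exception {
         // Point a regular memcached client at the gemcached servers. Since the
         // servers form a GemFire cluster, the order of this list does not matter,
         // and listing only some of the servers works too.
         MemcachedClient client = new XMemcachedClientBuilder(
                 AddrUtil.getAddresses("gem1.example.com:11212 gem2.example.com:11212")).build();

         client.set("greeting", 0, "hello from gemcached");
         System.out.println((String) client.get("greeting"));

         client.shutdown();
     }
 }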
