After reading about distributed database solutions and watching the entire NOSQL movement I had to figure out a way to start building something on a distributed database system. While I only needed one server to get started on (i.e. I didn’t need multiple database nodes right away), I already have a couple of projects that are breaking from the old MySQL replication plus Memcached problems (effectively articulated here). In other words, it’s becoming a pain in the butt to maintain the system so I decided to dive in. Here’s how I first got set up.
Before diving in to the tutorial, I thought it would be useful to mention that developers who are far more qualified to speak on the topic of Cassandra and other distributed data storage models will be speaking at our upcoming conference: Social Developer Summit. If you are interested in learning from the leading developers who are overcoming some of the challenges in scaling and reliability, you should come to the event! If you read this article and have already gone through this process before, you should email us at contact [at] socialtimes [d0t] com about speaking! Now on to Cassandra!
Media Temple DV Running Cassandra … Are You Serious?!?!?
It’s important to note that I’m running Cassandra on a Media Temple DV server which should be a warning flag in itself. If you are looking to create a massive distributed database that scales horizontally, Cassandra is for you. However you should also have your own truly dedicated servers. Media Temple DV is not truly dedicated. That’s because the “V” in DV stands for virtual which means you are sharing your “dedicated” server with other people.
Typically this wouldn’t matter because most people running applications on these servers are not doing calculations or data storage that will crash the system. Also, you may wonder why on earth I’m even dealing with Cassandra, a database which is still in Alpha. The true reason was that I was tired of dealing with master/slave crap with MySQL and being forced to shard my data for various tools we are currently running on AllFacebook.
While I could have managed this, there has been so much buzz about NOSQL and other tools being developed, I had to dive in head first. The result has been a week of pain, but hopefully after reading this article you can set up on your Media Temple DV server (or CentOS or similar server … Fedora, etc) and start playing with Cassandra.
Aside from wanting to dive head first, Cassandra also solved a problem of mine: instant scalability. You can add on servers on the fly without having to go through dramatic configuration changes to your code or to the database. So rather than putting your time into constantly scaling your database system, you can instead put your money into buying new servers and hard drives as needed.
All that’s left is to say “I’ve had enough of MySQL so it’s on to the next one!”
Cassandra’s “Easy” Installation
If you read other tutorials, you’d hear about how easy it is to get Cassandra up and running. Evan Weaver has a great explanation for Ruby people and there are a number of other tutorials scattered around the web. However, I’m not a “master” of Linux so things didn’t work out too smoothly. The greatest failure was probably my yum database, which appeared to be really out of date. Rather than updating the database (which I have yet to do), I decided to install and update each of the services required to get Cassandra up and running from scratch.
By the time you install Cassandra on your server I can only hope you can avoid the problems I’ve run into.
Step 1: Install The Current Version Of Java
I believe MediaTemple installs a version of Java when you add their developer tools to your server, but it’s an old version of Java. It’s not really useful if you want to install Cassandra. So you’ll need to install the latest version of Java and make sure that the java bash command runs the proper version of Java. You can accomplish most of these things by reading this article by Chris Maynard. There is one flaw with that article though: if you have an older version of java also installed on your server, /usr/sbin/alternatives may already have a record for the
To work around this, just run the following command:
/usr/sbin/alternatives --set java /usr/local/java/jdk1.6.0_18/bin/java
Keep in mind that I’ve installed everything under /usr/local for the purpose of this entire tutorial so you may need to replace the path used above with the proper path and version. Is your server still causing problems? Check out this forum post as it may help you out some.
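To double-check which Java your shell actually picks up after the alternatives change, a small sketch like the one below may help. The helper name parse_java_version is made up for illustration. The --install line is left as a comment because it assumes the java link lives at /usr/bin/java and that alternatives has no record for your new JDK yet, neither of which may be true on your server.

```shell
# Pull the quoted version string out of a `java -version` banner line,
# e.g.  java version "1.6.0_18"  ->  1.6.0_18
# (parse_java_version is a hypothetical helper name, not a standard tool.)
parse_java_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

# Only query Java if it is actually on the PATH.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes its banner to stderr, hence the 2>&1.
    banner="$(java -version 2>&1 | head -n 1)"
    echo "java on PATH reports: $(parse_java_version "$banner")"
fi

# If alternatives has no record for the new JDK yet, registering it first
# would look roughly like this (link path and priority are assumptions):
#   /usr/sbin/alternatives --install /usr/bin/java java /usr/local/java/jdk1.6.0_18/bin/java 2
#   /usr/sbin/alternatives --display java
```

If the reported version is still the old one, the --set command above probably pointed at a path that alternatives does not know about.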
Step 2: How To Install Thrift
Congratulations on installing Java from scratch! You aren’t done yet though. Now you need to install Thrift, a framework first developed by Facebook which lets programs written in different languages communicate with each other. In this case, your code will be using Thrift to communicate with Cassandra. I used Git to get Thrift on my server. If you don’t have Git, you can grab it here and watch this video if you are interested in learning more about the benefits of Git.
- Navigate to /usr/local (or the directory that you would like to install Thrift under)
- Get the latest version of thrift by running this command:
git clone git://github.com/apache/thrift.git (you can find the full Thrift project here)
- Install thrift by running ./bootstrap.sh and the standard
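Put together, the steps in this list might look like the sketch below. The ./configure, make, make install sequence is my assumption about what the usual build steps are after ./bootstrap.sh, and PREFIX, DRY_RUN, run and build_thrift are names invented for this example. Setting DRY_RUN=1 only prints the commands, which is handy for checking the plan before touching the server.

```shell
#!/bin/sh
# Where to clone and build Thrift (PREFIX is an illustrative variable name).
PREFIX="${PREFIX:-/usr/local}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone Thrift and build it, stopping at the first failing step.
build_thrift() {
    run cd "$PREFIX" || return 1
    run git clone git://github.com/apache/thrift.git || return 1
    run cd thrift || return 1
    run ./bootstrap.sh || return 1
    run ./configure || return 1   # assumed step, not spelled out in the list
    run make || return 1          # assumed step
    run make install              # assumed step; may need root
}

# Nothing runs until you call the function, for example:
#   DRY_RUN=1 build_thrift
```

Stopping at the first failure matters here, because a broken Boost shows up at the configure or make stage and there is no point continuing past it.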
Did it fail? Not surprising! On my server, the version of Boost I had was old, wasn’t recognized, and possibly didn’t install correctly. I uninstalled the package and installed Boost from scratch as described here.
After you’ve built boost, you can navigate back to your thrift directory (/usr/local/thrift) and run
Step 2.5: Latest Version Of Ruby
I should note that after noticing the bug I headed into the #cassandra IRC chat room to see if anybody had run into similar problems. Given that the error was referencing a Ruby function, one of the developers in the room suggested upgrading Ruby. He didn’t need to suggest it twice before I was removing the original version of Ruby on my server and
ruby-gem. I also decided to throw in rails just so I could feel good about having a server that’s ready for anything! If you want to know how to install Ruby on CentOS, check out this person’s article. Did upgrading ruby help at all? Not at all, but now I had a fully functional version of Ruby on Rails on my server so if I want to learn it, I can!
Step 3: Install Cassandra
Now we’re actually at the part that you really wanted to know about: installing Cassandra on your server. While installing Cassandra is easy if you have the latest versions of the software Cassandra depends on, MediaTemple doesn’t have the latest versions installed. Now that you’ve installed Thrift and the latest versions of Java and Boost you are good to go.
Grab the latest copy of Cassandra here. I just ran:
wget http://mirror.candidhosting.com/pub/apache/incubator/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz
You can find all the mirrors here. Next, untar Cassandra in your working directory (for me it was /usr/local) by running:
tar -zxvf apache-cassandra-0.5.1-bin.tar.gz
If you downloaded a different version, replace the file name accordingly. Then run the following commands prior to running the install scripts:
sudo mkdir -p /var/log/cassandra
sudo chown -R `whoami` /var/log/cassandra
sudo mkdir -p /var/lib/cassandra
sudo chown -R `whoami` /var/lib/cassandra
You can replace “`whoami`” in these commands with your user name. Move into the top Cassandra directory, run bin/cassandra -f and see if it runs. As is usually the case for me, it didn’t run! Instead, I received an error which said “Can’t start up: not enough memory”. The issue here was with my Java configuration. To fix this, modify cassandra.in.sh so that the -Xmx option reads -Xmx512M. While you aren’t supposed to overwrite the Cassandra configuration files, I went ahead and made the modification. Feel free to cp a backup version of the config file first.
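If you would rather script the -Xmx change than edit the file by hand, here is a minimal sketch. The helper name set_max_heap is hypothetical, and the example call assumes the file sits at bin/cassandra.in.sh under the top Cassandra directory, so adjust the path if yours differs.

```shell
# Back up a config file, then rewrite its -Xmx setting to the given size.
# set_max_heap is a hypothetical helper: $1 is the file, $2 the heap size.
set_max_heap() {
    conf="$1"
    heap="$2"
    # Keep a copy of the original next to it before changing anything.
    cp "$conf" "$conf.bak" || return 1
    # Replace any existing -Xmx<number><unit> with the new value.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]*/-Xmx$heap/" "$conf.bak" > "$conf"
}

# Example call, run from the top Cassandra directory
# (bin/cassandra.in.sh is an assumed location for the file):
#   set_max_heap bin/cassandra.in.sh 512M
```

The .bak copy doubles as the backup mentioned above, so restoring the original is a single cp in the other direction.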
I also happened to notice that running bin/cassandra -f was returning more errors. Many of them turned out to be warnings and not fatal errors. The way to resolve the problem is described here. Just remove -Werror from extconf.rb and you’ll be good to go.
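Removing the flag can also be done with sed. This is only a sketch: strip_werror is a made-up helper name, and since the location of extconf.rb isn’t given here, you pass in whatever path it has on your server.

```shell
# Remove every -Werror flag (and the whitespace before it) from a file,
# keeping a .bak copy of the original. strip_werror is a hypothetical name.
strip_werror() {
    file="$1"
    cp "$file" "$file.bak" || return 1
    sed 's/[[:space:]]*-Werror//g' "$file.bak" > "$file"
}

# Example call (the path is a placeholder, not the real location):
#   strip_werror /path/to/extconf.rb
```

With -Werror gone the compiler still prints the warnings, it just no longer treats them as fatal.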
You should now be able to run bin/cassandra -f from the top Cassandra directory and Cassandra will boot up. In order to get the Cassandra client up and running you’ll need to get Cassandra to run in the background. You can do this by running the bin/cassandra command without the -f option, or alternatively leave Cassandra running in the foreground, load up another terminal and run the client with bin/cassandra-cli --host localhost --port 9160. Either way will work. Also, make sure you aren’t running as root when you run Cassandra or you could get permission problems next time you try to load Cassandra.
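The background option plus the root warning can be wrapped into one small launcher. Treat this as a sketch under assumptions: not_root and start_cassandra are invented names, the 10 second pause is a crude guess at how long the server needs before the client can connect, and the script expects to be run from the top Cassandra directory.

```shell
# Succeed only when the given numeric user id is not root (0).
# With no argument, check the current user.
not_root() {
    uid="${1:-$(id -u)}"
    [ "$uid" -ne 0 ]
}

# Start Cassandra in the background, then open the command line client.
start_cassandra() {
    if ! not_root; then
        echo "Refusing to start Cassandra as root" >&2
        return 1
    fi
    bin/cassandra || return 1   # no -f, so it runs in the background
    sleep 10                    # assumed pause while the server boots
    bin/cassandra-cli --host localhost --port 9160
}

# Nothing runs until you call it from the top Cassandra directory:
#   start_cassandra
```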
That’s it! While installing Cassandra should not be this complicated, MediaTemple appears to use old versions of software on their servers (or at least versions that aren’t compatible with Cassandra). In future articles I will describe more about my experience with Cassandra and actually developing with the database in Python and PHP.