Tired of generating new IDs within a distributed database system for billions of messages? We’re getting tired too! Thankfully, Twitter open sourced a new project this evening called “Snowflake”, which lets developers generate unique ID numbers by combining a timestamp, a worker number, and a sequence number, with coordination handled through Apache ZooKeeper.
If all this sounds relatively complex, that’s because it is. Twitter is dealing with billions of messages that need to be indexed in a short amount of time. As such, they needed a solution that goes beyond the plain integers they previously used. If your startup isn’t dealing with such scalability issues, don’t be jealous! However, once you do get to such levels, Facebook and Twitter have both open sourced enough products that you should have plenty of help.
Given that Twitter is producing so many messages at such a quick rate, they’ve decided to use Cassandra (a project first open-sourced by Facebook) as well as sharded MySQL databases (using Gizzard, an automated sharding system first open-sourced by Twitter) to scale quickly. To give each of these messages a unique identifier, the company needed a solution that works within a distributed system, where no single database hands out the next number.
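For the curious, here is a rough Python sketch of the idea of combining a timestamp, a worker number, and a sequence number into a single ID. It is not Twitter's code: the bit widths (41, 10, and 12), the `EPOCH_MS` value, and the names `compose_id`, `decompose_id`, and `IdGenerator` are all assumptions made for illustration.

```python
import threading
import time

# Assumed bit widths (not stated in the article): 41 + 10 + 12 = 63 bits.
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

# Hypothetical custom epoch in milliseconds (2010-01-01 UTC).
EPOCH_MS = 1262304000000


def compose_id(timestamp_ms, worker, sequence):
    """Pack the three fields into one integer: timestamp | worker | sequence."""
    return (
        ((timestamp_ms - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(value):
    """Unpack an ID back into (timestamp_ms, worker, sequence)."""
    sequence = value & MAX_SEQUENCE
    worker = (value >> SEQUENCE_BITS) & MAX_WORKER
    timestamp_ms = (value >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, worker, sequence


class IdGenerator:
    """Hands out IDs for one worker; each worker needs its own number."""

    def __init__(self, worker, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker <= MAX_WORKER:
            raise ValueError("worker number out of range")
        self.worker = worker
        self.clock = clock  # injectable so the sketch can be tested
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence number.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return compose_id(now, self.worker, self.sequence)


if __name__ == "__main__":
    gen = IdGenerator(worker=1)
    new_id = gen.next_id()
    print(new_id, decompose_id(new_id))
```

Because the timestamp sits in the highest bits, IDs from this sketch sort roughly by creation time, and two workers never collide as long as their worker numbers differ. The sketch takes the worker number as a constructor argument and leaves out any ZooKeeper coordination.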
If you want to learn more about building the infrastructure for large-scale web applications, come to our upcoming Social Developer Summit later this month in San Francisco.
Snowflake image found via Discovery Channel.