What Is RocksDB, And Why Is Facebook Open-Sourcing It?

Facebook announced that it is open-sourcing its RocksDB embeddable, persistent key-value store, which enables fast storage and global, real-time data fetching of the social network’s massive cache of user data.

The social network first made the announcement at its Data @ Scale conference last week, and engineer Dhruba Borthakur offered more details in a note on the Facebook Engineering page.

Borthakur explained why Facebook is open-sourcing RocksDB:

“Every time one of the 1.2 billion people who use Facebook visits the site, they see a completely unique, dynamically generated homepage. There are several different applications powering this experience — and others across the site — that require global, real-time data fetching.

“Storing and accessing hundreds of petabytes of data is a huge challenge, and we’re constantly improving and overhauling our tools to make this as fast and efficient as possible. Today, we are open-sourcing RocksDB, an embeddable, persistent key-value store for fast storage that we built and use here at Facebook.”

Borthakur went on to explain why Facebook uses an embedded database, and he offered specific details on the following four goals for RocksDB:

  • Scales to run on servers with many CPU cores.
  • Uses fast storage efficiently.
  • Is flexible to allow for innovation.
  • Supports IO-bound, in-memory, and write-once workloads.

He also offered his insights on the architecture and performance of RocksDB, and detailed typical workloads it is suitable for:

  • A user-facing application that stores the viewing history and state of users of a website.
  • A spam-detection application that needs fast access.
  • A Graph Search query that needs to scan a data set in real time.
  • An app that needs to query Hadoop in real time.
  • A message queue that supports a high number of inserts and deletes.

And Borthakur concluded:

“Our use cases for RocksDB have grown tremendously, and we have close to 1 petabyte of data across different applications being managed by RocksDB today. We’re excited to release RocksDB to the community and hope people will find it as useful as we have.

“Our code is now live at http://github.com/facebook/rocksdb, and we hope that software programmers and database developers will use, enhance, and customize RocksDB for their own use cases. We’re looking forward to hearing your feedback and continuing to improve RocksDB and add more features. Check out our RocksDB Facebook group to join the conversation.”

Readers: Had you heard of RocksDB before reading this post?