Facebook VP Of Infrastructure Engineering Jay Parikh Talks Big Data, Project Prism

By David Cohen 

Facebook Vice President of Infrastructure Engineering Jay Parikh offered up some data on big data at Facebook’s headquarters in Menlo Park, Calif., Wednesday, sharing statistics with reporters and describing the social network’s Project Prism data-management effort.

Parikh said that Facebook currently stores its entire live user database in a single data center, using its other data centers for redundancy and other data, according to TechCrunch. When the main database outgrows one data center, it is moved to another that has been expanded to support it. Parikh added that Project Prism will allow the social network to split up the live user database and host it across all of its data centers.

Among the data on big data shared by Parikh, as reported by TechCrunch:

  • Facebook’s system processes 2.5 billion pieces of content and more than 500 terabytes of data daily, along with 2.7 billion likes and 300 million photos.
  • The social network scans some 105 TB of data every half-hour.
  • More than 100 petabytes of data are stored in a single Hadoop disk cluster.
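
To put those daily totals in perspective, they can be converted into rough per-second rates. The short Python sketch below does only that arithmetic. It assumes decimal units (1 TB = 1,000 GB), which the figures above do not specify, and it treats "more than 500 terabytes" as exactly 500, so the ingest rate is a lower bound. The variable names are illustrative only.

```python
# Back-of-the-envelope rates derived from the figures quoted above.
# Assumption: decimal units (1 TB = 1,000 GB); the article does not say which.
# Assumption: load is spread evenly over the period, which is a simplification.

SECONDS_PER_DAY = 24 * 60 * 60         # 86,400
SECONDS_PER_HALF_HOUR = 30 * 60        # 1,800
GB_PER_TB = 1_000


def per_second(total: float, seconds: int) -> float:
    """Average rate per second for a total spread evenly over `seconds`."""
    return total / seconds


# "More than 500 terabytes of data daily" (treated as exactly 500 TB).
ingest_gb_per_s = per_second(500 * GB_PER_TB, SECONDS_PER_DAY)

# "Some 105 TB of data every half-hour."
scan_gb_per_s = per_second(105 * GB_PER_TB, SECONDS_PER_HALF_HOUR)

# "2.7 billion likes and 300 million photos" per day.
likes_per_s = per_second(2_700_000_000, SECONDS_PER_DAY)
photos_per_s = per_second(300_000_000, SECONDS_PER_DAY)

print(f"Data processed: at least {ingest_gb_per_s:.1f} GB/s")
print(f"Data scanned:   about {scan_gb_per_s:.1f} GB/s")
print(f"Likes:          about {likes_per_s:,.0f} per second")
print(f"Photos:         about {photos_per_s:,.0f} per second")
```

On these assumptions, the scanning rate works out to roughly 10 times the daily processing rate, which is consistent with analysis jobs repeatedly reading data that has already been stored.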

Parikh said, as reported by TechCrunch:

“Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.”

“No one will care that you have 100 petabytes of data in your warehouse. The world is getting hungrier and hungrier for data.”

“We’re tracking how ads are doing across different dimensions of users across our site, based on gender, age, and interests. Actually, this ad is doing better in California, so we should show more of this ad in California to make it more successful.”

“Project Prism lets us take this monolithic warehouse … and physically separate (it) but maintain one view of the data.”