Facebook announced at the end of September that posts, status updates, photo captions, check-ins, and comments were being added to Graph Search results. Ashoat Tevosyan, an engineer on the social network’s search quality and ranking team, explained how monumental a task this was and how it all began.
Tevosyan described the technical challenges in a note on the Facebook Engineering page:
One of the biggest challenges we faced in building this product was that Facebook’s underlying data schema reflects the needs of a rapidly iterated Web service. New features often demand changes to data schemas, and our culture aims to make these changes easy for engineers. However, these variations make it difficult to sort posts by time, location, and tags as wall posts, photos, and check-ins all store this information differently. We currently have 70 different kinds of data we sort and index on, many of them specific to certain types of posts. In addition, the data is contained in a production MySQL database. This means that harvesting this data puts a significant load on databases that are simultaneously serving production traffic, and the process must be carefully monitored.
The posts index is much larger than other search indexes that Facebook maintains. Before we started working on the ability to search through posts, all Facebook search indexes were served entirely from RAM. This was ideal for quick lookup and was a tenable setup given reasonably small search indexes. However, storing more than 700 terabytes in RAM imposes a large amount of overhead, as it involves maintaining an index that is spread across many racks of machines. The performance cost of having these machines coordinate with each other drove the Unicorn team to look into new solutions for serving the posts index. The solution we decided on involves storing the majority of the index on solid-state flash memory. We managed to preserve performance by carefully separating out the most frequently accessed data structures and placing those in RAM.
With a trillion posts in the index, most queries return many more results than anyone could ever read. This leads us to the results ranking step. To surface content that is valuable and relevant to the user, we use two primary techniques: query rewriting and dynamic result scoring. Query rewriting happens before the execution of the query, and involves tacking on optional clauses to search queries that bias the posts we retrieve towards results that we think will be more valuable to the user. Result scoring involves sorting and selecting documents based on a number of ranking “features,” each of which is based on the information available in the document data. In total, we currently calculate well over 100 distinct ranking features that are combined with a ranking model to find the best results. We will continue to work on refining these models as we roll out to more users and listen to feedback.
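To make the two ranking techniques Tevosyan describes more concrete, here is a minimal Python sketch of query rewriting followed by feature-based result scoring. It is an illustration only, not Facebook's implementation: the `Post` fields, the single "written by a friend" optional clause, the three features, and the hand-picked `WEIGHTS` are all assumptions made up for this example, and a simple weighted sum stands in for the ranking model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Post:
    # Hypothetical document fields, chosen only for this illustration.
    post_id: int
    text: str
    author_is_friend: bool
    age_days: float
    like_count: int


Clause = Callable[[Post], bool]


@dataclass
class Query:
    required_terms: list[str]
    optional_clauses: list[Clause] = field(default_factory=list)


def rewrite_query(raw: str) -> Query:
    """Query rewriting: add optional clauses before the query is executed."""
    query = Query(required_terms=raw.lower().split())
    # Assumed bias: prefer posts written by the searcher's friends.
    query.optional_clauses.append(lambda post: post.author_is_friend)
    return query


def retrieve(query: Query, posts: list[Post], limit: int = 100) -> list[Post]:
    """Keep posts containing every required term; optional clauses only
    decide which matches survive the retrieval limit."""
    matches = [
        p for p in posts
        if all(term in p.text.lower().split() for term in query.required_terms)
    ]
    matches.sort(
        key=lambda p: sum(clause(p) for clause in query.optional_clauses),
        reverse=True,
    )
    return matches[:limit]


def extract_features(post: Post) -> dict[str, float]:
    """Ranking features computed from the data stored with the document."""
    return {
        "recency": 1.0 / (1.0 + post.age_days),
        "engagement": math.log1p(post.like_count),
        "friend": 1.0 if post.author_is_friend else 0.0,
    }


# Assumed weights; a real system would learn a ranking model instead.
WEIGHTS = {"recency": 1.0, "engagement": 0.5, "friend": 2.0}


def score(post: Post) -> float:
    """Combine the features into one score with a weighted sum."""
    return sum(WEIGHTS[name] * value for name, value in extract_features(post).items())


def search(raw: str, posts: list[Post], top_k: int = 10) -> list[Post]:
    candidates = retrieve(rewrite_query(raw), posts)
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

Calling `search("hiking", posts)` on a list of `Post` objects returns the matching posts, best first. The split mirrors the description above: rewriting shapes which posts are retrieved at all, while scoring orders whatever was retrieved.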
Like many Facebook features, the ability to search posts in Graph Search started at one of the social network’s hackathons. Tevosyan took part in it on his second day with the company, as an intern:
Like many other products at Facebook, the ability to search over posts was originally conceived as a hackathon project. My second day as a Facebook intern coincided with a companywide hackathon, and I spent the night aiming to implement a way for my friends and I to find old posts we had written. I quickly discovered that the project was much more challenging than I had first anticipated. However, the engineering culture at Facebook meant that I was supported and encouraged to continue working on it, despite the scope of the project. The majority of the work — infrastructure, ranking, and product — has been accomplished in the past year by a few dozen engineers on the Graph Search team. We are very excited to share the results of our hard work with people using Facebook, and look forward to seeing how people use the new feature and improving it based on their feedback. We hope that being able to search for posts will enable a richer Facebook experience for everyone.
Readers: Have you begun taking advantage of the new capabilities of Graph Search?