The Nuts and Bolts Behind Facebook’s 360-Degree Videos

Facebook is running a pyramid scheme. The goal is not stealing users’ money—rather, the pyramid describes the social network’s process for handling 360-degree videos, including for virtual reality.

The social network announced a host of updates at its Video @Scale event at its headquarters in Menlo Park, Calif., including a detailed explanation of how it encodes 360 video with a pyramid geometry.

The code for Facebook’s custom filter, which transforms 360-degree videos uploaded by users into the cube map format, is now available on GitHub. Announcements related to encoding 360 videos for streaming in VR without buffering included:

  • A move from equirectangular layouts to a cube map format reduces file size by 25 percent against the original.
  • Encoding 360 video with a pyramid geometry (see more below) reduces file size by 80 percent against the original.
  • View-dependent adaptive bit-rate streaming allows Facebook to optimize the experience in VR (a sketch of the idea follows this list).
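
The view-dependent piece can be pictured as a nearest-orientation lookup: the server pre-encodes several pyramid-projected variants of the video, each oriented toward a different view direction, and the client streams whichever variant best matches the viewer’s current gaze. The Python sketch below is a minimal illustration, not Facebook’s implementation; the variant count and spacing in VARIANTS are assumptions.

    import math

    # Hypothetical set of pre-encoded pyramid variants, one per view direction
    # (yaw, pitch in degrees). The count and spacing are assumptions for this
    # sketch, not Facebook's actual configuration.
    VARIANTS = [(yaw, pitch) for pitch in (-45, 0, 45) for yaw in range(0, 360, 36)]

    def angular_distance(a, b):
        """Great-circle angle in degrees between two (yaw, pitch) directions."""
        ya, pa = (math.radians(x) for x in a)
        yb, pb = (math.radians(x) for x in b)
        cos_angle = (math.sin(pa) * math.sin(pb)
                     + math.cos(pa) * math.cos(pb) * math.cos(ya - yb))
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

    def pick_variant(head_yaw, head_pitch):
        """Stream the variant whose full-resolution base faces the viewer's gaze."""
        return min(VARIANTS, key=lambda v: angular_distance(v, (head_yaw, head_pitch)))

    print(pick_variant(100.0, 10.0))  # -> (108, 0) with the grid above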

Facebook also built a new streaming video engine (SVE) for transcoding video, migrating all of its traffic (on Facebook and Messenger) from the old system without service interruptions. Announcements included:

  • Rather than treating videos as single files, the social network’s new SVE splits them into segments, enabling parallel processing, which results in lower latency (see the sketch after this list). Encoding starts as videos are uploaded to Facebook, further reducing latency.
  • Work on the new SVE began in January 2015, and it took about nine months to build and deploy.
  • The new SVE resulted in a 10X improvement in processing time between video uploads and playback.
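
As a rough illustration of the segment-splitting idea, here is a minimal Python sketch. The fixed segment size, the naive byte-level split, and the encode_segment stand-in are all assumptions; a real pipeline cuts on keyframe boundaries and invokes an actual encoder per segment.

    from concurrent.futures import ProcessPoolExecutor

    def encode_segment(segment):
        # Stand-in for real per-segment transcoding (e.g. invoking an encoder
        # on one keyframe-aligned chunk); here it just passes bytes through.
        return segment

    def split_into_segments(video, segment_size):
        # Naive fixed-size split; a real pipeline cuts on keyframe boundaries
        # so each segment can be decoded and encoded independently.
        return [video[i:i + segment_size] for i in range(0, len(video), segment_size)]

    def transcode(video):
        segments = split_into_segments(video, segment_size=4 * 1024 * 1024)
        with ProcessPoolExecutor() as pool:
            encoded = list(pool.map(encode_segment, segments))  # parallel encode
        return b"".join(encoded)  # reassemble in the original order

    if __name__ == "__main__":
        print(len(transcode(b"\x00" * (10 * 1024 * 1024))))  # 10 MB in, 10 MB out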

Finally, Facebook detailed the artificial intelligence architecture created by its Vision Understanding team, which it called “an early step toward unsupervised understanding of what is happening in a video and predicting what’s going to happen next”:

  • The social network said its Vision Understanding team is studying ways to understand video with unsupervised learning models, adding that rather than labeling objects, scenes and actions, it deals with voxels: individual video pixels over time (illustrated in the sketch below).
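
To make the voxel framing concrete, the toy sketch below treats a grayscale clip as a 3D array of pixels over time and carves it into small spatiotemporal cubes. All shapes and block sizes here are arbitrary assumptions for illustration, not details of Facebook’s models.

    import numpy as np

    # Toy "voxel" view of video: a grayscale clip as a (time, height, width)
    # array, carved into small spatiotemporal cubes.
    video = np.random.rand(16, 64, 64)  # 16 frames of 64x64 grayscale

    def voxel_blocks(clip, t, h, w):
        # Yield non-overlapping t x h x w blocks: pixels extended over time.
        T, H, W = clip.shape
        for ti in range(0, T - t + 1, t):
            for hi in range(0, H - h + 1, h):
                for wi in range(0, W - w + 1, w):
                    yield clip[ti:ti + t, hi:hi + h, wi:wi + w]

    blocks = list(voxel_blocks(video, 4, 8, 8))
    print(len(blocks))  # 256 cubes of 4 frames x 8 x 8 pixels each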

Facebook provided a detailed look at its pyramid geometry for 360-degree video in a blog post by software engineers Evgeny Kuzyakov and David Pio. Highlights include:

Video is an increasingly popular means of sharing our experiences and connecting with the things and people we care about. Both 360 video and VR create immersive environments that engender a sense of connectedness, detail and intimacy.

Of course, all of this richness creates a new and difficult set of engineering challenges. The file sizes are so large that they can be an impediment to delivering 360 video or VR in a quality manner at scale. We’ve reached a couple of milestones in that effort, building off traditional mapping techniques that have been powerful tools in computer graphics, image processing and compression. We took these well-known ideas and extended them in a couple of ways to meet the high bandwidth and quality needs of the next-generation displays. First, we’ll discuss our work around our 360 video filter and its source code, which we’ll be making available today.

We encountered several engineering challenges while building 360 video for News Feed. Our primary goal was to tackle the drawback of the standard equirectangular layout for 360 videos, which flattens the sphere around the viewer onto a 2D surface. This layout creates warped images and contains redundant information at the top and bottom of the image—much like Antarctica is stretched into a linear landmass on a map even though it’s a circular landmass on a globe.
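
The redundancy follows directly from the projection geometry: every pixel row in an equirectangular frame has the same width, but the circle it represents on the sphere shrinks by the cosine of the latitude. A quick sketch of that oversampling factor (a simple spherical model, not code from Facebook’s filter):

    import math

    # Horizontal oversampling of an equirectangular layout: every pixel row has
    # the same width, but the circle it represents on the sphere shrinks by
    # cos(latitude), so rows near the poles carry redundant pixels.
    def oversampling(latitude_deg):
        return 1.0 / math.cos(math.radians(latitude_deg))

    for lat in (0, 45, 60, 80, 89):
        print(f"{lat:2d} deg latitude: {oversampling(lat):5.1f}x the needed pixels")
    # 0 deg: 1.0x; 60 deg: 2.0x; 89 deg: ~57.3x -- the stretched-Antarctica effect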

Our solution was to remap equirectangular layouts to cube maps. We did this by transforming the top 25 percent of the video to one cube face and the bottom 25 percent to another, dividing the middle 50 percent into the four remaining cube faces, and then aligning the six faces in two rows. This reduced file size by 25 percent against the original, an efficiency that matters when working at Facebook’s scale.
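
Remaps like this are typically implemented as an inverse mapping: for each pixel of an output cube face, build the 3D ray through that pixel, convert the ray to latitude and longitude, and sample the source equirectangular frame there. The sketch below assumes one common face-orientation convention, which is not necessarily the one Facebook’s filter uses.

    import math

    # One common face-orientation convention: each face has a center direction,
    # a "right" axis, and an "up" axis; not necessarily the convention used by
    # Facebook's filter.
    FACES = {
        "front":  ((0, 0, -1), (1, 0, 0),  (0, -1, 0)),
        "back":   ((0, 0, 1),  (-1, 0, 0), (0, -1, 0)),
        "left":   ((-1, 0, 0), (0, 0, -1), (0, -1, 0)),
        "right":  ((1, 0, 0),  (0, 0, 1),  (0, -1, 0)),
        "top":    ((0, 1, 0),  (1, 0, 0),  (0, 0, 1)),
        "bottom": ((0, -1, 0), (1, 0, 0),  (0, 0, -1)),
    }

    def equirect_coords(face, u, v):
        # Map a cube-face pixel (u, v in [-1, 1]) to normalized (x, y) source
        # coordinates in the equirectangular frame.
        center, right, up = FACES[face]
        d = [center[i] + u * right[i] + v * up[i] for i in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        lon = math.atan2(d[0], -d[2])   # longitude in [-pi, pi]
        lat = math.asin(d[1] / norm)    # latitude in [-pi/2, pi/2]
        return lon / (2 * math.pi) + 0.5, 0.5 - lat / math.pi

    print(equirect_coords("front", 0.0, 0.0))  # face center -> (0.5, 0.5)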
