The Nuts and Bolts Behind Facebook’s 360-Degree Videos

By David Cohen


Facebook is running a pyramid scheme. The goal is not to steal users’ money; rather, the pyramid describes the social network’s process for handling 360-degree videos, including for virtual reality.

The social network announced a host of updates at its Video @Scale event at its headquarters in Menlo Park, Calif., including a detailed explanation of how it encodes 360 video with a pyramid geometry.

The code for Facebook’s custom filter, which transforms 360-degree videos uploaded by users into the cube map format, is now available on GitHub. The announcements related to encoding 360 videos so that they stream in VR without buffering were:

  • A move from equirectangular layouts to a cube format reduces file size by 25 percent against the original.
  • Encoding 360 video with a pyramid geometry (see more below) reduces file size by 80 percent against the original.
  • View-dependent adaptive bit-rate streaming allows Facebook to optimize the experience in VR.
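
To make the view-dependent idea concrete, here is a minimal Python sketch. It assumes the video has been pre-encoded once per viewing direction and at several bit rates, and that the player picks the version closest to where the viewer is looking that the connection can sustain. The function names, the four viewports and the bit-rate ladder are hypothetical, chosen for illustration, and are not taken from Facebook's player.

```python
import math

def angular_distance(a, b):
    """Angle in radians between two (yaw, pitch) directions given in degrees."""
    yaw_a, pitch_a = (math.radians(x) for x in a)
    yaw_b, pitch_b = (math.radians(x) for x in b)
    # Spherical law of cosines, clamped to guard against rounding error.
    cos_angle = (math.sin(pitch_a) * math.sin(pitch_b)
                 + math.cos(pitch_a) * math.cos(pitch_b) * math.cos(yaw_a - yaw_b))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def pick_stream(head, viewports, bitrates_kbps, bandwidth_kbps):
    """Choose the pre-encoded viewport nearest the head direction and the
    highest bit rate that fits the measured bandwidth (hypothetical policy)."""
    viewport = min(viewports, key=lambda v: angular_distance(head, v))
    affordable = [b for b in bitrates_kbps if b <= bandwidth_kbps]
    # Fall back to the lowest bit rate when nothing fits.
    bitrate = max(affordable) if affordable else min(bitrates_kbps)
    return viewport, bitrate

# Hypothetical setup: four viewports around the horizon, three bit rates.
VIEWPORTS = [(0, 0), (90, 0), (180, 0), (270, 0)]
BITRATES_KBPS = [1000, 4000, 8000]
```

A viewer looking at yaw 80, pitch 5 on a 5,000 kbps connection would be served the (90, 0) viewport at 4,000 kbps under these assumptions.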

Facebook also built a new streaming video engine (SVE) for transcoding video, migrating all of its traffic (on Facebook and Messenger) from the old system without service interruptions. Announcements included:

  • Rather than treating videos as single files, the social network’s new SVE splits them into segments, enabling parallel processing, which results in lower latency. Encoding starts as videos are uploaded to Facebook, further reducing latency.
  • Work on the new SVE began in January 2015, and it took about nine months to build and deploy.
  • The new SVE resulted in a 10X improvement in processing time between video uploads and playback.
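
The segment-based approach described above can be sketched in a few lines of Python. This is only an illustration of the general pattern of splitting, processing in parallel and reassembling in order; it is not Facebook's engine. The function names are hypothetical, and `encode_segment` is a stand-in that upper-cases bytes where a real encoder would run.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_segments(data, segment_size):
    """Cut the upload into fixed-size chunks instead of treating it as one file."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

def encode_segment(segment):
    """Placeholder for a real encoder (assumption): just upper-cases the bytes."""
    return segment.upper()

def process_video(data, segment_size=4, workers=4):
    """Encode all segments in parallel and stitch the results back together."""
    segments = split_into_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so playback order is preserved.
        encoded = list(pool.map(encode_segment, segments))
    return b"".join(encoded)
```

Because each segment is independent, work on the first segments could begin while later ones are still arriving, which is the latency benefit the bullets describe.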

Finally, Facebook detailed the artificial intelligence architecture created by its Vision Understanding team, which it called “an early step toward unsupervised understanding of what is happening in a video and predicting what’s going to happen next”:

  • The social network said its Vision Understanding team is studying ways to understand video with unsupervised learning models, adding that rather than labeling objects, scenes and actions, it deals with voxels, or individual video pixels over time.

Facebook provided a detailed look at its pyramid geometry for 360-degree video in a blog post by software engineers Evgeny Kuzyakov and David Pio. Highlights include:

Video is an increasingly popular means of sharing our experiences and connecting with the things and people we care about. Both 360 video and VR create immersive environments that engender a sense of connectedness, detail and intimacy.

Of course, all of this richness creates a new and difficult set of engineering challenges. The file sizes are so large that they can be an impediment to delivering 360 video or VR in a quality manner at scale. We’ve reached a couple of milestones in that effort, building off traditional mapping techniques that have been powerful tools in computer graphics, image processing and compression. We took these well-known ideas and extended them in a couple of ways to meet the high bandwidth and quality needs of the next-generation displays. First, we’ll discuss our work around our 360 video filter and its source code, which we’ll be making available today.

We encountered several engineering challenges while building 360 video for News Feed. Our primary goal was to tackle the drawback of the standard equirectangular layout for 360 videos, which flattens the sphere around the viewer onto a 2D surface. This layout creates warped images and contains redundant information at the top and bottom of the image—much like Antarctica is stretched into a linear landmass on a map even though it’s a circular landmass on a globe.

Our solution was to remap equirectangular layouts to cube maps. We did this by transforming the top 25 percent of the video to one cube face and the bottom 25 percent to another, dividing the middle 50 percent into the four remaining cube faces, and then later aligning them in two rows. This reduced file size by 25 percent against the original, which is an efficiency that matters when working at Facebook’s scale.
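
One way to see where a 25 percent figure can come from is to count pixels. The Python sketch below assumes a 2:1 equirectangular frame and a cube face whose side is one quarter of the frame width, so the middle 50 percent of the frame splits exactly into four square faces. Both assumptions and the function names are illustrative, and pixel count is only a rough proxy for file size.

```python
def equirect_pixels(width):
    """Pixels in a 2:1 equirectangular frame of the given width."""
    height = width // 2
    return width * height

def cube_map_pixels(width):
    """Pixels in six square faces, each with side width / 4 (assumed)."""
    face = width // 4
    return 6 * face * face

def cube_layout_size(width):
    """Frame size when the six faces are packed as two rows of three."""
    face = width // 4
    return 3 * face, 2 * face

# Example with a hypothetical 4096 x 2048 source frame.
ratio = cube_map_pixels(4096) / equirect_pixels(4096)
```

Under these assumptions the four middle faces keep their pixel count, while the top and bottom quarters each shrink from a full-width strip to a single square face, leaving three quarters of the original pixels.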

With cube maps, we put a sphere inside a cube, wrap an equirectangular image of a frame around the sphere and then expand the sphere until it fills the cube. We built on this idea by taking a pyramid geometry and applying it to 360 video for VR.
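
The sphere-in-a-cube picture corresponds to a standard piece of geometry, sketched here in Python. A point in the equirectangular frame is turned into a direction on the sphere, and that direction is pushed outward until it hits a cube face. The axis conventions, face names and function names are assumptions made for this sketch and are not taken from Facebook's filter.

```python
import math

def equirect_to_direction(u, v):
    """Map frame coordinates u, v in [0, 1] to a unit direction vector.
    u runs across the width (longitude), v runs down the height (latitude)."""
    lon = (u - 0.5) * 2 * math.pi
    lat = (0.5 - v) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z

def direction_to_cube_face(x, y, z):
    """Return (face, s, t) with s, t in [-1, 1] for a non-zero direction."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ay >= ax and ay >= az:
        face = "top" if y > 0 else "bottom"
        major, s, t = ay, x, z
    elif ax >= az:
        face = "right" if x > 0 else "left"
        major, s, t = ax, z, y
    else:
        face = "front" if z > 0 else "back"
        major, s, t = az, x, y
    # Dividing by the dominant axis expands the sphere point out to the cube.
    return face, s / major, t / major
```

With these conventions, the center of the frame lands in the middle of the front face, and the top edge of the frame lands on the top face.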

We start by putting a sphere inside a pyramid, so that the base of the pyramid is the full-resolution FOV and the sides of the pyramid gradually decrease in quality until they reach a point directly opposite from the viewport, behind the viewer. We can unwrap the sides and stretch them slightly to fit the entire 360 image into a rectangular frame, which makes processing easier and reduces the file size by 80 percent against the original.
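
To put the two reduction figures side by side, the short Python calculation below applies them to a hypothetical 1,000 MB equirectangular original. The starting size and the function name are assumptions for illustration; the percentages are the ones quoted in the text.

```python
def size_after_reduction(original_mb, reduction_percent):
    """Size left after cutting the original by the given percentage."""
    return original_mb * (100 - reduction_percent) / 100

ORIGINAL_MB = 1000  # hypothetical source file size

cube_mb = size_after_reduction(ORIGINAL_MB, 25)     # cube map layout
pyramid_mb = size_after_reduction(ORIGINAL_MB, 80)  # pyramid layout
```

Both figures are measured against the original, so on these numbers the pyramid version is roughly a quarter the size of the cube map version, not 80 percent smaller than it.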

[Images: VR sphere; viewer, pyramid and viewport; unfolding the pyramid (four steps); pyramid layout; 360 video in pyramid layout]

Facebook co-founder and CEO Mark Zuckerberg also uploaded a video describing the process.

Readers: What did you think of the information Facebook shared at Video @Scale?