Lingospot began in 2007 as a content recommendation service for publishers. At the time, the product analyzed the context of Forbes, Bloomberg, and Boston Globe content, then linked readers to other related content within the publisher's site. Lingospot has since pivoted, and over the past year and a half has built a similar type of product for TV viewing. Through metadata, the company attempts to understand what viewers are watching on TV – and more, what is taking place from moment to moment. Then, it tries to figure out how to improve that TV viewing experience.
“I think metadata is in its infancy when it comes to video,” Lingospot CEO Nikos Iatropoulos tells Lost Remote. “In a typical music scheduling system, you’ll find hundreds of metadata fields per song, and Pandora has said it uses over 450 different metadata fields per song for its Music Genome project. So, if a 5-minute song warrants 450 fields, how many should a one-hour segment of ESPN’s ‘SportsCenter,’ an episode of the ‘Tonight Show’ or an episode of ‘The Voice’ have? I think that we are merely scratching the surface when it comes to the use of TV metadata, and the next three to five years will be extremely interesting as some of the applications of this newfound intelligence about video start hitting the market.”
As more households become ‘connected,’ whether through smart TVs or OTT devices, viewers will want supplementary content while watching TV. This was the idea behind Pixie, which we recently wrote about — to provide additional information on the first screen rather than the second or third screens. For more on Lingospot, and how it intends to use metadata to bridge the gap between lean-back TV viewing and the interactivity of web content, we spoke with Iatropoulos.
LR: What is metadata and why is it so important to the broadcast industry?
Nikos Iatropoulos: Metadata is defined as “data about data.” More simply, we can think of metadata as data about content. For example, in the case of a webpage, metadata could include the language the page is written in. In the case of a TV show, metadata could be information about a particular TV series (name, season number, main actors, etc.). This is the information you will see in your electronic program guide (EPG)—the grid with channels on the vertical axis and time slots on the horizontal. Usually, TV metadata is limited to information about a season, or a particular episode. For example, it could include that you are watching the Tonight Show, and that the main guest is Heidi Klum. It would not, however, provide any information about what was actually discussed during Heidi’s appearance. That’s where Lingospot comes in. Our scene-level metadata can provide information about what is happening in each scene. In the above example, we would know that Heidi spoke about a specific challenge in Project Runway, or that she mentioned she had just come back from a trip to Tulum, Mexico. Similarly, if I were watching ESPN SportsCenter, the metadata available from existing providers would simply label the channel and show. Lingospot can go deeper, identifying a specific segment to be about the X Games and even tagging that segment with the athletes who were mentioned.
To understand why metadata is important, we must consider the consumer shift away from appointment TV viewing and the increasing prevalence of IP video delivery. The ability to search, discover and navigate will become even more essential to the way viewers find and consume TV content. Scene-level metadata can dramatically improve the effectiveness of this new mode of video discovery and consumption. In the above examples, it would allow a consumer to find all segments where a judge on Project Runway is mentioned, or, in the latter case, all segments of ESPN SportsCenter that talk about the X Games. Neither is possible without scene-level metadata.
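The difference between show-level and scene-level metadata, and the kind of search only the latter enables, can be sketched in a few lines of Python. The field names and tags below are illustrative, not Lingospot's actual schema:

```python
# Show-level (EPG) metadata: one record describes the whole broadcast.
epg_record = {
    "channel": "ESPN",
    "show": "SportsCenter",
    "start": "18:00",
    "end": "19:00",
}

# Scene-level metadata: one record per segment, tagged with the topics
# and people mentioned in that segment.
segments = [
    {"show": "SportsCenter", "offset_sec": 0,    "tags": ["NBA", "trade deadline"]},
    {"show": "SportsCenter", "offset_sec": 420,  "tags": ["X Games", "Shaun White"]},
    {"show": "SportsCenter", "offset_sec": 1260, "tags": ["X Games", "snowboarding"]},
]

def find_segments(segments, tag):
    """Return every segment tagged with the given topic or person."""
    return [s for s in segments if tag in s["tags"]]

# This query is impossible with the EPG record alone:
x_games_segments = find_segments(segments, "X Games")
print(len(x_games_segments))  # -> 2
```

With only the EPG record, a viewer could find SportsCenter; with scene-level tags, they can jump straight to the two X Games segments.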
LR: How does Lingospot get the required metadata to recommend content?
Iatropoulos: We look at multiple modalities of the TV broadcast: our natural language processing technology analyzes the closed captions of a TV channel; image analysis gives us additional information about what is happening in a scene; optical character recognition reads the text that shows up on screen; facial matching technologies identify who is on screen; and, soon, analysis of the background music will classify the mood of a scene.
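Merging several modalities into one set of scene tags might look like the sketch below. The extractor functions here are crude placeholders standing in for real NLP, OCR, and face-matching components; nothing about them reflects Lingospot's actual implementation:

```python
def tags_from_captions(captions):
    """Placeholder NLP: pull capitalized words out of the closed captions."""
    return {w.strip(".,") for w in captions.split() if w[:1].isupper()}

def tags_from_ocr(on_screen_text):
    """Placeholder OCR post-processing: treat each non-empty line as a tag."""
    return {line.strip() for line in on_screen_text.splitlines() if line.strip()}

def tags_from_faces(face_matches):
    """Placeholder facial matching: assume it already returns names."""
    return set(face_matches)

def analyze_scene(captions, on_screen_text, face_matches):
    """Union the evidence from each modality into a single tag set."""
    return (tags_from_captions(captions)
            | tags_from_ocr(on_screen_text)
            | tags_from_faces(face_matches))

scene_tags = analyze_scene(
    captions="Heidi talked about her trip to Tulum",
    on_screen_text="PROJECT RUNWAY",
    face_matches=["Heidi Klum"],
)
print(sorted(scene_tags))  # -> ['Heidi', 'Heidi Klum', 'PROJECT RUNWAY', 'Tulum']
```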
LR: Why is it necessary to accumulate so much information from so many different sources?
Iatropoulos: There is a tremendous variety of information available within a TV broadcast across different show genres. To offer a uniform metacontent experience across multiple genres, we need to look at all possible inputs to successfully reconstruct the context in each instance. For example, in a news broadcast from CNN or Fox News, much of the information will be present in the closed captions. Other information can be extracted from the text that shows up on screen. Facial matching can tell us which reporter is in each scene. There would be little or no music background, so music is not relevant. If we’re watching a movie, on the other hand, the transcript does not provide much useful information, since it’s mostly dialogue, so we need to pay closer attention to image analysis and sound. Reality, cooking, travel and sports shows each require different approaches as to how we leverage our inputs to better understand the context in each situation.
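The genre-dependent emphasis Iatropoulos describes could be modeled as per-genre weights over the input modalities. The weights below are illustrative guesses based on his examples (captions dominate news, music and faces matter more in movies), not Lingospot's tuning:

```python
# Relative weight each modality gets per genre; each row sums to 1.
MODALITY_WEIGHTS = {
    "news":   {"captions": 0.6, "ocr": 0.2, "faces": 0.2, "music": 0.0},
    "movie":  {"captions": 0.1, "ocr": 0.0, "faces": 0.4, "music": 0.5},
    "sports": {"captions": 0.4, "ocr": 0.3, "faces": 0.2, "music": 0.1},
}

def score_tag(genre, evidence):
    """Combine per-modality confidence scores (0..1) into one tag score."""
    weights = MODALITY_WEIGHTS[genre]
    return sum(weights[m] * evidence.get(m, 0.0) for m in weights)

# A tag strongly supported by captions scores high in a news broadcast...
print(round(score_tag("news", {"captions": 0.9, "ocr": 0.5}), 2))   # -> 0.64
# ...while the same caption evidence counts for little in a movie.
print(round(score_tag("movie", {"captions": 0.9, "ocr": 0.5}), 2))  # -> 0.09
```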
LR: How do you think metadata will affect the consumption of news?
Iatropoulos: I think metadata, especially metacontent, is going to have a huge impact on news consumption. It will bridge the gap between the lean-back experience of watching the news on TV and the interactive experience of news consumption online. A great example of this is already on the market from Turner in the form of the CNNx app and website. It allows viewers to lean back and watch the news in a linear fashion, while keeping detailed, deeper coverage of any particular topic available with one click. At Lingospot, we initially focused on the news and sports genres because we think they are extremely well-suited for TV metacontent. Viewers of these genres usually want to learn more about what they are watching, whether it be stats about the players in a game or deeper coverage of a breaking news item. There are some challenges because of the real-time nature of these genres, but we’ve managed to overcome most of them.
LR: Do you think the average news consumer wants to have so much information delivered to them at once, or could it potentially overwhelm them?
Iatropoulos: I think consumers want to have that choice. There are times when they will want more information, and times when they are probably seeing too much already. For example, I like investing in technology and healthcare, but have never cared much for energy and retail. So, if I’m watching a fund manager being interviewed on CNBC, I would probably ignore any energy stocks mentioned. If interesting tech or healthcare stocks are mentioned, however, I’d welcome having as much information as possible about that stock or company at my fingertips. This could include stock charts, analyst reports, executive profiles, financials and even tweets from the CEO’s account. Lingospot can make all of that available whenever a consumer decides they need it.
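The preference filtering Iatropoulos describes, surfacing extra information only for topics a viewer has opted into, reduces to a simple filter over tagged mentions. The tickers, sectors, and interest labels below are made up for illustration:

```python
# Sectors this viewer has opted into (from the CNBC example above).
viewer_interests = {"technology", "healthcare"}

# Stocks mentioned during a fund manager's interview, tagged by sector.
mentioned_stocks = [
    {"ticker": "AAPL", "sector": "technology"},
    {"ticker": "XOM",  "sector": "energy"},
    {"ticker": "PFE",  "sector": "healthcare"},
    {"ticker": "WMT",  "sector": "retail"},
]

def relevant_mentions(stocks, interests):
    """Keep only the stocks in sectors the viewer cares about."""
    return [s["ticker"] for s in stocks if s["sector"] in interests]

print(relevant_mentions(mentioned_stocks, viewer_interests))  # -> ['AAPL', 'PFE']
```

The energy and retail mentions are still in the metadata; they simply aren't surfaced until the viewer asks for them.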