If you’re like most of the 500 million LinkedIn users, you probably spend a few hours per week on the site reading articles, seeing what your professional pals are up to and, perhaps, networking for a new job.
Now imagine if a company that you’ve never heard of used automated bots to download your public profile (viewable via search engines such as Google), analyzed it to identify behavioral signals that you’re job shopping and warned your employer.
That’s exactly what a San Francisco-based startup called hiQ Labs can do with its software, which scrapes publicly available LinkedIn profiles to offer clients, according to its website, “a crystal ball that helps you determine skills gaps or turnover risks months ahead of time.”
Unsettling, isn’t it? Yet a judge decided on Aug. 14 that this was OK.
Judge Edward Chen of the U.S. District Court in San Francisco sided with hiQ, which claimed in a lawsuit that Microsoft-owned LinkedIn violated antitrust laws when it blocked the startup from accessing such data. In granting hiQ a preliminary injunction, he ordered LinkedIn to remove the barriers within 24 hours. LinkedIn has filed an appeal.
The ruling is baffling and disturbing. It contradicts years of legislation and court decisions clamping down on web scraping—the often-harmful practice of using bots to extract data from websites, unbeknownst to the site operator. And it opens a Pandora’s box of questions about social media user privacy and the right of businesses to protect themselves from data hijacking.
There’s also the matter of fairness. LinkedIn spent years creating something of real value. Why must it now tolerate a parasite like hiQ, paying for the servers and bandwidth needed to handle all of that bot traffic on top of its human users, just so hiQ can ride LinkedIn’s coattails?
A victory for bad bots
Chen’s ruling sent a chill through those of us in the cybersecurity industry devoted to fighting web-scraping bots.
Scraping has existed for a long time and, in its good form, it’s a key underpinning of the internet. “Good bots” enable, for example, search engines to index web content, price comparison services to save consumers money and market researchers to gauge sentiment on social media.
“Bad bots,” however, access websites and fetch their content for purposes outside the site owner’s control. They are used for a variety of harmful activities, such as denial-of-service attacks, competitive data mining, online fraud, account hijacking, data theft, intellectual property theft, unauthorized vulnerability scans, spam and digital ad fraud.
Bad bots make up 20 percent of all web traffic and have become so rampant that Congress last year passed its first legislation specifically to target them—the Better Online Ticket Sales (BOTS) Act, which bans the use of software that circumvents security measures on ticket seller websites.
It’s difficult not to see Chen’s ruling as a win for the bad bots.
Stealing or fair use?
The central question in the LinkedIn case is whether hiQ was taking and using profile data without permission or was merely accessing publicly available information.
LinkedIn asserted in a cease-and-desist letter to hiQ in May that hiQ’s practices were illegal and violated the LinkedIn user agreement’s prohibition on scraping profiles. “LinkedIn has earned its members’ trust by acting vigilantly to keep their data secure. hiQ’s actions and products violate this trust,” the letter said.
hiQ had some powerful support in its case, including noted Harvard University constitutional law professor Laurence Tribe, who portrayed hiQ as an innovator following in the footsteps of web pioneers like Google.
“Data analytics on public information is a foundation stone of the modern internet,” Tribe wrote in a brief with two other lawyers. “Without such technologies, internet users would be unable to make sense of the billions of web pages that exist in this modern marketplace of ideas. To allow LinkedIn to impose debilitating financial and criminal liability on a startup for accessing public pages would have a widespread chilling effect on innovation across the country and thereby thwart valuable commercial and academic research.”
As much as I respect Tribe, he is wrong. First of all, the LinkedIn user agreement clearly states that no one is allowed to “copy, use, disclose or distribute any information obtained from the services, whether directly or through third parties (such as search engines), without the consent of LinkedIn.”
By claiming that it wasn’t bound by the user agreement because public profiles are viewable online through search engines, hiQ was exploiting a cheap loophole and flouting the spirit, if not the letter, of the rule.
Second, Tribe’s allusion to search engines is specious. Search engine bots create value for both sides: the sites they crawl benefit from being indexed and found by users. hiQ, however, provides zero value to LinkedIn; it is just looking for a free ride.
Furthermore, we’ve been down this path before. In a landmark 2013 case, a judge ruled in favor of the Associated Press in a lawsuit against Meltwater, a news monitoring service that used bots to crawl the internet for news. The AP alleged that Meltwater infringed its copyrights by delivering excerpts from 33 AP articles to Meltwater customers.
Although there are technical legal differences between the two cases (the LinkedIn case doesn’t involve copyrights), the principle is the same: Unauthorized use is unauthorized use.
Clarity is needed
It’s ironic that hiQ and its lawyers are playing the innovation card, because in reality, it’s innovation by companies like LinkedIn that is at stake.
Why should companies like hiQ be able to “innovate” on the backs of other companies? Why should we expect LinkedIn to give away the data it has collected, in effect punishing it for being successful? If this ruling stands, will Facebook, Twitter and every other large social media company have to keep the doors open for web scrapers?
Here’s hoping that Chen’s ruling is overturned on appeal. Here’s also hoping that the case helps spur a revision of the 1986 Computer Fraud and Abuse Act, which makes it unlawful to break into a computer to access or alter information and which LinkedIn unsuccessfully cited in its legal battle with hiQ. The law was written before many of today’s scenarios were even imagined, and we need Congress to step in and remove the ambiguities.
Moral and legal issues aside, the case serves as a good reminder of the importance of security technology in safeguarding infrastructure against bots. LinkedIn’s use of such technology enabled it to detect that its data was being scraped. It wasn’t the technology that failed LinkedIn; it was a curious ruling by a judge.