Data scraping is the process of extracting and copying specific information from a website or other database. It is the primary method by which internet search engines and web aggregators index and arrange the massive amount of data available on the internet or in other databases, such as consumer product data or social media profile data. Such information may be readily available through Google, or access to it may require accepting the terms of service of the relevant application.

The laws concerning the scraping of publicly available data are continually evolving and involve several statutory regimes, such as the Computer Fraud and Abuse Act ("CFAA"), privacy laws, and the Digital Millennium Copyright Act ("DMCA"), among other statutes. Through this article, the authors aim to: firstly, discuss the rules and policies concerning data scraping and commercial usage of data provided on the LinkedIn platform; secondly, analyze the legality of scraping publicly available data in light of the case of HiQ Labs Inc. v. LinkedIn Corp.; and lastly, discuss what the awaited decision holds and the way forward.

LinkedIn Rules and Policies

Rules Governing general usage:

A user's usage of the LinkedIn platform and its various other services, such as LinkedIn.com, LinkedIn-branded apps, LinkedIn Learning, and other LinkedIn-related sites, is governed by the following documents/agreements:

  1. User Agreement
  2. Privacy Policy
  3. Cookie Policy
  4. Copyright Policy
  5. California Consumer Privacy Disclosure

Further, the paid services are subject to certain specific terms in addition to the above-mentioned agreements.

Rules related to data scraping:

As per Clause 3.4 of the User Agreement, LinkedIn reserves the right to restrict, suspend, or terminate the User's account if the user "breaches the terms of this Contract or the law or are misusing the Services (e.g., violating any of the Dos and Don'ts or Professional Community Policies)." Some key Don'ts with respect to the usage of data available on LinkedIn are as follows:

"8.2. Don'ts

You agree that you will not:

  1. ...;
  2. Develop, support, or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins, and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services;
  3. ...;
  4. Copy, use, disclose or distribute any information obtained from the Services, whether directly or through third parties (such as search engines), without the consent of LinkedIn;
  5. ...;
  6. Violate the intellectual property rights of others, including copyrights, patents, trademarks, trade secrets, or other proprietary rights. For example, do not copy or distribute (except through the available sharing functionality) the posts or other content of others without their permission, which they may give by posting under a Creative Commons license;
    .....
  1. Rent, lease, loan, trade, sell/re-sell or otherwise monetize the Services or related data or access to the same, without LinkedIn's consent;
    .....
  1. Violate the Professional Community Policies or any additional terms concerning a specific Service that are provided when you sign up for or start using such Service, and the Bing Maps terms where applicable."

It is pertinent to note that while the above-mentioned Don'ts specifically prohibit scraping, the Dos and Don'ts do not clarify whether the public data of other users may be used for commercial purposes after further processing or analysis. However, any copying, use, or monetization of data available on LinkedIn without LinkedIn's consent would violate the User Agreement and the Policies, as the Professional Community Policies require the user to follow the rules, agreements, and policies of LinkedIn in letter and spirit.

Insight on Commercial Usage of data

In July 2019, LinkedIn introduced a commercial use limit on searching, which applies where a user's activity on LinkedIn indicates that the user is likely using LinkedIn for commercial purposes, such as hiring or prospecting. In such cases, LinkedIn suggests that the user upgrade to one of LinkedIn's Premium account plans, such as Premium Business, Sales Navigator, Sales Insight, Recruiter, etc., to increase the number of profile searches and views and to gain additional insights into user data.

HiQ v. LinkedIn apropos legality of scraping publicly available data

HiQ Labs, a data analytics startup, was suspended by LinkedIn for allegedly violating its User Agreement, as it scrapes information from LinkedIn members' public profiles (including name, job title, work history, and skills) and uses it to provide business analytics solutions to businesses.

In 2017, LinkedIn issued a cease-and-desist letter to hiQ, alleging that hiQ's usage of scraping bots violated LinkedIn's User Agreement, as well as the CFAA and other laws. HiQ responded by filing a lawsuit, seeking a declaration that it was not breaking any laws and an injunction prohibiting LinkedIn from denying it access to the data of LinkedIn's users. It is important to note that the decision of the 9th Circuit in this case, discussed below, is in conflict with the 2003 decision of the 1st Circuit Court of Appeals in the case of EF Cultural Travel BV v. Zefer Corp., where it was held that "where a publicly available website explicitly bans data scrapers (e.g., in its terms of service), further access by data scrapers is without authorization under the CFAA".

In 2019, the 9th U.S. Circuit Court of Appeals ("Court") rejected LinkedIn's contention that it was protecting its users' privacy and enforcing its User Agreement, emphasizing that "LinkedIn has only a non-exclusive license to the data shared on its platform, not an ownership interest." The Court identified the core business model of LinkedIn as a platform for professionals to share their information for commercial gains. The Court also noted that the very fact that LinkedIn has developed its own data analytics tool to generate revenue from its members' data supported the view that LinkedIn did not have "its members' privacy interests in mind". Consequently, LinkedIn was restrained from denying hiQ access to its LinkedIn account.

In 2021, on an appeal filed by LinkedIn, the Supreme Court of the U.S. vacated the decision of the Court and remanded the case for further review in light of Van Buren v. United States, to decide whether data scraping by hiQ amounted to unauthorized access. The Supreme Court in Van Buren overturned an 11th Circuit Court decision and adopted a narrow interpretation of "exceeds authorized access" under the CFAA, ruling that an individual "exceeds authorized access" when he or she gains access to a computer with authorization but then obtains information from off-limits areas of the computer, such as files, folders, or databases. Scraping of publicly available data from LinkedIn is therefore not likely to violate the CFAA, as LinkedIn's 'computers' are accessible to the public at large. On this reading, hiQ did not access the 'computers' of LinkedIn "without authorization" within the meaning of the CFAA.

The Way Forward

Since the user data available on LinkedIn is owned by the users themselves and LinkedIn has only a non-exclusive license to it, it can be argued that taking public data (e.g., User Name, Current Job, Experience, Education, etc.) from the platform does not violate LinkedIn's User Agreement and other policies. Accordingly, processing such data for commercial purposes would not violate LinkedIn's policies or the users' privacy, as is evident from the decision of the Court in the hiQ case.

However, where any data beyond the public data, such as data restricted only to connections or not visible to the public due to privacy settings, is extracted and processed for commercial purposes, such use would be open to dispute, as it may amount to unauthorized access to personal data and be punishable under the CFAA.

It may be asked whether, since the information is publicly available, there is any legal reason to disallow its usage. However, LinkedIn noted that hiQ's software bots were able to extract data on a large scale that was "far beyond what any individual person could do when viewing public profiles". The fact that users adhere to the guidelines and usage rules of an individual platform does not mean that they consent to their information being used by other companies.

If the Court rules in favor of hiQ on remand, more companies like hiQ may mine and exploit users' personal information on publicly available websites, regardless of the websites' terms and conditions and without fear of the CFAA. Such a ruling would undermine the effectiveness of the terms of service of websites and applications that explicitly prohibit data scraping.

Given Van Buren's restrictive interpretation of the CFAA's "exceeds authorized access" provision, it would not be surprising if the Court adhered to its 2019 opinion on access "without authorization." Still, it is unclear how the Court will decide when it hears the case again, especially considering the several episodes of large-scale scraping of social network content (including from LinkedIn) that have occurred in the last year, which may strengthen LinkedIn's policy argument about the need to preserve user privacy.

Data Scraping On LinkedIn Vis-À-Vis Commercial Usage

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.