Legal Landscape Of Web Scraping And Practice Tips

This Legal Alert is a follow-up to our June 3, 2021 Legal Alert, "Supreme Court Narrows Scope of the Computer Fraud and Abuse Act." It provides an overview of relevant legal developments as well as considerations for entities seeking to engage in web scraping.

Computer Fraud and Abuse Act

The Computer Fraud and Abuse Act ("CFAA") ¹ generally prohibits computer hacking and enumerates severe criminal penalties when an individual "intentionally accesses a computer without authorization or exceeds [his or her] authorized access." The CFAA also includes a private right of action in which persons suffering "damage" or "loss" as a result of a CFAA violation can sue the violator for money damages and equitable relief. ²

Two recent high-profile cases concerning CFAA interpretation will affect the legality of web scraping for publicly available information. As a reminder, web scraping is the process of extracting data from a website or specific webpage.

Van Buren v. United States ³ and "Exceeding Authorized Access"

On June 3, 2021, the U.S. Supreme Court ruled in Van Buren that an individual "exceeds authorized access" when such individual "accesses a computer with authorization but then obtains information located in particular areas of the computer—such as files, folders, or databases—that are off limits to [the individual]." In other words, the Court adopted a "gates-up-or-down" approach to CFAA liability. On the one hand, an individual who is authorized to access an entire computer system (i.e., the gates are up) may not be liable under the CFAA for accessing any portion of it. On the other hand, an individual who is granted access to only a limited portion of a computer system (i.e., the gates are down for the rest) may be liable under the CFAA for accessing areas of the system beyond those to which access was granted.

LinkedIn v. hiQ Labs, Inc. ⁴ and "Without Authorization"

On June 14, 2021, the U.S. Supreme Court granted LinkedIn's petition for certiorari in the hiQ case. The Court vacated the Ninth Circuit's prior ruling and remanded the case to the Ninth Circuit for further consideration in light of Van Buren. For purposes of CFAA liability, hiQ may turn on whether LinkedIn actually lowered the gate (i.e., limited access) to publicly available information on its website through technical restrictions and a formal revocation of access, such that hiQ's access to LinkedIn's data was "without authorization." Ultimately, the Ninth Circuit may determine that CFAA liability does not apply to publicly available website data and that the gate for public website content is always up.

Practice Tips

As of today, it is unclear how a court would apply the CFAA to publicly available information on a webpage. Beyond potential CFAA liability, the various state statutes and common law claims that may apply to web scraping are themselves ambiguous.

In light of the foregoing, entities seeking to engage in web scraping may consider the following:

Review the target website's terms of use/terms of service. This may allow the entity to understand what the website does and does not permit. An entity that skips this review may unknowingly violate prohibitions and subject itself to breach of contract claims. Alternatively, entities seeking to engage in web scraping may consider mitigating risk by entering into a licensing arrangement with the target website to access the desired data.
Avoid performing prohibited operations on websites. This may include avoiding tools that circumvent security measures put in place to deter automated data downloads, and honoring explicit limitations on access to data, such as prohibitions on duplication and storage. An entity that performs prohibited operations may subject itself to breach of contract claims.
Use the target website's relevant Application Programming Interface (API). An API is a set of procedures and communication protocols that provide access to the data of an application, operating system, or other services. Because APIs are controlled by the owner of the dataset in question, this approach may give an entity seeking relevant data clear access to the owner's publicly available data for free or at a set price.
Respect the robots.txt file on the target website. A robots.txt file contains instructions on how bots should treat a site when they access it. An entity that does not respect the robots.txt file may overload the target website with requests and cause the website to block the bot. The scraping entity could also cause physical harm to the computer network by consuming a significant portion of the target website's capacity, which may lead to trespass to chattels claims.
Monitor restrictive actions taken by the target website. These may include the use of CAPTCHAs, rate limits, and/or blocking of IP addresses. A scraping entity that disregards such restrictions, and thereby reproduces a portion of a website's database deemed to be a trade secret, may be subject to trade secret misappropriation claims.
Respect cease-and-desist letters. Although the Ninth Circuit previously opined on whether the formal revocation of hiQ's access to LinkedIn's information (in the form of a cease-and-desist letter) is enough to close the gate on hiQ's access to public information, that ruling has been vacated, so it is prudent to treat this as an unanswered legal question.
Avoid collecting personally identifiable information. Doing so may implicate laws governing the collection and use of personally identifiable information, such as the E.U.'s General Data Protection Regulation ("GDPR") and the California Consumer Privacy Act of 2018 ("CCPA"). A scraping entity that collects such information may be in violation of certain privacy statutes, some of which include a private right of action.
Avoid bypassing or deceptively creating access permissions to gain access to a computer network. This may include websites with username and password requirements. A scraping entity that bypasses such requirements could be accessing information that may no longer be deemed "publicly available." Additionally, web scraping that simulates organic human use of the website may constitute a misrepresentation on which the user of the web scraper intends the website to rely, which may lead to fraudulent misrepresentation claims.
Avoid using the exact same scraped data for commercial purposes. This may include reproducing scraped information on other website(s) and/or using such information for commercial gain. A scraping entity that does so may be subject to claims under the Digital Millennium Copyright Act ("DMCA"), which protects the creative selection, coordination, and arrangement of information and materials forming a database or compilation.
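
As an illustration of the robots.txt tip above, the following is a minimal Python sketch of how a scraper might consult a site's robots.txt rules before requesting a page. It uses the standard library's urllib.robotparser module. The sample rules, the "example-bot" user agent, the example.com URLs, and the helper names load_rules, may_fetch, and polite_delay are assumptions made for illustration only; a real scraper would fetch the target site's actual robots.txt file. The sketch covers the technical step only and does not address the other considerations listed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to request the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser, user_agent: str,
                 default_seconds: float = 1.0) -> float:
    """Return the site's Crawl-delay for this user agent, or a fallback value."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_seconds


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, "allowed:", may_fetch(rules, "example-bot", url))
    print("seconds to wait between requests:", polite_delay(rules, "example-bot"))
```

In practice, a scraper built along these lines would skip any URL for which may_fetch returns False and pause for the polite_delay interval between requests, which also helps avoid consuming a significant portion of the target website's capacity.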

Footnotes

1 18 U.S.C. § 1030

2 18 U.S.C. § 1030(g)

3 593 U.S. __ (2021)

4 938 F.3d 985 (9th Cir. 2019); No. 19-1116, 2021 WL 2405144 (U.S. June 14, 2021)

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
