The increasing datafication of our lives has spotlighted data as a key resource of our generation. As the value of data has risen, web crawling (hereinafter referred to as "crawling"), a technique for collecting data from the web, has quickly gained relevance. Crawling automatically collects and indexes websites, hyperlinks, data, and other information resources that are distributed and stored across many computers.
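To illustrate the basic mechanism described above, the following is a minimal Python sketch of a crawler's core loop: it retrieves a page, extracts its hyperlinks, records them in an index, and queues newly discovered links. The function names (`extract_links`, `crawl`) and the example URLs are hypothetical, and `fetch` is passed in as a parameter (an assumption made so the sketch runs without network access). A real crawler would also need to honor robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links such as "/a" into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> Dict[str, List[str]]:
    """Breadth-first crawl returning an index of each fetched URL and its links.

    `fetch` is a hypothetical callable that returns a page's HTML, or None
    if the page cannot be retrieved.
    """
    index: Dict[str, List[str]] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # Page missing or unreachable; skip it.
        links = extract_links(html, url)
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


if __name__ == "__main__":
    # A hypothetical in-memory "site" stands in for real web pages.
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
        "https://example.com/a": '<a href="/">Home</a>',
    }
    for page, links in crawl("https://example.com/", site.get).items():
        print(page, "->", links)
```

The breadth-first queue and the `seen` set keep the sketch from revisiting pages, and `max_pages` caps how much is collected in one run.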
Although crawling is now widely used in many fields, social views on it remain divided, and Korean law does not clearly stipulate the standards for lawful crawling. Companies have complained of difficulties arising from this uncertainty, and critics have pointed to it as a hindrance to developing new industries that utilize data.
By reviewing past court decisions, this article analyzes the key legal issues and arguments (and defenses) concerning crawling in order to assess whether a cohesive standard for lawful crawling can be established.
- Legal Opinions in Favor of Restricting Crawling
- Copyright and Database Producers' Rights Infringement Claims and Copyright Restrictions
- Overview of Copyright Regulations
Under the Copyright Act, the copyright owner holds exclusive but limited economic rights over the copyrighted work, including the rights to reproduce, distribute, publicly perform, exhibit, and produce derivatives of the original work (Articles 16 through 22). Database producers also hold rights similar to copyright, even over databases that are not creative in nature.
At the same time, the Copyright Act limits the author's economic rights (Articles 23 through 38) so that certain uses of a work are permitted regardless of the copyright holder's will, and Article 94 (Limitations on the Rights of Database Producers) applies these limitations mutatis mutandis to databases. These limitations correspond to the "fair use" doctrine in U.S. law.
- Court Decision on Whether Crawling Infringes Copyright and Database Producers' Rights
Crawling publicly available data from websites can involve copying copyrighted works or databases.
In a civil suit in which Saramin (Defendant) crawled and collected recruitment information posted on the website of JobKorea (Plaintiff), the court held the Defendant's actions unlawful, finding that the Defendant had infringed the Plaintiff's rights as a database producer (such as the right of reproduction) because it had copied recruitment information from the Plaintiff's database methodically and repeatedly without paying any compensation, and because it had taken the Plaintiff several years to accumulate that data (Seoul High Court Decision 2016Na2019365, Decided on April 6, 2017). The Supreme Court affirmed this ruling.
On the other hand, in another crawling case in which GC Company (Defendant) collected accommodation information from Yanolja (Plaintiff), the Supreme Court held that the Defendant had not caused unfair damage to the Plaintiff's business interests, because the Defendant had collected only 3 to 8 of a total of 50 items of accommodation information, and because that information was already well known and could be obtained without considerable cost or effort.
- Claims of Violation of the Act on Promotion of Information and Communications Network Utilization and Information Protection, Etc.
- Overview of the Information and Communications Network Act
Article 48(1) of the Act on Promotion of Information and Communications Network Utilization and Information Protection, Etc. (hereinafter referred to as the "Information and Communications Network Act") prohibits intrusion, stating that "no one shall intrude on an information and communications network without a rightful authority for access or beyond a permitted authority for access." Article 72(1)1 in turn establishes the crime of intrusion on an information and communications network by punishing "a person who intrudes on an information and communications network in violation of Article 48(1)." This provision was introduced to punish hacking after cyberattacks began to appear in the early 2000s.
- Court Decision on Whether Crawling Violates the Information and Communications Network Act
In the case in which Saramin (Defendant) collected recruitment information from JobKorea (Plaintiff) through crawling, the court found a violation of the Information and Communications Network Act, because the Plaintiff had consistently applied measures to block crawling and the Defendant had circumvented them using a virtual private network (VPN).
In contrast, in the second-instance criminal trial concerning GC Company's crawling of Yanolja's data, the court acquitted the Defendant on the grounds that the Plaintiff's terms of service prohibited neither packet capturing (intercepting and analyzing the data packets exchanged over a network) nor crawling. The court further noted that the Plaintiff had not installed any access restrictions, so its API servers could easily be accessed through mobile and PC web browsers (Seoul Central District Court Decision 2020No611, Decided on January 13, 2021). The Supreme Court later upheld this decision.
- Claims for Computer Interference with Business under the Criminal Act
- Overview of Interference with Business
Article 314(2) of the Criminal Act defines computer interference with business as interfering with another person's business by damaging or destroying a data processor, such as a computer, or a special media record, such as an electromagnetic record; by inputting false information or improper commands into a data processor; or by impeding data processing in any other way.
- Court Decision on Whether Crawling Constitutes an Interference with Business
In the case above, GC Company (Defendant) was acquitted of the interference-with-business charges at the second and third instances, as the courts found neither an intent to interfere with the business of Yanolja (Plaintiff) nor proof of such interference.
- Claims of Violation of the Unfair Competition Prevention and Trade Secret Protection Act
- Overview of the Unfair Competition Prevention Act
Item (m) of subparagraph 1 of Article 2 of the Unfair Competition Prevention and Trade Secret Protection Act (hereinafter referred to as the "Unfair Competition Prevention Act") defines as an act of unfair competition "any acts infringing on other persons' economic interests by using the outcomes, etc. achieved by them through substantial investment or efforts, for one's own business without permission, in a manner contrary to fair commercial practices or competition order."
Meanwhile, the Framework Act on Promotion of the Data Industry and the Use of Data (hereinafter the "Data Framework Act"), which was implemented on April 22, 2022, led to the addition of Item (k) to subparagraph 1 of Article 2 of the Unfair Competition Prevention Act. Item (k) covers a narrower scope of protected data and acts of unfair competition than the Data Framework Act does.
- Court Decision on Whether Crawling Violates the Unfair Competition Prevention Act
Before Item (k) was added to subparagraph 1 of Article 2 of the Unfair Competition Prevention Act (hereinafter referred to as "Item (k)"), legal protection under that Act for data collected by crawling was provided by Item (m) of subparagraph 1 of Article 2 (hereinafter referred to as "Item (m)"). Under Item (m), crawling constitutes an act of unfair competition if it amounts to so-called free-riding on the web or app services of an actual or potential competitor in the market.
However, in Enhawiki v. Enhawiki Mirror Site, although the Plaintiffs alleged that the Defendants had violated the Unfair Competition Prevention Act, the court did not rule on that claim, because the Act applies only supplementarily and the court had already found an infringement under the Copyright Act.
Meanwhile, in the civil case of JobKorea v. Saramin, the court of first instance found that the Defendant had committed acts of unfair competition under the Unfair Competition Prevention Act, because the Defendant had used VPN services to conceal its IP address, had anonymized the User-Agent name of its crawling robot, and had then crawled HTML source code without reading the Plaintiff's robots.txt file (Seoul Central District Court Decision 2015Gahap517982, Decided on February 17, 2016).
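The robots.txt file referred to in this decision is a plain-text file, defined by the Robots Exclusion Protocol, in which a site operator states which paths automated crawlers should not visit. Compliance is voluntary, and checking it is not in itself a legal test of lawfulness. The minimal Python sketch below, using the standard library's `urllib.robotparser`, shows what reading robots.txt and identifying the crawler with an honest User-Agent look like in practice. The robots.txt content, the crawler name `ExampleBot`, and the helper names are hypothetical, and the robots.txt text is assumed to have been downloaded already.

```python
from typing import Optional
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; not taken from any party above.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /recruit/
Allow: /
"""

# Hypothetical crawler name; a truthful User-Agent tells the site who is crawling.
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parses robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def prepare_request(rules: RobotFileParser, url: str) -> Optional[Request]:
    """Returns a request carrying the real User-Agent, or None if robots.txt disallows the URL."""
    if not rules.can_fetch(USER_AGENT, url):
        return None
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    rules = load_rules(EXAMPLE_ROBOTS_TXT)
    print(prepare_request(rules, "https://example.com/recruit/123"))
    print(prepare_request(rules, "https://example.com/about").full_url)
    print("crawl delay (seconds):", rules.crawl_delay(USER_AGENT))
```

Python's parser applies the first rule in the file that matches a path, so the order of `Disallow` and `Allow` lines matters under this implementation. The request is only built, not sent, so the sketch makes no network calls.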
In Yanolja v. GC Company, although GC Company was acquitted in the second-instance criminal trial, in the civil case the court of first instance found that Item (m) as it stood before the amendment still applied and that the Defendant had committed acts of unfair competition under it, ordering the Defendant to pay KRW 1 billion in damages and to stop copying the Plaintiff's accommodation data (Seoul Central District Court Decision 2018Gahap508729, Decided on August 19, 2021). The court held that the Plaintiff's database could not be regarded as part of the public domain and was subject to legal protection, since the Plaintiff had visited each accommodation five to ten times a month that year and had spent a considerable amount of money to collect the information.
- Evaluation of Establishing a Legal Standard for Permissible Crawling
As explored above, traditional legal regimes such as the Copyright Act focus on protecting data, which has led to many legal disputes over whether crawling violates them.
Although the Copyright Act now includes a fair use provision that may exempt crawling from liability, comprehensive general provisions can offer only limited clarity, especially when other related laws contain no exemption clauses similar to fair use. Consequently, even if crawling is exempted from liability under the Copyright Act as fair use, it may still be exposed to civil or criminal liability under other statutes.
This uncertainty poses a risk for companies striving to build information-based industries in step with the era of the Fourth Industrial Revolution. Therefore, the criteria for permissible crawling should be clarified through legislation, by specifying fair use grounds that exempt ordinary crawling from copyright liability and by introducing or applying similar clauses in other related Acts.
Taking the standards of fair use and the arguments above together, the lawfulness of crawling appears to turn on its purpose, subject, method, and consequences. By contrast, given the development of technology and the current internet environment, the size and proportion of the collected data do not seem to be critical criteria in deciding whether crawling is lawful.
As for purpose, the key question is whether the crawling was used for unfair free-riding. Such unfair acts must be distinguished from crawling aimed at building datasets for AI training, conducting research and comparative studies, or developing strategies.
As for subject, the primary consideration is whether the collected information was public data or encrypted confidential data. In particular, where public data is merely an accumulation of raw data, the matter should be treated differently from cases where the site owner has consistently verified and updated the data.
As for method, one should consider whether the crawling was conducted in a common and ordinary manner, a consideration that has recurred in court decisions on crawling. Ordinary crawling should be distinguished from mirroring, in which a competing business replicates its competitor's entire webpage.
Lastly, as for consequences, one must consider whether the crawling harms the interests of the data owner. For example, where crawled information is stored in a database in the form of outlinks pointing back to the original source, the data holder's interests are relatively unlikely to be harmed, and such crawling should not readily be classified as unlawful. Furthermore, because the profits created by collecting and using public data may increase consumer welfare and serve the public interest, those benefits should be weighed when judging the consequences of crawling.
Data-related rights and legal protection have positive aspects in promoting data production. However, the excessive protection of data rights can conversely discourage data collection and use, hinder the industry's development in the long run, and even reduce data production.
In particular, prohibiting the collection and use of public data raises many problems, especially for raw or simple data, which should be regarded as public goods rather than the exclusive property of whoever first collected them.
Therefore, while data protection is essential, ethical crawling must also be legally protected. The first step toward this goal is to establish a clear legal standard for what constitutes lawful crawling.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.