As the world becomes a single global market with ever-increasing competition, businesses depend more than ever on data and data analysis to develop the strategies they need to excel in their fields. Data and statistics are the basis of nearly every administrative, business, management or advisory decision. One prominent method of obtaining data is data scraping, in which software quickly extracts large amounts of publicly available information. Its importance is felt in almost every business and industry. It is nevertheless pertinent to examine the practice from the perspective of information technology (IT) and intellectual property (IP) law, because it raises a larger issue.
Data Scraping: Meaning
Data scraping (or web scraping) is a method in which a computer program is used to import data or information from a website into a readable output format. The traditional alternative, copying and pasting data from a website by hand, is cumbersome and time-consuming. E-commerce websites, for example, have hundreds of pages, and copying them manually is not feasible.
Data scraping performs the same task faster and more efficiently, and it reduces the probability of error compared with manual copying. Irrespective of the volume of data or the number of pages, a data scraping tool can import all the required information in a fraction of the time that manual copying would take. Further, certain websites contain data that cannot be copied manually, yet data scraping software can extract such data as well.
In short, data scraping or web scraping is an automated process of extracting data from a website. This poses two serious issues. First, the process can access and import protected content, thereby infringing the rights in it. Second, it can import personal data in violation of the government's privacy rules as well as the privacy policies of most websites. Like any powerful tool, data scraping can serve as a useful instrument or as a potential weapon, and it does both very effectively.
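The automated extraction described above can be illustrated with a minimal sketch. It assumes a hypothetical product listing whose markup uses the class names `product`, `name` and `price`; those names, the sample data and the `scrape` function are invented for illustration, and the page is read from a string rather than downloaded from a real website.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup (an assumption for illustration).
# A real scraper would download this from a website instead.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">1499</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">2299</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per product found
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})          # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class       # the next text node belongs to this field

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None            # stop capturing text


def scrape(html):
    """Return the products found in the given HTML as a list of dicts."""
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


if __name__ == "__main__":
    for row in scrape(SAMPLE_HTML):
        print(row)
```

The same loop that reads two products here would read two thousand just as easily, which is why scraping scales where manual copying does not. It also shows why the legal questions arise: the program copies whatever the markup contains, whether or not that content is protected or personal.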
As data is the basis for most business decisions, data scraping is most commonly used for business purposes or to gain competitive intelligence. Businesses need to learn of a competitor's decisions, such as its latest updates, promotions and product information, in the shortest possible time in order to take the greatest advantage of a situation, and the demand for such data has therefore increased tremendously. Companies analyze the data obtained from these sources to develop market strategies and make major business decisions.
However, using a data scraping tool in this way could infringe the intellectual property of a company or an individual in the course of obtaining data. Websites also actively resist data scraping: one data security service estimates that 40% of the total traffic on a website comes from data extractors, which in turn slows down the servers.
Data scraping is also used for price comparison in competitive analysis, for extracting customer reviews in reputation monitoring, in journalism, where infographics built on scraped data lend reports greater reliability, and in an array of other fields. The use of this methodology is therefore not limited to any specific industry.
Data in the digital world is ever-changing. Time-critical information such as share prices and news is public information, and as technology has advanced, automated extraction from these data sources has become a lucrative business. But the constant extraction of large amounts of data slows down the website and increases server costs. When carried out illegally, this activity can even result in a Distributed Denial of Service (DDoS) attack.
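One common way for a website to limit the load that constant extraction places on its servers is to cap the number of requests each client may make in a given period. The sketch below is a minimal sliding-window limiter under stated assumptions: the class name, the client identifier and the limits are hypothetical, timestamps are passed in explicitly so the behaviour is easy to follow, and it is not drawn from the article or from any particular product.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed requests

    def allow(self, client_id, now):
        """Return True if the request may proceed, False if it should be refused."""
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 10.0):
        print(t, limiter.allow("scraper-bot", t))
```

A limiter of this kind does not distinguish lawful from unlawful scraping; it only bounds how much server capacity any single client can consume.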
Data scraping can also infringe upon copyrightable or copyrighted content on websites. The result is a compilation of infringing content that is, more often than not, hard and sometimes impossible to trace.
Illegal uses of data scraping
Data scraping does not only acquire publicly available information; it sometimes also extracts confidential information. Such leaks can significantly harm a company's growth and business plans.
In one instance, an airline was unable to sell its seats to real customers because bots had been programmed to scrape certain flights, routes and classes of tickets. Acting as fake buyers, these bots continuously created reservations for those tickets without completing them, thereby preventing real customers from buying the tickets.[1]
Legality of Data Scraping / Extraction / Mining
Copyright Act and Data Scraping
Data scraping / extraction / mining creates legal challenges for both the content creator and the content importer. Because data scraping largely involves copying data from a source, copyright law comes into the picture. Section 2 (o) of the Copyright Act, 1957 provides that a literary work includes compilations, and since scraped data is a compilation, it falls within the category of a literary work.
Under Section 13 (1) (a) of the Copyright Act, 1957, copyright subsists in "original" literary works. The question that therefore arises is whether the content importer holds copyright in his compilation.
In this regard, the content owner can contend that the compilation has been extracted from its website and that it therefore holds copyright in that content.
Remedies for the content importer
To establish copyright in such a compilation, the content importer needs to prove that some minimal degree of creativity was involved in making it. This was reiterated by the Supreme Court of India in EBC v. D.B. Modak,[2] in which it was held that copyright subsists in a compilation that fulfils the criteria of the skill and judgment doctrine. Under this doctrine, a compilation need not be novel or non-obvious, but it must not be merely a product of labour and capital; if it involves a minimal degree of creativity, copyright subsists in it.
Another remedy for a content importer is to show that the work is being used for a non-commercial purpose or that the use otherwise falls under Section 52 of the Copyright Act, 1957.
However, it is always advisable for the importer to obtain prior permission from the copyright holder before beginning to scrape data, even if obtaining the data would fall under the fair use provisions of copyright law.
Remedies for the content creator
Section 51, read with Section 14 of the Copyright Act, provides that copyright in a work is deemed to be infringed when an act violates any of the rights laid out in Section 14. However, before it can be determined whether an infringement has occurred, the owner should establish ownership and show that the act of the alleged infringer does not fall within the fair use exception under Section 52.
IT Act and Data Scraping
The Information Technology Act, 2000 (the IT Act) recognizes the potential threat that data scraping can pose and incorporates provisions that set limits on, and sanctions for, its illegal or harmful use. Some sections of the IT Act that regulate data scraping are:
Section 43: This section provides sanctions for damage to a computer or a computer system caused by the use of a computer contaminant that modifies, destroys, records or transmits data inside a computer, computer system or network. The provision is not limited to the use of computer contaminants, however; it also provides a penalty for damage done physically or by accessing a computer system or server remotely.
Section 66: This is a penal section that provides for punishment, by way of imprisonment and fine, for the acts mentioned under Section 43 when they are done dishonestly or fraudulently. The term of imprisonment under this section can extend up to 3 years and the fine up to INR 5 lakh, and both may be imposed together.
The IT Act, 2000 has checks and balances in place whose definitions and scope are broad enough to cover data scraping and to describe in detail what amounts to its misuse. However, the issue of data scraping persists, mostly because its uses outweigh its misuse. The IT Act, 2000 should therefore be seen not as a limiting statute but as regulating legislation.
Technology is a double-edged sword, and its user determines the purpose it serves. Through data scraping, a wide array of websites has achieved mutual functional benefits that enhance not only their business but also customer satisfaction. In contrast, data scraping is also causing rampant piracy of media content online and slowing down online streaming websites. Across these diverse and varied uses of data scraping, there is an underlying balance to be struck between the protection of IP and its wrongful exploitation. Our varied legal framework is able to curb the misuse of this tool and keep it functioning within legal boundaries. There still remain the issues of vigilance, the data scraping policies of websites and the traceability of the data importer / extractor. Tackling them requires simple steps such as using a protected and regulated server, regularly checking computer systems and networks, coding IP content in non-extractable formats and recording server information so that an infringer can be traced easily. Even these steps do not guarantee all-round protection, and much relies on constant vigilance.
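The last of these steps, recording server information for traceability, can be sketched as a simple review of an access log. The example is a minimal illustration under stated assumptions: the log format (timestamp, client IP address and requested path separated by spaces), the sample entries, the threshold and the function name `flag_heavy_clients` are all hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from collections import Counter

# Hypothetical access-log entries in the assumed format:
# "<timestamp> <client IP> <requested path>"
SAMPLE_LOG = [
    "2024-01-01T10:00:00 203.0.113.5 /products?page=1",
    "2024-01-01T10:00:01 203.0.113.5 /products?page=2",
    "2024-01-01T10:00:02 203.0.113.5 /products?page=3",
    "2024-01-01T10:00:05 198.51.100.7 /home",
    "malformed-entry",
]


def flag_heavy_clients(log_lines, threshold):
    """Return {client_ip: request_count} for clients with more than threshold requests."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip entries that do not match the assumed format
        client_ip = parts[1]
        counts[client_ip] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}


if __name__ == "__main__":
    print(flag_heavy_clients(SAMPLE_LOG, threshold=2))
```

A record of this kind identifies only the address from which requests arrived. Linking that address to a particular person or company is a separate step, which is one reason traceability remains difficult in practice.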
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.