On August 24, 2023, the Office of the Privacy Commissioner of Canada ("OPC") issued a joint statement suggesting that social media companies ("SMCs") and other websites have an obligation to actively protect the personal information posted by users against unlawful data scraping ("Joint Statement"). The OPC issued the Joint Statement together with regulators from other members of the Global Privacy Assembly's International Enforcement Cooperation Working Group ("IEWG").1
- Personal information that is publicly accessible may still be subject to data protection laws and require protection.
- While the expectations are phrased as recommendations, the OPC stated that "many of them are explicit statutory requirements in particular jurisdictions or may be interpreted as such by courts and data protection authorities", suggesting that the OPC will be interpreting PIPEDA in this manner going forward.
- Social media companies and other organizations with websites containing personal information should review their practices around preventing data scraping to ensure appropriate diligence beyond notices to users that they should be careful about what personal information they post.
- A larger issue is whether such companies and websites should have an obligation to protect against data scraping. Where an individual is made aware of the risks, and chooses nonetheless to make their personal information publicly available, is it appropriate to hold social media companies and others accountable for the choices of consumers?
- Massive data scraping incidents can be considered reportable breaches.
What is Data Scraping?
Data scraping is an automated technique in which a computer program is used to extract (or "scrape") information available on web pages. The company or individual using data scraping typically collects the scraped information and uses it for another purpose. For instance, a company may scrape the social media pages of individuals who have posted their age, thereby creating a database of a certain demographic that may be sold or made available to data brokers or marketers. The OPC has said that data scraping of individuals' personal information from social media and other websites that host publicly accessible data is increasing.
The OPC and IEWG identified a number of privacy concerns with the use of scraped data, including such information being used for:
- Targeted cyberattacks;
- Identity fraud;
- Monitoring, profiling or surveilling individuals;
- Unauthorized political or intelligence gathering purposes; and
- Unwanted direct marketing or spam.
What are the Recommended Steps?
To address these concerns, the Joint Statement provided "recommendations" on how SMCs and other websites should implement multi-layered technical and procedural controls to mitigate the risks, including:
- Designating a team and/or specific roles within the organisation to identify and implement controls to protect against, monitor for, and respond to scraping activities.
- 'Rate limiting' the number of visits per hour or day by one account to other account profiles, and limiting access if unusual activity is detected.
- Monitoring how quickly and aggressively a new account starts looking for other users (as abnormally high activity could be indicative of unacceptable usage).
- Taking steps to detect scrapers by identifying patterns in 'bot' activity. For example, a group of suspicious IP addresses can be detected by monitoring where a platform is being accessed from and flagging use of the same credentials from multiple locations, particularly where those accesses occur within a short period of time.
- Taking steps to detect bots, such as by using CAPTCHAs, and blocking the IP address where data scraping activity is identified.
- Where data scraping is suspected and/or confirmed, taking appropriate legal action such as the sending of 'cease and desist' letters, requiring the deletion of scraped information, obtaining confirmation of the deletion, and other legal action to enforce terms and conditions prohibiting data scraping.
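Two of the technical controls listed above, rate limiting profile views per account and flagging credentials used from several locations within a short period, can be illustrated with a minimal Python sketch. The Joint Statement does not prescribe any implementation; the class and function names, the thresholds and the event format below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class ProfileViewLimiter:
    """Sliding-window cap on profile views per account (hypothetical policy)."""

    def __init__(self, max_views: int, window_seconds: float) -> None:
        self.max_views = max_views
        self.window_seconds = window_seconds
        # account_id -> timestamps of views that were allowed
        self._views = defaultdict(deque)

    def allow(self, account_id: str, now: float) -> bool:
        views = self._views[account_id]
        # Drop timestamps that have aged out of the window.
        while views and now - views[0] >= self.window_seconds:
            views.popleft()
        if len(views) >= self.max_views:
            return False  # over the cap: caller may block or throttle
        views.append(now)
        return True


def accounts_seen_from_many_ips(events, max_ips: int, window_seconds: float) -> set:
    """Return accounts used from more than max_ips distinct IP addresses
    within any span of window_seconds.

    events: iterable of (timestamp, account_id, ip_address) tuples
    (an assumed log format, not one taken from the Joint Statement).
    """
    by_account = defaultdict(list)
    for timestamp, account_id, ip in events:
        by_account[account_id].append((timestamp, ip))

    flagged = set()
    for account_id, accesses in by_account.items():
        accesses.sort()
        start = 0
        ip_counts = defaultdict(int)
        for timestamp, ip in accesses:
            ip_counts[ip] += 1
            # Shrink the window from the left until it spans at most window_seconds.
            while timestamp - accesses[start][0] > window_seconds:
                old_ip = accesses[start][1]
                ip_counts[old_ip] -= 1
                if ip_counts[old_ip] == 0:
                    del ip_counts[old_ip]
                start += 1
            if len(ip_counts) > max_ips:
                flagged.add(account_id)
                break
    return flagged
```

In practice, a platform would likely combine signals such as these with CAPTCHAs, IP blocking and human review rather than rely on any single threshold, and the appropriate limits would depend on how ordinary users actually behave on the platform.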
The various regulators also suggest that at least in some jurisdictions, data scraping may constitute a data breach, triggering notifications to affected individuals and privacy regulators as required.
The Joint Statement is not binding on Canadian organizations, but the endorsement of the Joint Statement by the OPC suggests the OPC shares the approach behind the Joint Statement and could launch investigations of social media companies and other website owners that it believes are falling short of the recommendations in the Joint Statement.
The difficulty here is that the responsibilities the Joint Statement would assign to Canadian companies are not necessarily clearly grounded in the Personal Information Protection and Electronic Documents Act ("PIPEDA"), the Act that governs privacy protection in the private sector.
Under PIPEDA, companies are required to safeguard personal information under their control and to protect it against "loss or theft, as well as unauthorized access, disclosure, copying, use, or modification". The recommendation that SMCs proactively monitor for data scraping, while congruent with this statutory obligation, goes one step beyond the protection of personal information as anticipated under PIPEDA.
Under PIPEDA, a "breach of security safeguards" is "the loss of, unauthorized access to or unauthorized disclosure of personal information resulting from a breach of an organization's security safeguards". Data scraping would appear on its face to meet this definition, although it is questionable whether the use or misuse of publicly posted information is a result of a breach of the organization's "safeguards".
Assuming for the moment that it is, the next step would be to conduct a "real risk of significant harm" assessment to determine whether the breach is reportable. It is the breach itself that must create the harm: the language in PIPEDA says a breach is reportable "if it is reasonable in the circumstances to believe that the breach creates a real risk of significant harm to an individual." If the information is already publicly available, is there any additional harm created by the "breach" (which assumes the use of such publicly available information is, in fact, a breach)?
Factors that are relevant to determining whether a breach of security safeguards creates a real risk of significant harm include the sensitivity of the personal information involved in the breach of security safeguards and the probability the personal information has been/is/will be misused.
Note that under PIPEDA, companies have the right to collect personal information that is "publicly available" without the consent of the individual.2 "Publicly available" information is narrowly defined in the Regulations, but it does include "personal information that appears in a publication, including a magazine, book or newspaper, in printed or electronic form, that is available to the public, where the individual has provided the information."3 This would appear to squarely address the situation of at least some social media sites (e.g., "information that appears in a publication....in printed or electronic form....where that individual has provided the information").
However, the OPC has previously condemned the collection of such information, for instance, in the case of Clearview AI's scraping of billions of images of people from across the Internet and providing them to third parties. The Joint Statement reinforces the OPC's position that personal information hosted on social media is not publicly available information exempted under PIPEDA.
The Joint Statement also advises individuals on steps to help protect themselves against the risks of scraping such as paying attention to platforms' privacy policies; being careful about what they choose to share online; modifying their privacy settings; and making complaints to the SMC and then to the OPC where they are concerned about having been targeted by data scraping.
The Joint Statement is an attempt to harmonize global data protection principles and practices on data scraping and to protect individuals, specifically against generative AI tools that have been trained on people's data without their knowledge or consent. However, the OPC's interpretation appears to be an expansive reading of the text of the Regulations.
1. Australia, United Kingdom, Hong Kong, Switzerland, Norway, New Zealand, Colombia, Jersey, Morocco, Argentina and Mexico.
2. Personal Information Protection and Electronic Documents Act, SC 2000, c 5, s 7(1)(d).
3. Regulations Specifying Publicly Available Information, SOR/2001-7, s 1(e).