The explosion of connected devices and digital data over the past decade has led to an exponential increase in the number of data points collected about any one individual. More data points cross the Internet every second today than were stored in the entire Internet 20 years ago. Connected fitness devices, navigation and other mobile apps, social media platforms, and tracer or beacon technologies are but a few examples of technologies that have made it possible to collect an unprecedented volume and variety of data about individuals. This has created new opportunities for using such data, including for analytics, product and service development, and machine learning applications.

In order to protect individual privacy and use "big data" effectively, organizations employ a variety of techniques to remove personal identifiers so that such data no longer constitutes personal information that is subject to privacy laws. The federal government's proposed Consumer Privacy Protection Act (CPPA), which seeks to reform the Personal Information Protection and Electronic Documents Act (PIPEDA), introduces new standards for "anonymization" and "de-identification" that may change how organizations create and use "big data" in Canada.

Existing Regime Under PIPEDA

Under PIPEDA, "personal information" means "information about an identifiable individual." Information is about an identifiable individual if there is a serious possibility that an individual could be identified through the use of that information, either alone or in combination with other information. If data meets this definition of personal information, it is subject to privacy protections under PIPEDA.

PIPEDA imposes limitations on how personal information can be collected, used, and disclosed. Use, for instance, is generally limited to the purposes for which the information was collected and which were communicated to the individual at the time of collection. This can reduce the utility of such information, since all of its potential uses may not be known at the time the information is collected. Potential uses may become apparent only after the information has been collected; however, if such uses were not communicated to the individual at the time of collection, the information cannot be used for such new purposes unless the individual consents to them (subject to certain exceptions).

To realize the tremendous value that "big data" represents, organizations often modify personal information so that it is no longer about an identifiable individual, thereby removing it from the scope of PIPEDA because it no longer meets the definition of "personal information." Once a data set no longer contains personal information, an organization is free to use it for any purpose. By generating larger, more complete, and more diverse data sets – and by structuring and organizing them intelligently – organizations can develop more accurate predictions and models and, ultimately, make better decisions and produce better outputs.

PIPEDA does not, however, set standards for how to modify personal information in this way, nor does it define terms such as "de-identify", "anonymize" or "pseudonymize." Organizations have been left to determine the most appropriate techniques for ensuring that personal information is modified in a manner that excludes it from the statutory definition of "personal information."

Anonymization and De-identification as Solutions for Maximizing Utility of "Big Data"

De-identification and anonymization are two techniques that are used to protect privacy and create useful data sets.

De-identification typically involves replacing identifying personal information in a given data set with a code or key so that individuals are no longer identifiable. For example, instead of an individual's personal information being labelled with their first and last name, it would be labelled with a code such as "Subject 999". That individual is thus no longer directly identifiable, though it remains possible to re-identify them by consulting the coded key list. It may also be possible to indirectly identify the individual if one is aware of some of their attributes which are part of the data set (e.g., if Subject 999 has a relatively rare medical condition, and/or the data set is specific to a contained geographic area). Although the risk of re-identification remains, de-identification is a useful measure to safeguard privacy while using "big data."
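The coding approach described above can be sketched in a few lines of Python. This is a minimal illustration only, not a method prescribed by PIPEDA or the CPPA: the function name `de_identify`, the field names, and the "Subject" code format are all hypothetical, and a real program would also need to address the indirect identifiers noted above.

```python
import secrets


def de_identify(records, id_field="name"):
    """Replace the direct identifier in each record with a random code.

    Returns the coded records plus a separate key table that maps each
    code back to the original identifier. All names here are hypothetical.
    """
    key_table = {}  # code -> original identifier (to be stored separately)
    code_for = {}   # original identifier -> code, so one person keeps one code
    coded = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "Subject " + secrets.token_hex(4)
            while code in key_table:  # regenerate on the rare collision
                code = "Subject " + secrets.token_hex(4)
            code_for[identifier] = code
            key_table[code] = identifier
        # Copy every field except the direct identifier, then add the code.
        new_record = {k: v for k, v in record.items() if k != id_field}
        new_record["subject_code"] = code_for[identifier]
        coded.append(new_record)
    return coded, key_table
```

Anyone holding the key table can reverse the coding, which is why the result in this sketch is de-identified rather than anonymized.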

Anonymization involves the permanent removal of identifying information in a way that no longer allows the individual to be identified. Although there is no risk of re-identification (assuming the anonymization is effective), this may result in a less useful data set. For example, the data elements may be less granular, or may be point-in-time (rather than allowing for new data about an individual to be added to existing data about the same individual, over time).
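The trade-off between anonymization and granularity can be illustrated with a short Python sketch that drops the direct identifier and coarsens the remaining fields, keeping no key. The function name `generalize`, the field names, and the choice of ten-year age bands and three-character postal prefixes are assumptions made for illustration; coarsening of this kind does not by itself establish that any legal standard for anonymization has been met.

```python
def generalize(record):
    """Return a coarsened copy of a record with the direct identifier dropped.

    Hypothetical fields: 'name', 'age', 'postal_code', 'condition'.
    No code or key is retained, so there is no list to consult later.
    """
    decade = (record["age"] // 10) * 10
    return {
        # Exact age becomes a ten-year band, e.g. 47 -> "40-49".
        "age_band": f"{decade}-{decade + 9}",
        # Full postal code becomes its first three characters.
        "region": record["postal_code"][:3],
        "condition": record["condition"],
    }
```

Because no per-individual code survives, later records about the same person cannot be linked to earlier ones, which is the point-in-time limitation described above.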

PIPEDA does not distinguish between de-identification and anonymization, and generally treats de-identified information and anonymized information in the same manner – neither is personal information for the purposes of PIPEDA. However, the CPPA would introduce new rules regarding de-identification and anonymization.

CPPA Changes and Standards for Anonymization and De-identification

The CPPA expressly distinguishes between de-identification and anonymization.

  • Anonymize means to irreversibly and permanently modify personal information, in accordance with generally accepted best practices, to ensure that no individual can be identified from the information, whether directly or indirectly, by any means.
  • De-identify means to modify personal information so that an individual cannot be directly identified from it, though a risk of the individual being identified remains.

Relative to PIPEDA, the CPPA proposes to regulate both types of information.

  • The CPPA proposes to regulate whether anonymization is effective by referencing generally accepted best practices. However, if anonymization is effective, the anonymized information is excluded from the scope of the CPPA.
  • The CPPA also proposes to regulate the use and disclosure of de-identified information, as de-identified information would still be subject to the CPPA (with certain limited exceptions).

In this way, the CPPA proposes to regulate information that is not, strictly speaking, personal information. This is a notable divergence from PIPEDA, which treats both de-identified information and anonymized information as being outside the scope of PIPEDA.

Anonymization Standards

The CPPA does not expressly permit organizations to anonymize personal information without an individual's knowledge or consent; however, there are various grounds for interpreting the CPPA as not requiring consent for anonymization (consistent with interpretations of PIPEDA). These grounds include: (i) anonymization is a means of disposing of personal information (and disposal does not require consent), (ii) no consent is needed since the process does not result in personal information that is subject to the CPPA, (iii) consent can be implied (particularly as the end result is not personal information that is subject to the CPPA), and (iv) under the CPPA, consent is not needed if the organization determines that anonymization falls under the CPPA's new consent exemptions for specified business activities or activities in which the organization has a legitimate interest.

The stringent standard for anonymization under the CPPA requires that personal information be irreversibly and permanently modified to ensure that an individual cannot be identified by any means.

This onerous standard may present challenges, particularly given the ongoing debate about whether personal information can ever be truly anonymized. Consider, for example, information about "a female engineer" at a company of 200 employees. If that information is combined with enough other data, there will always be a possibility that the female engineer in question can be identified. This creates uncertainty around the potential consequences of using anonymized data, as it may turn out to be identifiable in some contexts and thus subject to the CPPA's privacy requirements.
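One way to make the "female engineer" concern concrete is to count how many records share each combination of indirectly identifying attributes. The Python sketch below does this in the spirit of a group-size (k-anonymity style) check, a technique that the CPPA does not name; the function names and the fields `gender` and `role` are hypothetical, and a small group size signals risk without settling whether information is identifiable in law.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of the listed attributes."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unique_combinations(records, quasi_identifiers):
    """Return the attribute combinations that match exactly one record."""
    sizes = group_sizes(records, quasi_identifiers)
    return [combo for combo, count in sizes.items() if count == 1]


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group, or 0 for an empty data set."""
    return min(group_sizes(records, quasi_identifiers).values(), default=0)
```

If a workforce data set contains only one record with gender "F" and role "engineer", `unique_combinations` returns that combination, which indicates that the record could be linked to a single person by anyone who knows those two attributes.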

Notwithstanding the above phrasing, which suggests that the standard is an absolute one, the CPPA's requirement is qualified: the information must be anonymized in accordance with generally accepted best practices. The standard may therefore be relative, rather than absolute. However, what constitutes generally accepted best practices remains subject to interpretation.

De-identification Standards

The CPPA permits organizations to de-identify personal information without an individual's knowledge or consent, provided that the technical and administrative measures used to de-identify information are proportionate to the purpose for which the information is de-identified and the sensitivity of the information.

Once de-identified, the CPPA allows certain uses and disclosures of that de-identified information without the knowledge or consent of the individual to whom the information relates, namely:

  • internal research, analysis and development purposes;
  • use or disclosure in connection with a prospective business transaction; and
  • disclosure to a government, health or educational institution for a socially beneficial purpose (i.e., a purpose related to health, the provision or improvement of public amenities or infrastructure, the protection of the environment or any other prescribed purpose).

Also, de-identified information is not considered personal information in connection with CPPA requirements to:

  • ensure that personal information under an organization's control is accurate, up-to-date, and complete; and
  • respond to requests from individuals to access their personal information or to have their personal information corrected or disposed of, or disclosed to another organization within the same data mobility framework.

In other respects, the CPPA requires that de-identified information be handled as though it is personal information.

The CPPA imposes strict limitations on re-identification. Once an organization has de-identified personal information, the information cannot be used to identify an individual except to (a) test the effectiveness of security safeguards, (b) test the fairness and accuracy of models, processes, and systems that were developed using the de-identified information, (c) test the effectiveness of the de-identification process, or (d) comply with the CPPA or other law.

The Privacy Commissioner of Canada may also authorize an organization, at the organization's request, to re-identify information that has been de-identified, if it is in the interests of the individual.

Standards for Anonymization and De-identification in Other Jurisdictions

As illustrated in our Comparative Table, definitions of "anonymize" and "de-identify" vary across jurisdictions, which can make compliance more complicated for organizations operating in multiple jurisdictions. The terminology varies, as do the standards for anonymization and de-identification.

  • Alberta and British Columbia's private sector privacy laws are similar to PIPEDA in that they do not expressly define (or otherwise address) anonymization or de-identification.
  • Quebec's Bill 64 considers that personal information has been anonymized if it is "reasonably foreseeable in the circumstances" that the information, at all times, no longer allows an individual to be identified directly or indirectly. The reasonable foreseeability qualifier makes this a slightly lower standard than that of the CPPA.
  • The California Consumer Privacy Act (CCPA) defines "deidentified" in a manner that essentially equates to how the CPPA and Quebec's Bill 64 define "anonymize." The CCPA's definition of "deidentified" includes a reasonability qualifier.
  • The European Union's General Data Protection Regulation (GDPR) imposes a standard for anonymization that is similarly onerous to that of the CPPA and expressly excludes anonymized information from its scope. "Pseudonymization" under the GDPR is generally equivalent to the CPPA's definition of "de-identify".

Final Thoughts and Key Takeaways

As the CPPA continues through the legislative process, its provisions on anonymization and de-identification merit close scrutiny by legislators – particularly in relation to how generally accepted best practices will be identified in relation to anonymization.

It is critical that the CPPA strike the right balance by creating standards for anonymization and de-identification that reasonably protect individuals' privacy without hindering innovative and beneficial uses of data. Unclear or overly strict standards can cause significant uncertainty for organizations wishing to create and use anonymized or de-identified data and, in doing so, may impede potential benefits or lead to undesirable outcomes. For example, individuals or communities in jurisdictions with more onerous standards for anonymization may become underrepresented in data sets used to train algorithms and generate predictions, models and other outputs, which could ultimately lead to these outputs being biased, inaccurate or imprecise.

In light of the proposed CPPA, organizations should be prepared to re-assess their approach to creating anonymized or de-identified data, and the purposes for which each type of data is used. Keeping abreast of developments in the legal and regulatory landscape and periodically assessing their practices against industry best practices can help organizations remain compliant, which is particularly important given the significant sanctions introduced in the CPPA.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.