India has witnessed a steep rise in digital adoption, culminating in increased usage of networked payment infrastructures like UPI and coordinated healthcare and vaccination platforms like CoWIN and Ayushman Bharat. At the same time, abuse of data processing poses privacy risks and other harms at an individual and collective level. Unauthorised access, disproportionate surveillance and identity theft are rampant and deserve attention from all stakeholders.

Various e-governance projects have been implemented to improve the delivery of public services and simplify access to them. Projects such as UID (Aadhaar), e-Pramaan, banking, insurance, customs and excise etc. at the Central level, and many others at the State level, require the gathering and processing of personal information (often sensitive information) by various entities for provisioning services. New-age projects like the National Health Mission, CoWIN vaccination, Aarogya Setu etc. generate vast volumes of data. Multiple projects might need to scale to large volumes at short notice during emergencies, law and order issues, elections etc. Additionally, cross-functional access to and processing of personal information can be attributed to the integration of services. Government entities are the most extensive data fiduciaries and are accountable for lawful processing and protection of personal information.

Against this backdrop, it is vital that safety and privacy protection are given paramount importance to build trust in the use of digital systems and platforms, and that all safety measures are followed to improve security and privacy. One such privacy-enhancing technique is 'data anonymisation'.

Data Anonymisation is a processing technique that removes or modifies direct and indirect personally identifiable attributes to eliminate or significantly reduce identifiability. It typically results in "anonymised data sets" that cannot be associated with an individual. The Ministry of Electronics and Information Technology ("MeitY") released draft guidelines for anonymisation of data ("Guidelines"). The Guidelines aim to provide measures to all entities engaged in processing of personal information (and subtypes) in e-governance projects.

The Guidelines suggest a number of techniques and SOPs that e-governance projects can adopt to anonymise the data they gather. Following are the key elements of the Guidelines:

What, When and How to Anonymise?

The Guidelines clarify that determining what data to anonymise, at which stage of the data processing cycle, and how depends on an organisation's objectives, emerging regulatory regimes and standards. There is a need to consolidate diverse practices and develop standards for data anonymisation. Different industry sectors and projects would require different implementation strategies, depending on the sensitivity of data and potential harm that may be caused to data principals in the event of re-identification etc.

However, the Guidelines clarify that, as a best practice and keeping in mind the principle of 'data minimisation', anonymisation should be done before active data processing or as early as possible in the information lifecycle, wherever possible.

Processing Purposes

The Guidelines broadly categorise organisational data processing as follows:
(i) Purpose-based processing, which requires the organisation to clearly define all such purposes and get clear, affirmative and explicit consent from the data principals for each processing purpose.
(ii) Processing to fulfil a lawful disclosure request.
(iii) Sharing data with data-processors/third parties and other entities for processing purposes.
(iv) Processing to integrate products and services with other data-tech ecosystems for the benefit of consumers.
(v) Any additional processing that the organisation carries out to improve services, cross-sell, collaborate or maintain a competitive edge. In some cases where processing is experimental or short-lived, organisations typically do not declare it as a formal purpose or collect consent against it.

The Guidelines provide an overview of the data anonymisation process, detailed hereinafter:

Step 1: Identify PII and Sensitive Data – Personally Identifiable Information (PII), sensitive personal data and critical data in the project or software application should be identified. Examples are biometrics, health records, financial records, authenticated services, addresses, unique identifiers etc.

Step 2: Determine Data Sources – The data sources where PII and sensitive data are potentially stored, used or referred to should be identified. For instance: (i) Application User Interface – The PII data in the user interface should be masked; (ii) Files – The PII data in text files (like CSV), spreadsheets (like Excel), documents (like Word) and PDF documents should be identified; (iii) Documents and Images – The files should not contain PII data in scanned documents and images (like Aadhaar, PAN) etc.

Step 3: Data Discovery – (i) Identify the fields where PII and sensitive data are being stored or used in the project; (ii) Identify all the PII fields in the application, including in tables, files, logs, storage media and prints; (iii) The discovery of PII data can be manual or automated, although an automated approach is preferred to improve the accuracy of discovery; and (iv) Define the patterns to be detected, such as Aadhaar and PAN numbers.
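By way of illustration, the automated, pattern-based discovery described in Step 3 could be sketched as follows. This is a minimal sketch and not a method prescribed by the Guidelines: the dictionary `PII_PATTERNS`, the function `discover_pii` and the sample log line are hypothetical, and the patterns are deliberately simplified (a PAN is matched as five letters, four digits and one letter; an Aadhaar number as twelve digits in optional groups of four). A production scanner would likely add checksum validation and contextual checks to reduce false positives.

```python
import re

# Simplified, illustrative patterns (assumptions, not taken from the Guidelines).
PII_PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b"),
}


def discover_pii(text):
    """Return a sorted list of (label, matched_text) pairs found in text."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return sorted(findings)


# Hypothetical log line containing made-up identifiers.
sample_log = "user=42 pan=ABCDE1234F aadhaar=2345 6789 0123 status=ok"
findings = discover_pii(sample_log)
for label, value in findings:
    print(label, value)
```

The same function can be run over file contents, log lines or database exports, which is one way to cover the tables, files and logs listed in the step.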

Step 4: Determine the Anonymisation Technique – (i) Determine the data anonymisation techniques to be used, such as data pseudonymisation and data redaction; (ii) Determine which anonymisation technique will be used for each purpose and role; (iii) Define data redaction rules based on roles, so that certain information is made available to certain roles only. For instance, the last few digits of the PAN would be made available to high-privilege users.
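The two techniques named in Step 4 can be illustrated with a short sketch. Everything here is an assumption made for illustration: the role name `auditor`, the helper names `redact_pan` and `pseudonymise`, the choice of four visible characters and the use of a keyed hash (HMAC-SHA256) for pseudonymisation are not drawn from the Guidelines.

```python
import hashlib
import hmac

# Hypothetical role names; the Guidelines do not prescribe these.
HIGH_PRIVILEGE_ROLES = {"auditor"}


def redact_pan(pan, role, visible=4):
    """Mask a PAN, revealing the trailing characters only to high-privilege roles."""
    if role in HIGH_PRIVILEGE_ROLES:
        return "X" * (len(pan) - visible) + pan[-visible:]
    return "X" * len(pan)


def pseudonymise(value, secret_key):
    """Replace a value with a keyed hash so records stay linkable without exposing the value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


print(redact_pan("ABCDE1234F", "auditor"))  # trailing characters visible
print(redact_pan("ABCDE1234F", "clerk"))    # fully masked
print(pseudonymise("ABCDE1234F", b"example-key-not-for-production"))
```

Anyone holding the key can recompute the token for a known value, so a scheme of this kind is pseudonymisation rather than full anonymisation and the key itself needs protection.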

Step 5: Anonymise Data – (i) Apply the anonymisation rules to the data identified; (ii) Risks outstanding – Identify the data that is still not anonymised for various technical reasons and report it as part of governance; and (iii) For existing applications, a one-time data anonymisation needs to be done.
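A one-time pass of the kind described in Step 5 might look like the sketch below. It is only illustrative: the record layout, the field names `pan` and `scan_path`, and the helpers `mask_all` and `anonymise_records` are hypothetical. The sketch applies a rule to each discovered PII field where one exists and collects the fields with no rule, so that they can be reported as outstanding risks.

```python
def mask_all(value):
    """Replace every character of a string value with 'X'."""
    return "X" * len(value)


def anonymise_records(records, pii_fields, rules):
    """Apply per-field rules to copies of the records.

    Returns the anonymised copies and a sorted list of PII fields
    that had no rule and therefore remain un-anonymised.
    """
    anonymised = []
    outstanding = set()
    for record in records:
        clean = dict(record)  # work on a copy so the input is not modified
        for field in pii_fields:
            if field not in clean:
                continue
            rule = rules.get(field)
            if rule is None:
                outstanding.add(field)
            else:
                clean[field] = rule(clean[field])
        anonymised.append(clean)
    return anonymised, sorted(outstanding)


# Hypothetical data: scanned images have no rule yet, so they are reported.
records = [{"id": 1, "pan": "ABCDE1234F", "scan_path": "/scans/1.png"}]
result, outstanding = anonymise_records(
    records, pii_fields=["pan", "scan_path"], rules={"pan": mask_all}
)
print(result)
print("Outstanding:", outstanding)
```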

The Guidelines also define various techniques for anonymisation of data and elaborate on their advantages, limitations and suitability for various datasets and scenarios. The Guidelines classify anonymisation into two categories: static or in-place anonymisation, which is a permanent, irreversible alteration of data; and dynamic anonymisation, which is applied dynamically to the results of a query and not to the entire data set.
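The difference between the two categories can be sketched in a few lines. This is an illustrative assumption rather than anything specified in the Guidelines: the in-memory `table`, the `phone` field and the helper names are hypothetical, and simple masking stands in for whichever technique a project actually selects.

```python
def mask_phone(phone):
    """Keep only the last two digits of a phone number."""
    return "X" * (len(phone) - 2) + phone[-2:]


def static_anonymise(table):
    """Static (in-place): overwrite the stored values; the originals are not retained."""
    for row in table:
        row["phone"] = mask_phone(row["phone"])


def dynamic_query(table, predicate):
    """Dynamic: mask only the rows returned by a query; stored data is untouched."""
    return [
        {**row, "phone": mask_phone(row["phone"])}
        for row in table
        if predicate(row)
    ]


# Hypothetical stored data.
table = [
    {"name": "A", "phone": "9876543210"},
    {"name": "B", "phone": "9123456780"},
]

query_result = dynamic_query(table, lambda row: row["name"] == "A")
stored_after_query = table[0]["phone"]  # still the original value

static_anonymise(table)
stored_after_static = table[1]["phone"]  # now permanently masked
```

In the dynamic case the original values remain in storage and so still need access controls; in the static case they cannot be recovered from the altered data set.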

The Guidelines add that while data anonymisation of personal data is possible in many circumstances, anonymisation should not be seen as a silver bullet solution to ensure privacy, arguing that it should be a component of a larger privacy-by-design approach to an e-governance project's operations. The Guidelines are open to public comments until September 21, 2022.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.