A recent OAIC decision provides useful guidance for businesses on techniques for de-identifying personal information - especially for the purpose of training AI models. Importantly, good privacy governance and planning are vital when commencing any new initiative involving new technology.
The Office of the Australian Information Commissioner (OAIC) recently concluded its preliminary inquiries into I-MED Radiology Network Limited's (I-MED) disclosure of de-identified patient data to Annalise.ai without taking regulatory action. I-MED had been sharing the de-identified patient data, without obtaining patient consent or providing notice, to train an AI model for diagnostic imaging.
The OAIC found that the data was sufficiently de-identified and no longer constituted 'personal information' under the Privacy Act 1988 (Cth) (Privacy Act).
Key takeaways for de-identifying personal information
Businesses should consider the following key takeaways from the OAIC's report when proposing to use de-identified personal information to train AI models:
- Develop a robust de-identification methodology: Use recognised standards and techniques (e.g. hashing, redaction, aggregation) to ensure data is no longer reasonably identifiable. Ensure this methodology is documented and reviewed regularly.
- Mitigate the risk of re-identification: Impose contractual obligations on data recipients to prevent re-identification, including prohibiting data merging and AI-based re-identification, and use technical controls to prevent linkage with other datasets (a brief illustrative sketch follows this list).
- Strengthen data governance: Establish clear internal policies and procedures for de-identification and data sharing, aligned with frameworks like the 5-Safes Principles. Include prescriptive guidance and ensure staff are trained on compliance requirements.
- Transparency and reputational risk: Inform customers about how their de-identified data may be used, in order to mitigate reputational risks.
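To make the technical-controls point concrete, the sketch below shows one common control against linkage: hashing direct identifiers with a secret, per-recipient key, so the same patient identifier produces different pseudonyms in different releases and the resulting datasets cannot be joined. This is a minimal, hypothetical illustration (the key and patient ID are invented), not a description of I-MED's or any other organisation's implementation.

```python
import hashlib
import hmac

# Hypothetical per-recipient secret; in practice it would live in a
# secrets manager, never in source code.
RECIPIENT_KEY = b"per-recipient-secret"

def pseudonymise(identifier: str, key: bytes = RECIPIENT_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of a direct identifier.

    Because the hash is keyed, a recipient cannot recompute it for a
    known identifier, and datasets released under different keys cannot
    be linked on the hashed values.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

print(pseudonymise("PATIENT-00042"))  # invented identifier
```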
In addition to any Privacy Act considerations, businesses should also be mindful of consumer law risks including:
- False, misleading or deceptive conduct: Ensure that any representations made to consumers about the use of their data, including how the business will use that data to train AI models, are not false, misleading or deceptive.
- Unfair contract terms and unfair trading practices: Requiring consumers to allow their data to be used to train an AI model as a prerequisite or condition of receiving services could potentially amount to an 'unfair contract term' if included in a contract, or constitute an 'unfair trading practice' under the Federal Government's proposed unfair trading practices regime. See Extending an unfair trading practices prohibition to commercial arrangements with small businesses: a potential chilling effect for more information.
Background to the OAIC's preliminary inquiries
Between 2020 and 2022, I-MED shared de-identified patient data (including clinical scans and reports) with Annalise.ai to train an AI model for diagnostic imaging. Patients were not notified, nor was consent obtained.
Following media coverage in September 2024, the OAIC launched a preliminary inquiry into I-MED's data practices in response to growing public concern over the use of personal information in AI development. The OAIC has emphasised that training AI models with personal information is a high-risk activity under the Privacy Act and will be a regulatory focus going forward.
Application of the APPs
As part of its preliminary inquiry, the OAIC examined whether I-MED's disclosure of patient data to Annalise.ai involved personal information, including health information, or whether the information had been sufficiently de-identified so that the Australian Privacy Principles (APPs) did not apply.
The OAIC's assessment focused on the following key concepts:
Personal Information
The APPs apply to information about an identified or reasonably identifiable individual. Health information is treated as 'sensitive information' under the Privacy Act and is subject to more stringent requirements for how it is collected, used and disclosed by organisations.
The definition of personal information may also soon be expanded to cover information or an opinion that "relates to" an identified or reasonably identifiable individual, as part of the Federal Government's many proposed privacy reforms.
De-identification
Information that has been de-identified so that it no longer reasonably identifies an individual will not be subject to the APPs. De-identification involves removing or altering identifiers (e.g. names, addresses or rare traits) to prevent re-identification.
Re-identification risk
Data will only be considered de-identified, and exempt from the application of the APPs, in circumstances where the process to re-identify an individual is so impractical that there is almost no likelihood of it occurring.
The risk of 're-identification' is heightened when the de-identified information is combined with other datasets or processed by AI systems trained on very large datasets. A 2015 MIT study found that just four coarse pieces of ancillary information (such as the approximate place and date of four purchases) were enough to identify 90% of individuals in a dataset of 1.1 million users' credit-card transactions.
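The intuition behind that finding can be shown in a few lines of code: count how many records are unique on a handful of quasi-identifiers, since a unique combination is exactly what an attacker needs to link a dataset to an outside source. The records and field names below are invented for illustration.

```python
from collections import Counter

# Invented records: no names, yet a combination of a few quasi-identifiers
# can still single a person out.
records = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"postcode": "3141", "birth_year": 1975, "sex": "M"},
]
quasi_identifiers = ("postcode", "birth_year", "sex")

# Count occurrences of each quasi-identifier combination; a count of 1
# marks a record that is unique, and therefore most exposed to linkage.
counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
unique = sum(1 for n in counts.values() if n == 1)
print(f"{unique} of {len(records)} records are unique on {quasi_identifiers}")
```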
The broader privacy risks from re-identification are also being addressed as part of upcoming privacy reforms, with the Federal Government agreeing to further consultation on introducing a criminal offence for malicious re-identification of de-identified information, and considering how de-identified information can be protected from unauthorised re-identification under the Privacy Act.
I-MED's de-identification process
After reviewing I-MED's data practices, the OAIC was satisfied that the following measures implemented by I-MED were sufficient to de-identify the personal information, such that the APPs did not apply.
Technical measures
I-MED de-identified the patient records by:
- segregating the patient data from the underlying dataset;
- scanning the records with text recognition software;
- using two hashing techniques (one for unique identifiers such as patient ID numbers, the other for names, addresses and phone numbers);
- time-shifting dates to a random date within a specified number of years (a simple sketch of this technique follows the list);
- aggregating certain fields into large cohorts to avoid identification of outliers; and
- redacting any text that appears within, or within 10% of, the boundary of an image scan.
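By way of illustration only, the sketch below shows how a date time-shift of the kind described above might work: every date in a patient's record is moved by the same random offset within a fixed window, so the intervals between events survive but the real calendar dates do not. The function name and two-year window are hypothetical; this is not I-MED's actual code.

```python
import random
from datetime import date, timedelta

def time_shift(dates: list[date], window_days: int = 730, seed: int | None = None) -> list[date]:
    """Shift all of one patient's dates by a single random offset.

    One offset per patient preserves the intervals between that patient's
    events (often clinically meaningful) while breaking the link to real
    calendar dates.
    """
    offset = timedelta(days=random.Random(seed).randint(-window_days, window_days))
    return [d + offset for d in dates]

print(time_shift([date(2021, 3, 1), date(2021, 6, 15)], seed=7))
```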
Contractual measures
I-MED imposed the following obligations on Annalise.ai in relation to how it handled the de-identified patient data:
- prohibiting Annalise.ai from doing any act, or engaging in any practice, that would result in the patient data becoming 'reasonably identifiable';
- prohibiting it from disclosing or publishing the patient data for any purpose (to prevent wider dissemination of the dataset and so reduce the risk that the patient data could become re-identifiable in the hands of other third parties or in the public domain);
- requiring it to store the patient data in a secure environment; and
- requiring it to notify I-MED if it inadvertently received any patient personal information.
These contractual obligations were crucial in addressing rare instances where personal information was mistakenly shared with Annalise.ai due to de-identification errors. In line with its contractual duties, Annalise.ai promptly identified and reported these issues to I-MED, allowing the data to be deleted or properly de-identified.
Governance measures
I-MED developed a Data De-identification Policy and Approach to guide how de-identified patient data was shared, which reflected many of the practices endorsed by the National Institute of Standards and Technology. This included:
- utilising the 5-Safes Principles (safe people, projects, settings, data and outputs);
- ensuring separation of the Annalise.ai and I-MED environments;
- utilising a 'Data Use Agreement Model';
- imposing prescriptive de-identification standards;
- removing or transforming all direct identifiers; and
- utilising top and bottom coding and aggregation of outliers (sketched briefly below).
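Top and bottom coding, the last technique in that list, clamps extreme values into boundary categories so that rare outliers cannot be singled out. A minimal sketch, with hypothetical age thresholds:

```python
def top_bottom_code(age: int, floor: int = 18, ceiling: int = 85) -> str:
    """Collapse extreme ages into boundary bins (thresholds are invented).

    A 104-year-old patient is a rare, potentially identifying outlier;
    reporting the age simply as '85+' removes that distinctiveness.
    """
    if age < floor:
        return f"<{floor}"
    if age >= ceiling:
        return f"{ceiling}+"
    return str(age)

print([top_bottom_code(a) for a in (4, 42, 104)])  # ['<18', '42', '85+']
```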
Guidance for using de-identified personal information
The OAIC's report serves as a valuable reference for businesses seeking to use de-identified personal information responsibly. By adopting rigorous technical, contractual, and governance measures, organisations can reduce privacy risks and comply with legal obligations, while still leveraging data for innovation.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.