The NHS is welcoming the use of artificial intelligence (AI) as a tool to reduce time spent on laborious administration and give clinicians more time for patient care. One such development is the ambient scribe: a generative AI system that transcribes and summarises consultations in real time, with some products capable of automatically populating the electronic health record (EHR) with clinical notes and summaries. Although this technology is powerful and potentially transformative, it introduces a range of data protection risks that must be carefully assessed and mitigated.
Even before formal NHS guidance was introduced, many clinicians had begun trialling AI scribing tools to help manage growing documentation demands. This early adoption raised concerns around the use of unregulated software, lack of patient transparency, and gaps in data governance.
In its April 2025 guidance, the NHS formally endorsed ambient scribe tools in healthcare settings but set out clear expectations for their safe use. The guidance emphasises the need for clinical safety assurance, legal and regulatory compliance, data protection and cybersecurity controls, and human oversight of AI-generated content.
It also reinforces the requirement for organisations to complete a Data Protection Impact Assessment (DPIA), implement appropriate safeguards, and maintain full accountability for how patient data is processed, particularly when using AI tools supplied by external vendors.
NHS organisations and any provider using these tools must identify a lawful basis for processing under Article 6 of the UK GDPR. Because the data involved includes health information and voice recordings, it constitutes special category personal data, so a specific condition under Article 9 must also be met, most commonly Article 9(2)(h), which covers the provision of health or social care.
A DPIA must be completed before any ambient scribe software is used. Your DPIA should address the types of data involved (e.g. audio, transcripts), how the tool processes and stores that data, the risk of inaccurate transcriptions, and whether vendors can access the data or reuse it for AI model training, which remains a key concern for many patients.
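Purely as an illustration (this is not an official DPIA template, and every field name below is hypothetical), the considerations above could be captured in a simple structured record that a governance team reviews before deployment:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified DPIA checklist for an ambient scribe tool.
# Field names are illustrative only; use your organisation's DPIA template.
@dataclass
class AmbientScribeDPIA:
    data_types: list[str] = field(default_factory=lambda: ["audio", "transcripts", "summaries"])
    storage_location: str = "UK data centre"        # where the data rests
    retention_period_days: int = 30                 # how long audio is kept
    transcription_error_risk: str = "clinician review required before EHR entry"
    vendor_has_data_access: bool = False            # can the vendor read patient data?
    data_reused_for_model_training: bool = False    # a key patient concern
    mitigations: list[str] = field(default_factory=list)

dpia = AmbientScribeDPIA(mitigations=["encryption at rest", "access logging"])
assert not dpia.data_reused_for_model_training, "Training reuse needs its own legal basis"
```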
Patients must also be informed when AI tools are being used to capture, transcribe, or summarise their consultations, and given clear information on how their data is processed, stored, and safeguarded. In addition, the clinician must review and verify the AI-generated summary before it is committed to the EHR, to ensure clinical accuracy.
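A minimal sketch of what that human-in-the-loop step could look like in an integration layer; the `EHRClient` class, its `write_note` method, and the surrounding names are assumptions made for illustration, not any vendor's actual API:

```python
from dataclasses import dataclass

@dataclass
class DraftSummary:
    consultation_id: str
    text: str
    verified_by_clinician: bool = False  # set only after human review

class EHRClient:
    """Stand-in for a real EHR integration; write_note is hypothetical."""
    def write_note(self, consultation_id: str, text: str) -> None:
        print(f"Note committed to EHR for consultation {consultation_id}")

def commit_summary(summary: DraftSummary, ehr: EHRClient) -> None:
    # Refuse to write unreviewed AI-generated content into the record.
    if not summary.verified_by_clinician:
        raise PermissionError("Summary must be reviewed and verified by the clinician first")
    ehr.write_note(summary.consultation_id, summary.text)
```

The point of the gate is that AI output is treated as a draft: nothing reaches the record until a clinician has taken responsibility for it.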
Vendors will often state that data is anonymised (all personal identifiers are removed such that it cannot be linked to any individual) or pseudonymised (identifiers are replaced with a code). Pseudonymised data, however, still carries a real potential for re-identification, particularly where biometric data such as voice recordings is involved.
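To make the distinction concrete, here is a deliberately simplified sketch of pseudonymisation; the identifier format and key-table handling are illustrative assumptions, but the underlying point holds: as long as a key exists, the data can be linked back to the individual and remains personal data:

```python
import secrets

# token -> original identifier; whoever holds this table can re-identify
key_table: dict[str, str] = {}

def pseudonymise(nhs_number: str) -> str:
    """Replace an identifier with a random token, keeping the key."""
    token = secrets.token_hex(8)
    key_table[token] = nhs_number
    return token

def re_identify(token: str) -> str:
    """The existence of this function is exactly why pseudonymised
    data is still personal data under the UK GDPR."""
    return key_table[token]

token = pseudonymise("943 476 5919")  # illustrative NHS number format
assert re_identify(token) == "943 476 5919"
```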
Voice data poses unique challenges because it is biometrically identifiable. A person's voice carries distinct characteristics (accent, pitch, tone, cadence, and speech patterns) that can be used to recognise or verify identity. Even short clips of recorded speech can be cross-referenced with other available data (e.g. previous calls, public interviews, or online videos). Furthermore, the content of a recording often includes contextual information, such as the speaker's profession, location, or the names of relatives, that can indirectly lead to re-identification, especially in smaller communities or where rare clinical conditions are discussed. Voice data is therefore extremely difficult to fully anonymise.
Under the UK GDPR, pseudonymised data remains personal data and must be treated as such. True anonymisation is extremely difficult to achieve in healthcare, particularly where voice recordings are involved: medical transcripts often contain rare medical conditions, specific occupations, geographic references, or familial details that, even in the absence of direct identifiers, can significantly increase the risk of re-identification.
Therefore, while truly anonymised data falls outside the scope of the UK GDPR, vendors relying on this must be able to demonstrate that the anonymisation is irreversible and has been independently assessed. Any reuse of identifiable or re-identifiable data for training purposes without an appropriate legal basis is unlawful.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.