By now, it should come as no surprise to anyone that ChatGPT is trained on vast datasets that also contain personal data. What is new, however, is that it can potentially be made to disclose parts of these datasets, including information about other people, to its users.

A group of scientists discovered that when ChatGPT is asked to repeat a word endlessly, it can end up quoting passages from its training data.[1] In some cases, this data also contains personal information such as names, e-mail addresses and phone numbers. In theory, ChatGPT is designed not to disclose its training data, let alone large amounts of it. However, the scientists were able to extract "several megabytes of ChatGPT's training data". Machine-learning models generally memorize a certain percentage of the data used to train them. The more sensitive or unique this data is, the less desirable it is for parts of it to be made public verbatim. Some applications do call for a machine-learning model that reproduces its training data exactly, but a generative (language) model such as ChatGPT should not be used for those.
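To make the behaviour described above more concrete, the following Python sketch shows one way a reviewer might screen a saved model response: it finds the point at which the response stops repeating the requested word and then searches the remaining text for strings that look like e-mail addresses or phone numbers. This is purely illustrative and is not the scientists' method. The function names and the two search patterns are assumptions made for this example, and the sample response is invented.

```python
import re

# Assumed, deliberately simple patterns for illustration only.
# Reliable detection of personal data needs far more care than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def split_at_divergence(output: str, word: str) -> tuple[int, str]:
    """Count the leading repetitions of `word` and return the text that follows."""
    tokens = output.split()
    target = word.lower()
    count = 0
    for token in tokens:
        # Ignore surrounding punctuation and case when comparing.
        if token.strip(".,;:!?\"'").lower() != target:
            break
        count += 1
    return count, " ".join(tokens[count:])


def find_pii_candidates(text: str) -> dict[str, list[str]]:
    """Return substrings that merely look like e-mail addresses or phone numbers."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    # Invented sample response, not real model output.
    sample = "poem poem poem, poem Jane Doe, jane.doe@example.com, +1 555 010 0199"
    repetitions, tail = split_at_divergence(sample, "poem")
    print(repetitions, find_pii_candidates(tail))
```

A match from this sketch only means that a string resembles personal data. Whether the text was actually part of the training data would have to be verified separately, for example by comparing it with known public sources.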

How much training data such models actually remember cannot be determined. For this reason alone, the team of scientists is worried that it may be impossible to distinguish between models that are safe and those that merely appear to be. The scientists discovered this exploit back in July and reported it to OpenAI, the creator of ChatGPT, in August. When we tested the exploit in December, we were able to replicate the results in some cases and got ChatGPT to disclose some information from its training data (e.g. about the conflict in the Middle East). This shows that even the developers of these models still do not fully understand how they work. Surprises and potentially dangerous situations can still occur during their development.



The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.