This is a little alarming.
Anonymised data isn’t nearly anonymous enough – here’s how we fix it
We developed a machine learning model to assess the likelihood of reidentifying the right person. We took datasets and we showed that in the US fifteen characteristics, including age, gender, marital status and others, are sufficient to reidentify 99.98 per cent of Americans in virtually any anonymised data set.
Some more examples.
The simple process of re-identifying patients in public health records
In late 2016, doctors’ identities were decrypted in an open dataset of Australian medical billing records. Now patients’ records have also been re-identified – and we should be talking about it.
‘Anonymous’ browsing data can be easily exposed, researchers reveal
A journalist and a data scientist secured data from three million users easily by creating a fake marketing company, and were able to de-anonymise many users.
“What would you think,” asked Svea Eckert, “if somebody showed up at your door saying: ‘Hey, I have your complete browsing history – every day, every hour, every minute, every click you did on the web for the last month’? How would you think we got it: some shady hacker? No. It was much easier: you can just buy it.”