Pseudonymization vs. Anonymization and How They Help With GDPR

Pseudonymization and Anonymization are two distinct terms that are often confused in the data security world. With the advent of GDPR, it is important to understand the difference, since anonymized data and pseudonymized data fall under very different categories in the regulation.

Pseudonymization and Anonymization are different in one key aspect. Anonymization irreversibly destroys any way of identifying the data subject. Pseudonymization substitutes the identity of the data subject in such a way that additional information is required to re-identify the data subject.

You can think about it in terms of authors. Let’s say we have 10 books written by “Anonymous”. We have no way of knowing whether all 10 books were written by the same person or by 2, 3, 4 or 10 different people. Now let’s say we have 10 books written by Mark Twain. We know that all 10 books were written by the same person, even if we don’t know that Mark Twain is actually Sam Clemens. Clemens wrote under a pseudonym, while the other authors in our example were anonymous.

In practice, let’s look at tokenization. Tokenization provides a consistent token for each unique name and requires access to additional information (our static lookup tables/code books) to re-identify the data:

[Figure: Pseudonymization vs Anonymization]

Here, with the pseudonymized data, we may not know the identity of the data subject, but we can correlate entries with specific subjects (records 1 and 7 reference the same person, records 2 and 5 reference the same person, records 3 and 4 reference the same person). If we have access to re-identify the data via the token lookup tables, then we can get back to the real identity. With the anonymized data, however, we only know that there are 7 records and there is no method to re-identify the data.
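The difference can be sketched in a few lines of Python. This is a minimal illustration, not Protegrity's implementation: the `Tokenizer` class, the `TKN-` token format, and the seven sample names are all assumptions made up for the example, arranged so that records 1 and 7, 2 and 5, and 3 and 4 share a subject as described above.

```python
import secrets


class Tokenizer:
    """Hypothetical in-memory tokenizer backed by a lookup table."""

    def __init__(self):
        self._token_by_value = {}   # real value -> token
        self._value_by_token = {}   # token -> real value (the "code book")

    def tokenize(self, value):
        # Reuse the existing token so the same name always maps to the same token.
        if value not in self._token_by_value:
            token = "TKN-" + secrets.token_hex(4)
            while token in self._value_by_token:  # avoid a rare collision
                token = "TKN-" + secrets.token_hex(4)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        # Re-identification requires access to the lookup table.
        return self._value_by_token[token]


def anonymize(value):
    # The identity is discarded; nothing is stored that could restore it.
    return "REDACTED"


# Hypothetical sample data: seven records, four distinct people.
names = ["Alice", "Bob", "Carol", "Carol", "Bob", "Dave", "Alice"]

tokenizer = Tokenizer()
pseudonymized = [tokenizer.tokenize(name) for name in names]
anonymized = [anonymize(name) for name in names]

for number, (token, blank) in enumerate(zip(pseudonymized, anonymized), start=1):
    print(number, token, blank)
```

In the pseudonymized column, matching tokens still link records to the same subject, and `detokenize` recovers the name for anyone holding the table. In the anonymized column, every record looks the same and there is nothing to look up.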

Pseudonymization is a method to substitute identifiable data with a reversible, consistent value. Anonymization is the destruction of the identifiable data.

With Anonymization, we must also be concerned about “indirect re-identification”. If we return to our author example above, an analysis of the writing style of our anonymous authors might allow us to indirectly identify them. We might not be able to identify the name, but we might be able to identify that specific books were written by the same person, because of their unique writing style. If that author has also written something under their own name, we might be able to completely identify the individual, by comparing the anonymous writing style with known author styles.

As an example, let’s say an organization retains records of its customers’ purchase histories, but anonymizes the name, address and other easily identifiable fields. Since humans are creatures of habit, it may still be possible to identify a record indirectly.

Every morning, Monday through Friday, Bob goes to the same coffee shop and buys the same coffee and scone for breakfast. He always uses his debit card. On Friday night, he always withdraws $200 from the ATM next to his office, because it’s poker night with his buddies.

Even if the organization has “anonymized” Bob’s personally identifiable data (destroyed his name, address, etc.), his behavior allows us to indirectly re-identify him (all of these transactions reference the same person, because we can identify his predictable behavior). Therefore, the data set has not been properly anonymized.
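As a rough sketch of how such a pattern could surface, the Python below groups records that carry no name or address by a behavioral "signature" and flags any signature that recurs on every weekday. The record fields, the sample rows, and the `recurring_patterns` helper are hypothetical, chosen only to mirror Bob's coffee-shop habit.

```python
from collections import defaultdict

# Hypothetical "anonymized" rows: name and address removed, behavior kept.
records = [
    {"day": "Mon", "place": "coffee shop", "items": ("coffee", "scone"), "payment": "debit"},
    {"day": "Tue", "place": "coffee shop", "items": ("coffee", "scone"), "payment": "debit"},
    {"day": "Tue", "place": "coffee shop", "items": ("latte",), "payment": "cash"},
    {"day": "Wed", "place": "coffee shop", "items": ("coffee", "scone"), "payment": "debit"},
    {"day": "Thu", "place": "coffee shop", "items": ("coffee", "scone"), "payment": "debit"},
    {"day": "Thu", "place": "coffee shop", "items": ("tea", "muffin"), "payment": "credit"},
    {"day": "Fri", "place": "coffee shop", "items": ("coffee", "scone"), "payment": "debit"},
]


def recurring_patterns(rows, min_days=5):
    """Return behavior signatures seen on at least min_days distinct days."""
    days_by_signature = defaultdict(set)
    for row in rows:
        signature = (row["place"], row["items"], row["payment"])
        days_by_signature[signature].add(row["day"])
    return {
        signature: days
        for signature, days in days_by_signature.items()
        if len(days) >= min_days
    }


suspicious = recurring_patterns(records)
for signature, days in suspicious.items():
    print(signature, "seen on", len(days), "days")
```

Only the coffee-and-scone debit pattern appears on all five weekdays, so those rows very likely belong to one person even though no identifying field remains.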

To properly anonymize this data, we might have to use additional methods to ‘hide’ individual behavior. For example, we might only store records based on some kind of grouping.

“50 people went to this coffee shop every morning.”
“100 people got money from this ATM every Friday.”
“A total of $100,000 was taken from this ATM on Friday.”
“30 people bought scones today.”

Now the data has been anonymized, because we have no way of seeing Bob’s predictable pattern of behavior.
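The grouping idea can be sketched as follows. This is one possible approach under assumed data, not a complete anonymization method: the transaction fields, the sample customers, and the `aggregate` helper are hypothetical, and amounts are kept in cents to avoid rounding issues.

```python
from collections import defaultdict

# Hypothetical per-customer transactions that would be held before anonymization.
transactions = [
    {"customer": "Bob", "place": "coffee shop", "amount_cents": 450},
    {"customer": "Ann", "place": "coffee shop", "amount_cents": 450},
    {"customer": "Cy", "place": "coffee shop", "amount_cents": 300},
    {"customer": "Bob", "place": "ATM", "amount_cents": 20000},
    {"customer": "Ann", "place": "ATM", "amount_cents": 10000},
]


def aggregate(rows):
    """Collapse individual rows into per-place counts and totals."""
    customers_by_place = defaultdict(set)
    total_by_place = defaultdict(int)
    for row in rows:
        customers_by_place[row["place"]].add(row["customer"])
        total_by_place[row["place"]] += row["amount_cents"]
    # Only the group-level figures are returned; customer names are dropped.
    return {
        place: {"people": len(customers), "total_cents": total_by_place[place]}
        for place, customers in customers_by_place.items()
    }


summary = aggregate(transactions)
for place, stats in summary.items():
    print(place, stats["people"], "people,", stats["total_cents"], "cents")
```

Only `summary` would be retained; the per-customer rows would be discarded. Note that a group containing very few people can still expose an individual, which is one reason the choice of grouping needs care.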

Protegrity tokenization is an excellent form of pseudonymization.

Anonymization is an exercise that should be undertaken by expert statisticians, data scientists and similar specialists, and tailored to the sort of data retained by the individual organization.

The post Pseudonymization vs. Anonymization and How They Help With GDPR appeared first on Protegrity.
