Pseudonymization vs anonymization: Which to use when

Written by Amitai Richman | April 18, 2023

Data privacy laws have made pseudonymization or anonymization a requirement for data processing, access, and security. But which is best for your needs?

Table of Contents

Pseudonymization vs Anonymization: Background
What is Pseudonymization?
What is Anonymization?
Pseudonymization vs Anonymization: What’s Better for Me?
Pseudonymization vs Anonymization: The Business Entity Approach

Pseudonymization vs anonymization backgrounder

The advent of the California Consumer Privacy Act (CCPA) and the EU’s General Data Protection Regulation (GDPR) meant that privacy by design and other data security mechanisms that protect the integrity of Personally Identifiable Information (PII) must be built into IT systems and services. One of the core principles of privacy by design is data minimization – requiring all data products and services to store, process, and display as little PII as possible.

Data minimization encourages enterprises to:

Limit data processing to information that could not directly identify an individual.
Restrict data gathering to non-sensitive data as much as possible.
Pseudonymize or anonymize any remaining sensitive data fields.

The last action matters most here, because GDPR treats pseudonymized data and anonymized data differently. Thus, it’s crucial for data-driven organizations to understand the differences and similarities between pseudonymization and anonymization, to ensure compliance and reduce liability.

Pseudonymization defined

The origin of the word pseudonym is Greek, meaning false name. A simple example of a pseudonym is Clark Kent, who is also known as Superman, or Bruce Wayne, who is also known as Batman.

In data, pseudonymization is slightly more complex. It’s the act of replacing sensitive data fields with non-sensitive data fields to make PII less accessible to unauthorized users. But it needs to be done in a certain manner. According to Article 4 of GDPR, pseudonymization is the processing of PII in such a way that it can no longer be attributed to a specific data subject without using additional information. This additional information – which changes according to the pseudonymization technique chosen – must be kept separately and protected by technical and organizational measures that prevent the data from being attributed to an identifiable person.
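To make the role of the "additional information" concrete, here is a minimal Python sketch. It assumes random tokens as pseudonyms and an in-memory lookup table; the function names `pseudonymize` and `reidentify` and the sample records are hypothetical, chosen only for illustration. In practice the lookup table would live in a separate, access-controlled store.

```python
import secrets


def pseudonymize(records, field):
    """Replace `field` in every record with a random token.

    Returns the pseudonymized records plus the lookup table
    (token -> original value), which is the "additional information"
    that must be stored separately from the data itself.
    """
    lookup = {}  # token -> original value (keep this elsewhere)
    seen = {}    # original value -> token, so repeated values stay consistent
    output = []
    for record in records:
        value = record[field]
        if value not in seen:
            token = secrets.token_hex(8)
            seen[value] = token
            lookup[token] = value
        output.append({**record, field: seen[value]})
    return output, lookup


def reidentify(records, field, lookup):
    """Restore the original values; only possible with the lookup table."""
    return [{**record, field: lookup[record[field]]} for record in records]


customers = [
    {"name": "Clark Kent", "city": "Metropolis"},
    {"name": "Bruce Wayne", "city": "Gotham"},
    {"name": "Clark Kent", "city": "Smallville"},
]
masked, lookup = pseudonymize(customers, "name")
restored = reidentify(masked, "name", lookup)
```

Without `lookup`, the tokens in `masked` cannot be traced back to a name. With it, `reidentify` recovers the original records, which is why the table needs its own protection.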

Anonymization defined

The word anonymous also comes from Greek and means no name. Thus, anonymized data is data that is made permanently unrecognizable. Data anonymization tools remove any possibility of identifying the data subject, and there’s no additional information on Earth that could restore the original data.

At the same time, even organizations that are required to anonymize PII need to ensure that they can use anonymized data for research or statistical purposes. This means that effective data anonymization is more complicated than pseudonymization. The goal of anonymization is not only to eliminate personal identifiers (so that non-authorized users can’t discover the identity of an individual from the remaining data), but also to ensure that anonymized data retains whatever business value the organization requires.

It's worth noting that GDPR does not specifically mandate the anonymization of data, whereas it explicitly names pseudonymization as a safeguard (in Articles 25 and 32, for example). The reason? According to Recital 26 of the GDPR, once data is irrevocably anonymized, principles of data protection no longer apply to it. Anonymized data can no longer be related to an identifiable individual, and regulations like GDPR essentially disregard it.

Pseudonymization vs anonymization techniques

There are numerous techniques for both anonymizing and pseudonymizing data. Here are some of the most prominent for each:

Pseudonymization methods

Here are the 5 most common pseudonymization techniques:

Counter – Considered the simplest technique for pseudonymization, the counter substitutes identifiers with a number chosen by a monotonic counter, whose values never repeat to prevent ambiguity.
Message Authentication Code (MAC) – This keyed-hash function requires a secret key to generate the pseudonym for each data field. Without the key, it’s impossible to map the identifiers to pseudonyms.
Random Number Generator (RNG) – The RNG mechanism creates unpredictable values – numbers that have equal probability of selection from the total possibility population – then assigns these values to an identifier. This can be accomplished using a cryptographic pseudo-random generator or a true random number generator.
Encryption – This technique converts plain text into coded text to protect it from unauthorized access. Despite its reputation as a robust pseudonymization method, regulations like GDPR still regard encrypted text as PII.
Cryptographic Hash Function (CHF) – This function maps input strings of arbitrary length to fixed-length outputs. The hash function is applied directly to the identifier, and the resulting digest serves as the pseudonym, so the pseudonym’s length is determined by the digest size of the function.
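Four of the techniques listed above can be sketched with Python's standard library alone. This is an illustrative sketch, not a production design: the class and function names are hypothetical, SHA-256 is an assumed choice of hash, and encryption is left out because the standard library does not ship a cipher.

```python
import hashlib
import hmac
import itertools
import secrets


class CounterPseudonymizer:
    """Counter: each new identifier gets the next value of a monotonic counter."""

    def __init__(self):
        self._next = itertools.count(1)
        self.table = {}  # identifier -> pseudonym (store separately)

    def pseudonym(self, identifier):
        if identifier not in self.table:
            self.table[identifier] = next(self._next)
        return self.table[identifier]


class RandomPseudonymizer:
    """RNG: each new identifier gets an unpredictable random value."""

    def __init__(self):
        self.table = {}  # identifier -> pseudonym (store separately)

    def pseudonym(self, identifier):
        if identifier not in self.table:
            # secrets draws from the operating system's secure random source
            self.table[identifier] = secrets.token_hex(16)
        return self.table[identifier]


def mac_pseudonym(identifier, key):
    """MAC: keyed hash (HMAC-SHA256); the key is the secret to protect."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_pseudonym(identifier):
    """CHF: unkeyed SHA-256 digest of the identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


key = secrets.token_bytes(32)  # hypothetical secret key for the MAC
counter = CounterPseudonymizer()
first = counter.pseudonym("clark@example.com")
second = counter.pseudonym("bruce@example.com")
keyed = mac_pseudonym("clark@example.com", key)
unkeyed = hash_pseudonym("clark@example.com")
```

The counter and RNG variants need a mapping table to stay consistent, while the MAC and hash variants are recomputable from the identifier. An unkeyed hash can be recomputed by anyone who can guess candidate identifiers, so the keyed MAC is generally the safer of the two when the identifier space is small.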

Anonymization methods

Below are 5 common data anonymization techniques, among many others:

Scrambling – This reversible process rearranges data in a non-intuitive way to render it useless without a special key or algorithm to decode it. For example, it can mix up letters (turning secure into curese), swap bits or bytes, fabricate pseudorandom numbers, or apply mathematical transformations to the data.
Personalized anonymization – This type of anonymization empowers the user to choose his or her own method, generally using applications or scripts.
Directory replacement – This technique replaces a real, existing directory with a new, fictitious one while preserving its name, attributes, and contents. It preserves the statistical properties of the original dataset, while protecting the privacy of the individuals represented in the data.
Masking – Data masking replaces sensitive data with fake, but structurally similar data, to preserve the original format and functionality of the data, while making it unreadable to unauthorized users or applications. Data masking tools employ a variety of methods including string or character substitution, data or number variance, and synthetic data generation, among others.
Blurring – Blurring leverages an approximation of data values to make their meaning obsolete and make identifiers irrelevant. It’s commonly used to obscure PII in unstructured data such as images, PDFs, etc.
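Two of the techniques above, masking and blurring, are simple enough to sketch in a few lines of Python. The sketch assumes character-by-character substitution for masking and fixed-width ranges for blurring a numeric value; the function names `mask_preserving_format` and `blur_age` are hypothetical, and real masking tools apply far richer rules.

```python
import random
import string

_rng = random.SystemRandom()  # OS-backed randomness, not reproducible


def mask_preserving_format(value):
    """Masking: swap each letter or digit for a random one of the same kind.

    Punctuation and length are left alone, so the result keeps the
    original format but carries none of the original characters' meaning.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(_rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(_rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as '-', '@', '.'
    return "".join(out)


def blur_age(age, bucket=10):
    """Blurring: replace an exact age with the range it falls into."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


masked_email = mask_preserving_format("Jane-42@x.io")
age_range = blur_age(37)  # "30-39"
```

Because no mapping between original and masked values is kept, the substitution cannot be undone from the output alone. Whether the result counts as anonymized still depends on what other fields remain in the dataset.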

Choosing between pseudonymization vs anonymization

The choice between pseudonymization and anonymization has legal, security, and data usability implications.

From a legal perspective, because some form of re-identification is possible, pseudonymized data is still considered PII and thus subject to regulations like GDPR. Anonymized data is not. So, if an organization needs access to identifiers for its own business purposes, it would probably prefer pseudonymization. A company seeking to avoid regulatory liability altogether, by contrast, would probably choose anonymization for its sensitive data.

In an enterprise, while customer support might require reversible data pseudonymization tools (in order to access PII in its call centers, for example), software testing teams may prefer data anonymization tools, due to their uncompromising security.

Business entity approach to data masking

Whether an organization chooses pseudonymization or anonymization for its sensitive data – or a combination of the two – the most advanced approach to identifying, organizing, categorizing and masking data is via business entities. A business entity approach to data masking technology assigns each data asset to a specific business entity (e.g., a customer, service, or payment). This way, sensitive data can either be pseudonymized or anonymized on demand, despite being drawn from disparate systems. When used in conjunction with intelligent business rules, the business entity approach is a highly pragmatic, yet flexible, way to achieve sensitive data minimization and privacy by design.
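As a rough illustration of the idea, the Python sketch below assembles one customer entity from two source systems and then applies per-consumer rules to each field. Everything in it is an assumption made for the example: the `CRM` and `BILLING` dictionaries, the rule names, and the function names are hypothetical and do not represent any vendor's API.

```python
import hashlib
import hmac

# Hypothetical source systems, each keyed by customer ID.
CRM = {"c-1": {"name": "Clark Kent", "email": "clark@example.com"}}
BILLING = {"c-1": {"card_number": "4111111111111111", "balance": 120.50}}

# Hypothetical business rules: field -> action, per data consumer.
# Fields not listed for a consumer are anonymized by default.
RULES = {
    "support": {
        "customer_id": "keep",
        "name": "pseudonymize",
        "email": "pseudonymize",
        "balance": "keep",
    },
    "testing": {"customer_id": "pseudonymize", "balance": "keep"},
}


def build_customer_entity(customer_id):
    """Collect everything known about one customer into a single record."""
    entity = {"customer_id": customer_id}
    entity.update(CRM.get(customer_id, {}))
    entity.update(BILLING.get(customer_id, {}))
    return entity


def mask_entity(entity, consumer, key):
    """Apply the consumer's rules to every field of the entity."""
    rules = RULES[consumer]
    masked = {}
    for field, value in entity.items():
        action = rules.get(field, "anonymize")
        if action == "keep":
            masked[field] = value
        elif action == "pseudonymize":
            # Keyed hash: stable for the same value and key
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]
        else:
            masked[field] = "REDACTED"
    return masked


entity = build_customer_entity("c-1")
for_support = mask_entity(entity, "support", b"demo-key")
for_testing = mask_entity(entity, "testing", b"demo-key")
```

The same entity yields two views: the support view keeps the customer ID with pseudonymized contact fields, while the testing view redacts every field it has no rule for.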

Learn how K2view data masking tools provide a business entity approach to pseudonymization and anonymization.
