Data pseudonymization in web analytics: the ultimate guide

One of the biggest surprises of GDPR is that it brings pseudonymous data within the scope of personal data.

However, it doesn’t necessarily mean that the regulation treats this type of data exactly the same as identified pieces of information. Already feeling confused? Don’t worry, we’ve got you covered.

In this blog post we want to show you what pseudonymized data is and why it’s worth your effort. We’ll also look at the most important threats and obligations involved in dealing with it.

Sound good? Then let’s get started.

What is data pseudonymization?

First, let’s cover some basics – the definition of the term.

Article 4(5) of GDPR states that:

‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

So, data counts as pseudonymous when:

it has been altered in a way that makes it impossible to single out an individual without some additional information,
that additional information is kept separately and protected using technical and organizational measures (for instance, access control) to ensure that the personal data is not attributed to a natural person.
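
To make the definition more tangible, here is a minimal sketch of one way to meet both conditions: a keyed hash (HMAC-SHA256) replaces the identifier, and the secret key plays the role of the "additional information" that has to be stored apart from the analytics data. The function names and the sample email addresses are our own illustrative assumptions, not something prescribed by the GDPR or used by any particular analytics product.

```python
import hashlib
import hmac
import secrets


def make_secret_key() -> bytes:
    """Generate a random 32-byte key (hypothetical helper).

    The key is the 'additional information': keep it outside the
    analytics database, for example in a separate secrets store.
    """
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


key = make_secret_key()

# The same visitor always maps to the same pseudonym, so reports still work.
visitor_a = pseudonymize("jane.doe@example.com", key)
visitor_b = pseudonymize("jane.doe@example.com", key)

# A different visitor gets a different pseudonym.
other = pseudonymize("john.roe@example.com", key)
```

Anyone holding only the pseudonyms cannot recompute them from a guessed email address without the key, which is why the key should live in a different, access-controlled place than the analytics data. The result is still personal data under the GDPR, because whoever holds the key can re-link the records.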

However, under GDPR, pseudonymization techniques are not enough to provide full anonymity of the data.

Although Recital 28 recognizes that pseudonymization can reduce risks to data subjects, on its own it is not sufficient to exempt data from the scope of the Regulation.

As Recital 26 states:

…Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person…

If you want to read more about data anonymization, we recommend you read this exhaustive blog post: The Ultimate Guide to Data Anonymization in Analytics.

Which kinds of data should be pseudonymized?

In order to comply with the GDPR, you should pseudonymize personal data wherever you can. The definition of personal data in the GDPR is very broad and includes information such as:

Login details
Device IDs
IP addresses
Cookies, including ones storing a device/user identifier
Browser type
Device type
Plug-in details
Language preference
Time zones
Screen size, screen color depth, system fonts
… and much more

That’s quite a long list, isn’t it?

What are the most popular methods of data pseudonymization?

The regulation itself doesn’t specify which pseudonymization techniques are considered to be adequate. However, there are a few decent sources of information on that subject.

For instance, some good examples of GDPR-compliant pseudonymization techniques can be found in this extremely informative article by Alex Ewerlof: GDPR Pseudonymization Techniques.

If you want to learn more about more technical aspects of pseudonymization, be sure to check it out.

If not, here is our quick wrap-up.

Below are the most popular pseudonymization techniques according to the author:

Scrambling: a mixing or obfuscation of letters.

Encryption: a process of encoding data so that it becomes unintelligible. Encrypted data can be turned back into its original form only with the matching key, so only those who possess the key are able to read it.

Masking: a technique that allows you to hide the most important part of the data with random characters or other data.

Tokenization: replacing sensitive data with non-sensitive substitutes (tokens). The tokens have no extrinsic or exploitable meaning or value.

Data blurring: blurring distinctive variables to reduce the risk that someone could trace the individual based on their characteristics.
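
To give a feel for how some of these techniques look in practice, below is a rough sketch of masking, tokenization and data blurring using only the Python standard library. All names here (mask_email, mask_ipv4, TokenVault, blur_age) and the sample values are hypothetical and chosen for illustration; treat it as a starting point under those assumptions rather than a production-ready or legally vetted implementation.

```python
import secrets


def mask_email(email: str) -> str:
    """Masking: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def mask_ipv4(ip: str) -> str:
    """Masking: replace the last octet of an IPv4 address with 0."""
    octets = ip.split(".")
    return ".".join(octets[:3] + ["0"])


class TokenVault:
    """Tokenization: swap values for random tokens.

    The mapping held here is the 'additional information' and should be
    stored separately from the tokenized data.
    """

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value stays consistent.
        if value not in self._token_by_value:
            token = secrets.token_hex(8)  # random, carries no meaning
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        # Only possible for whoever has access to the vault.
        return self._value_by_token[token]


def blur_age(age: int, bucket: int = 10) -> str:
    """Data blurring: report an age range instead of the exact age."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


print(mask_email("jane.doe@example.com"))  # j*******@example.com
print(mask_ipv4("192.168.12.34"))          # 192.168.12.0
print(blur_age(37))                        # 30-39

vault = TokenVault()
token = vault.tokenize("device-123")       # random 16-character hex string
print(vault.detokenize(token))             # device-123
```

Note the difference in reversibility: masking and blurring throw information away, while tokenization can be reversed by anyone with access to the vault, which is exactly why the vault needs its own protection.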

What are the benefits of using data pseudonymization?

As we’ve mentioned earlier, GDPR sets more relaxed standards for processing pseudonymized data than it does in the case of identified personal data. With pseudonymized data:

1. You’re allowed to use it for purposes other than the one it was collected for

The purpose limitation principle says that data should be used only for the purpose it was initially collected for. In Article 6(4), GDPR describes when processing for another purpose can still be considered compatible. We can read there that:

Where the processing for a purpose other than that for which the personal data have been collected is not based on the data subject’s consent or on a Union or Member State law which constitutes a necessary and proportionate measure in a democratic society to safeguard the objectives referred to in Article 23(1), the controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for which the personal data are initially collected, take into account, inter alia […] the existence of appropriate safeguards, which may include encryption or pseudonymisation.

Also, GDPR provides an exception to the purpose limitation principle for data processing for scientific, historical and statistical research.

However, the data should be processed using appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject.

2. You’re free from processing data subject requests regarding data access, rectification or erasure

Under the GDPR, data which have undergone pseudonymization may be exempt from certain data subject rights, such as rectification and erasure requests. However, to benefit from that, you have to demonstrate that you yourself are not able to identify the data subject.

Note, however, that the latest opinion of the Information Commissioner’s Office indicates that pseudonymized data should be included in the scope of portability rules.

3. You prove that you’re following protection by design and data privacy principles

What’s more, data pseudonymization can serve as evidence that you’re making every reasonable effort to ensure the security of your users’ data. According to data privacy experts, this technique can be treated as:

Part of a privacy by design strategy
Part of a risk minimization strategy
A way to prevent personal data security breaches
Part of a data minimization strategy

Admittedly, the list of benefits is not as extensive as in the case of anonymized data.

However, data pseudonymization may be a great solution for companies that would like to take advantage of greater freedom to use data without having to get involved in a complicated data anonymization process.

Pseudonymous data – your obligations

Unfortunately, pseudonymizing data releases you from only some of the obligations imposed on businesses that want to process the personal data of their clients. As a data controller you’ll still have some duties to fulfil. Among other things, you’ll have to:

Acquire consent from your users
Keep records of data processing
Carry out privacy impact assessments
Appoint a Data Protection Officer (if necessary)
Demonstrate compliance with the principle of privacy by design
And many, many more

Data pseudonymization in analytics – some problems to address

If you want to pseudonymize web analytics data, you’ll also have to make sure that your vendor is up to the task.

The environment in which you keep the data should allow you not only to pseudonymize it, but also to apply other security measures, like the ones described in Article 32 of the regulation.

In that case, a good idea may be to take advantage of on-premises analytics, where, in the case of Piwik PRO, you are the only owner of the cloud account in which data is stored. Piwik PRO will implement security measures and monitor and maintain the infrastructure and product.

However, on-premises is not always necessary. You can achieve data ownership and security in the private cloud as well.

With Piwik PRO, you can take advantage of a private cloud (dedicated database). It consists of physically separate virtual machines storing your analytical data and generating reports while keeping the remaining server resources shared, with a logical separation of data. This grants you an additional layer of security and more dedicated resources.

Another alternative is a private cloud (dedicated hardware), where all server resources are dedicated to one organization. This results in higher prices and longer implementation time but ensures the physical separation of servers used in the application that capture and store data and generate reports and metadata.

If you want to learn more about those options, be sure to check our web analytics product page.

Final thoughts

It seems that pseudonymization of data is a good way to reduce the restrictions involved in dealing with personal data in the age of GDPR.

However, it is worth remembering that it introduces many responsibilities and should be approached with due diligence.

Otherwise, you expose yourself to hefty fines and the loss of the trust of your customers.

Also, remember – if you would like to learn more on this topic, don’t hesitate to contact us – we’ll gladly answer all your questions!