Privacy-friendly analytics is a set of methods for collecting, measuring and analyzing data in a way that both respects individual privacy rights and delivers relevant insights. These methods allow for data-driven decisions while still giving individuals control over personal data.
Control over data is key for both individuals and organizations.
Control gives individuals a feeling of safety and assurance. And organizations get more data because individuals now feel taken care of and have the trust needed to provide data.
Luckily, a lot of work has already been done to show a path for how analytics insights and protection of individual privacy can go together. That is to say, we know the ingredients of privacy-friendly analytics projects.
So let’s walk through some key definitions and must-have elements for any analytics approach that wants to earn the title of privacy friendly.
Privacy-friendly analytics goes by many names. Some say privacy-focused analytics, others say privacy-first analytics. Those who are focused more on regulatory compliance will say privacy-compliant analytics. All these terms are aiming at the same idea. Though of course the exact definition of any of them will depend on who you ask.
The concept of privacy has no single definition. Its various meanings stem from the fields of philosophy, sociology and anthropology. Generally it refers to freedom from interference or intrusion. One of the interpretations of privacy as “control over information about oneself” comes from Samuel Warren and Louis Brandeis, two American lawyers, and their essay “The Right to Privacy”.
Although published in 1890, this essay contains explanations that are still useful today. It labels a violation of privacy as revealing personal information to the public without the consent of those involved. Warren and Brandeis pointed out that privacy law should safeguard the extent to which information can be shared. But this idea of control also means protecting against violations of privacy such as eavesdropping, the “appropriation and misuses of one’s communications.”
What Warren and Brandeis discussed 130 years ago is echoed by modern experts. The contemporary academic paper, “A critical review of 10 years of Privacy Technology”, presents privacy as confidentiality, that is, “…avoiding making personal information accessible to a greater public. If the personal data becomes public, privacy is lost.”
Alan F. Westin, whose pioneering works sparked the US privacy legislation, the Federal Privacy Act of 1974, stated that privacy is “the right of the individual to decide what information about himself should be communicated to others and under what circumstances.”
Unfortunately, this right is often violated, causing us to worry about our control over the confidentiality of personal information. Violations have become even more frequent in the internet age. In the US alone, the majority of the public “feel as if they have little control over data collected about them by companies and the government”, according to a 2019 Pew Research Center report. The same report found that “79% of adults assert they are very or somewhat concerned about how companies are using the data they collect about them.” One of the reasons for that concern is a lack of transparency.
It’s important to ensure that privacy is more than an abstract concept. Privacy should be an enforceable right. That means regulating what that control means in practice and balancing the needs of both individuals and organizations.
Most privacy regulations apply only to information that can identify individuals, usually referred to as personal data or personally identifiable information (PII). This is the case for Europe’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA) and Brazil’s General Personal Data Protection Law (LGPD).
PII and personal data are not the same thing, though they are similar and often treated as interchangeable. Each attempts to classify the kinds of information that could disclose an individual’s identity, directly or indirectly.
PII is the dominant term in the US, but no single American legal document defines it. Numerous federal and state laws and sector-specific regulations classify different pieces of information under the PII umbrella.
On the other hand, personal data is a legal definition. It was the GDPR that set the trend towards using this term. Now various regulations around the globe, such as the Virginia Consumer Data Protection Act, Thailand’s Personal Data Protection Act 2019 and India’s Personal Data Protection Bill, use it as well.
There’s also an important subcategory of personal data, sensitive data. It can include information related to medical issues, religious affiliation and sexual orientation, for example.
Read more about the legal meaning of personal data in: What is PII, non-PII, and personal data?
Talking about privacy in the internet age, we also have to mention cookies. These small text files are seemingly harmless, and often they are. Cookies can be used to collect anonymous data. However, they can also carry data used to identify and recognize users across countless websites without the user’s knowledge.
That’s why cookies often fall under the GDPR definition of personal data. What’s more, they can be considered PII within the meaning provided by the National Institute of Standards and Technology (NIST), i.e.
PII is any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records;
It’s also important to look at privacy from two perspectives, individuals’ and organizations’.
Organizations that want to obtain and manage personal information must also ensure both the control and confidentiality of that information. Although they need control over collected personal data, only individuals own their personal information. Balancing the needs of organizations and individuals is the main goal of the regulations we’ve mentioned. This balancing act affects these regulations and also the scope of privacy as a whole.
Most people today would say that technology is a threat to privacy. Facial recognition, location tracking by mobile phones, targeted internet ads… The list of tech with worrying implications for privacy rights is long.
However, technology can be an ally in keeping personal data safe and confidential. Privacy technology, in the modern sense of the word, came about in the late 1970s when David Chaum presented the first method offering anonymous network communications under surveillance. Chaum, a pioneer in privacy-preserving technologies, used a technique based on public key cryptography to hide both the identity of those communicating and what they said.
Most technology can be invasive or privacy-friendly. The details of design, implementation and operations are what counts. For example, cookies themselves are not invasive. It’s about how and for what purpose they’re used.
Take Google’s new technology, Federated Learning of Cohorts (FLoC), that is intended to replace third-party cookies. FLoC groups visitors into interest-related cohorts based on their browsing history. Though promoted as privacy-preserving, FLoC is just the opposite according to most experts. FLoC will share users’ browsing data with advertisers and strengthen browser fingerprinting. It will be easier with FLoC to track individual users and share personal data with trackers that are able to recognize visitors. All this without cookies.
The technology the internet is built on provides many choices. Many possible paths don’t preserve privacy, but some do. Take the approach of DuckDuckGo for example. It’s a search engine and advertising platform just like Google, but it doesn’t collect personal data from individuals. DuckDuckGo instead offers contextual advertising based on search terms. This approach avoids all of the sticky privacy issues Google has run up against.
Another example is our own product, Piwik PRO Analytics Suite. We designed it to allow for collection of data both with and without cookies. In each case, personal data is only collected when proper consent is in place. Otherwise, at most anonymous data is collected.
Technology doesn’t dictate whether a product or use case will be privacy-friendly or not. It’s up to creators and users of technology to decide.
Read more about Google’s new initiative in an article by the Electronic Frontier Foundation “Google’s FLoC is a terrible idea”
So if it’s not completely about the technology being used, what is it about? Products and users staying in line with the following four points will have earned the title of privacy-friendly.
To collect any personal data, consent is needed. This consent must be freely given, unambiguous and involve a clear affirmative action.
After consent, individuals still need control of their personal data. This means they need to be able to:
- Access their personal data
- Make corrections
- Delete some or all data
- Prevent some or all actions with that data
Finally, respecting individuals’ choices means gathering only the information they agreed to share and using it only for the purposes they agreed to. For instance, personal data collected for first-party use shouldn’t be shared with partners or third parties sometime later.
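The four abilities listed above (access, correction, deletion and restriction) can be sketched in a few lines of code. This is only an illustration under simple assumptions: the `PersonalRecord` class, its method names and the `"remarketing"` purpose are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    """Hypothetical in-memory store of one individual's personal data."""
    data: dict = field(default_factory=dict)
    # Processing purposes the individual has blocked, e.g. "remarketing".
    restricted_purposes: set = field(default_factory=set)

    def access(self) -> dict:
        # Return a copy so callers cannot change stored data by accident.
        return dict(self.data)

    def correct(self, key: str, value) -> None:
        self.data[key] = value

    def delete(self, keys=None) -> None:
        # Delete only the named fields, or everything when no keys are given.
        if keys is None:
            self.data.clear()
        else:
            for key in keys:
                self.data.pop(key, None)

    def restrict(self, purpose: str) -> None:
        self.restricted_purposes.add(purpose)

    def may_process(self, purpose: str) -> bool:
        return purpose not in self.restricted_purposes


record = PersonalRecord({"email": "ana@example.com", "city": "Lisbon"})
record.correct("city", "Porto")
record.delete(["email"])
record.restrict("remarketing")
print(record.access())                    # {'city': 'Porto'}
print(record.may_process("remarketing"))  # False
```

A real system would also need to authenticate the person making the request and propagate deletions to backups and processors, which this sketch leaves out.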
Organizations collecting data also need to say clearly what they will do with the data. They should then only do what they said they would do. This sounds simple, but it’s surprising how often organizations repurpose large sets of personal data without the knowledge of the individuals who gave them that data in the first place. We’ll talk about what this means in the next section on transparent data collection.
If there are any changes in how data will be used, then individuals need to be informed and given the opportunity to opt out of any new uses of their data.
GDPR is widely seen as an example of strong, modern data privacy regulation. Read more about how it addresses individual rights and affects those collecting data:
To meet all those requirements, any privacy-friendly analytics project needs a way to manage individuals’ consents and data requests. This means organizations need a way to automatically change data collection based on consent status and also keep track of consent status over time. Individuals should have a way of easily changing their consent status and that status should immediately affect how data about them is collected.
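As a rough sketch of what this could look like, the example below keeps a timestamped consent history per visitor and lets the latest consent status decide whether an event carries an identifier. The `ConsentLog` class, the `build_event` function and the event fields are assumptions made for illustration, not the API of any real consent manager.

```python
from datetime import datetime, timezone


class ConsentLog:
    """Hypothetical in-memory consent register, keyed by visitor ID."""

    def __init__(self):
        # visitor_id -> list of (timestamp, granted) entries, oldest first
        self._history = {}

    def set_consent(self, visitor_id: str, granted: bool) -> None:
        # Append instead of overwrite so the status can be audited over time.
        entry = (datetime.now(timezone.utc), granted)
        self._history.setdefault(visitor_id, []).append(entry)

    def has_consent(self, visitor_id: str) -> bool:
        # No recorded decision counts as no consent.
        history = self._history.get(visitor_id)
        return bool(history) and history[-1][1]

    def history(self, visitor_id: str) -> list:
        return list(self._history.get(visitor_id, []))


def build_event(log: ConsentLog, visitor_id: str, page: str) -> dict:
    """Attach the visitor ID only when consent is currently in place."""
    event = {"page": page}
    if log.has_consent(visitor_id):
        event["visitor_id"] = visitor_id
    return event


log = ConsentLog()
print(build_event(log, "v1", "/pricing"))  # {'page': '/pricing'}
log.set_consent("v1", True)
print(build_event(log, "v1", "/pricing"))  # {'page': '/pricing', 'visitor_id': 'v1'}
log.set_consent("v1", False)
print(build_event(log, "v1", "/pricing"))  # {'page': '/pricing'}
```

Because the check happens every time an event is built, a withdrawn consent takes effect on the very next event.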
There also needs to be a system that ensures that organizations do what they said they would do with personal data. This requires a healthy regulatory environment. Regulators need to fairly enforce privacy laws to prevent the collection and use of personal data without proper consent and transparency. This is also necessary to maintain an even playing field. If some companies are allowed to collect masses of data in violation of regulations, it will lead to an unfair competitive advantage in addition to the trampling of individual privacy rights.
A good example of where regulators need to step in is establishing the difference between personal data and anonymous data. Some organizations in the European Union collect pseudonymous data without consent, even though pseudonymous data is still personal data under GDPR and also requires consent.
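A small sketch can show why pseudonymous data still counts as personal data. Below, a keyed hash replaces the visitor ID, but the same visitor always gets the same token, so their events stay linkable. Aggregation, by contrast, drops the identifier altogether. The key, the function names and the sample events are hypothetical, and whether aggregates are truly anonymous in a given case still depends on context, such as very small groups.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"example-key"  # hypothetical; a real key would be stored securely


def pseudonymize(visitor_id: str) -> str:
    # Keyed hash: the same visitor always maps to the same token,
    # so their events can still be linked together.
    return hmac.new(SECRET_KEY, visitor_id.encode(), hashlib.sha256).hexdigest()


def aggregate_page_views(events: list) -> Counter:
    # Drop identifiers entirely and keep only per-page totals.
    return Counter(event["page"] for event in events)


events = [
    {"visitor_id": "u1", "page": "/"},
    {"visitor_id": "u1", "page": "/pricing"},
    {"visitor_id": "u2", "page": "/"},
]
tokens = [pseudonymize(e["visitor_id"]) for e in events]
print(tokens[0] == tokens[1])        # True: both events belong to one visitor
print(tokens[0] == tokens[2])        # False: a different visitor
print(aggregate_page_views(events))  # Counter({'/': 2, '/pricing': 1})
```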
Read more about different kinds of data in: The most important benefits of data pseudonymization and anonymization under GDPR
Whatever local regulations look like, the basic approach to being privacy-friendly is the same. All organizations, and especially businesses, should ask for consent before collecting personal data. Even after consent, individuals should still be in full control of what happens with their personal data.
Privacy-friendly analytics methods are also transparent ones. As the UK data regulator, the ICO, explains, transparency “is about being clear, open and honest with people from the start about who you are, and how and why you use their personal data.”
Here, transparency also means letting people know whether their data is shared with any third parties and if so, with whom and why.
Organizations that want to improve that transparency can turn to first-party data. It’s data obtained directly from individuals as they interact with an organization. This kind of data is more accurate. It’s also easier to obtain consent for it, because an organization asks individuals to share their personal information, and they’ll know for what purpose it will be used.
That kind of transparency usually requires regulations. There needs to be some kind of system that encourages organizations to follow the rules and inform individuals about data collection. That paves the way for individuals to trust the system, which is essential for both the individuals and the organizations trying to collect data.
If consumers don’t trust businesses on the web, that’s bad for both businesses and consumers. Consumers won’t build relationships and may get worse service in return. They’ll install ad blockers and do whatever it takes not to share any data. Such a scenario will hurt businesses’ bottom line in the long term, as they’ll have less consumer data to generate insights and improve customer satisfaction.
Part of transparency is giving the precise data storage location. “In the cloud” isn’t enough, because privacy protections depend on the legal jurisdiction in which data is stored. Individuals should be told all possible storage locations at the time of consent. It’s not only the privacy-friendly thing to do, it’s also often required by law.
Legal obligations for data location
Organizations that handle personal data and operate in multiple countries need to be aware of local data privacy laws. This includes laws in the countries where data is collected and where it’s stored.
For example, American health data regulation requires storing medical information with backups within US borders. Moreover, certain countries across the world, such as Australia, Canada, Germany, India, Russia, and Switzerland, have adopted laws that mandate storing their residents’ personal data within the nation’s physical borders.
If you’re collecting data about EU residents, then it’s strongly advisable to keep that data stored on servers in the EU. This makes collecting and processing the data much easier and also avoids possible legal and regulatory problems with data transfers outside the EU.
There is a whole other discussion to be had about different hosting models, but the most important point here is where the servers physically sit. Organizations need to be aware of local regulations when they start planning to collect data.
Read more about different hosting options in: How to host your analytics: public cloud vs private cloud vs self-hosted
Data transfers have become so common that most individuals don’t even think about them. That doesn’t mean they can’t be problematic. As with data location, any data transfers, routine or one-time, need to be agreed to in the original consent for data collection.
Take the recent example of Privacy Shield. Until recently, organizations transferred data from the EU and Switzerland to the US within the Privacy Shield framework, without the need for prior consent. But on July 16, 2020, the Court of Justice of the European Union found that American protections of personal data were insufficient, so the framework became invalid. EU-US data transfers have mostly continued as usual on other legal grounds, but the process is riskier and less transparent.
The main issue around Privacy Shield and GDPR is personal data. Anonymous data that can’t identify an individual isn’t affected by the recent invalidation of Privacy Shield.
Transfers between other countries can be even trickier. Most aren’t protected by anything like a Privacy Shield framework.
To avoid possible threats, organizations can follow the approach of data protection by design, which we’ll cover in the next section. They can also simply ask for consent while also saying where data will be stored and transferred to.
Read more about EU-US data transfers in: The invalidation of Privacy Shield and the status of EU-US data transfers
Privacy-friendly analytics methods are based on the principles of privacy by design. The idea behind this concept, coined by Dr. Ann Cavoukian, is that privacy shouldn’t only be ensured by legal frameworks, but also that “privacy assurance must ideally become an organization’s default mode of operation”.
One of the pillars of the approach is “proactive not reactive; preventative not remedial.”
Preventing any privacy invasion is always better than resolving issues after privacy has been lost. There are two key ways to be proactive: data minimization and purpose limitation.
Data minimization means only processing information that is indispensable for a particular objective.
Purpose limitation means specifying the goal of processing data, documenting it and telling individuals about it before any processing starts.
After data collection and processing, the data should be kept only as long as is necessary to fulfill the purpose for which it was collected.
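The three ideas above (data minimization, purpose limitation and limited retention) can be illustrated with a short sketch. The purposes, field names and retention periods below are made-up assumptions. In practice they would come from an organization's own documentation of why it processes data.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping of each documented purpose to the fields it needs.
ALLOWED_FIELDS = {
    "traffic_report": {"page", "timestamp"},
    "order_fulfilment": {"email", "address", "timestamp"},
}

# Hypothetical retention period per purpose.
RETENTION = {
    "traffic_report": timedelta(days=90),
    "order_fulfilment": timedelta(days=365),
}


def minimize(raw_event: dict, purpose: str) -> dict:
    """Keep only the fields needed for a documented purpose."""
    allowed = ALLOWED_FIELDS[purpose]  # KeyError for an undocumented purpose
    return {key: value for key, value in raw_event.items() if key in allowed}


def purge_expired(events: list, purpose: str, now: datetime) -> list:
    """Drop events older than the retention period for the purpose."""
    cutoff = now - RETENTION[purpose]
    return [event for event in events if event["timestamp"] >= cutoff]


now = datetime(2021, 6, 1, tzinfo=timezone.utc)
raw = {
    "page": "/pricing",
    "email": "ana@example.com",
    "ip": "203.0.113.7",
    "timestamp": now - timedelta(days=10),
}
event = minimize(raw, "traffic_report")
print(sorted(event))  # ['page', 'timestamp']

old = {"page": "/", "timestamp": now - timedelta(days=120)}
print(len(purge_expired([event, old], "traffic_report", now)))  # 1
```

Raising an error for an unknown purpose is a deliberate choice here: data cannot be processed for a goal that was never specified and documented.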
Data protection by design also requires adopting technical and organizational measures in the initial design phases of processing operations. In this way, an organization ensures that privacy and security mechanisms are in place from the beginning. The exact mechanisms depend on the use case, but they could be applying data anonymization, monitoring data processing, or adding new privacy-protecting features to analytics software.
First, let’s get straight what data security is, as it’s often mistaken for privacy.
Data security is a set of methods and tools that safeguard data from unauthorized access, theft and corruption. Data security covers physical security of hardware, organizational procedures and standard policies.
Even if data security is perfect, an organization can share data inappropriately or in ways that weren’t consented to. On the other hand, respecting data privacy can’t happen without solid data security. Data leaks or breaches often lead to individuals losing control over their personal data, a clear violation of their privacy.
Privacy-friendly analytics methods also need to be secure. Specifically, they should:
- Minimize risks of data leaks
- Keep data secure and minimize data breaches
- Prevent malicious attacks
- Guard data from human errors
Mitigating the risks of data violation protects organizations from reputation damage and high fines. According to a report by the Ponemon Institute and IBM, the global average total cost of a data breach is $3.86 million.
At the same time, security of digital information is crucial for individuals. They need assurance that any personal data they share will stay safe. Without that basic level of trust, individuals will likely avoid sharing data at all, even when they have something to gain from doing so.
The most important thing is to treat analytics data like any other source of personal data at an organization. This means applying good security practices such as:
- Regular audits of internal processes and externally procured analytics software
- Limiting access to data, for example with granular user permissions and, for on-premises instances, firewalls that prevent access from external networks
- Data backup policies and data fallback mechanisms
- More secure access to data, such as using single-sign-on and requiring secure HTTPS connections for all online tools accessing the data
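Granular user permissions, one of the practices listed above, can be sketched as a simple role check. The roles and permission names here are hypothetical, and a real analytics platform will have its own permission model.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW_REPORTS = auto()
    EXPORT_RAW_DATA = auto()
    MANAGE_USERS = auto()


# Hypothetical roles; each gets only the permissions it needs.
ROLES = {
    "analyst": Permission.VIEW_REPORTS,
    "data_engineer": Permission.VIEW_REPORTS | Permission.EXPORT_RAW_DATA,
    "admin": (
        Permission.VIEW_REPORTS
        | Permission.EXPORT_RAW_DATA
        | Permission.MANAGE_USERS
    ),
}


def can(role: str, permission: Permission) -> bool:
    # Unknown roles get no permissions at all (deny by default).
    granted = ROLES.get(role, Permission(0))
    return permission in granted


print(can("analyst", Permission.VIEW_REPORTS))     # True
print(can("analyst", Permission.EXPORT_RAW_DATA))  # False
print(can("guest", Permission.VIEW_REPORTS))       # False
```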
Read more about how to approach analytics data encryption and security in our whitepaper: What is PII, non-PII, and personal data? And how to protect each
If you’re looking for a way to collect and analyze data about your website, digital product or mobile app, then the choice of what platform to use is a crucial one.
So which analytics platforms are privacy-friendly? This is a tricky question. A lot depends on how the organization collecting data uses any given platform. That said, many platforms make it hard to respect user privacy.
The short answer is that most major analytics platforms, such as Google Analytics and Adobe Analytics, weren’t designed with privacy in mind. The major platforms get some things right, such as data security, but fall short in other areas, mostly transparency and providing control of data to individuals.
Google, for example, has a business model that relies on maximizing data collection. That can result in a loss of control over data collected with Google Analytics. By default, Google uses data from Google Analytics to improve their services, effectively sharing analytics insights with users of other Google products.
Adobe has data centers in only a few regions. Google Analytics keeps data on remote servers, most of them in the US. If an organization has to adhere to different data residency laws around the globe, that might be tough to do with either Adobe or Google.
Finally, with Adobe Analytics or Google Analytics, you’ll need to reach for third-party tools or build your own if you want to manage visitor consent and data requests.
The good news is that the market for privacy-friendly analytics is flourishing. There are more options than ever before, which shows that data protection is more than a slogan. This trend also suggests that there’s a demand for analytics software that allows you to measure the performance of your website or product while respecting data privacy.
The variety of privacy-friendly analytics platforms available also means more use cases are possible. See the table below for a quick summary. Everything from simple metrics and anonymous data to more ambitious customer journey analyses is covered.
|Platform|Individual customer journey analysis|Built-in consent management|Built-in tag management|Flexible cloud data residency|On-premises hosting|
|---|---|---|---|---|---|
|Cloudflare Web Analytics| | | | | |
Privacy-friendly methods are based on general principles, many of which we’ve described here. They’re also based on the interpretation of laws and regulations all over the world. Following the general principles will usually lead to legal compliance, but it’s better to be sure.
We’ve made reference to many regulations in this article, but you may want to know more about how all those regulations affect organizations collecting data. If so, our article below is a great place to start.