What are the causes of data discrepancies between Piwik PRO and other analytics platforms?

Written by Karolina Matuszewska, Michael Sweeney

Published March 24, 2020

Your Piwik PRO reports state that you had 35,292 visits last month, whereas your other analytics software says that you had 27,540 visits. Why the big difference? Where do these data discrepancies come from?

If you are using Piwik PRO and another analytics platform to source your stats, at some point you may notice slight or even significant discrepancies between the data provided by each of these tools. In this post, we’ll explore some of the possible roots for this issue and look at what can be done to solve the problem.

It’s important to note that there will always be a certain mismatch between various analytics platforms. According to some experts, 5% is an acceptable threshold. Anything more than that will require a small investigation to discover the causes of such a big difference. In most cases, there won’t be only one reason, but many.
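As a quick sanity check, the 5% rule of thumb can be applied to the visit counts from the opening example. This is only a sketch: the function names are made up for illustration, and measuring the gap relative to the smaller of the two counts is an assumption, since the threshold above does not say which number serves as the baseline.

```python
def relative_discrepancy(count_a: float, count_b: float) -> float:
    """Return the gap between two counts as a fraction of the smaller count."""
    baseline = min(count_a, count_b)
    if baseline <= 0:
        raise ValueError("counts must be positive")
    return abs(count_a - count_b) / baseline


def needs_investigation(count_a: float, count_b: float, threshold: float = 0.05) -> bool:
    """True when the gap exceeds the acceptable threshold (5% by default)."""
    return relative_discrepancy(count_a, count_b) > threshold


piwik_pro_visits = 35_292   # figure from the opening example
other_tool_visits = 27_540  # figure from the opening example

gap = relative_discrepancy(piwik_pro_visits, other_tool_visits)
print(f"{gap:.1%}")  # 28.1%
print(needs_investigation(piwik_pro_visits, other_tool_visits))  # True
```

A gap of roughly 28% is far beyond the 5% threshold, so the example from the introduction is a case worth investigating.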

  1. Not placing the tracking code on every page
  2. Comparing apples and oranges
  3. Employing Consent Manager
  4. Relying on device fingerprints
  5. Session length
    1. Possible solution
  6. Blocking certain visitors based on IP addresses
    1. How can it affect the data reports?
    2. Possible solution
  7. JavaScript error
  8. Campaign data
  9. Time zones
  10. Metrics definition
  11. Data sampling
  12. Intelligent Tracking Prevention
  13. Attribution models
  14. Conclusion

1. Not placing the tracking code on every page

This might seem like an obvious one, but it’s easy to overlook. If you’re using Piwik PRO and some alternative platform, you need to ensure that the tracking codes of both tools are placed on exactly the same pages.

Otherwise, your reports will show completely different numbers. If you have any doubts about whether you’ve done this correctly, check out our tracking code implementation guide.
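If you can export the list of pages each tool reports traffic for, a simple set comparison shows where one tracking code is missing. The page lists and the function name below are hypothetical and serve only as an illustration.

```python
def coverage_gaps(pages_tool_a, pages_tool_b):
    """Return the pages that only one of the two tools reports on."""
    a, b = set(pages_tool_a), set(pages_tool_b)
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


# Hypothetical exports of tracked page paths from each platform.
piwik_pro_pages = ["/", "/pricing", "/blog", "/contact"]
other_tool_pages = ["/", "/pricing", "/blog"]

print(coverage_gaps(piwik_pro_pages, other_tool_pages))
# {'only_in_a': ['/contact'], 'only_in_b': []}
```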

Now it’s time to dig a bit deeper! We’ll walk you through some less obvious causes of data discrepancies.

2. Comparing apples and oranges

You might fall into a common trap and compare distinct metrics, such as clicks in Google Ads to visits in Piwik PRO, or actions in Piwik PRO to page views in other platforms. You need to make sure that you analyze matching metrics in the first place to ensure accurate reporting.

Start with comparing the stats on page views. This is a universal metric and you shouldn’t be worried if you see slight deviations in your data. The 5% we’ve already mentioned is acceptable between diverse analytics platforms, regardless of how they handle more complex metrics such as unique visitors/devices, real users or sessions.

3. Employing Consent Manager

Since GDPR entered into force, you need to get visitors’ consent to process their data if it’s reasonably likely it could identify an individual. This is where things get tricky. If you employ Piwik PRO Consent Manager, your report details will be different than what you collect with other analytics software. Here’s why.

Above all, most unknown visitors are reluctant to share their information and simply don’t give this consent, or they give it for a limited number of purposes. That restricts the amount of information you’ll be able to gather.

However, you can take advantage of the data anonymization feature. When you turn it on, you will definitely see that it impacts the number of unique visitors in your reports. On the other hand, if you use Google Analytics, it won’t let you obtain personal information at all.

Check out further details on Google’s approach to data collection and processing: Piwik PRO vs. Google Analytics: The Most Comprehensive Comparison

4. Relying on device fingerprints

If you rely on device fingerprinting to recognize a particular user on your site or within an app, your reports will show some data inconsistencies.

For instance, Piwik PRO, by default, uses a cookie ID together with a device fingerprint, based on the IP address, browser and its installed plugins, as well as the OS version, to identify a user’s session.

In strict intranet environments where all the computers have the same configuration, their fingerprints are identical, so actions from several people may appear under one session. This inflates the average number of actions per visit.
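The intranet effect is easy to reproduce with a toy fingerprint. The sketch below is not Piwik PRO’s actual algorithm: it simply hashes the attributes listed above (IP address, browser, plugins, OS version), and the function name and sample values are assumptions made for illustration.

```python
import hashlib


def fingerprint(ip: str, browser: str, plugins: list, os_version: str) -> str:
    """Toy fingerprint: a hash of the attributes named in the text."""
    raw = "|".join([ip, browser, ",".join(sorted(plugins)), os_version])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Two different employees behind the same gateway IP, on identically configured machines.
alice = fingerprint("203.0.113.10", "Firefox 74", ["pdf-viewer"], "Windows 10")
bob = fingerprint("203.0.113.10", "Firefox 74", ["pdf-viewer"], "Windows 10")

# A machine that differs in a single attribute.
carol = fingerprint("203.0.113.10", "Firefox 74", ["pdf-viewer", "flash"], "Windows 10")

print(alice == bob)    # True: two people look like one device
print(alice == carol)  # False
```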

5. Session length

Piwik PRO starts a new session when someone returns to your site after a period of inactivity that is equal to or longer than the configured inactivity timeout.

For example, imagine your Piwik PRO inactivity timeout is set at 30 minutes. A person enters your website, browses a few pages, walks away from their computer, then returns 30 minutes later or more and starts using your website again. They will be counted as one visitor, but your report will show two sessions for that person.

The setting of “30 minutes of inactivity” is pretty standard across most analytics platforms. However, if this setting varies across your marketing stack, the numbers will differ too.

Let’s say that in Google Analytics the inactivity timeout is set at 30 minutes, and on the other platform it is set at 5 minutes. The second platform will split the same browsing activity into more sessions, so the session counts won’t match.
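To see how much the timeout alone can change the numbers, here is a minimal sketch that counts sessions for one visitor under both settings. The hit timestamps are invented, and treating a gap equal to the timeout as the start of a new session is an assumption; real platforms may handle that boundary differently.

```python
from datetime import datetime, timedelta


def count_sessions(hit_times, inactivity_timeout):
    """Count sessions for one visitor.

    A new session starts on the first hit and whenever the gap since the
    previous hit is at least `inactivity_timeout`.
    """
    sessions = 0
    previous = None
    for current in sorted(hit_times):
        if previous is None or current - previous >= inactivity_timeout:
            sessions += 1
        previous = current
    return sessions


# Hypothetical page views at 0, 4, 12, 20 and 55 minutes after 9:00.
start = datetime(2020, 3, 24, 9, 0)
hits = [start + timedelta(minutes=m) for m in (0, 4, 12, 20, 55)]

print(count_sessions(hits, timedelta(minutes=30)))  # 2
print(count_sessions(hits, timedelta(minutes=5)))   # 4
```

The same five page views produce two sessions with a 30-minute timeout and four with a 5-minute timeout.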

Possible solution:

Check the session length in both of your analytics platforms and see if the minutes of inactivity are the same. If they aren’t, then you’ll need to align them so they match.

To do this in Piwik PRO, simply reach out to our team and we’ll help you adjust it to your requirements.

6. Blocking certain visitors based on IP addresses

Excluding certain users based on their IP address is another way to make sure your reports reflect real visitors, rather than counting the same internal user 100 times a day. You can exclude from your reports the IPs that are assigned to your home and work computers, the laptops in your office, the machines you use for application testing, etc.

How can it affect the data reports?

If you have some computers (IP addresses) blocked on one analytics platform and not the other, then this will cause your numbers to be off. In some cases, even if you don’t block the same IPs, this might not make a noticeable difference.

However, when internal traffic makes up a large share of your visits, the impact on the data is significant. Think about a medium-sized company that employs 50 staff. If the office IP addresses are not excluded and each staff member accesses the company’s website 20 times a day, that gives you a difference of roughly 1,000 sessions and 50 visitors per day.

Either way, to ensure that you are getting the most accurate numbers, you should make sure that both analytics systems block identical IP addresses.
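The effect of an exclusion list can be sketched with Python’s standard `ipaddress` module. The office network range and the sample hits below are hypothetical (they use documentation-only address ranges), and the filter is an illustration rather than how either platform applies its exclusions.

```python
import ipaddress

# Hypothetical office range to exclude from reporting.
EXCLUDED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_excluded(ip: str, networks=EXCLUDED_NETWORKS) -> bool:
    """True when the IP address belongs to one of the excluded networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# One employee visiting 20 times, plus two external visitors.
hits = ["203.0.113.7"] * 20 + ["198.51.100.23", "192.0.2.44"]

unfiltered = len(hits)
filtered = len([ip for ip in hits if not is_excluded(ip)])
print(unfiltered, filtered)  # 22 2
```

A platform without the exclusion reports 22 visits here, while one with it reports 2, which is why the lists need to match.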

Possible solution:

To start, find out which IP addresses are excluded on each platform, then align the lists so they match. To make this easier, Piwik PRO lets you exclude traffic for a particular website or for the whole account.

To get specific guidance on this matter you can always turn to our help center.

7. JavaScript error

Data discrepancies can also happen because of issues relating to tracking codes. The problem could be as simple as not having one placed on a certain page, or it might be that the settings vary and you need to change them.

To debug this problem, you can use your browser’s developer tools, such as Web Inspector, and check whether the requests are sent to the tracker (in the case of Piwik PRO, it will be a request to yourpiwikproserver.com/ppms.php).

Another source of problems may be multiple Piwik PRO trackers installed on the same page, which can send the data to the wrong website profile, leading to under- or over-reporting in the respective profiles.

8. Campaign data

You will often see reporting inconsistencies between the traffic received on your website and traffic sent by your advertising vendor, for example, Google Ads, AppNexus, etc.

Before we dive into the details, it’s crucial to understand the difference between the metrics. Your advertising vendor reports the number of clicks, whereas Piwik PRO or other analytics software reports the number of sessions.

These are two distinct metrics: the number of ad clicks doesn’t equal the number of sessions resulting from those clicks. Some ad clicks may be invalid. That happens because of bots, unintentional clicks or when a user closes the browser before the analytics code has loaded.

When the difference is only slight, let’s say it amounts to 5%, then you shouldn’t worry. If it’s more than that, you should analyze the case further. The reason behind it might be an issue with the website or the tracking code.

Piwik PRO allows you to monitor the efficiency of your various marketing campaigns and provides you with custom reports to help you with this task by adjusting the setup to your specific business goals.

One of the ways to do it is by applying UTM tags. Use them to distinguish the data from your advertising vendors so that you can see which under- or over-report the data. Or maybe your ad network is counting bot traffic?

When you tag each of your advertising vendors and campaigns with a unique name, it’s easier to track the performance of each as well as the discrepancies between the data reported by the vendor and your analytics tool. All in all, a lot depends on the configuration of your campaign and analytics setup.
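Tagging landing page URLs consistently is easier with a small helper. The sketch below uses only Python’s standard library; the function name, the example URL and the campaign values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append UTM parameters to a URL while keeping its existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_utm("https://example.com/landing?ref=1", "google", "cpc", "spring_sale"))
# https://example.com/landing?ref=1&utm_source=google&utm_medium=cpc&utm_campaign=spring_sale
```

Giving every vendor and campaign its own `utm_source` and `utm_campaign` values makes it possible to compare each vendor’s click count with the sessions recorded for that exact tag.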

9. Time zones

One of the culprits of analytics inconsistency is when time zone settings aren’t the same – not only in separate analytics platforms, but also in advertising tools.

Let’s say you set Pacific time in your Google Ads and Eastern time in Piwik PRO. When you want to check reports for a specific date you’ll get dissimilar numbers.

To avoid any information mismatch, you need to make sure every tool you use is set to the same time zone.
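The effect is easy to demonstrate: the same visit can land on different calendar days depending on the reporting time zone. In this sketch the event time is invented, and fixed UTC offsets stand in for Pacific and Eastern daylight time to keep the example self-contained; production code would normally use a time zone database so that daylight saving changes are handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration only (daylight time in late March 2020).
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), "PDT")
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")


def report_date(event_utc: datetime, reporting_tz: timezone):
    """Return the calendar date an event falls on in the reporting time zone."""
    return event_utc.astimezone(reporting_tz).date()


# A hypothetical visit recorded at 05:30 UTC on March 25, 2020.
event_utc = datetime(2020, 3, 25, 5, 30, tzinfo=timezone.utc)

print(report_date(event_utc, PACIFIC_DAYLIGHT))  # 2020-03-24
print(report_date(event_utc, EASTERN_DAYLIGHT))  # 2020-03-25
```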

10. Metrics definition

If you still wonder how come your data across various analytics platforms is inconsistent, think about how they define certain metrics. Take bounces, for instance. In Google Analytics a “bounce is a single-page session on your site,” such as when a visitor opens a single page and leaves it without taking any action.

When a person lands on your page and then leaves, GA will record a bounce. But if they land on this page, click the play button or type something in the search box, which you’ve configured as a custom event, then leave without visiting another page, it’s not a bounce.

By contrast, Piwik PRO counts as a bounce any session that has only one page view, and ignores other events apart from goal conversions. If a visitor enters your website, scrolls down, clicks the play button, then exits the site without going to another page or completing a goal, a bounce will be recorded.
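The two definitions can be compared on the same sessions. The sketch below simplifies both rules to what is described above; the session structure, field names and sample data are assumptions made for illustration, not either platform’s data model.

```python
def is_bounce_event_aware(session: dict) -> bool:
    """GA-style rule as described above: one page view and no tracked events."""
    return session["page_views"] == 1 and session["events"] == 0


def is_bounce_page_view_only(session: dict) -> bool:
    """Piwik PRO-style rule as described above: one page view and no goal
    conversion; other events are ignored."""
    return session["page_views"] == 1 and session["goal_conversions"] == 0


def bounce_rate(sessions, is_bounce) -> float:
    """Share of sessions classified as bounces by the given rule."""
    return sum(is_bounce(s) for s in sessions) / len(sessions)


sessions = [
    {"page_views": 1, "events": 0, "goal_conversions": 0},  # lands and leaves
    {"page_views": 1, "events": 1, "goal_conversions": 0},  # clicks play, then leaves
    {"page_views": 3, "events": 0, "goal_conversions": 0},  # browses several pages
]

print(f"{bounce_rate(sessions, is_bounce_event_aware):.0%}")     # 33%
print(f"{bounce_rate(sessions, is_bounce_page_view_only):.0%}")  # 67%
```

The second session is the one that splits the two platforms: it counts as a bounce under one rule and not under the other.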

11. Data sampling

Information mismatch in your reporting might also result from sampling. Certain analytics platforms perform it so you get only a subset of the traffic data for your analysis. That’s the case, for example, with Google Analytics, which does sampling when you exceed a certain number of sessions.

Meanwhile, Piwik PRO lets you work on the full set of information to ensure the complete accuracy of your reports. Keep in mind that besides information inconsistency, sampling may lead to missing out on some crucial details.

All in all, if you rely on one analytics software that offers only bits of data and another that gives you the full set, you’ll see different numbers in the same reports.

To get more information about this issue, look at our post: What is Data Sampling and Why Should You Avoid It?

12. Intelligent Tracking Prevention

Maybe it’s not obvious, but mechanisms like Apple’s Intelligent Tracking Prevention disrupt not only the way you track but also how you analyze and measure Safari users. ITP affects how you recognize a person browsing the web, and therefore influences your analytics metrics and overall reports.

Most importantly, Safari deletes first-party cookies set by, for instance, analytics platforms, after seven days, and after 24 hours under the strict conditions of ITP 2.2.

Consequently, it seriously affects unique visitors and new vs. returning visitors, as well as sessions to conversion and days to conversion. But those are just a few examples of how Safari’s engine can cause deviations in your data. You can add to the list trouble with attribution and inaccurate A/B test results.
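A toy simulation shows why unique visitor counts drift. It assumes that every visit resets the cookie’s expiry to a fixed number of days later and that an expired cookie leads to a new visitor ID; the function name and visit days are hypothetical, and real browser behavior is more nuanced.

```python
COOKIE_LIFETIME_DAYS = 7  # the seven-day cap described in the text


def count_visitor_ids(visit_days, lifetime_days=COOKIE_LIFETIME_DAYS):
    """Count how many visitor IDs a single real person generates.

    Assumption: each visit resets the cookie expiry to `lifetime_days` later.
    """
    ids_issued = 0
    expires_on = None
    for day in sorted(visit_days):
        if expires_on is None or day >= expires_on:
            ids_issued += 1  # cookie missing or expired: a new ID is issued
        expires_on = day + lifetime_days
    return ids_issued


# One person visiting on day 0, day 3 and day 12.
print(count_visitor_ids([0, 3, 12]))                     # 2
print(count_visitor_ids([0, 3, 12], lifetime_days=365))  # 1
```

With the seven-day cap, the visit on day 12 arrives after the cookie has expired, so one returning person is reported as two unique visitors.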

If you want to go into details of this problem, check out our post: What Intelligent Tracking Prevention (ITP) Means for Web Analytics & Marketing [Updated]

13. Attribution models

While looking for the culprits of variance in your reports, it’s worth checking what attribution models you employ across separate analytics tools. These models define how you should allocate credit to touchpoints or channels in the sales cycle.

You’ve got a couple of models at your disposal, and they may vary significantly. Suppose you set the first interaction model in Google Analytics but in another platform you apply the time decay attribution. In this way, you’ll end up with inconsistent information that won’t help you to decide which marketing channel performs best.
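To make the difference concrete, the sketch below credits the same conversion path under both models. The touchpoints are invented, and the time decay formula (weights halving every seven days before the conversion) is a common formulation used here as an assumption, not the exact calculation of any particular platform.

```python
def first_interaction(touchpoints):
    """Give all credit to the earliest touchpoint.

    `touchpoints` is a list of (channel, days_before_conversion) pairs.
    """
    credit = {channel: 0.0 for channel, _ in touchpoints}
    earliest_channel, _ = max(touchpoints, key=lambda t: t[1])
    credit[earliest_channel] = 1.0
    return credit


def time_decay(touchpoints, half_life_days=7.0):
    """Split credit so that touchpoints closer to the conversion weigh more."""
    weights = [(channel, 0.5 ** (days / half_life_days)) for channel, days in touchpoints]
    total = sum(weight for _, weight in weights)
    credit = {channel: 0.0 for channel, _ in touchpoints}
    for channel, weight in weights:
        credit[channel] += weight / total
    return credit


# Hypothetical path: email 14 days before, organic search 7 days before,
# paid search on the day of the conversion.
path = [("email", 14), ("organic_search", 7), ("paid_search", 0)]

print(first_interaction(path))
print({channel: round(share, 2) for channel, share in time_decay(path).items()})
```

First interaction hands the entire conversion to email, while time decay gives paid search the largest share, so the two tools would rank the same channels differently.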

Conclusion

As you work with a marketing stack full of diverse software and apps, you won’t escape some minor data mismatches. The solution comes from properly understanding how each tool handles the data, then deciding what you need to fix and what fits within acceptable limits. It might be fine to have a 3% discrepancy in the number of white paper downloads, but your reports on revenue must be consistent.

If your reports are still displaying incorrect numbers, or you have some burning questions about the inner workings of Piwik PRO Analytics Suite, just reach out to our experts.