Yes, we said it. Web analytics tools can lie, skew your reports and hurt your business. Bad numbers lead to bad decisions, right?
When the problem of false data occurs, it may be hard to get rid of. And it may even be difficult to spot, as in most cases your reports can display data that seems plausible at first sight. You could become suspicious when comparing your site’s performance results with another analytics platform, only to discover that it delivers a completely different set of results than your default tool. We have already covered some of the causes of data discrepancies between Piwik PRO and other tools in this post.
Now we would like to focus on 10 common issues that may be skewing your data and suggest solutions to overcome these problems. The list was prepared following discussions with our customer support team, who mentioned some frequently recurring issues and suggested ways to deal with them. Read on!
Here are the top 10 reasons why web analytics may be lying to you:
1. Tracking code is not placed on every page
This might seem more than obvious, but you’d be surprised how often it gets overlooked, especially in the case of large websites with millions of pages. If code is missing on a given page then its traffic will simply not be recorded, thus contributing to data skewing.
Piwik PRO tracking code should be pasted in the <head> section of each of your pages, or in a general header file that is included at the top of them. You can read more about it in our Tracking Code Implementation User Guide.
What can be done to solve this issue?
Use software like Web Link Validator or W3C Link Checker to identify missing tags and add the code where it’s absent.
You can use tools such as Google Tag Manager or Piwik PRO Tag Manager to make this process more manageable. When attaching a Piwik PRO tag, you will need to specify the triggers on which you want the tag to fire. If you select Page View without setting further conditions, Piwik PRO tracking code will load on all pages.
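If you already have the HTML of your pages at hand (for example from a crawl), a quick script can flag the ones without tracking code. The sketch below is only an illustration: the `pages` dictionary is made-up sample data, and the `_paq.push` marker assumes the tracking snippet is pasted directly into the page source. If the code is injected by a tag manager, you would need to search for the container snippet instead.

```python
def find_untagged_pages(pages, marker="_paq.push"):
    """Return the URLs whose HTML source does not contain the tracking marker."""
    return sorted(url for url, html in pages.items() if marker not in html)


# Hypothetical crawl result: URL -> HTML source of the page.
pages = {
    "https://example.com/": "<head><script>_paq.push(['trackPageView']);</script></head>",
    "https://example.com/pricing": "<head><title>Pricing</title></head>",
}

# Pages listed here would not show up in your reports at all.
print(find_untagged_pages(pages))
```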
2. Applying wrong data filters
A majority of analytics platforms, if not all, let you tinker with the amount and types of data you can view. Such filters are a powerful feature allowing you to limit and modify the numbers that you obtain. A common example is excluding traffic from particular IP addresses, such as your home, office, or both. Most websites now use a filter that removes company traffic from reports, as your employees and customers behave in very different ways. If you want to get accurate insights on your prospects and clients, then it makes sense to exclude internal IPs from your reports.
Practical as they may seem, if used incorrectly, filters can truly skew your data for good. This is because once your filters or settings are applied to raw data, there’s no going back. After everything has been implemented and the data is flowing in accordance with the new rules, you know it’s over.
What if you select the wrong option or insert an incorrect parameter by accident? All it takes is one little mistake, which is so easy to make, especially when you are choosing from opposing options on a dropdown menu.
What can be done to solve this issue?
Better safe than sorry, so set up safety procedures. Rules can be as short and sweet as something like this: think twice before adding any new filters, then verify and test your preferences, double-check your settings, and only then click “Save”.
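One way to "verify and test" an IP exclusion rule is to run it against a handful of addresses whose outcome you already know before saving it in your analytics tool. This is a minimal sketch using Python's standard `ipaddress` module. The network range and sample addresses are hypothetical (taken from blocks reserved for documentation), and your platform may express filters differently.

```python
import ipaddress


def is_excluded(ip, excluded_networks):
    """True if the visitor IP falls inside any of the excluded networks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in excluded_networks)


# Hypothetical office range that should be removed from the reports.
excluded = ["203.0.113.0/24"]

# Addresses with a known, expected outcome.
samples = {
    "203.0.113.42": True,   # office machine, should be filtered out
    "198.51.100.7": False,  # external visitor, must stay in the reports
}

for ip, expected in samples.items():
    assert is_excluded(ip, excluded) == expected, f"filter misbehaves for {ip}"
print("filter behaves as expected on all samples")
```

If an assertion fails, the filter is either too broad or too narrow, and you have found that out before any raw data was affected.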
Free Comparison of 5 Leading Web Analytics Vendors
Compare 40 Variables of 5 Leading Enterprise-Ready Web Analytics Vendors:
3. No filtering of self-referrals
Referrers is an important group of reports that shows segments of traffic from external sources. It gives you valuable information about your audience and where they come from.
But have you ever opened your referrers list only to see your domain at the top? If you have, that means you’ve encountered a self-referral problem. This may happen when you’re missing tracking code or have a configuration issue that causes one visitor to trigger multiple sessions when there should only be one.
A small number of self-referrals is natural if you have configured your analytics platform to track across multiple domains or subdomains. But if this occurs frequently, then you’re running the risk of having badly skewed data.
What can be done to solve this issue?
If you’re missing tracking code on a given page, this will bring a visit to an end and start counting a new session as the user moves to the next page on your website that includes tracking code. The problem usually ceases when you add the missing code.
If your landing page and referrer are on separate domains (e.g. piwik.pro and piwikpro.de), then you have incorrectly configured cross-domain tracking, so you need to review your setup. If you’re not sure what to do, consult your support coordinator or an external analytics expert.
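To judge whether self-referrals are "natural" or a real problem, it helps to know what share of your referrers they make up. The sketch below assumes you can export a list of referrer URLs from your tool. The function name and the sample list are ours, for illustration only.

```python
from urllib.parse import urlparse


def self_referral_share(referrer_urls, own_domains):
    """Fraction of referrers whose host is one of our own domains or a subdomain of one."""

    def is_own(url):
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in own_domains)

    if not referrer_urls:
        return 0.0
    return sum(is_own(u) for u in referrer_urls) / len(referrer_urls)


# Hypothetical export of referrer URLs.
referrers = [
    "https://piwik.pro/blog/",
    "https://www.piwikpro.de/",
    "https://news.example.org/article",
    "https://search.example.com/?q=analytics",
]

share = self_referral_share(referrers, {"piwik.pro", "piwikpro.de"})
print(f"{share:.0%} of referrers are self-referrals")
```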
4. Referral spam
Another issue that can seriously undermine the reliability of your referrers reports is so-called “referral spam”, which fills your analytics with fake data. So what is this, exactly? It starts with spam bots that detect weak websites, exploit their vulnerabilities, and send fake referrer information, including domain names. Since this information is tracked by your analytics software, it will show up in your reports.
As a result, instead of valuable insights your reports may be skewed with numerous URLs linking to shady websites trying to improve their rankings in search engines through backlink building. Spam bots not only skew your reports, but they may also affect your site’s loading time, leading to higher bounce rates and degrading your own SEO. All websites can receive some bogus traffic from time to time, but alarm bells should go off if you continuously observe high levels of suspicious links in your reports.
If you are using more than one web analytics platform, sooner or later you will spot discrepancies in referrer results. One reason for this is that web tracking tools approach this issue in a variety of ways.
What can be done to solve this issue?
Unfortunately, we have to say that there is no single universal solution to this problem. Simple IP blocking of where your spam comes from may not be enough in the face of powerful botnets – networks of infected computers that can access your site from many different IPs.
Another idea for preventing this kind of spam is to block suspicious URLs through your .htaccess file in the root directory of your site domain. Although this method may work, it’s far from ideal, as it only stops crawler referrer spam. It’s the other type, called ghost spam, that is far more widespread and can only be blocked from pinging your analytics account if you use specific filters, as described in this post.
Alternatively, you can try using tools that deploy community-maintained and regularly expanded spam blacklists (such as Piwik PRO). Since new spam sources are added to the platform with every release, to automatically exclude a great deal of spam it’s enough to keep your analytics software up to date. Of course, some newer or more exotic bots can always slip through the shield. While for high-traffic websites this won’t be a big deal, it can still pose a risk to smaller sites’ data accuracy.
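The idea behind a referrer blacklist can be sketched in a few lines. This is not how any particular platform implements it, just an illustration of the principle. The domain names and visit records are invented, and a real blacklist would be far longer.

```python
from urllib.parse import urlparse


def split_spam(visits, spam_domains):
    """Split visits into (clean, spam) by matching the referrer host against a blacklist."""
    clean, spam = [], []
    for visit in visits:
        host = (urlparse(visit["referrer"]).hostname or "").lower()
        listed = any(host == d or host.endswith("." + d) for d in spam_domains)
        (spam if listed else clean).append(visit)
    return clean, spam


# Hypothetical blacklist and visit log.
spam_domains = {"free-seo-traffic.example", "cheap-backlinks.example"}
visits = [
    {"referrer": "https://free-seo-traffic.example/offer", "page": "/"},
    {"referrer": "https://news.example.org/article", "page": "/blog/"},
    {"referrer": "", "page": "/pricing"},  # direct entry, no referrer
]

clean, spam = split_spam(visits, spam_domains)
print(f"kept {len(clean)} visits, dropped {len(spam)} as referral spam")
```

Note that a blacklist only catches domains it already knows about, which is why newer bots can slip through.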
5. Improper Use of Regular Expressions
Regular expressions (regex) are sequences of characters that describe a pattern to be searched for within a longer piece of text. For that reason, they are commonly used in programming languages, search engines, applications and tools.
You can create some truly powerful implementations with regular expressions in analytics, for instance when setting up custom segments, applying a view filter, or matching multiple pages while defining a goal or a funnel step. In a nutshell, a regex lets you quickly find the required data and perform an action when such a match is achieved.
So what could possibly go wrong here? An incorrectly prepared regex can really spoil your data accuracy, as it may be unable to catch all the required words, expressions or conditions. One potential result is that your segment will not load all the required data. Or even worse, a goal you set up will only convert partially, or even not at all.
What can be done to solve this issue?
Again, better safe than sorry. Make the regular expressions you use as simple as possible so that you and your colleagues can easily grasp their meaning and intent in the future. Keep the number of regexes you use to a minimum. It is also recommended to test all regular expressions before implementation, for example using tools like RegExr.com.
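Testing a regex simply means running it against URLs that should match and URLs that should not. The sketch below compares a loose pattern with a stricter one for a hypothetical goal page, `/checkout/thank-you`. All the URLs are invented for the example, and your tool's regex flavour may differ slightly from Python's.

```python
import re

# Intended goal: match only the checkout confirmation page.
loose = re.compile(r"/checkout")                  # matches far more than intended
strict = re.compile(r"^/checkout/thank-you/?$")   # anchored at both ends

should_match = ["/checkout/thank-you", "/checkout/thank-you/"]
should_not_match = ["/checkout/cart", "/checkout-help", "/blog/checkout/thank-you-notes"]


def check(pattern):
    """Return the test URLs that the pattern classifies incorrectly."""
    wrong = [u for u in should_match if not pattern.search(u)]
    wrong += [u for u in should_not_match if pattern.search(u)]
    return wrong


print("loose pattern gets wrong:", check(loose))
print("strict pattern gets wrong:", check(strict))
```

A goal defined with the loose pattern would count cart views and help pages as conversions, which is exactly the kind of silent error that skews reports.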
6. Data sampling
Sampling is a common statistical technique. It’s based on the simple assumption that in order to determine the most popular trend in a given group, you don’t need to talk to all of its members. Instead, you can select a representative subset of people, hoping it will be big enough to make the results accurate.
Sampling in web analytics works in a very similar way. Only a subset of your traffic data is selected, and that sample is analysed and used to estimate the final results. Many popular web analytics tools automatically start sampling data when you reach a particular limit of actions tracked on your website. You know that you have this option active when you see a message at the top of your report saying “The report is based on x visits (x% of visits).”
The lower the sample size, the bigger the problem you face. Sampled data can show some ups and downs in your reports, but not much more. If you are serious about growing your business, you need solid numbers and reliable insights rather than guesswork. So the question is, how can you really be sure that your tracking tool chooses a representative set of your traffic?
What can be done to solve this issue?
Only with 100% of your data can you be fully confident that your reports are correct. Always make sure your analytics platform provides reliable data, and try to avoid sampling. If your tool allows automatic data sampling when you reach your monthly limit of hits, then you have two options. You either need to upgrade to a plan with a higher data allowance, or start looking for another tool that comes without sampling.
And if for any reason you decide to carry on with sampled data, do so with caution. High-level data, such as page views, can still give you useful reports with sampling. However, more granular data like conversion rates and revenue should by no means be sampled, as the data sets are too small to analyse.
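To get a feeling for why small samples are risky for granular metrics, you can estimate the margin of error of a conversion rate. The sketch below uses the textbook normal approximation for a proportion at roughly 95% confidence. It assumes the sample is truly random, which you usually cannot verify for your tool, and the visit and conversion counts are made up.

```python
import math


def margin_of_error(conversions, sampled_visits, z=1.96):
    """Approximate 95% margin of error for a conversion rate (normal approximation)."""
    rate = conversions / sampled_visits
    return z * math.sqrt(rate * (1 - rate) / sampled_visits)


# The same 2% conversion rate, estimated from samples of very different sizes.
for conversions, visits in ((20, 1_000), (2_000, 100_000)):
    moe = margin_of_error(conversions, visits)
    print(f"{visits} sampled visits: 2% +/- {moe * 100:.2f} percentage points")
```

With the smaller sample the uncertainty is a large fraction of the rate itself, so a change from one report to the next may be nothing but sampling noise.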
7. DNT and Adblocks
Do Not Track is a technology that provides users with a simple and persistent choice to opt out of being tracked by websites and platforms they visit. All of the most popular modern browsers support this function, but users have to activate it manually.
Adblocking tools work in a similar manner – users can install them to avoid ads cluttering a page or having their data sent back by third parties. They are available as desktop browser extensions or as mobile apps to block ads from the Internet.
Practical as they may seem from the user’s point of view, both DNT and content blocking software impact your business analytics data. If a user has activated this option in their browser, your web tracking tool should by default respect this decision and refrain from collecting data on that user. At least, that’s what Piwik PRO does, but there are reports that many services still track you even when the DNT option is activated.
Adblocks, on the other hand, can prevent pages from rendering properly, thus hamstringing your analytics platform. This occurs through the processes of element hiding and asset blocking. With install rates constantly on the rise, you may not even know how far from reality your reports are.
What can be done to solve this issue?
First of all, it is worth checking to see if your analytics platform respects DNT and how many of your users have this option activated. Get started using the tips described in this post.
If you use Piwik PRO, Do Not Track will be enabled by default, but you can switch it off at any time in the Administration panel (we do not recommend doing this!).
Adblock tools are continuously developing, and they can instruct your browser to hide or avoid downloading any assets from URLs that include specific keywords or expressions referring to advertising or analytics. This means your cloud-hosted analytics data can suffer from some inaccuracies. You could consider hosting your analytics files locally, or deploying an on-premises platform such as Piwik PRO. Comparing your website’s performance using data from on-premises and cloud-hosted instances can give you a rough idea of how adblocks might be impacting your results. You can also see how many of your users deploy these tools using solutions like Adblock Analytics.
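Browsers signal Do Not Track by sending a `DNT: 1` HTTP request header, so if you have access to request headers (for example in server logs) you can estimate how many visitors opt out. The sketch below shows the check and the share calculation. The sample headers are invented, and this is not a description of how Piwik PRO implements DNT internally.

```python
def should_track(request_headers):
    """Return False when the browser sent the Do Not Track opt-out signal (DNT: 1)."""
    normalized = {name.lower(): value.strip() for name, value in request_headers.items()}
    return normalized.get("dnt") != "1"


def dnt_share(header_sets):
    """Fraction of requests that carry DNT: 1."""
    if not header_sets:
        return 0.0
    return sum(not should_track(h) for h in header_sets) / len(header_sets)


# Hypothetical request headers taken from a server log.
sampled_requests = [
    {"User-Agent": "Browser A", "DNT": "1"},
    {"User-Agent": "Browser B"},
    {"User-Agent": "Browser C", "dnt": "0"},
    {"User-Agent": "Browser D", "DNT": "1"},
]

print(f"{dnt_share(sampled_requests):.0%} of sampled requests opted out of tracking")
```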
8. Poor URL Tagging
Adding tags to your campaign links helps you assess exactly how each individual ad or marketing initiative you’ve launched is performing. Each time you ask users to click a link, you can set up campaign parameters to tell you precisely how many of your referrals came through email, banners, ads, landing page buttons, etc. URL tagging is powerful and easy, but can anything go wrong?
Yes, it can. If done improperly, pk or utm parameters can blow up your data accuracy. Imagine you publish a new entry on your blog, then share it on Twitter and LinkedIn, and then one of your followers clicks on the link and arrives at your post. When they are done reading, they may go to your home page, or even better, proceed directly to a contact form and fill it in. If by any chance the internal link they follow uses “pk_source=blog”, then you can lose the precious piece of information telling you that your visitor came from a given social media channel. Instead, your report will only say that the session was referred by your blog. Can you really trust this data? Probably not.
What can be done to solve this issue?
First of all, make sure you observe the URL tagging best practices described in this post. The list of do’s and don’ts is rather consistent no matter which analytics platform you use. One of the key rules is to never use parameters for internal links on your website or blog. It’s also important to use only the campaign variables that you really need to keep things as straightforward as possible. It’s good to keep the rest of your team updated on the naming system you have in place so everyone can understand what data is provided in your campaigns report.
Because tagged links can be pretty long and difficult to type in correctly, always use a link builder tool that is compliant with your tracking tool parameters. For instance, try Campaign URL Builder from Google Analytics or URL Builder from Piwik PRO.
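If you prefer to generate tagged links in bulk, a tiny helper can enforce a consistent naming system. The sketch below uses the utm_source, utm_medium and utm_campaign parameters known from Google Analytics. The function name, the lower-casing rule and the example values are our own assumptions, so adapt the parameter names to whatever your tracking tool expects.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def build_campaign_url(url, source, medium, campaign):
    """Return url with lower-cased utm_source, utm_medium and utm_campaign appended."""
    tags = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    for name, value in tags.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    parts = urlparse(url)
    # Keep existing query parameters, but replace any campaign tags already present.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in tags]
    query += [(name, value.strip().lower()) for name, value in tags.items()]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical link shared on a social channel (an external link, never an internal one).
link = build_campaign_url("https://example.com/blog/post", "Twitter", "social", "Spring_Launch")
print(link)
```

Lower-casing every value avoids "Twitter" and "twitter" showing up as two separate sources in your campaigns report.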
9. Low awareness of the difference between first- and third-party cookies
Cookies are placed on users’ computers by websites in order to track visitors and store their preferences. They have long embodied the worst privacy fears, but there’s much more about them that makes them ubiquitous. For instance, web analytics platforms rely on cookies for doing their job of providing you with invaluable insights.
One of the key attributes of a cookie is its host. We speak of a first-party cookie when the host name matches the domain in the browser’s address bar at the time it is set or retrieved. In contrast, third-party cookies belong to domains different from the one currently being viewed by the user. And this difference can have a major impact on your data accuracy.
Some articles on the subject of web tracking suggest that third-party cookie rejection is on the rise, with a growing number of users now manually blocking them or even deleting all the cookies stored on their computers. According to stats provided by Webtrends, this may account for anywhere from 12% to 18% of all Internet users. Some common problems include inaccurate visitor, retention-based, e-commerce and conversion metrics, as well as unreliable campaigns and search reporting.
What can be done to solve this issue?
First of all, find out if your web analytics platform uses first- or third-party cookies. Ask your provider or ping your support manager to clarify. Check if your platform offers alternative ways of tracking users when cookies are disabled, and how this can impact your data accuracy.
As a rule of thumb, it’s recommended to only use tools that deploy first-party cookies for two reasons: fewer people block them, and anti-spyware software and privacy settings do not usually target first-party cookies.
If possible, try to stay away from platforms that use third-party cookies by default. Should you already have such a tool in use, consider implementing an additional platform that deploys only first-party cookies. Comparing data discrepancies in reports provided by both of these tools can give you an idea how seriously rejection of third-party cookies can impact the stats provided by your original tool.
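The first-party versus third-party distinction comes down to comparing the cookie's domain with the host of the page being viewed. The sketch below is a deliberately simplified illustration: real browsers also consult a list of public suffixes (so that, for instance, two unrelated sites under the same country-code suffix are not treated as one), which this example ignores. The domains are hypothetical.

```python
from urllib.parse import urlparse


def cookie_party(cookie_domain, page_url):
    """Classify a cookie as 'first-party' or 'third-party' relative to the viewed page."""
    page_host = (urlparse(page_url).hostname or "").lower()
    domain = cookie_domain.lower().lstrip(".")
    if page_host == domain or page_host.endswith("." + domain):
        return "first-party"
    return "third-party"


page = "https://shop.example.com/cart"
print(cookie_party(".example.com", page))         # set by the site being viewed
print(cookie_party("tracker.example.net", page))  # set by another domain
```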
10. And finally: some common interpretation issues
Whichever analytics platform you use, you immediately get access to tonnes of data pouring from numerous reports. But instead of drowning in oceans of stats, you need to find your way to the right metrics and select only the ones you really need. According to Avinash Kaushik, a great metric is:
- uncomplex
- relevant
- timely
- instantly useful
Let’s be honest: because we have so many reports, even the best of us cherry-pick our favorites and stick with them. But if we don’t understand the real meaning of our metrics, the way they are collected, and the business context in which they should be analysed, we may completely miss the point. All that data we’ve worked to gather can give us useless information.
Add to that the fact that various analytics vendors may define seemingly identical metrics in wildly different ways, or give a range of names to similar reports covering things like Visits, Visitors, Unique Visitors and Conversions. This can cause real havoc in your dashboard. Houston, we have a problem…
And if that wasn’t enough, there are still other factors that can make it tricky to get the right interpretation of your data. We have already mentioned things like improper use of filters or regular expressions. But are you taking into account things like seasonal trends, market conditions, a major rework of your homepage, or when your last newsletter was sent? If not, then how do you plan to interpret the various spikes in your data trends?
What can be done to solve this issue?
First of all, focus on choosing KPIs that match your business goals. In this post we describe some popular performance indicators for different categories of websites, including e-commerce sites, content portals, lead generation sites and customer support sites. But whatever type of business you’re doing, remember that quality is always better than quantity. Follow Kaushik’s advice and focus only on the few metrics that are critical to your business’ existence.
When you have the reports you think will help you assess your business performance, make sure you understand exactly how they are defined in your tool. Ask your vendor or provider for an explanation of each metric, so you can grasp the differences and use those metrics correctly.
Always add notes on your timeline to mark events or conditions that can influence your stats. The majority of analytics tools let you effortlessly add annotations, which can prove invaluable in your future interpretation efforts.
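To illustrate why annotations pay off, here is a minimal sketch that flags unusually busy days and looks up whether a note explains them. The dates, visit counts, the note and the "1.5 times the median" threshold are all invented for the example. Your tool's annotation feature will look different, but the principle is the same.

```python
from statistics import median


def explain_spikes(daily_visits, annotations, factor=1.5):
    """Return (day, visits, note) for days whose visits exceed factor x the median."""
    baseline = median(daily_visits.values())
    return [
        (day, visits, annotations.get(day, "no annotation"))
        for day, visits in sorted(daily_visits.items())
        if visits > factor * baseline
    ]


# Hypothetical daily visits and timeline notes.
daily_visits = {
    "2016-03-01": 1000,
    "2016-03-02": 1050,
    "2016-03-03": 2400,
    "2016-03-04": 980,
    "2016-03-05": 1900,
}
annotations = {"2016-03-03": "newsletter sent"}

for day, visits, note in explain_spikes(daily_visits, annotations):
    print(day, visits, note)
```

A spike with a note is easy to interpret. A spike without one is a reminder to find out what happened and write it down.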
Conclusion:
As you can see, there are many issues that can make your data lie to you, or at least not tell you the whole truth. Terrifying as these problems may seem at first, being aware of them is crucial to overcoming them. We hope that the tips and tricks in our guide to common problems will get you a long way down the path of protecting your analytics insights from the curse of damaged data.
Along the way you definitely want to avoid tools that come with data sampling, heavy referral spam or default usage of third-party cookies, as these can be detrimental to your insights. Taking the right precautions will also help, like paying close attention to how you apply data filters, use regular expressions and tag your campaign links. Good luck!
Resources:
- How to Avoid Corrupting Your Google Analytics Data, by Lars Lofgren, Kissmetrics Blog
- How To Fix Self-Referrals In Google Analytics, Three Vantage Blog
- What is Data Sampling and Why Should You Avoid It?, by Karolina Gawron, Piwik PRO Blog
- Data discrepancies between analytics platforms, by Mike Sweeney, Piwik PRO Blog
- Ad-blocking and Analytics Data Accuracy, by Ewa Balazinska, Piwik PRO Blog
- The 2015 Ad Blocking Report, by PageFair and Adobe
- How Many of Your Users Set “Do Not Track”?, by Jason Packer, Quantable Blog
- What is Referrer Spam and How Do You Get Rid of It?, by Shane Jones, Search Engine Journal
- Regular-Expressions.info, a free resource online
- Web Metrics Demystified, by Avinash Kaushik, Occam’s Razor
- Standard Metrics Revisited: #6: Daily, Weekly, Monthly Unique Visitors, by Avinash Kaushik, Occam’s Razor
- Negative Impact of Third-Party Cookie Rejection, Webtrends Analytics 9 Administration Guide, November 2012
- What are Internet Cookies?, by Cookie Controller