Sampled data is a subset of your traffic data that is selected and analyzed in place of the full data set. The results are then extrapolated and assumed to accurately represent all the data.
Data sampling is a process designed to speed up reporting in web analytics, but depending on the circumstances and sampling approach, it may cause issues.
For example, sampled data may not be suitable when you need a precise figure, such as your site’s conversion rate or total revenue, because an extrapolated result is only an estimate. However, in some cases, sampling might be necessary. For example, a report covering a huge number of events or sessions may take too long to generate from the full data set.
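To make the trade-off concrete, here is a minimal Python sketch of how a sampled total can be extrapolated and compared with the exact figure. It is an illustration only, not how any particular analytics platform implements sampling: the function name `estimate_total_revenue`, the `sample_rate` parameter, and the demo revenue values are all hypothetical, and simple random sampling of sessions is assumed.

```python
import random


def estimate_total_revenue(session_revenues, sample_rate, seed=0):
    """Estimate total revenue from a simple random sample of sessions.

    Hypothetical helper: session_revenues is a list with one revenue
    value per session, and sample_rate is the fraction of sessions to keep.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if not session_revenues:
        return 0.0

    rng = random.Random(seed)
    total_sessions = len(session_revenues)
    sample_size = max(1, round(total_sessions * sample_rate))

    # Select a subset of sessions without replacement.
    sample = rng.sample(session_revenues, sample_size)

    # Extrapolate: scale the sampled sum up to the full number of sessions.
    return sum(sample) * total_sessions / sample_size


if __name__ == "__main__":
    # Made-up demo data: most sessions earn nothing, a few include a purchase.
    data_rng = random.Random(42)
    revenues = [
        data_rng.choice([20, 50, 120]) if data_rng.random() < 0.02 else 0
        for _ in range(100_000)
    ]
    print("exact total:     ", sum(revenues))
    print("10% sample total:", estimate_total_revenue(revenues, 0.10))
    print("1% sample total: ", estimate_total_revenue(revenues, 0.01))
```

Because purchases are rare in this made-up data, the estimate generally drifts further from the exact total as the sample gets smaller, which is why sampled reports are a poor fit for metrics such as revenue.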
Several major analytics platforms apply data sampling. For example, Universal Analytics sampled ad hoc reports once the selected date range exceeded 500k sessions at the property level. In Google Analytics 4 (GA4), standard reports are unsampled, but explorations may be sampled when a query exceeds the property’s event quota (10 million events for standard properties). Some analytics platforms, such as Piwik PRO, don’t sample data by default and only do so on request when it’s necessary to improve reporting performance.
When precision matters, analysts can turn to raw data: the set of events and sessions collected directly from visitors’ activity on a website or app, before any filtering, aggregation, or analysis, and used to calculate reports. Because raw data has not been filtered or processed, it provides a complete view of the information and supports in-depth analysis and accurate results. With the proper tools, raw data also offers more ways to explore the data and put the findings to use.
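As a rough illustration of calculating a report from raw data, the sketch below derives an exact conversion rate from a list of raw events. The record layout (`session_id`, `event_type`) and the `"purchase"` event name are assumptions made for this example; real exports differ between platforms.

```python
def conversion_rate(events, conversion_event="purchase"):
    """Return the share of sessions that contain at least one conversion event.

    Hypothetical helper: events is an iterable of (session_id, event_type)
    tuples taken from a raw data export.
    """
    sessions = set()
    converted = set()
    for session_id, event_type in events:
        sessions.add(session_id)
        if event_type == conversion_event:
            # A session counts once, however many purchases it contains.
            converted.add(session_id)
    if not sessions:
        return 0.0
    return len(converted) / len(sessions)


if __name__ == "__main__":
    raw_events = [
        ("s1", "page_view"),
        ("s1", "purchase"),
        ("s2", "page_view"),
        ("s3", "page_view"),
        ("s3", "purchase"),
        ("s3", "purchase"),
        ("s4", "page_view"),
    ]
    print(conversion_rate(raw_events))
```

Every session is counted here, so the result is exact rather than an extrapolated estimate.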