Raw data, the fundamental building block of accurate web analytics, remains an underutilized asset in many organizations that opt for sampled data due to its simplicity and ease of management.
This article explores the benefits and challenges of using raw data in analytics, and discusses its potential use cases for gaining valuable insights.
What is raw data?
Raw data is data in its original, unprocessed form that an organization gathers from various sources: databases, files, social media, web pages, images, etc.
Because raw data is not filtered or processed, it provides a complete view of information. It allows for in-depth analysis and accurate insights, but it may also be large and difficult to handle. With the proper tools, raw data offers more ways to explore the information and turn findings into something useful.
Key differences between raw data, processed data and sampled data
Raw data typically comes in its most basic and unorganized form, representing the original observations, measurements, or responses.
Processed data, on the other hand, is raw data that has been transformed or analyzed. It is more interpretable, which makes it easier to identify patterns, trends, or relationships. It is more organized than raw data but tends to be condensed or summarized. You can find processed data in analytics reports, among other places.
Sampled data represents only a selected subset of a larger dataset. Data sampling offers a manageable yet representative snapshot of the whole. It comes in handy in scenarios where speed and resource efficiency are prioritized.
However, a sample may not accurately represent the bigger picture given by the full dataset, potentially leading to less precise results. Although it lacks the detail of raw data, sampled data is smaller and quicker to work with, which supports faster analysis and timely insights.
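To make the trade-off concrete, here is a minimal Python sketch that compares a conversion rate computed on a full dataset with the same metric computed on a 1% random sample. The session records, field names and the 2% conversion rate are made up for illustration, and the sampling shown is plain random sampling, not the method any particular analytics platform uses.

```python
import random

def conversion_rate(sessions):
    """Share of sessions that ended in a conversion."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["converted"]) / len(sessions)

def sample_sessions(sessions, fraction, seed=0):
    """Simple random sample without replacement; the seed keeps runs repeatable."""
    rng = random.Random(seed)
    k = max(1, round(len(sessions) * fraction))
    return rng.sample(sessions, k)

# Synthetic data (an assumption): 10,000 sessions, every 50th one converts.
sessions = [{"id": i, "converted": i % 50 == 0} for i in range(10_000)]

full_rate = conversion_rate(sessions)
sampled_rate = conversion_rate(sample_sessions(sessions, fraction=0.01))

print(f"full dataset: {full_rate:.4f}")
print(f"1% sample:    {sampled_rate:.4f}")
```

The full dataset always yields the true rate, while the sampled figure depends on which sessions happen to be drawn. Changing the seed or the fraction shows how much the estimate can move.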
These inherent differences between data types entail certain advantages and disadvantages, and define unique use cases for each type of data.
Raw data
Advantages:
- Provides a complete and comprehensive dataset, allowing for thorough analysis.
- Enables filters and visualizations to be applied to derive new insights and perspectives.
- Offers flexibility to revisit and reanalyze the data for different reporting needs.
Disadvantages:
- Requires significant time and effort to process and transform into actionable insights.
- Demands more resources when handling large datasets or extracting raw data through APIs, particularly in advanced analytics.
Processed data
Advantages:
- Cleaned, organized and presented in a format that is easy to interpret and analyze.
- Can be easily used for making informed decisions and drawing insights.
Disadvantages:
- May lack detail as the data is aggregated and condensed.
Sampled data
Advantages:
- Requires less processing power and time, making it ideal for quick analysis.
- Easier to handle due to its reduced size compared to raw data.
- Generally less expensive in terms of storage and processing requirements.
Disadvantages:
- May not fully represent the entire dataset, leading to potential biases or incomplete insights.
- Offers less detailed analysis, which might overlook nuanced data points.
To explore the full scope of differences between these types of data, see "Raw data and sampled data: How to ensure your data is accurate."
Data sampling in different analytics platforms
Some platforms, like Google Analytics, use data sampling as a default method, particularly when dealing with large datasets. Others, like Matomo and Snowplow, don’t sample data by default, providing more comprehensive data analysis. The decision to apply default data sampling tends to be influenced by the platform’s capabilities, the size of the datasets it can handle, and the specific needs of its users.
Let’s have a quick look at data sampling defaults offered by different analytics vendors:
Vendor | Data sampling |
---|---|
Piwik PRO | Does not sample data by default. Data sampling is still possible on request. |
Google Analytics 4 (GA4) | May sample data when event counts for reports exceed quota limits. |
Mixpanel | Uses data sampling for certain functions like totals and uniques. |
Matomo | Does not use data sampling. |
AT Internet | Employs data sampling when the event count for reports exceeds quota limits. |
Amplitude | Upsampling can be enabled for accurate estimates. |
Snowplow | Does not sample data. |
Heap | Uses data sampling when events for reports exceed quota limits. |
Adobe Analytics | Does not sample data except in selected reports or if limits are exceeded. |
Countly | Does not sample data except for visualizing flow reports. |
Using raw data in analytics
Data delivers most of its value once it has been processed and interpreted. There is little value in holding onto raw data without a way to use it. Put to work, however, raw data unlocks the door to more sophisticated and precise analytics, enabling a better understanding of information that often remains untapped in summarized data formats.
One practical benefit of using raw data in web analytics is that it can be exported from analytics platforms to other tools, paving the way for more extensive and insightful analyses. This approach gives technical professionals the level of detail they need, while the resulting reports remain accessible to people with less technical backgrounds.
The usual workflow when working with raw data involves:
- Identifying raw data sources – Before choosing the right analytics tool, it’s important to determine where the data will come from. This includes internal sources (like CRM systems, sales data, etc.) and external sources (market research, social media, etc.).
- Raw data collection and integration – Gathering data from the identified sources and integrating it for further use.
- Data cleaning and preparation – This involves sorting, cleaning, and organizing the raw data to make it suitable for analysis or visualization.
- Exporting data to other tools – Data exports can broaden the scope of analysis and provide more detailed insights.
- Data analysis – Using various data analysis techniques to uncover patterns, trends, and insights. This can include statistical analysis, predictive modeling, data visualization, and more.
- Interpreting the results – Understanding what the data says and connecting it with business goals.
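The cleaning, export and analysis steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`visitor_id`, `timestamp`, `event_type`, `url`) and the sample rows are hypothetical, and a real raw data export from your analytics platform will have its own schema.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export; real exports use the vendor's own column names.
RAW_EXPORT = """visitor_id,timestamp,event_type,url
v1,2024-01-05T10:00:00,pageview,/home
v1,2024-01-05T10:00:00,pageview,/home
,2024-01-05T10:02:00,pageview,/pricing
v2,2024-01-05T10:03:00,PageView ,/pricing
v2,2024-01-05T10:05:00,goal,/signup
"""

def clean(rows):
    """Drop rows without a visitor ID, normalize event types, remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["visitor_id"]:
            continue
        row = {**row, "event_type": row["event_type"].strip().lower()}
        key = tuple(row.values())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def to_csv(rows, fieldnames):
    """Serialize cleaned rows as CSV text, ready to load into another tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = list(csv.DictReader(io.StringIO(RAW_EXPORT)))
cleaned = clean(rows)
events_per_type = Counter(r["event_type"] for r in cleaned)  # a first, simple analysis
export_text = to_csv(cleaned, fieldnames=list(rows[0].keys()))

print(events_per_type)
```

In practice the input would come from a file or an API response rather than an inline string, and the cleaning rules would follow your own data quality requirements.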
How different analytics vendors handle raw data
Transforming raw data into actionable insights is the backbone of accurate analytics. However, the capabilities connected with accessing, exporting and using raw data vary across different analytics platforms. Here’s a quick overview:
Vendor | Access to raw data | Data export limits | Data access and tools |
---|---|---|---|
Piwik PRO | Yes | No data export limits. | Full access to raw data through API, BigQuery and CSV. |
Adobe Analytics | Access to raw data through its predefined tools. | No explicitly stated data export limits. | Several ways to access and use raw data, including Analysis Workspace, Analytics dashboards, Activity Map, Report Builder, Analytics APIs, and Reports & Analytics. |
Google Analytics 4 (GA4) | Access to raw event and user-level data through BigQuery. | Export limit of 5,000 rows when you download a report as a CSV. | Raw data can be accessed and exported through Reports or Explorations in the GA4 web interface, Analytics Data API, and BigQuery. |
Countly | Yes | No explicitly stated data export limits. | Several ways to access and use data, including through its server, a mobile SDK for mobile analytics, or a web SDK for web analytics. |
Mixpanel | Yes | At most two recurring pipelines and one non-recurring pipeline for event export pipelines per project. The raw export API has a rate limit of 60 queries per hour, 3 queries per second, and a maximum of 100 concurrent queries. | Data can be accessed via HTTP API or direct database export. Raw data can be viewed in Mixpanel. |
Matomo | Yes | No specified data export limit. The data can be exported in full. | Data can be accessed via HTTP API or direct database export. Raw data can be viewed in Matomo. |
Amplitude | Yes | CSV report: 5,000 users from Users page, 100,000 rows of data per metric from Charts view. | Over 20 SDKs, HTTP API v2, Batch API, and SQL access. |
Snowplow | Yes | No specified data export limit. | Storage options for data warehouses and lakes, loading data into Redshift, BigQuery, Snowflake, and Databricks, and querying data. |
Heap | Yes | CSV report: 5,000 users from Users page, 100,000 rows of data per metric from graphs and funnels. | Tracking events, querying data, and using a data model to aggregate data. |
Raw data benefits and use cases
Raw data is beneficial because it’s highly relevant, specific to the research being done, and provides fresh information. This makes it suitable for supporting data-driven decisions.
Because it shows the dataset before any transformation has been applied, raw data gives you more freedom in how you transform it. On top of that, it serves as a backup you can refer to when you run into problems after processing and analyzing your data.
Let’s now briefly discuss typical use cases of raw data.
Tracking complete customer journeys
Raw data from multiple sources can be used to track the complete customer journey across different platforms. This includes online and offline data and data from different platforms like a website and mobile app.
This type of data represents the unaltered voices and behaviors of consumers. Whether it’s the transcript of a focus group discussion or the record of online purchases during a holiday sale, raw data captures the market in its most natural state. By connecting these data points, organizations can understand how users move between different platforms and how their campaigns lead to conversions.
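As a rough illustration of connecting these data points, the Python sketch below merges raw events from a website and a mobile app into one time-ordered journey per user. It assumes that both sources share a common `user_id` and use ISO 8601 timestamps. The field names and events are hypothetical, and real cross-platform stitching usually needs an identity resolution step that is not shown here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events; a shared user_id across platforms is an assumption.
web_events = [
    {"user_id": "u1", "ts": "2024-03-01T09:00:00", "source": "web", "action": "view_product"},
    {"user_id": "u2", "ts": "2024-03-01T09:05:00", "source": "web", "action": "view_home"},
    {"user_id": "u1", "ts": "2024-03-01T18:30:00", "source": "web", "action": "purchase"},
]
app_events = [
    {"user_id": "u1", "ts": "2024-03-01T12:15:00", "source": "app", "action": "add_to_cart"},
]

def build_journeys(*event_streams):
    """Group events from all streams by user and order each group by time."""
    journeys = defaultdict(list)
    for stream in event_streams:
        for event in stream:
            journeys[event["user_id"]].append(event)
    for events in journeys.values():
        events.sort(key=lambda e: datetime.fromisoformat(e["ts"]))
    return dict(journeys)

journeys = build_journeys(web_events, app_events)
u1_path = [f'{e["source"]}:{e["action"]}' for e in journeys["u1"]]
print(" -> ".join(u1_path))
```

With sampled or aggregated data, the app event in the middle of this journey could be missing entirely, which is why this kind of analysis depends on raw, event-level records.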
Attribution modeling
Organizations can use raw data for attribution modeling, either by building the models themselves or by hiring agencies that specialize in algorithmic attribution. Attribution modeling involves using data analysis and statistical modeling techniques to determine the contribution of each marketing touchpoint in driving conversions or sales.
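Two of the simplest rule-based models, last-touch and linear attribution, can be sketched in Python as follows. The conversion paths, channel names and revenue values are invented for illustration, and production-grade attribution typically relies on more advanced statistical models than these two rules.

```python
from collections import defaultdict

# Hypothetical conversion paths: (ordered touchpoints, conversion value).
conversion_paths = [
    (["email", "paid_search", "direct"], 120.0),
    (["paid_search"], 80.0),
    (["social", "email"], 50.0),
]

def last_touch(paths):
    """Give the full conversion value to the final touchpoint of each path."""
    credit = defaultdict(float)
    for touchpoints, value in paths:
        credit[touchpoints[-1]] += value
    return dict(credit)

def linear(paths):
    """Split the conversion value evenly across every touchpoint in the path."""
    credit = defaultdict(float)
    for touchpoints, value in paths:
        share = value / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)

last_touch_credit = last_touch(conversion_paths)
linear_credit = linear(conversion_paths)

print("last touch:", last_touch_credit)
print("linear:    ", linear_credit)
```

Both models distribute the same total value but tell different stories about which channel deserves it. Being able to rerun such models with different rules is only possible when the underlying touchpoint data is available in raw form.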
Custom dashboards
Unlike processed or summarized data, raw data hasn’t been subjected to any interpretation or altered by any tool. This means businesses can approach it with different analytical tools and derive various insights based on changing needs.
For example, raw data can be used to create insights dashboards in BI tools or companies’ apps for internal or external needs. Or, agencies might compile reports for their clients using raw data. These dashboards can help organizations visualize their data and extract meaningful insights.
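Before raw events reach a dashboard, they are usually rolled up into the metrics the dashboard displays. The Python sketch below shows one possible roll-up into daily pageviews and unique visitors. The event structure is hypothetical, and a BI tool or data warehouse would normally handle this step with its own query language.

```python
from collections import defaultdict

# Hypothetical raw events; field names are assumptions for this sketch.
raw_events = [
    {"visitor_id": "v1", "ts": "2024-02-01T08:00:00", "event_type": "pageview"},
    {"visitor_id": "v1", "ts": "2024-02-01T08:01:00", "event_type": "pageview"},
    {"visitor_id": "v2", "ts": "2024-02-01T09:30:00", "event_type": "pageview"},
    {"visitor_id": "v2", "ts": "2024-02-02T10:00:00", "event_type": "download"},
    {"visitor_id": "v3", "ts": "2024-02-02T11:00:00", "event_type": "pageview"},
]

def daily_summary(events):
    """Aggregate raw events into one row per day for a dashboard table."""
    pageviews = defaultdict(int)
    visitors = defaultdict(set)
    for e in events:
        day = e["ts"][:10]  # the date part of an ISO 8601 timestamp
        visitors[day].add(e["visitor_id"])
        if e["event_type"] == "pageview":
            pageviews[day] += 1
    return [
        {"date": day, "pageviews": pageviews[day], "unique_visitors": len(visitors[day])}
        for day in sorted(visitors)
    ]

summary = daily_summary(raw_events)
for row in summary:
    print(row)
```

Because the roll-up starts from raw events, the same data can later be regrouped by week, by event type or by any other dimension without going back to the analytics platform.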
Adhering to regulations
Certain client segments, such as government entities or media, may have to adhere to regulations that require them to send data to regulatory and controlling entities. Raw data can help fulfill these regulatory requirements, for example:
- Healthcare providers often need to send raw data to government entities for public health monitoring, research, and regulatory compliance. This includes data related to patient demographics, diagnoses, treatments, and outcomes.
- Banks, insurance companies, and other financial institutions often need to send raw data to regulatory bodies for compliance with financial regulations. This can include data on transactions, customer behavior, risk assessments, and more.
- Telecommunications companies may need to send raw data to regulatory entities to comply with regulations related to network performance, customer service, pricing, and more.
- Retailers may need to send raw data to supply chain partners or regulatory bodies. This can include data on sales, inventory, customer behavior, and more.
- Government entities often need to send raw data to other government bodies for purposes of oversight, coordination, and compliance with laws and regulations.
- Media companies may need to send raw data to regulatory entities, especially if they’re based in countries with strict media regulations. This can include data on viewership, content, advertising, and more.
Accessing reports outside analytics platforms
Companies may want to give people access to analytics data without creating accounts for them in the analytics platform. In such cases, exported raw data can be made available as reports in tools like Power BI, where anyone within the organization can access it.
Conclusion
While data sampling can be useful for efficiency and performance, raw data is still indispensable in specific contexts. It can lead to more accurate and insightful decisions by providing precision and depth. Both sampled and raw data have their place in data analytics. They can significantly benefit decision-making processes and play a vital role in an organization’s sustainable growth and success.
In the end, though, precise data fuels precise decisions, and raw data gives your organization a reliable foundation for making them.
If you want to learn more about how you can use raw data in Piwik PRO to do more with advanced analytics, reach out to us.