Before looking to technology to fix your data quality issues, first make sure your data infrastructure, business processes, and organizational culture promote the use of high-quality data. This blog discusses why data quality matters, where data quality issues come from, and how you can fix them.

Data is a vital asset to any organization, and the way it is collected, organized, and managed should be a top priority. Without high-quality data, it can be impossible to move forward with business objectives—be it customer satisfaction, organizational growth, or improved products and services.

The longevity and success of your business rest on the quality of your data. And the quality of that data, which comes from a multitude of sources, rests on more than technology alone. You need the right people and processes, too.

What Are the Sources of Data Quality Issues?

For data to be of high quality, it needs to be fit for its intended uses in operations, decision-making, and planning. High-quality data can help a business meet its objectives. Poor-quality data, on the other hand, can lead to costly mistakes and missed opportunities.

But how do you know whether you have data quality issues, and why they are happening? The first step toward good data quality is understanding the root causes of your data issues. Here are a few common sources of data quality issues:

  • Business Process Issues: Data quality issues can arise at many points in the data lifecycle, but the primary one is when data is created and handled by business users: when sales enters a new contact into the CRM, when marketing imports a list of leads from a tradeshow, when finance issues an invoice, or when supply chain management updates inventory. If you do not have clearly defined business processes for how data is handled and used, data quality issues are sure to arise. To help combat this, for any use case you need to make clear which data fields are mandatory, which are optional, and why. You want the workflow for each entry to make sense end to end; otherwise it can produce inaccurate data or, worse, jam up a business process. You also want to allow for the input and approval of new values when warranted. Otherwise, a business user is forced to pick from a list of inaccurate or incomplete values, or to use a free-text field, which introduces inconsistencies into your data.
  • Lack of Actively Managed Reference Data: Reference data, which is curated before other data is used, characterizes, classifies, or relates to other data within enterprise applications and databases. It provides the who, what, and where in business transactions and is necessary to understand all of your other datasets. If it is not properly managed, cleaning and reconciling everything associated with it can be very costly and can create operational, compliance, analytical, and distribution risk. To actively manage reference data, you need to define who is in charge of it, how often it is reviewed, what the process is for adding or removing values, and how historical, inactive values are handled.
  • Lack of Clear Ownership: Data quality affects the entire organization, and good data must be recognized as an enterprise-wide asset. This requires a top-down mentality and executive sponsorship for the program to succeed. Oftentimes, business functions are reluctant to give up control of the data they consume or, in some cases, to accept ownership of the data they input, which can cause data quality issues for other parts of the organization. You need clear ownership with assigned roles and responsibilities: a data owner who is responsible for the data and a data steward who manages it. You also need data governance standards for your organization, which help determine who needs access to what, where data resides, and how your company can meet business needs without compromising security or compliance requirements. Data that is not actively monitored and managed creates risk over time.
  • Lack of Data Standards: There is a strong relationship between data quality and data standards, which are meant to ensure a unified approach to data entry. If incorrect, duplicate, or otherwise bad data is entered into your systems, you will never get good analytics out of them (garbage in, garbage out). You need to define standards for data types, sizes, allowable characters, formatting, and more so that data is of good quality from the moment it is entered. The key is a standard way of storing data so the entire organization is aligned. Standards are typically enforced at the point of data entry, but in some cases you will need to apply them at other points of the data workflow to keep data conformant. (A minimal validation sketch illustrating these kinds of checks follows this list.)
  • Lack of a Single Source of Truth: As more data sources become available and organizations run multiple operational systems, it is critical to have a single source of truth for the data used enterprise-wide. A single source of truth means your organization's data is stored in a central repository, such as a data warehouse, and that this data feeds your analytics. Without it, making informed decisions across the company can be impossible. Where a single source of truth does not yet exist, you need to define which data will ultimately be used downstream; clear rules and ownership remove confusion and guide decisions about consolidation and conformity.
  • Lack of Provenance: Data provenance is the historical record of data and its origins, along with an explanation of how and why it arrived at its present place within the organization. It helps trace issues back to their source and answer questions such as where a quality issue originated. This information ensures authenticity and allows data to be reused. Without provenance, you cannot answer why, how, where, when, and by whom data was produced, which ultimately makes it less reliable.
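To make the data standards and reference data points above concrete, here is a minimal sketch, in Python, of how mandatory fields, approved reference values, field formats, and lengths might be checked at the point of entry. The field names, rules, and the validate_contact helper are illustrative assumptions for this example, not a prescribed schema or tool; in practice, checks like these would live in your application's validation layer or an ingestion pipeline.

```python
import re

# Hypothetical standards for a CRM contact record: which fields are mandatory,
# which reference values are approved, and what formats/lengths are expected.
# All field names and rules are illustrative assumptions, not a prescribed schema.
CONTACT_STANDARDS = {
    "mandatory": ["first_name", "last_name", "email", "country_code"],
    "reference_values": {
        # Actively managed reference data: only approved country codes are valid.
        "country_code": {"US", "CA", "GB", "DE"},
    },
    "formats": {
        "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    },
    "max_lengths": {"first_name": 50, "last_name": 50},
}

def validate_contact(record: dict) -> list[str]:
    """Return a list of data quality issues found in a single contact record."""
    issues = []
    for field in CONTACT_STANDARDS["mandatory"]:
        if not record.get(field):
            issues.append(f"missing mandatory field: {field}")
    for field, allowed in CONTACT_STANDARDS["reference_values"].items():
        value = record.get(field)
        if value and value not in allowed:
            issues.append(f"unapproved reference value for {field}: {value}")
    for field, pattern in CONTACT_STANDARDS["formats"].items():
        value = record.get(field)
        if value and not pattern.match(value):
            issues.append(f"badly formatted {field}: {value}")
    for field, max_len in CONTACT_STANDARDS["max_lengths"].items():
        value = record.get(field)
        if value and len(value) > max_len:
            issues.append(f"{field} exceeds {max_len} characters")
    return issues

# Example: a record entered by sales with a free-text country value.
print(validate_contact({
    "first_name": "Ana", "last_name": "Ortiz",
    "email": "ana.ortiz@example.com", "country_code": "United States",
}))
# -> ['unapproved reference value for country_code: United States']
```

The idea scales: each standard your organization defines becomes a rule that can be enforced automatically at entry, rather than relying on every business user to remember it.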

How Do You Fix Your Data Quality Issues?

As more organizations go down the path of digital transformation, they invest significant time and money in modern technology and infrastructure. But they often underestimate the importance of people and processes. Although technology helps address problems downstream, simple upstream fixes focused on people and processes are often less costly overall.

There are many ways to address data quality issues, and you can start at different points along the way. Ultimately, you should address data quality at every stage of your data lifecycle. Here are some ways to start tackling data quality issues within your organization at all major points of interaction:

  • Develop a Data Governance Program: Data governance ensures that data quality standards are being followed and holds people, processes, and technology accountable. It answers questions such as what constitutes data; where and how it is collected, extracted, transformed, delivered, and used; and who cares for and maintains which data. Data governance goes beyond IT and includes stakeholders from across the organization to break down departmental silos and eliminate competing versions of the truth. It is imperative to have a governance program that assigns roles and responsibilities for who owns systems, who owns data, who stewards data, and who holds owners accountable. With this in place, you can catch or even prevent issues before they become costly problems.
  • Examine Source Systems: Take the time to examine each source system feeding your analytics. Check that the source systems accurately capture your business processes, and look for bottlenecks and gaps. Review which fields are mandatory and adjust if necessary. Analyze system configuration for approval workflows so errors are caught before they propagate. Finally, analyze reference data and how the systems are being used. Tighten up systems and processes so that data collection is seamless and less prone to creating data quality issues.
  • Read Between the Lines: To start, create exception reports that identify gaps in your data, such as missing product names, orders without a status, or sales opportunities without an assigned salesperson. Exception reporting isn't the end of the line, however: alerting thresholds, remediation lineage, replacement values, and remediation workflows should be part of this process as well. You can also identify issues by monitoring end-to-end data flows, the movement of data across different systems, and addressing quality issues when you find them. Define data quality thresholds that help identify outliers and alert data owners and stewards when there are red flags; these individuals are responsible for maintaining quality, especially as your data grows. Identifying outliers is critical because they are problematic for statistical analyses, and defining replacement values for known bugs is an equally important stopgap. Each of these measures helps address data quality issues after the data has left the source system but before it reaches its target. Once you have identified the issues, you can take corrective action to ensure your data is accurate and complete. (See the sketch after this list for a simple example of exception reporting and threshold alerting.)
  • Bring It to Light: It is easy to let something fall through the cracks, so make data quality part of your organization's overall business objectives. Educating your business users on data quality is critical: it ensures that everyone follows best practices and empowers everyone to raise a red flag when they see one. One way to do this is to make audit reports visible to data users for accountability, with a dashboard that shows counts of threshold exceptions, outliers, duplicate entries, and so on, and how they change over time. This supports governance and ensures that standards are being followed. Everyone has a role to play in data quality, whether they are entering data or simply using it; all hands are on deck.
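As a rough illustration of the exception reporting, threshold alerting, and audit counts described above, here is a minimal sketch using pandas. The table, column names, and the 10,000 alerting threshold are assumptions made up for this example; a real implementation would run against your actual source extracts and feed the resulting counts into a data quality dashboard.

```python
import pandas as pd

# Hypothetical orders extract pulled from an operational system; column names,
# values, and thresholds are illustrative assumptions for this example.
orders = pd.DataFrame({
    "order_id":     [1001, 1002, 1003, 1004],
    "product_name": ["Widget A", None, "Widget C", "Widget A"],
    "status":       ["shipped", "shipped", None, "open"],
    "salesperson":  ["J. Smith", "J. Smith", None, "M. Lee"],
    "amount":       [120.0, 95.0, 87.5, 48000.0],
})

# Exception report: rows with gaps that data owners and stewards should remediate.
exceptions = orders[
    orders["product_name"].isna()
    | orders["status"].isna()
    | orders["salesperson"].isna()
]

# Simple threshold rule to flag outliers for review before the data reaches its target.
AMOUNT_THRESHOLD = 10_000  # assumed alerting threshold
outliers = orders[orders["amount"] > AMOUNT_THRESHOLD]

# Audit counts of the kind that could feed a data quality dashboard over time.
audit_metrics = {
    "rows_checked": len(orders),
    "exception_count": len(exceptions),
    "outlier_count": len(outliers),
    "duplicate_order_ids": int(orders["order_id"].duplicated().sum()),
}

print(exceptions[["order_id", "product_name", "status", "salesperson"]])
print(audit_metrics)
```

Tracked run over run, summary counts like audit_metrics become the visible, shared dashboard described above, giving business users and stewards a common view of whether quality is improving or slipping.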

Making data quality a strategic part of business operations empowers users to know both their limitations and their abilities; with everyone participating, your business is one step closer to high-quality data. Having the right data infrastructure in place is important. Having the right tools and technologies in place is also important. But understanding, bringing to light, and addressing the gaps in business processes and user skill sets is what will ultimately lead to better quality data.

As data becomes the cornerstone of every successful business, it is critical to establish high-quality data from the start.

 


Sharon Rehana is the content manager at Analytics8 with experience in creating content across multiple industries. She found a home in data and analytics because that’s where storytelling always begins.