In this week's Five Question Blitz we focus on data quality. Data quality is a foundational component of an effective Data & Analytics information system.
The Five Question Blitz was created to answer five questions relevant to Data & Analytics. Topics will be broad and answers will be simplified. Our goal is to promote common definitions and increase the general knowledge of individuals with interest in Data & Analytics.
Data quality is a measure of how well data fulfills its intended purpose. Data that supports Data & Analytics must be accurate and clearly defined, use consistent formats, and maintain time integrity. Data that meets all of these needs has a high level of quality.
Data cleansing is the process of ensuring data accuracy. At times source data may contain records that are incomplete, corrupted, incorrect, or otherwise unfit to exist inside a data warehouse. The goal of data cleansing is to transform the data (if possible) to ensure accuracy and consistency within the data warehouse.
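As a rough illustration of the idea, here is a minimal Python sketch of a cleansing step. The field names (`customer_id`, `name`, `signup_date`), the date format, and the `cleanse` function are all hypothetical, chosen only for the example; a real warehouse load would have its own rules.

```python
from datetime import datetime


def cleanse(records):
    """Split raw records into cleaned rows and rejected rows.

    Hypothetical rules for this sketch: customer_id must be an integer,
    name must be non-empty, signup_date must be in YYYY-MM-DD format.
    """
    clean, rejected = [], []
    for rec in records:
        try:
            customer_id = int(rec["customer_id"])
            # Normalize whitespace and capitalization for consistency.
            name = rec["name"].strip().title()
            if not name:
                raise ValueError("empty name")
            signup = datetime.strptime(rec["signup_date"].strip(), "%Y-%m-%d").date()
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # The record cannot be repaired, so keep it out of the warehouse
            # and record the reason for later review.
            rejected.append((rec, repr(exc)))
            continue
        clean.append({"customer_id": customer_id, "name": name, "signup_date": signup})
    return clean, rejected


raw = [
    {"customer_id": "101", "name": "  ada lovelace ", "signup_date": "2021-03-04"},
    {"customer_id": "abc", "name": "Bad Id", "signup_date": "2021-03-04"},    # incorrect id
    {"customer_id": "103", "name": "No Date"},                                # incomplete
    {"customer_id": "104", "name": "Grace Hopper", "signup_date": "04/03/2021"},  # wrong format
]
clean, rejected = cleanse(raw)
```

In this sketch only the first record is loaded; the other three are set aside with a reason rather than silently dropped, so someone can decide whether they can be fixed at the source.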
A system of record (sometimes referred to as the source system of record) is the data source declared to be the accurate representation of a given data element. This means that when there are multiple data environments, or multiple reports pulling data from multiple data environments, data accuracy is determined by comparing the data to the system of record.
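The comparison described above can be sketched in a few lines of Python. The `reconcile` function, the customer keys, and the email values are hypothetical and exist only to illustrate checking another source against the system of record.

```python
def reconcile(system_of_record, other_source):
    """Return the keys where another source disagrees with the system of record.

    Both arguments are dicts mapping a record key to a value. A key missing
    from one side shows up with None on that side (this sketch assumes None
    is never a legitimate value).
    """
    discrepancies = {}
    for key in system_of_record.keys() | other_source.keys():
        expected = system_of_record.get(key)
        actual = other_source.get(key)
        if expected != actual:
            discrepancies[key] = {"system_of_record": expected, "other": actual}
    return discrepancies


# Hypothetical data: the CRM is declared the system of record for email addresses.
crm = {"C-1": "a@example.com", "C-2": "b@example.com", "C-3": "c@example.com"}
report = {"C-1": "a@example.com", "C-2": "b@old.example.com"}

diffs = reconcile(crm, report)
```

Here the report has a stale value for `C-2` and is missing `C-3` entirely; in both cases the system of record's value is treated as the correct one.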
The source of truth is an information systems design theory. The intent is to ensure that data throughout the data environment is accurate. This is accomplished by declaring a single data source as the accurate representation of data elements. All other data systems then pull data from that single source. Typically this is done in a systematic fashion to maintain integrity.
Data governance is the process of creating a system for managing data. This data management system contains information describing what processes generate the data, how the data fits into the overall business process, what the business definition is, and how it will be used/consumed by the business. The primary goal of data governance is to ensure availability, usability, consistency, and integrity.
Click here to read last week's article, Five Question Blitz: Data Storage Systems.
About the Author: My name is Ion King and I am the Chief Executive Officer at SimDnA. My focus is on helping others who are passionate about growing careers in Data Science & Analytics achieve their goals. Connect with me on LinkedIn or find more of my articles on Medium.