DATA QUALITY DEVELOPMENT - Part 5

Christopher Wagner • Jan 12, 2022

You want quality data? You gotta build it!

Data platforms do not spring forth fully built and operational; they take time to develop and build out. The same is true for a Data Quality practice. To ensure you have the appropriate Data Quality checks on your platform, include QA elements in your Quality Framework for each Feature, HotFix, and Defect.

 

Feature: Every feature added to a data environment should include a series of tests and validations necessary to say that the feature is delivered successfully. Define the QA tests when creating each new feature so Quality Engineering, Data Engineering, and Analytics Engineering understand the scope and complexity of the feature. Validation tests should be a baseline requirement for every code deployment.
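
For illustration, here is a minimal, pytest-style sketch of baseline validation tests defined alongside a new feature. The load_fact_orders helper, table, and column names are hypothetical stand-ins for whatever the feature actually delivers.

```python
import pandas as pd

def load_fact_orders() -> pd.DataFrame:
    """Hypothetical loader for the feature under test."""
    return pd.DataFrame(
        {"order_id": [1, 2, 3], "amount": [9.99, 20.00, 5.25]}
    )

def test_primary_key_is_unique():
    # Baseline validation: the feature must not load duplicate keys.
    df = load_fact_orders()
    assert df["order_id"].is_unique, "Duplicate order_id values loaded"

def test_no_null_amounts():
    # Baseline validation: NULL amounts indicate a broken load.
    df = load_fact_orders()
    assert df["amount"].notna().all(), "NULL amounts found in fact_orders"
```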

 

As you design features, it's essential to understand the Quality requirements, the acceptable variance within the system, and the appropriate actions to take when an issue arises.

 

Some data may have no tolerance for variance, while other data may carry a degree of variance and still be acceptable.

 

Examples: 

Batch loads of data from a live system of record will show variance between the moment the batch load starts and the moment it is validated against the source, because the source keeps changing in the meantime.
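
One common way to keep that timing variance out of the comparison is to pin both counts to the moment the batch started. The sketch below assumes hypothetical DB-API connections and illustrative table and column names:

```python
from datetime import datetime

def batch_variance(source_conn, target_conn, batch_start: datetime) -> float:
    """Compare source and target row counts, ignoring source rows that
    arrived after the batch load began."""
    src = source_conn.cursor()
    src.execute(
        "SELECT COUNT(*) FROM orders WHERE created_at < %s", (batch_start,)
    )
    source_rows = src.fetchone()[0]

    tgt = target_conn.cursor()
    tgt.execute("SELECT COUNT(*) FROM dw.fact_orders")
    target_rows = tgt.fetchone()[0]

    return abs(source_rows - target_rows) / max(source_rows, 1)
```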

 

Large numbers of transactions added or multiplied together can create variance due to rounding differences between systems. Enough records carrying '.00000000012' cents can add up.
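
A quick Python sketch of how those slivers accumulate: summing a million binary-float amounts drifts away from the exact decimal total, because most decimal fractions cannot be represented exactly in binary floating point.

```python
from decimal import Decimal

# A million small amounts, summed as binary floats vs. exact decimals.
records = [0.1] * 1_000_000

float_total = sum(records)                    # ~100000.00000133288 on CPython
exact_total = Decimal("0.1") * 1_000_000      # exactly 100000.0

print(float_total == float(exact_total))      # False: the slivers added up
print(abs(float_total - float(exact_total)))  # the accumulated variance
```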

 

Establishing expectations for reasonable variance levels is vital at the start of any feature.

 

These are the recommended thresholds and responses: 

 

0-1%: Feature passes QA.

1-3%: Feature has a Level 1 Defect. Log the Defect in the backlog; the business approves or rejects the move to Production.

3-5%: Feature has a Level 2 Defect. Log the Defect in the backlog; the business approves or rejects the move to Production, and a warning is added to the Defect as a potentially significant issue. If this issue is detected in Production, follow the rollback/3x-retry, then manual-review process: alert engineering, quality, and the business to a potential problem.

5%+: Feature fails QA and is sent back to engineering as a bug; the code cannot move to Production. If detection occurs in Production, alerts are sent to engineering, quality, the business, and any change-review process, warning of the issue and the need for a HotFix. The case is logged on the production quality dashboard.
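
As a sketch, the threshold-to-response mapping above can be encoded directly. The Action labels and the triage function here are illustrative, to be wired into your own defect tracker and alerting tools.

```python
from enum import Enum

class Action(Enum):
    PASS_QA = "Feature passes QA"
    LEVEL_1_DEFECT = "Log Level 1 Defect; business approves/rejects the move"
    LEVEL_2_DEFECT = "Log Level 2 Defect with warning; rollback/3x retry in Prod"
    FAIL_QA = "Fail QA; block the deployment and flag for a HotFix"

def triage(variance: float) -> Action:
    """Map a measured variance (0.02 == 2%) to the recommended response."""
    if variance <= 0.01:
        return Action.PASS_QA
    if variance <= 0.03:
        return Action.LEVEL_1_DEFECT
    if variance <= 0.05:
        return Action.LEVEL_2_DEFECT
    return Action.FAIL_QA

print(triage(0.004))  # Action.PASS_QA
print(triage(0.07))   # Action.FAIL_QA
```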

 

HotFix: Issues with upstream data quality will always occur. Upstream changes impact data consumption, leading to HotFix changes to realign the data ASAP.

 

If engineering ever needs to build and implement a HotFix that was not triggered by a QA test, then QA needs to add quality tests when the HotFix change request goes live. This ensures that:

 

1. QA has 'skin in the game' and is working closely with the business and engineering. 

 

2. Any defects addressed by the HotFix have quality tests associated with the issue, ensuring that these issues are blocked or that engineering is alerted to the problem in the future (a sketch of such a test follows this list).

 

3. QA is encouraged to keep robust data quality checks in place to minimize HotFix engagements (no one wants to respond to issues in the dead of night).
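
Below is a minimal sketch of the kind of regression test that might ship with a HotFix, assuming the upstream break was a renamed column. The load_orders loader, the schema, and the file layout are hypothetical stand-ins, not a prescribed implementation; the point is that the fix and the test land together.

```python
import pandas as pd

REQUIRED_COLUMNS = {"order_id", "order_date", "amount"}

def load_orders(path: str) -> pd.DataFrame:
    """Hypothetical loader patched by the HotFix."""
    df = pd.read_csv(path)
    # HotFix: upstream renamed 'amt' to 'amount'; accept both going forward.
    if "amt" in df.columns:
        df = df.rename(columns={"amt": "amount"})
    return df

def test_orders_schema(tmp_path):
    """Quality test shipped with the HotFix: the legacy schema must load."""
    legacy = tmp_path / "legacy.csv"
    legacy.write_text("order_id,order_date,amt\n1,2022-01-12,9.99\n")
    df = load_orders(str(legacy))
    assert REQUIRED_COLUMNS.issubset(df.columns)
```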

 

 

Defect: In every complex system, imperfections will exist. When a Defect is identified and engineering develops a fix, quality should build tests to ensure the issue is resolved correctly. Treat Defects the same way Features are treated.



DATA QUALITY BLOG SERIES

A new Data Quality blog post will be released each day at 8:45 AM.


DATA QUALITY - Part 1 January 6th

DATA QUALITY CONCEPTS - Part 2 January 7th

DATA QUALITY FOR EVERYONE - Part 3 January 10th

DATA QUALITY FRAMEWORK - Part 4 January 11th

DATA QUALITY DEVELOPMENT - Part 5 January 12th

QUALITY DATA - Part 6 January 13th




CHRIS WAGNER, MBA MVP

Analytics Architect, Mentor, Leader, and Visionary

Chris has been working in the Data and Analytics space for nearly 20 years. Chris has dedicated his professional career to making data and information accessible to the masses. A significant component in making data available is continually learning new things and teaching others from these experiences. To help people keep up with this ever-changing landscape, Chris frequently posts on LinkedIn and to this blog.