Does anyone have advice on good data quality audit processes or checks?
I have a live dataset and a number of criteria that each record must meet. Some are common-sense or logical checks, such as missing fields, conditional rules (if one field is x, then another field must contain a date), and anomaly detection.
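Each of the three kinds of check mentioned above can be written as a small function over a single record. This is a minimal sketch in plain Python, assuming each record is a dict. The field names (`customer_id`, `status`, `closed_date`, `amount`) and the "closed implies closed_date" rule are hypothetical stand-ins for your own schema. For anomaly detection it uses Tukey's IQR fences, which are one simple choice among many.

```python
from datetime import date
from statistics import quantiles

# Hypothetical schema: replace these field names with your own.
REQUIRED_FIELDS = ["customer_id", "status", "amount"]

def missing_fields(record):
    """Return the required fields that are absent or empty in this record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]

def conditional_date_ok(record):
    """Hypothetical rule: if status is 'closed', closed_date must hold a date."""
    if record.get("status") == "closed":
        return isinstance(record.get("closed_date"), date)
    return True

def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as anomalies."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```

Keeping each rule as its own function makes it easy to test rules individually and to report exactly which one a record failed.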
Does anyone have suggestions? My current process is very manual: I filter the data by hand to check it. I wonder if there is a better way.
I am thinking of an algorithm that I can just run, which applies these checks and flags outliers or records that do not pass them.
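A "press run" audit like the one described above can be sketched as a table of named rules plus one function that applies them all and collects the failures per record. This is an illustrative sketch under assumptions, not a finished tool: records are dicts, and the rule names, field names and sample data are all hypothetical. The outlier pass again uses IQR fences on one numeric field.

```python
from datetime import date
from statistics import quantiles

# Hypothetical rules: each name maps to a predicate that returns True when the record passes.
RULES = {
    "customer_id present": lambda r: r.get("customer_id") not in (None, ""),
    "amount present": lambda r: r.get("amount") not in (None, ""),
    "closed implies closed_date": lambda r: (
        r.get("status") != "closed" or isinstance(r.get("closed_date"), date)
    ),
}

def run_audit(records, rules=RULES, numeric_field="amount", k=1.5):
    """Apply every rule to every record, then flag numeric outliers with IQR fences.

    Returns a dict mapping record index -> list of failed check names.
    """
    failures = {}
    for i, rec in enumerate(records):
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            failures[i] = failed

    # Outlier pass: only consider records where the field actually holds a number.
    numeric = [(i, r[numeric_field]) for i, r in enumerate(records)
               if isinstance(r.get(numeric_field), (int, float))]
    if len(numeric) >= 4:
        q1, _, q3 = quantiles([v for _, v in numeric], n=4)
        lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        for i, v in numeric:
            if not lo <= v <= hi:
                failures.setdefault(i, []).append(f"{numeric_field} outlier")
    return failures

# Hypothetical sample data to show the report format.
records = [
    {"customer_id": 1, "status": "open", "amount": 10},
    {"customer_id": 2, "status": "closed", "closed_date": None, "amount": 12},
    {"customer_id": 3, "status": "open", "amount": 11},
    {"customer_id": None, "status": "open", "amount": 13},
    {"customer_id": 5, "status": "closed", "closed_date": date(2021, 3, 1), "amount": 12},
    {"customer_id": 6, "status": "open", "amount": 11},
    {"customer_id": 7, "status": "open", "amount": 500},
]

for idx, reasons in run_audit(records).items():
    print(idx, reasons)
```

Storing the rules as a dict of named predicates means adding a new check is a one-line change, and the report says why each record was flagged rather than silently filtering it out.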
Here is a good starting point for that: https://github.com/SauceCat/pydqc. Good luck!
Posted 1 March 2021
© 2021 TechTarget, Inc.