There is often confusion between the definitions of "data veracity" and "data quality".

Data veracity is sometimes thought of as a matter of uncertain or imprecise data, yet it may be more precisely defined as a matter of false or inaccurate data. The data may be falsified intentionally, negligently or by mistake. Data veracity may be distinguished from data quality, which is usually defined as the reliability and application efficiency of data, and which is sometimes used to describe incomplete, uncertain or imprecise data.

The unfortunate reality is that for most data analytics projects, half or more of the time is spent on "data preparation" processes (e.g., removing duplicates, fixing partial entries, eliminating null or blank entries, concatenating data, collapsing or splitting columns, aggregating results into buckets, etc.). I suggest this is a "data quality" issue, in contrast to false or inaccurate data, which is a "data veracity" issue.
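To make the "data preparation" steps concrete, here is a minimal sketch in plain Python. The records, field names and bucket edges are invented for illustration and do not come from any real project; the sketch only shows deduplication, dropping blank entries, splitting a column and aggregating into buckets.

```python
from collections import defaultdict

# Hypothetical raw records: (customer name, "city, state", purchase amount as text)
raw_rows = [
    ("Ann Lee", "Austin, TX", "120.50"),
    ("Ann Lee", "Austin, TX", "120.50"),   # exact duplicate
    ("Bob Roy", "Dallas, TX", ""),         # blank amount
    ("Cy Diaz", "Reno, NV", "35.00"),
    ("Di Wu", "Austin, TX", "410.00"),
]

def prepare(rows):
    """Remove duplicates, drop rows with a blank amount, split the location column."""
    seen = set()
    cleaned = []
    for row in rows:
        if row in seen:
            continue                      # skip exact duplicates
        seen.add(row)
        name, location, amount = row
        if not amount.strip():
            continue                      # eliminate blank entries
        city, state = [part.strip() for part in location.split(",", 1)]
        cleaned.append(
            {"name": name, "city": city, "state": state, "amount": float(amount)}
        )
    return cleaned

def bucket_totals(records, edges=(100.0, 400.0)):
    """Sum amounts into three buckets; the edges are arbitrary assumptions."""
    totals = defaultdict(float)
    for rec in records:
        if rec["amount"] < edges[0]:
            label = "low"
        elif rec["amount"] < edges[1]:
            label = "mid"
        else:
            label = "high"
        totals[label] += rec["amount"]
    return dict(totals)

cleaned = prepare(raw_rows)
totals = bucket_totals(cleaned)
```

Note that every step above concerns the form of the data. None of them can tell whether a recorded amount is actually true.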

Data veracity is a serious issue that supersedes data quality issues: if the data is objectively false, then any analytical results are meaningless and unreliable, however clean the data may otherwise be. Moreover, data falsity creates an illusion of reality that may cause bad decisions and fraud - sometimes with civil liability or even criminal consequences.
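The point can be illustrated with a small sketch, again in plain Python and under stated assumptions: the store names, the figures and the existence of an independent trusted reference are all hypothetical. The reported data passes purely structural quality checks, yet one value is false, and the resulting average is badly wrong.

```python
# Hypothetical reported figures and a hypothetical independent reference source
reported = {"store_a": 1000.0, "store_b": 1200.0, "store_c": 9500.0}
reference = {"store_a": 1000.0, "store_b": 1200.0, "store_c": 950.0}

def passes_quality_checks(values):
    """Structural checks only: every value is a non-negative float."""
    return all(isinstance(v, float) and v >= 0 for v in values.values())

def veracity_mismatches(values, trusted, tolerance=0.01):
    """Keys whose value differs from the trusted source by more than a relative tolerance."""
    return sorted(
        key for key, v in values.items()
        if abs(v - trusted[key]) > tolerance * abs(trusted[key])
    )

def mean(values):
    return sum(values.values()) / len(values)

quality_ok = passes_quality_checks(reported)       # structure looks fine
suspect = veracity_mismatches(reported, reference) # content does not match
reported_mean = mean(reported)
reference_mean = mean(reference)
```

Here the quality check succeeds, while the comparison against the reference flags `store_c`. In practice a trusted reference is often unavailable, which is part of what makes veracity harder to establish than quality.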

Comment by Dennis Crow on January 30, 2015 at 8:14am

Another perspective is that veracity pertains to the probability that the data provides 'true' information through BI or analytics. This is very likely to derive from statistical estimates. Even if you are working with raw data, data quality issues may still creep in. Veracity is the end result of testing and evaluating the content and structure of the data. Getting the 'right' answer does supersede data quality tests. This applies to geo-spatial and geo-spatially-enabled information as well.

Comment by Soren Olegnowicz on January 29, 2015 at 5:49am

So, in essence, data veracity has to do with errors of content, while data quality has more to do with errors or inconsistencies in structure?
