The figure titled "Data Pipeline" is from an article by Jeffrey T. Leek & Roger D. Peng titled, "Statistics: P values are just the tip of the iceberg. These are both well known scientists in the field of statistics and data science, and for them, there is no need to debate the importance of data integrity; it is a fundamental concept. Current terminology uses the term "tidy data", a phrase coined by Hadley Wickham from an article by the same name. Whatever you…
ContinueAdded by Randall Shane on July 3, 2015 at 12:00pm — 1 Comment
Buzz words are one of my least favorite things, but as buzz words go, I can appreciate the term “Data Lake.” It is one of the few buzz words that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…
ContinueAdded by Randall Shane on May 21, 2015 at 3:13pm — 1 Comment
We are all very fortunate to be alive during this exciting time in history. Some truly disruptive technologies are on the verge of exploding into reality and it is difficult to imagine what the future holds. With these new technologies, however, we must not ignore the technically sound practices that allowed us to reach this point – managing data integrity is one of those practices.
As promised from my last post, I will discuss the importance of data integrity in the…
Added by Randall Shane on May 2, 2015 at 4:30pm — No Comments
The subject of this blog might seem rather rudimentary for those who fully understand the importance of properly managing data. For those people, hopefully, you will find the post worth reading and provide constructive feedback and augment the discussion. The purpose of this post is to highlight the necessity to keep data clean and orderly so that the results of the analysis are reliable and trustworthy - if data integrity is intact, information derived from this data will be trustworthy…
ContinueAdded by Randall Shane on March 8, 2015 at 12:00pm — No Comments
© 2019 Data Science Central ®
Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
Most Popular Content on DSC
To not miss this type of content in the future, subscribe to our newsletter.
Other popular resources
Archives: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More
Most popular articles