
Randall Shane's Blog Posts Tagged 'Big' (5)

Summary of NOAA Analysis

v0.2 of the application.

The analysis and discussion over the last few months on data integrity have finally positioned me to do some basic analysis of the NOAA data. Undoubtedly, as you look through the data in the interactive program below, you will see things that cause you to question the data. If you have the time, and the interest, please go back and read through my earlier posts on …


Added by Randall Shane on August 22, 2015 at 7:00am — No Comments

Little Debate: Data Priorities for all Industries

The figure titled "Data Pipeline" is from an article by Jeffrey T. Leek & Roger D. Peng titled "Statistics: P values are just the tip of the iceberg." Both are well-known scientists in the field of statistics and data science, and for them there is no need to debate the importance of data integrity; it is a fundamental concept. Current terminology uses the term "tidy data", a phrase coined by Hadley Wickham in an article of the same name. Whatever you…


Added by Randall Shane on July 3, 2015 at 12:00pm — 1 Comment

Data Integrity: The Rest of the Story Part II

Buzzwords are one of my least favorite things, but as buzzwords go, I can appreciate the term “Data Lake.” It is one of the few buzzwords that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large-scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…


Added by Randall Shane on May 21, 2015 at 3:13pm — 1 Comment

Data Integrity: The Rest of the Story Part I


Added by Randall Shane on May 2, 2015 at 4:30pm — No Comments

Data Integrity - A Sequence of Words Lost in the World of Big Data

The subject of this blog might seem rather rudimentary to those who fully understand the importance of properly managing data. If you are one of those people, I hope you will find the post worth reading, provide constructive feedback, and add to the discussion. The purpose of this post is to highlight the need to keep data clean and orderly so that the results of the analysis are reliable and trustworthy: if data integrity is intact, information derived from this data will be trustworthy…


Added by Randall Shane on March 8, 2015 at 12:00pm — No Comments


© 2019 Data Science Central ®
