Randall Shane's Blog (8)

Flafka: Big Data Solution for Data Silos

The previous post, “Poor Data Management Practices”, ended with a high-level approach to one possible solution for data silos. Traditional approaches to solving the data silo problem can cost millions of dollars (even for a moderately sized company) and typically require a huge integration effort (e.g., data modeling, system…

Added by Randall Shane on March 22, 2017 at 1:30pm — No Comments

Ontologies: Practical Applications

In the previous post, "Why Ontologies", we explored at a very high level what an ontology is and how ontologies can be used in AI, NLP, data integration, and knowledge management applications. So what does the picture of satellites orbiting the earth have to do with…

Added by Randall Shane on February 28, 2017 at 8:00am — No Comments

Why Ontologies?

In short, an ontology is the specification of a conceptualization. What does that mean from the perspective of the information sciences? Wikipedia's definition: "formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse." This basically means that, like all models, it is a representation of what actually exists in reality. A statistical or mathematical…

Added by Randall Shane on February 22, 2017 at 5:30pm — No Comments

Summary of NOAA Analysis

v0.2 of the application.

The analysis and discussion over the last few months on data integrity have finally positioned me to do some basic analysis of the NOAA data. Undoubtedly, as you look through the data in the interactive program below, you will see things that cause you to question the data. If you have the time, and the interest, please go back and read through my earlier posts on …

Added by Randall Shane on August 22, 2015 at 7:00am — No Comments

Little Debate: Data Priorities for all Industries

The figure titled "Data Pipeline" is from an article by Jeffrey T. Leek & Roger D. Peng titled "Statistics: P values are just the tip of the iceberg." Both are well-known scientists in the field of statistics and data science, and for them, there is no need to debate the importance of data integrity; it is a fundamental concept. Current terminology uses the term "tidy data", a phrase coined by Hadley Wickham in an article of the same name. Whatever you…

Added by Randall Shane on July 3, 2015 at 12:00pm — 1 Comment

Data Integrity: The Rest of the Story Part II

Buzz words are one of my least favorite things, but as buzz words go, I can appreciate the term “Data Lake.” It is one of the few buzz words that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large-scale data processing, there would be a need to name the location where lots of data resides; ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…

Added by Randall Shane on May 21, 2015 at 3:13pm — 1 Comment

Data Integrity: The Rest of the Story Part I

Added by Randall Shane on May 2, 2015 at 4:30pm — No Comments

Data Integrity - A Sequence of Words Lost in the World of Big Data

The subject of this blog might seem rather rudimentary to those who fully understand the importance of properly managing data. If you are one of those people, I hope you will still find the post worth reading, provide constructive feedback, and augment the discussion. The purpose of this post is to highlight the necessity of keeping data clean and orderly so that the results of the analysis are reliable and trustworthy: if data integrity is intact, information derived from this data will be trustworthy…

Added by Randall Shane on March 8, 2015 at 12:00pm — No Comments

© 2019 Data Science Central ®