The previous post on “Poor Data Management Practices” ended with a high-level approach to one possible solution for data silos. Traditional approaches to solving the data silo problem can cost millions of dollars (even for a moderately sized company) and typically require a huge integration effort (e.g., data modeling, system…
Added by Randall Shane on March 22, 2017 at 1:30pm — No Comments
In the previous post, "Why Ontologies", we explored at a very high level what an ontology is and how ontologies can be used in AI, NLP, data integration, and knowledge management applications. So what does the picture of satellites orbiting the Earth have to do…
Added by Randall Shane on February 28, 2017 at 8:00am — No Comments
In short, an ontology is the specification of a conceptualization. What does that mean from the perspective of the information sciences? Wikipedia's definition: "formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse." This means that an ontology, like all models, is a representation of what actually exists in reality. A statistical or mathematical…
Added by Randall Shane on February 22, 2017 at 5:30pm — No Comments
v0.2 of the application.
The analysis and discussion of data integrity over the last few months have finally positioned me to do some basic analysis of the NOAA data. Undoubtedly, as you look through the data in the interactive program below, you will see things that cause you to question the data. If you have the time and the interest, please go back and read through my earlier posts on …
Added by Randall Shane on August 22, 2015 at 7:00am — No Comments
The figure titled "Data Pipeline" is from an article by Jeffrey T. Leek & Roger D. Peng titled "Statistics: P values are just the tip of the iceberg". Both are well-known scientists in the field of statistics and data science, and for them, there is no need to debate the importance of data integrity; it is a fundamental concept. Current terminology uses the term "tidy data", a phrase coined by Hadley Wickham in an article of the same name. Whatever you…
Buzzwords are one of my least favorite things, but as buzzwords go, I can appreciate the term “Data Lake.” It is one of the few buzzwords that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large-scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…
We are all very fortunate to be alive during this exciting time in history. Some truly disruptive technologies are on the verge of exploding into reality and it is difficult to imagine what the future holds. With these new technologies, however, we must not ignore the technically sound practices that allowed us to reach this point – managing data integrity is one of those practices.
As promised in my last post, I will discuss the importance of data integrity in the…
Added by Randall Shane on May 2, 2015 at 4:30pm — No Comments
The subject of this post might seem rather rudimentary to those who fully understand the importance of properly managing data. I hope those readers will still find the post worth reading, offer constructive feedback, and add to the discussion. The purpose of this post is to highlight the need to keep data clean and orderly so that the results of the analysis are reliable and trustworthy: if data integrity is intact, information derived from this data will be trustworthy…
Added by Randall Shane on March 8, 2015 at 12:00pm — No Comments