By Siddartha Mani
Few would argue with the statement that Hadoop HDFS is in decline. In fact, the HDFS part of the Hadoop ecosystem is in more than just decline; it is in freefall. At the time of its inception, it had a meaningful role to play as a high-throughput, fault-tolerant distributed file system. The secret sauce was data locality.
By co-locating compute and data on the same nodes, HDFS overcame the limitations of slow network access to data. The…
Added by Jonathan Symonds on August 6, 2019 at 1:00pm — No Comments
When one looks at the amazing roster of talks at most data science conferences, what is missing is much discussion of how to leverage object storage. On some level you would expect to see it: if you want to run your Spark or Presto job on petascale data sets and have the data available to your applications in the public or private cloud, object storage would be the logical storage architecture.
While logical, there has been a catch, at least historically, and that is object storage…
Added by Jonathan Symonds on June 25, 2019 at 9:00am — No Comments
By Gunnar Carlsson
December 3, 2018
Added by Jonathan Symonds on December 4, 2018 at 3:00pm — No Comments
The appeal of forecasting the future is very easy to understand, even though it is not realizable. That has not stopped an entire generation of analytics companies from selling such a promise. It also explains the myriad methods that attempt to give partial, inexact, and probabilistic information about the future.
Even if they could deliver on a…
Added by Jonathan Symonds on November 20, 2018 at 1:00pm — No Comments
Deep neural nets typically operate on “raw data” of some kind, such as images, text, or time series, without the benefit of “derived” features. The idea is that because of their flexibility, neural networks can learn the features relevant to the problem at hand, be it a classification problem or an estimation problem. Whether derived or learned, features are important. The challenge is in determining how one might use what one learned from the features in future work (staying…
Added by Jonathan Symonds on August 30, 2018 at 7:00am — No Comments
In my earlier post, I discussed how performing topological data analysis on the weights learned by convolutional neural nets (CNNs) can give insight into what is being learned and how it is being learned.
The significance of this work can be summarized as follows:
Added by Jonathan Symonds on August 9, 2018 at 11:30am — No Comments
TLDR: Neural networks are powerful but complex and opaque tools. Using topological data analysis, we can describe the functioning and learning of a convolutional neural network in a compact and understandable way. The implications of this finding are profound and can accelerate the development of a wide range of applications, from self-driving everything to GDPR.
Neural networks have demonstrated a great…
Added by Jonathan Symonds on June 21, 2018 at 9:30am — No Comments
For many, mathematical modeling is exclusively about algebraic models, based on one form or another of regression or on differential equation modeling in the case of dynamical systems.
However, this is too restrictive a point of view. For example, a clustering algorithm can be regarded as a modeling mechanism applicable to data where linear regression simply isn’t applicable. Hierarchical clustering can also be regarded as a modeling mechanism, where the output is a dendrogram and…
Added by Jonathan Symonds on March 31, 2017 at 7:00am — No Comments
“Is the model a black-box?”
This is a question that many a data scientist struggles with when communicating with the business. In all fairness, there are plenty of business situations which require the models to be transparent, such…
Topological data analysis has been very successful in discovering information in many large and complex data sets. In this post, I would like to discuss the reasons why it is an effective methodology.
One of the key messages around topological data analysis is that data has shape and the shape matters.…
2014 has been a landmark year for Big Data. The most spectacular example of this was the Hortonworks IPO – a success by any measure.
As we look forward to 2015, it is clear that…