Jonathan Symonds's Blog (11)

Breaking the HDFS Performance Barrier; An Object Storage First

By Siddartha Mani, MinIO

Few would argue with the statement that Hadoop HDFS is in decline. In fact, the HDFS part of the Hadoop ecosystem is in more than just decline - it is in freefall. At the time of its inception, it had a meaningful role to play as a high-throughput, fault-tolerant distributed file system. The secret sauce was data locality. 

By co-locating compute and data on the same nodes, HDFS overcame the limitations of slow network access to…

Added by Jonathan Symonds on August 6, 2019 at 1:03pm — No Comments

Running Peta-Scale Spark Jobs on Object Storage Using S3 Select

Look at the roster of talks at most data science conferences and you will not see much discussion of how to leverage object storage. On some level you would expect to: if you want to run your Spark or Presto job on peta-scale data sets and have it be available to your applications in the public or private cloud, this would be the logical storage architecture.

While logical, there has been a catch, at least historically, and that is object storage…

Added by Jonathan Symonds on June 25, 2019 at 9:00am — No Comments

Relationships, Geometry, and Artificial Intelligence

By Gunnar Carlsson

December 3, 2018

In their very provocative paper, Peter Battaglia and his colleagues posit that in order for artificial intelligence (AI) to achieve the capabilities of human intelligence, it must be…

Added by Jonathan Symonds on December 4, 2018 at 3:00pm — No Comments

Using unsupervised learning to improve prediction performance

By Gunnar Carlsson

The appeal of forecasting the future is very easy to understand, even though it is not realizable.  That has not stopped an entire generation of analytics companies from selling such a promise. It also explains the myriad methods that attempt to give partial, inexact, and probabilistic information about the future.

Even if they could deliver on a…

Added by Jonathan Symonds on November 20, 2018 at 1:00pm — No Comments

How the incorporation of prior information can accelerate the speed at which neural networks learn while simultaneously increasing accuracy

Deep neural nets typically operate on “raw data” of some kind, such as images, text, time series, etc., without the benefit of “derived” features. The idea is that because of their flexibility, neural networks can learn the features relevant to the problem at hand, be it a classification problem or an estimation problem.  Whether derived or learned, features are important. The challenge is in determining how one might use what one learned from the features in future work (staying…

Added by Jonathan Symonds on August 30, 2018 at 7:00am — No Comments

Going Deeper: More Insight Into How and What Convolutional Neural Networks Learn

In my earlier post I discussed how performing topological data analysis on the weights learned by convolutional neural nets (CNNs) can give insight into what is being learned and how it is being learned.

The significance of this work can be summarized as follows:

  1. It…

Added by Jonathan Symonds on August 9, 2018 at 11:30am — No Comments

Using Topological Data Analysis to Understand the Behavior of Convolutional Neural Networks

TLDR: Neural Networks are powerful but complex and opaque tools. Using Topological Data Analysis, we can describe the functioning and learning of a convolutional neural network in a compact and understandable way. The implications of the finding are profound and can accelerate the development of a wide range of applications from self-driving everything to GDPR.

Introduction

Neural networks have demonstrated a great…

Added by Jonathan Symonds on June 21, 2018 at 9:30am — No Comments

Alternatives to algebraic modeling for complex data: topological modeling via Gunnar Carlsson

For many, mathematical modeling is exclusively about algebraic models, based on one form or another of regression or on differential equation modeling in the case of dynamical systems.  

However, this is too restrictive a point of view.  For example, a clustering algorithm can be regarded as a modeling mechanism applicable to data where linear regression simply isn’t applicable. Hierarchical clustering can also be regarded as a modeling mechanism, where the output is a dendrogram and…

Added by Jonathan Symonds on March 31, 2017 at 7:00am — No Comments

Peering into the Black Box - Gurjeet Singh, co-founder and CEO of Ayasdi

“Is the model a black-box?”

 

This is a question that many a data scientist struggles with when communicating with the business. In all fairness, there are plenty of business situations that require the models to be transparent, such…

Added by Jonathan Symonds on January 9, 2016 at 1:11pm — 1 Comment

Why Topological Data Analysis Works

Topological data analysis has been very successful in discovering information in many large and complex data sets. In this post, I would like to discuss the reasons why it is an effective methodology.

One of the key messages around topological data analysis is that data has shape and the shape matters.…

Added by Jonathan Symonds on January 8, 2015 at 11:00am — 2 Comments

Looking Forward: Big Data in 2015 by Gurjeet Singh

2014 has been a landmark year for Big Data. The most spectacular example of this was the Hortonworks IPO – a success by any measure.

As we look forward to 2015, it is clear that…

Added by Jonathan Symonds on January 8, 2015 at 11:00am — 1 Comment

© 2019 Data Science Central ®
