Cassandra is a popular, tried-and-true NoSQL database that supports key-value wide-column tables. Like any powerful tool, Cassandra has its ideal use cases. In particular, it excels at supporting write-heavy workloads but has limitations when supporting read-heavy workloads. Cassandra's eventual consistency model and lack of…
Added by Jonathan Symonds on January 24, 2021 at 7:30am — No Comments
Over the last decade or so, object storage use cases have evolved considerably as they replace traditional file and block use cases. Specifically, the need to work with small data objects is becoming commonplace. Yes, there are still plenty of large objects, but small objects are becoming more prevalent than large ones for specific workloads and application environments.
Traditional object storage systems were designed for large objects that were infrequently accessed. With today’s…
Added by Jonathan Symonds on December 23, 2020 at 1:55pm — No Comments
Everybody claims to be a software company these days. From the nearly decade-old pronouncement by Marc Andreessen that “Software Is Eating the World” to the push from Wall Street to produce recurring software revenue, the pressure is on to claim, at least, that you are a software company.
This is obviously problematic for appliance vendors. Try as they might, it does not take much…
Added by Jonathan Symonds on October 2, 2020 at 1:37pm — No Comments
By Frank Wessels
The recent announcement from AWS about the general availability of their new ARM-powered Graviton2 servers caused us to take another look at the performance of these ARM servers. In this blog post we describe the results, which you may find surprising.
MinIO is an Apache-licensed, open-source, S3-compatible object storage server with a particular focus on high…
Added by Jonathan Symonds on June 23, 2020 at 10:00am — No Comments
By Frank Wessels
While MD5 is no longer a good choice of hash function, it is still used in a great variety of applications. As such, any performance improvements that can be made to MD5 hashing speed are worth considering.
Due to recent improvements in SIMD processing (AVX2 and especially AVX512) we are providing a Go …
Added by Jonathan Symonds on April 30, 2020 at 1:06pm — No Comments
Edge computing is a hot topic and carries with it some confusion, particularly around storage. Handling data properly at the edge can ensure a scalable, cost-effective and secure infrastructure - but failing to set up the right architecture can lead to data loss, security vulnerabilities and sky-high costs related to the bandwidth needed to transfer data repeatedly to and from the public cloud. Bandwidth is a key consideration from an architecture perspective, and the reason why is clear: it…
Added by Jonathan Symonds on March 9, 2020 at 8:32am — No Comments
There are two forces that are fundamentally remaking the technology landscape today. One is Kubernetes and the other is high-performance Object Storage. They are powering (or are shaped by, depending on your perspective) modern, data-rich applications that include AI/ML and application logs. Either way, modern applications need Kubernetes and Object Storage, and Kubernetes and Object Storage owe their rise in part to these same modern applications.
They are symbiotic and they are…
Added by Jonathan Symonds on February 14, 2020 at 10:16am — No Comments
Written by Frank Wessels
JSON has established itself as the "lingua franca" of the web. As such, the parsing performance of JSON is hugely important for many applications. Despite the simple and human-friendly nature of JSON, it is not a technically trivial format to parse at high speeds.
Recently, some new designs have been presented, one of which is …
Added by Jonathan Symonds on February 11, 2020 at 8:30am — No Comments
Via Nitish Tiwari
Kubernetes has fundamentally altered the traditional application development and deployment patterns. Application development teams can now develop, test and deploy their apps in days, across different environments, all within their Kubernetes clusters. Previous generations of technology typically took weeks if not months.
This acceleration is possible due to the abstraction that Kubernetes brings to the table, i.e. it deals with underlying details of…
Added by Jonathan Symonds on February 6, 2020 at 9:00am — No Comments
By Siddartha Mani
Few would argue with the statement that Hadoop HDFS is in decline. In fact, the HDFS part of the Hadoop ecosystem is in more than just decline - it is in freefall. At the time of its inception, it had a meaningful role to play as a high-throughput, fault-tolerant distributed file system. The secret sauce was data locality.
By co-locating compute and data on the same nodes, HDFS overcame the limitations of slow network access to data. The…
Added by Jonathan Symonds on August 6, 2019 at 1:00pm — No Comments
When you look at the amazing roster of talks at most data science conferences, what you don’t see is a lot of discussion of how to leverage object storage. On some level you would expect to: ultimately, if you want to run your Spark or Presto job on peta-scale data sets and have it be available to your applications in the public or private cloud, this would be the logical storage architecture.
While logical, there has been a catch, at least historically, and that is object storage…
Added by Jonathan Symonds on June 25, 2019 at 9:00am — No Comments
By Gunnar Carlsson
December 3, 2018
Added by Jonathan Symonds on December 4, 2018 at 3:00pm — No Comments
The appeal of forecasting the future is very easy to understand, even though it is not realizable. That has not stopped an entire generation of analytics companies from selling such a promise. It also explains the myriad methods that attempt to give partial, inexact, and probabilistic information about the future.
Even if they could deliver on a…
Added by Jonathan Symonds on November 20, 2018 at 1:00pm — No Comments
Deep neural nets typically operate on “raw data” of some kind, such as images, text, time series, etc., without the benefit of “derived” features. The idea is that because of their flexibility, neural networks can learn the features relevant to the problem at hand, be it a classification problem or an estimation problem. Whether derived or learned, features are important. The challenge is in determining how one might use what one learned from the features in future work (staying…
Added by Jonathan Symonds on August 30, 2018 at 7:00am — No Comments
In my earlier post I discussed how performing topological data analysis on the weights learned by convolutional neural nets (CNNs) can give insight into what is being learned and how it is being learned.
The significance of this work can be summarized as follows:
Added by Jonathan Symonds on August 9, 2018 at 11:30am — No Comments
TLDR: Neural Networks are powerful but complex and opaque tools. Using Topological Data Analysis, we can describe the functioning and learning of a convolutional neural network in a compact and understandable way. The implications of the finding are profound and can accelerate the development of a wide range of applications from self-driving everything to GDPR.
Neural networks have demonstrated a great…
Added by Jonathan Symonds on June 21, 2018 at 9:30am — No Comments
For many, mathematical modeling is exclusively about algebraic models, based on one form or another of regression or on differential equation modeling in the case of dynamical systems.
However, this is too restrictive a point of view. For example, a clustering algorithm can be regarded as a modeling mechanism applicable to data where linear regression simply isn’t applicable. Hierarchical clustering can also be regarded as a modeling mechanism, where the output is a dendrogram and…
Added by Jonathan Symonds on March 31, 2017 at 7:00am — No Comments
“Is the model a black-box?”
This is a question that many a data scientist struggles with when communicating with the business. In all fairness, there are plenty of business situations that require the models to be transparent, such…
Topological data analysis has been very successful in discovering information in many large and complex data sets. In this post, I would like to discuss the reasons why it is an effective methodology.
One of the key messages around topological data analysis is that data has shape and the shape matters.…
2014 has been a landmark year for Big Data. The most spectacular example of this was the Hortonworks IPO – a success by any measure.
As we look forward to 2015, it is clear that…