By
- Priya Sharma – Sr. Data Scientist – IoT Analytics, SAS Institute Inc.
- Saurabh Mishra – Product Management, IoT, SAS Institute Inc.
June 12, 2020
Description: The majority of AI approaches are based on the construct of training against historical data and then inferencing on new data. While this is a sound and proven approach, many IoT assets coming online don’t have historical data, and we don’t necessarily have the time to wait.
Modern Machine…
Added by Jane Howell on June 12, 2020 at 2:30pm
Why would a data scientist use Kafka, Jupyter, Python, KSQL, and TensorFlow all together in a single notebook?
There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need for data ingestion, preprocessing, model deployment, and monitoring at scale. It…
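As a rough illustration of what such a notebook might look like, here is a minimal Python sketch that pulls a handful of events from Kafka into a pandas DataFrame for prototyping (the full post also brings in KSQL for preprocessing and TensorFlow for the model). The topic name, broker address, and record layout below are assumptions, not taken from the post.

```python
# Minimal notebook-style sketch: sample events from Kafka into pandas for prototyping.
# Assumptions: a local broker at localhost:9092 and a hypothetical topic
# "creditcard_events" carrying JSON records.
import json

import pandas as pd
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "creditcard_events",                       # hypothetical topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                  # stop iterating once the topic is drained
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Pull a small sample of events into a DataFrame for quick, interactive exploration.
records = [msg.value for msg in consumer]
df = pd.DataFrame(records)
print(df.head())
```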
Added by Kai Waehner on January 22, 2019 at 10:00am
I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e.…
Added by Kai Waehner on August 1, 2018 at 11:00pm
Machine Learning / Deep Learning models can be used in different ways to make predictions. My preferred way is to deploy an analytic model directly into a stream processing application (such as Kafka Streams or KSQL). You could, for example, use the …
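The post's preferred option embeds the model in a JVM stream processor such as Kafka Streams or KSQL; as a hedged Python analogue of the same consume-predict-produce pattern, the sketch below wires a pre-trained TensorFlow model between two Kafka topics. The topic names, model file, and event layout are hypothetical.

```python
# Rough Python analogue of the consume-predict-produce pattern described above.
# Topic names, model path, and event layout are assumptions, not from the post.
import json

import numpy as np
import tensorflow as tf
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

model = tf.keras.models.load_model("fraud_model.h5")   # hypothetical pre-trained model

consumer = KafkaConsumer(
    "transactions",                                     # hypothetical input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for msg in consumer:
    # Assumes each event carries a "features" array and the model has a single output.
    features = np.array([msg.value["features"]])
    score = float(model.predict(features, verbose=0)[0][0])
    producer.send("predictions", {"id": msg.value.get("id"), "score": score})
```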
Added by Kai Waehner on July 8, 2018 at 4:26pm
Articulate is an open source project that allows you to take control of your conversational interfaces without worrying about where and how your data is stored. Articulate is also built with a user-centered design whose main goal is to make both experts and beginners feel comfortable when building their intelligent agents.
The main features of Articulate are:
- Open source project
- Based on…
Added by Daniel Calvo-Marin on July 2, 2018 at 7:00pm
I presented a new talk at "Codemotion Amsterdam 2018" this week. I discussed the relationship between Apache Kafka and Machine Learning and how to build a Machine Learning infrastructure at extreme scale.
Long version of the title:
"Deep Learning at Extreme Scale (in the Cloud)
with the Apache Kafka Open Source Ecosystem - How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc."
As always, I want to share the slide deck. The talk was…
Added by Kai Waehner on May 8, 2018 at 9:30pm

After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL tools and frameworks, but most of them require writing code.…
Added by Luba Belokon on April 26, 2018 at 2:30am
Today I'm writing this post to explain how you can perform geographic analysis and answer questions like: Which is the richest area in my city? How many people live in a given neighborhood?
You can do it by combining shapefiles with an Excel spreadsheet; let's understand it together...
First of all, we need to install a Geographic Information System (GIS); I recommend QGIS, a free and open source GIS.
Then,…
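The post walks through the join in QGIS; the same shapefile-plus-spreadsheet join can also be sketched in Python with geopandas, as below. The file names, join column, and attribute column are hypothetical, not taken from the post.

```python
# Hypothetical Python equivalent of the QGIS join described above:
# attach spreadsheet attributes (e.g. income, population) to neighborhood polygons.
import geopandas as gpd            # pip install geopandas
import matplotlib.pyplot as plt
import pandas as pd

neighborhoods = gpd.read_file("neighborhoods.shp")   # hypothetical shapefile
attributes = pd.read_excel("census_data.xlsx")        # hypothetical spreadsheet

# Join on a shared neighborhood identifier (the column name is an assumption).
joined = neighborhoods.merge(attributes, on="neighborhood_id")

# Simple choropleth: which is the richest area in my city?
joined.plot(column="avg_income", legend=True)
plt.show()
```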
Added by Thiago Buselato Maurício on February 11, 2018 at 9:30am
There are precious few things that everybody adores. Once you get past breakfast in bed and two dollar bills, the list starts to look a little barren. But if there's one thing we can agree on as a society it's this: free stuff is good and cool and you want some of it right now.
In the spirit of this immutable law, we've compiled a list of our ten favorite places to find open data. Here they are, in no particular order.…
Added by Leena Kamath on February 8, 2016 at 9:37am
Those who follow big data technology news probably know about Apache Spark, and how it’s popularly known as the Hadoop Swiss Army Knife. For those not so familiar, Spark is a cluster computing framework for data analytics designed to speed up and simplify common data-crunching and analytics tasks. Spark is certainly creating buzz in the big data world, but why? What’s so special about this…
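For readers who have never touched Spark, a minimal PySpark sketch of the kind of data-crunching task the post alludes to might look like the following; the input file and column names are made up for illustration.

```python
# Minimal PySpark sketch of a common data-crunching task.
# The CSV file and its columns ("region", "amount") are made-up examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# Load a CSV into a distributed DataFrame and aggregate it across the cluster.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)
totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))
totals.show()

spark.stop()
```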
Added by Ritesh Gujrati on January 8, 2016 at 2:30am
From episode 10 of my Naked Analyst Channel on YouTube.
I think I do - and it is the ‘appification’ of analytics. What I mean by this is the reduction of a complex analytic activity, such as market segmentation, down to a single button on your computer interface. Very much like the…
Added by Steve Bennett on October 6, 2014 at 2:07pm

8 years ago, not even Doug Cutting would have thought that the tool he named after his kid's soft toy would so soon become a rage and change the way people and organizations look at their data. Today, Hadoop and Big Data have almost become synonymous. But Hadoop is not just Hadoop now. Over time it has evolved…
Added by Mohammad Tariq Iqbal on April 25, 2013 at 3:54pm
How to use S3 (S3 native) as input/output for a Hadoop MapReduce job. In this tutorial we will first try to understand what S3 is, the difference between S3 and S3N, and how to set S3N as input and output for a Hadoop MapReduce job. Configuring S3N as I/O may be useful for local MapReduce jobs (i.e. MR run on a local cluster), but it is especially important when we run an Elastic MapReduce job (i.e. when we run the job in the cloud). When we run a job in the cloud we need to specify the storage location for the input as…
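As a hedged illustration of the kind of job the tutorial configures, the sketch below is a plain word-count mapper/reducer that could be run with Hadoop Streaming against s3n:// input and output paths; the bucket names, credentials, and streaming-jar location in the comment are placeholders, not values from the post.

```python
#!/usr/bin/env python
# Word-count for Hadoop Streaming, reading input from and writing output to S3
# via the s3n:// scheme. Example invocation (all placeholder values):
#
#   hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
#     -D fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
#     -D fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
#     -input  s3n://my-bucket/input/ \
#     -output s3n://my-bucket/output/ \
#     -mapper  "python wordcount.py map" \
#     -reducer "python wordcount.py reduce" \
#     -file wordcount.py
import sys


def map_stdin():
    # Emit "word<TAB>1" for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reduce_stdin():
    # Streaming sorts by key, so counts for the same word arrive contiguously.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    map_stdin() if sys.argv[1] == "map" else reduce_stdin()
```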
Added by Rahul Patodi on November 11, 2012 at 8:00am

What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware; it incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and like…
Added by Rahul Patodi on November 11, 2012 at 8:00am
In this article we will discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and also why you might need S3 in the first place. Before coming to the actual use case and the performance of S3 with Hadoop, let's understand …
Added by Rahul Patodi on June 27, 2012 at 9:07pm