
All Blog Posts Tagged 'Source' (14)

Apache Kafka + KSQL + TensorFlow for Data Scientists via Python + Jupyter Notebook

Why would a data scientist use Kafka, Jupyter, Python, KSQL, and TensorFlow all together in a single notebook?

There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need for data ingestion, preprocessing, model deployment and monitoring at scale. It…
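That gap can be illustrated with a toy sketch. Everything below is hypothetical: a plain Python list stands in for a Kafka topic, and a simple threshold stands in for a trained model. The point is the shape of the workflow — pull a batch out of the stream for notebook-style prototyping, then apply the resulting model message by message, the way the streaming platform would:

```python
import json
import statistics

# Hypothetical stand-in for a Kafka topic: in a real setup these would be
# messages consumed via a Kafka/KSQL client, here just a list of JSON strings.
raw_messages = [
    json.dumps({"sensor": "s1", "value": v}) for v in (10.0, 12.0, 11.0, 195.0, 10.5)
]

# Prototyping side (what you would do in a notebook): pull a batch of
# records out of the stream into plain Python objects for analysis.
records = [json.loads(m) for m in raw_messages]
values = [r["value"] for r in records]

# "Train" a trivial model: flag anything far above the mean as an anomaly.
mean = statistics.mean(values)
stdev = statistics.pstdev(values)
threshold = mean + 1.5 * stdev

# Production side (what the streaming platform would do): apply the model
# message by message as events arrive.
def score(message: str) -> bool:
    return json.loads(message)["value"] > threshold

anomalies = [m for m in raw_messages if score(m)]
print(len(anomalies))
```

In the real architecture the batch pull would go through a Kafka consumer or a KSQL query, and `score` would run inside the stream processor rather than over a list.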


Added by Kai Waehner on January 22, 2019 at 10:00am — No Comments

Scalable IoT ML Platform with Apache Kafka + Deep Learning + MQTT

I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e.…
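A rough, hypothetical sketch of that split: training happens elsewhere at scale, and only lightweight inference runs against the event stream. The "exported model" below is mocked as a dict of weights for a tiny logistic scorer, not a real TensorFlow export, and the sensor payloads are made up:

```python
import math

# Assumption: the model was trained at scale in the cloud (e.g. TensorFlow
# on GCP) and exported. The export is mocked here as a dict of weights.
exported_model = {"weights": [0.8, -0.4], "bias": 0.1}

def predict(features, model=exported_model):
    """Edge-side inference: score one sensor reading, no cloud round trip."""
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))  # e.g. probability of a machine failure

# Hypothetical MQTT/Kafka payloads: (temperature, vibration) feature pairs.
sensor_stream = [(0.2, 0.1), (3.5, -0.2), (0.0, 4.0)]
scores = [predict(f) for f in sensor_stream]
alerts = [s for s in scores if s > 0.9]
print(len(alerts))
```

The design point is that the stream side only needs the exported weights, so scoring stays local and fast while retraining can happen in the cloud on fresh data.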


Added by Kai Waehner on August 1, 2018 at 11:00pm — 1 Comment

Model Serving: Stream Processing vs. RPC / REST - A Deep Learning Example with TensorFlow and Kafka

Machine Learning / Deep Learning models can be used in different ways to do predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams or KSQL). You could e.g. use the …
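The two serving patterns can be contrasted in a minimal sketch. Everything here is a toy stand-in — the "model" is a threshold function, and the RPC round trip is mocked with JSON serialization rather than a real HTTP call or the Kafka Streams / KSQL APIs:

```python
import json

# Toy model standing in for a deployed Deep Learning model.
def model_predict(value: float) -> str:
    return "high" if value > 50 else "low"

# Pattern 1: the model is embedded in the stream processor; each event is
# scored with a plain in-process function call (the Kafka Streams / KSQL style).
def score_embedded(event: dict) -> str:
    return model_predict(event["value"])

# Pattern 2: RPC/REST model serving. The request/response round trip is
# mocked with JSON serialize/deserialize; a real setup would add network
# latency and a second service to operate and monitor.
def score_via_rpc(event: dict) -> str:
    request = json.dumps(event)                             # serialize for the wire
    label = model_predict(json.loads(request)["value"])     # "remote" model server
    return json.loads(json.dumps({"label": label}))["label"]  # response back

events = [{"value": 10.0}, {"value": 99.0}]
assert [score_embedded(e) for e in events] == [score_via_rpc(e) for e in events]
print("both patterns agree")
```

Both produce the same predictions; the trade-off is operational — the embedded variant has no remote dependency in the hot path, while the RPC variant lets the model scale and deploy independently.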


Added by Kai Waehner on July 8, 2018 at 4:26pm — No Comments

Articulate - Open source platform for building conversational interfaces with intelligent agents

Articulate is an open source project that lets you take control of your conversational interfaces without worrying about where and how your data is stored. Articulate is also built with a user-centered design whose main goal is to make both experts and beginners feel comfortable when building their intelligent agents.

The main features of Articulate are:

  • Open source project
  • Based on…

Added by Daniel Calvo-Marin on July 2, 2018 at 7:00pm — No Comments

Deep Learning Infrastructure for Extreme Scale with the Apache Kafka Open Source Ecosystem

I presented a new talk at "Codemotion Amsterdam 2018" this week. I discussed the relationship between Apache Kafka and Machine Learning, and how to build a Machine Learning infrastructure for extreme scale.

Long version of the title:

"Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Source Ecosystem - How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc."

As always, I want to share the slide deck. The talk was…


Added by Kai Waehner on May 8, 2018 at 9:30pm — No Comments

Open Source ETL: Apache NiFi vs StreamSets

After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL tools and frameworks, but most of them require writing code.…


Added by Luba Belokon on April 26, 2018 at 2:30am — No Comments

Analyzing Geographic Data with QGIS - Part 1

Today I'm writing this post to explain how to perform geographic analysis and answer questions like: which is the richest area in my city? How many people live in a given neighborhood?

You can do it by combining shapefiles with an Excel spreadsheet; let's work through it together...

First of all, we need to install a Geographic Information System (GIS), and I recommend QGIS - a free and open source GIS.

Then,…
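The heart of such a spatial join — deciding which neighborhood polygon contains a point, then looking up that area's attributes from the spreadsheet — can be sketched in plain Python with a ray-casting test. The geometries and income figures below are made up; in practice QGIS does this join for you:

```python
# Ray-casting point-in-polygon test: count how many polygon edges a
# horizontal ray from (x, y) crosses; an odd count means "inside".
def point_in_polygon(x, y, polygon):
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Two square "neighborhoods" standing in for shapefile geometries.
neighborhoods = {
    "centre": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "north":  [(0, 10), (10, 10), (10, 20), (0, 20)],
}
# Income table standing in for the Excel spreadsheet, keyed by area name.
income = {"centre": 52000, "north": 34000}

def income_at(x, y):
    """Join a coordinate to its neighborhood and that area's income."""
    for name, shape in neighborhoods.items():
        if point_in_polygon(x, y, shape):
            return name, income[name]
    return None, None

print(income_at(5, 5))  # a point inside the "centre" square
```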


Added by Thiago Buselato Maurício on February 11, 2018 at 9:30am — No Comments

Ten Favorite Open Data Libraries by Justin Tenuto

There are precious few things that everybody adores. Once you get past breakfast in bed and two dollar bills, the list starts to look a little barren. But if there's one thing we can agree on as a society it's this: free stuff is good and cool and you want some of it right now.

In the spirit of this immutable law, we've compiled a list of our ten favorite places to find open data. Here they are, in no particular order.…


Added by Leena Kamath on February 8, 2016 at 9:37am — No Comments

5 Reasons Apache Spark is So Awesome

Those who follow big data technology news probably know about Apache Spark, and how it’s popularly known as the Hadoop Swiss Army Knife. For those not so familiar, Spark is a cluster computing framework for data analytics designed to speed up and simplify common data-crunching and analytics tasks. Spark is certainly creating buzz in the big data world, but why? What’s so special about this…


Added by Ritesh Gujrati on January 8, 2016 at 2:30am — No Comments

Do you know what is bigger than Big Data?

From episode 10 of my Naked Analyst Channel on YouTube.

I think I do - and it is the 'appification' of analytics. What I mean by this is the reduction of a complex analytic activity, such as market segmentation, down to a single button on your computer interface. Very much like the…


Added by Steve Bennett on October 6, 2014 at 2:07pm — 4 Comments

Hadoop Herd: When to Use What...

Eight years ago, not even Doug Cutting would have thought that the tool he named after his kid's soft toy would so soon become a rage and change the way people and organizations look at their data. Today, Hadoop and Big Data have almost become synonymous. But Hadoop is not just Hadoop anymore. Over time it has evolved…


Added by Mohammad Tariq Iqbal on April 25, 2013 at 3:54pm — No Comments

S3 as Input or Output for Hadoop MR jobs

How to use S3 (S3 native) as input/output for a Hadoop MapReduce job. In this tutorial we will first try to understand what S3 is, the difference between S3 and S3n, and how to set S3n as input and output for a Hadoop MapReduce job. Configuring S3n as I/O may be useful for local MapReduce jobs (i.e. MR jobs run on a local cluster), but it is especially important when we run an Elastic MapReduce job (i.e. when we run the job in the cloud). When we run a job in the cloud we need to specify a storage location for input as…
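As a rough illustration of the kind of configuration involved — the property names below are the ones used by the classic s3n filesystem in Hadoop 1.x/2.x, while the bucket name and credentials are placeholders:

```xml
<!-- core-site.xml: credentials for the s3n filesystem
     (YOUR_ACCESS_KEY / YOUR_SECRET_KEY are placeholders) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

With the credentials in place, a job can read and write s3n:// URIs directly, e.g. `hadoop jar hadoop-examples.jar wordcount s3n://my-bucket/input s3n://my-bucket/output` (the bucket and jar names here are hypothetical).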


Added by Rahul Patodi on November 11, 2012 at 8:00am — No Comments

Hadoop: A Soft Introduction

What is Hadoop:

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and like…
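The MapReduce idea that Hadoop distributes across a cluster can be sketched in a few lines of plain Python — map, shuffle, reduce — with no Hadoop involved. This toy word count is only an illustration of the programming model, not of Hadoop's API:

```python
from collections import defaultdict
from itertools import chain

# Toy input standing in for files split across HDFS blocks.
documents = ["big data on hadoop", "hadoop stores big files"]

# Map phase: each input record emits (key, value) pairs.
def mapper(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle phase: group all values by key
# (Hadoop performs this grouping across the network).
grouped = defaultdict(list)
for key, value in chain.from_iterable(mapper(d) for d in documents):
    grouped[key].append(value)

# Reduce phase: combine the values for each key.
counts = {word: sum(ones) for word, ones in grouped.items()}
print(counts["big"], counts["hadoop"])
```

Because each phase only needs local data plus the grouped values for its keys, the same program scales from this single-process sketch to thousands of machines.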

Added by Rahul Patodi on November 11, 2012 at 8:00am — No Comments

S3 instead of HDFS with Hadoop

In this article we will discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and why one might want to use S3 at all. Before coming to the original use case and the performance of S3 with Hadoop, let's understand …


Added by Rahul Patodi on June 27, 2012 at 9:07pm — No Comments


© 2019   Data Science Central ®