All Blog Posts Tagged 'Open' (20)

Apache Kafka + KSQL + TensorFlow for Data Scientists via Python + Jupyter Notebook

Why would a data scientist use Kafka, Jupyter, Python, KSQL, and TensorFlow all together in a single notebook?

There is an impedance mismatch between model development using Python and its Machine Learning tool stack and a scalable, reliable data platform. The former is what you need for quick and easy prototyping to build analytic models. The latter is what you need to use for data ingestion, preprocessing, model deployment and monitoring at scale. It…

Added by Kai Waehner on January 22, 2019 at 10:00am — No Comments

Scalable IoT ML Platform with Apache Kafka + Deep Learning + MQTT

I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e.…

Added by Kai Waehner on August 1, 2018 at 11:00pm — 1 Comment

Model Serving: Stream Processing vs. RPC / REST - A Deep Learning Example with TensorFlow and Kafka

Machine Learning / Deep Learning models can be used in different ways to make predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams or KSQL). You could e.g. use the …

Added by Kai Waehner on July 8, 2018 at 4:26pm — No Comments

Articulate - Open source platform for building conversational interfaces with intelligent agents

Articulate is an open source project that lets you take control of your conversational interfaces without worrying about where and how your data is stored. Articulate is also built with a user-centered design whose main goal is to make experts and beginners alike feel comfortable when building their intelligent agents.

The main features of Articulate are:

  • Open source project
  • Based on…

Added by Daniel Calvo-Marin on July 2, 2018 at 7:00pm — No Comments

Deep Learning Infrastructure for Extreme Scale with the Apache Kafka Open Source Ecosystem

I presented a new talk at "Codemotion Amsterdam 2018" this week. I discussed the relationship between Apache Kafka and Machine Learning, and how to build a Machine Learning infrastructure for extreme scale.

Long version of the title:

"Deep Learning at Extreme Scale (in the Cloud) with the Apache Kafka Open Source Ecosystem - How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc."

As always, I want to share the slide deck. The talk was…

Added by Kai Waehner on May 8, 2018 at 9:30pm — No Comments

Open Source ETL: Apache NiFi vs Streamsets

After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL tools and frameworks, but most of them require writing code.…

Added by Luba Belokon on April 26, 2018 at 2:30am — No Comments

Cluster.OBeu v1.2.1 release on CRAN

We are very pleased to announce Cluster.OBeu v1.2.1 on CRAN!

Cluster.OBeu is used on the OpenBudgets.eu data mining tool platform, with OpenCPU integration of R and JavaScript, to estimate and return the necessary parameters for cluster…

Added by Kleanthis Koupidis on March 12, 2018 at 4:00am — No Comments

Analyzing Geographic Data with QGIS - Part 1

Today I'm writing this post to explain how to perform geographic analysis and answer questions like: which is the richest area in my city? How many people live in a given neighborhood?

You can do it by combining shapefiles with an Excel spreadsheet. Let's work through it together...

First of all, we need to install a Geographic Information System (GIS). I recommend QGIS, a free and open source GIS.

Then,…

Added by Thiago Buselato Maurício on February 11, 2018 at 9:30am — No Comments

10 Tools for Data Visualizing and Analysis for Business

Digging through messy data and doing numerous calculations just so you can submit a report or arrive at the result of your quarterly business development can sometimes be nigh impossible. After all, we are only human, and by the time we get to the other side of our spreadsheet equation, we have lost all sense of what we were trying to accomplish.

Luckily, there…

Added by Dante Munnis on September 23, 2016 at 3:00am — 1 Comment

Top 20 Open Data sources

Data is everywhere, created and used by just about anyone. The days when companies or individuals had to pay significant sums of money to access useful and interesting datasets are long gone. Here is our top 20 list of the best free data sources available online.

1. Data.gov.uk: the UK government’s open data portal, including the British National Bibliography – metadata on all UK books and publications since 1950.

2. …

Added by Zygimantas Jacikevicius on February 24, 2016 at 3:00am — 3 Comments

Relax: Automation isn't coming for your job

By Justin Tenuto

For the past few years, the drumbeat of think pieces about automation taking your job (yes, your job) has gotten both louder and more incessant. Smart people like the folks at …

Added by Leena Kamath on February 8, 2016 at 9:41am — 1 Comment

Ten Favorite Open Data Libraries by Justin Tenuto

There are precious few things that everybody adores. Once you get past breakfast in bed and two dollar bills, the list starts to look a little barren. But if there's one thing we can agree on as a society it's this: free stuff is good and cool and you want some of it right now.

In the spirit of this immutable law, we've compiled a list of our ten favorite places to find open data. Here they are, in no particular order.…

Added by Leena Kamath on February 8, 2016 at 9:37am — No Comments

Open Data in Government

Open data has several definitions, but our preferred one at Data To Value is from the Open Data Institute: ‘Open data is data that anyone can access, use and share.’ Simple really, but there is a follow-on: ‘For data to be considered ‘open’, it must be published in an accessible…

Added by Zygimantas Jacikevicius on February 3, 2016 at 6:30am — No Comments

5 Reasons Apache Spark is So Awesome

Those who follow big data technology news probably know about Apache Spark, and how it’s popularly known as the Hadoop Swiss Army Knife. For those not so familiar, Spark is a cluster computing framework for data analytics designed to speed up and simplify common data-crunching and analytics tasks. Spark is certainly creating buzz in the big data world, but why? What’s so special about this…

Added by Ritesh Gujrati on January 8, 2016 at 2:30am — No Comments

A Database of Police Killings Since 2013 by Justin Tenuto

A while back, we found an interesting dataset online. The URL, killedbypolice.net, is fairly self-explanatory. It's a community-sourced list of all "police-involved fatalities", started in May of 2013, but the data itself was a bit jumbled and messy. Race and gender identifiers were in the same column, dates were inconsistent, and though most entries had a news story, we felt there was more information we wanted to know. We put the dataset…

Added by Leena Kamath on June 19, 2015 at 8:30am — No Comments

Do you know what is bigger than Big Data?

From episode 10 of my Naked Analyst Channel on YouTube.

I think I do - and it is the ‘appification’ of analytics. What I mean by this is the reduction of a complex analytic activity such as market segmentation, down to a single button on your computer interface. Very much like the…

Added by Steve Bennett on October 6, 2014 at 2:07pm — 4 Comments

Hadoop Herd : When to use What...

Eight years ago, not even Doug Cutting would have thought that the tool he was naming after his kid's soft toy would so soon become all the rage and change the way people and organizations look at their data. Today Hadoop and Big Data have almost become synonymous. But Hadoop is not just Hadoop now. Over time it has evolved…

Added by Mohammad Tariq Iqbal on April 25, 2013 at 3:54pm — No Comments

S3 as Input or Output for Hadoop MR jobs

How to use S3 (S3 native) as input/output for a Hadoop MapReduce job. In this tutorial we will first try to understand what S3 is, the difference between s3 and s3n, and how to set s3n as input and output for a Hadoop MapReduce job. Configuring s3n as I/O may be useful for local MapReduce jobs (i.e. MR run on a local cluster), but it matters most when we run an Elastic MapReduce job (i.e. when we run the job in the cloud). When we run a job in the cloud, we need to specify the storage location for input as…

Added by Rahul Patodi on November 11, 2012 at 8:00am — No Comments

Hadoop:- A soft Introduction

What is Hadoop:

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and like…

Added by Rahul Patodi on November 11, 2012 at 8:00am — No Comments

S3 instead of HDFS with Hadoop

In this article we will discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and why S3 is needed. Before coming to the original use case and the performance of S3 with Hadoop, let’s understand …

Added by Rahul Patodi on June 27, 2012 at 9:07pm — No Comments

© 2019 Data Science Central ®