I built a scenario for a hybrid machine learning infrastructure leveraging Apache Kafka as a scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e.…Continue
Machine Learning / Deep Learning models can be used in different ways to make predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams or KSQL). You could, e.g., use the …Continue
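The embedded-model pattern described above can be sketched in a few lines. Kafka Streams itself is a Java/Scala library, so this Python stand-in replaces the Kafka topic with a plain iterator purely to make the per-record scoring visible; the toy model, feature names, and threshold are hypothetical placeholders, not from the original post.

```python
# Sketch of the "model embedded in the stream processor" pattern:
# the model is loaded once at start-up and applied to every record
# as it flows through -- no remote model server involved.
# An in-memory list stands in for a Kafka topic here.

def load_model():
    """Stand-in for loading a trained model (e.g. a TensorFlow
    SavedModel) when the stream processor starts."""
    # Toy linear model: score = 0.5 * amount + 2.0 * is_foreign
    weights = {"amount": 0.5, "is_foreign": 2.0}
    return lambda event: sum(weights[k] * event[k] for k in weights)

def process_stream(events, predict, threshold=50.0):
    """Score each record and emit an enriched record downstream."""
    for event in events:
        score = predict(event)
        yield {**event, "score": score, "flagged": score > threshold}

predict = load_model()
events = [
    {"amount": 120.0, "is_foreign": 1},
    {"amount": 30.0, "is_foreign": 0},
]
results = list(process_stream(events, predict))
```

In a real Kafka Streams topology the same idea appears as a `mapValues` step that calls the model inside the JVM process, which avoids a network hop per prediction.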
Added by Kai Waehner on July 8, 2018 at 4:26pm — No Comments
I presented a new talk at "Codemotion Amsterdam 2018" this week. I discussed how Apache Kafka and Machine Learning relate, and how to build a Machine Learning infrastructure for extreme scale.
Long version of the title:
"Deep Learning at Extreme Scale (in the Cloud) with the Apache Kafka Open Source Ecosystem - How to Build a Machine Learning Infrastructure with Kafka, Connect, Streams, KSQL, etc."
As always, I want to share the slide deck. The talk was…Continue
Added by Kai Waehner on May 8, 2018 at 9:30pm — No Comments
An R Shiny app is an interactive web interface. A Shiny app has two components: a user interface object (ui.R) and a server function (server.R). The two components are passed as arguments to the shinyApp function, which creates a Shiny app object. For more info on how to build Shiny…Continue
In 2006, marketing commentator Michael Palmer blogged, “Data is just like crude. It’s valuable, but if unrefined it cannot really be used.”
Nine years on, the statement still holds true across any industry that depends on large volumes of data. Until data is broken down into pieces and analyzed, it holds little value.
As the world becomes more receptive to the advantages of big data, the oil industry does not seem to be far…Continue
Added by Deena Zaidi on October 4, 2017 at 7:00pm — No Comments
Enterprise applications are trending toward adopting Machine Learning as a strategic capability, and running deep machine learning analytics across multiple problem statements is becoming common. A variety of machine learning solutions, packages, and platforms exist in the market. One of the main challenges teams initially face is choosing the correct platform or package for their solution.
Based on my limited…Continue
During the last few years, the hottest word on everyone’s lips has been “productivity.” In the rapidly evolving Internet world, getting something done fast always gets an upvote. Despite needing to implement real business logic quickly and accurately, as an experienced PHP developer I still spent hundreds of hours on other tasks, such as setting up databases or caches, deploying projects, monitoring online statistics, and so on. Many developers have struggled with these so-called miscellaneous…Continue
Added by Irina Papuc on March 24, 2016 at 9:00am — No Comments
From querying your data and visualizing it all in one place, to documenting your work and building interactive charts and dashboards, to running machine learning algorithms on top of your data and sharing the results with your team, there are very few limits to what one can do with the Jupyter + Redshift stack. However, setting everything up and resolving all the package dependencies can be a painful experience.
In this blog post I will walk…Continue
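The Jupyter workflow described above boils down to: connect, query, and work with the result in Python. Redshift is usually reached through a Postgres-compatible driver such as psycopg2; in this sketch sqlite3 stands in for Redshift so the snippet runs without a cluster, and the table and column names are invented for illustration.

```python
# Minimal sketch of the notebook query pattern: run SQL, pull the
# result into Python, then plot or model it in the next cell.
# sqlite3 is a stand-in for a Redshift connection (normally opened
# with psycopg2 against the cluster endpoint).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# The same kind of aggregate would be issued against Redshift
# from a notebook cell, with the result fed into pandas or a chart.
rows = conn.execute(
    "SELECT user_id, SUM(revenue) FROM events "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
```

Against a real cluster only the connection line changes; the query-fetch-analyze loop in the notebook stays the same.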
Added by Yevgeniy Slutskiy Meyer on February 12, 2016 at 1:30am — No Comments
Previously, we discussed the role of Amazon Redshift's sort keys and compared how both compound and interleaved keys work in theory. Throughout that post we used some dummy data and a set of Postgres queries in order to explore the…Continue
Added by sasha blumenfeld on August 28, 2015 at 7:20am — No Comments
How to use S3 (S3 native) as input/output for a Hadoop MapReduce job. In this tutorial we will first try to understand what S3 is, the difference between s3 and s3n, and how to set s3n as input and output for a Hadoop MapReduce job. Configuring s3n as I/O may be useful for local MapReduce jobs (i.e., MR jobs run on a local cluster), but it has significant importance when we run an Elastic MapReduce job (i.e., when we run the job in the cloud). When we run a job in the cloud we need to specify a storage location for input as…Continue
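Wiring s3n into a job comes down to supplying AWS credentials in the Hadoop configuration and then addressing input/output with `s3n://` URIs. A minimal sketch of the classic (Hadoop 1.x-era) core-site.xml properties, with placeholder credential values:

```xml
<!-- core-site.xml: s3n credential properties; values are placeholders -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```

With the credentials in place, the job's input and output paths can be given as s3n URIs on the command line, e.g. `hadoop jar myjob.jar MyJob s3n://my-bucket/input s3n://my-bucket/output` (bucket and jar names here are hypothetical).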
Added by Rahul Patodi on November 11, 2012 at 8:00am — No Comments
In this article we will discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and why one might need S3 in the first place. Before coming to the actual use case and the performance of S3 with Hadoop, let’s understand …Continue
Added by Rahul Patodi on June 27, 2012 at 9:07pm — No Comments