Last week, Microsoft announced that Apache Spark on Azure HDInsight (Microsoft’s managed Hadoop and Spark cloud service) is now generally available. Last night I spoke to the Tampa Bay Data Science Group about Apache Spark on Azure HDInsight and the related offerings.
Spark for Azure HDInsight offers customers an enterprise-ready Spark solution that is fully managed, secure, and highly available, with interactive tools that make it simpler to use.
The slides from my presentation, along with references to the codebase and related links, are available as follows.
Apache Spark is an open source processing framework for running large-scale data analytics applications. Built on an in-memory compute engine, Spark enables high-performance querying on big data. Its parallel data processing framework keeps data in memory and spills it to disk when needed. This lets Spark run some in-memory workloads up to 100x faster than Hadoop MapReduce, and it provides a common execution model for tasks such as extract, transform, load (ETL), batch processing, and interactive queries on data in the Hadoop Distributed File System (HDFS). Azure makes Apache Spark easy and cost-effective to deploy: there is no hardware to buy and no software to configure, a full notebook experience lets you author compelling narratives, and Spark integrates with partner business intelligence tools.