
All Blog Posts Tagged 'spark' (11)

Speedup your Machine Learning applications without changing your code

Emerging cloud applications like machine learning, AI and big data analytics require high-performance computing systems that can sustain the increased amount of data processing without consuming excessive power. To this end, many cloud operators have started adopting heterogeneous infrastructures that deploy hardware accelerators, such as FPGAs, to increase the performance of computationally intensive tasks. However, most hardware accelerators lack…


Added by Chris Kachris on November 6, 2018 at 7:00am — 2 Comments

Why choose Apache Spark over Hadoop for your Big Data project?

When the first release of Spark became available in 2014, Hadoop had already enjoyed several years of commercial growth, from 2009 onwards. Although Hadoop overcame a major hurdle by making it possible to analyze terabyte-scale datasets efficiently with broadly accessible distributed computing methods, it still had shortfalls that hindered its wider adoption.

Limitations of Hadoop

A few of the common…


Added by Packt Publishing on May 3, 2018 at 1:30am — No Comments

What’s Better to Learn First: Spark Vs Hadoop?

Spark VS Hadoop

Spark and Hadoop are two different frameworks with both similarities and differences, and each has its own pros and cons. So which one is better, Spark or Hadoop? There is no definitive answer: the two platforms are hard to compare directly, and everyone may find new and useful features in both of them. So let’s start with the history of how the two were developed.

Spark and Hadoop are…

Added by Azharuddin on February 14, 2018 at 10:30pm — No Comments

Data Cleansing with Apache Spark and Optimus

Outdated, inaccurate, or duplicated data won’t drive optimal data-driven solutions. When data is inaccurate, leads are harder to track and nurture, and insights may be flawed. The data on which you base your big data strategy must be accurate, up to date, as complete as possible, and free of duplicate entries. Clean data results in…


Added by Favio Vázquez on August 18, 2017 at 8:00am — No Comments

IoT 101 – Lesson 3: Everything You Need to Know to Start Your IoT Project

Summary: In Lesson 3, we continue to provide a complete foundation and a broad understanding of the technical issues surrounding an IoT or streaming system, so that readers can make intelligent decisions and ask informed questions when planning their IoT system.

In Lesson 1

In Lesson 2…


Added by William Vorhies on July 5, 2016 at 7:33am — 1 Comment

IoT 101 – Lesson 2: Everything You Need to Know to Start Your IoT Project

Summary: In Lesson 2, we continue to provide a complete foundation and a broad understanding of the technical issues surrounding an IoT or streaming system, so that readers can make intelligent decisions and ask informed questions when planning their IoT system.

In Lesson 1

In This Article…


Added by William Vorhies on June 27, 2016 at 8:10am — No Comments

The Business Translator - The Missing Link to Make Data Science Projects Stick

Why aren’t the models and insights generated by many Data Science projects an instant hit with companies looking for data-driven growth? Because these projects are missing the Business Translator, an important role that nobody is currently recruiting for. Read on to learn about my proposal on how to make Data Science projects stick at your company and build an enduring business…


Added by MARIUS MARCU on September 3, 2015 at 8:00am — No Comments

Welcome to Sparkling Land

Note: Opinions expressed are solely my own and do not express the views or opinions of my employer.

As a data scientist who has been munging data and building machine learning models in tools like R, Python and other software (open source and proprietary), I had always longed for a world without technical limitations: a world that would allow me to create data structures (data scientists usually call them vectors, matrices or dataframes) of virtually any…


Added by Fawad Alam on May 18, 2015 at 8:30am — No Comments

Is Spark The Data Platform Of The Future?

Hadoop has been the foundation for data programmes since Big Data hit the big time. It has been the launching point for data programmes at almost every company that is serious about its data offerings.

However, as we predicted, the rise of in-memory databases is creating a need for companies to adopt frameworks that harness this power effectively.

It was therefore…


Added by Chris Towers on March 13, 2015 at 3:42am — 4 Comments

Get started with Hadoop and Spark in 10 minutes

With the big 3 Hadoop vendors – Cloudera, …


Added by Maloy Manna on January 1, 2015 at 1:46pm — 1 Comment

Apache Spark: distributed data processing faster than Hadoop

This post is adapted from the author's own writing on DataScience Hacks.

Apache Spark is another Apache-licensed top-level project, and it can perform large-scale data processing much faster than Hadoop (I am referring to MR1.0 here). This speed is made possible by the Resilient Distributed Dataset (RDD) concept behind Spark's data processing. An RDD is basically a collection of objects,…
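To make the RDD idea mentioned above a little more concrete, here is a toy, single-process Python sketch. It is not Spark's API and not the author's code: the class `ToyRDD` and its methods are hypothetical names chosen to mirror the well-known RDD model, in which data is split into partitions, transformations such as `map` and `filter` are lazy and only recorded as lineage, and work runs only when an action such as `collect` or `count` is called. Replaying the lineage on a source partition is also, in spirit, how Spark can rebuild a lost partition without keeping a copy of the data.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical toy model of an RDD (illustration only, not pyspark).

    Source partitions are never mutated; each transformation returns a new
    ToyRDD whose lineage has one more recorded per-partition step.
    """

    def __init__(self, partitions: List[List[Any]],
                 lineage: Optional[List[Callable[[List[Any]], List[Any]]]] = None):
        self._partitions = partitions        # source data, split into partitions
        self._lineage = list(lineage or [])  # recorded steps, not yet executed

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        parts = [items[i:i + size] for i in range(0, len(items), size)]
        return cls(parts)

    def _with_step(self, step: Callable[[List[Any]], List[Any]]) -> "ToyRDD":
        # Transformations never touch the parent; they return a new object.
        return ToyRDD(self._partitions, self._lineage + [step])

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return self._with_step(lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return self._with_step(lambda part: [x for x in part if pred(x)])

    def compute_partition(self, index: int) -> List[Any]:
        # Replay the whole lineage on one source partition; a "lost" result
        # can be recomputed this way instead of being restored from a replica.
        part = list(self._partitions[index])
        for step in self._lineage:
            part = step(part)
        return part

    def collect(self) -> List[Any]:
        # Action: only now is any recorded work actually executed.
        result: List[Any] = []
        for i in range(len(self._partitions)):
            result.extend(self.compute_partition(i))
        return result

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    rdd = ToyRDD.parallelize(range(1, 11), num_partitions=3)
    evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(evens_squared.collect())  # [4, 16, 36, 64, 100]
    print(rdd.count())              # 10
```

Building `evens_squared` does no work at all; the filter and map only run when `collect` is called, and the original `rdd` is unchanged afterwards. Real Spark adds distribution across machines, caching, and a scheduler on top of this basic idea.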


Added by Pavan Kumar on September 28, 2014 at 7:00am — 1 Comment

© 2019 Data Science Central ®