All Blog Posts Tagged 'Spark' (24)

Spark Troubleshooting, Part 1 - Ten Challenges

“The most difficult thing is finding out why your job is failing, which parameters to change. Most of the time, it’s OOM errors…”…


Added by Sara Petrie on September 30, 2021 at 6:00am — No Comments

Running Peta-Scale Spark Jobs on Object Storage Using S3 Select

Look at the roster of talks at most data science conferences and you won’t see much discussion of how to leverage object storage. On some level you would expect to: if you want to run your Spark or Presto job on peta-scale data sets and have it available to your applications in the public or private cloud, this would be the logical storage architecture.

While logical, there has been a catch, at least historically, and that is object storage…


Added by Jonathan Symonds on June 25, 2019 at 9:00am — No Comments

Building machine learning models in Apache Spark using SCALA in 6 steps


When building machine learning models, data scientists spend most of their time on two main tasks:

Pre-processing and Cleaning

The major portion of time goes into collecting, understanding, analysing, and cleaning the data and then building features. All of the above steps are important and critical to building successful machine learning…


Added by Rohit Walimbe on April 21, 2019 at 9:00pm — 1 Comment

Beyond Datawarehouse - The Data Lake

Over the last few years, organizations have made a strategic decision to turn big data into competitive advantage. Owing to rapid changes in the trends of the BI and DW space, Big Data has been driving organizations to explore how to integrate big data into the existing EDW infrastructure. The process of extracting data from multiple sources such as social media, weblogs, sensor data etc. and transforming that data to suit the organization’s analytical needs is…


Added by Sirish M Simha on August 24, 2017 at 10:30pm — 1 Comment

Four great machine learning eBooks

Want to learn machine learning? Looking for data science tutorials and guides to help you master your data and produce actionable, game-changing insights?

Look no further than this list of machine learning eBooks from the Packt team....

1. …


Added by Richard Gall on July 21, 2017 at 6:00am — No Comments

Record linking with Apache Spark’s MLlib & GraphX

The challenge

Recently a colleague asked me to help her with a data problem that seemed very straightforward at a glance.

She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small companies (5–20 FTE), which can be hard to find online.

She noticed that many of those companies share the same address,…


Added by Tom Lous on April 4, 2017 at 11:00pm — 5 Comments

Q&A: The Transformative Power of Big Data Paired with Data Science

Data analytics is a mature discipline at this point, and even those outside the data science world generally understand what it’s all about. Modern data science, however, is still new enough to spur questions. Vincent Granville, Executive Data Scientist at Data Science Central, spoke with Roy Wilds, Chief Data Scientist at PHEMI, a Vancouver-based big data startup, about the best way to educate people…


Added by Roy Wilds, PhD, PHEMI Systems on December 6, 2016 at 8:00am — No Comments

Choosing the correct ML Solution for you...

Enterprise applications are increasingly adopting machine learning as a strategic implementation, and performing deep machine learning analytics across multiple problem statements is becoming common. There are a variety of machine learning solutions / packages / platforms on the market. One of the main challenges that teams initially try to resolve is choosing the correct platform / package for their solution.

Based on my limited…


Added by Aravindakumar Venugopalan on October 5, 2016 at 5:00pm — 1 Comment

Characteristics of Good Visual Analytics and Data Discovery Tools

Visual Analytics and Data Discovery allow analysis of big data sets to find insights and valuable information. This is much more than just classical Business Intelligence (BI). See this article for more details and motivation: "Using Visual Analytics to Make Better Decisions: the Death Pill Example". Let's take a look at important characteristics to choose the right tool for…


Added by Kai Waehner on July 27, 2016 at 10:00pm — No Comments

Hadoop VS Spark: Which is the best Data Analytics engine?

In the book Hadoop: The Definitive Guide, Tom White quotes Grace Hopper: “In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” For a long time, Hadoop has been the data analytics system preferred by businesses all over. The recent entry of the Spark engine has, however, given businesses an option other than Hadoop for data analytics…


Added by Tanmay Bhandari on June 7, 2016 at 7:29pm — No Comments

Collaborative Business Intelligence Aspects

Collaborative business intelligence is an environment in which users can communicate and collaborate with each other with ease; they are able to share information, ideas, and decision making in their communities.

Information retention

Each and every day, no one retains the millions of data items of intellectual property (telephone calls, conversations, and e-mails) in companies and organizations across the world. Using important collaborative software to…


Added by Priyanka Jain on May 19, 2016 at 9:00pm — No Comments

How to Architect a Big Data Application to Unleash Its Full Potential

For a world that's churning out and recording enormous volumes of data every second, where dependency on data is steeply rising, the need to implement Big Data architecture becomes natural.

Big Data solutions can resolve specific big data problems and requirements for data analysis, curation, capturing, sharing, searching,…


Added by Ritesh Gujrati on May 5, 2016 at 3:30am — No Comments

Self-Learn Yourself Apache Spark in 21 Blogs – #5

In Blog 5, we will look at the Apache Spark languages with some basic hands-on work. Click to have a quick read of the other blogs on Apache Spark in this learning series.

With our Apache Spark cloud setup in place, we are now ready to develop big data Spark applications. Before getting started with building Spark applications, let’s review the languages that can be used to develop them. Spark offers language APIs for Scala, Java, Python, and R, and it also works with Hive and Pig.

Scala – It’s the language…


Added by Kumar Chinnakali on January 23, 2016 at 3:32am — No Comments

Self-Learn Yourself Apache Spark in 21 Blogs – #4

In Blog 4, we will see what Apache Spark Core and its ecosystem are, and look at Apache Spark on AWS Cloud. Click to have a quick read of blogs 1-3 in this learning series.

Apache Spark has many components, including Spark Core, which is responsible for task scheduling, memory management, fault recovery, and interacting with storage…


Added by Kumar Chinnakali on January 12, 2016 at 8:00am — No Comments

Self-Learn Yourself Apache Spark in 21 Blogs – #3

In Blog 3, we will look at Apache Spark’s history and its unified platform for big data. You may also like to have a quick read of blog 1 and blog 2.

Spark was started by Matei Zaharia at UC Berkeley’s AMPLab in 2009 and open sourced in 2010…


Added by Kumar Chinnakali on January 9, 2016 at 9:00pm — 1 Comment

Self-Learn Yourself Apache Spark in 21 Blogs – #2

In this blog we share the titles for learning Apache Spark, the basics of Hadoop, which is one of the big data tools, and the motivations for Apache Spark, which is not a replacement for Apache Hadoop but its friend in big data.

Blog 1 – Introduction to Big Data

Blog 2 – Hadoop, Spark’s Motivations

Blog 3 – Apache Spark’s History and Unified Platform for Big Data

Blog 4 – Apache Spark’s First Step – AWS, Apache Spark

Blog 5 – Apache Spark Languages with basic…


Added by Kumar Chinnakali on January 8, 2016 at 9:00pm — No Comments

5 Reasons Apache Spark is So Awesome

Those who follow big data technology news probably know about Apache Spark, and how it’s popularly known as the Hadoop Swiss Army Knife. For those not so familiar, Spark is a cluster computing framework for data analytics designed to speed up and simplify common data-crunching and analytics tasks. Spark is certainly creating buzz in the big data world, but why? What’s so special about this…


Added by Ritesh Gujrati on January 8, 2016 at 2:30am — No Comments

Self-Learn Yourself Apache Spark in 21 Blogs - #1

We have received many requests from friends who regularly read our blogs to provide them with a complete guide to sparkle in Apache Spark. So here we have come up with a learning initiative called “Self-Learn Yourself Apache Spark in 21 Blogs”.

We have drilled down into various sources and archives to provide a perfect learning path for you to understand and excel in Apache Spark. These 21 blogs, which will be written over a course of time, will be a complete guide for you to understand and…


Added by Kumar Chinnakali on December 30, 2015 at 3:00am — No Comments

How Uber Uses Spark

Added by Bradley Wogsland on October 25, 2015 at 8:00am — No Comments

Machine Learning at Scale with Spark

In my last post, I covered setting up the basic tools to start doing machine learning (Python, NumPy, Matplotlib and Scikit-Learn). Now, you are probably wondering how to do this on a very large scale, involving terabytes (maybe even petabytes) of data and across several server nodes.

The best answer is Apache …


Added by Somnath Banerjee on July 9, 2015 at 8:30am — 4 Comments
