Expand Machine Learning tools: Configure Jupyter/IPython notebook for PySpark 1.6.1

Among data analytics tools, Apache Spark is becoming a reference standard for Big Data as a “fast and general engine for large-scale data processing”. Its built-in PySpark interface can run inside a Jupyter notebook, but the recipes in recent posts did not quite work for me with the latest Spark 1.6.1 release.

Here is an approach that worked for me on an Ubuntu 14.04.3 LTS desktop running the Hadoop 2.7.2 HDFS stack in an Oracle VirtualBox VM on my Windows 10 laptop.
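The core of such a setup is pointing PySpark's driver at Jupyter through environment variables. A minimal sketch follows; the Spark install path is an assumption — adjust `SPARK_HOME` to wherever your Spark 1.6.1 build is unpacked.

```shell
# Hypothetical install location -- change to match your Spark 1.6.1 directory.
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH="$SPARK_HOME/bin:$PATH"

# Have the pyspark launcher start a Jupyter notebook server as its driver
# instead of the plain interactive Python shell.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Running pyspark now opens a notebook; new notebooks get a ready-made
# SparkContext bound to the variable sc.
# "$SPARK_HOME/bin/pyspark"
```

Putting these exports in `~/.bashrc` makes the notebook-backed `pyspark` launcher available in every new shell.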

See details here.



