
Expand Machine Learning tools: Configure Jupyter/IPython notebook for PySpark 1.6.1

Apache Spark, a “fast and general engine for large-scale data processing”, is becoming a reference standard for Big Data analytics. Its built-in PySpark interface can run inside a Jupyter notebook, but the recipes in recent posts did not quite work for me with the latest Spark 1.6.1 release.

Here is an approach that worked for me on an Ubuntu 14.04.3 LTS desktop running the Hadoop 2.7.2 HDFS stack in an Oracle VirtualBox VM on my Windows 10 laptop.
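The linked write-up carries the full details, but the usual way to make the `pyspark` launcher open a Jupyter notebook in Spark 1.6.x is through two driver-Python environment variables. A minimal sketch, assuming a default Spark 1.6.1 download unpacked under `/usr/local` (the paths are my assumption, not taken from the original article):

```shell
# Assumed install location -- adjust SPARK_HOME to your own layout.
export SPARK_HOME=/usr/local/spark-1.6.1-bin-hadoop2.6
export PATH="$SPARK_HOME/bin:$PATH"

# Tell the pyspark launcher to use Jupyter as the driver's Python,
# so running `pyspark` opens a notebook with a ready-made SparkContext `sc`.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
```

With these exports in `~/.bashrc`, typing `pyspark` starts the notebook server instead of the plain REPL.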

See details here.
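An alternative to changing the launcher is to expose PySpark to an already-running notebook kernel by putting Spark's Python sources on `sys.path` by hand. A sketch under the same assumed install location (the `spark_home` default is hypothetical; `py4j-*-src.zip` matches the version Spark 1.6.1 ships):

```python
import glob
import os
import sys

# Assumed install location; override via the SPARK_HOME environment variable.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark-1.6.1-bin-hadoop2.6")

# PySpark keeps its Python sources and a zipped py4j under $SPARK_HOME/python.
sys.path.insert(0, os.path.join(spark_home, "python"))
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, zip_path)

# After this, `from pyspark import SparkContext` works in the notebook.
```

The same effect can be made permanent by dropping these lines into an IPython startup file under `~/.ipython/profile_default/startup/`.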


© 2019 Data Science Central ®
