Apache Spark is a fast, general-purpose, open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. For some workloads, particularly iterative jobs whose data fits in memory, it can run analytic applications up to 100 times faster than Hadoop MapReduce. You can use Spark from Python through PySpark, the Spark Python API, which exposes the Spark programming model to Python.

The cheat sheet below was produced by DataCamp. You can find the original version (PDF format) here.

You can find many more cheat sheets, covering a wide range of data science topics, by clicking here.

© 2021 TechTarget, Inc.