7 Python Tools All Data Scientists Should Know How to Use

This reference was originally posted on Galvanize by Dynelle Abeyta, with contributions from several authors.

Here is a summary of the seven tools:

  • IPython - IPython is a command shell for interactive computing in multiple programming languages. Originally developed for Python, it offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.
  • GraphLab Create - GraphLab Create is a Python library, backed by a C++ engine, for quickly building large-scale, high-performance data products.
  • Pandas - pandas is a Python library that provides fast, flexible data structures, such as the DataFrame, for working with labeled, tabular data. Combined with the excellent IPython toolkit and other libraries, it makes Python an environment for data analysis that excels in performance, productivity, and ease of collaboration. pandas does not implement significant modeling functionality outside of linear and panel regression; for that, look to statsmodels and scikit-learn. More work is still needed to make Python a first-class statistical modeling environment, but we are well on our way toward that goal.
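  • PuLP - A linear programming (LP) modeler for Python: you describe the objective and constraints in Python code, and PuLP passes the problem to a solver such as CBC.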
  • Matplotlib - A plotting library for creating static, animated, and interactive visualizations in Python.
  • Scikit-Learn - Scikit-Learn is a simple and efficient tool for data mining and data analysis. It is built on NumPy, SciPy, and matplotlib. Its features include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
  • Spark - Apache Spark is an engine for distributed, large-scale data processing; its Python API is PySpark.
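
To make the pandas entry concrete, here is a minimal sketch of a typical split-apply-combine workflow, assuming pandas is installed. The `sales` table, its columns, and its values are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "north"],
        "units": [10, 4, 7, 12, 5],
        "price": [2.5, 3.0, 2.5, 3.0, 4.0],
    }
)

# Derive a revenue column with a vectorized column operation (no Python loop)
sales["revenue"] = sales["units"] * sales["price"]

# Split-apply-combine: total units and revenue per region
# (groupby sorts the group keys alphabetically by default)
summary = sales.groupby("region")[["units", "revenue"]].sum()

print(summary)
```

The vectorized column arithmetic and `groupby` are where pandas earns its reputation for productivity: the same two lines work unchanged on a table with millions of rows.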
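
As a sketch of what "linear programming" looks like in PuLP, the snippet below solves a small textbook-style product-mix problem, assuming PuLP is installed with its bundled CBC solver. The variable names, coefficients, and constraint names are invented for illustration.

```python
import pulp

# Hypothetical product-mix problem: choose production levels x and y
# to maximize profit 3x + 5y subject to three capacity limits.
prob = pulp.LpProblem("product_mix", pulp.LpMaximize)

x = pulp.LpVariable("x", lowBound=0)  # units of product 1 (illustrative)
y = pulp.LpVariable("y", lowBound=0)  # units of product 2 (illustrative)

# The first expression added to the problem becomes the objective
prob += 3 * x + 5 * y, "profit"

# Subsequent inequalities are added as named constraints
prob += x <= 4, "plant_1"
prob += 2 * y <= 12, "plant_2"
prob += 3 * x + 2 * y <= 18, "plant_3"

# Solve with the CBC solver that ships with PuLP, suppressing its log output
prob.solve(pulp.PULP_CBC_CMD(msg=False))

print(pulp.LpStatus[prob.status])  # Optimal
print(x.varValue, y.varValue)
print(pulp.value(prob.objective))
```

Working through the corner points by hand confirms the solver's answer: the optimum is at x = 2, y = 6, giving a profit of 36.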
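
For Matplotlib, a minimal sketch of a labeled line plot follows, assuming matplotlib is installed. The data is illustrative, and the non-interactive `Agg` backend plus the in-memory buffer are choices made here so the example runs without a display or a file on disk.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

# Illustrative data: a simple quadratic curve
xs = list(range(10))
ys = [x ** 2 for x in xs]

# Object-oriented interface: create a figure and a single set of axes
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(xs, ys, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A minimal line plot")
ax.legend()

# Save to an in-memory PNG buffer instead of a file
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)  # release the figure's memory

png_bytes = buf.getvalue()
print(len(png_bytes), "bytes of PNG data")
```

In an IPython or Jupyter session you would typically skip the backend and buffer lines and let the figure display inline.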
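
Several of the Scikit-Learn features listed above (preprocessing, classification, and model selection) can be seen together in one short sketch. It assumes scikit-learn is installed and uses its built-in iris dataset; the choice of logistic regression and the 25% hold-out split are illustrative, not the only reasonable options.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in dataset: 150 iris flowers, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Model selection: hold out 25% of the samples for evaluation,
# stratified so each class is represented proportionally
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out data
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

Wrapping the scaler and classifier in a pipeline means the scaling parameters are learned from the training split only, which avoids leaking test-set information into the model.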
