This blog is part of Learn Machine Learning Coding Basics in a Weekend. We recommend the book Python Data Science Handbook by Jake VanderPlas.

There are four main libraries in Python that you need to know: NumPy, pandas, Matplotlib and scikit-learn (imported as sklearn).

Python's built-in list type does not allow for efficient array manipulation. The NumPy package handles the manipulation of multi-dimensional arrays and is the foundation of almost all the other data science packages in Python. From a data science perspective, collections of data such as documents, images and sound can all be represented as arrays of numbers. Hence, the first step in analysing data is to transform it into arrays of numbers. NumPy functions are used to transform and manipulate data as numbers, especially before the model-building stage, but also throughout the data science process.
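As a minimal sketch of the idea, the snippet below turns a nested Python list into a NumPy array and manipulates it without explicit loops. The tiny 2x3 "image" and all variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: a tiny 2x3 grayscale "image" as nested lists.
pixels = [[0, 128, 255],
          [64, 32, 16]]

# Convert the nested list into a two-dimensional NumPy array.
image = np.array(pixels, dtype=np.float64)

# Vectorized arithmetic: rescale every pixel to the range [0, 1] in one step.
scaled = image / 255.0

# Reshape the 2x3 array into a flat vector of 6 values.
flat = scaled.reshape(-1)

# Aggregate along an axis: the mean of each column.
col_means = image.mean(axis=0)

print(image.shape, flat.shape)
print(col_means)
```

Each operation here applies to the whole array at once, which is the main practical difference from working with plain lists.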

The pandas library provides two main data structures: the **DataFrame and the Series** objects. A Series is a one-dimensional array of indexed data, which can be created from a list or array. A DataFrame is essentially a two-dimensional array with attached row and column labels, roughly equivalent to a table in SQL or a spreadsheet. Through pandas, Python implements a number of powerful data operations similar to those of database frameworks and spreadsheets. While NumPy's ndarray data structure provides the features needed for numerical computing tasks, it does not provide the flexibility that we see in table structures (such as attaching labels to data, working with missing data, etc.). The pandas library fills that gap with features for data manipulation tasks.
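The following sketch shows both structures along with the two features mentioned above, labels and missing data. The fruit names, prices and stock counts are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels used as the row index.
labels = ["apple", "banana", "cherry"]

# A Series: one-dimensional data with an attached index.
# np.nan marks a missing value.
prices = pd.Series([1.5, 2.0, np.nan], index=labels)

# A DataFrame: a labelled table with named columns and a shared row index.
table = pd.DataFrame(
    {"price": [1.5, 2.0, np.nan], "stock": [10, 0, 5]},
    index=labels,
)

# Working with missing data: fill the gap with the mean of the known prices.
filled = table["price"].fillna(table["price"].mean())

# A spreadsheet-style filter: keep only the rows that are in stock.
in_stock = table[table["stock"] > 0]

print(prices["banana"])
print(filled)
print(in_stock)
```

Values are looked up by label (`prices["banana"]`) rather than by position, which is what a plain ndarray does not offer.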

Matplotlib is a data visualization library for Python, built on NumPy arrays. It works with multiple operating systems and graphics backends.
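A short sketch of a typical plot follows. It assumes the non-interactive `Agg` backend so that it runs without a display, and it writes the image to an in-memory buffer; both choices are made for this example only, and a sine curve stands in for real data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, chosen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Sample data: one period of a sine wave stored in NumPy arrays.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Render the figure as PNG into memory instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session you would normally skip the backend line and call `plt.show()` instead of saving to a buffer.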

The Scikit-Learn package provides efficient implementations of a number of common machine learning algorithms. It also includes modules for cross-validation, grid search and feature engineering.
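To make these pieces concrete, here is a minimal sketch that combines a preprocessing step, a classifier, cross-validation and grid search. The choice of the bundled iris dataset, a k-nearest-neighbours model and the candidate values for `n_neighbors` are assumptions made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pipeline: scale the features, then classify with k-nearest neighbours.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Grid search with 5-fold cross-validation over the number of neighbours.
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)

# Accuracy of the best model on the held-out test set.
accuracy = search.score(X_test, y_test)
print(search.best_params_, accuracy)
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so the cross-validation scores are not influenced by the validation data.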

For more details on how to apply these ideas see

