This article was written by Manish Saraswat.
This article is meant to help R users enhance their set of skills and learn Python for data science (from scratch). After all, R and Python are the most important programming languages a data scientist must know.
Python is a supremely powerful, multi-purpose programming language. It has grown phenomenally in the last few years. It is used for web development, game development, and now data analysis and machine learning. Data analysis and machine learning are relatively new applications of Python.
For a beginner in data science, learning Python for data analysis can be really painful. Why? Try Googling "learn python," and you'll get tons of tutorials meant only for learning Python for web development. How can you find a way then?
In this tutorial, we'll explore the basics of Python for performing data manipulation tasks. Alongside, we'll also look at how you do the same in R. This parallel comparison will help you relate the tasks you do in R to how you do them in Python! And in the end, we'll take up a data set and practice our newly acquired Python skills.
Note: This article is best suited for people who have a basic knowledge of the R language.
1. Why learn Python (even if you already know R)
No doubt, R is tremendously great at what it does. In fact, it was originally designed for doing statistical computing and manipulations. Its incredible community support allows a beginner to learn R quickly.
But Python is catching up fast. Established companies and startups have embraced Python at a much larger scale than R.
According to indeed.com (from Jan 2016 to November 2016), the number of job postings seeking "machine learning python" increased much faster (approx. 123%) than the number seeking "machine learning in R". Do you know why? It is because
2. Understanding Data Types and Structures in Python vs. R:
These programming languages understand the complexity of a data set based on its variables and data types. Yes! Let's say you have a data set with one million rows and 50 columns. How would these programming languages understand the data?
Basically, both R and Python have pre-defined data types. The dependent and independent variables get classified among these data types. And, based on the data type, the interpreter allots memory for use. Python supports the following data types:
- Integers: same as R's integer type.
- Long integers: R uses the bit64 package to read hexadecimal values.
- Floats: same as R's numeric type.
- Booleans: R uses the factor type or a character type, alongside its native logical type. There exists a tiny difference between Boolean values in R and Python. In R, Booleans are stored as TRUE and FALSE. In Python, they are stored as True and False. The difference is in the letter case.
- Strings: same as R's character type.
Since R is a statistical computing language, all the functions for manipulating data and reading variables are available inherently. Python, on the other hand, gets all its data analysis, manipulation, and visualization functions from external libraries. Python has several libraries for data manipulation and machine learning. The most important ones are:
In a way, Python for a data scientist is largely about mastering the libraries stated above. However, there are many more advanced libraries which people have started using. Therefore, for practical purposes you should remember the following things:
- Arrays: similar to R's list. An array can be multidimensional. It can contain data of the same or multiple classes. In the case of multiple classes, coercion takes place.
- Data frames: R uses data.frame and Python uses the DataFrame function from the pandas library.
- Matrices: R uses the matrix function. In Python, we use the numpy.column_stack function.
By now, I hope you've understood the basics of data types and data structures in R and Python. Now, let's start working with them!
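As a first step, the type and structure correspondences described above can be checked directly in Python. The snippet below is a minimal sketch and not part of the original article. It assumes NumPy and pandas are installed, the sample values and variable names are made up for illustration, and it reads the coercion remark as describing NumPy arrays.

```python
import numpy as np
import pandas as pd

# Built-in scalar types, labeled with their rough R counterparts.
# The sample values are hypothetical.
samples = {
    "integer (R: integer)": 42,
    "float (R: numeric)": 3.14,
    "Boolean (R: TRUE/FALSE)": True,   # note the letter case: True, not TRUE
    "string (R: character)": "data",
}
for label, value in samples.items():
    print(f"{label} -> {type(value).__name__}")

# Mixing classes in a NumPy array triggers coercion to one common type:
# here the integers are promoted to floats.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# pandas.DataFrame plays the role of R's data.frame.
df = pd.DataFrame({"age": [23, 35, 41], "city": ["Pune", "Delhi", "Agra"]})
print(df.shape)

# numpy.column_stack joins 1-D arrays as the columns of a 2-D array,
# much like building a matrix column by column in R.
heights = np.array([150, 160, 170])
weights = np.array([50, 60, 70])
matrix = np.column_stack((heights, weights))
print(matrix.shape)
```

Printing `type(value).__name__` is a quick way to see which type Python has assigned to a value, comparable to calling `class()` on an object in R.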
To read the full original article (and to learn writing code in Python vs. R and practice Python on a data set) click here. For more Python vs. R related articles on DSC click here.