This is a complete tutorial on data wrangling (data manipulation) with R. It covers one of the most powerful R packages for data wrangling, dplyr. The package was written by Hadley Wickham, one of the most popular R programmers, who has also written many other useful R packages such as ggplot2 and tidyr. It is one of the most popular R packages to date. This post includes several examples and tips on how to use the dplyr package for cleaning and transforming data.
dplyr vs. Base R Functions
dplyr functions are often faster than their base R equivalents, because they were written in a computationally efficient manner. Their syntax is also more consistent, and they are designed to work on data frames rather than on individual vectors.
dplyr Function | Description | Equivalent SQL |
---|---|---|
select() | Select columns (variables) | SELECT |
filter() | Filter (subset) rows | WHERE |
group_by() | Group the data | GROUP BY |
summarise() | Summarise (or aggregate) data | - |
arrange() | Sort the data | ORDER BY |
inner_join(), left_join(), etc. | Join data frames (tables) | JOIN |
mutate() | Create new variables | COLUMN ALIAS |
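
To make the SQL mapping in the table concrete, here is a minimal sketch that chains several of these verbs together. The data frame is a made-up stand-in (not the tutorial's data set), and the threshold and the `total` column name are assumptions chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for this sketch
mydata <- data.frame(
  Index = c("A", "A", "C", "C"),
  State = c("Alabama", "Alaska", "California", "Colorado"),
  Y2008 = c(110, 210, 310, 410)
)

result <- mydata %>%
  filter(Y2008 > 150) %>%            # WHERE Y2008 > 150
  group_by(Index) %>%                # GROUP BY Index
  summarise(total = sum(Y2008)) %>%  # SUM(Y2008) AS total
  arrange(desc(total))               # ORDER BY total DESC

print(result)
```

Each step takes a data frame and returns a data frame, which is why the verbs can be piped one after another in roughly the order a SQL query would be read.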
Example 3 : Selecting Variables (or Columns)
Suppose you are asked to select only a few variables. To keep the variable "Index" together with all columns from "State" to "Y2008", pass them to select() and use the colon operator for the range (State:Y2008).
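
A minimal sketch of such a call is shown here. Only the column names Index, State and Y2008 come from the text; the toy data frame, its other columns, and the name `mydata2` are assumptions made so the example runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the tutorial's data set
mydata <- data.frame(
  Index = c("A", "A", "C"),
  State = c("Alabama", "Alaska", "California"),
  Y2007 = c(100, 200, 300),
  Y2008 = c(110, 210, 310),
  Y2009 = c(120, 220, 320)
)

# Keep Index plus every column positioned from State through Y2008
mydata2 <- select(mydata, Index, State:Y2008)

names(mydata2)
```

The colon picks columns by their position in the data frame, so everything between State and Y2008 is included.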
Example 4 : Dropping Variables
The minus sign before a variable tells R to drop the variable.
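
As a sketch of what this looks like, the call below drops two columns by prefixing each with a minus sign. The toy data frame and the name `mydata_dropped` are assumptions for illustration, not the tutorial's data.

```r
library(dplyr)

# Hypothetical stand-in for the tutorial's data set
mydata <- data.frame(
  Index = c("A", "A", "C"),
  State = c("Alabama", "Alaska", "California"),
  Y2007 = c(100, 200, 300),
  Y2008 = c(110, 210, 310),
  Y2009 = c(120, 220, 320)
)

# A minus sign in front of each name removes that column
mydata_dropped <- select(mydata, -Index, -State)

names(mydata_dropped)
```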
The same operation can also be written as:
mydata = select(mydata, -c(Index,State))