Guest blog post.
Vozag downloaded CRAN data from the R project to find out which packages are downloaded most and which generate the most discussion. The table below lists the 20 packages downloaded most in a single day. The full list of the top 100 most downloaded R packages is here.
Rank | Package | No. of Downloads
1 | Rcpp | 1960
2 | ggplot2 | 1785
3 | digest | 1709
4 | reshape2 | 1651
5 | plyr | 1634
6 | rJava | 1577
7 | stringr | 1549
8 | RColorBrewer | 1497
9 | colorspace | 1372
10 | manipulate | 1363
11 | scales | 1347
12 | labeling | 1320
13 | proto | 1301
14 | munsell | 1291
15 | gtable | 1290
16 | dichromat | 1289
17 | RCurl | 1144
18 | zoo | 1085
19 | mime | 1038
20 | RcppEigen | 1033
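A ranking like the one above can be produced by counting rows per package in a one-day download log. The post does not say how the counting was done, so the following is only a minimal Python sketch. It assumes the log is a CSV file with one row per download and a `package` column; the sample rows and the helper name `top_packages` are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; this is not real CRAN data.
# Assumption: one row per download, with the package name in a "package" column.
SAMPLE_LOG = """package,country
Rcpp,US
ggplot2,DE
Rcpp,IN
digest,US
ggplot2,GB
Rcpp,FR
"""


def top_packages(log_file, n=20):
    """Return the n most-downloaded packages as (rank, package, downloads)."""
    counts = Counter(row["package"] for row in csv.DictReader(log_file))
    return [
        (rank, package, downloads)
        for rank, (package, downloads) in enumerate(counts.most_common(n), start=1)
    ]


for rank, package, downloads in top_packages(io.StringIO(SAMPLE_LOG)):
    print(f"{rank} | {package} | {downloads}")
```

Passing an open file handle for a real log instead of the in-memory sample would give the same kind of table for a full day of downloads.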
We then analyzed Stack Overflow data to find the most discussed packages, looking at both the total number of questions and the share of questions left unanswered.
ggplot2 ranked first with about 7,200 questions, followed by data.table (2,135), plyr (1,213) and knitr (1,136). Every other package had fewer than 1,000 questions. We also looked at unanswered questions: the packages with the highest percentage of unanswered questions were knitr, lattice and igraph, at 24.2%, 23.5% and 19.8% respectively.
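The unanswered percentage is simply unanswered questions divided by total questions for a package. The post reports only the resulting percentages, so the counts in this short Python sketch are hypothetical, as are the package labels `pkg_a` and `pkg_b`.

```python
def unanswered_share(total_questions, unanswered_questions):
    """Percentage of a package's questions that have no answer."""
    if total_questions == 0:
        return 0.0
    return 100.0 * unanswered_questions / total_questions


# Hypothetical counts for illustration; the post gives percentages only.
example_counts = {"pkg_a": (1000, 242), "pkg_b": (400, 50)}

# Rank packages from the highest unanswered share to the lowest.
ranked = sorted(
    (
        (package, round(unanswered_share(total, unanswered), 1))
        for package, (total, unanswered) in example_counts.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```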
Comparing the most downloaded packages with the most discussed ones shows little correlation between the two. For instance, ggplot2 has the most questions and is the second most downloaded package, but data.table, which ranks second by questions asked, is not even among the top 100 downloaded packages. knitr is another example: it is in the top five by questions asked but ranks only 27th by downloads.
So, should the R community focus on resolving issues in the packages with the most questions rather than the ones with the most downloads?
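The comparison can be laid out side by side using only the figures quoted in this post. The Python sketch below ranks the four most-asked packages by question count and looks up each one's download rank. The helper name `compare_ranks` is an assumption for illustration, and a missing download rank stands for "outside the top 100".

```python
# Question counts quoted in the post (the ggplot2 figure is approximate).
questions = {"ggplot2": 7200, "data.table": 2135, "plyr": 1213, "knitr": 1136}

# Download ranks quoted in the post; data.table is absent from the top 100.
download_rank = {"ggplot2": 2, "plyr": 5, "knitr": 27}


def compare_ranks(questions, download_rank):
    """Pair each package's rank by questions with its download rank (or None)."""
    by_questions = sorted(questions, key=questions.get, reverse=True)
    return [
        (package, question_rank, download_rank.get(package))
        for question_rank, package in enumerate(by_questions, start=1)
    ]


for package, question_rank, d_rank in compare_ranks(questions, download_rank):
    shown = f"#{d_rank}" if d_rank is not None else "outside the top 100"
    print(f"{package}: questions #{question_rank}, downloads {shown}")
```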
===
Guest Post by Vozag
Comment
Part of the reason could be that the data.table and knitr packages are used in some popular MOOCs (I know that both are used in the Coursera Data Science MOOC, for instance), which results in a disproportionately high number of questions about these packages from relative beginners in R.
Posted 12 April 2021