This article presents various ways of measuring the popularity or market share of analytics software, including Alpine, Alteryx, Angoss, C / C++ / C#, BMDP, FICO, IBM SPSS Statistics, IBM SPSS Modeler, InfoCentricity Xeno, Java, JMP, KNIME, Lavastorm, Mathworks’ MATLAB, Megaputer’s PolyAnalyst, Minitab, NCSS, Python, R, RapidMiner, SAS, SAS Enterprise Miner, Salford Predictive Modeler (SPM), SAP KXEN, TIBCO Spotfire, Stata, Statistica, Systat, and WEKA / Pentaho.
Figure 1: Number of R packages in each new release - chart from Robert's article
Personal comments
I believe that adding ever more methods to statistical packages, to the point that each package now offers hundreds of functions (dozens of regressions, dozens of classifiers, dozens of time series methods and so on), is a bad idea. Most of these functions are never used. They only confuse the high-level user and make these packages unsuitable for automated or black-box data science by non-statisticians (engineers, economists). If you really need that level of sophistication and fine-tuning, you are better off writing your own code in Perl, Python, R, or some other programming language.
Dr Granville is currently working on a new approach to statistical software development. It consists of producing very few global methods with few parameters (one method per core problem, e.g. one generic clustering technique, one generic regression technique, etc.), with a focus on automation (algorithms run in batch mode and/or automatically scheduled), streaming data, black-box data processing by non-statisticians, and the ability to process large data while avoiding the curse of big data. These methods are designed for robustness, simplicity and scalability, with minimal accuracy loss compared with traditional methods, and can be integrated as modules in existing production-mode machine learning applications, large and small. The new methods, initially designed in Data Science Central's research lab, are in the process of being made easy to implement, with code and explanations provided. It started with model-free confidence intervals 2-3 weeks ago, including hypothesis testing. This week it will be about predictive power (a metric for feature selection), followed in September by Hidden Decision Trees blended with Jackknife regression. The results will be presented in an upcoming book, Automated Data Science.
For another article about software comparison, click here.
The article
This is a very long article written (I believe in 2013) by Robert Muenchen. Read the full version, with all the reports and charts. The following metrics are used for software comparison in Robert's article.
Comment
Very, very good article. Thank you very much. My favorite for a very long time has been R, for so many reasons, applications and evolutions... Have a very nice day.
Hi Vincent Granville, in my view the most important thing about any software is the statistical design and model that each person develops for the need at hand, and few people do this. These software packages are not for everyone. What I do believe is that, these days, whoever produces designs and models fastest, while satisfying as many as possible of the assumptions behind each software's functions, will get better results. Regards.
© 2019 Data Science Central ®