Top 50 data science / big data tools, described in fewer than 40 words, for decision makers. Please help us: any definition that you fill in will have your name attached to it. Send your definition, or a new term and its definition, to [email protected]

  1. AnalyticTalent
  2. Asterdata: MPP (Massively Parallel Processing) database, able to work with semi-structured data using SQL commands. Part of Teradata.
  3. Cloudera
  4. EMC (submitted by Mark Burnard): Acquired Greenplum in 2010 to enable analytics on "Big Data". EMC has since integrated Greenplum with Hadoop and with "scale-out NAS" platform Isilon. Also integrates with EMC DataDomain for backup, and VPLEX for multi-site replication. See also Pivotal Initiative.
  5. Greenplum (submitted by Mark Burnard): A highly scalable (100s of nodes, PBs of data) MPP database based on Postgres and designed for analytical workloads. Supports queries in SQL, SAS, R, Java, Perl, Python, MapReduce, ODBC, JDBC. Runs on Linux on x86 or virtualised.
  6. Hadoop (submitted by Mohammad Tariq): An open source platform, written in Java and distributed under Apache's licence, that allows us to store, manage and process gigantic amounts of data in a highly parallel manner on clusters of commodity machines. Most suitable for batch processing.
  7. Infinite Insight: KXEN’s predictive modeling suite that focuses on ease of use and rapid deployment of predictive models. Infinite Insight has been mostly deployed in the Communications, Financial Services, e-Business and Retail industries.

  8. Kaggle
  9. KNIME: A user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open (source) platform provides a home for over 1000 modules, including those of the KNIME community.
  10. Lavastorm: Lavastorm's analytic software enables companies to empower financial and operational teams to tackle high-value, complex, financial and operational analytic problems in a new way, delivering actionable insights and results faster at a significantly lower cost than traditional business intelligence solutions.
  11. MarkLogic
  12. Pervasive: End-to-end data load, transformation, preparation and/or analytics that executes natively on Hadoop clusters at very high speeds. Pervasive offers a suite of extensible big data and analytics software solutions that save development and deployment time while conserving hardware dollars.
  13. Pivotal Initiative (submitted by Mark Burnard): A wholly-owned spinoff from EMC and VMware which combines the Greenplum MPP database with the distributed in-memory databases GemFire and SQLFire, plus application development platforms to integrate "Big Data Analytics" with event processing, supporting real-time "big data-driven" B2B and B2C apps.
  14. R
  15. Rapid Miner
  16. SAS
  17. Splunk
  18. SPSS: Product family of market-leading data mining, statistical analysis, text analytics, survey analysis and data collection tools. SPSS was acquired by IBM in 2009.
  19. Statistica: Data mining, text mining and statistical analysis package developed and marketed by StatSoft. Its deep integration with the Wintel platform translates to rapid speeds in analytic data processing.
  20. Tableau
  21. Teradata
  22. Tibco
  23. Vertica


Tags: predictive modeling



Comment by Vincent de Stoecklin on December 29, 2015 at 6:30am

Thanks for the list!

In the up-and-coming hybrid Data Analytics Platforms segment you could also include Dataiku's Data Science Studio. The tool allows data teams to combine coding environments for data wrangling and machine learning (HDFS, R, Python, Spark...) with visual data exploration and preparation features for business users. You can try the free Community Edition here

Comment by Lars Fiedler on October 7, 2014 at 11:13am

I'd put Composable Analytics on the list.

Full disclosure, I'm the creator :)

Comment by xena ugrinsky on April 19, 2014 at 7:03am
Alteryx is a nice SAS competitor with elegant yet simple ETL for business users built in.
Comment by Aman Khurana on January 9, 2013 at 9:42pm

Please also count GingerBrain, an upcoming predictive and statistical analytics platform in the cloud.

Comment by Cesar Rojas on November 29, 2012 at 4:32pm

Thanks Vincent for including Teradata Aster. The SQL-MapReduce database and step-by-step tutorials can be downloaded here: http://www.asterdata.com/downloads/

Comment by Karl Rexer on November 29, 2012 at 3:21pm
Additional tools:
- SAS Enterprise Miner
- IBM SPSS Statistics
- IBM SPSS Modeler
- Salford Systems
- Zementis
- Oracle R Enterprise
- Oracle Data Miner
- Revolution Analytics
- Weka
