Top 50 data science / big data tools, each described in fewer than 40 words, for decision makers. Please help us: any definition you submit will have your name attached to it. Send your definition, or a new term with its definition, to [email protected]
- Asterdata: MPP (Massively Parallel Processing) database, able to work with semi-structured data using SQL commands. Part of Teradata.
- EMC (submitted by Mark Burnard): Acquired Greenplum in 2010 to enable analytics on "Big Data". EMC has since integrated Greenplum with Hadoop and with "scale-out NAS" platform Isilon. Also integrates with EMC DataDomain for backup, and VPLEX for multi-site replication. See also Pivotal Initiative.
- Greenplum (submitted by Mark Burnard): A highly scalable (100s of nodes, PBs of data) MPP database based on Postgres and designed for analytical workloads. Supports queries and access via SQL, SAS, R, Java, Perl, Python, MapReduce, ODBC and JDBC. Runs on Linux on x86 hardware or virtualised.
- Hadoop (submitted by Mohammad Tariq): An open source platform, written in Java and distributed under the Apache License, that allows us to store, manage and process gigantic amounts of data in a highly parallel manner on clusters of commodity machines. Most suitable for batch processing.
- Infinite Insight: KXEN’s predictive modeling suite that focuses on ease of use and rapid deployment of predictive models. Infinite Insight has been mostly deployed in the Communications, Financial Services, e-Business and Retail industries.
- KNIME: A user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open (source) platform provides a home for over 1000 modules, including those of the KNIME community.
- Lavastorm: Analytic software that lets financial and operational teams tackle high-value, complex analytic problems, delivering actionable insights and results faster and at significantly lower cost than traditional business intelligence solutions.
- Pervasive: End-to-end data loading, transformation, preparation and analytics that execute natively on Hadoop clusters at high speed. Pervasive offers a suite of extensible big data and analytics software solutions that save development and deployment time while reducing hardware costs.
- Pivotal Initiative (submitted by Mark Burnard): A wholly owned spinoff from EMC and VMware which combines the Greenplum MPP database with the distributed in-memory databases GemFire and SQLFire, plus application development platforms to integrate "Big Data Analytics" with event processing, supporting real-time "big data-driven" B2B and B2C apps.
- Rapid Miner
- SPSS: Product family of market-leading data mining, statistical analysis, text analytics, survey analysis and data collection tools. SPSS was acquired by IBM in 2009.
- Statistica: Data mining, text mining and statistical analysis package developed and marketed by StatSoft. Its deep integration with the Wintel platform translates to fast analytic data processing.