Great list of resources: data science, visualization, machine learning, big data

Fantastic resource created by Andrea Motosi. I've included only the 5 categories most relevant to our audience; the full list has 31 categories, including a few on distributed systems and Hadoop. Click here to view the 31 categories. You might also want to check out our internal resources (the first section below).

Source: Machine Learning and Face Recognition Papers

Data Science Central - Resources

Machine Learning

  • Apache Mahout: machine learning library for Hadoop
  • Ayasdi Core: tool for topological data analysis
  • brain: Neural networks in JavaScript
  • Cloudera Oryx: real-time large-scale machine learning
  • Concurrent Pattern: machine learning library for Cascading
  • convnetjs: Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser
  • Decider: Flexible and Extensible Machine Learning in Ruby
  • etcML: text classification with machine learning
  • Etsy Conjecture: scalable Machine Learning in Scalding
  • Google Sibyl: System for Large Scale Machine Learning at Google
  • H2O: statistical, machine learning and math runtime for Hadoop
  • IBM Watson: cognitive computing system
  • MLbase: distributed machine learning libraries for the BDAS stack
  • MLPNeuralNet: Fast multilayer perceptron neural network library for iOS and Mac OS X
  • nupic: Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms
  • PredictionIO: machine learning server built on Hadoop, Mahout and Cascading
  • scikit-learn: machine learning in Python
  • Spark MLlib: a Spark implementation of some common machine learning (ML) functionality
  • Sparkling Water: combine H2O's Machine Learning capabilities with the power of the Spark platform
  • Varaha: Machine learning and natural language processing with Apache Pig
  • Viv: global platform that enables developers to plug into and create an intelligent, conversational interface to anything
  • Vowpal Wabbit: learning system sponsored by Microsoft and Yahoo!
  • WEKA: suite of machine learning software
  • Wit: Natural Language for the Internet of Things
  • Wolfram Alpha: computational knowledge engine

Visualization

  • Arbor: graph visualization library using web workers and jQuery
  • CartoDB: open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API
  • Chart.js: open source HTML5 Charts visualizations
  • Crossfilter: JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js
  • Cubism: JavaScript library for time series visualization
  • Cytoscape: JavaScript library for visualizing complex networks
  • D3: JavaScript library for manipulating documents based on data
  • DC.js: Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3
  • Envisionjs: dynamic HTML5 visualization
  • Freeboard: open source real-time dashboard builder for IoT and other web mashups
  • Gephi: An award-winning open-source platform for visualizing and manipulating large graphs and network connections
  • Google Charts: simple charting API
  • Grafana: Graphite dashboard frontend, editor and graph composer
  • Graphite: scalable Realtime Graphing
  • Highcharts: simple and flexible charting API
  • IPython: provides a rich architecture for interactive computing
  • Keylines: toolkit for visualizing the networks in your data
  • Matplotlib: plotting with Python
  • NVD3: chart components for d3.js
  • Peity: Progressive SVG bar, line and pie charts
  • Plot.ly: Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
  • Recline: simple but powerful library for building data applications in pure JavaScript and HTML
  • Redash: open-source platform to query and visualize data
  • Sigma.js: JavaScript library dedicated to graph drawing
  • Vega: a visualization grammar

Graph Databases

  • Apache Giraph: implementation of Pregel, based on Hadoop
  • Apache Spark Bagel: implementation of Pregel, part of Spark
  • ArangoDB: multi-model distributed database
  • Facebook TAO: distributed data store widely used at Facebook to store and serve the social graph
  • Faunus: Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster
  • Google Cayley: open-source graph database
  • Google Pregel: graph processing framework
  • GraphLab PowerGraph: a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
  • GraphX: resilient Distributed Graph System on Spark
  • Gremlin: graph traversal Language
  • InfiniteGraph: distributed graph database
  • Infovore: RDF-centric Map/Reduce framework
  • Intel GraphBuilder: tools to construct large-scale graphs on top of Hadoop
  • MapGraph: Massively Parallel Graph processing on GPUs
  • Neo4j: graph database written entirely in Java
  • OrientDB: document and graph database
  • Phoebus: framework for large scale graph processing
  • Sparksee: scalable high-performance graph database
  • Titan: distributed graph database, built over Cassandra
  • Twitter FlockDB: distributed graph database

NewSQL

  • Actian Ingres: commercially supported, open-source SQL relational database management system
  • BayesDB: statistics-oriented SQL database
  • Cockroach: Scalable, Geo-Replicated, Transactional Datastore
  • Datomic: distributed database designed to enable scalable, flexible and intelligent applications
  • FoundationDB: distributed database, inspired by F1
  • Google F1: distributed SQL database built on Spanner
  • Google Spanner: globally distributed semi-relational database
  • H-Store: experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications
  • HandlerSocket: NoSQL plugin for MySQL/MariaDB
  • IBM DB2: object-relational database management system
  • InfiniSQL: infinitely scalable RDBMS
  • MemSQL: in-memory SQL database with optimized columnar storage on flash
  • NuoDB: SQL/ACID compliant distributed database
  • Oracle Database: object-relational database management system
  • Oracle TimesTen in-Memory Database: in-memory, relational database management system with persistence and recoverability
  • Pivotal GemFire XD: Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS
  • SAP HANA: in-memory, column-oriented, relational database management system
  • SenseiDB: distributed, realtime, semi-structured database
  • Sky: database used for flexible, high performance analysis of behavioral data
  • SymmetricDS: open source software for both file and database synchronization
  • Teradata Database: complete relational database management system
  • VoltDB: in-memory NewSQL database

Comment by Kishore Kumar Chaganti on Tuesday

Good info. Thank you.

Comment by Baskaran Viswanathan on December 24, 2016 at 1:16pm

Awesome compilations. Thank you so much. God bless you. 
