Data Geek's Blog (68)

Deep Learning Cheat Sheet (using Python Libraries)

This cheat sheet was produced by DataCamp and is based on the Keras library. Keras is an easy-to-use and powerful library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models. Originally posted here in PDF format. Click on the image below to zoom in. …


Added by Data Geek on April 27, 2017 at 4:30pm — No Comments

18 Data Science Certificates Rated

This infographic was produced by Springboard. It lists a few short, inexpensive online courses, along with some university programs, that lead to a certificate. The infographic provides some highlights for each program, for comparison purposes. For more data science programs and certificates, click here or…


Added by Data Geek on April 26, 2017 at 10:00am — No Comments

The seven deadly sins of statistical misinterpretation, and how to avoid them

By Winnifred Louis, Associate Professor, Social Psychology, The University of Queensland, and Cassandra Chapman, PhD Candidate in Social Psychology, The University of Queensland.…


Added by Data Geek on March 29, 2017 at 8:30am — No Comments

Matplotlib Cheat Sheet

This Python cheat sheet was produced by DataCamp. Click on the image to zoom in. The original, published here, is available as a PDF document. The translation from PDF to image format was done using PDF2PNG.

To check dozens of data science related cheat sheets,…


Added by Data Geek on March 6, 2017 at 5:30pm — No Comments

Most cited deep learning papers

This is a curated list of the most cited deep learning papers (since 2012) posted by Terry Taewoong Um.

Source for picture: …


Added by Data Geek on March 6, 2017 at 8:00am — No Comments

Big Data's Most Influential Rock Stars: 10 Must-Follow Leaders

This list of hand-picked leaders was compiled by Wojtek Aleksander, from

Other bigger lists (sometimes created by robots) can be found here and are usually based on your Klout score, which in my opinion is not accurate. The list below is truly original and I would even add, somewhat unexpected, as you won't find…


Added by Data Geek on May 14, 2016 at 9:00am — No Comments

32 New External Machine Learning Resources and Updated Articles

Starred articles are candidates for the picture of the week. A comprehensive list of all past resources is found here. We are in the process of automatically categorizing them using indexation and automated tagging…


Added by Data Geek on May 6, 2016 at 8:30am — No Comments

70 MongoDB Interview Questions and Answers

According to Wikipedia, MongoDB is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. MongoDB is developed by MongoDB Inc. and is published as free and open-source software. MongoDB is the fourth most popular type of database…


Added by Data Geek on May 3, 2016 at 9:00am — No Comments

Least enjoyable and most time consuming data science tasks

These are the findings from a CrowdFlower survey. Data preparation accounts for about 80% of the work of data scientists. Cleaning data is the least enjoyable and most time-consuming data science task, according to the survey. Interestingly, when we put the question to our data scientist, his answer was:…


Added by Data Geek on April 5, 2016 at 9:30am — 1 Comment

Machine Learning Algorithm Identifies Tweets Sent Under the Influence of Alcohol

Interesting article posted recently in MIT Technology Review. What kind of metrics would help detect such tweets? We think the following might be useful:

  1. Local time (like late at night)
  2. Whether a picture is associated with the tweet
  3. Whether a link is associated with the tweet
  4. Number of typos in the tweet in question, compared with the average for the user in question
  5. Frequency of tweets (sudden spike) for user in…
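
The metrics listed above could be turned into model features along the following lines. This is only a rough sketch: the `Tweet` record, its field names, the per-user baselines, and the 23:00 to 05:00 "late night" window are all assumptions made for illustration, not part of the MIT Technology Review article or of any real Twitter API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record; field names are assumptions, not a real API schema.
    text: str
    local_time: datetime
    has_picture: bool
    has_link: bool
    typo_count: int


def tweet_features(tweet, user_avg_typos, user_avg_tweets_per_hour, tweets_last_hour):
    """Turn one tweet plus per-user baselines into a small feature dictionary."""
    hour = tweet.local_time.hour
    return {
        # 1. Local time: flag late-night hours (the window is an arbitrary choice).
        "late_night": hour >= 23 or hour < 5,
        # 2 and 3. Presence of a picture or a link.
        "has_picture": tweet.has_picture,
        "has_link": tweet.has_link,
        # 4. Typos in this tweet relative to the user's average.
        "typo_excess": tweet.typo_count - user_avg_typos,
        # 5. Sudden spike: recent tweet rate relative to the user's usual rate.
        "frequency_ratio": tweets_last_hour / max(user_avg_tweets_per_hour, 1e-9),
    }


# Made-up example values, for illustration only.
example = Tweet("helo wrld", datetime(2016, 3, 18, 2, 15), False, False, typo_count=2)
features = tweet_features(
    example, user_avg_typos=0.5, user_avg_tweets_per_hour=1.0, tweets_last_hour=6
)
print(features)
```

Comparing each tweet against the same user's baseline, rather than a global average, keeps habitual night owls or sloppy typists from being flagged by default.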

Added by Data Geek on March 18, 2016 at 6:30am — 1 Comment

Implementation of 17 classification algorithms in R

This long article, with a lot of source code, was posted by Suraj V Vidyadaran. Suraj is pursuing a Master's in Computer Science at Temple University, specializing in data science. His areas of interest are sentiment analysis, data visualization, big data and machine learning.

This data is obtained from the UCI Machine Learning Repository. The purpose of the…


Added by Data Geek on March 13, 2016 at 9:30am — No Comments

19 Worst Mistakes at Data Science Job Interviews

This applies to many tech job interviews. But here we provide specific advice for data scientists and other professionals with a similar background. More advice is being added regularly. 

Here's the list:

  1. Not doing any research on the company prior to the…

Added by Data Geek on February 29, 2016 at 8:30pm — 2 Comments

New Architecture for the Analytic Ecosystem

Great article by Mike Ferguson. Articles about the big data, AI, data science or IoT ecosystems are always popular. Many have been posted here (see screenshot below):

Sometimes, the keyword…


Added by Data Geek on February 22, 2016 at 11:30am — No Comments

12 Emerging Trends in Data Analytics

Data Science Central shared its predictions for 2016. More predictions can be found here. In this article, we share Scott Mongeau's predictions. The full version of this (long) article can be found…


Added by Data Geek on February 22, 2016 at 11:00am — 2 Comments

11 Important Model Evaluation Techniques Everyone Should Know

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to predict how accurate the predictions (associated with a specific model and data set) are expected to be.

Confidence Interval.…


Added by Data Geek on February 20, 2016 at 10:00am — 2 Comments

An Introduction to Variable and Feature Selection

Feature selection is one of the core topics in machine learning. In statistical science, it is called variable reduction or selection. Our scientist published a methodology to automate this process and efficiently handle a large number of features (called variables by statisticians). Click here for details.

Here, we mention an article published by Isabelle Guyon…


Added by Data Geek on February 14, 2016 at 4:00pm — No Comments

Detecting and Visualising Clusterings Interaction Networks (And a few other cool things like Facebook)

This article focuses on cases such as Facebook and protein interaction networks. The article was written by Paul Scherer (paulmorio) and submitted as a research paper to HackCambridge. What makes this article interesting is the fact that it compares five clustering techniques for this type of problem:

  • K Clique Percolation - A clique merging algorithm. Given a set k, the algorithm goes on to produce k clique clusters and merge…

Added by Data Geek on February 13, 2016 at 8:00am — No Comments

50 Years of Data Science

Very interesting document, relatively recent (September 2015), authored by David Donoho (Statistics professor at Stanford) and posted on one of the MIT websites, here (41 pages, PDF). 

Below you will find the abstract and the table of contents. Interestingly, Andrew Gelman and Vincent Granville (our data scientist)…


Added by Data Geek on February 10, 2016 at 8:30am — No Comments

k-nearest neighbor algorithm using Python

This article was written by Natasha Latysheva. Here we publish a short version, with references to full source code in the original article.

Our internal data scientist had a few questions and comments about the article:

  • The example used to illustrate the method in the source code is the famous iris…
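
For readers who want a feel for the method before opening the original article, k-nearest neighbors can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the code from Natasha Latysheva's article: the function name `knn_predict`, the choice of Euclidean distance with majority voting, and the tiny made-up data set (which is not the iris data) are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # `train` is a list of (feature_vector, label) pairs.
    # Sort by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Tiny made-up data set: two well-separated groups in two dimensions.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (5.1, 5.0))
print(label_near_a, label_near_b)
```

Sorting the whole training set is the simplest approach and is fine for small data; larger data sets usually call for a spatial index or a dedicated library.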

Added by Data Geek on February 6, 2016 at 6:00pm — 1 Comment

5 Unusual Ways to Find your First or Next Data Science Job

You did all the right things:

  • getting a quantitative degree from a good university,
  • or doing some internship,
  • attending a few online classes (Coursera),
  • spending a few weeks at a valuable data science boot camp or our data science apprenticeship, working on real big data (especially automating data processes), and even gaining a certification (…

Added by Data Geek on February 6, 2016 at 10:30am — 2 Comments

© 2017   Data Science Central