*This list of hand-picked leaders was compiled by Wojtek Aleksander, from GetResponse.com.*

Other bigger lists (sometimes created by robots) can be found here and are usually based on your Klout score, which in my opinion is not accurate. The list below is truly original and I would even add, somewhat unexpected, as you won't find…

Added by L.V. on May 14, 2016 at 9:00am — No Comments

Starred articles are candidates for the picture of the week. A comprehensive list of all past resources is found here. We are in the process of automatically categorizing them using indexation and automated tagging…

Added by L.V. on May 6, 2016 at 8:30am — No Comments

According to Wikipedia, MongoDB is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. MongoDB is developed by MongoDB Inc. and is published as free and open-source software. MongoDB is the fourth most popular type of database…

Added by L.V. on May 3, 2016 at 9:00am — No Comments

These are the findings from a CrowdFlower survey. Data preparation accounts for about 80% of the work of data scientists. Cleaning data is the least enjoyable and most time-consuming data science task, according to the survey. Interestingly, when we asked the question to our data scientist, his answer was:…

Interesting article posted recently in MIT Technology Review. What kind of metrics would help detect such tweets? We think the following might be useful:

- Local time (like late at night)
- Whether a picture is associated with the tweet
- Whether a link is associated with the tweet
- Number of typos in the tweet, compared with the user's average
- Frequency of tweets (sudden spike) for user in…
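
As a rough illustration, several of the metrics listed above could be turned into features along the following lines. This is only a sketch: the `Tweet` fields, the `tweet_features` helper, and the choice of 11pm to 5am as "late at night" are assumptions made for the example, not part of the original article, and the typo count is taken as given rather than computed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical record layout, assumed for illustration only.
    text: str
    local_time: datetime  # time of posting in the user's local time zone
    has_picture: bool
    typo_count: int       # assumed to come from some external spell checker


def tweet_features(tweet: Tweet, user_avg_typos: float) -> dict:
    """Turn one tweet into a few of the candidate metrics from the list."""
    hour = tweet.local_time.hour
    return {
        # "Late at night" is arbitrarily defined here as 11pm to 5am.
        "late_night": hour >= 23 or hour < 5,
        "has_picture": tweet.has_picture,
        # Crude link detection: look for a URL scheme in the text.
        "has_link": "http://" in tweet.text or "https://" in tweet.text,
        # Typos in this tweet relative to the user's usual level.
        "typo_excess": tweet.typo_count - user_avg_typos,
    }


example = Tweet(
    text="cant beleive it https://example.com",
    local_time=datetime(2016, 3, 1, 2, 30),
    has_picture=False,
    typo_count=4,
)
features = tweet_features(example, user_avg_typos=1.0)
print(features)
```

Features like these would typically be fed to a classifier. Tweet frequency is left out because it needs the user's full posting history.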

This long article with a lot of source code was posted by Suraj V Vidyadaran. Suraj is pursuing a Master's in Computer Science at Temple University, specializing in data science. His areas of interest are sentiment analysis, data visualization, big data and machine learning.

This data is obtained from the UCI Machine Learning Repository. The purpose of the…

Added by L.V. on March 13, 2016 at 9:30am — No Comments

This applies to many tech job interviews. But here we provide specific advice for data scientists and other professionals with a similar background. More advice is being added regularly.

**Here's the list**:

- Not doing any research on the company prior to the interview.
- Not understanding whether they want to hire a…

Added by L.V. on February 29, 2016 at 8:30pm — 2 Comments

Great article by Mike Ferguson. Articles about the big data, AI, data science or IoT ecosystems are always popular. Many have been posted here (see screenshot below):

Sometimes, the keyword…

Added by L.V. on February 22, 2016 at 11:30am — No Comments

Data Science Central shared its predictions for 2016. More predictions can be found here. In this article, we share Scott Mongeau's predictions. The full version of this (long) article can be found…

Added by L.V. on February 22, 2016 at 11:00am — 2 Comments

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models in the context of model selection, and to estimate how accurate the predictions (associated with a specific model and data set) are expected to be.

**Confidence Interval**. Confidence intervals are used to assess how reliable a statistical…

Added by L.V. on February 20, 2016 at 10:00am — 2 Comments
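
To make the confidence interval idea from the model evaluation excerpt above more concrete, here is a minimal sketch that computes an approximate 95% interval for a sample mean. It assumes a normal approximation (for small samples a t quantile would be more appropriate). The function name `mean_confidence_interval` and the sample values are made up for this example and do not come from the original article.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    center = mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return center - margin, center + margin


# Illustrative values only.
low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(f"Approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

A narrower interval indicates a more reliable estimate. Collecting more observations shrinks the margin roughly in proportion to the square root of the sample size.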

Feature selection is one of the core topics in machine learning. In statistical science, it is called variable reduction or selection. Our scientist published a methodology to automate this process and efficiently handle a large number of features (called variables by statisticians). Click here for details.

Here, we mention an article published by Isabelle Guyon…

Added by L.V. on February 14, 2016 at 4:00pm — No Comments

This article focuses on cases such as Facebook and protein interaction networks. The article was written by Paul Scherer (paulmorio) and submitted as a research paper to HackCambridge. What makes this article interesting is the fact that it compares **five clustering techniques** for this type of problem:

**K Clique Percolation** - A clique merging algorithm. Given a set k, the algorithm goes on to produce k clique clusters and merge…

Added by L.V. on February 13, 2016 at 8:00am — No Comments

Very interesting document, relatively recent (September 2015), authored by David Donoho (Statistics professor at Stanford) and posted on one of the MIT websites, here (41 pages, PDF).

Below you will find the abstract and the table of contents. Interestingly, Andrew Gelman and Vincent Granville (our data scientist)…

Added by L.V. on February 10, 2016 at 8:30am — No Comments

*This article was written by Natasha Latysheva. Here we publish a short version, with references to full source code in the original article.*

Our internal data scientist had a few questions and comments about the article:

- The example used to illustrate the method in the source code is the famous iris…

Added by L.V. on February 6, 2016 at 6:00pm — 2 Comments

You did all the right things:

- getting a quantitative degree from a good university,
- doing an internship,
- attending a few online classes (Coursera),
- spending a few weeks at a valuable data science boot camp or our data science apprenticeship, working on real big data - especially automating data processes - even gaining a certification (…

Added by L.V. on February 6, 2016 at 10:30am — 2 Comments

If you read our digests, you already know that each week we publish our *picture of the week*. Below is a selection from the last few months. By clicking on the link associated with each image, you will find the article in which it is described, often with details about how the image was produced. Some of these images are interactive when viewed on the original web page.

Related…

What do you think about Inora, a company that advertises itself as the *New Linear Regression Approach - Scalable to Big Data*? Our data scientist also developed automated regression for big data, offering source code, even an Excel implementation, and comparing his results with traditional regression techniques.…

Here are a few fields that are experiencing tremendous growth. The links below provide hundreds of popular articles recently published about these topics.

Business Insider analyzed tech skills posted in job ads found on Dice.com. Below are the 6 data science skills found in the top 10, in terms of salary. This is consistent with a previous study, based on Glassdoor data (instead of Dice), which concluded that data scientist is the most promising job in 2016.

**MapReduce** is a programming…

Added by L.V. on January 30, 2016 at 11:30am — No Comments
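
For readers unfamiliar with MapReduce, mentioned in the excerpt above, the sketch below imitates its three stages (map, shuffle, reduce) on a single machine with a word count. It is a toy illustration in plain Python, not a distributed implementation. The function names and the sample documents are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big models need data"])
print(counts)
```

In a real cluster, the map and reduce calls run in parallel on different machines, and the shuffle step moves data between them over the network.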

Sometimes a correlation means absolutely nothing and is purely accidental (especially when you compute millions of correlations among thousands of variables), or it can be explained by confounding factors. For instance, the fact that the cost of electricity is correlated with how much people spend on education is explained by a confounding factor:…
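
The point about accidental correlations can be checked with a small simulation. The sketch below generates variables that are independent by construction, computes every pairwise correlation, and reports the strongest one. The sizes (200 variables, 10 observations each), the seed, and the function names are arbitrary choices made for this illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def max_spurious_correlation(n_vars=200, n_obs=10, seed=0):
    """Largest absolute correlation among independent random variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    best = 0.0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            best = max(best, abs(pearson(data[i], data[j])))
    return best


strongest = max_spurious_correlation()
print(f"Strongest correlation among unrelated variables: {strongest:.2f}")
```

With 19,900 pairs and only 10 observations per variable, some pairs look strongly correlated by chance alone. This is why a high correlation found by scanning many variables needs independent confirmation.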


