
All Blog Posts (7,413)

The Data Scientist at Work

By Rick van der Lans. Originally posted in B-Eye-Network

The Data Scientist’s Four-Step Discovery Process

The discovery process used by data scientists commonly consists of four steps (see also Figure 1):

  • Data…

Added by Vincent Granville on March 26, 2014 at 8:00pm — No Comments

Cute but flawed API: What your name says about your politics

Published in the Wall Street Journal, designed by Clarity Campaigns, but not by someone statistically savvy.…


Added by Mirko Krivanek on March 26, 2014 at 7:30pm — No Comments

Interesting chart

Published in The Economist. It shows the difference in cost-of-living between 2003 and 2013. However, I see two issues:

  • Making index = 100 for New York both in 2003 and 2013 is wrong. The reader will think New York prices stayed flat over 10 years, and it makes all 2003-2013 comparisons for other cities meaningless, as the index might not have evolved the same way outside New York.
  • The choice of cities listed below is questionable. Why is Mexico City not…
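The first issue can be illustrated with a small sketch. The prices below are made-up numbers, not figures from The Economist's chart, and `index_vs_new_york` is a hypothetical helper. It shows how rebasing each year to New York = 100 can hide price changes that actually happened.

```python
# Hypothetical basket prices (made-up numbers, for illustration only).
prices = {
    2003: {"New York": 1000.0, "City A": 800.0},
    2013: {"New York": 1500.0, "City A": 1200.0},
}

def index_vs_new_york(year_prices):
    """Rebase one year's prices so that New York = 100."""
    base = year_prices["New York"]
    return {city: 100.0 * price / base for city, price in year_prices.items()}

index_2003 = index_vs_new_york(prices[2003])
index_2013 = index_vs_new_york(prices[2013])

# Actual price growth in City A between the two years.
growth_city_a = prices[2013]["City A"] / prices[2003]["City A"] - 1.0

print(index_2003)
print(index_2013)
print(growth_city_a)
```

In this toy case City A's index is identical in both years even though its prices rose by half, because New York's prices rose by the same proportion. A per-year rebased index only tells you how a city compares to New York in that year, not how its prices moved over time.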

Added by Mirko Krivanek on March 26, 2014 at 7:00pm — No Comments

21 Thought-Leader Professors in Data Science

The field of data science continues to grow, and with it come thought leaders who contribute to the industry through outreach and education. Many of the data science professors teaching today are leaders in the big-data field, speaking at conferences, writing books, and even creating groundbreaking big-data developments themselves. Find out which schools boast the most influential leaders in the data science industry.



Added by Vincent Granville on March 26, 2014 at 6:11pm — 3 Comments

Top 10 Capabilities for Exploring Complex Relationships in Data for Scientific Discovery

With all of the discussion about Big Data these days, there is frequent reference to the 3 V’s that represent the top big data challenges: Volume, Velocity, and Variety. These 3 V’s generally refer to the size of the dataset (Volume), the rate at which data is flowing into (or out of) your systems (Velocity), and the complexity (dimensionality) of the data (Variety).  Most practitioners agree that…


Added by Kirk Borne on March 26, 2014 at 4:30am — 1 Comment

The Data Science Toolkit - My Boot Camp Curriculum

This compilation has everything you need to jumpstart your skills in the core tasks of data transformation, modeling, and visualization.

tl;dr: Coursera and Johns Hopkins have a new course called The Data Scientist's Toolbox.


Below is a list of popular analyses from Rexer's 2013 survey. The table is biased towards customer transaction, text,…


Added by Peter Higdon on March 25, 2014 at 9:01am — No Comments

KDNuggets 2014 Salary Survey (and 10 other salary surveys)

The Data Scientists Salary Survey shows that industry data scientists are in a sweet spot, especially in the US, Canada, and Australia, with an average salary of $135K. European and Asian data scientists' salaries are significantly lower.


Added by Vincent Granville on March 25, 2014 at 8:30am — No Comments

The Haboob Clouds Hadoop's Future

Hadoop is an open source framework for storing massive amounts of data on clusters of commodity hardware.

Haboob is a dense dust storm that moves…


Added by Michael Walker on March 23, 2014 at 9:03am — 3 Comments

Write a data science research paper and win fame and award

In connection with our proposed methodology to create a black-box, automated, easy-to-interpret, sample-based, robust technique called jackknife regression, to be used in small and big data environments by non-statisticians, we offer an award and massive promotion to the successful candidate who

  1. Provide the exact formulas for the solution of the 2x2, 3x3 and…

Added by Vincent Granville on March 20, 2014 at 9:00pm — 1 Comment

Jackknife logistic and linear regression for clustering and predictions

This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with highly correlated independent variables.…


Added by Vincent Granville on March 19, 2014 at 4:00pm — 11 Comments

Machine Learning in Parallel with Support Vector Machines, Generalized Linear Models, and Adaptive Boosting


This article describes methods for machine learning using bootstrap samples and parallel processing to model very large volumes of data in short periods of time. The R programming language includes many packages for machine learning different types of data. Three of these packages include Support Vector Machines (SVM) [1], Generalized Linear Models (GLM) [2], and Adaptive Boosting (AdaBoost) [3]. While all three packages can be highly accurate for…
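The general idea of fitting models on bootstrap samples concurrently and then combining them can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's R code: the data is made up, a through-the-origin least-squares slope stands in for SVM, GLM, or AdaBoost, and `fit_slope`, `bootstrap_fit`, and `parallel_bootstrap` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def fit_slope(sample):
    """Least-squares slope through the origin for (x, y) pairs."""
    sxy = sum(x * y for x, y in sample)
    sxx = sum(x * x for x, _ in sample)
    return sxy / sxx

def bootstrap_fit(data, seed):
    """Draw one bootstrap sample (with replacement) and fit a model to it."""
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return fit_slope(sample)

def parallel_bootstrap(data, n_models=20, workers=4):
    """Fit n_models bootstrap models concurrently and average their slopes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        slopes = list(pool.map(lambda seed: bootstrap_fit(data, seed),
                               range(n_models)))
    return mean(slopes), slopes

# Toy data (made up): y is roughly 2 * x plus small uniform noise.
noise = random.Random(0)
data = [(x, 2.0 * x + noise.uniform(-0.5, 0.5)) for x in range(1, 51)]

avg_slope, slopes = parallel_bootstrap(data)
print(round(avg_slope, 2), len(slopes))
```

Threads are used here only to keep the sketch short. For CPU-bound model fitting in CPython, a pool of processes would be the more usual choice, and the spread of the individual slopes gives a rough sense of the estimate's variability.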


Added by Jake Drew Ph.D. on March 19, 2014 at 9:10am — 4 Comments

Learn experimental design with our live, real-time ongoing analysis

This permanent experimental design setting allows you to learn, participate or check out the results at any time, as data is gathered and reported in real time. This article illustrates a few concepts:

  • The necessity to work with redundant data
  • The necessity to identify and use the right metrics
  • How to detect anomalies in experimental design settings
  • How to test multiple factors at once
  • What could make this analysis invalid

You can…


Added by Vincent Granville on March 18, 2014 at 12:00pm — No Comments

Top 10 Business Intelligence Trends for 2014

Tableau webinar, March 25.

Register now.

The innovation in data and analytics continues to…


Added by Vincent Granville on March 17, 2014 at 9:00am — No Comments

Big Data A to ZZ – A Glossary of my Favorite Data Science Things

Here are some of my favorite things about big data and data science, from A to Z (actually, ZZ):

A – Association rule mining

B – Bayes belief networks

C – Characterization

D –…


Added by Kirk Borne on March 16, 2014 at 4:00pm — 2 Comments

7 Key Skills of Effective Data Scientists


Are you looking for an exciting career opportunity that pays as well as it is desirable? Harvard Business Review calls data scientist the sexiest job of the 21st century. The term "data scientist" was coined when two people, DJ Patil and Jeff Hammerbacher, were trying to name their data team working on big data and did not want to limit their…


Added by Mousumi Ghosh on March 14, 2014 at 7:00pm — 11 Comments

Weekly digest - March 17

Featured articles


Added by Vincent Granville on March 13, 2014 at 5:30pm — No Comments

The best kept secret about linear and logistic regression

Update: The most recent article on this topic can be found here

All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or…


Added by Vincent Granville on March 13, 2014 at 11:30am — 18 Comments

The Texas Sharpshooter Deception

I received a call from an old client who stated his analytics team had a recent string of failures alarming the firm and costing money. He asked me to review and audit the team's work and analytical processes in an attempt to understand and remedy the failures. The data crunching technology was…


Added by Michael Walker on March 12, 2014 at 9:00pm — No Comments
