Bill Vorhies is Editorial Director for DataScienceCentral, and President and Chief Data Scientist at Data-Magnum, providing predictive analytics and big data infrastructure projects as a service. Bill has been an active commercial predictive modeler since 2001.
Below, you will find a selection of his articles posted in the last two years. To check out his most recent…Continue
Added by Vincent Granville on June 2, 2016 at 9:00am — No Comments
Summary: Continuing from our last article, we searched the web to find all of the most common myths and misconceptions about Big Data. There were a lot more than we thought. Here’s what we found. Part 2.
Summary: What do you need to do to get an entry level job in data science?
This article is written for anyone who is considering becoming a data scientist. That includes young people just starting their bachelor’s degrees and folks in the first two or three years of their careers who want to make the switch.
It’s not for folks who…Continue
I invite you to solve these challenges yourself before reading the solutions (for some of these problems) or hints to help you tackle these problems. Continue
Added by Vincent Granville on April 19, 2016 at 9:44am — No Comments
These articles were controversial in the sense that they highlighted the differences between data science and other disciplines, at a time when many believed that data science was just old stuff being re-branded, or being practiced by people knowing nothing about statistics. Ironically, some of the old stuff actually re-branded itself as data science, not the other way around.…Continue
Added by Vincent Granville on March 24, 2016 at 6:30pm — No Comments
Summary: Which is the most critical element in data exploration, statistics or data visualization? The answer is a little like the lyric ‘love and marriage, you can’t have one without the other’. It can be tempting to skip the data visualization but it’s frequently the key to making sure we aren’t heading down the completely wrong path.
Added by William Vorhies on March 23, 2016 at 8:35am — No Comments
Summary: This is my favorite IoT story. We are so used to IoT platforms being physical objects that we forget about the potential for biologics. In terms of direct economic reward little will compare to this story about the IoT and cows.
This is my favorite IoT story which I first heard from Joseph Sirosh, CVP of Machine Learning for Microsoft at the spring Strata convention in San Jose. We are so used to IoT platforms being physical objects like cars…Continue
Summary: It’s become almost part of our culture to believe that more data, particularly Big Data quantities of data, will result in better models and therefore better business value. The problem is it’s just not always true. Here are 7 cases that make the point.
Using AI and data science, an MIT team was able to accurately predict rogue waves coming out of the blue in the middle of the ocean, in near real time, to help sailors change their navigation path and avoid destruction and death. Rogue waves, while rare, are unpredictable, tall (up to 100 feet) and devastating. The physical mechanism producing these waves is well understood, and is typically modeled using rotating elements.…Continue
Guest blog by Justin B. Dickerson, PhD, MBA, PStat, Chief Data Scientist at Snap Advances.
Okay, that headline was meant to get your attention. But lately, I've been thinking about this crazy circus we call data science and how everyone seems to think data scientists are invaluable, treasured, and potentially "un-fireable" in this age of data scientist negative…Continue
This post is a summary of 3 different posts about outlier detection methods.
One of the challenges in data analysis in general, and predictive modeling in particular, is dealing with outliers. Many modeling techniques are resistant to outliers or reduce their impact, but detecting and understanding outliers can still lead to interesting findings. We generally define outliers as samples that are exceptionally far from the mainstream of data. There is no rigid…Continue
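To make the working definition above concrete, here is a minimal sketch of one common convention for "exceptionally far from the mainstream": the interquartile-range (IQR) fence. This is not necessarily one of the methods covered in the summarized posts; the function name `iqr_outliers` and the multiplier `k=1.5` are assumptions chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the IQR fences.

    Assumption: an outlier is any value below Q1 - k*IQR or above
    Q3 + k*IQR. k=1.5 is a widely used default, not a rigid rule.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 10, 95]
    print(iqr_outliers(sample))
```

Raising `k` makes the rule more conservative; flagged values still deserve a closer look rather than automatic removal.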
Starred articles are new additions or updated content, posted between Thursday and Sunday. The weekly digest has 6 sections: (1) Featured Articles and Case Studies, (2) Featured Resources and Technical Contributions, (3) From our Sponsors, (4) News, Events, Books, Training, Forum Questions, (5) Picture of the Week, and (6) Syndicated Content.
The full version is always published Monday.…Continue
Added by Vincent Granville on January 6, 2016 at 8:00pm — No Comments
Here we discuss general applications of statistical models, whether they arise from data science, operations research, engineering, machine learning or statistics. We do not discuss specific algorithms such as decision trees, logistic regression, Bayesian modeling, Markov models, data reduction or feature selection. Instead, I discuss frameworks - each one using its own types of techniques and algorithms - to solve real life problems.
Most of the entries below are found in…Continue
Added by Vincent Granville on December 14, 2015 at 10:00am — No Comments
We asked our staff data scientist what motivates him, and here's what he said:
Starred articles are candidates for the picture of the week. A comprehensive list of all past resources is found here. We are in the process of automatically categorizing them using indexation and automated tagging…Continue
Added by Vincent Granville on December 6, 2015 at 8:46pm — No Comments
I was trying to find a good domain name for our upcoming business science website, when something suddenly became clear to me. Many of us have been confused for a long time about what data science means, how it is different from statistics, machine learning, data mining, or operations research, and about the rise of the data scientist light - a new species of coders who call themselves data scientists after a few hours of Python/R…Continue
By Chuck Currin and Arvid Tchivzhel, Mather Economics
Audience segmentation of their readers is a relatively new undertaking for publishers. The publishing business model, historically, has relied heavily on advertising revenue, and the principal audience information that a publisher possessed was focused on characteristics valuable to their advertisers. As subscription revenue has become half or more of total revenue, the return on audience analytics and segmentation has…Continue
Added by Arvid Tchivzhel on November 17, 2015 at 5:28am — No Comments
Anyone interested in categorizing them? It could be an interesting data science project: scraping these websites, extracting keywords, and categorizing them with a simple indexation or tagging algorithm. For instance, some of these blogs focus on stats, or Bayesian stats, or R libraries, or R training, or visualization, or anything else. This indexation technique…Continue
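The tagging step of the project described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the text of each blog is presumed to be already scraped into a string, and the category names and keyword lists in `CATEGORY_KEYWORDS` are hypothetical, not taken from the article.

```python
import re
from collections import Counter

# Hypothetical category -> keyword map, invented for this sketch.
CATEGORY_KEYWORDS = {
    "bayesian": {"bayesian", "prior", "posterior"},
    "r": {"cran", "ggplot2", "rstudio"},
    "visualization": {"visualization", "chart", "plot", "dashboard"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_text(text, min_hits=1):
    """Return the sorted categories whose keywords occur at least min_hits times."""
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    return sorted(cat for cat, score in scores.items() if score >= min_hits)


if __name__ == "__main__":
    print(tag_text("A Bayesian prior and a ggplot2 chart"))
```

A keyword map like this is the simplest possible index. Raising `min_hits` trades recall for precision when a blog only mentions a topic in passing.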