Zipf's law states that in many settings (which we are going to explore), the volume or size of entities is inversely proportional to a power s (s > 0) of their rank. This has important implications for predictive modeling, discussed below. The processes that create this type of dynamic are not well understood. The purpose of this article is to explain the underlying mechanics. The traditional example for the Zipf distribution is the distribution of Internet…Continue
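As a rough illustration of the relationship stated above (size inversely proportional to rank raised to a power s > 0), here is a minimal Python sketch. The function name `zipf_sizes` and the parameter `top` (the size of the rank-1 entity) are hypothetical, chosen only for this example; they do not come from the article.

```python
def zipf_sizes(n, s=1.0, top=1.0):
    """Sizes for ranks 1..n, assuming size(rank) = top / rank**s with s > 0.

    `top` is the size of the rank-1 entity (an illustrative assumption).
    """
    if s <= 0:
        raise ValueError("s must be positive")
    return [top / rank ** s for rank in range(1, n + 1)]


# With s = 1, the second-ranked entity is half the size of the first,
# the fourth-ranked entity a quarter of it, and so on.
sizes = zipf_sizes(5, s=1.0, top=1000.0)
for rank, size in enumerate(sizes, start=1):
    print(rank, round(size, 2))
```

A larger s makes the sizes fall off faster with rank, so the top-ranked entities dominate more strongly.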
Over the years, our…Continue
Added by Vincent Granville on August 20, 2014 at 5:30pm — No Comments
What are the differences between data science, data mining, machine learning, statistics, operations research, and so on?
Here I compare several overlapping analytic disciplines to explain their differences and common denominators. Sometimes the differences exist for purely historical reasons; sometimes they are real and subtle. I also provide typical job titles, types of analyses, and industries traditionally attached to each discipline. Underlined domains are…Continue
The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.
Added by Vincent Granville on July 16, 2014 at 2:30pm — No Comments
The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.
Added by Vincent Granville on June 12, 2014 at 7:30am — No Comments
The full version is always published Monday. Starred articles are new additions, posted between Thursday and Sunday.
Announcements from Sponsors
Added by Vincent Granville on May 28, 2014 at 12:00pm — No Comments
The new and simple strong correlation synthetic metric proposed in this article was designed in our research lab specifically for big data. Use it whenever you want to check whether there is a real association between two variables, especially in large-scale automated data science or machine learning projects. Use this new metric now to avoid being accused of reckless…Continue
Should you hire someone who knows all the most recent flavors of logistic regression? Or a Hadoop developer?
In my opinion, this is the wrong strategy. These employees are very expensive (at least $120k per year), and they might not bring the ROI that you expect. If you do go in that direction, at least hire someone who favors simple, scalable, robust, automated solutions over anything else. To automate, you need someone great at developing…Continue
The discovery process used by data scientists commonly consists of four steps (see also Figure 1):
Added by Vincent Granville on March 26, 2014 at 8:00pm — No Comments
We are now at 9 categories after a few updates. Just as there are a few categories of statisticians (biostatisticians, statisticians, econometricians, operations research specialists, actuaries) or business analysts (marketing-oriented, product-oriented, finance-oriented, etc.), there are different categories of data scientists. First, many data scientists have a job title different from data scientist, mine…Continue
Both programs are alternatives to university curricula and traditional education. Both are run by leading industry professionals rather than academic leaders. Zipfian offers a 12-week program; the Data Science Apprenticeship (DSA, organized by us at DSC - Data Science Central) is a 6-month program.…Continue
A few universities and other organizations have started to offer data science curricula, training, and certificates.
The following excerpts are taken from the…Continue
Added by Vincent Granville on August 27, 2013 at 6:00pm — No Comments
Added by Vincent Granville on July 17, 2013 at 3:30pm — No Comments
Added by Vincent Granville on June 27, 2013 at 10:00am — No Comments
This 2nd edition has more than 200 pages of pure data science, far more than the first edition. This new version of our very popular book will soon be available for download: we will make an announcement when it is officially published.
Monthly selection of 50 articles on big data, analytics, data science, visualization, and data integration from various respected news outlets, covering education, training, salaries, business applications, major press releases, success stories, etc.
At the bottom of this article, you will find links to our previous news selections,…Continue