Vincent Granville
• Male
• Issaquah, WA
• United States

## Vincent Granville's Discussions

### How to deal with missing data

Started Nov 22

Originally posted by Vincent Ajayi. The most common challenge faced by data scientists (DS) and…

### Simulating Distributions with One-Line Formulas, even in Excel

Started this discussion. Last reply by Vincent Granville Nov 12.

If you don't like using black-box R functions, or you don't have access to these functions, here are simple options to simulate deviates from various distributions. They can even be implemented in…
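The post's own formulas are not reproduced here. As a minimal sketch of the same idea, inverse-transform sampling turns a single uniform deviate U into a deviate from another distribution with one closed-form expression, which is why it fits in one Excel cell (with U = RAND() stored in a cell such as A1). The three distributions and the helper names below are illustrative choices, not necessarily the ones used in the original discussion.

```python
import math
import random

# Hypothetical helpers: each maps one uniform deviate u in (0, 1) to a deviate
# from the target distribution through its inverse CDF (inverse-transform sampling).

def exponential(u: float, rate: float = 1.0) -> float:
    # Inverse of F(x) = 1 - exp(-rate * x). Excel, with A1 = RAND(): =-LN(1-A1)/rate
    return -math.log(1.0 - u) / rate

def cauchy(u: float) -> float:
    # Inverse of F(x) = 1/2 + atan(x)/pi. Excel, with A1 = RAND(): =TAN(PI()*(A1-0.5))
    return math.tan(math.pi * (u - 0.5))

def logistic(u: float) -> float:
    # Inverse of F(x) = 1/(1 + exp(-x)). Excel: =LN(A1/(1-A1)); reuse one cell,
    # because two separate RAND() calls would draw two different numbers.
    return math.log(u / (1.0 - u))

def uniform_open(rng: random.Random) -> float:
    # random() returns values in [0, 1); redraw the rare 0 so every formula stays finite
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

if __name__ == "__main__":
    rng = random.Random(42)
    sample = [exponential(uniform_open(rng), rate=2.0) for _ in range(100_000)]
    # The sample mean should be close to 1 / rate = 0.5
    print(sum(sample) / len(sample))
```

The same pattern works for any distribution whose inverse CDF has a closed form; distributions without one (such as the normal) need a different trick or a built-in inverse function.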

### Moments of Order Statistics

Started this discussion. Last reply by Vincent Granville May 24.

### Question about the big O notation

Started this discussion. Last reply by Daren Scot Wilson May 27.

We all know that exponential functions grow faster than polynomials. Let us consider the following function: f(n) = n^a ⋅ (log n)^b ⋅ (log log n)^c ⋅ (log log log n)^d ⋯, where the leading coefficient…
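The discussion's own conclusion is not shown here. A standard first step for functions of this form, sketched under the assumptions that n is large enough for every iterated logarithm to be positive and that only finitely many exponents are nonzero, is to take logarithms, which turns the product into a sum:

$$
\log f(n) = a \log n + b \log\log n + c \log\log\log n + d \log\log\log\log n + \cdots
$$

Each term grows strictly more slowly than the one before it ($\log\log n = o(\log n)$, and so on), so the ratio of two such functions tends to 0 or to infinity according to the sign of the first nonzero difference between their exponent vectors $(a, b, c, d, \ldots)$. For example, $n^{2} (\log n)^{100} = o\big(n^{2.001}\big)$, because the exponents already differ at $a$.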


# Vincent Granville's Page

## Profile Information

Short Bio
Data science pioneer, founder, author, CEO, and investor with a broad spectrum of domain expertise, technical knowledge, and proven success in bringing measurable added value to companies ranging from startups to the Fortune 100, across multiple industries (finance, Internet, media, IT, security), domains (data science, operations research, machine learning, computer science, business intelligence, statistics, applied mathematics, growth hacking, IoT), and roles (data scientist, founder, CFO, CEO, HR, product development, marketing, media buyer, operations, management consulting).

Vincent developed and deployed new techniques such as hidden decision trees (for scoring and fraud detection); automated tagging, indexing, and clustering of large document repositories; a black-box, scalable, simple, noise-resistant regression known as Jackknife Regression (suited to black-box, real-time, or automated data processing); model-free confidence intervals; bucketisation; combinatorial feature selection algorithms; methods for detecting causation rather than mere correlation; automated exploratory data analysis with data dictionaries; data videos as a visualization tool; and automated data science. More generally, he invented a set of consistent, robust statistical and machine learning techniques that non-experts can understand, implement, interpret, leverage, and fine-tune. Vincent also invented many synthetic metrics (for instance, predictive power and L1 goodness-of-fit) that work better than old-fashioned statistics, especially on badly behaved, sparse big data. Some of these techniques have been implemented in a Map-Reduce, Hadoop-like environment. Some are concerned with identifying true signal in an ocean of noisy data.

Vincent is a former postdoctoral researcher at Cambridge University and the National Institute of Statistical Sciences. He was among the finalists at the Wharton School Business Plan Competition and at the Belgian Mathematical Olympiads. Vincent has published 40 papers in statistical journals (including Journal of Number Theory, IEEE Pattern Analysis and Machine Intelligence, and Journal of the Royal Statistical Society, Series B) and a Wiley book on data science, and he is an invited speaker at international conferences. He also holds a few patents on scoring technology and raised \$6 million in VC funding for his first startup. Vincent also created the first IoT platform to automate growth and content generation for digital publishers, using a system of APIs for machine-to-machine communication involving Hootsuite, Twitter, and Google Analytics.

Vincent's profile is accessible here and includes top publications, presentations, and work experience with Visa, Microsoft, eBay, NBC, Wells Fargo, and other organisations.

My Web Site Or LinkedIn Profile
• Professional Status: C-Level
• Years of Experience: 15
• Data Science Central, AnalyticBridge
• Industry: Internet
• Executive Data Scientist, Co-Founder
• How did you find out about DataScienceCentral? Tim Matteson
• Interests: Networking, New venture, Recruiting, Other
• What is your Favorite Data Mining or Analytical Website? http://www.datasciencecentral.com
• What Other Analytical Website do you Recommend? http://www.analyticbridge.com

## Bio

Data science pioneer, founder, author, CEO, and investor with a broad spectrum of domain expertise, technical knowledge, and proven success in bringing measurable added value to companies ranging from startups to the Fortune 100, across multiple industries (finance, Internet, media, IT, security) and domains (data science, operations research, machine learning, computer science, business intelligence, statistics, applied mathematics, growth hacking, IoT).

Vincent developed and deployed new techniques such as hidden decision trees (for scoring and fraud detection); automated tagging, indexing, and clustering of large document repositories; a black-box, scalable, simple, noise-resistant regression known as Jackknife Regression (suited to black-box, real-time, or automated data processing); model-free confidence intervals; bucketisation; combinatorial feature selection algorithms; and methods for detecting causation rather than mere correlation. More generally, he invented a set of consistent, robust statistical and machine learning techniques that non-experts can understand, implement, interpret, leverage, and fine-tune. Vincent also invented many synthetic metrics (for instance, predictive power and L1 goodness-of-fit) that work better than old-fashioned statistics, especially on badly behaved, sparse big data. Some of these techniques have been implemented in a Map-Reduce, Hadoop-like environment. Some are concerned with identifying true signal in an ocean of noisy data.

Vincent is a former postdoctoral researcher at Cambridge University and the National Institute of Statistical Sciences. He was among the finalists at the Wharton School Business Plan Competition and at the Belgian Mathematical Olympiads. Vincent has published 40 papers in statistical journals and is an invited speaker at international conferences. Vincent also created the first IoT platform to automate growth and content generation for digital publishers, using a system of APIs for machine-to-machine communication involving Hootsuite, Twitter, and Google Analytics.

Vincent's profile is accessible at http://bit.ly/1jWEfMP and includes top publications, presentations, and work experience with Visa, Microsoft, eBay, NBC, Wells Fargo, and other organisations.

## Latest Activity

Vincent Granville posted a blog post

### Python: Implementing a k-means algorithm with sklearn

Originally posted by Michael Grogan. Below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm. The purpose of k-means clustering is to partition observations in a dataset into a specific number of clusters to aid in analysis of the data. In particular, it is valuable for data visualisation. This post explains how to: import k-means and PCA through the sklearn library; devise an elbow curve to select the…
yesterday
"Thank you Vincent for providing this. I stumbled upon this whilst looking for an ML program that works within Excel and does not require any programming expertise - mine is now way to rusty, such as it was. However, what you have provided may not be…"
Tuesday
Vincent Granville is now friends with Nitin Agarwal, Christopher N and Dr. Zane
Tuesday
Vincent Granville's blog post was featured

### Weekly Digest, December 2

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link. …
Dec 1
Vincent Granville posted a blog post

### Thursday News, November 29 - Special Thanksgiving Edition

Here is our selection of featured articles and technical resources posted since Monday. There is a lot of very interesting material in this edition. Technical Resources: Variance, Attractors and Behavior of Chaotic Statistical Systems; New Family of Generalized Gaussian or Cauchy…
Nov 29
Vincent Granville's blog post was featured

### Variance, Attractors and Behavior of Chaotic Statistical Systems

We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also related to…
Nov 28
"Impressive Explanation!  Data Science Course"
Nov 28
Vincent Granville's blog post was featured

### New Family of Generalized Gaussian or Cauchy Distributions

The standard definition of a generalized Gaussian distribution can be found here. In this article, we explore a different type of generalized univariate normal distributions that satisfy useful statistical properties, with interesting applications. This new class of distributions is defined by its characteristic function, and applications are discussed in the last section. These…
Nov 27
Vincent Granville's 2 blog posts were featured
Nov 24


At 6:53pm on May 04, 2019, Florent Rudel Ndeffo gave Vincent Granville a gift
Thank you for the documentations. Priceless! :)
At 9:13am on December 13, 2018, victor zurkowski said…

Dear Vincent,

Do you know how long membership approval in "Analytic Bridge" takes? I want to submit an answer to the self-correcting random walk problem. The answer is long, and I left a copy of my document (not the final draft) on GitHub.

At 6:24am on October 01, 2017, Nitesh Choudhary gave Vincent Granville a gift
Your posts are very informative and I have learned a lot from them. Thanks for sharing!
At 1:37pm on June 23, 2016, Bill Bahl said…

Dr. Granville,

I enjoyed your white paper on Building Dashboards that Flow and could not agree more with minimalism. One thing that seems to be missing from the dashboard packages I've seen is control charts. At least for the process owner, my personal opinion is that a control chart should be the first chart. If the process is not stable and predictable, statistical analysis seems futile. Before I retired (two months ago), we started including these in the process owners' LEAN PIT boards. We generated them in Minitab. It only takes a few clicks once the data is pasted into Minitab. Bill Bahl
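The comment does not say which chart type was generated in Minitab. As a minimal sketch, assuming an individuals (X) chart with moving ranges, the usual limits are the mean plus or minus 2.66 times the average moving range (2.66 ≈ 3/d₂, with d₂ = 1.128 for moving ranges of two consecutive points). Points outside those limits suggest the process is not stable. The function names and sample values are hypothetical.

```python
from statistics import mean

def individuals_chart_limits(values: list[float]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for an individuals (X) control chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread

def out_of_control(values: list[float]) -> list[int]:
    """Indices of observations that fall outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]

if __name__ == "__main__":
    readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]  # hypothetical process data
    print(individuals_chart_limits(readings))
    print(out_of_control(readings))  # the final spike is flagged
```

This only covers the basic 3-sigma rule; run rules (for example, several consecutive points on one side of the center line) are a common extension.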

At 12:05pm on February 11, 2016, Dean Pangelinan said…

Dr. Granville,

Regarding the passerelle options for the Data Science certification program, does the notation of "IEEE Computer Science Society - Member" refer to Associate Membership in the IEEE Computer Society, or to full IEEE Membership with additional membership in the IEEE Computer Science Society?

--  Dean Pangelinan

At 5:05pm on June 21, 2015, Sankara Kumaravel gave Vincent Granville a gift
Dear Dr. Vincent, thanks for maintaining such a nice professional web page for data analytics; this is really helpful for novices like me.
At 5:28am on June 15, 2015, Lissy Able said…

Hi Vincent,

Can you suggest some points or links about serious data quality issues with the information pulled?

Thanks

Lissy

At 3:38pm on March 11, 2015, Donald Tynes said…

Vincent,

I was recently hired as a data scientist. As a new hire leading the Business Intelligence department, I am faced with self-posed questions such as "What do I need to accomplish in the first 5 days?" and "What should I accomplish in the first month?" And, of course, "How do I develop a long-term plan for transforming the business into a data-driven organization?" To make it even harder to decide where to focus my attention, I have a single employee whom I want to groom to understand the algorithms that I am implementing. Also, I have a CEO who only agreed to hire for this position because the CIO, CFO, and COO encouraged him to do so, but he is highly skeptical of what data science can do for the organization; this complicates matters too, because it puts pressure on me to be dazzling right out of the box.

I have given these questions considerable thought. I am on day 3 of my new job. I have decided to orient myself on the business' data, query tools, and self-service tools, such as QlikView. I have so many ideas, I have difficulty in choosing a single direction in which I should run. I must note that I want to be significantly impactful while minimizing disruptions in the business' daily functions. To that end, I keep thinking, "run a clustering analysis! Discover the patterns and trends in the company's data to begin the model-building process."

What advice would you give a young data scientist on his 4th day on the job (as it is for me, tomorrow)?

At 5:22am on December 3, 2014, Harvey Summers said…

I thought you might like this site: http://rpsychologist.com/d3/CI/

Interpreting Confidence Intervals: an interactive visualization

At 11:29pm on October 31, 2014, Philippe Van Impe said…


## Vincent Granville's Blog

### Python: Implementing a k-means algorithm with sklearn

Posted on December 6, 2019 at 12:22pm

Originally posted by Michael Grogan.

Below is an example of how sklearn in Python can be used to develop a k-means clustering algorithm.

The purpose of k-means clustering is to partition observations in a dataset into a specific number of clusters to aid in analysis of the data. In particular, it is valuable for data visualisation.

This post explains how…

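The excerpt stops before the post's code, so the following is not Michael Grogan's implementation. It is a minimal sketch of the two steps the teaser names, assuming scikit-learn (with its NumPy dependency) is installed and using synthetic blobs from `make_blobs` as a stand-in for the post's dataset: an elbow curve built from `KMeans.inertia_` over a range of k, then a two-dimensional PCA projection of the data for plotting the cluster labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the post's dataset: 500 points, 5 features, 4 true clusters
X, _ = make_blobs(n_samples=500, n_features=5, centers=4, random_state=0)

# Elbow curve: within-cluster sum of squares (inertia) for k = 1..8.
# The "elbow" is the k after which adding clusters stops reducing inertia much.
ks = range(1, 9)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")

# Fit the chosen model (k=4 here, matching the synthetic data) and project the
# 5-D points to 2-D with PCA so the cluster labels can be drawn on a scatter plot.
final = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
coords = PCA(n_components=2).fit_transform(X)
labels = final.labels_
```

PCA is used here only for display; the clustering itself runs on all five features. With real data, features on different scales would usually be standardized before calling `KMeans`.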

### Visualizing New York City WiFi Access with K-Means Clustering

Posted on December 6, 2019 at 12:12pm

Visualization has become a key application of data science in the telecommunications industry.

Specifically, telecommunication analysis is highly dependent on the use of geospatial data. This is because telecommunication networks in themselves are geographically dispersed, and analysis of such dispersions can yield valuable insights regarding network…


### Predicting Hotel Cancellations with Support Vector Machines and SARIMA

Posted on December 6, 2019 at 11:30am

This is Part 1 of a three-part study on predicting hotel cancellations with machine learning. Originally posted by Michael Grogan.

#### Logistic Regression and SVM

Hotel cancellations can cause issues for many businesses in the industry. Not only is there the…


### Thursday News, December 5

Posted on December 5, 2019 at 12:30pm

Here is our selection of featured articles and technical resources posted since Monday:

Upcoming Webinar

Technical Resources
