All Blog Posts (1,668)

Top 100 most influential big data / data science practitioners to follow on Twitter

We've published such articles in the past.  Here are recent reports from external sources. The first one below comes from You can choose a topic, to find top influential people related to the topic in question. Below is…


Added by Data Science Girl on May 28, 2015 at 9:00am — No Comments

Weekly Digest - June 1

The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.



Added by Vincent Granville on May 27, 2015 at 9:00am — No Comments

What clustering method is required for text documents

Let's say a set of documents 'S' has a large set of 'pure' texts.

To all documents in S, I apply a spelling normalisation method, which yields a normalised set S'.

Then I use the chosen method M (which method?) to make clusters in S, obtaining a clustering result C.

Then I use the same method M to make clusters in S', obtaining a clustering result C'.

Finally, I need to test whether there are statistically significant differences between C and C'.

Any help in identifying…
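
One possible way to quantify how far C and C' agree is the adjusted Rand index (ARI), sketched below in plain Python. This is an illustration under assumptions, not the method the question settles on: the function name and the sample labels are hypothetical, and each clustering is assumed to be given as one cluster label per document, in the same document order. The ARI measures agreement only; judging statistical significance would need an additional step, such as a permutation test.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_c, labels_c_prime):
    """Agreement between two clusterings of the same documents.

    1.0 means identical partitions (up to renaming of clusters);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_c) != len(labels_c_prime):
        raise ValueError("both clusterings must cover the same documents")
    n = len(labels_c)
    if n < 2:
        return 1.0  # no document pairs to disagree on

    # Count document pairs that share a cluster in both, in C only, in C' only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_c, labels_c_prime)).values())
    in_c = sum(comb(c, 2) for c in Counter(labels_c).values())
    in_c_prime = sum(comb(c, 2) for c in Counter(labels_c_prime).values())

    expected = in_c * in_c_prime / comb(n, 2)  # agreement expected by chance
    maximum = (in_c + in_c_prime) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (both - expected) / (maximum - expected)


# Hypothetical labels for six documents before and after normalisation.
c = [0, 0, 1, 1, 2, 2]
c_prime = [1, 1, 0, 0, 2, 2]  # same partition, clusters merely renamed
print(adjusted_rand_index(c, c_prime))
```

Because the index ignores cluster names, it can be used even when M numbers the clusters differently on S and S'.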


Added by MUSHTAQ AHMAD on May 25, 2015 at 11:48am — 3 Comments

Simple Regression use in Big Data

Since the emergence of Big Data, we have witnessed the rise of the key-value pair. We can explore the relationship between two such variables, X and Y, using data science. In its basic form, regression also describes the relationship between two variables, X and Y. These variables are:

Independent Variables & Dependent Variables

Let us take behavior of users of a…
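
As a rough illustration of simple regression between an independent variable X and a dependent variable Y, the sketch below fits a straight line by ordinary least squares. The function name and the sample numbers are made up for the example and do not come from the post.

```python
def fit_simple_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of X, and co-variation of X with Y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: X is the independent variable, Y the dependent one.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_regression(xs, ys)
print(slope, intercept)
```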


Added by Atif Farid Mohammad on May 25, 2015 at 6:00am — No Comments

How to determine the quality and correctness of classification models? Part 2 - Quantitative quality indicators

Basic quantitative quality indicators

In the last part of the tutorial we introduced the basic qualitative model quality indicators. Let us recall them now:

  • TP – True Positive – the number of observations correctly assigned to the positive class

    Example: the model’s predictions are correct and resigning customers have been…
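
A minimal sketch of how TP could be counted in practice, assuming binary labels where 1 marks the positive class (for example, a resigning customer). The excerpt above is cut off after TP, so the other three counts (FP, TN, FN) are the standard companions of TP and are an assumption here; the function name and sample data are illustrative only.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the observation really is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the observation was actually positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical data: 1 = customer resigned, 0 = customer stayed.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
```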

Added by Algolytics on May 23, 2015 at 6:00pm — No Comments

Virtual Org and Behaviour by Transaction

In Java programming, there is the idea of a "virtual machine." A virtual machine is a computer system that doesn't exist in real life. Yet programs can be written for it. The code is interpreted by a runtime environment. Through this arrangement, Java programs can operate on different operating systems rather than one exclusively. Depending on one's background, the concept of a "…


Added by Don Philip Faithful on May 23, 2015 at 6:31am — No Comments

Four successful big data / analytics startups in Seattle

These companies gather and process gigantic amounts of data to serve their clients and/or users. They make money by selling summarized, processed, real-time data. They are poised to succeed in the IoT (Internet of Things) revolution, leveraging all sorts of devices and APIs to gather data, and

  • send alerts to users via text messages or other technology
  • sell intelligence extracted from data, to other businesses

It is worth spending some time figuring out…


Added by Mirko Krivanek on May 22, 2015 at 8:00pm — No Comments

How Apple Uses Big Data To Drive Success

Apple’s old slogan was “Think Different” – and while it is now retired, and the ethos may not be as apparent in the company’s products as it once was, it is true for their approach to Big Data.

In some…


Added by Bernard Marr on May 22, 2015 at 1:30pm — 1 Comment

Big Data: Uncovering The Secrets of Our Universe At CERN

CERN is best known these days as the research organization which operates the Large Hadron Collider – the largest and most complicated science experiment ever undertaken, which aims to explain mysteries behind the creation of the universe.…


Added by Bernard Marr on May 22, 2015 at 1:30pm — 3 Comments

10 Python Machine Learning Projects on GitHub

Here is a list of top Python machine learning projects on GitHub. A continuously updated list of open source learning projects is available on Pansop.



Added by Pansop on May 21, 2015 at 8:00pm — 1 Comment

Data Integrity: The Rest of the Story Part II

Buzz words are one of my least favorite things, but as buzz words go, I can appreciate the term “Data Lake.” It is one of the few buzz words that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…


Added by Randall V Shane on May 21, 2015 at 3:13pm — 1 Comment

Measuring Information Retrieval Performance Using Extrapolated Precision

This is a brief overview of my paper “Information Retrieval Performance Measurement Using Extrapolated Precision,” which I’ll be presenting on June 8th at the DESI VI workshop at ICAIL 2015.  The paper provides a novel method for extrapolating a precision-recall point to a different level of recall, and…


Added by Bill Dimm on May 21, 2015 at 2:44pm — No Comments

9 Python Analytics Libraries

Python & data analytics go hand in hand. Here is a list of 9 Python data analytics libraries. This list is going to be…


Added by Pansop on May 21, 2015 at 4:30am — No Comments

Weekly Digest - May 25

The full version is always published Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.


  • Webinar: Flipping the 80/20 Rule for Analytics - Hear how Teradata helps businesses flip the 80/20 model so they can spend only 20% preparing and organizing data and 80% on the analytics, accelerating time to value.…

Added by Vincent Granville on May 20, 2015 at 5:30pm — No Comments

100 Best Data Science Companies to Work for in 2015

This is an interesting article recently published in Forbes. The author gathered data from, to rank companies. is a website where employees make comments about, and rate their company, and can even post their job title and salary range. Keep in mind that the author is not a statistician, and his analysis is…


Added by Mirko Krivanek on May 20, 2015 at 10:00am — 2 Comments

What Defines a Big Data Scenario?

Big data is a new marketing term that highlights the ever-increasing and exponential growth of data in every aspect of our lives. The term big data originated from within the open-source community, where there was an effort to develop analytics processes that were faster and more scalable than traditional data warehousing, and could extract value from the vast amounts of unstructured and semi-structured data produced daily by web users. Consequently, big data origins are tied to web data,…


Added by Khosrow Hassibi on May 20, 2015 at 7:51am — No Comments

How Do I Become a Data Scientist? / Data Science Aspects

I asked myself this question a few months ago. Next I thought: What is the definition of Data Science? So the first thing I did was read as many posts on the topic as I could get my hands on and look up definitions of related topics such as Data Mining and Machine Learning. Looking at the discussions and posts around Data Science it…


Added by Michael Laux on May 20, 2015 at 5:30am — 1 Comment

Machine Learning Resources for Spam Detection

Spam is a kind of messaging where the cost of sending is usually negligible, and the receiver and the ISP pay the cost in terms of bandwidth usage.

An example of a manual approach to detecting spam is knowledge engineering. When you know what is spam and what is not, you can usually filter it by creating a set of rules like:

  • If the subject line of an email contains the words ‘Buy viagra’, it's…
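
A hand-written rule of this kind can be sketched in a few lines of Python. The rule list and function name below are hypothetical, and the single rule simply mirrors the 'Buy viagra' example; a real knowledge-engineered filter would carry many more rules.

```python
import re

# Hypothetical hand-written rules, each a regular expression
# matched against the subject line of an email.
SPAM_SUBJECT_RULES = [
    re.compile(r"\bbuy\s+viagra\b", re.IGNORECASE),
]


def is_spam(subject):
    """Flag a message as spam if any rule matches its subject line."""
    return any(rule.search(subject) for rule in SPAM_SUBJECT_RULES)


print(is_spam("Buy Viagra now!!!"))
print(is_spam("Minutes from Monday's meeting"))
```

Rules like this are easy to read and audit, but they have to be maintained by hand as spammers change their wording.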


Added by Pansop on May 19, 2015 at 1:00am — 1 Comment

Predictive Analytics Demystified

This 30-minute video aims to demystify predictive analytics and present the IBM SPSS predictive analytics portfolio. The contents of the video are as follows:

  • Evolution of Analytics 5:45
  • Why is Predictive Analytics Important? 11:35
  • Demystifying Predictive Analytics 21:30
  • IBM…

Added by Venky Rao on May 18, 2015 at 11:30am — No Comments

Welcome to Sparkling Land

Note: Opinions expressed are solely my own and do not express the views or opinions of my employer.

As a data scientist who has been munging data and building machine learning models in tools like R, Python and other software (open source and proprietary), I had always longed for a world without technical limitations. A world which would allow me to create data structures (data scientists usually call them vectors, matrices or dataframes) of virtually any…


Added by Fawad Alam on May 18, 2015 at 8:30am — No Comments


© 2015   Data Science Central
