Professor Bart Baesens is a professor at KU Leuven (Belgium), and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on analytics, customer relationship management, web analytics, fraud detection, and credit risk management. His findings have been published in well-known international journals (e.g. Machine Learning, Management Science, IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, IEEE…
Added by Vincent Granville on June 11, 2014 at 10:30am
Here are several different ways to leverage Data Science Central for your benefit, at no cost.
Added by Vincent Granville on June 10, 2014 at 8:00pm
Here we have blended together the best of the best resources posted recently on DSC. It would be great to organize them by category, but for now they are organized by date. This ordering is useful too, since you have likely seen the older entries already and can focus on the more recent ones. We plan to update this reference of references on a regular basis.…
Added by Mirko Krivanek on June 10, 2014 at 3:00pm
One of the marvels of the age of data and technology is the ability to analyze and determine the minutest details of the world today. Several of these innovative breakthroughs pass unnoticed in daily life. Yet it is this dissemination of data and integration of innovation that is intrinsic to the modern world. One field that has risen to the fore of the data deluge is ‘…
Added by Sumit Prasad on June 9, 2014 at 9:51pm
When designing a model for a data warehouse, we should follow a standard pattern, such as gathering requirements, building credentials, and collecting a considerable quantity of information about the data or metadata. This helps to determine the structure and scope of the data warehouse. This model of the data warehouse is known as the conceptual model. The general elements of the model are fact and dimension tables. These tables are related to each other, which helps to identify relationships…
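The fact/dimension structure of the conceptual model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's method: the `Product`, `Region`, and `SalesFact` tables and all their rows are hypothetical. Each fact row stores keys pointing into the dimension tables plus a numeric measure, and those relationships are what let us aggregate the measure by any dimension attribute.

```python
from collections import defaultdict
from dataclasses import dataclass

# Dimension tables: descriptive attributes keyed by a surrogate key (hypothetical data).
@dataclass(frozen=True)
class Product:
    product_key: int
    name: str
    category: str

@dataclass(frozen=True)
class Region:
    region_key: int
    name: str

# Fact table: one row per sale, holding a key into each dimension plus a measure.
@dataclass(frozen=True)
class SalesFact:
    product_key: int
    region_key: int
    amount: float

products = {1: Product(1, "Laptop", "Electronics"), 2: Product(2, "Desk", "Furniture")}
regions = {10: Region(10, "North"), 20: Region(20, "South")}
sales = [SalesFact(1, 10, 1200.0), SalesFact(2, 10, 300.0), SalesFact(1, 20, 1100.0)]

def orphan_facts():
    """Return fact rows whose keys do not resolve to a row in each dimension table."""
    return [f for f in sales if f.product_key not in products or f.region_key not in regions]

def total_by(attribute):
    """Sum the fact measure, grouped by an attribute reached through the dimension keys."""
    totals = defaultdict(float)
    for fact in sales:
        totals[attribute(fact)] += fact.amount
    return dict(totals)

by_category = total_by(lambda f: products[f.product_key].category)
by_region = total_by(lambda f: regions[f.region_key].name)
print(by_category, by_region)
```

The `orphan_facts()` check loosely mirrors the foreign-key constraints a real warehouse would enforce between fact and dimension tables; with this toy data it returns an empty list because every key resolves.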
Having looked at the fundamentals in the first post, the natural next step is to understand the various types of strategies for "attacking" the data and making it reveal useful information. However, there is one step we must take just before that: understand the "enemy", i.e., the problem at hand and the available data.
The Tree of the Data Shinobi:
The tree below is an attempt at categorizing the most commonly…
Added by Amogh Borkar on June 8, 2014 at 2:30am
My favourite explanation of the "butterfly effect" so far is as follows: under particular conditions, even the tiniest movements of a butterfly can trigger storms and hurricanes. This principle is not limited to butterflies, of course. I think that many of us face pivotal moments in life that leave lasting effects. Perhaps no different from other students, I remember running out of cash during my undergraduate years. I consider this my personal butterfly moment. I had no money for food. I…
Added by Don Philip Faithful on June 7, 2014 at 7:33am
In this article, we will discuss the so-called 'Curse of Dimensionality' and explain why it is important when designing a classifier. In the following sections, I will provide an intuitive explanation of this concept, illustrated by a clear example of overfitting due to the curse of dimensionality.
Consider an example in which we have a set of images, each of which depicts either a cat or a dog. We would like to create a classifier that is able to…
Added by Mirko Krivanek on June 6, 2014 at 5:00pm
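The curse-of-dimensionality post builds its intuition around a cat/dog image classifier; the snippet below does not reproduce that example. It is a separate, standard geometric sketch of why a fixed amount of training data becomes sparse as features are added. To capture a given fraction of uniformly distributed data, a sub-hypercube needs an edge of `fraction ** (1/d)` of each axis's range, and keeping the same sampling density per axis requires `samples_per_axis ** d` points. The 10% fraction and 10 samples per axis are arbitrary illustrative choices.

```python
def edge_fraction(volume_fraction: float, dims: int) -> float:
    """Edge length (as a share of each axis) of a sub-hypercube holding `volume_fraction`
    of the unit hypercube's volume, i.e. of uniformly spread data."""
    return volume_fraction ** (1.0 / dims)

def samples_for_density(samples_per_axis: int, dims: int) -> int:
    """Number of samples needed to keep `samples_per_axis` points along every axis."""
    return samples_per_axis ** dims

# Illustrative settings (assumptions): a 10% neighborhood and 10 samples per axis.
for d in (1, 2, 3, 10):
    print(f"d={d:2d}  edge={edge_fraction(0.1, d):.3f}  samples={samples_for_density(10, d)}")
```

With these choices the edge length grows from 0.1 in one dimension to roughly 0.32 in two and about 0.79 in ten, so a "local" neighborhood soon spans most of each axis, while the sample count for the same density grows from 10 to 10 billion. That sparsity is one intuition for why a classifier with many features can fit noise in a small training set.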
The full version is always published on Monday. Starred articles are new additions or updated content, posted between Thursday and Sunday.
Added by Vincent Granville on June 4, 2014 at 7:00pm
While MongoDB has been the most popular NoSQL database over the past few years, it appears that Cassandra has been the most popular over the past six months. Many assert that Cassandra scales better, has better data management features, and is faster, and that MongoDB has more moving parts and complexity to cause…
How is this related to big data and data science, and why is it such a big deal?
It is important to big data science in multiple ways. First, data security and encryption rely on algorithms that typically use an encryption key: the key, at the very core of these algorithms, is essentially the product of two very large prime numbers. While there have been new developments to produce different algorithms…
Added by Vincent Granville on June 3, 2014 at 4:30pm
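The idea that an encryption key is built from the product of two large primes is best known from RSA. The snippet below is a toy RSA sketch, offered as an illustration rather than as the scheme the post discusses. It uses the well-known textbook primes 61 and 53 (far too small to be secure) and also shows that anyone who can factor the modulus recovers both primes, which is why security depends on factoring being infeasible for very large primes. It assumes Python 3.9+ (`pow` with exponent -1 computes a modular inverse from 3.8 onward).

```python
# Toy RSA with tiny textbook primes: NOT secure, for illustration only.
p, q = 61, 53
n = p * q                 # public modulus: the product of two primes
phi = (p - 1) * (q - 1)   # Euler's totient of n
e = 17                    # public exponent, chosen coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e mod phi

def encrypt(m: int) -> int:
    """Encrypt an integer message m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (d, n)."""
    return pow(c, d, n)

def factor(modulus: int) -> tuple[int, int]:
    """Recover two factors by trial division; only feasible because n is tiny."""
    f = 2
    while f * f <= modulus:
        if modulus % f == 0:
            return f, modulus // f
        f += 1
    raise ValueError("modulus is prime")

ciphertext = encrypt(65)
print(n, d, ciphertext, decrypt(ciphertext), factor(n))
```

Once `factor(n)` returns `p` and `q`, an attacker can recompute `phi` and `d` exactly as above. Real deployments use primes hundreds of digits long, padding schemes, and a vetted cryptographic library rather than hand-rolled code like this.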
Interesting article about the history of the Internet, with some really cool maps. Our upcoming "picture of the week" will come from this article. Check out our most recent weekly digests to discover our previous pictures of the week.
Added by Vincent Granville on June 3, 2014 at 4:00pm
Many of those who call themselves statisticians just won't admit that data science relies heavily on (heretical, rule-breaking) statistical science, or they don't recognize the true statistical nature of these data science techniques (some are 15 years old), or they are opposed to the modernization of their statistical arsenal. They already missed the train when machine learning (also heavily based on statistics) became a popular discipline more than 15 years ago. Now machine learning…