All Blog Posts (6,145)

Simple Trick to Normalize Correlations, R-squared, and so on

Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to compare values computed on two data sets of different sizes. Here, we address this issue.

Below is an example with 20 observations. The last 10 observations (the second half of the data set) are a mirror of the first 10, and the two correlations, computed on each subset, are identical and equal to 0.30. The full correlation computed on the 20 observations is 0.85.…
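The effect described in this excerpt can be reproduced with a small sketch. The data below is made up for illustration and is not the article's data set, so the numbers differ from the 0.30 and 0.85 quoted above. It also assumes the second half is a shifted copy of the first, which is only one way to get two subsets with identical correlations; the article's "mirror" construction may differ. The helper name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up first half (NOT the article's data): 10 weakly correlated points.
x_first = list(range(1, 11))
y_first = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

# Assumption: the second half is the first half shifted by a constant.
# Shifting does not change the correlation within the subset.
x_second = [v + 20 for v in x_first]
y_second = [v + 20 for v in y_first]

r_first = pearson(x_first, y_first)
r_second = pearson(x_second, y_second)
r_full = pearson(x_first + x_second, y_first + y_second)

print(f"first half:  {r_first:.2f}")
print(f"second half: {r_second:.2f}")
print(f"all 20:      {r_full:.2f}")
```

In this sketch the pooled correlation is much larger than either subset correlation because the gap between the two clusters dominates the pooled variance.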


Added by Vincent Granville on June 2, 2019 at 7:30am — No Comments

The Call for a New Device for Data Scientists

My first computer was a Commodore Vic-20 in 1981. I bought the device because of this incredible urge to program in BASIC as a result of Mr. Ted Becker’s course on computer programming. I vaguely remember the leap from the painstaking process of programming using punch cards to writing code and watching your program run immediately, once you resolved all of the syntax errors, of course. Nonetheless, it was thrilling and addictive! In hindsight, a…


Added by Richard Charles, PhD on June 2, 2019 at 12:00am — No Comments

The Homogeneity and Location Index: An open-source Statistical Framework for the classification of ordinal categorical data

The analysis and classification of ordinal categorical data are central in most scientific domains and ubiquitous in governments and businesses.

Examples of ordinal data are found in questionnaires that measure opinions or self-reported health status. A well-known example of ordinal data is the Likert Scale [1].



Added by Ludovico Pinzari on June 1, 2019 at 3:35pm — No Comments

Data Science Central Monday Digest, June 3

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  


  • The NEW M.S.…

Added by Vincent Granville on June 1, 2019 at 3:00pm — No Comments

How DevOps Drives Analytics Operationalization and Monetization

I recently wrote a blog post, "Interweaving Design Thinking and Data Science to Unleash Economic Value of Data", that discussed the power of interweaving Design Thinking and Data Science to make our analytic efforts more effective. Our approach was validated by a recent McKinsey article titled “Fusing data and…


Added by Bill Schmarzo on June 1, 2019 at 9:35am — No Comments

R-Squared in One Picture

R-squared measures how well your data fits a regression line. More specifically, it's how much variation in the…
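As a minimal sketch of the idea, R-squared for a simple least-squares line can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample data are hypothetical and are not taken from the article.

```python
def r_squared(x, y):
    """R-squared of the ordinary least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Fit y = intercept + slope * x by ordinary least squares.
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Made-up sample: points scattered around an increasing line.
x_sample = [1, 2, 3, 4, 5]
y_sample = [2, 1, 4, 3, 5]
print(f"R-squared: {r_squared(x_sample, y_sample):.2f}")
```

A value of 1 means the line passes through every point, while values near 0 mean the line explains little of the variation in y.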


Added by Stephanie Glen on May 31, 2019 at 8:00am — No Comments

Apache Spark Streaming Tutorial for Beginners


In a world where we generate data at an extremely fast rate, analyzing that data correctly and delivering useful, meaningful results at the right time can provide helpful solutions for many domains dealing with data products. This applies to domains from Health Care and Finance to Media, Retail, Travel Services, and so on. Some solid examples include Netflix providing personalized recommendations in real time, Amazon tracking your interaction with different products on its…


Added by Divya Singh on May 30, 2019 at 8:00pm — No Comments

Data Science Central Thursday Digest, May 30

Here is our selection of featured articles and technical resources posted since Monday.



Added by Vincent Granville on May 30, 2019 at 10:30am — No Comments

Getting to Know Keras for New Data Scientists

This article was originally published on, written by Daniel Gutierrez.

For many new data scientists transitioning into AI and deep learning, the Keras…


Added by ODSC on May 30, 2019 at 10:00am — No Comments

Simulated Significance

I pulled out a dusty copy of Think Stats by Allen Downey the other day. I highly recommend this terrific little read that teaches statistics with easily understood examples using Python. When I purchased the book eight years ago, the Python code proved invaluable as…


Added by steve miller on May 30, 2019 at 7:56am — No Comments

Harnessing Potential of Artificial Intelligence In Energy and Oil & Gas

The energy industry has undergone a rapid transformation in the recent past, owing to the growing role of renewables and to improved data-driven models that make the value chain smarter. Across the primary constituents of this sector, comprising coal, power, renewables, solar energy, oil, and gas, there is a huge role AI can play.

We illustrate some key use cases below:

1. Smart Grid

The biggest disruption in power in recent times is in the smart grid…


Added by Mahesh Kumar CV on May 30, 2019 at 5:02am — No Comments

Basic Statistics Concepts Every Data Scientist Should Know


Data science is a multidisciplinary blend of data inference, algorithm development, and technology used to solve analytically complex problems. At the core is data: troves of raw information, streaming in and stored in enterprise data warehouses. There is much to learn by mining it, and there are advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value.

The broader fields of understanding what data…


Added by Divya Singh on May 29, 2019 at 8:00pm — No Comments

Build Your Intelligent Enterprise through a Data Fabric

The future offers interesting and exciting times ahead for most businesses. With data being a big influencer in the enterprise of the future, it is a matter of time before we jump into the era of intelligent enterprises.

Intelligent enterprises are going to be…


Added by Ronald van Loon on May 29, 2019 at 7:26pm — No Comments

6 Important Steps to Building a Successful Factory of the Future

What is the factory of the future? Is it a synonym for Industry 4.0, or is it a different concept in its own right? Industry 4.0 and the factory of the future might sound similar, but they are different in some ways. To begin with, the factory of the future is an elusive concept that isn’t as common as Industry 4.0.

The factory of the future is…


Added by Ronald van Loon on May 29, 2019 at 6:32pm — No Comments

Data science Coding in a weekend series of books …

After testing this idea for the last few months, we have formally launched this concept


The idea of ‘Data Science Coding in a weekend’ originated from meetups we conducted in London


The idea is simple but effective


We choose a complex section of code and try to learn it in detail over…


Added by ajit jaokar on May 29, 2019 at 7:52am — No Comments

10 Areas of Expertise in Data Science

The analytics market is booming, and so is the use of the keyword – Data Science. Professionals from different disciplines are using data in their day-to-day activities, and feel the need to master state-of-the-art technology in order to get maximum insights from the data, and subsequently help the business to grow.

Moreover, there are professionals who want to keep themselves updated with the latest skills, such as Machine Learning, Deep Learning, Data Science, and so on, either to elevate…


Added by Divya Singh on May 28, 2019 at 10:19pm — No Comments

Simple Trick to Remove Serial Correlation in Regression Models

Here is a simple trick that can solve a lot of problems.

You cannot trust a linear or logistic regression performed on data if the error terms (residuals) are auto-correlated. There are different approaches to de-correlate the observations, but they usually involve introducing a new matrix to take care of the resulting bias. See for instance here. …
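One common way to check whether residuals are auto-correlated is the Durbin-Watson statistic. The sketch below is a diagnostic only, not the trick this article goes on to describe (the excerpt is truncated before that point). The function name `durbin_watson` and the residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that stay on one side of zero for several steps
# in a row, which suggests positive serial correlation.
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

# Made-up residuals that flip sign at every step, which suggests
# negative serial correlation.
alternating = [1.0, -1.0, 1.0, -1.0]

print(f"trending:    {durbin_watson(trending):.2f}")
print(f"alternating: {durbin_watson(alternating):.2f}")
```

The statistic lies between 0 and 4. Values near 2 indicate little first-order autocorrelation, values well below 2 indicate positive autocorrelation, and values well above 2 indicate negative autocorrelation.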


Added by Vincent Granville on May 28, 2019 at 9:30am — No Comments

Top AI algorithms for Healthcare

The …


Added by Max Ved on May 27, 2019 at 11:02pm — No Comments

Gentle Approach to Linear Algebra, with Machine Learning Applications

This simple introduction to matrix theory offers a refreshing perspective on the subject. Using a basic concept that leads to a simple formula for the power of a matrix, we see how it can solve time series, Markov chains, linear regression, data reduction, principal components analysis (PCA) and other machine learning problems. These problems are usually solved with more advanced matrix calculus, including eigenvalues, diagonalization, generalized inverse matrices, and other types of matrix…


Added by Vincent Granville on May 27, 2019 at 2:00pm — No Comments

Real Time Computer Vision is Likely to be the Next Killer App but We’re Going to Need New Chips

Summary:  Real Time Computer Vision (RTCV) that requires processing video DNNs at the edge is likely to be the next killer app that powers a renewed love affair with our mobile devices.  The problem is that current GPUs won’t cut it and we have to wait once again for the hardware to catch up.


 The entire…


Added by William Vorhies on May 27, 2019 at 8:47am — 1 Comment

© 2019   Data Science Central ®   Powered by
