Many statistics, such as correlations or R-squared, depend on the sample size, making it difficult to compare values computed on two data sets of different sizes. Here, we address this issue.
Below is an example with 20 observations. The last 10 observations (the second half of the data set) are a mirror of the first 10, and the two correlations, computed on each subset, are identical and equal to 0.30. The full correlation computed on the 20 observations is 0.85.…Continue
Added by Vincent Granville on June 2, 2019 at 7:30am — No Comments
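The effect described in the correlation excerpt above (identical correlations on each half, a much larger one on the full data set) can be reproduced with a small synthetic sketch. The data here are an assumption, not the article's data set: the second half is built as a shifted copy of the first, and the function name `shifted_copy_correlations` and the `shift` parameter are hypothetical. The resulting values will not match the 0.30 and 0.85 quoted in the excerpt.

```python
import numpy as np


def zscore(v):
    """Center to mean 0 and scale to population standard deviation 1."""
    return (v - v.mean()) / v.std()


def shifted_copy_correlations(n=10, shift=4.0, seed=0):
    """Return (r_first_half, r_second_half, r_full) for a toy data set.

    The second half is the first half shifted by `shift` on both axes
    (an illustrative assumption, not the construction used in the article).
    """
    rng = np.random.default_rng(seed)
    x1 = zscore(rng.normal(size=n))
    y1 = zscore(0.3 * x1 + rng.normal(size=n))

    # Shifting both variables leaves the within-half correlation unchanged.
    x2, y2 = x1 + shift, y1 + shift

    r_first = np.corrcoef(x1, y1)[0, 1]
    r_second = np.corrcoef(x2, y2)[0, 1]
    r_full = np.corrcoef(np.concatenate([x1, x2]),
                         np.concatenate([y1, y2]))[0, 1]
    return r_first, r_second, r_full


r_first, r_second, r_full = shifted_copy_correlations()
print(f"first half: {r_first:.3f}, second half: {r_second:.3f}, full: {r_full:.3f}")
```

Because both halves are standardized, the pooled correlation in this toy setup equals (r + b) / (1 + b) with b = (shift / 2)², so the gap between the two halves, not the relationship within them, drives the full-sample value.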
My first computer was a Commodore Vic-20 in 1981. I bought the device because of this incredible urge to program in BASIC as a result of Mr. Ted Becker’s course on computer programming. I vaguely remember the leap from the painstaking process of programming using punch cards to writing code and watching your program run immediately, once you resolved all of the syntax errors of course. Nonetheless, it was thrilling and addictive! In hindsight, a…Continue
Added by Richard Charles, PhD on June 2, 2019 at 12:00am — No Comments
The analysis and classification of ordinal categorical data are central in most scientific domains and ubiquitous in governments and businesses.
Examples of ordinal data are found in questionnaires that measure opinions or self-reported health status. A well-known example of ordinal data is the Likert scale (DISLIKE = 1, DISLIKE SOMEWHAT…Continue
Added by Ludovico Pinzari on June 1, 2019 at 3:35pm — No Comments
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.
Added by Vincent Granville on June 1, 2019 at 3:00pm — No Comments
I recently wrote a blog "Interweaving Design Thinking and Data Science to Unleash Economic Value of Data" that discussed the power of interweaving Design Thinking and Data Science to make our analytic efforts more effective. Our approach was validated by a recent McKinsey article titled “Fusing data and…Continue
Added by Bill Schmarzo on June 1, 2019 at 9:35am — No Comments
Added by Stephanie Glen on May 31, 2019 at 8:00am — No Comments
In a world where we generate data at an extremely fast rate, analyzing the data correctly and delivering useful, meaningful results at the right time can provide helpful solutions for many domains dealing with data products. This applies across Health Care, Finance, Media, Retail, Travel Services and more. Some solid examples include Netflix providing personalized recommendations in real time, Amazon tracking your interaction with different products on its…Continue
Added by Divya Singh on May 30, 2019 at 8:00pm — No Comments
Here is our selection of featured articles and technical resources posted since Monday.
Added by Vincent Granville on May 30, 2019 at 10:30am — No Comments
For many new data scientists transitioning into AI and deep learning, the Keras…Continue
Added by ODSC on May 30, 2019 at 10:00am — No Comments
I pulled out a dusty copy of Think Stats by Allen Downey the other day. I highly recommend this terrific little read that teaches statistics with easily understood examples using Python. When I purchased the book eight years ago, the Python code proved invaluable as…Continue
Added by steve miller on May 30, 2019 at 7:56am — No Comments
The energy industry has been undergoing a rapid transformation in the recent past, owing to the growing role of renewables and data-driven models that make the value chain smarter. Across the primary constituents of this sector, comprising coal, power, renewables, solar energy, oil, and gas, there is a huge role AI can play.
We illustrate some key use cases below:
1. Smart Grid
The biggest disruption in power in recent times is in the smart grid…Continue
Added by Mahesh Kumar CV on May 30, 2019 at 5:02am — No Comments
Data science is a multidisciplinary blend of data inference, algorithm development, and technology used to solve analytically complex problems. At the core is data. Troves of raw information, streaming in and stored in enterprise data warehouses. Much to learn by mining it. Advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value.
The broader fields of understanding what data…Continue
Added by Divya Singh on May 29, 2019 at 8:00pm — No Comments
The future offers interesting and exciting times ahead for most businesses. With data being a big influencer in the enterprise of the future, it is a matter of time before we jump into the era of intelligent enterprises.
Intelligent enterprises are going to be…Continue
Added by Ronald van Loon on May 29, 2019 at 7:26pm — No Comments
What is the factory of the future? Is it a synonym for Industry 4.0, or is it a different concept in its own right? Industry 4.0 and the factory of the future might sound similar, but they are different in some ways. To begin with, the factory of the future is an elusive concept that isn’t as common as Industry 4.0.
The factory of the future is…Continue
Added by Ronald van Loon on May 29, 2019 at 6:32pm — No Comments
After testing this idea for the last few months, we have formally launched this concept.
The idea of ‘Data Science Coding in a weekend’ originated from meetups we conducted in London.
The idea is simple but effective.
We choose a complex section of code and try to learn it in detail over…Continue
Added by ajit jaokar on May 29, 2019 at 7:52am — No Comments
The analytics market is booming, and so is the use of the keyword – Data Science. Professionals from different disciplines are using data in their day-to-day activities, and feel the need to master state-of-the-art technology in order to get maximum insights from the data, and subsequently help the business to grow.
Moreover, there are professionals who want to keep themselves updated with the latest skills such as Machine Learning, Deep Learning, Data Science, and so on, either to elevate…Continue
Added by Divya Singh on May 28, 2019 at 10:19pm — No Comments
Here is a simple trick that can solve a lot of problems.
You cannot trust a linear or logistic regression performed on data if the error terms (residuals) are auto-correlated. There are different approaches to de-correlate the observations, but they usually involve introducing a new matrix to take care of the resulting bias. See for instance here. …Continue
Added by Vincent Granville on May 28, 2019 at 9:30am — No Comments
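As a companion to the excerpt above on auto-correlated residuals, here is a minimal diagnostic sketch. It only checks whether the residuals of a least-squares fit are auto-correlated, using the Durbin-Watson statistic; it is not the trick the post goes on to describe. The simulated AR(1) errors, the coefficient 0.9, and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def ols_residuals(x, y):
    """Residuals of an ordinary least-squares fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def simulate_ar1_errors(n, rho, rng):
    """Errors where each value depends on the previous one (AR(1))."""
    e = np.zeros(n)
    shocks = rng.normal(size=n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + shocks[t]
    return e


rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 500)
y = 1.0 + 2.0 * x + simulate_ar1_errors(500, rho=0.9, rng=rng)

dw = durbin_watson(ols_residuals(x, y))
print(f"Durbin-Watson statistic: {dw:.2f}")
```

Values near 2 suggest little lag-1 autocorrelation, while values well below 2, as in this simulated case, point to positive autocorrelation in the residuals.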
Added by Max Ved on May 27, 2019 at 11:02pm — No Comments
This simple introduction to matrix theory offers a refreshing perspective on the subject. Using a basic concept that leads to a simple formula for the power of a matrix, we see how it can solve time series, Markov chains, linear regression, data reduction, principal components analysis (PCA) and other machine learning problems. These problems are usually solved with more advanced matrix calculus, including eigenvalues, diagonalization, generalized inverse matrices, and other types of matrix…Continue
Added by Vincent Granville on May 27, 2019 at 2:00pm — No Comments
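To make the link between matrix powers and Markov chains in the excerpt above concrete, here is a small sketch. It uses NumPy's generic `np.linalg.matrix_power` routine rather than the formula the article derives, and the two-state transition matrix and the helper name `distribution_after` are assumptions made up for illustration.

```python
import numpy as np

# Hypothetical two-state transition matrix: row i gives the probabilities
# of moving from state i to each state in one step.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Start in state 0 with certainty.
start = np.array([1.0, 0.0])


def distribution_after(P, start, n_steps):
    """State distribution after n_steps transitions: start @ P^n_steps."""
    return start @ np.linalg.matrix_power(P, n_steps)


for n in (1, 5, 50):
    print(n, distribution_after(P, start, n))
```

For this particular matrix the distribution settles at (5/6, 1/6), which is the vector left unchanged by one more multiplication by `P`.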
Summary: Real Time Computer Vision (RTCV) that requires processing video DNNs at the edge is likely to be the next killer app that powers a renewed love affair with our mobile devices. The problem is that current GPUs won’t cut it and we have to wait once again for the hardware to catch up.