Over the years I’ve often been asked by beginners where they should start in statistics, what they should do first, and which parts of statistics they should prioritise to get them to where they want to be (which is usually a higher…
Added by Lee Baker on May 26, 2020 at 6:30am
The explosion of data in the world, from the data collected by cameras to the data gathered from visitors’ actions on websites, is staggering. With new types of data pouring in and the applications of data analysis becoming vast, companies need to manage this unprecedented volume of data.
Added by Divyesh Aegis on October 18, 2019 at 1:00am
Excel is often poorly regarded as a platform for regression analysis. The regression add-in in its Analysis Toolpak has not changed since it was introduced in 1995, and it was a flawed design even back then. (See this link for a discussion.) That’s unfortunate, because an Excel file can be a very good place in which to build regression models, compare and refine them, create…
Added by Robert Nau on July 21, 2019 at 7:00am
Summary: Finally, there are tools that let us transcend ‘correlation is not causation’ and identify true causal factors and their relative strengths in our models. This is what prescriptive analytics was meant to be.
Just when I thought we’d figured it all out, something comes along to make…
BigQuery is Google’s serverless, highly scalable, enterprise data warehouse designed to make all your data analysts productive at an unmatched price-performance. Because there is no infrastructure to manage, you can focus on analyzing data to find meaningful insights using familiar SQL without the need for a database administrator.
Analyze all your data by…
Added by satyajit maitra on March 22, 2019 at 3:49am
In a previous blog post we saw how signal processing techniques can be used to classify time series and signals.
In short: we can use the Fourier transform to convert a signal from the time domain to the frequency domain. The peaks in the frequency spectrum indicate the most…
Added by Ahmet Taspinar on December 20, 2018 at 9:30pm
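To make the Fourier-transform summary concrete, here is a minimal sketch that finds the dominant frequencies of a synthetic signal with a naive discrete Fourier transform written in pure Python. The signal (a 5 Hz and a 12 Hz sine wave sampled at 64 Hz for one second) and the function names `dft_magnitudes` and `dominant_frequencies` are illustrative assumptions, not code from the original post; in practice you would use an FFT such as `numpy.fft.rfft` rather than this O(N²) loop.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(N^2) discrete Fourier transform (illustrative only).

    Returns |X[k]| for k = 0 .. N/2, i.e. the non-negative frequency bins.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        mags.append(abs(total))
    return mags


def dominant_frequencies(samples, sample_rate, top=2):
    """Return the frequencies (in Hz) of the `top` largest spectral peaks."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    # Skip bin 0 (the DC offset); bin k corresponds to k * sample_rate / n Hz.
    ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return [k * sample_rate / n for k in ranked[:top]]


if __name__ == "__main__":
    sample_rate = 64  # samples per second (assumed for this example)
    signal = [math.sin(2 * math.pi * 5 * t / sample_rate)
              + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
              for t in range(sample_rate)]
    print(dominant_frequencies(signal, sample_rate))  # [5.0, 12.0]
```

The two tallest peaks in the spectrum sit exactly at the frequencies of the sine waves that make up the signal, and their heights (32 and 16) are proportional to the waves’ amplitudes (1.0 and 0.5). Those peak positions and heights are the kind of frequency-domain features the excerpt describes using for classification.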
By Gunnar Carlsson
December 3, 2018
Added by Jonathan Symonds on December 4, 2018 at 3:00pm
Python has a library called threading that uses threads (rather than just processes) to run tasks concurrently. This may be surprising news if you know about Python’s Global Interpreter Lock, or GIL, but threading actually works well for certain workloads, such as I/O-bound tasks, without violating the GIL. And it takes very little setup: simply define…
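A minimal sketch of the I/O-bound case, assuming the waiting is simulated with `time.sleep` (which releases the GIL while it sleeps). The task function `fake_download` and the 0.2-second delay are hypothetical stand-ins for real network or disk I/O, not code from the original post.

```python
import threading
import time


def fake_download(item, results, lock):
    """Hypothetical I/O-bound task: time.sleep releases the GIL while waiting."""
    time.sleep(0.2)
    with lock:  # guard the shared dict while writing
        results[item] = f"done:{item}"


def run_threaded(items):
    """Start one thread per item, wait for all of them, return the results."""
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=fake_download, args=(item, results, lock))
               for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every thread has finished
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    out = run_threaded(range(5))
    elapsed = time.perf_counter() - start
    # The five sleeps overlap, so this should take roughly 0.2 s rather than 1.0 s.
    print(len(out), f"{elapsed:.2f}s")
```

Because each thread spends its time waiting rather than executing Python bytecode, the GIL is free for the other threads and the waits overlap. For CPU-bound work the same pattern gives little or no speed-up, which is why the excerpt qualifies its claim to “certain instances.”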
This is an analysis of the Kaggle 2018 survey dataset. In it I try to understand the similarities and differences between men and women respondents from the US and India, since these are the two largest segments of the respondent population. The number of respondents who chose something other than Male/Female is quite low, so I excluded that subset as well.
The complete code is available as a …
What do you do before purchasing something that costs more than a pack of gum? Whether you want to treat yourself to new sneakers, a laptop, or an overseas tour, placing an order without checking out similar products or offers and reading reviews doesn’t make much sense anymore. Thanks to comment sections on eCommerce sites, social networks, review platforms, and dedicated forums, you can learn a ton about a product or service and evaluate whether it’s good value for money. Other customers,…
Added by Kateryna Lytvynova on October 30, 2018 at 12:45am
Added by Matthew Gierc on October 4, 2018 at 7:00am
In this article we discuss the popularity of software programs used for data analysis, based on how often they are mentioned in reviews published online between 2017 and 2018. We used 14 reviews listed in the article Popularity of software programs for data…
Added by jwork.ORG on September 6, 2018 at 6:00pm
Deep neural nets typically operate on “raw data” of some kind, such as images, text, time series, etc., without the benefit of “derived” features. The idea is that because of their flexibility, neural networks can learn the features relevant to the problem at hand, be it a classification problem or an estimation problem. Whether derived or learned, features are important. The challenge is in determining how one might use what one learned from the features in future work (staying…
Added by Jonathan Symonds on August 30, 2018 at 7:00am
In my earlier post I discussed how performing topological data analysis on the weights learned by convolutional neural nets (CNNs) can give insight into what is being learned and how it is being learned.
The significance of this work can be summarized as follows:
Added by Jonathan Symonds on August 9, 2018 at 11:30am
A smoothly running sensor data analytics tool may be just as difficult to manage as a symphony orchestra, because every musician in an orchestra, and every part of an IoT system, needs to work properly and ‘harmonize’ with the others. But how do conductors make their orchestras work so well and sound so heavenly instead of producing a mismanaged cacophony? Obviously, there’s a lot of practice involved. But besides that, they know exactly which pitfalls to avoid. Which is why,…
Added by imranali on July 7, 2018 at 4:30am
The main components of systems theory that readers might remember are “inputs,” “processes,” and “outputs.” The part that tends to get neglected is “feedback mechanisms.” These mechanisms tell the system the extent to which its operations fit the environment. If there is a lack of fit, there is stress. One adaptive impulse is to make processes more complex and intelligent, which is sometimes described as the fight response. Another impulse is to give up and run away, i.e. the flight…
TLDR: Neural Networks are powerful but complex and opaque tools. Using Topological Data Analysis, we can describe the functioning and learning of a convolutional neural network in a compact and understandable way. The implications of the finding are profound and can accelerate the development of a wide range of applications from self-driving everything to GDPR.
Neural networks have demonstrated a great…
Added by Jonathan Symonds on June 21, 2018 at 9:30am
R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises.
Learn the fundamentals of data analysis in the second edition of Data Analysis with R, authored by data scientist…
Added by Packt Publishing on May 8, 2018 at 10:30pm
Performance metrics sometimes send contradictory signals. For instance, although both are desirable, efficiency and efficacy are often in opposition. An agent in a call centre can handle lots of calls while making few sales; this is especially true if the agent’s main objective is to make lots of calls. Such an agent is highly efficient, albeit unsuccessful at expanding the business. Conversely, another agent, by spending a…
Added by Don Philip Faithful on May 6, 2018 at 3:30am
Cambridge Analytica’s wholesale scraping of Facebook user data is big news now, and people are “shocked” that personal data is being shared and traded on a massive scale on the internet. But the real issue with social media is not harm to individual users whose information was shared, but sophisticated and sometimes subtle mass manipulation of social and political behavior by bad actors, facilitated by deceit, fraud, and amplification of lies that spread easily through societal…