Past literature shows that comparisons of classifiers' performance are specific to the types of datasets used (e.g., pharmaceutical industry data); i.e., some classifiers may perform better in some contexts than others. A paper titled CDS Rate Construction Methods by Machine Learning Techniques conducts the performance comparison exclusively in the context of financial markets, applying a wide range of classifiers to provide a solution to the so-called Shortage of…
Added by Zhongmin Luo on May 23, 2017 at 1:30am — No Comments
Variable reduction is a crucial step for accelerating model building without losing the potential predictive power of the data. With the advent of Big Data and sophisticated data mining techniques, the number of variables encountered is often tremendous, making variable selection or dimension reduction techniques imperative to produce models with acceptable accuracy and generalization. The temptation to build an ecological model using all available information (i.e., all variables) is hard to…
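The excerpt cuts off, but the variable-reduction step it describes can be sketched as a simple two-stage filter: drop near-constant columns, then drop one of each highly correlated pair. This is a minimal illustration, not the post's actual method; the function name `reduce_variables` and its thresholds are hypothetical:

```python
import numpy as np

def reduce_variables(X, var_threshold=0.01, corr_threshold=0.95):
    """Drop near-constant columns, then one of each highly correlated pair."""
    X = np.asarray(X, dtype=float)
    # Stage 1: remove columns whose variance is below the threshold
    cols = np.where(np.var(X, axis=0) > var_threshold)[0]
    Xr = X[:, cols]
    # Stage 2: remove the later column of any pair whose |correlation|
    # exceeds the threshold (the earlier column is kept)
    corr = np.corrcoef(Xr, rowvar=False)
    drop = set()
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if j not in drop and abs(corr[i, j]) > corr_threshold:
                drop.add(j)
    return [c for k, c in enumerate(cols) if k not in drop]
```

On a matrix with a constant column and a duplicated (scaled) column, both are filtered out and only the informative original columns survive.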
Added by Valiance Solutions on April 21, 2017 at 9:20pm — No Comments
Multiple numeric columns in the data, and even more techniques at hand to analyse them: histograms, ANOVA, mean/median, contingency tables, scatter plots, variance… What to choose for exploratory or descriptive analytics?
Sounds a bit geeky! Let me simplify.
This is an everyday scenario faced by an analyst. There are too many numbers, and the challenge is to communicate the scenario to business folks, whether it's competitive analysis, internal sales analysis,…
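The choice the post hints at often reduces to a rule of thumb keyed on variable types. A hypothetical helper that encodes the common conventions (these rules are textbook defaults, not the author's):

```python
def suggest_technique(x_type, y_type=None):
    """Map variable types to a basic exploratory technique (rule of thumb)."""
    if y_type is None:
        # Single variable: distribution for numeric, counts for categorical
        return "histogram" if x_type == "numeric" else "frequency table"
    pair = {x_type, y_type}
    if pair == {"numeric"}:
        return "scatter plot"            # two numeric variables
    if pair == {"categorical"}:
        return "contingency table"       # two categorical variables
    return "group means / ANOVA"         # one numeric, one categorical
```

For example, a numeric column against a categorical one points to comparing group means (and ANOVA to test the difference).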
Added by saurabh ajmera on March 31, 2017 at 6:00am — No Comments
It is not only about understanding statistics; it is also about implementing the correct statistical approach or method. In this brief article I will showcase some common statistical blunders that we generally make and how to avoid them.
To make this information simple and consumable, I have divided these errors into two parts:
In the wake of ZMOT (Zero Moment of Truth), it becomes pivotal for any product company to choose the most appropriate advertisement channel for the promotion of its products. This not only helps organizations maximize their chances of creating the best first impression but also helps them be discovered by today's tech-savvy consumers.
Today we will talk about a very…
Added by Sunil Kappal on December 30, 2016 at 8:30am — No Comments
Anyone who has followed the last few presidential elections has seen how easily data can go from a hero to a zero. Nate Silver was a genius when his model and analysis successfully predicted Obama's electoral wins with great precision in 2008 and 2012. However, after the 2016 election, Mr. Silver and the value of data fell precipitously from grace in the eyes of political pundits and many voters.
Those of us who know better, well... know better. The problem wasn't with the data or…
With marketing and advertising gaining more space on the internet, big data analytics are playing a prominent role in following the trends on the market and providing users with key statistics.
Data analysis and statistics traditionally play an important role in analyzing the success of companies and brands in the market.
The growth of the internet…
Added by Diana Beyer on July 26, 2016 at 12:00am — No Comments
Added by Besim Ismaili on June 1, 2016 at 3:00am — No Comments
The City and County of San Francisco launched an official open data portal called SF OpenData in 2009 as a product of its official open data program, DataSF. The portal contains hundreds of city datasets for use by developers, analysts, residents and more. Under the category of Public Safety, the portal contains the list of SFPD incidents since Jan 1, 2003.
In this post I have done an exploratory time-series analysis on the crime incidents dataset to see…
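An exploratory time-series pass like the one described usually starts by bucketing incident timestamps into periods. A stdlib-only sketch, with made-up dates standing in for the SFPD data:

```python
from collections import Counter
from datetime import datetime

# Hypothetical stand-in for the SFPD incident timestamps
incidents = [
    "2015-01-03", "2015-01-17", "2015-02-05", "2015-02-06", "2015-02-20",
]

def monthly_counts(dates):
    """Aggregate ISO-formatted incident dates into (year, month) counts."""
    buckets = Counter()
    for d in dates:
        dt = datetime.strptime(d, "%Y-%m-%d")
        buckets[(dt.year, dt.month)] += 1
    return dict(sorted(buckets.items()))

# monthly_counts(incidents) → {(2015, 1): 2, (2015, 2): 3}
```

The resulting per-month series is what one would then plot or test for trend and seasonality.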
Added by Vimal Natarajan on May 30, 2016 at 7:42am — No Comments
Big data is the humongous amount of information, both structured and unstructured, that is generated by the everyday functioning of various sectors across all walks of life. The high volume of big data generated across the world has necessitated the development of technological tools that can make this information usable. These tools have taken many forms, from the previously used SQL (Structured Query Language) to advanced DBMS (database management systems). Hadoop is the newest and most…
Added by Ankit Jain on March 8, 2016 at 9:00pm — No Comments
Reading the academic literature, Text Analytics seems difficult. However, applying it in practice has shown us that Text Classification is much easier than it looks. Most of the classifiers consist of only a few lines of code. In this three-part blog series we will examine three well-known classifiers: Naive Bayes, Maximum Entropy and Support Vector Machines. From the…
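The "few lines of code" claim holds up for Naive Bayes in particular. A self-contained multinomial version with Laplace smoothing (a generic textbook implementation, not code from the series itself):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)       # label -> number of docs
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        total_docs = sum(self.label_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.label_counts:
            lp = math.log(self.label_counts[label] / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # add-one smoothing keeps unseen words from zeroing the score
                lp += math.log((self.word_counts[label][w] + 1)
                               / (total_words + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Trained on a handful of labelled sentences, it already separates positive from negative wording, which is the point the post makes about classifier simplicity.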
Added by Ahmet Taspinar on February 15, 2016 at 10:00pm — No Comments
For the past few years, the drumbeat of think pieces about automation taking your job (yes, your job) has gotten both louder and more incessant. Smart people like the folks at …
Recently, I came across an interesting book on statistics that retells the Ugly Duckling story and relates it to today's data, or rather big data analytics, world. The story is originally from the famous storyteller Hans Christian Andersen.
The story goes like this...
The duckling was a big ugly grey bird, so ugly that even a dog would not bite him. The poor duckling…
Added by Manish Bhoge on January 31, 2016 at 12:00pm — No Comments
Data Science is the discipline used to extract insights from data mined from various sources. Using various techniques including predictive modeling, Data Science helps to analyze and interpret vast amounts of data. The people who apply Data Science to manage large amounts of data are called Data Scientists. Let's see how Data Science correlates with the…
Added by Vaishnavi Agrawal on January 8, 2016 at 11:30pm — No Comments
This article describes the approach undertaken by data scientists at Axibase to calculate the Cloud Cover using satellite imagery from the Japanese Himawari 8 satellite.
Today, cloud cover is measured using automated weather stations, specifically the ceilometer and the sky imager instruments. The ceilometer is an upward pointed laser that calculates the time required for the laser to reflect from the clouds, determining the height of the cloud base.…
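The ceilometer's height calculation follows directly from the time of flight: the pulse covers the distance twice (up and back), so height equals speed of light times round-trip time, divided by two. A small sketch; the 10 µs example value is illustrative:

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def cloud_base_height(round_trip_seconds):
    """Height of the cloud base from the laser pulse's round-trip time.

    The pulse travels up to the cloud and back down, so the one-way
    distance is half of (speed * time).
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A pulse returning after ~10 microseconds implies a base near 1.5 km:
# cloud_base_height(10e-6) ≈ 1498.96 m
```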
Added by Axibase Corp on October 20, 2015 at 6:21am — No Comments
Forbes magazine has been publishing the list of The World's Most Powerful People since 2009. The number of people in the list is proportional to the global population, with the ratio being one slot for every 100 million people on Earth. When the list started in 2009, there were 67 people on the list, and the latest list, from 2014, had 72 people. According to…
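The one-slot-per-100-million rule is simple integer arithmetic; the population figures below are the approximate world totals implied by the two list sizes, not numbers from the post:

```python
def forbes_slots(world_population):
    """One list slot per 100 million people (the ratio stated in the post)."""
    return world_population // 100_000_000

# 67 slots in 2009 corresponds to a world population of roughly 6.7 billion,
# and 72 slots in 2014 to roughly 7.2 billion.
```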
This blog post was originally published as part of an ongoing series, "Popular Algorithms Explained in Simple English" on the AYLIEN Text Analysis Blog.
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23.
Abstract: Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error…
Added by Diego Marinho de Oliveira on April 10, 2015 at 12:21am — No Comments
From episode 10 of my Naked Analyst Channel on YouTube.
I think I do: it is the 'appification' of analytics. What I mean by this is the reduction of a complex analytic activity, such as market segmentation, down to a single button on your computer interface. Very much like the…