A decade back, when ‘analytics’ was more an esoteric buzzword than an organizational necessity, leaders of fledgling analytics divisions focused on one key goal: ‘growth’.
Today, with increasing acceptance of analytics across sectors and the expected explosive growth in analytics software and…
Added by Debleena Roy on August 31, 2014 at 5:30pm
This blog is about the peculiar way in which software sometimes gets developed. I hope that many readers will recognize the relevance of data science in the examples taken from my own projects. I propose that development is the product of creativity more than accreditation. Creativity is complicated: it interacts with each person through his or her life circumstances. Many people know how to write . . . sentences and paragraphs. However, the ability to write well does not necessarily…
Added by Don Philip Faithful on August 30, 2014 at 8:59am
D3.js is an awesome library for visualization. However, it requires a developer to perform the magic. So far, it has not been so popular with traditional data analysts.
At vida.io, we set out to bring D3.js to less technically savvy users. We want to enable them to create amazing data visualizations with D3.js. Our approach is based on templates.
What is a D3.js Template?
It’s a D3.js visualization that you can reuse by just plugging in new data. For example, one of our…
In this article, I further emphasize the difference between data scientists and other analytic practitioners. I wrote last week that statisticians are to data scientists what astronomers are to physicists: there's some overlap, but less than most people think. Here, I elaborate on this theme.
Other disciplines such as data mining and machine learning…
Added by Vincent Granville on August 29, 2014 at 8:30am
Here is a non-exhaustive list of curious problems that could greatly benefit from data analysis. If you think you can't get a job as a data scientist (because you only apply to jobs at Facebook, LinkedIn, Twitter or Apple), here's a way to find or create new jobs, broaden your horizons, and make Earth a better place, not just for human beings but for all living creatures, and indeed even beyond Earth. Help us grow this list of 33 problems to 100+.
The actual number is higher than 33, as…
The success of any big data or data science initiative is determined by the kind of data that you collect and how you analyze it. In this article, we describe a simple criterion to select great metrics out of dozens, hundreds or even millions of potential predictors, sometimes called features or rules by machine learning professionals, or independent variables by statisticians.…
Added by Vincent Granville on August 26, 2014 at 3:30pm
*Note: this was originally posted on Leada's blog.
When we first began working on Leada, we sought to better understand the data science industry by interviewing professionals in the field. As students simply wanting to learn more about data science, we ultimately created a free resource to inform both undergraduates and professionals about the data science industry. We accomplished this by…
Added by Brian Liou on August 26, 2014 at 9:00am
Businesses have always faced security threats, whether it is someone breaking in to steal cash or equipment, a disgruntled employee selling company secrets, or something else altogether. There has always been cause for business owners to be careful.
However, as technology has evolved and the digital age has come into full swing, security threats have evolved as well, and…
Added by Beau Winchester on August 25, 2014 at 11:51am
In order to write a tutorial about classification, it was necessary to find an example that was broad enough that it would need to be sub-divided. Since I actually care about whether you remember this stuff, it needed to be something that a lot of people like and would relate to. And since I have a lot of international subscribers, it needed to be cross-cultural as well. So what is universal, cross-cultural, and dearly loved?
There’s American beer,…
This rudimentary statistics textbook, entitled Statistics: The Art and Science of Learning from Data (3rd Edition), sells on Amazon for $157.79. I am not sure whether everyone sees the same price as I do (maybe prices are user-customized) or whether the price changes over time, but it seems stable. Below is a screenshot.
Surprisingly, this book is meant for first-year…
Zipf's law states that in many settings (explored below), the volume or size of entities is inversely proportional to a power s (s > 0) of their ranking. This has important implications in predictive modeling, discussed below. The processes that create this type of dynamic are not well understood, and the purpose of this article is to explain the underlying mechanics. The traditional example for the Zipf distribution is the distribution of Internet…
Added by Ryan Montano on August 21, 2014 at 9:30am
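As a minimal sketch of the rank-size relation in the Zipf's law teaser above, size(k) = c / k^s for the entity at rank k, the Python below generates ideal Zipf sizes and then recovers s with an ordinary least-squares fit of log(size) against log(rank). The constant c, the function names `zipf_sizes` and `estimate_exponent`, and the choice of a log-log fit are all illustrative assumptions, not the article's own method; on noisy real-world data this fit is a quick heuristic rather than a rigorous estimator.

```python
import math


def zipf_sizes(n, s, c=1.0):
    """Ideal Zipf sizes: the entity at rank k (k = 1..n) has size c / k**s."""
    return [c / k**s for k in range(1, n + 1)]


def estimate_exponent(sizes):
    """Estimate s by least squares on log(size) versus log(rank).

    Assumes `sizes` is sorted in decreasing order (rank 1 = largest),
    has at least two entries, and contains only positive values.
    """
    xs = [math.log(k) for k in range(1, len(sizes) + 1)]
    ys = [math.log(v) for v in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # log(size) = log(c) - s * log(rank), so the fitted slope is -s
    return -cov / var


sizes = zipf_sizes(1000, 1.2)
print(estimate_exponent(sizes))  # close to 1.2 for these noise-free sizes
```

Because log(c / k^s) = log(c) - s log(k) is exactly linear in log(k), the fit recovers s up to floating-point rounding here, whatever value of c is used.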
Over the years, our…
Added by Vincent Granville on August 20, 2014 at 5:30pm
Spencer Greenberg holds a B.S. magna cum laude in Applied Mathematics & Computer Science from Columbia University and a Ph.D. in Machine Learning from NYU. Prior to Rebellion Research, he was a software developer at Neuberger Berman, LLC and an engineer at The Investigative Project for Terrorism. Spencer has been interviewed on CNBC, Bloomberg News, Canada’s BNN, and in the Wall Street Journal. He has also lectured at Columbia…
Added by Vincent Granville on August 20, 2014 at 5:00pm
Summary: Gartner says that predictive analytics is a mature technology, yet only one company in eight currently uses this ability to predict the future of sales, finance, production, and virtually every other area of the enterprise. What’s holding them back?
In an earlier post, we argued that much of what is holding companies back from…
This article presents various ways of measuring the popularity or market share of software for analytics, including: Alpine, Alteryx, Angoss, C / C++ / C#, BMDP, FICO, IBM SPSS Statistics, IBM SPSS Modeler, InfoCentricity Xeno, Java, JMP, KNIME, Lavastorm, MathWorks’ MATLAB, Megaputer’s PolyAnalyst, Minitab, NCSS, Python, R, RapidMiner, SAS, SAS Enterprise Miner, Salford Predictive Modeler (SPM) etc., SAP KXEN, TIBCO Spotfire, Stata, Statistica, Systat, WEKA / Pentaho. …
Many years ago, I attended a vocational college to learn a skilled trade. I was taught about the behaviour of systems. I learned that after renovations to a house, the furnace might cycle on and off more frequently, which can leave some parts of the house too cold. A wood-burning stove or fireplace should be treated as part of a system: open doors and windows in the dwelling can cause exhaust from such appliances to enter living spaces. I realize that these particular examples of systems…
Added by Don Philip Faithful on August 16, 2014 at 8:31am
Here is an interesting comparison table, with comments, covering the following statistical packages: R, MATLAB, SAS, STATA and SPSS. I wish Statistica had been included. The table tells you which statistical methods are available in each package, and the list of statistical methods is itself impressive. Note that Jackknife (a resampling method in the table below) has nothing to do with …
Added by Debleena Roy on August 13, 2014 at 8:11pm
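The comparison-table teaser above cuts off before saying what the jackknife is being contrasted with, so the following is only a generic illustration of the resampling method itself: a minimal Python sketch of the standard leave-one-out jackknife estimate of a statistic's standard error. The function name `jackknife_se` is hypothetical and does not come from any of the packages in the table; `statistic` is assumed to be any function that maps a list of numbers to a single number.

```python
import math
from statistics import mean


def jackknife_se(data, statistic):
    """Leave-one-out jackknife standard error of `statistic` on `data`.

    `data` is assumed to be a list with at least two observations.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic on each sample that drops exactly one observation
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    avg = sum(replicates) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations
    var = (n - 1) / n * sum((r - avg) ** 2 for r in replicates)
    return math.sqrt(var)


print(jackknife_se([1, 2, 3, 4, 5], mean))  # about 0.7071
```

For the sample mean, the jackknife standard error coincides with the usual s/√n: for [1, 2, 3, 4, 5] both equal √0.5 ≈ 0.707. The jackknife becomes more useful for statistics without a simple closed-form standard error, such as a median or a ratio.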