Featured Blog Posts – December 2014 Archive (40)

What’s Hot & What’s Not in Data Science 2015

Interesting infographics from CrowdFlower. In the hot category, I would add data plumbing, sensor data to better predict earthquakes, weather or solar flares, predictive analytics for flu and other health or environmental issues, automating data science and man-made statistical analyses, pricing optimization for medical procedures, customized drugs, car traffic optimization via sensor data,…


Added by Mirko Krivanek on December 31, 2014 at 7:30pm — 2 Comments

Rigorous Generalized L^p Variance

An article by Vincent Granville posted to Hadoop360 introduces a formal method to generalize the notion of variance based on L^p norms. Whereas the formal generalization suggested in the article did meet several desired criteria, it left other desirable criteria unmet. In particular, there was no formal connection between the generalized variance and an associated generalized mean, and there was…


Added by Bryan M. Gorman on December 31, 2014 at 12:40pm — 1 Comment

Understanding Linear Regression

Abstract: Although Linear Regression is arguably one of the most popular analytical techniques, I believe it isn’t understood well. Several fundamental assumptions are violated during application. The objective of this note is to provide an overview of the assumptions and possible fixes.

Linear regression is arguably one of the most widely used techniques in the data science world. But, a comprehensive understanding of this technique is not universal and it is at a level that is…


Added by Jeevan Kumar R on December 30, 2014 at 9:00am — 3 Comments

Interactive Visualization enabled Feature Selection and Model Creation

Interactive Data Visualization or Visual Analytics

"A picture is worth a thousand words" or in the case of Data Science, we could say "A picture is worth a thousand statistics". Interactive Data Visualization or Visual Analytics has become one of the top trends in transforming business intelligence (BI) as technologies based on Visual Analytics have moved into widespread use.

Conventional Charts and Dashboards show conclusions but not the thinking behind them.…


Added by Mark Sharma on December 30, 2014 at 8:49am — No Comments

New Model for Scientific Research

This applies to data science research as well as any other analytic discipline. For centuries, scientific research was performed in Academia, by university professors managing their own labs. Much of the research was carried out by young scientists who had just completed their PhDs. The selection process has always favored the same type of personality. The basic rule is "publish or perish", which produces the following drawbacks:

  • Re-use of old material (rather than brand new material)…

Added by Vincent Granville on December 29, 2014 at 7:00pm — 13 Comments

Some statisticians have a biased view on data science

Most statisticians are great professionals, working on various data-intensive projects, and they don't care about their job title. You can say the same about data scientists, and me in particular. However, there is a small cluster of statisticians - Andrew Gelman seems to be their leader and their only influencer - who have been challenging us, even publicly insulting us recently.…


Added by Vincent Granville on December 28, 2014 at 9:00pm — 10 Comments

Engineering a far worse attack than Sony, without hacking

To be more precise, this kind of attack would rely on business hacking, rather than computer hacking. Other attacks, some potentially as massive as to turn Google into the worst search engine, are described below.

The Sony attack

I believe that such an attack could be accomplished by an insider…


Added by Vincent Granville on December 28, 2014 at 4:00pm — 1 Comment

Common Problems with Data

When learning data science, a lot of people use sanitized datasets they downloaded from somewhere on the internet, or data provided as part of a class or book. This is all well and good, but working with “perfect” datasets that are ideally suited to the task prevents them from getting into the habit of checking data for completeness and accuracy.
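As a rough illustration of what a completeness and accuracy check might look like, here is a minimal sketch in Python. The function name `audit_records`, the field names, and the valid age range are all assumptions made up for this example; they do not come from the post.

```python
def audit_records(records, required_fields, numeric_ranges):
    """Return (row_index, field, problem) tuples for a list of dict records.

    required_fields: fields that must be present and non-blank (hypothetical).
    numeric_ranges: field -> (low, high) bounds considered plausible (hypothetical).
    """
    def is_blank(value):
        return value is None or (isinstance(value, str) and value.strip() == "")

    problems = []
    for i, row in enumerate(records):
        # Completeness: every required field must hold a non-blank value.
        for field in required_fields:
            if is_blank(row.get(field)):
                problems.append((i, field, "missing"))
        # Accuracy: numeric fields must parse and fall inside the given range.
        for field, (low, high) in numeric_ranges.items():
            value = row.get(field)
            if is_blank(value):
                continue  # blank values are handled by the completeness check
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append((i, field, "not numeric"))
                continue
            if not (low <= number <= high):
                problems.append((i, field, "out of range"))
    return problems


# Made-up sample data with one clean row and four flawed ones.
sample = [
    {"name": "Ann", "age": "34"},
    {"name": "", "age": "29"},
    {"name": "Bob", "age": "-5"},
    {"name": "Cy", "age": "abc"},
    {"name": "Di"},
]
issues = audit_records(sample, ["name", "age"], {"age": (0, 120)})
for issue in issues:
    print(issue)
```

Running a check like this before any modeling makes gaps and implausible values visible early, which is the habit that sanitized datasets never force you to build.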

Out in the real world, while working with data for an employer or client, you will undoubtedly run into issues with data that you…


Added by Randal Scott King on December 28, 2014 at 11:30am — 1 Comment

Data Science Meets Bubbly: What Data Says About Champagne Buying Patterns

Everyone loves champagne, right? But what strongly influences people’s behavior to purchase that bottle of bubbly? A growing body of research literature has found that a number of factors, including…


Added by Renette Youssef on December 24, 2014 at 3:00pm — 4 Comments

Are Earthquakes becoming more severe?

Every data scientist worth her salt will immediately notice that the biggest earthquakes (magnitude above 9) took place in the last 60 years or so.

Northridge Earthquake

Most journalists, and even some…


Added by Vincent Granville on December 23, 2014 at 11:30am — 1 Comment

Actionable Insights from Competitive Research

Keeping your eye on your competitors is a vital strategy for helping your business grow. By watching what they're doing and looking at their successes and failures, you'll be able to keep a leg up and a competitive edge. That being said, we're going to look a little more in-depth into why you need to be incorporating competitive research into your SEO and digital marketing strategy, some metrics you should be looking at, and actionable results that you can look at to know that…


Added by Robert Cordray on December 22, 2014 at 11:30am — No Comments

What can be predicted, and what can't?

Given the right data, correctly collected and analyzed using sound predictive models, what can be predicted, and what can't be predicted no matter what?

I believe that I have an answer to this question. All systems and processes that rely on some energy source can be predicted, and the other way around. Note that energy…


Added by Vincent Granville on December 21, 2014 at 8:00pm — 1 Comment

Fallacy of Rational Prerequisite & My Fruitless Existence

Before elaborating on my fruitless existence - about my decision to avoid fruit - I want to emphasize how this blog is actually about something that I call the "Fallacy of Rational Prerequisite." There will be some misunderstanding about this term even after my prolonged explanation. I just want to state plainly at the outset that I am not proposing that people become irrational. If they are already so, I am not suggesting that they further the situation.…


Added by Don Philip Faithful on December 20, 2014 at 8:21am — No Comments

Why Media Bias Has Nowhere to Run and Hide from Data Science

When you want to see the face of biased reporting in online news, you may not have to go further than the satirical news site The Onion. Titles such as “Media Reports of Bear Attacks May Be Biased”, “Weather Channel Accused of Pro-Weather Bias”, and “Media Criticized for Hometown Sports Reporting” can make us laugh, but they can…


Added by Renette Youssef on December 19, 2014 at 10:46am — 1 Comment

The data science project lifecycle

What does the typical data science project life cycle look like?

This post looks at practical aspects of implementing data science projects. It also assumes a certain level of maturity in big data (more on big data maturity models in the next post) and data science management within the organization. Therefore the life cycle presented here differs, sometimes significantly, from purist definitions of 'science' which emphasize the…


Added by Maloy Manna on December 18, 2014 at 2:39pm — 6 Comments

Key Takeaways: Pivotal’s Top 10 2015 Predictions

On Tuesday 12/16, I attended Pivotal’s Top 10 Data Science Predictions in 2015 webinar.

The webcast was run by leaders from the Pivotal Data Science team – Annika Jimenez, Kaushik Das and Hulya Farinas – who shared their insights on the key Data Science industry trends for the coming year. The webcast came off as a bit scripted, but one could tell that these three individuals have a passion for the Data Science discipline and its future.

In this post, I’d like to take a…


Added by Anthony Dutra on December 18, 2014 at 6:56am — No Comments

The Future of Big Data is Wearables

Guest blog post by Rohit Yadav, from BRIDGEi2i Analytics Solution

The Net (Part 1)

The plot goes something like this – Sandra Bullock plays computer expert Angela Bennett, whose life changes when she is sent a program with a crazy glitch to ‘de-bug’. Soon she finds some vital government information on the disk, things get nutty as a fruitcake, and her life becomes a nightmare as her records get erased and she is given the new identity of some chick with a…


Added by Vincent Granville on December 16, 2014 at 7:30pm — 2 Comments

Rules for building a Data Product in IT organizations

In my consulting work in the Enterprise IT space, I am seeing a definite trend of growing interest in Data Product/Advanced Analytics Design and Development, which is becoming increasingly mainstream. Even as I view this as a positive, it comes with its own set of perils and pitfalls that will need to be avoided.

Enterprise IT Application Development is often bureaucratic and involves multiple and redundant levels of management through the design, development and testing phases.…


Added by Mark Sharma on December 16, 2014 at 8:30am — No Comments

Data Visualization of Employee metrics at the top Tech companies

The top tech companies by market capitalization are IBM, HP, Oracle, Microsoft, Cisco, SAP, EMC, Apple, Amazon and Google.

All of the top tech companies are selected based on their current market capitalization with the exception of Yahoo. The year 2014 is not included as part of this analysis.


Data: This data comes from public financial records on SEC.gov.


All the sales figures are normalized and reported in USD…


Added by Nilesh Jethwa on December 15, 2014 at 11:01am — 8 Comments

Best solution to a problem: data science versus statistical paradigm

The definition of 'best' depends on which school you follow. Data science and classic statistical science are at the opposite ends of the spectrum. So let's clarify what 'best solution' means in these two opposite contexts:

'Best', according to statistical science:

  • It usually means the global maximum of a mathematical optimization problem
  • The objective function involved is usually a maximum likelihood function, KS, c-statistics, or some function…

Added by Vincent Granville on December 14, 2014 at 8:30pm — 8 Comments
