Interesting infographics from CrowdFlower. In the hot category, I would add data plumbing, sensor data to better predict earthquakes, weather or solar flares, predictive analytics for flu and other health or environmental issues, automating data science and man-made statistical analyses, pricing optimization for medical procedures, customized drugs, car traffic optimization via sensor data,…Continue
An article by Vincent Granville posted to Hadoop360 introduces a formal method to generalize the notion of variance based on L^p norms. Whereas the formal generalization suggested in the article did meet several desired criteria, it left other desirable criteria unmet. In particular, there was no formal connection between the generalized variance and an associated generalized mean, and there was…Continue
Linear regression is arguably one of the most widely used techniques in the data science world. But a comprehensive understanding of this technique is not universal, and it is at a level that is…Continue
Interactive Data Visualization or Visual Analytics
"A picture is worth a thousand words" or in the case of Data Science, we could say "A picture is worth a thousand statistics". Interactive Data Visualization or Visual Analytics has become one of the top trends in transforming business intelligence (BI) as technologies based on Visual Analytics have moved into widespread use.
Conventional Charts and Dashboards show conclusions but not the thinking behind them.…Continue
Added by Mark Sharma on December 30, 2014 at 8:49am — No Comments
This applies to data science research as well as any other analytic discipline. For centuries, scientific research was performed in academia, by university professors managing their own labs. Much of the research was carried out by young scientists who had just completed their PhDs. The selection process has always favored the same type of personality. The basic rule is "publish or perish", which produces the following drawbacks:
Most statisticians are great professionals, working on various data-intensive projects, and they don't care about their job title. You can say the same about data scientists, and me in particular. However, there is a small cluster of statisticians - Andrew Gelman seems to be their leader and their only influencer - who have been challenging us, even publicly insulting us recently.…Continue
To be more precise, this kind of attack would rely on business hacking, rather than computer hacking. Other attacks, some potentially so massive as to turn Google into the worst search engine, are described below.
The Sony attack
I believe that such an attack could be accomplished by an insider…Continue
When learning data science, a lot of people use sanitized datasets downloaded from somewhere on the internet, or the data provided as part of a class or book. This is all well and good, but working with “perfect” datasets that are ideally suited to the task prevents them from getting into the habit of checking data for completeness and accuracy.
Out in the real world, while working with data for an employer or client, you will undoubtedly run into issues with data that you…Continue
Every data scientist worth her salt will immediately notice that the biggest earthquakes (magnitude above 9) took place in the last 60 years or so.
Most journalists, and even some…Continue
Keeping your eye on your competitors is a vital strategy for helping your business grow. By watching what they're doing and looking at their successes and failures, you'll be able to keep a leg up and a competitive edge. That being said, we're going to look a little more in-depth into why you need to be incorporating competitive research into your SEO and digital marketing strategy, some metrics you should be looking at, and actionable results that you can look at to know that…Continue
Added by Robert Cordray on December 22, 2014 at 11:30am — No Comments
Given the right data, correctly collected and analyzed using sound predictive models, what can be predicted, and what can't be predicted no matter what?
I believe that I have an answer to this question. All systems and processes that rely on some energy source can be predicted, and the other way around. Note that energy…Continue
Before elaborating on my fruitless existence - about my decision to avoid fruit - I want to emphasize how this blog is actually about something that I call the "Fallacy of Rational Prerequisite." There will be some misunderstanding about this term even after my prolonged explanation. I just want to state plainly at the outset that I am not proposing that people become irrational. If they are already so, I am not suggesting that they further the situation.…Continue
Added by Don Philip Faithful on December 20, 2014 at 8:21am — No Comments
When you want to see the face of biased reporting in online news, you may not have to go further than the satirical news site The Onion. Titles such as “Media Reports of Bear Attacks May Be Biased”, “Weather Channel Accused of Pro-Weather Bias”, and “Media Criticized for Hometown Sports Reporting” can make us laugh, but they can…Continue
What does the typical data science project life cycle look like?
This post looks at practical aspects of implementing data science projects. It also assumes a certain level of maturity in big data (more on big data maturity models in the next post) and data science management within the organization. Therefore the life cycle presented here differs, sometimes significantly, from purist definitions of 'science' which emphasize the…Continue
On Tuesday 12/16, I attended Pivotal’s Top 10 Data Science Predictions in 2015 webinar.
The webcast was run by leaders from the Pivotal Data Science team – Annika Jimenez, Kaushik Das and Hulya Farinas – who shared their insights on the key Data Science industry trends for the coming year. The webcast came off as a bit scripted, but one could tell that these three individuals have a passion for the Data Science discipline and its future.
In this post, I’d like to take a…Continue
Added by Anthony Dutra on December 18, 2014 at 6:56am — No Comments
Guest blog post by Rohit Yadav, from BRIDGEi2i Analytics Solution
The Net (Part 1)
The plot goes something like this: Sandra Bullock plays computer expert Angela Bennett, whose life changes when she is sent a program with a crazy glitch to ‘de-bug’. Soon she finds some vital government information on the disk, things get nutty as a fruitcake, and her life becomes a nightmare as her records get erased and she is given the new identity of some chick with a…Continue
In my consulting work in the Enterprise IT space, I am seeing a definite trend of growing interest in Data Product/Advanced Analytics Design and Development, which is becoming increasingly mainstream. Even as I view this as a positive, it comes with its own set of perils and pitfalls that will need to be avoided.
Enterprise IT Application Development is often bureaucratic and involves multiple and redundant levels of management through the design, development and testing phases.…Continue
Added by Mark Sharma on December 16, 2014 at 8:30am — No Comments
The top tech companies by market capitalization are IBM, HP, Oracle, Microsoft, Cisco, SAP, EMC, Apple, Amazon and Google.
All of the top tech companies are selected based on their current market capitalization with the exception of Yahoo. The year 2014 is not included as part of this analysis.
Data: This data comes from the public financial records on SEC.gov
All the sales figures are normalized and reported in USD…Continue
The definition of 'best' depends on which school you follow. Data science and classic statistical science are at the opposite ends of the spectrum. So let's clarify what 'best solution' means in these two opposite contexts:
'Best', according to statistical science: