Inspired by the development of semantic technologies in recent years, the traditional methodology of designing, publishing and consuming statistical datasets in the field of statistical analysis is evolving into so-called “Linked Statistical Data”, which associates semantics with dimensions, attributes and observation values based on Linked Data design principles.
The representation of datasets is no longer a combination of magic words and numbers. Everything is becoming…Continue
Added by Andreas Blumauer on September 9, 2015 at 11:23pm — No Comments
R has become a massively popular language for data mining and predictive model building with over two million users worldwide. The wide adoption of R has to do with the fact that it is available as open source, runs on most technology platforms and is commonly taught in academic institutions in courses with significant components of data science, machine learning and statistics. A recent study found that R is now cited in academic papers more often than SAS and SPSS, a change from previous…Continue
How Is AsterR Used in the Data Discovery Process?
AsterR is a Teradata-produced package installed within the R client application. This package is distinct from, but complements, the installation of R within Aster. Together the AsterR package and the R installation into Aster create a rich environment that provides the R user with the normal look and feel of R while maintaining the power and speed of Aster. There is a great deal of…Continue
Added by John Thuma on June 22, 2015 at 6:00am — No Comments
In this post, I will cover in-depth a Big Data use case: monitoring and forecasting air pollution.
A typical Big Data use case in the modern Enterprise includes the collection and storage of sensor data, executing data analytics at scale, generating forecasts, creating visualization portals, and automatically raising alerts in the case of abnormal deviations or threshold breaches.
This article will focus on an implemented use case: monitoring and analyzing air quality…Continue
Added by Axibase Corp on June 2, 2015 at 6:00am — No Comments
Given below is a list of R functions for quickly exploring the key attributes of the data set. The data set is based on car prices & insurance…Continue
Buzz words are one of my least favorite things, but as buzz words go, I can appreciate the term “Data Lake.” It is one of the few buzz words that communicates a meaning very close to its intended definition. As you might imagine, with the advent of large scale data processing, there would be a need to name the location where lots of data resides, ergo, data lake. I personally prefer to call it a series of redundant commodity servers with Direct-Attached Storage, or hyperscale computing with…Continue
We are all very fortunate to be alive during this exciting time in history. Some truly disruptive technologies are on the verge of exploding into reality and it is difficult to imagine what the future holds. With these new technologies, however, we must not ignore the technically sound practices that allowed us to reach this point – managing data integrity is one of those practices.
As promised in my last post, I will discuss the importance of data integrity in the…
Added by Randall Shane on May 2, 2015 at 4:30pm — No Comments
Machine learning algorithms are parameterized so that they can be best adapted for a given problem. A difficulty is that configuring an algorithm for a given problem can be a project in and of itself.
As with selecting ‘the best’ algorithm for a problem, you cannot know beforehand which algorithm parameters will be best for a problem. The best thing to do is to investigate empirically with controlled experiments.
The caret R package was designed to make finding…Continue
Added by Diego Marinho de Oliveira on April 7, 2015 at 6:41am — No Comments
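To make the idea of "investigating empirically with controlled experiments" concrete, here is a minimal sketch of a grid search scored by k-fold cross-validation. It is written in plain Python rather than with the caret package the post describes, and everything in it (the toy k-nearest-neighbours classifier, the synthetic two-cluster data, the function names, and the candidate values) is an assumption made up for illustration.

```python
import random
from statistics import mean


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, _, label in nearest)
    return 1 if votes * 2 > k else 0


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy of k-NN over `folds` held-out splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every row whose index % folds == f
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, row, k) == row[2] for row in test)
        scores.append(correct / len(test))
    return mean(scores)


def grid_search(data, candidates, folds=5):
    """Score every candidate value of k under identical splits; return the best."""
    results = {k: cross_val_accuracy(data, k, folds) for k in candidates}
    best = max(results, key=results.get)
    return best, results


def make_data(n=100, seed=0):
    """Hypothetical data: two Gaussian clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label
        data.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0), label))
    return data


data = make_data()
best_k, results = grid_search(data, candidates=[1, 3, 5, 7, 9])
print(best_k, results)
```

Every candidate is evaluated on the same folds, so the only thing that varies between runs is the parameter under test, which is what makes the comparison a controlled experiment.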
I have a question about whether to learn R from scratch or to leverage my basic Python knowledge to extend into data science with scikit, NumPy and pandas. So I am a bit confused... I am not shy about learning a new programming language like R, but I really need to know which one has the edge in the market. Maybe I should learn R along with Python, so your valuable opinion matters.
Also, I am playing around with IBM's MessageSight product for the Internet of Things so…Continue
I've been writing a Tableau and Alteryx-focused blog for 1.5 years on Wordpress and haven't thought of writing anything here on DSC. I just completed a two-part series that discusses solving problems using innovative approaches with Alteryx and Tableau, which were my 99th and 100th blog posts. They are longer than usual but offer a good insight into my background and why I write a technical blog.
My blog is focused…Continue
From episode 10 of my Naked Analyst Channel on YouTube.
I think I do - and it is the ‘appification’ of analytics. What I mean by this is the reduction of a complex analytic activity such as market segmentation, down to a single button on your computer interface. Very much like the…Continue
Graphs are everywhere, used by everyone, for everything. Neo4j is one of the most popular graph databases and can be used to make recommendations, get social, find paths, uncover fraud, manage networks, and so on. A graph database can store any kind of data using Nodes (graph data records), Relationships (connections between nodes), and Properties (named data values).
A graph database can handle connected data in ways that are not possible with either relational or other NoSQL databases…Continue
Added by Raghavan Madabusi on September 19, 2014 at 6:31pm — No Comments
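The Nodes, Relationships and Properties model from the graph database excerpt above can be sketched in a few lines of Python. This is an in-memory illustration only, not Neo4j's API: the `PropertyGraph` class, its method names, and the sample people are all hypothetical.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: nodes and relationships both carry named values."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = {}  # node id -> list of (relationship type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_relationship(self, source, rel_type, target, **properties):
        self.edges.setdefault(source, []).append((rel_type, target, properties))

    def shortest_path(self, start, goal):
        """Breadth-first search that follows relationships in their stored direction."""
        previous = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for _, target, _ in self.edges.get(current, []):
                if target not in previous:
                    previous[target] = current
                    queue.append(target)
        return None  # no directed path exists


graph = PropertyGraph()
graph.add_node("alice", label="Person", name="Alice")
graph.add_node("bob", label="Person", name="Bob")
graph.add_node("carol", label="Person", name="Carol")
graph.add_relationship("alice", "KNOWS", "bob", since=2012)
graph.add_relationship("bob", "KNOWS", "carol", since=2014)

path = graph.shortest_path("alice", "carol")
print(path)
```

Path finding here is a walk over stored relationships rather than a series of table joins, which is the kind of connected-data query the excerpt has in mind.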
This article provides a full demo application using both the C# and R programming languages interchangeably to rapidly identify and cluster similar images. The demo application includes a directory with 687 screenshots of webpages. Many of these images are very similar with different domain names but near identical content. Some images are only slightly similar with the sites using the same general layouts but different colors and different images on certain…
Added by Jake Drew Ph.D. on June 25, 2014 at 4:00pm — No Comments
I was reading through my Twitter feed the other day and saw a comment about the R language being too ad hoc for users. It got me thinking, "Is that bad? Aren't most languages initially seen as ad hoc?".
The beauty of R as a data science tool is its "ad hocedness" in that its use can satisfy multiple interests. Initially, I can see this as troublesome in that learning the specificity of a tool's use can be daunting. But in the long run I think this benefits a…Continue
Added by Justin on May 15, 2014 at 5:04pm — No Comments
I recently added two new data analytics books from Pearson to my growing Data Science and Big Data stack:Continue
Added by Kirk Borne on March 29, 2014 at 11:15am — No Comments
Hey Data Scientists,
I wanted to reach out about Plot.ly, a new startup for analyzing and beautifully visualizing data. We just launched a beta.
It is built for math, science, and data applications. We'd love your thoughts.
Added by Matthew Sundquist on November 9, 2013 at 10:40pm — No Comments
One of the most popular methods or frameworks used by data scientists at the Rose Data Science Professional Practice Group is Random Forests. The…Continue
Statistics.com, a provider of online education in statistics and analytics, announces a partnership with CrowdANALYTIX, a predictive modeling “managed crowdsourcing” company, offering a new online course, “Applied Predictive Analytics in partnership with CrowdANALYTIX”, which will run from Oct. 11 to Nov. 8, 2013.
The goal of this course is to teach users (who have basic knowledge of R programming, predictive analytics and statistics)…Continue
Added by Janet Dobbins on September 11, 2013 at 6:58am — No Comments
Bob Muenchen's very useful work on this topic, “SAS Dominates Analytics Job Market; R up 42%”, sent me back to some 2012 work we did at Statistics.com on the subject of what employers are looking for in the way of analytics skills. First, our main results:
1. Our numbers showed a much less SAS-dominant world: 1.92 SAS jobs for every R job. Bob had found the ratio to…Continue
I found it odd there was no way to automatically deskew data in R, so I wrote a short little function to do it. It noticeably improves the performance of linear models and linear support vector machines.
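The author's R function is not shown in this excerpt, so the following is only a guess at the general shape of such a helper, sketched in Python. It assumes one common approach: measure the skewness of a column and apply a shifted log transform when the column is strongly right-skewed. The function names, the threshold of 1.0, and the sample values are all hypothetical, and left-skewed data is deliberately left untouched.

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def deskew(values, threshold=1.0):
    """Log-transform a strongly right-skewed column; return other columns unchanged."""
    if skewness(values) <= threshold:
        return list(values)
    shift = min(values)  # shift so the smallest value maps to log1p(0) = 0
    return [math.log1p(v - shift) for v in values]


incomes = [20, 22, 25, 27, 30, 31, 35, 40, 250, 900]  # made-up, long right tail
transformed = deskew(incomes)

print(round(skewness(incomes), 2), round(skewness(transformed), 2))
```

On this toy column the transform pulls the two large values toward the rest, which lowers the skewness; whether that helps a particular linear model still has to be checked on real data.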