This compilation has everything you need to jumpstart your skills in the core tasks of data transformation, modeling, and visualization.
tl;dr: Coursera and Johns Hopkins have a new course called The Data Scientist's Toolbox. https://www.coursera.org/course/datascitoolbox
Below is a list of popular analyses from Rexer's 2013 survey. The table is biased towards customer transaction, text,…
Added by Peter Higdon on March 25, 2014 at 9:01am — No Comments
The Data Scientists Salary Survey shows that industry data scientists are in a sweet spot, especially in the US, Canada, and Australia, with an average salary of $135K. Salaries for European and Asian data scientists are significantly lower.
Added by Vincent Granville on March 25, 2014 at 8:30am — No Comments
Added by Vincent Granville on March 21, 2014 at 10:00am — No Comments
In connection with our proposed methodology to create a black-box, automated, easy-to-interpret, sample-based, robust technique called jackknife regression, to be used in small and big data environments by non-statisticians, we offer an award and massive promotion to the successful candidate who
Added by Vincent Granville on March 20, 2014 at 7:30am — No Comments
This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with highly correlated independent variables.…
This article describes methods for machine learning using bootstrap samples and parallel processing to model very large volumes of data in short periods of time. The R programming language includes many packages for machine learning on different types of data. Three of these packages implement Support Vector Machines (SVM), Generalized Linear Models (GLM), and Adaptive Boosting (AdaBoost). While all three packages can be highly accurate for…
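To make the idea of bootstrap samples plus parallel processing concrete, here is a minimal sketch in Python rather than R. It is not the article's method: a plain least-squares line stands in for SVM, GLM, or AdaBoost, threads stand in for whatever parallel backend the article uses, and the names `fit_line`, `bootstrap_fit`, `bagged_line`, and `make_toy_data` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def make_toy_data(n=200, seed=0):
    """Synthetic (x, y) pairs from y = 2 + 3x + noise; purely illustrative."""
    rng = random.Random(seed)
    return [(float(x), 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)) for x in range(n)]


def fit_line(sample):
    """Least-squares fit of y = a + b*x on a list of (x, y) pairs."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_fit(data, seed):
    """Draw one bootstrap sample (with replacement) and fit a line to it."""
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return fit_line(sample)


def bagged_line(data, n_models=50, workers=4):
    """Fit n_models bootstrap models concurrently and average their coefficients."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fits = list(pool.map(lambda s: bootstrap_fit(data, s), range(n_models)))
    return mean(f[0] for f in fits), mean(f[1] for f in fits)


if __name__ == "__main__":
    intercept, slope = bagged_line(make_toy_data())
    print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

Each bootstrap fit is independent of the others, which is what makes the workload easy to spread across workers. For CPU-bound models, a process pool or a cluster would be the more realistic choice than threads.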
This permanent experimental design setting allows you to learn, participate, or check out the results at any time, as data is gathered and reported in real time. This article illustrates a few concepts:
Added by Vincent Granville on March 18, 2014 at 12:00pm — No Comments
Added by Vincent Granville on March 17, 2014 at 9:00am — No Comments
Here are some of my favorite things about big data and data science, from A to Z (actually, ZZ):
C – Characterization
Are you looking for an exciting career opportunity that pays as well as it is desirable? Harvard Business Review calls data scientist the sexiest job of the 21st century. The term "data scientist" was coined when two people, DJ Patil and Jeff Hammerbacher, were trying to name their data team working on big data and did not want to limit their…
Added by Vincent Granville on March 13, 2014 at 5:30pm — No Comments
Update: The most recent article on this topic can be found here.
All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or…
I received a call from an old client who stated that his analytics team had a recent string of failures that was alarming the firm and costing money. He asked me to review and audit the team's work and analytical processes in an attempt to understand and remedy the failures. The data crunching technology was…
Added by Michael Walker on March 12, 2014 at 9:00pm — No Comments
Stack Exchange data dump
This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression. Each site archive includes Posts, Users, Votes, Comments, PostHistory and PostLinks. For complete schema information, see the included readme.txt.…
Added by Vincent Granville on March 12, 2014 at 2:30pm — No Comments
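For the Stack Exchange data dump described above, a minimal Python sketch of streaming through one extracted XML file might look like the following. The layout used here (a root element holding `row` elements whose fields are XML attributes, with attribute names such as `PostTypeId`) is an assumption for illustration, as are `SAMPLE_POSTS` and `count_by_attribute`; the readme.txt in the archive is the authoritative schema. Extraction from the 7-zip archive is not covered.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical three-row stand-in for an extracted Posts file.
# Element and attribute names are assumptions; check readme.txt.
SAMPLE_POSTS = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" />
  <row Id="2" PostTypeId="2" Score="3" />
  <row Id="3" PostTypeId="2" Score="1" />
</posts>"""


def count_by_attribute(source, attribute):
    """Count rows by the value of one attribute, reading the XML incrementally.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get(attribute)] += 1
            elem.clear()  # drop the row's attributes once counted
    return counts


if __name__ == "__main__":
    print(count_by_attribute(io.BytesIO(SAMPLE_POSTS), "PostTypeId"))
```

Incremental parsing is used because a full site archive can be far larger than this sample, and loading the whole tree at once may not fit in memory.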
Here's my list:
Added by Vincent Granville on March 12, 2014 at 9:00am — No Comments
According to a report from the Ministry of Higher Education and Scientific Research of the UAE, polls published for 2013 by the American website “Mashable” stated that three out of four people in the United Arab Emirates own a smartphone, making the country rank first globally in smartphone use. Saudi Arabia ranked third, while Britain ranked ninth in the world. Surprisingly, with only 56.4% of penetration of the smart…
Added by IPSITA on March 12, 2014 at 4:21am — No Comments
The Big Data Yawn
Over the past couple of months we have met with a number of oil and gas executives to demonstrate our Oil and Gas Solution built on Data-Tactics’ Big Data Engine (BDE). During these conversations it has become obvious that the very mention of "Big Data" produces an involuntary physiological response among business leaders: eye rolls and yawns. It appears that big data has reached the Gartner "trough of disillusionment". These executives have heard from a bewildering…
Added by Sullexis LLC on March 10, 2014 at 6:00am — No Comments
In-memory database technology has become fashionable in recent years as the price of RAM has dropped substantially and gigabyte chips have become affordable. By taking advantage of the cost-performance value of RAM, leading-edge database developers are boosting the performance of next-generation databases with in-memory technology. However, many developers who intend to adopt in-memory technology think of speed only in terms of RAM, and do not exploit the true power of in-memory technology.