There are many good and sophisticated feature selection algorithms available in R. Feature selection refers to the machine learning case where we have a set of predictor variables for a given dependent variable, but we don’t know a priori which predictors are most important or whether the model can be improved by eliminating some of them. In linear regression, many students are taught to find the best model for a data set using so-called “least squares”. In most…Continue
Added by Blaine Bateman on April 30, 2018 at 7:30am — No Comments
Across the country, animal shelters work around the clock to help pets get rescued. For the most part, people assume animal adoption is mostly driven by intuition and emotion — a family comes in, falls in love, and then welcomes a new pet into their home.
As readers here are likely to suspect, if you have the data, you can see there’s more to the story. Fortunately, the Austin Animal Shelter has made their animal…Continue
Added by Adam Levenson on April 30, 2018 at 5:00am — No Comments
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.
Added by Vincent Granville on April 29, 2018 at 12:00pm — No Comments
Added by Serge Audenaert on April 29, 2018 at 11:00am — No Comments
Although a support vector machine model (binary classifier) is more commonly built by solving a quadratic programming problem in the dual space, it can also be built quickly by solving the primal optimization problem. In this article a Support Vector Machine implementation is going to be described by solving the primal optimization…Continue
Added by Sandipan Dey on April 28, 2018 at 3:30pm — No Comments
The R language is a free statistical computing environment; hence there are multiple ways and packages to achieve a particular statistical or quantitative output. I am going to discuss here a concise list of R packages that one can use for modeling financial risks and/or optimizing portfolios efficiently and effectively. The intended audience for this article is financial market analysts interested in using R, and also quantitatively inclined folks…Continue
Added by Ranjit Mishra on April 28, 2018 at 2:30am — No Comments
It's been a while since my last blog post, but I wanted to update everyone on a project that's been keeping me busy recently.
Last year Pearson Publishing commissioned me to write a book on big data / data science. They asked me to write it for executive audiences (non-technical), and to make it very practical for business use. Writing the book was quite a task, but it was a great exercise in summarizing the industry experience I'd gathered in my years leading data science teams…Continue
Added by David Stephenson on April 26, 2018 at 9:49am — No Comments
We illustrate pattern recognition techniques applied to an interesting mathematical problem: the representation of a number in non-conventional systems, generalizing the familiar base-2 or base-10 systems. The emphasis is on data science rather than mathematical theory, and the style is that of a tutorial, requiring minimal knowledge of mathematics or statistics. However, some off-the-beaten-path, state-of-the-art number theory research is discussed here, in a way that is accessible to…Continue
After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL tools and frameworks, but most of them require writing code.…Continue
Added by Luba Belokon on April 26, 2018 at 2:30am — No Comments
Added by Serge Audenaert on April 25, 2018 at 10:30pm — No Comments
With high profile security breaches like Equifax, the publicity over data security, as well as the cost, has only continued to grow. According to The 2017 Cost of Data Breach Study from the Ponemon Institute, the global average cost of a data breach is $3.6 million, or $141…Continue
Added by Jayesh Bapu Ahire on April 25, 2018 at 8:00pm — No Comments
My new book may be of interest to some members.
Added by Gohar F. Khan on April 25, 2018 at 3:30pm — No Comments
Reinforcement Learning (RL) – third and last post in the sub-series “Machine Learning Type” under the master series “Machine Learning Explained”. The next sub-series, “Machine Learning Algorithms Demystified”, is coming up. This post covers reinforcement learning only.
Added by Vinod Sharma on April 25, 2018 at 9:30am — No Comments
Data Dictionary to Meta Data III is the third and final blog devoted to demonstrating the automation of meta data creation for the American Community Survey 2012-2016 household data set, using a published data dictionary. DDMDI was a teaser to show how Python could be used to generate R statements that could in turn be cut/pasted/applied in an R Jupyter notebook to…Continue
Added by steve miller on April 25, 2018 at 9:00am — No Comments
The European General Data Protection Regulation (GDPR) will come into force on May 25, 2018. The regulation will have a significant impact on existing data collection and analysis methods.
Many businesses have become reliant on customer…Continue
Customers leave behind an incomprehensible amount of data while they go about shopping. Making sense of that data and reacting in real time are the two things that will keep companies one step ahead of their customers (and competition) in the present-day customer-centric world.
Today, the average customer is spoiled for choice. Every time he goes shopping, he expects highly personalized, relevant offers. One poor interaction with a brand, and poof, the customer’s gone, almost-certain…Continue
Added by Laura Ellis on April 23, 2018 at 6:30pm — No Comments
Developed at MIT’s Sloan School of Management in the 1950s, system dynamics is a methodological approach to modeling the behavior of complex systems, where a change in one component leads to changes in others (like the domino effect with feedback loops added). This approach is widely applied in areas such as healthcare, disease research, public transportation, business management and revenue forecasting. The most famous application of system dynamics is probably in…Continue
Added by Mab Alam on April 23, 2018 at 5:30pm — No Comments
Summary: Not everyone wants to invest the time and money to become a data scientist, and if you’re mid-career the barriers are even higher. If you still want to be deeply involved in the new data-driven economy and well paid, the growth rate and opportunities as a data engineer or business analyst need to be on your radar screen.
In this article, a couple of implementations of the support vector machine binary classifier with quadratic programming libraries (in R and Python respectively) are discussed, along with their application to a few datasets.
The next figure describes the basics of Soft-Margin SVM (without kernels).
SVM in a nutshell
Added by Sandipan Dey on April 23, 2018 at 9:30am — No Comments