
Featured Blog Posts – April 2018 Archive (75)

Simple automated feature selection using lm() in R

There are many good and sophisticated feature selection algorithms available in R. Feature selection refers to the machine learning case where we have a set of predictor variables for a given dependent variable, but we don’t know a priori which predictors are most important or whether the model can be improved by eliminating some of them. In linear regression, many students are taught to fit a model to a data set using so-called “least squares”. In most…

Continue

Added by Blaine Bateman on April 30, 2018 at 7:30am — No Comments
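
The feature selection idea in the teaser above can be illustrated with a small sketch. The original post uses lm() in R and its exact procedure is not shown here, so the following is only an assumed stand-in: backward elimination on an ordinary least squares fit, scored by adjusted R², written in Python with NumPy. The function names adjusted_r2 and backward_eliminate are hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least squares fit of y on X (with intercept)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares coefficients
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def backward_eliminate(X, y):
    """Repeatedly drop the predictor whose removal gives the highest adjusted R^2,
    stopping as soon as any removal would lower the score.
    Assumption: this is an illustrative criterion, not the post's method."""
    keep = list(range(X.shape[1]))
    best = adjusted_r2(X[:, keep], y)
    while len(keep) > 1:
        # Score every model that has exactly one fewer predictor.
        scores = [
            (adjusted_r2(X[:, [k for k in keep if k != j]], y), j)
            for j in keep
        ]
        score, drop = max(scores)
        if score < best:
            break                                  # every removal hurts: stop
        best = score
        keep.remove(drop)
    return keep, best

# Synthetic example: only columns 0 and 1 actually drive y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)
kept, score = backward_eliminate(X_demo, y_demo)
print("kept predictor columns:", kept)
```

Adjusted R² is used here because plain R² never decreases when a predictor is added, so it cannot signal that a predictor is dispensable.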

Predicting animal adoption with Random Forest, SVM

Across the country, animal shelters work around the clock to help pets get rescued. People tend to assume that animal adoption is driven mostly by intuition and emotion: a family comes in, falls in love, and then welcomes a new pet into their home.

As readers here are likely to suspect, if you have the data, you can see there’s more to the story. Fortunately, the Austin Animal Shelter has made their animal…

Continue

Added by Adam Levenson on April 30, 2018 at 5:00am — No Comments

Weekly Digest, April 30

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Announcements
  • The APEXX W3 Brings DL to Your Deskside. Featuring an 18-core Intel® Xeon® W CPU and up to four professional NVIDIA Quadro GV100 GPUs, APEXX W3…
Continue

Added by Vincent Granville on April 29, 2018 at 12:00pm — No Comments

AI needs further demystification and democratisation

Two particular events recently directed my attention to the importance of democratising information about AI in governance and popular culture. The first was a congressional hearing of Facebook CEO Mark Zuckerberg (the Cambridge Analytica story) where he at some point had to explain rather basic principles of Facebook’s revenue model. The second was a recent EU parliament panel (October 2017) organized by STOA (Science and Technology Options Assessment) on AI aimed to prepare audiences for…
Continue

Added by Serge Audenaert on April 29, 2018 at 11:00am — No Comments

Implementing PEGASOS: Primal Estimated sub-GrAdient SOlver for SVM, Logistic Regression and Application in Sentiment Classification (in Python)

Although a support vector machine model (binary classifier) is more commonly built by solving a quadratic programming problem in the dual space, it can also be built quickly by solving the primal optimization problem. This article describes a support vector machine implementation that solves the primal optimization…

Continue

Added by Sandipan Dey on April 28, 2018 at 3:30pm — No Comments
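
As a rough companion to the PEGASOS teaser above, here is a minimal sketch of the stochastic sub-gradient update on the primal linear SVM objective. It is not the author's implementation: it assumes labels in {-1, +1}, omits the bias term, kernels, mini-batches and the optional projection step, and the names pegasos_train and pegasos_predict are hypothetical.

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, n_iters=2000, seed=0):
    """Stochastic sub-gradient descent on the primal objective
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    X has shape (n, d); y holds labels in {-1, +1}. Returns w (no bias term)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)              # pick one training example at random
        eta = 1.0 / (lam * t)            # decaying step size
        margin = y[i] * (X[i] @ w)       # margin under the current weights
        w *= (1.0 - eta * lam)           # shrink: gradient of the regularizer
        if margin < 1.0:                 # hinge loss active for this example
            w += eta * y[i] * X[i]
    return w

def pegasos_predict(X, w):
    """Predict labels in {-1, +1} from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)

# Synthetic example: two well-separated clusters on either side of the origin.
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(100, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))
X_demo = np.vstack([X_pos, X_neg])
y_demo = np.concatenate([np.ones(100), -np.ones(100)])

w_demo = pegasos_train(X_demo, y_demo)
accuracy = (pegasos_predict(X_demo, w_demo) == y_demo).mean()
print("training accuracy:", accuracy)
```

Each iteration touches a single example, which is why the primal route can be fast on large datasets compared with solving the dual quadratic program.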

A Comprehensive List Of R Packages For Portfolio Analysis

R is a free statistical computing environment, so there are often multiple packages that achieve a particular statistical or quantitative output. I am going to discuss here a concise list of R packages that one can use for the modeling of financial risks and/or portfolio optimization with utmost efficiency and effectiveness. The intended audience for this article is financial market analysts interested in using R, and also quantitatively inclined folks…

Continue

Added by Ranjit Mishra on April 28, 2018 at 2:30am — No Comments

New book on big data, AI and data science for executive audiences

It's been a while since my last blog post, but I wanted to update everyone on a project that's been keeping me busy recently.

Last year Pearson Publishing commissioned me to write a book on big data / data science.  They asked me to write it for executive audiences (non-technical), and to make it very practical for business use. Writing the book was quite a task, but it was a great exercise in summarizing the industry experience I'd gathered in my years leading data science teams…

Continue

Added by David Stephenson on April 26, 2018 at 9:49am — No Comments

New Decimal Systems - Great Sandbox for Data Scientists and Mathematicians

We illustrate pattern recognition techniques applied to an interesting mathematical problem: the representation of a number in non-conventional systems, generalizing the familiar base-2 or base-10 systems. The emphasis is on data science rather than mathematical theory, and the style is that of a tutorial, requiring minimal knowledge of mathematics or statistics. However, some off-the-beaten-path, state-of-the-art number theory research is discussed here, in a way that is accessible to…

Continue

Added by Vincent Granville on April 26, 2018 at 4:30am — 3 Comments

Open Source ETL: Apache NiFi vs Streamsets

After reviewing 8 great ETL tools for fast-growing startups, we got a request to tell you more about open source solutions. There are many open source ETL tools and frameworks, but most of them require writing code.…

Continue

Added by Luba Belokon on April 26, 2018 at 2:30am — No Comments

AI - From Silo to Ecosystem

I sometimes reflect on how we will reach the stage of AI called AGI (Artificial General Intelligence), defined as a state of AI that by some measure equals the human condition. Scientists in the field sometimes call this the singularity, or the point where AI will develop much faster and at larger scale than ourselves. This is of course still a hypothetical state and many out there are in the very process of proving it either wrong or right - so I am not going there in this…
Continue

Added by Serge Audenaert on April 25, 2018 at 10:30pm — No Comments

Resolving the Cyber Skills Gap & Talent Shortage

With high profile security breaches like Equifax, the publicity over data security, as well as the cost, has only continued to grow. According to The 2017 Cost of Data Breach Study from the Ponemon Institute, the global average cost of a data breach is $3.6 million, or $141…

Continue

Added by Jayesh Bapu Ahire on April 25, 2018 at 8:00pm — No Comments

New Book: Creating Value with Social Media Analytics

My new book may be of interest to some members.

 
Paperback: …
Continue

Added by Gohar F. Khan on April 25, 2018 at 3:30pm — No Comments

Reinforcement Learning – Reward for Learning

Reinforcement Learning (RL): the third and last post in the sub-series “Machine Learning Type” under the master series “Machine Learning Explained”. The next sub-series, “Machine Learning Algorithms Demystified”, is coming up. This post covers reinforcement learning only.

 …

Continue

Added by Vinod Sharma on April 25, 2018 at 9:30am — No Comments

Reticulating Python and R -- the American Community Survey Data Dictionary to Meta Data III.

Data Dictionary to Meta Data III is the third and final blog devoted to demonstrating the automation of meta data creation for the American Community Survey 2012-2016 household data set, using a published data dictionary. DDMDI was a teaser to show how Python could be used to generate R statements that could in turn be cut/pasted/applied in an R Jupyter notebook to…

Continue

Added by steve miller on April 25, 2018 at 9:00am — No Comments

What Does GDPR Mean For Your Business?

The European General Data Protection Regulation (GDPR) will come into force on May 25, 2018. The regulation will have a significant impact on existing data collection and analysis methods.

Many businesses have become reliant on customer…

Continue

Added by Ronald van Loon on April 23, 2018 at 10:30pm — 1 Comment

How Machines Are Learning From Customers And Predicting Human Behavior

Customers leave behind an incomprehensible amount of data while they go about shopping. Making sense of that data and reacting in real time are the two things that will keep companies one step ahead of their customers (and competition) in the present-day customer-centric world.

Today, the average customer is spoiled for choice. Every time he goes shopping, he expects highly personalized, relevant offers. One poor interaction with a brand, and poof, the customer’s gone, almost-certain…

Continue

Added by Hemant Warudkar on April 23, 2018 at 8:00pm — 1 Comment

Map plots created with R and ggmap

Continue

Added by Laura Ellis on April 23, 2018 at 6:30pm — No Comments

Data Science meets System Dynamics

Developed at MIT’s Sloan School of Management in the 1950s, system dynamics is a methodological approach to modeling the behavior of complex systems, where change in one component leads to change in others (like the domino effect with feedback loops added). This approach is widely applied in industries such as healthcare, disease research, public transportation, business management and revenue forecasting. The most famous application of system dynamics is probably in…

Continue

Added by Mab Alam on April 23, 2018 at 5:30pm — No Comments

Data Engineer and Business Analyst Might be the Best Data Science Opportunities

Summary: Not everyone wants to invest the time and money to become a data scientist, and if you’re mid-career the barriers are even higher.  If you still want to be deeply involved in the new data-driven economy and well paid, the growth rate and opportunities as a data engineer or business analyst need to be on your radar screen.

  …

Continue

Added by William Vorhies on April 23, 2018 at 3:41pm — 2 Comments

Implementing a Soft-Margin Kernelized Support Vector Machine Binary Classifier with Quadratic Programming in R and Python

This article discusses a couple of implementations of the support vector machine binary classifier using quadratic programming libraries (in R and Python, respectively) and their application to a few datasets.

The next figure describes the basics of Soft-Margin SVM (without kernels).

[Figure: svm_slack.png]

SVM in a nutshell

  • Given a (training) dataset consisting of positive and negative class instances.
  • Objective is to find…
Continue

Added by Sandipan Dey on April 23, 2018 at 9:30am — No Comments

