Featured Blog Posts – January 2017 Archive (85)

Using the Bizarro Pipe to Debug magrittr Pipelines in R

I have just finished and released a free new R video lecture demonstrating how to use the “Bizarro pipe” to debug magrittr pipelines. I think R dplyr users will really enjoy it.

In this video lecture I use the “Bizarro pipe” to debug the example pipeline from RStudio’s purrr announcement.

TLDnW (too long, did…


Added by John Mount on January 31, 2017 at 10:30pm — No Comments

25 Big Data Terms You Must Know To Impress Your Date (Or whoever you want to)

Big Data can be intimidating! If you are new to Big Data, please read ‘What is Big Data’, ‘…


Added by Ramesh Dontha on January 31, 2017 at 9:51pm — 8 Comments

Will Trump Kill Statisticians' Jobs?

Today Trump met with leaders of pharmaceutical companies, to discuss “astronomical” drug prices and reduce regulations, so that drug companies can still make hefty profits while charging less for drugs. The motivation could be to keep the costs of healthcare down to facilitate the…


Added by Vincent Granville on January 31, 2017 at 8:00pm — 3 Comments

20 Great Blogs Posted in the last 12 Months

This is part of a new series of articles: once or twice a month, we post previous articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.

Upcoming DSC Webinar

How to Keep Your R Code Simple While…


Added by Vincent Granville on January 31, 2017 at 10:30am — No Comments

Deep Learning and Recommenders

Summary:  In this last article in our series on recommenders we look to the future to see how the rapidly emerging capabilities of Deep Learning can be used to enhance recommender performance. 


In our first article, “Understanding…


Added by William Vorhies on January 31, 2017 at 9:30am — No Comments

Stepping back from "Big data" and into "Mesoscale data science"

Hot topics like “big data”, “machine learning”, and “data science” now dominate the scientific community. In the past 10 years alone, data availability has increased exponentially (and not even in a squared, or cubed sort of way… we are talking on the order of 10^10 if not more). Exabytes (10^18, or one QUINTILLION bytes!!?) of information are being passed, stored, saved and analyzed on a monthly…


Added by Grant Humphries on January 31, 2017 at 6:00am — No Comments

Deep Learning (DL) versus Analysis Learning (AL)

At first I liked tinkering with computers and learning programming languages. After graduating from high school I started to develop a concept for data processing, and I have since completed it. More recently, the term Deep Learning (DL) has emerged in the IT world. A number of campuses and institutions have been developing this concept, and many data processing experts have begun to talk about it.

I do not know whether the concept I developed actually resembles Deep…


Added by Jeefri A. Moka on January 31, 2017 at 1:00am — 1 Comment

Big Data Science: Expectations vs. Reality

The past few years have been like a dream come true for those who work in…


Added by Maria Sayapina on January 30, 2017 at 2:00am — 1 Comment

Tutorial: Neutralizing Outliers in Any Dimension

In this article, we discuss a general framework to drastically reduce the influence of outliers in most contexts. It applies to problems such as clustering (finding centroids), regression, measuring correlation or R-Squared, and many more. We will focus on the centroid problem here, as it is very similar and generalizes easily to solving a linear regression. The correlation / R-Squared issue was discussed…
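To make the centroid problem concrete, here is a minimal sketch. It is not the framework from the article, whose details are not shown in this teaser. It simply compares the ordinary mean with a coordinate-wise median, a standard outlier-resistant substitute. The function names and sample points are hypothetical.

```python
from statistics import mean, median

def mean_centroid(points):
    """Ordinary centroid: the mean of each coordinate."""
    return tuple(mean(coord) for coord in zip(*points))

def median_centroid(points):
    """Coordinate-wise median: far less sensitive to a single outlier."""
    return tuple(median(coord) for coord in zip(*points))

# Hypothetical data: three nearby points plus one extreme outlier.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (100.0, 100.0)]

print(mean_centroid(points))    # pulled far toward the outlier
print(median_centroid(points))  # stays near the bulk of the data
```

The coordinate-wise median is only one of several robust choices; it is used here because it needs nothing beyond the standard library.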


Added by Vincent Granville on January 29, 2017 at 10:30pm — No Comments

Differential Spectrum - the Articulated Event Horizon

I periodically use charts containing a crosswave “differential spectrum” or “event horizon.”  In this blog, I will explain the nature of the spectrum and the relevance of any apparent bias.

I once mentioned purchasing a machine designed to monitor and reduce sleep apnea.  Sleep apnea is when a person stops breathing while sleeping.  During a sleep study, I was found to have moderate sleep apnea.  Apart from its medical implications, sleep apnea is also a metric.  The machine…


Added by Don Philip Faithful on January 28, 2017 at 10:27am — 2 Comments

Organizational Distress - Cumulative Differential from Spliced Data

I routinely study differences in production between years by charting the data on the same graph. I consider this a popular approach. It makes sense since there is often interest in how the year is shaping up compared to previous years. Moreover, seasonality would be less relevant given that the same seasons are compared between years (assuming the seasons reoccur at around the same time). Below I present some real data from an organization in 1983 comparing production to 1982. I think many…


Added by Don Philip Faithful on January 28, 2017 at 10:00am — No Comments

Weekly Digest, January 30

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Upcoming DSC Webinar


Added by Vincent Granville on January 28, 2017 at 8:00am — No Comments

Interesting Data Science Application: Steganography

The Art and Science of Encrypting, Embedding and Hiding Messages in Pictures and Videos.

This is related to data encryption and security. Imagine that you need to transmit the details of a patent or a confidential financial transaction over the Internet. There are three critical issues:…
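For readers who want a feel for the "embedding and hiding" part, the following is a minimal sketch of least-significant-bit (LSB) embedding, one common steganography technique. It is not necessarily the method the article describes, and it skips encryption entirely. The cover data is a hypothetical stand-in for raw pixel bytes, and the function names are made up for illustration.

```python
def embed(pixels: bytes, message: bytes) -> bytes:
    """Hide message bits in the lowest bit of each cover byte (MSB first)."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover data too small for message")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit  # clear the low bit, then set it
    return bytes(out)

def extract(pixels: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of message from the low bits of the cover bytes."""
    result = bytearray()
    for b in range(n_bytes):
        value = 0
        for px in pixels[b * 8:(b + 1) * 8]:
            value = (value << 1) | (px & 1)
        result.append(value)
    return bytes(result)

# Hypothetical cover: 64 bytes standing in for grayscale pixel values.
cover = bytes(range(64))
stego = embed(cover, b"hi")
print(extract(stego, 2))
```

Each cover byte changes by at most 1, which is why the alteration is hard to see in an image. In practice the message would be encrypted before embedding.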


Added by Vincent Granville on January 27, 2017 at 8:30pm — 9 Comments

Importance of Hypothesis Testing in Quality Management

Essentially, good hypotheses lead decision-makers like you to new and better ways to achieve your business goals. When you need to make decisions such as how much you should spend on advertising or what effect a price increase will have on your customer base,…


Added by Vinay Babu on January 27, 2017 at 6:00pm — 2 Comments

Using ML-driven marketing optimization to solve the attribution conundrum

Accurate multichannel campaign attribution has stumped the online marketing industry for years. But what if the solution is to stop worrying about attribution, and move to an optimization-driven approach?

You know those photo mosaic images, which suddenly became terribly popular a few years back? They cleverly use lots of individual tiny images to make up one large image. If you look closely you can make out the…


Added by Ian Thomas on January 27, 2017 at 9:30am — No Comments

Data Science Reveals Trump Tweets are Written by Two People

By David Robinson. David Robinson is a data scientist at Stack Overflow. His article (parts of it) was re-posted in the Washington Post, here. This is also a short version that summarizes his analysis. The details and source code can be found on David's website,…


Added by Vincent Granville on January 26, 2017 at 7:30pm — No Comments

140 Machine Learning Formulas

By Rubens Zimbres. Rubens is a Data Scientist, PhD in Business Administration, developing Machine Learning, Deep Learning, NLP and AI models using R, Python and Wolfram Mathematica. Click here to check his Github page.…


Added by Vincent Granville on January 25, 2017 at 6:30pm — 3 Comments

A Visual Introduction to Machine Learning

This article was written by Stephanie and Tony on R2D3. 

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions. Using a data set about homes, we will create a machine learning model to distinguish homes in New York from homes in San Francisco.…


Added by Emmanuelle Rieuf on January 25, 2017 at 4:00pm — 1 Comment

Fraud analysis using speech analytics and Monte Carlo

According to the largest market research firm, MarketsandMarkets, the speech analytics industry will grow to USD 1.60 billion by 2020 at a Compound Annual Growth Rate (CAGR) of 22% from 2015 to 2020. Today the omnichannel world consists of voice, email, chat, social channels, and surveys, and each channel has its own importance.

Therefore, it becomes impossible for any customer-centric organization to ignore the information that can be glean…
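The growth figures quoted above can be sanity-checked with the standard CAGR relation, end = start × (1 + rate)^years. The sketch below backs out the 2015 market size implied by the quoted 2020 value and rate. That starting value is a derived estimate, not a number stated in the post, and the function name is hypothetical.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Invert end = start * (1 + cagr) ** years to recover the start value."""
    return end_value / (1.0 + cagr) ** years

# Quoted figures: USD 1.60 billion by 2020, 22% CAGR over 2015 to 2020 (5 years).
start_2015 = implied_start_value(1.60, 0.22, 5)
print(f"Implied 2015 market size: USD {start_2015:.2f} billion")
```

Under these assumptions the implied 2015 base is a little under USD 0.6 billion.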


Added by Sunil Kappal on January 25, 2017 at 8:00am — 3 Comments

Fraud detections in the health care industry

One more opportunity to apply data mining techniques in the health care industry is helping healthcare insurers detect fraudulent transactions so that other patients can receive better and more affordable healthcare services. Fraud occurs when individuals deceive an insurance company to try to obtain money to which they are not entitled. It happens when someone puts false information on an insurance application and when false or misleading information is given or…


Added by Aravind Reddy on January 24, 2017 at 5:52pm — 2 Comments
