Featured Blog Posts – May 2016 Archive (71)

Storytelling in Social Media Analytics: Beyond the Bar Chart II

In marketing, everybody talks about telling stories. Stories about audiences. Stories about ad concepts. Stories about brands … and about consumers who use them.


Analysts also see stories in data. Looking through lines of data or snippets of code, you can see the threads weaving themselves together to give you a clear understanding of what’s happening; you see offshoots that lead to questions about different segments or correlation/cause and effect. But if you were to show…


Added by Chris Atwood on May 31, 2016 at 12:30pm — No Comments

Machine Learning is dead – Long live machine learning!

You may be thinking that this title makes no sense at all. ML, AI, ANN and Deep learning have made it into the everyday lexicon and here I am, proclaiming that ML is dead. Well, here is what I mean…

The open sourcing of entire ML frameworks marks the end of a phase of rapid tool development, and thus the death of ML as we have known it so far. The next phase will be marked by the ubiquitous application of these tools in software applications. And that is how ML…


Added by Srividya Kannan Ramachandran on May 31, 2016 at 8:00am — No Comments

Picking an Analytic Platform

Picking an analytic platform when first starting out in data science almost always means working with what we’re most comfortable with. But as organizations grow larger, there is a need for standardization and for selecting one, or a few, analytic tools.




Added by William Vorhies on May 31, 2016 at 7:00am — 1 Comment

Towards a Data-driven Organization: A Roadmap for Analytics

Building a Data-driven Organization requires identifying and prioritizing the opportunities where advanced analytics can make a material difference to the quality of decisions!…


Added by RADHA KRISHNA PERA on May 30, 2016 at 9:30am — No Comments

San Francisco Police Department Crime Incidents: Part 1-Time Series Analysis


The City and County of San Francisco launched an official open data portal called SF OpenData in 2009 as a product of its official open data program, DataSF. The portal contains hundreds of city datasets for use by developers, analysts, residents and more. Under the category of Public Safety, the portal contains the list of SFPD Incidents since Jan 1, 2003.

In this post I have done an exploratory time-series analysis on the crime incidents dataset to see…


Added by Vimal Natarajan on May 30, 2016 at 7:42am — No Comments

Weekly Digest, May 30

Monday newsletter published by Data Science Central. Previous editions can be found here.  

Featured Resources and Technical Contributions


Added by Vincent Granville on May 29, 2016 at 8:04am — No Comments

Table 1 and the Characteristics of Study Population (rstats)

In research, especially in medical research, we describe the characteristics of our study population in Table 1. Table 1 contains the mean for each continuous (scale) variable and the proportion for each categorical variable. For example, we say that the mean systolic blood pressure in our study population is 145 mmHg, or that 30% of participants are smokers. As the name suggests, Table 1 is the first table in the manuscript.
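As a rough illustration of the two kinds of summary described above, the Python sketch below computes a mean for a continuous variable and a proportion for a categorical one. The records, the field names `sbp` and `smoker`, and the helper `summarize` are all hypothetical; the post itself works in R, so this is not the author's code.

```python
from statistics import mean

# Hypothetical study population: systolic blood pressure (mmHg) and smoking status.
participants = [
    {"sbp": 130, "smoker": True},
    {"sbp": 135, "smoker": True},
    {"sbp": 140, "smoker": True},
    {"sbp": 145, "smoker": False},
    {"sbp": 145, "smoker": False},
    {"sbp": 145, "smoker": False},
    {"sbp": 150, "smoker": False},
    {"sbp": 155, "smoker": False},
    {"sbp": 160, "smoker": False},
    {"sbp": 145, "smoker": False},
]


def summarize(rows, continuous, categorical):
    """Return Table 1 style summaries: means and proportions."""
    summary = {}
    for name in continuous:
        # Continuous variables are summarized by their mean.
        summary[name] = mean(row[name] for row in rows)
    for name in categorical:
        # Yes/no variables are summarized by the share of "yes" rows.
        summary[name] = sum(1 for row in rows if row[name]) / len(rows)
    return summary


table_one = summarize(participants, continuous=["sbp"], categorical=["smoker"])
print(f"Mean systolic blood pressure: {table_one['sbp']} mmHg")
print(f"Smokers: {table_one['smoker']:.0%}")
```

A real Table 1 would usually also report a measure of spread (such as the standard deviation) and handle categorical variables with more than two levels; both are left out here to keep the sketch short.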

To create the Table 1…


Added by Klodian on May 29, 2016 at 6:46am — No Comments

13 Great Data Science Infographics

Most of these infographics are tutorials covering various topics in big data, machine learning, visualization, data science, Hadoop, R or Python, typically intended for beginners. Some are cheat sheets and can be nice summaries for professionals with years of experience. Some, popular a while back (you will find one example here), were designed as periodic tables.…


Added by Vincent Granville on May 28, 2016 at 8:09am — No Comments

TensorFlow: Why Google's AI Engine is a Gamechanger

In May 2006, Larry Page, one of Google’s co-founders, said “The ultimate search engine would understand everything in the world. It would understand everything that you asked it and give you back the exact right thing instantly. You could ask ‘what should I ask Larry?’ and it would tell you.” Come 2016, it seems at least part of his vision has been achieved through the release of TensorFlow, Google’s artificial intelligence engine platform.

TensorFlow is a deep learning software…


Added by Tanmay Bhandari on May 27, 2016 at 5:00am — 1 Comment

Data Science Central 'Challenge of the Week'

I’d like to personally invite our global community of Data Scientists to participate in this week’s DSC challenge. You are invited to create your own data video: we provide simple instructions on how to do it. All submissions of acceptable quality will be featured on DSC, reaching our entire community of just over 1M members. Each participant will receive a free copy of my (Vincent…


Added by Vincent Granville on May 26, 2016 at 2:30pm — No Comments

Expand Machine Learning tools: Configure Jupyter/IPython notebook for PySpark 1.6.1

Data Analytics favorites include Apache Spark, which is becoming a reference standard for Big Data, as a “fast and general engine for large-scale data processing”. Its built-in PySpark interface can run as a Jupyter notebook, but recent posts didn’t quite allow me to do…


Added by Marc Borowczak on May 26, 2016 at 6:43am — No Comments

Tableau 10 beta features

Yesterday evening, I attended a Tableau user group meeting to preview the new features expected in the upcoming Tableau 10 release. This meeting was hosted at the Toronto Public Library by none other than Tableau Maestro, Michael Martin!

The turnout was great, with more than 50 people attending from various industries.

Here are some of the new…


Added by Salman Khan on May 25, 2016 at 7:03pm — No Comments

The Fallacies of Data Science


Adnan Masood, PhD. & David Lazar

  1. Correlation = Causation, and Big Data = Information and Insights because Data Context Doesn't Matter.
  2. The random nature of the event drives the distribution, therefore the likely distribution also drives the…

Added by Adnan Masood, PhD. on May 25, 2016 at 10:30am — No Comments

Want to Win Competitions? Pay Attention to Your Ensembles.

Summary: Want to win a Kaggle competition or at least get a respectable place on the leaderboard? These days it’s all about ensembles, and for a lot of practitioners that means reaching for random forests. Random forests have indeed been very successful, but it’s worth remembering that there are three different categories of ensembles and some important hyperparameter tuning issues within each. Here’s a brief review.



Added by William Vorhies on May 25, 2016 at 7:30am — 1 Comment

Twitter Analytics using Tweepsmap

This morning I saw #tweepsmap on my twitter feed and decided to…


Added by Salman Khan on May 25, 2016 at 5:00am — No Comments

Identify, describe, plot, and remove the outliers from the dataset with R (rstats)

In statistics, an outlier is defined as an observation that lies far away from most of the other observations. Often an outlier is present because of measurement error. Therefore, one of the most important tasks in data analysis is to identify and, if necessary, remove the outliers.

There are different methods to detect outliers, including the standard deviation approach and Tukey’s method, which uses the interquartile range (IQR). In this post I will use…
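For readers who have not met Tukey's method, the Python sketch below flags any value lying more than 1.5 × IQR below the first quartile or above the third quartile. Which method the post goes on to use is not shown in this excerpt, and the post works in R, so the function name `tukey_outliers`, the sample readings, and the choice of quantile convention are all assumptions made for illustration.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences."""
    # Quartiles by linear interpolation; other quantile conventions
    # give slightly different fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspicious reading.
readings = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]

outliers = tukey_outliers(readings)
cleaned = [v for v in readings if v not in outliers]

print("Outliers:", outliers)
print("Cleaned:", cleaned)
```

Removing flagged values is a judgment call rather than an automatic step: a value outside the fences may be a genuine extreme observation and not a measurement error.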


Added by Klodian on May 24, 2016 at 11:07pm — No Comments

Tips for Effectively Communicating Complex Ideas to Non-Technical Clients

As a data scientist, your job doesn’t always make sense to others. Ever tried explaining what you do to your parents? They may nod their heads, but their eyes scream confusion.

Well, aside from possibly stifling job-related conversations, this isn’t a big deal. However, when it comes to explaining what you do to potential clients, who happen to be just as technology-averse, it’s a major issue.

Here are some helpful tips for explaining exactly what you do to…


Added by Larry Alton on May 24, 2016 at 7:30am — No Comments

Why Hadoop? Streamlined Nature, Scalability and Cost-Effectiveness

Since its inception in 2008, the global Hadoop market has grown at a tremendous pace. This market, valued at US$1.5 billion in 2012, is estimated to grow at a CAGR of 54.7% from 2012 to 2018. By the end of 2018, this market could reach a value of US$20.9 billion. With the massive amount of data generated every day across major industries, the global Hadoop market is anticipated to see significant growth in the future as well.
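The growth figures above can be sanity-checked with the standard compound annual growth rate relation, end = start × (1 + rate)^years. The Python sketch below assumes six compounding years (2012 to 2018); the function names are illustrative and do not come from the post.

```python
def project(start_value, rate, years):
    """Compound start_value at a constant annual rate for the given years."""
    return start_value * (1 + rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the post, in billions of US dollars.
projected_2018 = project(1.5, 0.547, 6)
implied_rate = implied_cagr(1.5, 20.9, 6)

print(f"Projected 2018 value: US${projected_2018:.1f} billion")
print(f"Implied CAGR for US$20.9 billion: {implied_rate:.1%}")
```

Under the six-year assumption the projection lands near, but not exactly on, the quoted US$20.9 billion, which suggests some rounding in the reported figures.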



Added by Ankit Jain on May 23, 2016 at 11:00pm — No Comments

Statistical Attribution & Optimization in the B2B World.

There has been a lot of activity recently around revenue attribution - marketers want to develop a better understanding of their customer acquisition funnel and be able to measure progress against it.  Most of this attention has been focused on the B2C space. However, less work has been done measuring the performance of B2B marketing activities. 

Certainly the marketing automation segment is very vibrant with a large number of vendors (both big and small) providing solutions that…


Added by Gregory Thompson on May 23, 2016 at 4:33pm — No Comments

Data Science & Machine Learning Encyclopedia - 4,000 Entries

This is one of the first comprehensive machine learning, data science, statistical science, and computer science repositories -- featuring many brand new scalable, big-data algorithms published in the last two years, such as automated cataloging, causation detection, or model-free tests of hypotheses, in addition to the classics. The original title for this project was Handbook of Data Science, but over time, it grew much bigger than a handbook. This is still an ongoing…


Added by Vincent Granville on May 23, 2016 at 2:10pm — No Comments
