October 2018 Blog Posts (89)

Why do people with no experience want to become data scientists?

Below is my contrarian answer to one question recently posted on Quora.

It depends on what you mean by “no experience”. A NASA scientist who has processed petabytes of data and found great insights, for example by discovering exoplanets, is de facto a data scientist and may have no interest in having his job title changed.

Then there is a bunch of people who call themselves “data science enthusiasts” and know nothing other than what they learned in a two-hour…


Added by Vincent Granville on October 25, 2018 at 6:00pm — 1 Comment

Thursday News: Stats, Math, ML, Neural Nets, K-means, AI, Deep Learning, Python, Anomaly Detection

Here is our selection of featured articles and technical resources posted since Monday:



Added by Vincent Granville on October 25, 2018 at 8:30am — No Comments

Advertising & Marketing Fundamentals For Data Scientists

I am an advertising and marketing veteran who is currently transitioning towards data science. The purpose of this write-up is to give you some baseline understanding of marketing, grounded in my professional experience. I am hoping that my write-up will help you gain a bigger share of voice when working with advertising & marketing teams. Eventually, you might ask bigger questions and thus move beyond just optimizing their work.

I will expand this post into a…


Added by Rafael Knuth on October 25, 2018 at 2:00am — 4 Comments

Facial Recognition and its Applications

Facial Recognition

Facial recognition technology was long a mythical concept: a tool we thought could solve many of our problems but would never see the light of day. Today, facial recognition is everywhere and is part of the everyday technology that we use. The…


Added by Abhimanyu on October 25, 2018 at 12:50am — No Comments

Anomaly/Outlier Detection using Local Outlier Factors

Outliers are patterns in data that do not conform to the expected behavior. Detecting such patterns is of prime importance in credit card fraud, stock trading, etc. Detecting anomalous or outlying observations is also important when training any supervised machine learning model. This brings us to two very important questions: what is a local outlier, and why a local outlier?

In a multivariate dataset where the rows are generated independently from a probability…


Added by Deepankar Arora on October 25, 2018 at 12:30am — No Comments

The Math Behind Machine Learning

Let’s look at several techniques in machine learning and the math topics that are used in the process.

In linear regression, we try to find the best fit line or hyperplane for a given set of data points. We model the output of our linear function by a linear combination of the input variables using a set of parameters as weights.

The parameters are found by minimizing the residual sum of squares. We find a critical point by setting the vector of derivatives of…
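As a rough illustration of the least-squares idea in this excerpt, here is a minimal sketch for the simplest case of a single input variable. It assumes the model y = b0 + b1*x; the function names `fit_line` and `rss` and the sample data are hypothetical and do not come from the original article. Setting the two partial derivatives of the residual sum of squares to zero gives the closed-form solution used below.

```python
def rss(xs, ys, b0, b1):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Return (b0, b1) minimizing the residual sum of squares.

    Setting d(RSS)/d(b0) = 0 and d(RSS)/d(b1) = 0 yields
    b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical sample data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1, rss(xs, ys, b0, b1))
```

With several input variables the same reasoning leads to a system of linear equations (the normal equations) rather than two scalar formulas.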


Added by Richard Han on October 24, 2018 at 6:00pm — No Comments

Prediction at Scale with scikit-learn and PySpark Pandas UDFs

By Michael Heilman, Civis Analytics

scikit-learn is a wonderful tool for machine learning in Python, with great flexibility for implementing pipelines and running experiments (see,…


Added by Civis Analytics on October 24, 2018 at 10:44am — No Comments

Business Problems and Data Science Solutions Part 1

An important principle of data science is that data mining is a process. It includes the application of information technology, such as the automated discovery and evaluation of patterns from data. It also includes an analyst’s creativity, business knowledge, and common sense. Understanding the whole process helps to structure data mining projects.

Since the data mining process breaks up the overall task…


Added by Mehmet Gökce on October 24, 2018 at 8:58am — No Comments

Top Data Analysis Books and Videos to become an Expert in Data

Learn how to transform data into business insight with these Data Tutorials and eBooks.

Deep Reinforcement Learning Hands-On By Maxim Lapan

This practical guide will teach…


Added by Packt Publishing on October 24, 2018 at 3:19am — No Comments

29 Statistical Concepts Explained in Simple English - Part 1

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more. To keep receiving these articles, sign up on…


Added by Vincent Granville on October 23, 2018 at 4:30pm — 6 Comments

K-means: A step towards Marketing Mix Modeling

Source: www.mstecker.com/…


Added by Ridhima Kumar on October 23, 2018 at 11:00am — No Comments

Using Semantic Segmentation to identify rooftops in low-resolution Satellite images: Use case of Machine Learning in Clean Energy sector

The work was done by Jatinder Singh (who also co-authored this article) and Iresh Mishra. Also thanks to Saurabh…

Added by Rudradeb Mitra on October 23, 2018 at 11:00am — No Comments

The Case for Just Getting Your Feet Wet with AI

Summary: Even if you’re not big enough to have a full-blown data science group, that shouldn’t hold you back from benefiting from AI. The market has evolved so that there are now industry- and process-specific vertical applications available from third-party AI vendors that you can implement. There are just a few things to look out for.



Added by William Vorhies on October 23, 2018 at 7:30am — No Comments

Data Science “Paint by the Numbers” with the Hypothesis Development Canvas

When I was a kid, I used to love “Paint by the Numbers” sets. They make anyone who can paint or color between the lines a Rembrandt or Leonardo da Vinci (we can talk later about the long-term impact of forcing kids to “stay between the lines”).


Well, the design world is applying the “Paint by the Numbers” concept using design canvases. …


Added by Bill Schmarzo on October 23, 2018 at 7:00am — No Comments

DL4J: How to create a neural network that draws images – Step by step guide

This article was written by Lukasz Ciesla

Neural networks, machine learning, artificial intelligence – I get the impression that these slogans attack us from everywhere. They are mainly associated with the giants of the IT industry, who from time to time report spectacular progress in this field. I decided to dispel myths about machine learning using a series of articles…


Added by Andrea Manero-Bastin on October 22, 2018 at 9:00pm — No Comments

Creating a Culture that Works for Data Science and Engineering

Written by Allison Sullivan, PhD - Civis Analytics

Product management resources often focus on collaborating with engineering and design. However, products are increasingly powered by data science, and getting data science into production means teams need…


Added by Civis Analytics on October 22, 2018 at 1:00pm — No Comments

Why is Becoming a Data Scientist so Difficult?

This question was recently posted on Quora. Below is my answer.

It depends on what kind of data scientist you want to become. I think many university curricula include material that is advanced but that you don’t really need. They also do not offer enough of the practical, professional coding and big data manipulation that would help you right away when starting a career. It also depends on your background: mine was math, stats, data analysis, and applied computer science, so the…


Added by Vincent Granville on October 22, 2018 at 12:00pm — No Comments

Should Python Become Your Official Corporate Language, Along With English?

English is becoming the official language in the global business world, being currently spoken by approximately 1.75 billion people worldwide according to Harvard Business Review. While English is the fastest spreading language in human history, a significant proportion of businesses are still resistant to giving up…


Added by Rafael Knuth on October 22, 2018 at 5:49am — 2 Comments

A Data Scientist’s Guide to an Efficient Project Lifecycle

Projects often overshoot their normal completion date by a factor of at least three, most probably owing to shifting goals, inefficient approaches to data collection, and the exploration of various solution paths, among other causes. Closer scrutiny often reveals that the delay would have been avoidable had more disciplined decision making been in place. In a nutshell, there are three major principles which, closely followed, have reduced the entire project…


Added by Richa Ojha on October 21, 2018 at 9:30pm — No Comments

Weekly Digest, October 22

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.



Added by Vincent Granville on October 21, 2018 at 2:30am — No Comments
