All Blog Posts (6,084)

Data Science Job in 90 days - Book Summary

As a senior data science professional and analytics manager, I get countless requests for job search advice and resume feedback, along with heart-breaking stories from brilliant students who are unable to snag a job in this exciting field. There are tons of books on how to learn the skills to become a data scientist / data analyst, but none to prepare folks for the frustrating job search.

I've repeated this advice to dozens of people, most of whom found their dream data science job with…

Added by Ann Rajaram on May 25, 2019 at 12:04pm — 1 Comment

Cross Validation in One Picture

Cross validation explained in one simple picture. The method shown here is k-fold cross validation, where the data is split into k folds (in this example, 5 folds). Blue balls represent training data; 1/k (i.e. 1/5) of the balls are held back for model testing.

Monte Carlo cross validation works the same way, except that the balls would be chosen with replacement. In other words, it would be possible for a ball to appear in more than one sample.…
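
The two splitting schemes described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the function names `k_fold_splits` and `monte_carlo_splits`, the `test_fraction` parameter, and the choice of 25 balls are all assumptions made for the example, and "Monte Carlo" is taken here to mean repeated independent random train/test splits.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, like dealing cards.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def monte_carlo_splits(n_items, n_splits, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from independent random splits.

    Because each split is drawn afresh, the same item may be held out
    in several splits, or in none at all.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_splits):
        shuffled = rng.sample(range(n_items), n_items)
        yield shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    for train, test in k_fold_splits(25, 5):
        print("k-fold test balls:", sorted(test))
    for train, test in monte_carlo_splits(25, 5):
        print("Monte Carlo test balls:", sorted(test))
```

With k-fold, the five test sets partition the 25 balls; with the repeated random splits, the test sets are free to overlap from one split to the next.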

Added by Stephanie Glen on May 25, 2019 at 8:30am — No Comments

XaaS Business Model: Economics Meets Analytics

Digital capabilities leverage customer, product and operational insights to digitally transform business models. Nowhere is this more evident than in the rush by industrial companies to transform consumption models by transitioning from selling products to selling [capabilities]-as-a-service (hence, XaaS). For example:

  • The key issue for the airlines is to maximize their core revenue generating mechanisms: flight scheduling and the hours…

Added by Bill Schmarzo on May 24, 2019 at 3:48am — No Comments

Profiling Store Visitors

Our telecom client was developing a big data product to profile the demographics (age, gender, income, ethnicity, marital status) of store visitors, using the feed from Wi-Fi routers placed in the stores. The client received a daily feed of router data on its server, which was then uploaded to HDFS / Hive tables in the data lake for analysis.

Maintaining data quality was a serious concern, since without it the reports would have been erroneous. A daily e-mail used to get…

Added by Dr. Moloy De on May 23, 2019 at 9:29pm — No Comments

How to Install and Run Hadoop on Windows for Beginners

Introduction

Hadoop is a software framework from the Apache Software Foundation that is used to store and process Big Data. It has two main components: the Hadoop Distributed File System (HDFS), its storage system, and MapReduce, its data processing framework. Hadoop can manage large datasets by splitting them into smaller chunks across multiple machines and performing parallel computation on them…
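
To make the "split into chunks, then compute in parallel" idea concrete, here is a small word-count sketch of the map, shuffle and reduce phases in plain Python. It illustrates the programming model only and does not use any Hadoop API; the function names and the sample `chunks` are assumptions made for the example, and the chunks are processed one after another here, whereas a cluster would hand them to different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Each string stands in for a block of a larger file (hypothetical data).
    chunks = ["big data big", "data on many machines"]
    print(word_count(chunks))
```

Because each call to `map_phase` depends only on its own chunk, the map step is the part that can be spread across machines without coordination.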

Added by Divya Singh on May 23, 2019 at 8:30pm — No Comments

Data Science Central Thursday Digest, May 23

Here is our selection of featured articles, resources and forum questions posted since Monday:

Technical Resources

Added by Vincent Granville on May 23, 2019 at 10:30am — No Comments

Free Book: Foundations of Data Science (from Microsoft Research Lab)

By Avrim Blum, John Hopcroft, and Ravindran Kannan (2018). 

Computer science as an academic discipline began in the 1960s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context-free languages, and computability. In the 1970s, the study of algorithms was added as an important component of…

Added by Capri Granville on May 23, 2019 at 9:00am — No Comments

Free Textbook: Probability Course, Harvard University (Based on R)

A free online version of the second edition of the book based on Stat 110, Introduction to Probability by Joe Blitzstein and Jessica Hwang, is now available here. Print copies are available via CRC Press, Amazon, and…

Added by Capri Granville on May 23, 2019 at 8:30am — 1 Comment

What is Data Lake and How to Improve Data Lake Quality

Introduction

Building data pipelines is a core component of data science at a startup. In order to build data products, you need to be able to collect data points from millions of users and process the results in near real-time. Today, many organizations are struggling with the quality of their data. Data quality (DQ) problems can arise in various ways. Here are common causes of bad data quality:

  • Multiple data sources:…

Added by Divya Singh on May 22, 2019 at 9:00pm — No Comments

Price Forecasting: Applying Machine Learning Approaches to Electricity, Flights, Hotels, Real Estate, and Stock Pricing

When you give customers advice that can help them save some money, they will pay you back with loyalty, which is priceless. Interesting fact: Fareboom users started spending twice as much time per session within a month of the release of an airfare price forecasting feature. This tool continues to grow conversion for our partner.

Besides travel, price predictions find their application in various scenarios. Commodity traders, investors, construction developers, or energy generators…

Added by Kateryna Lytvynova on May 22, 2019 at 7:30am — No Comments

An Introduction to Python Virtual Environment

Data Science, Machine Learning, Deep Learning, and Artificial Intelligence are some of the most talked-about buzzwords in the modern analytical eco-space. The exponential growth of technology in this regard has simplified our lives and made us more machine dependent. The astonishing hype surrounding such technologies has prompted professionals from various disciplines to jump on board and consider analytics as a career option.

To master Data Science or Artificial Intelligence in…

Added by Divya Singh on May 21, 2019 at 9:30pm — No Comments

29 Statistical Concepts Explained in Simple English - Part 13

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, decision trees, ensembles, correlation, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, cross-validation, model fitting, and many more. To keep receiving these articles, sign up on…

Added by Vincent Granville on May 21, 2019 at 5:30pm — No Comments

Implementing Knowledge Graphs in Enterprises - Some Tips and Trends

Tips

  1. Don't try to put the cart before the horse: realize that efficient data preparation (and thus interoperable standards) and data quality, especially in the enterprise environment, are a basic requirement for…

Added by Andreas Blumauer on May 21, 2019 at 5:33am — No Comments

Prediction of Customer Churn with Machine Learning

Machine Learning is on the lips of everyone involved in the analytics world. Gone are the days of the traditional manual approach to making key business decisions. Machine Learning is the future and is here to stay.

However, the term Machine Learning is not a new one. It has been around since the advent of computers but has grown tremendously in the last decade due to the massive amounts of data being generated and the enormous computational power that modern-day…

Added by Divya Singh on May 20, 2019 at 10:30pm — No Comments

Deep Learning Explainability: Hints from Physics


Nowadays, artificial intelligence is present in almost every part of our lives. Smartphones, social media feeds, recommendation engines, online ad networks, and navigation tools are some…

Added by Marco Tavora on May 20, 2019 at 11:46am — No Comments

Should You Be Recommending Deep Learning Solutions in Your Company?

Summary: If you are guiding your company’s digital journey, to what extent should you be advising them to adopt deep learning AI methods versus traditional and mature machine learning techniques?

By now everyone is at least familiar with using AI/ML as a required cornerstone of company strategy.  Frequently…

Added by William Vorhies on May 20, 2019 at 8:33am — 1 Comment

A Complete Machine Learning Project Walk-Through in Python: Part One

This article was written by Will Koehrsen.

Reading through a data science book or taking a course, it can feel like you have the individual pieces, but don’t quite know how to put them together. Taking the next step and solving a complete machine learning problem can be daunting, but persevering and completing a first…

Added by Andrea Manero-Bastin on May 20, 2019 at 6:30am — No Comments

Data Science Central Monday Digest, May 20

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  

Announcements

  • Machine…

Added by Vincent Granville on May 19, 2019 at 3:00pm — No Comments

Quantum Simulator Qubiter now has a native TensorFlow backend



I am pleased to announce that my quantum simulator Qubiter (available at GitHub, BSD license) now has a native TensorFlow Backend-Simulator (see its class `SEO_simulator_tf`, the `tf` stands for TensorFlow). This complements Qubiter's original numpy simulator (contained in its class `SEO_simulator`). A small step for Mankind, a giant leap for me! Hip Hip Hurray!

This means that Qubiter can now…

Added by Robert R. Tucci on May 19, 2019 at 11:30am — No Comments

Frame a problem as a machine learning problem or otherwise

It is not as simple as choosing a machine learning method and letting it run wild on the data. In particular, understanding the core business problem and the objective of the outcome, and framing the problem accordingly, is one of the vital factors in machine learning. A general approach is difficult to recommend without intimate knowledge of the data. However, it sounds like we need to formalize the aspects of your model. The following questions may help to decide on a machine learning problem or…

Added by Ariful Islam on May 19, 2019 at 9:00am — No Comments
