This article was written by Tristan Handy. Tristan is the founder and president of Fishtown Analytics: helping startups implement advanced analytics. I’m very confident...
Apache Spark is generally known as a fast, general and open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph p...
In this blog post, I will discuss feature engineering using the Tidyverse collection of libraries. Feature engineering is crucial for a variety of reasons, and it requi...
In this article, I present a few modern techniques that have been used in various business contexts, comparing performance with traditional methods. The advanced techniqu...
Data science today is a lot like the Wild West: there’s endless opportunity and excitement, but also a lot of chaos and confusion. If you’re new to data science and a...
Guest blog by Kevin Gray. Kevin is president of Cannon Gray, a marketing science and analytics consultancy. Regression is arguably the workhorse of statistics. Despit...
Summary: In this multi-part series we walk through the full landscape of Recommenders. In this article we cover business considerations as well as issues for Recommen...
AI systems need to continually learn from new data to perform well in real-world scenarios. However, it is non-trivial to decide what new data needs to be labeled for tra...
This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, Hadoop, decision trees, ensembles, ...