This article was written by Tristan Handy. Tristan is the founder and president of Fishtown Analytics: helping startups implement advanced analytics. I’m very confident...
Apache Spark is generally known as a fast, general and open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph p...
In this blog post, I will discuss feature engineering using the Tidyverse collection of libraries. Feature engineering is crucial for a variety of reasons, and it requi...
Genevera I. Allen (left) is a professor in the Departments of Statistics and of Electrical and Computer Engineering at Rice University. Corinne Cath (right) is a docto...
In this article, I present a few modern techniques that have been used in various business contexts, comparing performance with traditional methods. The advanced techniqu...
Data science today is a lot like the Wild West: there’s endless opportunity and excitement, but also a lot of chaos and confusion. If you’re new to data science and a...
Guest blog by Kevin Gray. Kevin is president of Cannon Gray, a marketing science and analytics consultancy. Regression is arguably the workhorse of statistics. Despit...
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of ...
The world is increasingly digital, and this means big data is here to stay. In fact, the importance of big data and data analytics is only going to continue growing in th...
Hi, my name is Brontobyte and this is my story of how I grew up from a Byte, to Megabyte, to Gigabyte, to Brontobyte. I was born possibly in 1956 to unknown parents at ...