Data Science & Technology Monthly: Feb 2016

In the last post, we talked about how the open sourcing of machine learning algorithms and hardware architectures has given rise to the latest phenomenon of “Data is king”. Companies keep competing to get developers to use their libraries, so the arms race in open sourcing continues. We are also seeing the big data landscape grow more and more complicated. Matt Turck’s 2016 Big Data Landscape, the most recent map of the space, shows several new entrants in niche areas.

1. Open-Source arms race for wide adoption

In Jan 2016, Microsoft released its Computational Network Toolkit (CNTK) on GitHub. Microsoft claims that CNTK is a better deep learning toolkit than the alternatives because it can run on anything from a single-core machine to a large cluster of GPU machines. It will likely take a few weeks for the community to verify that claim, though.

Researchers always want a wide variety of datasets on which to test their algorithms. There are plenty of free, open datasets available, but the biggest news of the last month was Yahoo’s release of the largest machine learning dataset ever. The new dataset is 13.5TB in size and consists of anonymized user interactions with Yahoo News Feeds from about 20M users between Feb 2015 and May 2015. It is a dream for researchers, who can now study the online behavior of 20M users.

2. Image Processing

In the image processing category, here is an interesting piece of news. MIT researchers have trained an algorithm to predict how boring you…. The algorithm could have wide applications, for example helping people choose images that will have more impact when shared. The team behind the algorithm plans to release an app. And yes, deep learning was involved.

There are only about 500 North Atlantic right whales left in the world. This project, backed by marine biologists at the National Oceanic and Atmospheric Administration (NOAA), aims to develop facial recognition software that can identify every whale in the species. The project has been dubbed “Facebook for Whales”.

3. Marvin Minsky loss

The AI world suffered a big loss with the death of Marvin Minsky, one of the co-founders of MIT’s Artificial Intelligence Laboratory. The New York Times has an excellent profile of him, but the 1981 profile in The New Yorker is equally fascinating because it offers a glimpse into the mind of a younger Minsky.

4. Cool Data Visualizations

Who Marries Whom: This chart groups occupations by whether they are predominantly male or female and shows whom people in each occupation tend to marry. Tip: move your mouse slowly over the grey areas.

Toothbrushes and inequality: This collection of images of toothbrushes from around the world tells a poignant story about inequality. Perhaps it will inspire us to spread some love over the Valentine’s Day weekend.