# Featured Blog Posts – April 2015 Archive (71)

### Calculate Cosine Similarity Using Scipy – Data Sets & Sample Code

What is Cosine Similarity?

Cosine similarity is a measure of similarity between two vectors, computed as the cosine of the angle between them. It ranges from −1, meaning exactly opposite, to 1, meaning exactly the same direction, with 0 indicating orthogonality (often read as no relationship), and in-between values indicating intermediate similarity or dissimilarity.…
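The post's own SciPy code and data sets are not shown in this excerpt. As a minimal sketch of the definition above, the snippet below computes the measure with the standard library only; the function name `cosine_similarity` and the sample vectors are assumptions made for illustration. With SciPy installed, `scipy.spatial.distance.cosine(u, v)` returns the cosine distance, which is 1 minus this similarity.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Illustrative vectors (hypothetical data, not from the post).
same = cosine_similarity([1, 2, 3], [2, 4, 6])          # same direction
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])   # opposite direction
orthogonal = cosine_similarity([1, 0], [0, 1])          # right angle
```

Only the direction of the vectors matters here, not their length, which is why `[1, 2, 3]` and `[2, 4, 6]` count as maximally similar.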

Added by Gridlex on April 12, 2015 at 8:30pm — 1 Comment

### It's Time to Bring Your Own Data!

I just got back from my vacation in Barcelona, Spain where I spent about 3 days, then rented a car and drove up north through the South of France. My last stop was Nice, France. The trip was a lot of fun and now I intend to find some data to help bring back great memories (hm...it sounds more geeky than I thought but anyway).

Barcelona is located in the Catalonia region of Spain, famous for its earthy dry reds as well as Cava, the world's most delicious bubbly drink. I am a big wine fan which…

Added by Tatiana Sorokina on April 12, 2015 at 8:00am — No Comments

### Driving Behaviour as a Telematic Fingerprint

The objective of my final project at Metis, from weeks 9 to 12, is to categorize drivers based on their behaviour on the roads: their driving style and the type of roads that they follow.

The challenge associated with this objective is to uniquely identify a driver (and hence his proper “driving…

Added by DAGHER Philippe N. on April 11, 2015 at 11:49am — 2 Comments

### 40 Excel Tricks

First, let's start with an article featuring many great Excel functions, entitled 11 Advanced Excel Tricks That Will Help You Get An Instant Raise At Work. It describes the following Excel functions:

• Vlookup: You can use the VLOOKUP function to search the first column of a range…

Added by Mirko Krivanek on April 9, 2015 at 10:00pm — 6 Comments

### The 5 V's of Big Data by Bernard Marr

A nice infographic produced by the famous business management consultant and author Bernard Marr. Click on the picture, then click on it once more, to see an easy-to-read version.

Added by Bernard Marr on April 9, 2015 at 7:30pm — No Comments

### That’s Data Science: Airbus Puts 10,000 Sensors in Every Single Wing!

In a meeting with Airbus last week I found out that their forthcoming A380-1000 – the supersized airliner capable of carrying up to 1,000 passengers – will be equipped with 10,000 sensors in each wing.

The current A350 model has a total of close to 6,000 sensors across the entire plane and generates 2.5 Tb of data per day, while the newer model – expected to take…

Added by Bernard Marr on April 9, 2015 at 7:00pm — 3 Comments

I assist enterprises by driving data-driven approaches into their operations, developing market-aware products that learn from data, and encouraging data-smart cultures among C-suite executives. I have had the privilege to work with many talented professionals looking to disrupt their…

Added by Sean McClure on April 8, 2015 at 8:00am — 2 Comments

### Top 5 Disruptive Technologies that Will Change the World

For a business to remain competitive today, it must be willing to embrace new technologies. Using old or outdated technology can leave a business trailing in the dust of newer businesses that have emerged at the forefront of the industry, especially as they reap the benefits that new technology affords them. Of course, this means that one must also be aware of new technologies and how they might benefit one's business, which is not always so easy to do. In fact, there is a term…

Added by Shezagary on April 7, 2015 at 9:44pm — 1 Comment

### The Easy Way Big Data Can Be Accessed with Data-as-a-Service

Primed to make a huge entrance in 2015, Data-as-a-Service (DaaS) gives companies real-time data to overcome their toughest data challenges. DaaS allows companies to generate real-time insights and revenue from Big Data. Companies commonly report feeling overwhelmed by the sheer size of big data, not to mention the processes necessary to use it. This no longer has to be the reality. With DaaS, using big data is no longer a process that takes a couple of months.

### What is…

Added by Larisa Bedgood on April 7, 2015 at 12:30pm — No Comments

### The Hype Around Graph Databases And Why It Matters

Organizations are struggling with a fundamental challenge – there’s far more data than they can handle.  Sure, there’s a shared vision to analyze structured and unstructured data in support of better decision making but is this a reality for most companies?  The big data tidal wave is transforming the database management industry, employee skill sets, and business strategy as organizations race to unlock meaningful connections between disparate sources of…

Added by Tony Agresta on April 7, 2015 at 6:45am — 4 Comments

### Four Techniques to Apply in the Design of Data-heavy Applications

Guest blog post.

Big data makes a noteworthy contribution to the usefulness of an application, but its presence can make the design of a clean and usable interface rather difficult. Today, many web applications are built on the platform of big cloud-based data, which leads to the question: how can a designer deliver all the necessary data in an application without making a train-wreck of everything?

Creating a balance between complex data requirements and a simplified…

Added by Vincent Granville on April 7, 2015 at 3:50am — No Comments

### The Hot Hand Rises Again

Next month marks the 100th anniversary of Babe Ruth’s first home run.

This year, opening day in baseball signals the “closing day” for one of the classic truisms among sports statisticians: the belief that…

Added by Peter Bruce on April 6, 2015 at 10:40am — No Comments

### Five Characteristics of the Big Data Bang

At this point, I suspect a lot of us have heard of the three, four, or even seven V’s of big data. The original three V’s – Volume, Velocity, and Variety – appeared in 2001 when Gartner analyst Doug Laney used them to help identify key dimensions of big data.…

Added by Anne Russell on April 6, 2015 at 8:30am — No Comments

### Lambda Complexity: Why Fast Data Needs New Thinking

The Unix Philosophy, summarized by Doug McIlroy in 1994:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

This is considered by some to be the greatest revelation in the history of computer science, and there’s no debate that this philosophy has been instrumental in the success of Unix and its derivatives. Beyond Unix, it’s easy to see how this philosophy has…

Added by John Hugg on April 6, 2015 at 7:30am — No Comments

### Weekly Digest - April 13

The full version is always published Monday. Starred articles or sections are new additions or updated content, posted between Thursday and Sunday.

Added by Vincent Granville on April 5, 2015 at 5:30pm — No Comments

Guest blog post.

Vozag downloaded CRAN data from the R project to understand the top projects & which ones had the most discussions. Given below is a list of the top 20 packages downloaded in a single day. The full list of the top 100 most downloaded R packages is here.

Rank…

Added by Vincent Granville on April 5, 2015 at 4:24pm — 1 Comment

### Simplest Way to Monetize Data: Think of Data as a Product

Guest blog post by Mike Davie.

With the exponential growth of IoT and M2M, data is seeping out of every nook and cranny of our corporate and personal lives. However, harnessing data and turning it into a valuable asset is still in its infancy. In a recent study, IDC estimates that only 5% of data created is actually analyzed.…

Added by Vincent Granville on April 5, 2015 at 2:30pm — No Comments

# 1. Introduction

In this post, we’ll use an unsupervised machine learning technique called k-means clustering to find natural structures in our data. In the other blog posts, we used supervised machine learning techniques like logistic regression and linear regression to predict car prices or …
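The post's own code and data are not shown in this excerpt. As a rough illustration of what k-means clustering does, here is a minimal pure-Python sketch of Lloyd's algorithm, which alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The function name `kmeans`, the sample points, and the naive choice of the first `k` points as initial centroids are all assumptions made for illustration; it requires Python 3.8+ for `math.dist`.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Naive initialisation (an assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels


# Hypothetical 2-D data with two visually obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

No labels are supplied to the algorithm, which is what makes it unsupervised: the grouping comes only from distances between points. Results depend on the initial centroids, so practical implementations usually use a smarter initialisation and several restarts.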

Added by Peter Chen on April 4, 2015 at 6:00pm — No Comments

### Cloud Based Analytics - Coming Clash of the Titans

From all indications, 2015 is well on its way to becoming the year of cloud computing. The feverish pitch of activity at key players on the one hand, and the data and observations of industry pundits on the other, affirm this. There are apparently a handful of reasons to keep IT industry leaders awake at night.

For starters, 2014 revenues for cloud services reportedly grew by 60 percent. The global cloud computing market, per Forrester, is expected to grow to over \$191 billion by 2020. IDC…

### Mega collection of data science books and terminology

More than a thousand keywords with detailed explanations, and hundreds of machine learning / data science books categorized by programming language used to illustrate the concepts.

Here's a selection of keywords from the mega-list.

10 keywords starting with A; this is indeed a small subset of all the keywords starting with…

Added by Mirko Krivanek on April 3, 2015 at 11:30am — No Comments

