K-means is a centroid-based algorithm, meaning that points are grouped into clusters according to their distance (usually Euclidean) from each cluster's centroid.

### Centroid-based…

Added by satyajit maitra on July 1, 2019 at 6:30am
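To make the centroid idea concrete, here is a minimal pure-Python sketch of the standard Lloyd-style k-means loop. It assumes the points are equal-length numeric tuples and uses Euclidean distance; the function name `kmeans` and its parameters are illustrative, not taken from the post. Each iteration assigns every point to its nearest centroid and then moves each centroid to the mean of its assigned points. The loop stops when the assignments no longer change.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd-style k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise with k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose centroid is
        # nearest in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Library implementations such as scikit-learn's `KMeans` add smarter initialisation (k-means++) and multiple restarts. This sketch omits both.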

Data scientists and predictive modelers often use 1-D and 2-D aggregate statistics for exploratory analysis, data cleaning, and feature creation. Higher-dimensional aggregations, i.e., those over three or more dimensions, are more difficult to visualize and understand. High density regions are one example of such N-dimensional statistics. High density regions can be useful for summarizing common characteristics across multiple variables. Another use case is to validate a forecast prediction’s…

Added by Rohan Kotwani on January 3, 2019 at 4:00pm
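The post's own method is not shown in this excerpt, so the following is only a simple stand-in: a histogram-based approximation of a high density region in N dimensions. It assumes numeric rows of equal length. Each variable is binned into equal-width intervals (the hypothetical `bins` parameter), and the densest grid cells are kept until they hold at least a `coverage` share of the rows. The function name `high_density_cells` is illustrative.

```python
from collections import Counter

def high_density_cells(rows, bins=4, coverage=0.5):
    """Approximate a high density region by grid binning (illustrative sketch)."""
    dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(dims)]
    highs = [max(r[d] for r in rows) for d in range(dims)]

    def cell_of(row):
        # Map each coordinate to an equal-width bin index in [0, bins - 1].
        idx = []
        for d, v in enumerate(row):
            span = (highs[d] - lows[d]) or 1.0  # guard against constant columns
            idx.append(min(int((v - lows[d]) / span * bins), bins - 1))
        return tuple(idx)

    counts = Counter(cell_of(r) for r in rows)
    selected, covered = [], 0
    # Take the densest cells first until they hold the requested share of rows.
    for cell, n in counts.most_common():
        if covered >= coverage * len(rows):
            break
        selected.append(cell)
        covered += n
    return selected, covered / len(rows)

rows = [(0, 0, 0)] * 6 + [(10, 10, 10), (0, 10, 0), (10, 0, 10), (5, 5, 5)]
cells, share = high_density_cells(rows)
print(cells, share)
```

Grid binning scales poorly as the number of dimensions grows, because the number of cells is `bins ** dims`. Kernel density estimates are a common alternative for finding high density regions.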

The K-means algorithm is a popular and efficient approach for clustering and classification of data. My first introduction to the K-means algorithm was when I was conducting research on image compression. In that application, the purpose of clustering was to represent a group of objects or vectors by a single object/vector with an acceptable loss of information. More specifically, a clustering process in which the centroid of the cluster was optimum for the cluster and the…

Added by Faramarz Azadegan on October 31, 2018 at 7:06am
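In the vector-quantisation setting described above, each vector is replaced by its nearest codeword. In k-means-based designs, the codebook is the set of cluster centroids, because the mean of a cluster minimises the total squared Euclidean distance to its members. The sketch below assumes a codebook has already been trained. It uses a hypothetical two-codeword codebook over flattened 2x2 pixel blocks, and the names `vq_encode`, `vq_decode`, and `mean_squared_error` are illustrative.

```python
import math

def vq_encode(vectors, codebook):
    # Replace each vector by the index of its nearest codeword (Euclidean).
    return [min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))
            for v in vectors]

def vq_decode(indices, codebook):
    # Reconstruct each vector as its codeword; this is where information is lost.
    return [codebook[i] for i in indices]

def mean_squared_error(original, reconstructed):
    # Average squared Euclidean distance between original and reconstructed vectors.
    total = sum(math.dist(a, b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)

# Hypothetical codebook: an all-black and an all-white 2x2 block.
codebook = [(0, 0, 0, 0), (255, 255, 255, 255)]
blocks = [(10, 0, 5, 0), (250, 255, 240, 255), (0, 20, 0, 0)]

indices = vq_encode(blocks, codebook)
reconstructed = vq_decode(indices, codebook)
mse = mean_squared_error(blocks, reconstructed)
print(indices, mse)
```

With `k` codewords, each block can be stored as an index of about log2(k) bits instead of four 8-bit values. The mean squared error measures how much information that compression loses.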

When I was starting out in data science, I often struggled to choose the most appropriate algorithm for my specific problem. If you’re like me, when you open an article about machine learning algorithms, you see dozens of detailed descriptions. The paradox is that they don’t make the choice any easier.

In this article, I will try to explain basic concepts and give some intuition for using different…

Added by Luba Belokon on October 26, 2017 at 6:00am

Today, many companies use big data to make super relevant recommendations and grow revenue. Among the variety of recommendation algorithms, data scientists need to choose the best one according to a business’s limitations and requirements.

To simplify this task, my team has prepared **an overview of the main existing recommendation system…**

Added by Luba Belokon on July 28, 2017 at 4:00am

**1. Objective**

First, we will see what clustering in R is; then we will look at the applications of clustering, clustering by similarity aggregation, use of the R amap package, implementation of hierarchical clustering in R, and examples of R clustering in various fields.

**2. Introduction to Clustering in…**

Added by Sheetal Sharma on July 19, 2017 at 9:00pm
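The post covers hierarchical clustering in R. To keep the new examples in one language, here is a naive Python sketch of agglomerative (bottom-up) hierarchical clustering with single linkage, assuming numeric tuples and Euclidean distance. It is an illustration of the merge loop only, not the post's R code. It runs in roughly cubic time, and the function name `agglomerative` is hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering (assumes n_clusters >= 1)."""
    # Start with every point in its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that is closest under single linkage.
        x, y = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[x], clusters[y]))  # record the merge history
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters, merges

points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 20)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters, merges)
```

Recording the merges gives the information a dendrogram is drawn from; stopping at `n_clusters` corresponds to cutting the tree at that number of groups.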

**Text Analytics with Python -- A Practical Real-World Approach to Gaining Actionable Insights from your Data**

Text analytics can be a bit overwhelming and frustrating at times because of the unstructured and noisy nature of textual data and the vast amount of information available. "Text Analytics with Python," published by Apress/Springer, is a book packed with 385 pages of useful information on techniques, algorithms,…

Added by Dipanjan Sarkar on July 14, 2017 at 4:00am

Graphs are studied in graph theory, a branch of mathematics. For data analysis that requires searching for particular patterns, graph-based data mining becomes an important technique. Indeed, in real life, most of the data we have to deal with can be represented as graphs. A typical graph consists of vertices (nodes, cells), and of edges that…

Added by jwork.ORG on June 19, 2017 at 5:30pm
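As a small illustration of searching a graph for a particular pattern, the sketch below stores an undirected graph as an adjacency mapping (each vertex maps to its set of neighbours) and finds all triangles, i.e. three mutually connected vertices. The function names and the choice of triangles as the pattern are assumptions for illustration; real graph mining usually looks for larger or labelled subgraphs.

```python
from itertools import combinations

def build_graph(edges):
    """Undirected graph as an adjacency mapping: vertex -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def find_triangles(graph):
    """Return every 3-vertex clique, one of the simplest subgraph patterns."""
    triangles = set()
    for v, neighbours in graph.items():
        # Any two neighbours of v that are also adjacent close a triangle.
        for a, b in combinations(sorted(neighbours), 2):
            if b in graph[a]:
                triangles.add(tuple(sorted((v, a, b))))
    return sorted(triangles)

graph = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")])
print(find_triangles(graph))
```

Each triangle is found once from each of its three vertices. Storing it as a sorted tuple in a set removes the duplicates.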

Below is an example of how **sklearn** in Python can be used to build a k-means clustering model.

The purpose of k-means clustering is to partition the observations in a dataset into a specified number of clusters to aid analysis of the data. This makes it particularly valuable for data visualisation.

This post explains how to:

*Import kmeans and PCA through the sklearn…*

Added by Michael Grogan on June 17, 2017 at 8:00am
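The post's own code is not included in this excerpt. As a sketch of the same workflow, assuming scikit-learn is installed and using its bundled Iris dataset as stand-in data, the following clusters the observations with `KMeans` and projects them onto two principal components with `PCA` so the clusters can be viewed in 2-D. `n_clusters=3` is an illustrative choice, not a value from the post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris is used here only as a convenient built-in example dataset.
X = load_iris().data  # 150 samples x 4 features

# Partition the observations into a chosen number of clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Project the 4-D data onto 2 principal components so clusters can be plotted.
X_2d = PCA(n_components=2).fit_transform(X)

for cluster in range(3):
    members = X_2d[labels == cluster]
    print(f"cluster {cluster}: {len(members)} points, "
          f"2-D centre {members.mean(axis=0).round(2)}")
```

To visualise the result, `X_2d[:, 0]` and `X_2d[:, 1]` can be passed to a scatter plot coloured by `labels`.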

We frequently get questions about whether we have chosen all the right parameters to build a machine learning model. There are two scenarios: either we have sufficient attributes (or variables) and we need to select the best ones OR we have only a handful of attributes and we need to know if these are impactful. Both are classic examples of feature engineering challenges.

Most of the…

Added by BR Deshpande on April 16, 2016 at 9:00am

Cluster analysis is a common technique for grouping a set of objects so that objects in the same group share certain attributes. It’s commonly used in marketing and sales planning to define market segments.

Here at BigObject we adopt a simple approach to exploring the similarities between…

Added by Yuanjen Chen on October 2, 2015 at 1:21pm

Let's say a set of documents 'S' contains a large number of 'pure' texts.

On all documents in S, I apply a spelling normalisation method, which yields a normalised set S'.

Then I use the chosen method M (which method?) to make clusters in S, obtaining a clustering result C.

Then I use the same method M to make clusters in S', obtaining a clustering result C'.

Finally, I need to determine whether there are statistically significant differences between C and C'.

Any help in identifying…

Added by MUSHTAQ AHMAD on May 25, 2015 at 11:48am
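One common way to quantify how much C and C' agree is the Adjusted Rand Index (ARI). It compares how pairs of documents are grouped in the two clusterings: 1.0 means identical partitions, and values near 0 mean chance-level agreement. The pure-Python sketch below assumes both label lists refer to the same documents in the same order (document i of S and its normalised version in S'). The function name is illustrative. ARI is a descriptive measure of agreement, not a significance test by itself.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same items (1.0 = identical)."""
    n = len(labels_a)
    # Contingency counts: items in cluster i of A and cluster j of B.
    pairs = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Expected pair agreement under random labelling with the same cluster sizes.
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both are single clusters
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical cluster labels for four documents before and after normalisation.
c = [0, 0, 1, 1]
c_prime = [0, 0, 1, 2]
print(adjusted_rand_index(c, c_prime))
```

scikit-learn provides the same measure as `sklearn.metrics.adjusted_rand_score`.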