Here, I've used the famous Iris flower dataset to show clustering in Power BI using an R visual. I've used the k-means clustering method to group the flowers into the different Iris species.

About the dataset: The Iris dataset has 150 rows and 5 attributes: four measurements (Sepal Length, Sepal Width, Petal Length, Petal Width) and the Species. The 3 species are Setosa, Versicolor and Virginica. Petal Length and Petal Width are fairly consistent within each species, so I have plotted Petal Length on the x axis and Petal Width on the y axis.
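The observation that petal measurements are consistent within a species can be checked with a few lines of base R. This sketch is not part of the Power BI report: it uses R's built-in `iris` data frame, whose columns are named `Petal.Length` and `Petal.Width` rather than the `PetalLength` and `PetalWidth` names used in the Power BI script.

```r
# R's built-in copy of the Iris data (columns Sepal.Length, Sepal.Width,
# Petal.Length, Petal.Width, Species); names differ from the Power BI dataset
data(iris)

# Per-species mean and standard deviation of the two petal measurements
petalMeans <- aggregate(cbind(Petal.Length, Petal.Width) ~ Species, data = iris, FUN = mean)
petalSDs   <- aggregate(cbind(Petal.Length, Petal.Width) ~ Species, data = iris, FUN = sd)
print(petalMeans)
print(petalSDs)
```

Small standard deviations relative to the gaps between species means are what make these two columns a good choice for the scatter plot.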

K-means clustering: K-means is a non-hierarchical, iterative clustering technique. We know that there are 3 species in our dataset, so I have used 3 clusters (k = 3). The algorithm starts from 3 randomly chosen initial cluster centres (centroids) and then repeats two steps: it assigns each data point to its nearest centroid, measured by Euclidean distance, and then moves each centroid to the mean of the points assigned to it. This continues until the clusters are stable, i.e. no data point changes cluster.
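The assign-then-update loop described above is Lloyd's algorithm. The sketch below writes it out in base R using a hypothetical helper, `lloydKmeans()` (not part of any package), so each step is visible. Note that R's built-in `kmeans()`, which the Power BI script uses, defaults to the Hartigan-Wong algorithm; it minimises the same within-cluster sum of squares but updates assignments differently.

```r
# Lloyd's k-means written out step by step (illustration only).
# R's built-in kmeans() defaults to the Hartigan-Wong algorithm instead.
lloydKmeans <- function(x, k, maxIter = 100) {
  x <- as.matrix(x)
  # Initial centres: k distinct data points chosen at random
  uniquePoints <- unique(x)
  centers <- uniquePoints[sample(nrow(uniquePoints), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(maxIter)) {
    # Squared Euclidean distance from every point to every centre (n x k)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    # Assignment step: each point joins its nearest centre
    newCluster <- max.col(-d, ties.method = "first")
    # Converged: no point changed cluster
    if (all(newCluster == cluster)) break
    cluster <- newCluster
    # Update step: move each non-empty cluster's centre to the mean of its points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

set.seed(20)
fit <- lloydKmeans(iris[, c("Petal.Length", "Petal.Width")], k = 3)
print(table(Cluster = fit$cluster, Species = iris$Species))
```

Because the starting centres are random, a single run can settle in a poor local optimum. That is why the script in this post passes `nstart = 20` to `kmeans()`, which runs 20 random starts and keeps the solution with the lowest total within-cluster sum of squares.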

R visual: The visual shows how the flowers are separated after clustering. Here cluster 1 is Setosa, cluster 2 is Versicolor and cluster 3 is Virginica (the cluster numbers themselves are arbitrary and can change with the random seed). We can also see that the algorithm assigned a few Versicolor and Virginica points to the wrong cluster.
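To put a number on the misassigned points, the clusters can be cross-tabulated against the true species. This is a sketch in plain R on the built-in `iris` data rather than Power BI's `dataset`. Since cluster numbers are arbitrary, each cluster is first labelled with its majority species, and then the points whose species differs from that label are counted. The exact count depends on the random starts, so no figure is quoted here.

```r
# Cluster the petal columns the same way as the Power BI script
set.seed(20)
fit <- kmeans(iris[, c("Petal.Length", "Petal.Width")], centers = 3, nstart = 20)

# Rows: cluster number; columns: true species
confusion <- table(Cluster = fit$cluster, Species = iris$Species)
print(confusion)

# Label each cluster with its majority species, then count disagreements
majority <- colnames(confusion)[apply(confusion, 1, which.max)]
misassigned <- sum(majority[fit$cluster] != iris$Species)
cat("Points whose cluster is labelled with a different species:", misassigned, "\n")
```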

Drawback: After clustering, a few data points belonging to Virginica fall in the Versicolor cluster and vice versa. Still, k-means is an unsupervised method: it never uses the species labels, so it is most useful when labels are not available, and it copes well with large datasets.

Code:

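```r
# Power BI passes the fields added to the R visual as a data frame named `dataset`
library(ggplot2)

set.seed(20)  # reproducible random starting centroids
# k-means on petal length and width: 3 clusters, best of 20 random starts
irisCluster <- kmeans(dataset[, c("PetalLength", "PetalWidth")], centers = 3, nstart = 20)
Clusters <- as.factor(irisCluster$cluster)

ggplot(dataset, aes(PetalLength, PetalWidth, color = Clusters)) +
  geom_point(shape = 17, size = 4)
```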
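Power BI builds the `dataset` data frame from the fields added to the R visual, so the script above only runs inside Power BI as written. To try it in a plain R session, one option is to define a stand-in first. This is a sketch under assumptions: the `PetalLength`/`PetalWidth` names come from the script, but the `SepalLength`/`SepalWidth`/`Species` names and the column order are guesses at how the Power BI fields were set up (the order keeps the petal columns in positions 3 and 4).

```r
# Hypothetical stand-in for the `dataset` data frame Power BI passes to an
# R visual, built from R's built-in iris data. Sepal* names, Species name and
# column order are assumptions; PetalLength/PetalWidth match the script.
dataset <- data.frame(
  SepalLength = iris$Sepal.Length,
  SepalWidth  = iris$Sepal.Width,
  PetalLength = iris$Petal.Length,
  PetalWidth  = iris$Petal.Width,
  Species     = iris$Species
)
str(dataset)
```

With this defined, the script above should run unchanged in R or RStudio, provided the ggplot2 package is installed.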



Comment by Dan Butorovich on March 14, 2018 at 12:19pm

Why use Power BI at all? Or why not just use Power BI? In the first case I can send this out to an R markdown file and have it go straight to html/CSS in a dashboard that is open source and easily edited with a CSS editor. Seems like a chore to load it through Power BI. I guess if your company is already committed to Power BI and using that platform you could do this. In the second case Power BI already has a function to do this... See: https://powerbi.microsoft.com/en-us/blog/tag/clustering/

Thanks,  Dan

Comment by Norberto J. Sanchez on March 10, 2018 at 2:29pm

Sorry, but I don't understand how Power BI is used in this example.

© 2018 Data Science Central ®