Add a criterion to the dataset in clustering?


I'm entirely new to ML and data science, so my question might be somewhat naive. I have a dataset in which each column is a vector [a1, a2, a3, ..., an]. The vectors differ in their values, in their number of components n, and in the sum A = a1 + a2 + a3 + ... + an.

Most of the vectors have 5-6 dimensions, with some exceptions at 15-20 dimensions. On average, their components have values of 40-50.

I have tried K-means, DBSCAN and GMM to cluster them:

K-means generally gives the best result. However, it often misclassifies vectors with 2-3 dimensions and vectors with low A.

DBSCAN can only separate the vectors with low dimension and low A from the dataset; it treats the rest as noise.

GMM separates the vectors with 5-10 dimensions and low A quite well, but performs poorly on the rest.

Now I want to include the information about n and A in the process. Example: Vector 1 [0,1,2,1,0] and Vector 2 [0,2,4,5,3,2,1,0] differ in both n and A, so they can't be in the same cluster. Each cluster should contain only vectors with similar (close) values of A and n, before their components are considered.
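To make the example concrete, here is a minimal sketch of computing n and A for the two vectors and checking whether they are close enough to share a cluster. The helper names `summarize` and `compatible`, and the tolerances `n_tol` and `a_tol`, are hypothetical and chosen only for illustration; they are not part of sklearn.

```python
def summarize(vector):
    """Return (n, A): the number of components and their sum."""
    return len(vector), sum(vector)


def compatible(v1, v2, n_tol=1, a_tol=5):
    """Return True if two vectors are close in both n and A.

    n_tol and a_tol are assumed tolerances, not values from the dataset.
    """
    n1, a1 = summarize(v1)
    n2, a2 = summarize(v2)
    return abs(n1 - n2) <= n_tol and abs(a1 - a2) <= a_tol


vector_1 = [0, 1, 2, 1, 0]
vector_2 = [0, 2, 4, 5, 3, 2, 1, 0]

print(summarize(vector_1))             # (5, 4)
print(summarize(vector_2))             # (8, 17)
print(compatible(vector_1, vector_2))  # False: they differ too much in n and A
```

The (n, A) pairs computed this way could also be used as features for a first clustering pass, with the components themselves compared only inside each resulting group.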

I'm using sklearn in Python, and I'd be glad to hear suggestions and advice on this problem.

Thank you


© 2018   Data Science Central ®   Powered by