Every data has an inherent natural geometry associated with it. We are generally influenced by how the world visually appears to us and apply the same flat Euclidean geometry to data. The data geometry could be curved, may have holes, distances cannot be defined in all cases. But if we still impose Euclidean geometry on it, then we may be distorting the data space and also destroying the information content inside it.
In the space of BigData world we have to regularly handle TBs of data and extract meaningful information from it. We have to apply many Unsupervised Machine Learning techniques to extract such information from the data. Two important steps in this process is building a topological space that captures the natural geometry of the data and then clustering in that topological space to obtain meaningful clusters.
This talk will walk through "Data Geometry" discovery techniques, first analytically and then via applied Machine learning methods. So that the listeners can take back, hands on techniques of discovering the real geometry of the data. The attendees will be presented with various BigData techniques along with showcasing Apache Spark code on how to build data geometry over massive data lakes.