Topological Data Analysis (TDA)

Topology is the branch of pure mathematics that studies the notion of shape. In the context of large, complex, high-dimensional data sets, topology takes on two main tasks: the measurement of shape and the representation of shape. One can measure shape-related properties within the data and create compressed representations of data sets that retain the features reflecting the relationships among the points. The representation takes the form of a topological network, or combinatorial graph. For high-dimensional and complex data sets, such a combinatorial representation compresses the data while retaining information about the geometric relationships between data points. It is also a simple and useful way to examine the data and to understand the primary variables characterizing various subgroups. Topological analysis has three key properties: coordinate invariance (results depend on the distances between points, not on the coordinate system used to record them), deformation invariance (shape features survive stretching and bending), and compressed representation.
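To make coordinate invariance concrete, here is a minimal sketch in Python. It assumes that "shape" is read off the pairwise distances between points, and it checks that rotating a small 2-D point cloud changes every coordinate but leaves those distances unchanged. The helper names `pairwise_distances` and `rotate` and the sample `cloud` are illustrative, not part of any TDA library.

```python
from itertools import combinations
from math import cos, sin, pi, dist, isclose

def pairwise_distances(points):
    """All point-to-point distances, in a fixed pair order."""
    return [dist(p, q) for p, q in combinations(points, 2)]

def rotate(points, angle):
    """Rotate 2-D points about the origin, i.e. change the coordinate system."""
    c, s = cos(angle), sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical sample data: four points in the plane.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
rotated = rotate(cloud, pi / 5)

# The coordinates differ, but the distances agree up to rounding error.
same_distances = all(
    isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(cloud), pairwise_distances(rotated))
)
```

Any method that only consumes these distances will therefore give the same answer for `cloud` and `rotated`.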

 

Topological Data Analysis (TDA) lets you represent and interact with structured and unstructured data through a topological network. A topological network provides a map of all the points in the data set in which nearby points are more similar than distant points. It clarifies the structure of the data set without requiring you to query it or to run an algebraic analysis on only a subset of variables. In essence, one can explore the structure of the data by analyzing a compressed representation that groups together data points with a degree of similarity to each other while retaining the subtle features of the data set.

Topological networks are a framework for machine learning. A topological network represents data by grouping similar data points into nodes, and connecting two nodes by an edge if the corresponding collections have a data point in common. Because each node represents multiple data points, the network gives a compressed version of extremely high-dimensional data. Topological networks allow individuals to easily examine machine-learning outputs and understand the “shape” of complex data sets.
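The node-and-edge rule described above can be sketched in a few lines of Python. This is one possible construction under stated assumptions, not a definitive implementation: it covers the range of a "lens" function with overlapping intervals, groups the points in each interval by single-linkage clustering with a distance threshold, and links two nodes whenever they share a point. The names `build_network`, `clusters_within`, `lens`, `n_intervals`, `overlap`, and `eps` are all hypothetical choices made for this illustration.

```python
from itertools import combinations
from math import cos, sin, pi, dist

def clusters_within(indices, points, eps):
    """Single-linkage grouping: points chained by gaps <= eps share a group."""
    remaining = set(indices)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in remaining
                    if dist(points[current], points[j]) <= eps}
            remaining -= near
            group |= near
            frontier.extend(near)
        groups.append(frozenset(group))
    return groups

def build_network(points, lens, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes that have at least one data point in common."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2  # each interval is widened by pad on both sides
    nodes = []
    for i in range(n_intervals):
        start = lo + i * width - pad
        end = lo + (i + 1) * width + pad
        inside = [k for k, v in enumerate(values) if start <= v <= end]
        nodes.extend(clusters_within(inside, points, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges

# Hypothetical sample data: 12 points evenly spaced on a unit circle,
# with the x coordinate as the lens.
circle = [(cos(2 * pi * k / 12), sin(2 * pi * k / 12)) for k in range(12)]
nodes, edges = build_network(circle, lens=lambda p: p[0],
                             n_intervals=3, overlap=0.6, eps=0.6)
```

With these parameters the 12 points compress to four nodes joined in a single loop, so the network keeps the circular shape of the data while being much smaller than the data itself. Different choices of lens, interval count, overlap, or threshold can produce a different network.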
 
Topological methods provide a quick way to understand the structure of the data and obtain knowledge from it. Topology recognizes shapes through a set of tools called homology. For “point clouds”, the corresponding tool is persistent homology, which tracks the shape features that persist as the cloud is examined across a range of scales. Point cloud processing can be efficient even for huge data volumes, because one uses patterns occurring in a shape to distinguish shapes from each other. In a point cloud, all points have the same components (the analogs of feature attributes), and the components have the same data types. This allows points to be processed in large batches rather than one by one, without checking each point for differences in data structure.
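As a small illustration of persistent homology on a point cloud, the sketch below computes only its simplest part: the scales at which separate connected components merge as the distance threshold grows (0-dimensional persistence). It assumes Euclidean distance and uses a union-find structure over the sorted pairwise distances. The function name `h0_persistence` and the sample `cloud` are illustrative. Higher-dimensional features such as loops need more machinery than is shown here.

```python
from itertools import combinations
from math import dist

def h0_persistence(points):
    """Return the distance scales at which connected components merge.

    Every point starts as its own component at scale 0. Scanning pairs
    from closest to farthest, a merge is recorded each time a pair joins
    two components that were still separate.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = sorted((dist(points[a], points[b]), a, b)
                   for a, b in combinations(range(len(points)), 2))
    merge_scales = []
    for length, a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            merge_scales.append(length)
    return merge_scales

# Hypothetical sample data: two tight groups of three points, far apart.
cloud = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
merge_scales = h0_persistence(cloud)
```

For this cloud, four merges happen at scale 1.0 and the last one at 9.0. The large gap before the final merge is the signal that the data form two well-separated groups, a feature that persists over a wide range of scales.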