In this blog post, we look at some work I did with my colleague Jin Yu showing how data science techniques, in particular sequential pattern mining, can detect coordinated network threats like watering hole attacks. A watering hole attack targets a group of users in an organization by compromising the websites that are most popular among those users.
Researchers can analyze potential threats by examining outbound network traffic data, but only after several preprocessing steps: host name normalization, filtering of invalid host names, identification of unpopular domains, and user-specific sessionization. Sequential Pattern Mining (SPM) then finds sequential patterns in this time-ordered data, and the quality of each pattern is measured by its support and confidence.
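The post does not give its exact preprocessing rules, so the following is only a minimal sketch of the four steps under stated assumptions. The 30-minute session gap, the "seen by at most 2 users" unpopularity threshold, the DNS-label regex, and the helper names `clean_host`, `sessionize`, and `unpopular_domains` are all hypothetical. Input records are assumed to be `(user, unix_timestamp, raw_host)` tuples taken from proxy logs.

```python
import re
from collections import defaultdict

# Hypothetical thresholds; the original work does not specify these values.
SESSION_GAP_SECONDS = 30 * 60   # start a new session after 30 idle minutes
UNPOPULAR_MAX_USERS = 2         # domains visited by <= this many users are "unpopular"

# One DNS label: 1-63 chars, alphanumeric or hyphen, no leading/trailing hyphen.
LABEL_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")

def clean_host(raw):
    """Normalize a raw host name, or return None if it looks invalid."""
    host = raw.strip().lower().split(":")[0].rstrip(".")  # lowercase, drop port and trailing dot
    labels = host.split(".")
    if len(labels) < 2 or all(label.isdigit() for label in labels):
        return None  # single label or bare IPv4 address
    if not all(LABEL_RE.match(label) for label in labels):
        return None  # malformed DNS label (e.g. contains "_")
    # Naive registered-domain guess: keep the last two labels.
    # (This is wrong for suffixes like "co.uk"; a real system would use the Public Suffix List.)
    return ".".join(labels[-2:])

def sessionize(records):
    """records: list of (user, ts, raw_host). Returns {user: [[domain, ...], ...]}."""
    by_user = defaultdict(list)
    for user, ts, raw in records:
        host = clean_host(raw)
        if host is not None:
            by_user[user].append((ts, host))
    sessions = {}
    for user, events in by_user.items():
        events.sort()  # time-order each user's requests
        user_sessions, current, last_ts = [], [], None
        for ts, host in events:
            if last_ts is not None and ts - last_ts > SESSION_GAP_SECONDS:
                user_sessions.append(current)  # idle gap closes the session
                current = []
            current.append(host)
            last_ts = ts
        if current:
            user_sessions.append(current)
        sessions[user] = user_sessions
    return sessions

def unpopular_domains(records, max_users=UNPOPULAR_MAX_USERS):
    """Domains visited by at most max_users distinct users."""
    users_per_domain = defaultdict(set)
    for user, _ts, raw in records:
        host = clean_host(raw)
        if host is not None:
            users_per_domain[host].add(user)
    return {d for d, users in users_per_domain.items() if len(users) <= max_users}
```

For example, a request to `cdn.tracker.io:443` normalizes to `tracker.io`, while `10.0.0.1` and `bad_host.com` are filtered out as invalid.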
Detecting watering hole attacks requires identifying sequential patterns of domains with low support and high confidence. Uncommon, low-support domain sequences are more likely to reflect a redirect pattern that can lead users to a compromised domain, and high confidence means that once users visit the first domain in the pattern, the rest of the sequence reliably follows.
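To make support and confidence concrete, here is a small sketch using one common definition from the SPM literature (not necessarily the exact one used in this work): the support of a pattern is the fraction of sequences that contain it as an ordered subsequence, with gaps allowed, and the confidence of a rule `A -> B` is `support(A + B) / support(A)`. The domain names are hypothetical.

```python
def contains_subsequence(seq, pattern):
    """True if the items of pattern appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # "item in it" advances the iterator, so later items must come after earlier ones.
    return all(item in it for item in pattern)

def support(sequences, pattern):
    """Fraction of sequences that contain pattern."""
    return sum(contains_subsequence(s, pattern) for s in sequences) / len(sequences)

def confidence(sequences, antecedent, consequent):
    """Estimated P(consequent follows | antecedent occurs) = support(A + B) / support(A)."""
    base = support(sequences, antecedent)
    return support(sequences, antecedent + consequent) / base if base else 0.0

# Hypothetical sessions: hr-site.com -> evil.net is rare but always occurs together.
SAMPLE_SESSIONS = [
    ["news.com", "portal.com"],
    ["news.com", "mail.com"],
    ["hr-site.com", "evil.net"],
    ["news.com", "portal.com", "mail.com"],
    ["mail.com", "news.com"],
]
```

On `SAMPLE_SESSIONS`, the rule `hr-site.com -> evil.net` has support 0.2 but confidence 1.0, the low-support, high-confidence shape described above, whereas the common rule `news.com -> portal.com` has a confidence of only 0.5.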
The modeling process is as follows:
1. Create time-ordered domain sequences, such as sessionized data.
2. Given a list of targeted domains, select the subset of sequences containing those domains.
3. Find high-confidence, low-support sequential patterns of the targeted domains in parallel.
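The post does not describe the internals of its modified SPM algorithm, so the sketch below substitutes a simple brute-force miner restricted to two-domain patterns `a -> b`, where `a` is a targeted domain and `b` appears later in the same sequence. It shows why step 2 is safe: any sequence containing a targeted domain is in the subset, so counting over the subset gives the same pattern counts, while support is still divided by the total number of sequences. The function name and thresholds are hypothetical.

```python
from collections import Counter

def mine_targeted_pairs(sequences, targets, max_support, min_confidence):
    """Return sorted (a, b, support, confidence) for pairs a -> b with a in targets."""
    total = len(sequences)
    # Step 2: only sequences mentioning a targeted domain can contain a target-led pattern.
    subset = [seq for seq in sequences if targets.intersection(seq)]
    antecedent_counts, pair_counts = Counter(), Counter()
    for seq in subset:
        seen_a, seen_pairs = set(), set()
        for i, a in enumerate(seq):
            if a in targets:
                seen_a.add(a)
                # b may appear anywhere later in the sequence (gaps allowed).
                seen_pairs.update((a, b) for b in seq[i + 1:] if b != a)
        # Count each antecedent and pair at most once per sequence.
        antecedent_counts.update(seen_a)
        pair_counts.update(seen_pairs)
    # Step 3: keep rare (low-support) but reliable (high-confidence) patterns.
    results = []
    for (a, b), n in pair_counts.items():
        sup = n / total  # support is relative to ALL sequences, not just the subset
        conf = n / antecedent_counts[a]
        if sup <= max_support and conf >= min_confidence:
            results.append((a, b, sup, conf))
    return sorted(results)
```

A real SPM algorithm would also mine longer patterns; limiting the sketch to pairs keeps the support and confidence bookkeeping easy to follow.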
To perform these steps, we distribute the sequences across multiple servers using Pivotal Greenplum Database (GPDB). The massively parallel processing (MPP) architecture of GPDB lets researchers run a modified Sequential Pattern Mining algorithm (m-SPM) in parallel on sequences residing on different segment servers to detect correlated domains. This helps researchers quickly identify sequences that are potentially part of watering hole attacks.
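The data-parallel shape of this step can be illustrated outside the database. The sketch below is a plain-Python analogy, not GPDB code: sessions are hash-partitioned by user into hypothetical "segments" (in GPDB, a table's `DISTRIBUTED BY` key plays this role), each segment computes partial counts independently, and the partial counts are summed before support and confidence are computed. A `ThreadPoolExecutor` stands in for segment servers only to show the structure; Python threads do not provide CPU parallelism for this workload. The segment count and function names are assumptions.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segment servers

def segment_of(user, n=NUM_SEGMENTS):
    """Stable hash so all of a user's sessions land on the same segment."""
    return zlib.crc32(user.encode("utf-8")) % n

def partial_counts(sequences, targets):
    """Per-segment work: count target antecedents and a -> b pairs, once per sequence."""
    antecedents, pairs = Counter(), Counter()
    for seq in sequences:
        seen_a, seen_pairs = set(), set()
        for i, a in enumerate(seq):
            if a in targets:
                seen_a.add(a)
                seen_pairs.update((a, b) for b in seq[i + 1:] if b != a)
        antecedents.update(seen_a)
        pairs.update(seen_pairs)
    return antecedents, pairs, len(sequences)

def parallel_mine(user_sessions, targets, max_support, min_confidence, n=NUM_SEGMENTS):
    """user_sessions: {user: [[domain, ...], ...]}. Returns sorted (a, b, sup, conf)."""
    segments = [[] for _ in range(n)]
    for user, sessions in user_sessions.items():
        segments[segment_of(user, n)].extend(sessions)
    # Each "segment" runs independently, as segment servers would in an MPP database.
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda seg: partial_counts(seg, targets), segments))
    # Merge step: partial counts are additive, so summing them gives global counts.
    antecedents, pairs, total = Counter(), Counter(), 0
    for a_counts, p_counts, count in results:
        antecedents.update(a_counts)
        pairs.update(p_counts)
        total += count
    found = []
    for (a, b), k in pairs.items():
        sup, conf = k / total, k / antecedents[a]
        if sup <= max_support and conf >= min_confidence:
            found.append((a, b, sup, conf))
    return sorted(found)
```

Because the counts are additive, the result does not depend on how many segments the data is split across; only the wall-clock time should change.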
Beyond detecting existing watering hole attacks, this approach can proactively identify patterns that may represent such attacks in the future. New patterns can be mined from incoming proxy logs, in this case taken from a large enterprise network. In this way, sequential pattern mining can not only identify current or historical watering hole attacks but also help predict behavior patterns that could lead to future ones.
For more on this case study, see the full story on the Pivotal Data Science blog.