In todays world majority of information is generated by self sustaining systems like various kinds of bots, crawlers, servers, various online services, etc. This information is flowing on the axis of time and is generated by these actors under some complex logic. For example, a stream of buy/sell order requests by an Order Gateway in financial world, or a stream of web requests by a monitoring / crawling service in the web world, or may be a hacker's bot sitting on internet and attacking various computers. Although we may not be able to know the motive or intention behind these data sources. But via some unsupervised techniques we can try to infer the pattern or correlate the events based on their multiple occurrences on the axis of time. Thus we could automatically identify signatures of various actors and take appropriate actions.
Sessionisation is one such unsupervised technique that tries to find the signal in a stream of events associated with a timestamp. In the ideal world it would resolve to finding periods with a mixture of sinusoidal waves. But for the real world this is a much complex activity, as even the systematic events generated by machines over the internet behave in a much erratic manner. So the notion of a period for a signal also changes in the real world. We can no longer associate it with a number, it has to be treated as a random variable, with expected values and associated variance. Hence we need to model """"Stochastic periods"""" and learn their probability distributions in an unsupervised manner.
In this talk we will do a walk through of a real security use cases solved via Sessionisation for the SOC (Security Operations Centre) centre of an international firm with offices in 56 countries being monitored via a central SOC team.
In this talk we will go through a Sessionisation technique based on stochastic periods. The journey would begin by extracting relevant data from a sequence of timestamped events. Then we would apply various techniques like FFT (Fast Fourier Transform), kernel density estimation, optimal signal selection, Gaussian Mixture Models, etc. and eventually discover patterns in time stamped events.
Key concepts explained in talk: Sessionisation, Bayesian techniques of Machine Learning, Gaussian Mixture Models, Kernel density estimation, FFT, stochastic periods, probabilistic modelling