Why both Cybersecurity and IAM require Identity-centric Behavioral Analytics using modern Machine Learning approaches.
“Eyes without a face…got no human grace…your eyes without a face…” The British punk-rocker Billy Idol said it all in his 1983 song about the most prominent problem enterprises face today when using cybersecurity technologies. They detect footprints, sometimes stretching to the behaviour of those footprints, but have no idea of the identity (the ‘face’) behind them, nor any overall behavioural insight.
The output of such an approach: enterprises spend, on average, $1.3 million a year just dealing with false-positive cybersecurity alerts (www.securityweek.com/false-positive-alerts-cost-organizations-13-million-year-report). Moreover, the majority of data breaches and compliance violations are carried out by known users acting on sensitive data sitting in business applications such as collaboration tools, ERP and CRM systems, etc.
Especially after Covid, work-from-home, and the loss of the physical perimeter, the only effective way to deal with this situation is to use identity-enriched data streams to define multiple Indicators of Behavior (IoB) for each identity, built using unsupervised Machine Learning (ML) models.
The key ingredient: Machine Learning
Machine learning software learns the relevant features of the system under study directly from data, solving complex tasks that cannot be described in terms of simple mathematical equations; in other words, without being explicitly programmed to do so. Given a series of samples, a machine learning algorithm can detect hidden variables and correlations, extract them, and use them for prediction, clustering, and anomaly detection. While this procedure has been tested in a growing number of scientific fields, including medicine, biology, physics, and economics, applying it to the analysis of users’ behaviour is a newer approach, and the first step towards detecting possible malicious anomalies.
ML at work
A fundamental requirement for all ML tasks is the availability of a meaningful database. It is necessary to build a behavioural baseline for each user, and the data stream should be enriched with every relevant detail to increase the precision of the link between a person and their activity.
The second phase is the so-called learning phase: the algorithms analyze customer data to determine the behavioural baselines of all users who have access to the applications, including insiders, whether accidental or compromised.
Once the algorithms have learned the standard behaviour of users and determined the key elements for testing that behaviour, they should incorporate new, real-time data coming from each user’s live stream of activities. By combining the different indicators trained during the learning phase, anomalous and potentially malicious behaviour can be spotted as a suspicious deviation from the previously observed behavioural baselines.
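The two phases above can be sketched in a few lines. This is a deliberately minimal illustration, not a production UEBA system: the feature names (`login_hour`, `mb_downloaded`) and the sample activity data are hypothetical, and per-feature z-scores stand in for the richer unsupervised models a real deployment would use.

```python
# Minimal sketch of a per-identity behavioural baseline.
# Feature names and activity data are hypothetical examples.
from statistics import mean, stdev

def learn_baseline(events):
    """Learning phase: fit (mean, stdev) per feature from historical activity."""
    features = events[0].keys()
    return {f: (mean(e[f] for e in events), stdev(e[f] for e in events))
            for f in features}

def anomaly_score(baseline, event):
    """Live phase: largest absolute z-score across features, i.e. how far
    this single event deviates from the identity's learned baseline."""
    return max(abs(event[f] - mu) / sigma if sigma else 0.0
               for f, (mu, sigma) in baseline.items())

# Historical activity for one identity (synthetic office-hours pattern).
history = [{"login_hour": h, "mb_downloaded": m}
           for h, m in [(9, 12), (10, 15), (9, 11), (11, 14), (10, 13)]]
baseline = learn_baseline(history)

# A typical event scores low; a 3 a.m. bulk download scores very high.
print(anomaly_score(baseline, {"login_hour": 9, "mb_downloaded": 13}))
print(anomaly_score(baseline, {"login_hour": 3, "mb_downloaded": 300}))
```

A real system would replace the z-score with unsupervised models (e.g. density estimation or isolation-based methods) and maintain one baseline per identity, but the baseline-then-score structure is the same.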
What is “a modern ML approach”?
Compared with the initial UEBA approaches, the modern approach has a number of advantages:
- Today, threat detection is typically performed by looking at known attack patterns ex-post. Since machine learning is designed to infer directly from data which kinds of behaviour are normal and which are anomalous, there is no longer any need to set thresholds based on subjective rules of thumb and (often biased) past experience. A modern approach removes most, if not all, of the human intervention from the analysis of users’ behaviour: the algorithms detect what is abnormal on the basis of what they have seen before, letting the data speak for themselves.
- Since the algorithms are trained on proprietary data, the analysis is tailored to the organisation’s specific structure and needs, which also shortens response times.
- Users can be observed over a period of time: a user can be tracked not only during their use of a single application but also through the chronological patterns of their operations, providing the highest resolution of their behaviour.
- Each anomaly score (deviation from the individual baseline) can be expressed in probabilistic terms for every single indicator. This makes it possible not only to better assess the likelihood that each single user operation is “normal”, but also, more importantly, to combine the different indicators and assess the probability that the whole behavioural pattern is anomalous with respect to the individual learned baseline.
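The last point, combining per-indicator probabilities into one pattern-level probability, can be sketched under a simplifying independence assumption. The indicator names and probability values below are hypothetical, and the naive combination rule (treating indicators as independent) is one possible choice, not the only one.

```python
# Sketch: combine per-indicator anomaly probabilities into a single
# probability that the whole behavioural pattern is anomalous.
# Assumes indicators are independent; names and values are illustrative.
import math

def combined_anomaly_probability(indicator_probs):
    """P(pattern anomalous) = 1 - P(every indicator is normal),
    where P(all normal) is the product of the per-indicator normal probs."""
    p_all_normal = math.prod(1.0 - p for p in indicator_probs.values())
    return 1.0 - p_all_normal

indicators = {
    "login_time_deviation": 0.10,    # slightly unusual hour
    "data_volume_deviation": 0.85,   # far more data than the baseline
    "resource_access_novelty": 0.60, # resources this identity rarely touches
}
print(combined_anomaly_probability(indicators))  # 1 - 0.9*0.15*0.4 = 0.946
```

One moderately suspicious indicator rarely trips the combined score on its own, while several deviations together push it close to 1, which is exactly the false-positive reduction the probabilistic framing is meant to deliver.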
A modern approach to identity behavioural insight represents a new paradigm for the analysis of users’ behaviour. It offers unprecedented opportunities to improve both the knowledge and the security of companies, and it tears down the barrier between security and identity, offering a new ‘lingua franca’ for implementing durable Zero Trust, identity-centric architectures.
About the Authors:
Andrea Zaccaria, Sharelock Shareholder & AI Advisor. Andrea is a researcher at the Institute for Complex Systems (ISC)-CNR. Author of more than 30 papers published in peer-reviewed journals, he has participated in more than 20 international conferences, including eight invited talks. He received his PhD in Physics at the “Sapienza” University of Rome, where he applied concepts and methods borrowed from Statistical Physics and the Physics of Complex Systems to the study of financial markets.

Andrea Rossi, Sharelock Shareholder & Growth Advisor. Andrea is a Senior Cybersecurity and Identity Management executive. His operating experience includes start-ups as well as large, multi-national corporations as a result of successful company exits. As co-founder of CrossIdeas, he led a team that quickly rose to be a recognized industry leader, resulting in a successful acquisition by the IBM Corporation in 2014.