13 Sep 2023

AI Powered Cybersecurity: Leveraging Machine Learning for Proactive Threat Detection 

Every day the attack surface of an organisation is changing and most likely growing.

An environment where petabytes of both traditional and AI enhanced data are transferred across private and public networks creates a daunting landscape for cybersecurity professionals.  This data rich world is now even more accessible to cyber criminals as new AI-enabled strategies facilitated by open-source tooling become available to them.

How is the modern CISO, IT manager or cybersecurity professional meant to keep up? The answer, probably unsurprisingly, is that to detect, and deal with these new threats, AI is also the solution.

Words by

Chris Jefferson

A new Environment 

Every day the attack surface of an organisation is changing and most likely growing. In the current 5g enabled world, edge computing is becoming cheaper and more powerful allowing users to work from anywhere. AI, as well as gaining traction within the network perimeter is also moving out from the servers behind an organisations firewall to the chips embedded in the devices on the edge. Combine that with the cultural changes in a post covid world and the impact of remote working, and we have a very dynamic environment that poses a challenge for traditional cyber controls, but also for those responsible. An environment where petabytes of both traditional and AI enhanced data are transferred across private and public networks, creates a daunting landscape for cybersecurity professionals. 

This data rich world is now even more accessible to cyber criminals as new AI-enabled strategies facilitated by open-source tooling become available to them. Using AI, a new generation AI enhanced attacks ranging from brute-force password attacks which can defeat captcha to social engineering attacks can now be automated at scale, powered by deep learning and big data. Given that challenge how is the modern CISO, IT manager or cybersecurity professional meant to keep up? 

The answer they must adapt and adopt the new tools that are becoming available. AI enabled tools that can identify patterns and behaviours that traditional rules may not and recognise the attacks that even a seasoned professional may not spot in time. 

This article was originally published in the PECB Magasine: PECB Insights – ISSUE 44 / JU

https://insights.pecb.com/pecb-insights-issue-44-july-september-2023/#page14LY – SEPTEMBER 2023 - PECB Insights

The rise of the Large Language Model (LLM) 

So, what has sparked this step change in the AI debate. LLMs weren’t the first AI models to be released, or even used in Cyber Security, but they have taken off like a rocket, since the release of Open Ai’s ChatGPT. Transformers, the technology that LLMs have grown from has been around for several years, but Open AI managed a rare feat. They created a highly scalable model, with an intuitive interface, that generalised well, and most importantly caught the publics imagination. Trained on a wide corpus, collected from the internet, and refined based on human feedback, ChatGPT became the first AI model that really sparked the AI debate and changed the lexicon being considered by regulators. 

They also single handily created a new job role for budding tech enthusiasts, ML engineers and red teamers. Prompt Engineering. It quickly became apparent that you could both improve performance of Chat GPT and exploit using well crafted prompts. Prompts that could enforce structure and reason or prompts that could change the model’s “personality” and get it to leak sensitive information, hallucinate, or become abusive. Within the space of months, a new technology rose, ushered in a new career path, and simultaneously created a new value chain and vectors for attack. 

The open-source community soon burst onto the scene too releasing a vast array of comparable models. The aim to advance research and democratise AI. For better or worse, they succeeded. It wasn’t long before researchers trawling the dark web discovered ChatGPT’s dark doppelganger in the form of FraudGPT. A model that to quote a web posting advertised “exclusive tools, features and capabilities” and “no boundaries.” While Open AI is aligning ChatGPT to prevent harms, the dark web is doing the opposite. 

But this isn’t the only example of a generative AI that has become a threat. Deepfakes for voices and imagery, generative models for create images based on a prompt, and now multi-modal models, (which use various modes of data such as images, text, videos) provide an incredibly versatile toolkit, for an attacker. 

This highlights something important. That cyber criminals will be able to now leverage the vast quantities of data available to them to make ever more ingenious attacks, tools and evolve techniques. So, the question becomes as the custodian of cyber security how do you rise to the challenge? 

 Why it is needed.

The answer, probably unsurprisingly, is that to detect, and deal with these new threats, AI is also the solution. With the release of such powerful, and more generalisable models that can be fine tuned to multiple use case, a new arms race has begun. An AI enabled arms race where data is both commodity and weapon, and AI algorithms and models are the strike-teams. 

The challenge faced currently is that the problem is often to broad, persistent, and complex to easily be solved by a single team that is meant to monitor activity within a companies’ network. They are responsible for all the assets within an organisation and must track the interactions that users are having with other systems, correspondences outside the organisation and the states of data and servers. Add false positives and the reality that its not if but when a cyber attack is successful and it becomes clear why many cybersecurity workers are stressed, burnt out and considering a career change. 

This is where AI enabled threat detection can lighten the load. 

"These reinforcement learning (RL) models are commonly used as part of cyber security tooling, and when combined with rules and command sets for infrastructure can become a key ally. "

DALL·E 2023 09 13 13.03.27 Data Rich World, Cyber Criminals (1)

The power of AI  

AI models come in many flavours. In the context of cyber security, they can be used to classify, predict, generate instructions, or detect and action behaviours. This means that when trying to use an AI enabled system there is likely a combination of many models that are used to classify that a file is malicious, predict the next node it will infect, generate an instruction set to the system, or deploy an agent to shutdown the behaviour. The beauty of these models is that they know what they have learnt very well. Over thousands of generations that have been optimized to be able to perform there task optimally. As such they can easily identify nuances or patterns that as humans we don’t see or discount.  

The other flavour of AI instead learns a policy or strategy to outcompete its adversary or achieve a goal in an environment. In its quest to achieve its objective and get a reward, it learns strategies that can contain an attacker or protect a network from attack. These reinforcement learning (RL) models are commonly used as part of cyber security tooling, and when combined with rules and command sets for infrastructure can become a key ally. 

The thing both these systems share, is speed, and generally accuracy. When working in an environment that is controlled and dealing with expected ranges of data these AI enabled system can bring the autonomy and insight that traditional rules-based systems may lack, and can be applied to the following areas: 

Autonomous detection of threats (Darktrace, LogRhythm)

Malware detection (Cisco) 
Spam Filters (Proofpoint) 
Complex Patterns of behaviour (CrowdStrike)

Automation of Tasks (JASK)

Autonomous Endpoint protection (Cylance)

Many of these solutions are available across the providers highlighted above and are also bundled into enterprise level cyber platforms, which are also provide by the systems providers like IBM, Microsoft, and Google, to name a few. 

If we were to add generative AI into the mix, we also have the potential for an incredibly powerful toolchain that can detect threats, manoeuvre them into desired locations and using generative AI keep the attacker busy to learn from the behaviour by generating data to keep them interested. A fascinating avenue of research. 

The Risks  

Unfortunately, that doesn’t mean we are home free, as with any new piece of software we are introducing a new vector for an adversary.  

Data is the most common vector in AI and more so with cyber tools. As well as requiring the infrastructure to maintain data that could be used to train and improve AI models, the data itself can be poisoned, which allow backdoors to be injected into the models that we are relying on. Add to this the domain of Adversarial AI which creates perturbed data or behaviours to invoke failure in AI models, and a new vector is born. 

AI can also be incredibly dumb, as they can only recognise patterns or behaviours they have been trained on. Introduce an adversarial behaviour into an attack and an RL agent trained over thousands of generations to defend the kingdom, could unlock the gates, and let the invaders in. If an agent is given too much agency this can become a significant risk, and another attack vector. 

Finally, there is the human aspect. If the AI is good, and we hope it is, we in time become over reliant on it and stop thinking critically about the actions or recommendations it is making. This is a potential for Generative AI that could be producing code or instructions sets. It is incredibly powerful but will likely be prone to hallucination, or potentially creating insecure code. If we have not implemented procedures to provide oversight for this, then the AI is truly in control. 

New Challenge ahead 

So, considering these opportunities and risks how do we proceed? Well, now that we are aware of some of the bigger risks this leads us into the responsible adoption and deployment of AI systems. Not using AI will leave you exposed to those that do, and using it means at the best you are freed up to do higher value work instead of monitoring all your systems in depth daily. 

To adopt AI systems the key things that will prime you for success are: 

- - - Data; 
  - - Infrastructure; 
  - - Education; 
  - - Governance.

Implementation leads to the next generation of challenges for cybersecurity professionals. Complex attacks and highly autonomous systems will prove a challenge to interpret for even skilled teams. 

-Phishing attacks are going to change, data used to enable social engineering will be come more common. Voice cloning, knowledge graphs and natural language. 
-Previously complex attacks will become faster, automated, and more common place, making human intervention difficult. 
-Your AI will fail at some point. Alpha Go was defeated by adversarial strategies it had never seen before that confused it and resulted in it making consecutive losing choices.

This means that the ability to detect these sophisticated threats will become ever more import, and the need to have the right level of visibility into you AI enabled tools will too. 

Observability and Governance 

This need to understand what the AI is doing, is not unusual. As humans we seek to understand how a system works, and through that understanding we develop the skills to maintain, fix and improve. To achieve this any AI enabled threat detection needs to be observable and explainable. 

-Why did it choose to classify one file as a threat and not the other?  
-Why was a certain behaviour flagged as a cyber attack?  
-What were the sets of features in the data it made its decision on?

These are simple question for you and me to answer, but not as easy for a Blackbox AI. This then leads onto other question of systems that a business may be more interested in: 

-Which AI system is good at what?  
-What was the latest data it was trained on?  
-Does the latest model perform as well as the previous one?

 That’s a lot of things to track.  AI needs its own governance to ensure this information is captured enabling these questions to be answered, but also to align it with the values and objectives that a business requires. 

This requires observability of an AI system and when required explainability to provide an audit of actions taken. Add to this the advent of global regulations for AI and there also arrives a driver to be able to prove that AI enabled systems are functioning responsibly. This further drives the requirement for observability but also governance of the AI Systems and Data within an organisation. 

Conclusion

So, what is the main take away? The threat landscape is evolving, but so too is the tooling to help defend. Adoption of AI enabled systems is an obvious choice to help mitigate, these challenges and reduce the stress on cybersecurity teams. However, AI needs to be evaluated and governed so AI adoption must be considered as an organisation and to gain the most benefit, organisations must ensure that AI is aligned to the needs, objectives, and controls to avoid it becoming a new vector of attack and cause of failure. 

References

Rise of the LLM 

https://insurancenewsnet.com/innarticle/ai-enabled-cybercrime-becoming-more-effective-insurance-experts-say 

https://www.infosecurityeurope.com/en-gb/blog/threat-vectors/generative-ai-dark-web-bots.html 

https://www.tomshardware.com/news/dark-web-chatgpt-unleashed-meet-darkbert 

Why it is needed. 

https://www.deepinstinct.com/voice-of-secops-reports 

https://www.csoonline.com/article/649944/generative-ai-fueling-significant-rise-in-cyberattacks.html 

New Challenge ahead 

https://www.computerworld.com/article/3043562/googles-alphago-loses-in-fourth-go-game-by-s-korean-player.html 

Observability and Governance 

AI has a place in cyber, but needs effective evaluation | Computer Weekly 

Global moves to regulate artificial intelligence: AI news | World Economic Forum (weforum.org) 

Who are Advai?

Advai is a deep tech AI start-up based in the UK that has spent several years working with UK government and defence to understand and develop tooling for testing and validating AI in a manner that allows for KPIs to be derived throughout its lifecycle that allows data scientists, engineers, and decision makers to be able to quantify risks and deploy AI in a safe, responsible, and trustworthy manner.

If you would like to discuss this in more detail, please reach out to [email protected]

Learn Article

11 Sep 2024