Damian leads a team that researches AI robustness, safety, and security. What does this mean? They spend their time developing breakthrough methods to stress-test and break machine learning algorithms.
This, in turn, shows us how to protect these same algorithms, both from intentional misuse and from natural deterioration. It also helps us understand how to strengthen their performance under diverse conditions.
That is to say, to make them ‘robust’.
The Superalignment initiative aligns well with our research at Advai. Manually testing every algorithm for every facet of weakness isn’t feasible, so – just as OpenAI has planned – we’ve developed internal tooling that runs a host of automated tests to indicate the internal strength of AI systems.
“It’s not totally straightforward to make these tools.” Damian’s fond of an understatement.
The thing is, testing for when something will fail means trying to say what something can’t do.
You might say ‘this knife can cut vegetables’. But what if you come across more than vegetables? What can’t the knife cut? Testing when a knife will fail means trying to cut an entire world of materials, separating ‘things that can be cut’ from ‘everything else in the universe’. The list of things the knife can’t cut is almost endless. Yet, to avoid breaking your knife (or butchering your item), you need to know what to avoid cutting!
To be feasible, these failure mode tests need shortcuts. This is where automated assurance mechanisms and Superalignment come in: there are algorithmic approaches to testing what we might call the ‘negative space’ of AI capabilities.
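To make the idea concrete, here is a minimal sketch of what automated probing of that ‘negative space’ can look like: randomly perturb an input the model handles correctly and record where its answer flips. This is purely illustrative – the `is_positive` stand-in model and the `probe_failures` helper are hypothetical, not Advai’s or OpenAI’s actual tooling.

```python
import random

def is_positive(x):
    # Stand-in "model": a toy decision rule we want to stress-test.
    return x > 0.0

def probe_failures(model, base_input, n_trials=1000, scale=1.0, seed=0):
    """Randomly perturb a known-good input and record every perturbation
    that flips the model's answer -- a crude automated sweep over the
    'negative space' of inputs the model cannot handle."""
    rng = random.Random(seed)
    expected = model(base_input)
    failures = []
    for _ in range(n_trials):
        perturbed = base_input + rng.uniform(-scale, scale)
        if model(perturbed) != expected:
            failures.append(perturbed)
    return failures

# Sweep around an input the model gets right; collect inputs it gets wrong.
failures = probe_failures(is_positive, base_input=0.3, scale=1.0)
```

Real assurance tooling replaces the random sweep with smarter search (gradient-based attacks, coverage-guided fuzzing), but the shape is the same: automate the hunt for inputs that break the model, rather than enumerating them by hand.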
This might sound difficult – and it is: controlling what an algorithm does is hard, but controlling what it doesn’t do is harder. We’ve been sharing our concerns about AI for a few years now: these systems have a great many failure modes. Businesses should be worrying about this, because the pressure to keep up with innovation makes such failure modes easy to overlook.
There are so many ways that a seemingly accurate algorithm can be vulnerable and can subsequently expose its users to risk. Generative AI and large language models like GPT-4 make it harder still, because these models are so much more complex and guardrail development is correspondingly more challenging.