10 November 2023

Exploring red teaming to identify new and emerging risks from AI foundation models

Marie-Laure Hicks, Ella Guest, Jess Whittlestone, Jacob Ohrvik-Stott, Sana Zakaria, Cecilia Ang, Chryssa Politi, Imogen Wade, Salil Gunashekar

On 12 September 2023, RAND Europe and the Centre for Long-Term Resilience organised a virtual workshop to inform UK government thinking on policy levers to identify risks from artificial intelligence foundation models in the lead-up to the AI Safety Summit in November 2023. The workshop focused on the use of red teaming for risk identification, and on the opportunities, challenges and trade-offs that may arise in using this method.

The workshop brought together a range of participants from across academia and public sector research organisations, non-governmental organisations and charities, the private sector, the legal profession and government. The workshop consisted of interactive discussions among the participants in plenary and in smaller breakout groups. The views and ideas discussed at the workshop have been summarised in this short report to stimulate further debate and thinking as policy around this topical issue develops in the coming months.

Key Findings

The discussion focused on the following themes associated with the use of red teaming with AI foundation models to identify risks:
  • The term 'red teaming' is loosely used across the global AI community. A crucial first step is to develop a clear and shared taxonomy, along with shared norms and good practice around red teaming, for example, regarding who to involve, how to implement it and how to share findings.
  • Red teaming is one specific tool that is part of the wider risk identification, assessment and management toolbox. It is not a governance mechanism in itself.
  • Red teaming is useful in certain cases, in particular for medium-term risks and for assessing known risks. Its key limitations include difficulty in identifying unknown or chronic risks.
  • The socio-technical aspect of red teaming – who does it and in what context – must be actively considered. Embedding a diversity of perspectives, with deep understanding of the risks, the domain, and the actors or adversaries, is essential to improve a red team's effectiveness.
  • Specific methods such as red teaming should not be the focal point of mandated risk-management activities. If mandates are put in place, they should instead focus on holistic approaches and risk-management frameworks.
