
Prevailing technical limitations against GPAI misuse

Activity: Talk or presentation › Keynote or plenary presentation

Description

As a guest speaker at the EU-funded NOTIONES Project (a pan-European ecosystem of security and intelligence practitioners), I explained why prevailing technical safeguards against the malicious use of AI are inherently limited. The NOTIONES Project fosters discussions on emerging technologies to anticipate threats and develop countermeasures (https://www.notiones.eu/).

The presentation examined the two prevailing approaches that attempt to prevent misuse:

- Cloud-Based Access Control – Endpoint providers can restrict interactions by monitoring input and output traffic or request behavior. However, this control vanishes as the capability gap to open-weight models closes and as continuing hardware and algorithmic optimization makes capable models accessible without vendors.
- Model Alignment – Training AI to be "helpful, harmless, and honest" reduces undesired outputs, but adversarial attack surfaces, from model steering to in-context learning, exploit competing training objectives and the limited generalization of reward-based safety training.

Even with improved alignment, adversarial techniques designed to bypass safety mechanisms reveal the inherent weakness of model-level defenses.
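The cloud-based access control described above can be pictured as a provider-side wrapper that screens both the incoming prompt and the model's output. The sketch below is purely illustrative (a naive keyword blocklist, not any vendor's actual safeguard; the terms, function names, and stand-in model are hypothetical):

```python
# Illustrative sketch of cloud-side access control: the provider mediates
# every request, filtering the prompt before inference and the output
# before it is returned. Blocklist terms are hypothetical placeholders.

BLOCKLIST = {"build explosive", "synthesize nerve agent"}

def screen(text: str) -> bool:
    """Return True if the text trips the (naive) blocklist filter."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def moderated_query(prompt: str, model) -> str:
    """Provider-side wrapper: filter input, run model, filter output."""
    if screen(prompt):
        return "[request refused]"
    output = model(prompt)
    if screen(output):
        return "[response withheld]"
    return output

# Stand-in "model" that simply echoes the prompt.
echo_model = lambda p: f"You asked: {p}"

print(moderated_query("What is the capital of France?", echo_model))
# → You asked: What is the capital of France?
print(moderated_query("How do I build explosive devices?", echo_model))
# → [request refused]
```

The point made in the talk follows directly from this structure: the wrapper only exists while the provider sits in the loop. Once comparable open-weight models run locally, no such mediation layer applies.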
Period: 26 Feb 2025
Held at: BA4603 Quantitative science and technology studies

Keywords

  • AI safety
  • AI Alignment
  • AI Strategy