Jia Fu
Doctoral Student
Multimodal large language models can draft reports, analyze images, and aid decisions, yet they remain surprisingly vulnerable. Tiny, imperceptible adversarial perturbations can trigger drastic errors, yielding nonsensical or harmful outputs and undermining reliable, safe AI deployment.
For years, researchers have searched for defense schemes. Early attempts focused on adversarial training: exposing the model to thousands of adversarially perturbed inputs so that it learns to resist them. While effective against the attack types seen during training, these methods are computationally expensive and generalize poorly to unseen attacks.
A more elegant solution has emerged: adversarial purification. Instead of constantly retraining large models, this "plug-in" approach cleans a potentially malicious input before it reaches the model, acting as a kind of digital decontamination. Generative models, especially diffusion models, have proven highly effective in this role. However, they slow real-time deployment down considerably, because they rely on a fixed, and typically long, "purification time".
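To make the plug-in idea concrete, here is a minimal sketch. The `purifier` and `vlm` callables are placeholders introduced only for illustration, not the actual DiffCAP components: the point is that the defense wraps inference, so the underlying model never needs to be retrained.

```python
def robust_inference(image, prompt, purifier, vlm):
    """Wrap a frozen vision-language model with a purification step.

    `purifier` and `vlm` are placeholder callables: any purification
    method and any multimodal model can be plugged in here, since the
    defense only touches the input, never the model's weights.
    """
    clean_image = purifier(image)    # decontaminate the input first
    return vlm(clean_image, prompt)  # then query the unchanged model
```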
New research at RISE, DiffCAP (Diffusion-based Cumulative Adversarial Purification), offers a way out of this scalability-reliability trade-off. Its core contribution is the ability to dynamically determine the minimal necessary purification time for each individual image.
Here’s how it works: rather than relying on a one-size-fits-all duration, DiffCAP determines, for each image, when enough purification has been applied and stops there. As a result, it uses significantly fewer diffusion steps than prior methods, bringing the average purification time down to roughly one second per image.
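For readers who prefer to think in code, the sketch below illustrates the adaptive-stopping idea under stated assumptions. It is not the DiffCAP algorithm itself: the stopping criterion (embedding stability between consecutive noising steps), the `encoder`, the `denoiser` interface, and all numeric defaults are placeholders chosen for illustration.

```python
import torch
import torch.nn.functional as F

def adaptive_purify(image, encoder, denoiser, noise_scale=0.05,
                    sim_threshold=0.95, max_steps=100):
    """Hypothetical sketch of adaptive, per-image purification.

    Cumulatively adds small Gaussian noise to a (possibly adversarial)
    image and stops as soon as the feature embedding stabilises between
    consecutive steps -- a stand-in for a per-image stopping rule.
    `encoder` and `denoiser` are placeholders for a feature extractor
    and a pretrained diffusion denoiser, respectively.
    """
    x = image.clone()
    prev_emb = encoder(x)
    step = 0
    for step in range(max_steps):
        # Cumulatively inject a small amount of Gaussian noise.
        x = x + noise_scale * torch.randn_like(x)
        emb = encoder(x)
        # Stop once consecutive embeddings barely change; in this sketch
        # that is taken as the sign that the brittle adversarial signal
        # has been drowned out.
        sim = F.cosine_similarity(emb.flatten(), prev_emb.flatten(), dim=0)
        if sim > sim_threshold:
            break
        prev_emb = emb
    # Hand the noised image to a diffusion model to recover a clean image.
    return denoiser(x, noise_level=(step + 1) * noise_scale)
```

The sketch only conveys the principle that purification cost adapts to each input instead of being fixed; for the exact procedure and criterion used by DiffCAP, see the paper linked below.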
The empirical evidence is compelling. Evaluated across multiple large vision-language models (VLMs), datasets, perturbation strengths, and tasks, DiffCAP consistently outperforms existing defenses by a substantial margin. It also remains highly effective against adaptive attacks, where the adversary has full knowledge of the defense mechanism. By easing the tension between robustness, efficiency, and image quality, DiffCAP offers a practical benefit to the AI community. The original manuscript can be found at https://arxiv.org/pdf/2506.03933.
Looking ahead, we will push toward joint defenses across modalities (image, text, audio, etc.) for large AI models. Our researchers in Computer Science and the Data Analysis unit at RISE, together with the Center for Applied AI at RISE, are also working on broader trustworthiness challenges beyond adversarial attacks. These include maintaining performance under open sets (test-time classes unseen during training), domain shifts (deployment data that falls outside the training distribution), and noisy labels (imperfect or inconsistent data annotations).
Our ambition is clear. We want to build AI that deploys efficiently, runs safely, and operates sustainably.
Please do reach out if you would like to discuss this further.
Image 1: Model output is misled by adversarial perturbations.
Image 2: DiffCAP removes adversarial noise, and the model responds correctly.