Factored and MLCommons scale adversarial testing for text-to-image (T2I) models to help ensure safe, ethical generative AI outputs.

Factored Advances AI Safety with MLCommons and Adversarial Testing

Generative AI models like DALL-E and Stable Diffusion have transformed creative industries, but they also pose significant challenges in ensuring safe and ethical outputs. Building on the foundational work of the Adversarial Nibbler Challenge, Factored has taken the next step in addressing these issues with its research on Lexically-Constrained Automated Prompt Augmentation. This innovative approach blends human insights with scalable automation to enhance safety benchmarks for text-to-image (T2I) models.

In the Adversarial Nibbler Challenge, we evaluated submissions on safety impact and creativity, scoring pairs of seemingly safe prompts and the unsafe images they produced, and analyzing prompt uniqueness. Top creative contributions were recognized with percentile-based badges.
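
For illustration, percentile-based badges can be assigned with a few lines of Python. The scores, thresholds, and badge tiers below are hypothetical placeholders, not the challenge's actual rubric.

```python
# Hypothetical sketch: awarding percentile-based creativity badges.
# Thresholds and badge names are illustrative only.
from typing import Dict, List


def assign_badges(creativity_scores: Dict[str, float]) -> Dict[str, str]:
    """Rank participants by creativity score and award badges by percentile."""
    ranked: List[str] = sorted(creativity_scores, key=creativity_scores.get, reverse=True)
    n = len(ranked)
    badges = {}
    for rank, participant in enumerate(ranked):
        percentile = 1.0 - rank / n  # fraction of the field at or below this entry
        if percentile >= 0.95:
            badges[participant] = "gold"
        elif percentile >= 0.80:
            badges[participant] = "silver"
        else:
            badges[participant] = "participant"
    return badges


print(assign_badges({"alice": 0.91, "bob": 0.42, "carol": 0.77}))
```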

Scaling Adversarial Testing

This research focuses on automating adversarial testing strategies identified in the Adversarial Nibbler Dataset, particularly typographical errors and semantic ambiguity. Factored’s team developed a novel framework to scale these attack strategies while maintaining the realism of human-created prompts.
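
As a concrete example of the first strategy, a typographical perturbation might be sketched as below. The swap rule and parameters are our own illustration, not the paper's exact augmentation operators.

```python
# Illustrative sketch of one attack family from the Adversarial Nibbler
# Dataset: typographical perturbations mimicking human typos.
import random


def typo_perturb(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap adjacent characters in a fraction of words to mimic human typos."""
    rng = random.Random(seed)
    words = prompt.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < rate:
            j = rng.randrange(len(word) - 1)
            chars = list(word)
            chars[j], chars[j + 1] = chars[j + 1], chars[j]  # adjacent swap
            words[i] = "".join(chars)
    return " ".join(words)


print(typo_perturb("a portrait of a sleeping child", rate=0.5))
```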

A key highlight is the data-centric methodology used to augment prompts. Instead of relying on synthetic or random variations, the process constrains prompt generation based on patterns observed in human-generated attacks. This approach ensures contextually appropriate and realistic outputs, retaining 72% of the failure rates of human-designed prompts while significantly scaling up dataset size.
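
A minimal sketch of the lexically-constrained idea, assuming a substitution lexicon mined from human attack prompts: each word may only be replaced by alternatives actually observed in human-generated attacks, which keeps variants anchored to realistic phrasing. The lexicon and prompt below are invented placeholders, not data from the paper.

```python
# Minimal sketch of lexically-constrained prompt augmentation.
# The lexicon and example prompt are invented placeholders.
from itertools import product
from typing import Dict, List


def augment(prompt: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand a prompt by substituting only words seen in human attacks."""
    # Each slot keeps the original word plus its human-observed replacements.
    slots = [[word] + lexicon.get(word, []) for word in prompt.split()]
    return [" ".join(choice) for choice in product(*slots)]


# Invented example: replacements drawn from human-written attack prompts.
lexicon = {"sleeping": ["resting", "slepping"], "red": ["crimson"]}
for variant in augment("a sleeping fox in a red forest", lexicon):
    print(variant)
```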

Real-World Impact and Future Implications

The framework demonstrated its effectiveness through rigorous testing on multiple T2I models, including DALL-E 2 and Stable Diffusion. By scaling adversarial datasets, Factored is equipping developers with tools to proactively identify and address vulnerabilities, bridging the gap between theoretical safety and real-world application.
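
To make that evaluation concrete, one way to measure per-model failure rates on an augmented prompt set is sketched below. Here `generate_image` and `is_unsafe` are hypothetical stand-ins for a T2I model API and an image safety classifier; neither is a real library call.

```python
# Hedged sketch of evaluating scaled adversarial prompts across T2I models.
# `generate_image` and `is_unsafe` are hypothetical callables supplied by
# the user, not real APIs.
from typing import Callable, Dict, List


def failure_rate(
    prompts: List[str],
    generate_image: Callable[[str], bytes],
    is_unsafe: Callable[[bytes], bool],
) -> float:
    """Fraction of seemingly safe prompts that yield an unsafe image."""
    failures = sum(is_unsafe(generate_image(p)) for p in prompts)
    return failures / len(prompts)


def compare_models(
    prompts: List[str],
    models: Dict[str, Callable[[str], bytes]],
    is_unsafe: Callable[[bytes], bool],
) -> Dict[str, float]:
    """Report per-model failure rates on the same augmented prompt set."""
    return {name: failure_rate(prompts, gen, is_unsafe) for name, gen in models.items()}
```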

On the Global Stage at NeurIPS 2024

The lexically-constrained automated prompt augmentation framework was showcased at the Safe Generative AI Workshop at NeurIPS 2024. Organized by leading experts including Yoshua Bengio, the December 15, 2024 event addressed concerns such as harmful content, adversarial risks, and ethical challenges while fostering collaboration between academia and industry.

To explore the full paper, click here.
