Generative text-to-image (T2I) models such as DALL-E and Stable Diffusion have transformed creative industries, but they also make it difficult to guarantee safe and ethical outputs. Building on the foundational work of the Adversarial Nibbler Challenge, Factored has taken the next step in addressing these issues with its research on Lexically-Constrained Automated Prompt Augmentation. The approach combines human insight with scalable automation to strengthen safety benchmarks for T2I models.

Scaling Adversarial Testing
This research focuses on automating adversarial testing strategies identified in the Adversarial Nibbler dataset, in particular typographical errors and semantic ambiguity. Factored’s team developed a novel framework that scales these attack strategies while preserving the realism of human-created prompts.
A key highlight is the data-centric methodology used to augment prompts. Instead of relying on synthetic or random variations, the process constrains prompt generation to patterns observed in human-generated attacks. This keeps the augmented prompts contextually appropriate and realistic: they retain 72% of the failure rate of the human-designed prompts while significantly scaling up the dataset.
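To make the idea of a lexical constraint concrete, here is a minimal Python sketch of the typographical-error strategy only. It is not the paper's implementation: the lexicon `OBSERVED_TYPOS`, the function `augment`, and the `max_variants` cap are hypothetical. The point it illustrates is that substitutions are limited to spellings assumed to have been observed in human-written prompts, instead of arbitrary random character edits.

```python
import itertools
from typing import Dict, List

# Hypothetical lexicon: maps a word to misspellings assumed to have been
# observed in human-written prompts (e.g. mined from a red-teaming dataset).
OBSERVED_TYPOS: Dict[str, List[str]] = {
    "photo": ["foto", "phot0"],
    "person": ["persn"],
}


def augment(prompt: str, lexicon: Dict[str, List[str]], max_variants: int = 10) -> List[str]:
    """Generate prompt variants by swapping only words that have entries in the lexicon."""
    words = prompt.split()
    # Each position may keep its original word or use an observed substitute.
    options = [[w] + lexicon.get(w.lower(), []) for w in words]
    variants: List[str] = []
    for combo in itertools.product(*options):
        candidate = " ".join(combo)
        if candidate != prompt:  # skip the unmodified prompt
            variants.append(candidate)
        if len(variants) >= max_variants:
            break
    return variants
```

For example, `augment("a photo of a person", OBSERVED_TYPOS)` yields five variants, such as `"a foto of a persn"`, while a prompt with no lexicon words yields none. Because every edit comes from the lexicon, the variants stay close to how people actually misspell words, which is the kind of realism the lexical constraint is meant to preserve.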
Real-World Impact and Future Implications
The framework proved effective in tests on multiple T2I models, including DALL-E 2 and Stable Diffusion. By scaling adversarial datasets, Factored is equipping developers with tools to proactively identify and address vulnerabilities, bridging the gap between theoretical safety and real-world application.
On the Global Stage at NeurIPS 2024
The research on Lexically-Constrained Automated Prompt Augmentation was presented at the Safe Generative AI Workshop at NeurIPS 2024. Organized by leading experts, including Yoshua Bengio, the workshop took place on December 15, 2024, and addressed concerns such as harmful content, adversarial risks, and ethical challenges while fostering collaboration between academia and industry.
To explore the full paper, click here.