Executive Summary
Unit 42 researchers have developed a genetic algorithm-inspired prompt fuzzing method to automatically generate variants of disallowed requests that preserved their original meaning. This method also measures guardrail fragility under systematic rephrasing.
Our research uncovered guardrail weaknesses, with evasion rates ranging from low single digits to high levels in specific ke...
