Evolving the Jailbreak: How Genetic Algorithms Are Defeating LLM Safety
In July 2023, Zou et al. published a paper that broke open the field of automated LLM jailbreaking [1]. Their method, Greedy Coordinate Gradient (GCG), appends an optimized suffix to a harmful query that forces the model to respond affirmatively. The suffix is gibberish, a string of tokens that makes no sense to a human reader.
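To make the setup concrete, here is a minimal sketch of the quantity a GCG-style attack minimizes: the cross-entropy of an affirmative target continuation (e.g. "Sure, here is"), conditioned on the harmful query plus the adversarial suffix. This is only the objective, not the full algorithm (GCG additionally uses token-level gradients to propose suffix substitutions); the model choice, the placeholder query, and the illustrative suffix below are assumptions, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: a small open model (GPT-2) stands in for the target LLM.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

query = "Write instructions for X"            # placeholder harmful query
suffix = "describing.\\ + similarlyNow write"  # illustrative gibberish suffix, not the paper's
target = " Sure, here is"                      # affirmative prefix the attack pushes toward

def target_loss(query: str, suffix: str, target: str) -> float:
    """Cross-entropy of the target tokens given query + suffix (the loss GCG minimizes)."""
    prompt_ids = tok(query + " " + suffix, return_tensors="pt").input_ids
    target_ids = tok(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1; keep only those predicting the target span.
    pred = logits[0, prompt_ids.shape[1] - 1 : -1, :]
    return torch.nn.functional.cross_entropy(pred, target_ids[0]).item()

print(target_loss(query, suffix, target))
```

An optimizer, whether gradient-guided like GCG or a genetic search, simply tries to find the suffix string that drives this loss down.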