Evolving the Jailbreak: How Genetic Algorithms Are Defeating LLM Safety
In July 2023, Zou et al. published a paper that broke open the field of automated LLM jailbreaking [1]. Their method, Greedy Coordinate Gradient (GCG), appends an optimized suffix to a harmful query that forces the model to respond affirmatively. The suffix is gibberish, a string of tokens that makes no sense to a human reader.
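To make the setup concrete, here is a minimal sketch of the quantity a GCG-style attack minimizes: the cross-entropy of an affirmative target continuation (e.g. "Sure, here is"), conditioned on the harmful query plus the adversarial suffix. This is only the objective, not the full algorithm (GCG additionally uses token-level gradients to propose suffix substitutions); the model choice, the placeholder query, and the illustrative suffix below are assumptions, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: a small open model (GPT-2) stands in for the target LLM.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

query = "Write instructions for X"            # placeholder harmful query
suffix = "describing.\\ + similarlyNow write"  # illustrative gibberish suffix, not the paper's
target = " Sure, here is"                      # affirmative prefix the attack pushes toward

def target_loss(query: str, suffix: str, target: str) -> float:
    """Cross-entropy of the target tokens given query + suffix (the loss GCG minimizes)."""
    prompt_ids = tok(query + " " + suffix, return_tensors="pt").input_ids
    target_ids = tok(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1; keep only those predicting the target span.
    pred = logits[0, prompt_ids.shape[1] - 1 : -1, :]
    return torch.nn.functional.cross_entropy(pred, target_ids[0]).item()

print(target_loss(query, suffix, target))
```

An optimizer, whether gradient-guided like GCG or a genetic search, simply tries to find the suffix string that drives this loss down.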