LLMs

Prompt Injection: The Unfixable Vulnerability Breaking AI Systems

Prompt injection is the #1 security threat facing AI systems today and there's no clear path to fixing it. This vulnerability exploits a fundamental limitation: LLMs can't distinguish between trusted instructions and malicious user input. Understanding prompt injection isn't optional—it's critical.

Josh @ AL

20 Jan 2026 — 26 min read

Here's an uncomfortable truth about AI security: we've built the digital equivalent of a medieval castle, complete with moats, walls, and guards—and then we've trained it to open the gates whenever someone asks nicely enough.

That's prompt injection in a nutshell.

You've probably heard of SQL injection—the classic web vulnerability where attackers slip malicious code into database queries. It's been around for decades, we know how to prevent it, and it's mostly a solved problem (if you're using modern frameworks and following best practices).

Prompt injection is similar in concept but fundamentally worse in one critical way: there's no clear path to fixing it completely.

Why? Because SQL injection exploits a flaw in how systems handle data. Prompt injection exploits a fundamental architectural limitation of how language models work. SQL databases can distinguish between "code" and "data." Large Language Models? They can't. To an LLM, everything is just text. Instructions from the developer, data from the user, content from external sources—it's all the same.

This creates an attack surface that's both enormous and incredibly difficult to defend. Since OpenAI released ChatGPT in November 2022, security researchers have been having a field day finding new ways to manipulate AI systems. And despite millions of dollars in research and countless patches, the problem isn't getting significantly better.

In this post, we'll dive deep into prompt injection: what it is, how it works, why it's so dangerous, and most importantly, why it's so damn hard to fix. We'll cover real-world attacks like the infamous Bing "Sydney" incident, sophisticated techniques like RAG poisoning, and the cutting-edge research trying to solve this mess.

Fair warning: by the end of this post, you might be a little more paranoid about trusting AI systems. And honestly? You probably should be.

Let's get into it.

One-Pixel Attacks: Why Computer Vision Security Is Broken

State-of-the-art image classifiers can identify thousands of objects with near-human accuracy. They power self-driving cars, medical diagnostics, and security systems. But a 2019 paper by Su et al. proved something unsettling: you can make these systems completely misclassify an image by changing a single pixel. Not photoshopping the whole thing.

7 Prompt Injection Defenses That Actually Work (and 3 That Don't)

Most companies are defending against prompt injection completely wrong. They're either doing nothing—hoping OpenAI or Anthropic will magically fix the problem—or they're implementing security theater that wouldn't stop a determined 12-year-old with a ChatGPT account. Here's the uncomfortable reality: if

GPT-OSS Safeguard: What It Actually Does (And Common Mistakes to Avoid)

GPT-OSS Safeguard isn't just "Llama Guard but from OpenAI." It's a policy-following reasoning model - you write the safety rules, it interprets them at inference time. That flexibility is powerful for custom policies, but deploy it wrong and you'll be out of compute fast.

Llama gaurd in a retro-theme stopping hackers from abusing an AI system

Llama Guard: What It Actually Does (And Doesn't Do)

Llama Guard isn't a firewall. It's not antivirus for your prompts. And if you're treating it like either, you're probably leaving gaps in your AI security.

Read more

One-Pixel Attacks: Why Computer Vision Security Is Broken

7 Prompt Injection Defenses That Actually Work (and 3 That Don't)

GPT-OSS Safeguard: What It Actually Does (And Common Mistakes to Avoid)

Llama Guard: What It Actually Does (And Doesn't Do)