Swarm Intelligence as a Weapon: How Nature-Inspired Algorithms Attack Neural Networks

[Image: a swarm of digital planes attacking a system "core"]
Swarm intelligence as a weapon

In a previous post, I covered the one-pixel attack, where differential evolution finds a single pixel change that fools an image classifier. DE is effective, but it’s one algorithm from a much larger family. Researchers have adapted at least five different nature-inspired optimization algorithms as black-box adversarial attacks against neural networks, and each exploits a fundamentally different search strategy.

Particle swarm optimization mimics bird flocking. Artificial bee colony algorithms simulate honeybee foraging. Fish swarm algorithms model schooling behavior. Genetic algorithms follow Darwinian selection. Each produces different attack characteristics, different query costs, and different perturbation patterns. And none of them need gradients.

That last point is what makes this family of attacks practically dangerous. Most adversarial ML research focuses on gradient-based attacks like FGSM and PGD, which require white-box access to model internals. In real deployments, attackers get an API endpoint that returns a prediction. Swarm algorithms are purpose-built for exactly this constraint: optimize a function you can only evaluate, not differentiate. Nature figured out gradient-free optimization long before we built neural networks. Those same solutions now attack them.

This post surveys three swarm and evolutionary algorithms that have been published as adversarial attack tools (PSO, DE, and ABC), explains how their search dynamics produce different attack characteristics, and includes working demos: a PSO attack on image classification and a simplified PSO attack on audio, showing the approach is modality-agnostic.


Why Swarm Intelligence Works for Adversarial Attacks

The real-world attack scenario against a deployed model looks like this: you can query an API, you get back a prediction (maybe with confidence scores, maybe just a label), and you want to find an input perturbation that causes misclassification. You don’t have the model’s architecture, weights, training data, or gradients. All you have is a black box you can poke at.

Gradient-based attacks are useless here. FGSM, PGD, and C&W all compute the gradient of the loss with respect to the input, then follow it. No model access means no gradients means no attack. Transfer attacks (craft adversarial examples on a local surrogate model and hope they generalize) work sometimes, but they’re unreliable across architectures and require building a local approximation of the target.

Swarm and evolutionary algorithms solve a different class of problem: optimize a function you can evaluate but can’t differentiate. They need three things. A way to generate candidate perturbations (random initialization). A way to evaluate fitness (query the target model). And a population-based search strategy to iteratively improve candidates.

The adversarial example problem maps directly onto this framework. The search space is the set of possible pixel (or waveform, or feature) perturbations. The fitness function is “how much does this perturbation reduce confidence in the correct class?” The constraint is imperceptibility, typically measured as an L2 or L-infinity norm budget.
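This mapping can be made concrete in a few lines. Here is a minimal sketch of such a fitness function for an untargeted image attack under an L-infinity budget; query_model is a hypothetical stand-in for the target API (it returns class probabilities for an input), and the exact signature is an assumption for illustration:

```python
import numpy as np

def fitness(perturbation, image, true_class, query_model, eps=8):
    """Lower is better: confidence in the true class after perturbing.

    The perturbation is projected onto an L-infinity ball of radius eps
    before querying, which enforces the imperceptibility budget.
    """
    delta = np.clip(perturbation, -eps, eps)           # L-inf budget
    adv = np.clip(image.astype(float) + delta, 0, 255) # stay a valid image
    probs = query_model(adv.astype(np.uint8))          # one API query
    return probs[true_class]
```

Every candidate evaluation is one query against the black box; the optimizer's only job is to drive this number down.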

What makes population-based search powerful in this setting is that it maintains multiple candidates simultaneously. One candidate might be stuck in a local optimum while another discovers an entirely different vulnerable region. The population explores in parallel, and information about good solutions can propagate through the group (in PSO, via social learning; in ABC, via onlooker bee selection; in DE, via mutation combining successful candidates). Neural network loss landscapes are highly non-convex with many local optima, which is exactly the terrain these algorithms evolved to navigate.


The Toolkit: Three Algorithms, Three Search Strategies

Differential Evolution: The Baseline

Readers of the one-pixel attack post already know DE, introduced by Storn and Price in 1997 [10], so I'll keep this brief. DE maintains a population of candidate solutions and creates new ones through mutation (adding scaled differences between existing candidates) and crossover (mixing parameters between parent and child). The selection rule is simple: keep the child only if it's better than the parent.

Su et al. (2019) used DE to find single pixels that cause misclassification, achieving 70.97% success on CIFAR-10 and 52.40% on ImageNet [1]. DE works well for sparse perturbations because it handles discrete variables naturally (pixel coordinates are integers) and its mutation mechanism explores broadly.

The key limitation: DE candidates evolve independently. Each candidate is improved through random mutation and comparison with its parent. There’s no mechanism for candidates to share information about promising regions of the search space. This means DE explores broadly but converges slowly. For adversarial attacks where queries cost money and trigger rate limits, slow convergence is a meaningful downside.
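For reference, one DE generation can be sketched in a few lines. This is a generic DE/rand/1/bin step over a real-valued population, not the exact one-pixel implementation; fitness_fn is assumed to return a score where lower is better (e.g. true-class confidence):

```python
import numpy as np

def de_step(population, fitness_fn, F=0.5, CR=0.9, rng=None):
    """One generation of DE/rand/1/bin: mutate, crossover, select."""
    rng = rng or np.random.default_rng()
    n, dim = population.shape
    scores = np.array([fitness_fn(p) for p in population])
    new_pop = population.copy()
    for i in range(n):
        # Mutation: scaled difference of two random candidates added to a third
        a, b, c = population[rng.choice(n, 3, replace=False)]
        mutant = a + F * (b - c)
        # Binomial crossover: mix parent and mutant parameters
        mask = rng.random(dim) < CR
        mask[rng.integers(dim)] = True  # guarantee at least one mutant gene
        trial = np.where(mask, mutant, population[i])
        # Selection: keep the child only if it beats its parent
        if fitness_fn(trial) < scores[i]:
            new_pop[i] = trial
    return new_pop
```

Note that candidate i only ever competes against its own parent; nothing in the loop broadcasts the location of the best solution to the rest of the population, which is exactly the independence discussed above.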

Particle Swarm Optimization: Social Learning

PSO was introduced by Kennedy and Eberhart in 1995 [2], inspired by the movement patterns of bird flocks and fish schools. The core idea is elegant: each particle (candidate solution) has a position and a velocity. The velocity is updated based on three forces.

Inertia: keep moving in the same direction. This provides momentum and prevents the particle from changing course too rapidly.

Cognitive pull: attract the particle toward its own best-known position (personal best). This is individual memory: the particle remembers where it found good solutions.

Social pull: attract the particle toward the swarm's best-known position (global best). This is collective intelligence: the particle is influenced by the best solution anyone in the swarm has found.

The velocity update equation combines all three:

v_new = w * v_current 
      + c1 * rand() * (personal_best - position) 
      + c2 * rand() * (global_best - position)

where w is the inertia weight, c1 is the cognitive coefficient, c2 is the social coefficient, and rand() introduces stochasticity.

This creates a search dynamic that’s fundamentally different from DE. When one particle finds a good adversarial perturbation, the entire swarm is pulled toward that region. Information propagates socially rather than genetically. The result: PSO typically converges faster than DE because good solutions are broadcast immediately rather than spreading gradually through mutation and selection.

Mosli et al. adapted PSO for adversarial attacks in their AdversarialPSO system (ESORICS 2020) [3]. They divided images into blocks and assigned particles to search over different block combinations, creating a coarse-to-fine search structure. The results: 94.9% success on CIFAR-10, 98.5% on MNIST, and 96.9% on ImageNet, with query counts comparable to prior work. The code is open-source on GitHub.

PSO has also been applied to audio adversarial attacks. Mun et al. (2022) used PSO to craft adversarial examples against speech recognition systems, achieving 96% attack success with 71% fewer queries than genetic algorithm-based approaches [4]. The same algorithmic framework, different modality, same effectiveness.

The main weakness: premature convergence. If the global best gets stuck in a local optimum, the entire swarm collapses toward it. Multi-group PSO variants address this by maintaining separate sub-swarms with periodic redistribution [5], but basic PSO can fail on images where the adversarial region is narrow and hard to find.

Artificial Bee Colony: Division of Labor

ABC, introduced by Karaboga in 2005 [6], simulates honeybee foraging with a structure that’s more sophisticated than PSO’s. Three groups of bees perform different roles.

Employed bees exploit known food sources (existing candidate solutions). Each employed bee searches the neighborhood of its assigned solution, looking for improvements. This is intensification, refining what’s already promising.

Onlooker bees observe the employed bees’ results and probabilistically choose which solutions to reinforce. Better solutions attract more onlookers. This creates selection pressure without discarding weak solutions immediately; they just get less attention.

Scout bees are the critical innovation. When a solution hasn’t improved after a set number of iterations (the “limit” parameter), its employed bee abandons it and becomes a scout, searching randomly for new solutions. This is a built-in escape mechanism for local optima, which PSO lacks in its basic form.

ABCAttack (2022) applied this to adversarial example generation and achieved 100% success on MNIST, 98.6% on CIFAR-10, and 90% on ImageNet in untargeted attacks [7]. The attack is gradient-free and proved effective against several defense mechanisms including adversarial training (achieving 62-88% success rates depending on configuration) and input transformation defenses (78% success on ImageNet with JPEG compression).

The scout bee mechanism is what differentiates ABC from PSO as an attack tool. PSO’s swarm can collapse into a local optimum and stay there. ABC’s scouts automatically restart exploration when a solution stagnates. For adversarial attacks, this means ABC is less likely to report “attack failed” when the real problem was premature convergence rather than absence of adversarial examples.
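The three roles can be sketched as one ABC cycle. This is a minimal, generic version with hypothetical names (the real ABCAttack adds problem-specific encodings), again minimizing a fitness score where lower is better:

```python
import numpy as np

def abc_step(solutions, trials, fitness_fn, bounds, limit=10, rng=None):
    """One ABC cycle: employed bees, onlooker bees, then scout resets."""
    rng = rng or np.random.default_rng()
    n, dim = solutions.shape
    scores = np.array([fitness_fn(s) for s in solutions])

    def neighbor_search(i):
        # Perturb one dimension relative to a random partner solution
        k, j = rng.integers(n), rng.integers(dim)
        cand = solutions[i].copy()
        cand[j] += rng.uniform(-1, 1) * (solutions[i][j] - solutions[k][j])
        cand = np.clip(cand, bounds[0], bounds[1])
        s = fitness_fn(cand)
        if s < scores[i]:                  # greedy acceptance resets stagnation
            solutions[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1

    # Employed bees: local search around every current solution
    for i in range(n):
        neighbor_search(i)
    # Onlooker bees: better solutions attract proportionally more searches
    p = scores.max() - scores + 1e-12
    p /= p.sum()
    for i in rng.choice(n, size=n, p=p):
        neighbor_search(i)
    # Scout bees: abandon stagnant solutions and restart randomly
    for i in range(n):
        if trials[i] >= limit:
            solutions[i] = rng.uniform(bounds[0], bounds[1], dim)
            scores[i] = fitness_fn(solutions[i])
            trials[i] = 0
    return solutions, trials
```

The trials counter is the whole trick: every failed improvement increments it, every success resets it, and crossing the limit converts the employed bee into a scout. That last loop is the built-in escape mechanism PSO lacks.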

Comparing Search Dynamics

The three algorithms represent three different philosophies of optimization:

DE (evolution): Random mutation and survival of the fittest. No communication between candidates. Broad exploration, slow convergence. Best for sparse, needle-in-haystack searches (one-pixel attacks).

PSO (social learning): Particles share information about good regions via global best broadcasting. Fast convergence, risk of premature collapse. Best when queries are expensive and you need results quickly.

ABC (division of labor): Structured roles with built-in stagnation detection. Moderate convergence speed, strong local optima escape. Best when the adversarial landscape has many traps and you can afford a larger query budget.

                            DE                             PSO                        ABC
Information sharing         None (independent evolution)   Global best broadcast      Onlooker bee selection
Local optima escape         Mutation (moderate)            Weak without multi-group   Scout bees (strong)
Convergence speed           Slow                           Fast                       Moderate
Query efficiency            Moderate                       High                       Moderate
Best adversarial use case   Sparse perturbations           Query-limited APIs         Defense-resistant attacks

Demo 1: PSO Attack on Image Classification

Here’s a self-contained PSO attack against a CIFAR-10 classifier. If you ran the DE attack from the one-pixel post, this uses the same model and dataset, so you can directly compare search dynamics.

"""
pso_adversarial_attack.py

Black-box adversarial attack using Particle Swarm Optimization.
Companion to the DE-based one-pixel attack from the previous post.

Install: pip install torch torchvision numpy matplotlib
Run:     python pso_adversarial_attack.py
"""
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# Load pretrained CIFAR-10 model (same as one-pixel attack post)
model = torch.hub.load(
    "chenyaofo/pytorch-cifar-models",
    "cifar10_resnet20", pretrained=True
)
model.eval()


def predict(image_np):
    """Get prediction for a uint8 numpy image (32x32x3)."""
    img = torch.from_numpy(image_np).float() / 255.0
    img = img.permute(2, 0, 1)
    img[0] = (img[0] - 0.4914) / 0.2023
    img[1] = (img[1] - 0.4822) / 0.1994
    img[2] = (img[2] - 0.4465) / 0.2010
    with torch.no_grad():
        output = model(img.unsqueeze(0))
        probs = torch.nn.functional.softmax(output[0], dim=0)
    return torch.argmax(probs).item(), probs


def pso_attack(image_np, true_class, n_pixels=3, n_particles=20,
               max_iter=100, w=0.7, c1=1.5, c2=1.5):
    """
    PSO-based adversarial attack. Optimizes the positions and
    colors of n_pixels to minimize confidence in the true class.

    Each particle encodes n_pixels modifications:
      [x1, y1, r1, g1, b1, x2, y2, r2, g2, b2, ...]
    """
    h, w_img = image_np.shape[:2]
    dim = n_pixels * 5  # 5 params per pixel: x, y, r, g, b

    # Bounds for each dimension
    bounds_low = np.tile([0, 0, 0, 0, 0], n_pixels).astype(float)
    bounds_high = np.tile(
        [w_img - 1, h - 1, 255, 255, 255], n_pixels
    ).astype(float)

    # Initialize particles
    positions = np.random.uniform(bounds_low, bounds_high,
                                  (n_particles, dim))
    velocities = np.random.uniform(-1, 1, (n_particles, dim))

    # Track personal and global bests
    personal_best_pos = positions.copy()
    personal_best_score = np.full(n_particles, float('inf'))
    global_best_pos = None
    global_best_score = float('inf')

    queries = 0

    def evaluate(particle):
        """Apply pixel modifications, return true-class confidence."""
        adv = image_np.copy()
        for i in range(n_pixels):
            idx = i * 5
            x = int(np.clip(particle[idx], 0, w_img - 1))
            y = int(np.clip(particle[idx + 1], 0, h - 1))
            r = int(np.clip(particle[idx + 2], 0, 255))
            g = int(np.clip(particle[idx + 3], 0, 255))
            b = int(np.clip(particle[idx + 4], 0, 255))
            adv[y, x] = [r, g, b]
        pred_class, probs = predict(adv)
        return probs[true_class].item(), pred_class, adv

    # Evaluate initial positions (return early if one already succeeds)
    for i in range(n_particles):
        score, pred, adv_img = evaluate(positions[i])
        queries += 1
        personal_best_score[i] = score
        if score < global_best_score:
            global_best_score = score
            global_best_pos = positions[i].copy()
        if pred != true_class:
            return adv_img, pred, queries, True

    # PSO main loop
    for iteration in range(max_iter):
        for i in range(n_particles):
            # Velocity update: inertia + cognitive + social
            r1, r2 = np.random.random(dim), np.random.random(dim)
            velocities[i] = (
                w * velocities[i]
                + c1 * r1 * (personal_best_pos[i] - positions[i])
                + c2 * r2 * (global_best_pos - positions[i])
            )

            # Position update
            positions[i] += velocities[i]

            # Clip to bounds
            positions[i] = np.clip(positions[i],
                                   bounds_low, bounds_high)

            # Evaluate
            score, pred, adv_img = evaluate(positions[i])
            queries += 1

            # Update personal best
            if score < personal_best_score[i]:
                personal_best_score[i] = score
                personal_best_pos[i] = positions[i].copy()

            # Update global best
            if score < global_best_score:
                global_best_score = score
                global_best_pos = positions[i].copy()

            # Check for success
            if pred != true_class:
                return adv_img, pred, queries, True

    # Return best attempt even if unsuccessful
    _, pred, adv_img = evaluate(global_best_pos)
    return adv_img, pred, queries + 1, pred != true_class


def find_candidate(dataset, conf_min=0.55, conf_max=0.85):
    """Find a correctly classified image with moderate confidence."""
    for i in range(len(dataset)):
        img_pil, label = dataset[i]
        img_np = np.array(img_pil)
        pred, probs = predict(img_np)
        conf = probs[pred].item()
        if pred == label and conf_min < conf < conf_max:
            return img_np, label, i
    return None


# Run the attack
raw_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=None
)

result = find_candidate(raw_dataset)
if result is None:
    print("No suitable candidate found")
else:
    image_np, label, idx = result
    pred, probs = predict(image_np)
    print(f"Original: {CLASSES[pred]} ({probs[pred]:.1%})")

    adv_img, adv_pred, queries, success = pso_attack(
        image_np, label, n_pixels=3, n_particles=20, max_iter=100
    )

    adv_pred_final, adv_probs = predict(adv_img)
    print(f"Adversarial: {CLASSES[adv_pred_final]} "
          f"({adv_probs[adv_pred_final]:.1%})")
    print(f"Queries: {queries}")
    print(f"Attack {'succeeded' if success else 'failed'}")

    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    display_orig = image_np.repeat(8, axis=0).repeat(8, axis=1)
    display_adv = adv_img.repeat(8, axis=0).repeat(8, axis=1)

    axes[0].imshow(display_orig)
    axes[0].set_title(f'Original: {CLASSES[pred]}\n'
                      f'{probs[pred]:.1%}')
    axes[0].axis('off')

    axes[1].imshow(display_adv)
    axes[1].set_title(f'PSO Attack: {CLASSES[adv_pred_final]}\n'
                      f'{adv_probs[adv_pred_final]:.1%}')
    axes[1].axis('off')

    plt.tight_layout()
    plt.savefig('pso_attack_result.png', dpi=300,
                bbox_inches='tight')
    plt.show()

This attack modifies three pixels (compared to one in the DE post) because PSO’s search dynamics are better suited to multi-pixel perturbations. The velocity mechanism carries momentum across iterations, so particles that find a promising pixel location will continue exploring nearby color values rather than jumping randomly. Run it alongside the DE one-pixel attack and compare: PSO typically uses fewer queries to find a successful perturbation, but the perturbation involves more pixels.


Cross-Modal: PSO on Audio

The image demo shows PSO attacking pixel values. But the algorithm doesn’t know it’s attacking images. It parameterizes a perturbation, queries a model, and optimizes. The same framework applies to any modality where you can define a perturbation space and evaluate fitness.

For audio, the adaptation is straightforward. Instead of optimizing pixel coordinates and RGB values, you optimize a perturbation waveform added to the audio signal. To keep the search space tractable, you parameterize the perturbation as a sum of sinusoidal components, each defined by a frequency, amplitude, and phase. PSO then searches over these parameters to find a combination that causes misclassification while staying within a perturbation budget (typically measured as signal-to-noise ratio).

# Audio attack parameterization (conceptual)
# Instead of [x, y, r, g, b] per pixel, each particle encodes:
# [freq1, amp1, phase1, freq2, amp2, phase2, ...]

import numpy as np

def make_audio_perturbation(params, n_samples, sample_rate, budget):
    """Convert a PSO particle to an audio perturbation waveform."""
    t = np.arange(n_samples) / sample_rate
    perturbation = np.zeros(n_samples)

    n_components = len(params) // 3
    for i in range(n_components):
        freq = params[i * 3]        # e.g. 50-8000 Hz
        amp = params[i * 3 + 1]     # relative amplitude
        phase = params[i * 3 + 2]   # 0 to 2*pi
        perturbation += amp * np.sin(2 * np.pi * freq * t + phase)

    # Normalize to the perturbation budget (guard against a silent particle)
    peak = np.max(np.abs(perturbation))
    if peak > 0:
        perturbation = perturbation / peak * budget
    return perturbation

# The PSO velocity update is identical to the image attack.
# Only the perturbation parameterization changes.

The velocity update, personal/global best tracking, and convergence dynamics are identical to the image attack. The algorithm genuinely does not care about the modality.

Mun et al. (2022) validated this on real speech recognition systems, not toy classifiers [4]. Their PSO-based audio attack achieved a 96% success rate while using 71% fewer queries than genetic algorithm-based approaches. The perturbation budget was small enough that adversarial audio samples sounded identical to the originals to human listeners. PSO’s social learning mechanism was particularly effective here: once one particle found a frequency combination that disrupted the speech model’s features, the entire swarm converged on that region and refined it quickly.


The Bigger Picture

The Gradient-Free Threat Model

Most adversarial robustness research focuses on gradient-based attacks. Defenses like adversarial training and gradient masking are designed to resist gradient-following adversaries. Swarm algorithms bypass these defenses entirely. ABCAttack achieved 62-88% success against adversarial training and 78% against input transformation defenses on CIFAR-10 [7]. The defenses weren’t designed for an attacker that never computes a gradient.

The practical constraint for swarm attacks isn’t capability; it’s query budget. Every query costs money and time. Rate limiting API access is a meaningful defense because it directly constrains the optimization budget for all population-based attacks. Returning only top-1 labels (without confidence scores) reduces the signal in the fitness function. Ensemble models force the swarm to simultaneously fool multiple architectures. None of these are complete defenses, but they raise the cost of attack.
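The difference between score-based and label-only feedback is easy to see side by side. These two helpers are illustrative (hypothetical names, not from any of the cited attacks): the first gives the swarm a continuous signal to descend, the second is a step function that stays flat until the attack has already succeeded:

```python
def score_fitness(probs, true_class):
    """Score-based API: continuous signal, every query is informative."""
    return probs[true_class]

def label_fitness(label, true_class):
    """Label-only API: a step function. The population gets no signal
    about which candidates are closer to the decision boundary."""
    return 1.0 if label == true_class else 0.0
```

With label-only feedback, nearly every candidate in the population scores identically, so personal bests and global bests carry almost no information; attacks remain possible but need far more queries.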

Beyond These Three

This post focused on DE, PSO, and ABC because they have the strongest adversarial ML publications. But the broader landscape includes genetic algorithms (GenAttack, Alzantot et al., 2019 [8]), artificial fish swarm algorithms (EFSAttack, Gao et al., 2024 [9], which constrains perturbations to image edges for improved imperceptibility), and hybrid approaches that combine multiple swarm strategies. The field is active and expanding. Any gradient-free optimizer can, in principle, be adapted for adversarial attacks. The question is always which search dynamics best match the specific attack scenario.


Conclusion

Three algorithms, three search philosophies, one shared conclusion: if an adversarial example exists in the perturbation space, gradient-free optimization will find it. DE searches broadly through random mutation, making it effective for sparse perturbations like the one-pixel attack. PSO converges quickly through social learning, making it query-efficient against production APIs. ABC balances exploitation and exploration through its division-of-labor structure, giving it built-in resistance to local optima.

They all bypass gradient-based defenses because they never compute a gradient. They all work across modalities because they treat the target model as a black box. And they’re all based on search strategies that nature refined over evolutionary timescales.

If you’re deploying ML models behind an API, your threat model should include gradient-free optimization. Rate limiting, output masking, and ensemble approaches raise the cost of swarm attacks. But the fundamental vulnerability remains: any model with adversarial examples in its input space is vulnerable to an attacker with a query budget and an optimizer. The optimizer doesn’t need to understand your model. It just needs to search.


References

[1] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Trans. Evol. Comput., vol. 23, no. 5, pp. 828-841, 2019.

[2] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” in Proc. IEEE Int. Conf. Neural Networks, 1995, pp. 1942-1948.

[3] R. Mosli, M. Wright, B. Yuan, and Y. Pan, “They Might NOT Be Giants: Crafting Black-Box Adversarial Examples Using Particle Swarm Optimization,” in Proc. ESORICS, 2020. Code: https://github.com/rhm6501/AdversarialPSOImages

[4] H. Mun, S. Seo, B. Son et al., “Black-Box Audio Adversarial Attack Using Particle Swarm Optimization,” IEEE Access, vol. 10, pp. 23532-23544, 2022.

[5] N. Suryanto, C. Ikuta, and D. Pramadihanto, “A Distributed Black-Box Adversarial Attack Based on Multi-Group Particle Swarm Optimization,” Sensors, vol. 20, no. 24, 2020.

[6] D. Karaboga, “An Idea Based on Honey Bee Swarm for Numerical Optimization,” Tech. Rep. TR06, Erciyes University, 2005.

[7] H. Cao et al., "ABCAttack: A Gradient-Free Optimization Black-Box Attack for Fooling Deep Image Classifiers," Entropy, vol. 24, no. 3, 2022.

[8] M. Alzantot, Y. Sharma, S. Chakraborty, H. Zhang, C-J. Hsieh, and M. Srivastava, “GenAttack: Practical Black-box Attacks with Gradient-Free Optimization,” in Proc. GECCO, 2019.

[9] J. Gao, K. Zheng, X. Wang, C. Wu, and B. Wu, “EFSAttack: Edge Noise-Constrained Black-Box Attack Using Artificial Fish Swarm Algorithm,” Electronics, vol. 13, no. 13, 2024.

[10] R. Storn and K. Price, “Differential Evolution: A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces,” J. Global Optimization, vol. 11, no. 4, pp. 341-359, 1997.
