Adversarial ML
Your morning AI security briefing.

Working adversarial ML — exploits, defenses, and the gap between.

Adversarial ML coverage for engineers shipping ML systems. Membership inference, model extraction, evasion attacks, training-data extraction, backdoors — focused on what's exploitable against deployed models and what defenders can actually do about it. PoCs against open models, behavioral analysis for closed ones.

Lead

Data Poisoning and Backdoor Attacks on Foundation Models

Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the defenses are in 2026.

Read briefing
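How cheap the attack is matters for the threat model, so here is a minimal BadNets-style poisoning sketch in PyTorch. It assumes NCHW image tensors scaled to [0, 1]; the function name poison_batch, the patch size, and the poison fraction are illustrative choices, not taken from any particular paper's code.

```python
import torch

def poison_batch(images, labels, target_class, poison_frac=0.1):
    """BadNets-style poisoning: stamp a small trigger patch on a
    fraction of training images and relabel them to the attacker's
    target class. A model trained on the mix learns to fire on the
    patch while behaving normally on clean inputs."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_frac)
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -4:, -4:] = 1.0   # 4x4 white patch, bottom-right corner
    labels[idx] = target_class       # attacker-chosen label
    return images, labels
```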

Today's briefing

attacks

Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W

The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
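For reference, this is what the first two attacks reduce to in PyTorch: the textbook FGSM step from Goodfellow et al. and the iterated, projected variant from Madry et al. A sketch, not a hardened implementation; C&W is omitted because its Lagrangian optimization doesn't fit in a few lines.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """FGSM: one signed-gradient step that maximizes the
    cross-entropy loss inside an L-inf ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha, steps):
    """PGD: iterated FGSM steps from a random start, each step
    projected back onto the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto L-inf ball
            x_adv = x_adv.clamp(0, 1)                 # stay in valid pixel range
    return x_adv.detach()
```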

attacks

Adversarial Robustness in NLP: Why Text Attacks Are Different

Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text. The attacks are harder to define and harder to defend against.
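To make the contrast with gradient attacks concrete, here is the greedy search most word-substitution attacks reduce to. Both interfaces are hypothetical stand-ins: classify maps text to label probabilities, and synonyms generates candidates from whatever resource the specific attack uses (counter-fitted embeddings, WordNet, a masked LM).

```python
def greedy_word_swap(classify, synonyms, words, true_label):
    """Greedy synonym substitution: at each position, keep the
    candidate swap that most reduces the classifier's confidence
    in the true label. No gradients; the search is discrete."""
    words = list(words)
    for i, w in enumerate(words):
        best, best_prob = w, classify(" ".join(words))[true_label]
        for cand in synonyms(w):
            trial = words[:i] + [cand] + words[i + 1:]
            prob = classify(" ".join(trial))[true_label]
            if prob < best_prob:
                best, best_prob = cand, prob
        words[i] = best
        if best_prob < 0.5:   # label has likely flipped; stop early
            break
    return " ".join(words)
```

A real attack adds the constraints the briefing discusses: embedding-distance thresholds, part-of-speech checks, and sentence-level similarity filters, all of which shrink the candidate set.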

attacks

Adversarial Transferability: Why Black-Box Attacks Work at All

Adversarial examples transfer across models with different architectures and training sets. Understanding why changes what you think defenses need to accomplish.
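The effect is straightforward to measure yourself. A sketch assuming two independently trained PyTorch classifiers over the same input space; the function name and the single-step FGSM choice are illustrative, and multi-step attacks transfer at different rates.

```python
import torch
import torch.nn.functional as F

def transfer_rate(surrogate, target, x, y, eps=8 / 255):
    """Craft FGSM examples against a surrogate model, then measure
    how often they also flip the prediction of a separately trained
    target model the attacker never queried for gradients."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(surrogate(x), y).backward()
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        fooled = target(x_adv).argmax(dim=1) != y
    return fooled.float().mean().item()
```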

attacks

Model Inversion Attacks: Reconstructing Training Data from Model Outputs

From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML engineers don't expect.
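On the gradient-inversion side, the core loop is smaller than most people expect. A sketch of Geiping-style gradient matching in PyTorch: it assumes the attacker observed per-parameter gradients from one victim training step and already knows the label (often recoverable from the sign of the last-layer gradient); the step count and learning rate are placeholders.

```python
import torch
import torch.nn.functional as F

def invert_gradients(model, observed_grads, input_shape, label, steps=500):
    """Optimize a dummy input so its gradient matches the gradient
    observed from the victim's training step. input_shape includes
    the batch dimension; label is a LongTensor of shape (batch,)."""
    dummy = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([dummy], lr=0.1)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(dummy), label)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Cosine-distance gradient matching, as in Geiping et al.
        match = sum(1 - F.cosine_similarity(g.flatten(), og.flatten(), dim=0)
                    for g, og in zip(grads, observed_grads))
        match.backward()
        opt.step()
    return dummy.detach()
```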

defenses

Certified Robustness via Randomized Smoothing: What 'Certified' Actually Guarantees

Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice — and where it breaks — is more useful than the headline number.
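Concretely, the certificate is a confidence bound pushed through the inverse Gaussian CDF. Below is a simplified single-batch sketch of Cohen et al.'s CERTIFY procedure (the published algorithm first picks the top class with a separate selection batch, which avoids a selection bias this version ignores); it assumes a PyTorch classifier and SciPy.

```python
import torch
from scipy.stats import beta, norm

def certify(model, x, sigma, n=1000, alpha=0.001):
    """Classify n Gaussian-noised copies of x, lower-bound the top
    class probability with a Clopper-Pearson interval, and convert
    it to a certified L2 radius R = sigma * Phi^-1(pA_lower)."""
    with torch.no_grad():
        noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
        preds = model(noisy).argmax(dim=1)      # batch in chunks in practice
    top = preds.mode().values.item()            # most frequent class
    k = (preds == top).sum().item()             # its vote count
    pa_lower = beta.ppf(alpha, k, n - k + 1)    # exact lower confidence bound
    if pa_lower <= 0.5:
        return None                             # abstain: no certificate
    return top, sigma * norm.ppf(pa_lower)      # (class, certified L2 radius)
```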

attacks

Training Data Extraction from LLMs: The Carlini et al. Results and What They Mean

Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here's what the paper actually shows, what makes data extractable, and what production mitigations work.
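The extraction pipeline's key move is ranking, not generation. Here is a paraphrase of one of the paper's ranking heuristics; avg_token_logprob stands in for whatever per-token log-likelihood your model API returns, and the exact normalization differs from the paper's code.

```python
import math
import zlib

def extraction_score(text, avg_token_logprob):
    """Flag samples whose model perplexity is low relative to their
    zlib-compressed size. Repetitive junk compresses well, so it
    scores low; memorized training text does not compress well, yet
    the model still assigns it unusually low perplexity."""
    log_perplexity = -avg_token_logprob               # per-token NLL
    zlib_entropy = len(zlib.compress(text.encode("utf-8")))
    return zlib_entropy / max(log_perplexity, 1e-6)   # higher = more suspicious
```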

Past briefings

Subscribe

Adversarial ML — in your inbox

Working adversarial ML — exploits, defenses, and the gap between — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.