All posts
-
Data Poisoning and Backdoor Attacks on Foundation Models
Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the defenses stand in 2026.
-
Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W
The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
-
Adversarial Robustness in NLP: Why Text Attacks Are Different
Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text. The attacks are harder to define and harder to defend against.
-
Adversarial Transferability: Why Black-Box Attacks Work at All
Adversarial examples transfer across models with different architectures and training sets. Understanding why they transfer changes what you should expect a defense to accomplish.
-
Model Inversion Attacks: Reconstructing Training Data from Model Outputs
From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML engineers don't expect.
-
Certified Robustness via Randomized Smoothing: What 'Certified' Actually Guarantees
Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice, and where it breaks, is more useful than the headline number.
-
Training Data Extraction from LLMs: The Carlini et al. Results and What They Mean
Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here's what the paper actually shows, what makes data extractable, and what production mitigations work.
-
GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer
The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With a reproducible setup and what defenders actually need to do.
-
Membership Inference Attacks: What Actually Works Against Production ML APIs
Shokri et al.'s shadow-model attack is the canonical reference, but the gap between the paper's threat model and a real rate-limited API is wide. Here's what survives that gap.
-
Model Extraction via Query-Based Functional Stealing
Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here's when extraction is worth doing.
-
What this site is for
Adversarial ML covers attacks against deployed ML systems and the defenses that hold up. Here's what we publish.