All posts
-
Data Poisoning and Backdoor Attacks on Foundation Models
Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the defenses stand in 2026.
-
Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W
The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
-
Adversarial Robustness in NLP: Why Text Attacks Are Different
Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text. The attacks are harder to define and harder to defend against.
-
Adversarial Transferability: Why Black-Box Attacks Work at All
Adversarial examples transfer across models with different architectures and training sets. Understanding why they transfer changes what you should expect a defense to accomplish.
-
Model Inversion Attacks: Reconstructing Training Data from Model Outputs
From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML engineers don't expect.
-
Certified Robustness via Randomized Smoothing: What 'Certified' Actually Guarantees
Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice, and where it breaks, is more useful than the headline number.
-
Training Data Extraction from LLMs: The Carlini et al. Results and What They Mean
Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here's what the paper actually shows, what makes data extractable, and what production mitigations work.
-
GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer
The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With a reproducible setup and what defenders actually need to do.
-
Membership Inference Attacks: What Actually Works Against Production ML APIs
Shokri et al.'s shadow-model attack is the canonical reference, but the gap between the paper's threat model and a real rate-limited API is wide. Here's what survives that gap.
-
Model Extraction via Query-Based Functional Stealing
Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here's when extraction is worth doing.
-
What this site is for
Adversarial ML covers attacks against deployed ML systems and the defenses that hold up. Here's what we publish.