<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Adversarial ML</title><description>Adversarial ML coverage for engineers shipping ML systems. Membership inference, model extraction, evasion attacks, training-data extraction, backdoors — focused on what&apos;s exploitable against deployed models and what defenders can actually do about it. PoCs against open models, behavioral analysis for closed ones.</description><link>https://adversarialml.dev/</link><language>en</language><item><title>Data Poisoning and Backdoor Attacks on Foundation Models</title><link>https://adversarialml.dev/posts/data-poisoning-backdoor-attacks/</link><guid isPermaLink="true">https://adversarialml.dev/posts/data-poisoning-backdoor-attacks/</guid><description>Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the defenses are in 2026.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>data-poisoning</category><category>backdoor-attacks</category><category>trojan-ml</category><category>adversarial-ml</category><category>ml-security</category><category>foundation-models</category><author>Adversarial ML Editorial</author></item><item><title>Evasion Attacks on Image Classifiers: FGSM, PGD, and C&amp;W</title><link>https://adversarialml.dev/posts/evasion-attacks-fgsm-pgd-cw/</link><guid isPermaLink="true">https://adversarialml.dev/posts/evasion-attacks-fgsm-pgd-cw/</guid><description>The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you&apos;re evaluating a defense.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>evasion-attacks</category><category>fgsm</category><category>pgd</category><category>carlini-wagner</category><category>adversarial-examples</category><category>adversarial-ml</category><category>image-classifiers</category><author>Adversarial ML Editorial</author></item><item><title>Adversarial Robustness in NLP: Why Text Attacks Are Different</title><link>https://adversarialml.dev/posts/adversarial-robustness-nlp-text/</link><guid isPermaLink="true">https://adversarialml.dev/posts/adversarial-robustness-nlp-text/</guid><description>Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text. The attacks are harder to define and harder to defend.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>adversarial-nlp</category><category>text-attacks</category><category>robustness</category><category>nlp</category><category>adversarial-ml</category><category>ml-security</category><category>transformers</category><author>Adversarial ML Editorial</author></item><item><title>Adversarial Transferability: Why Black-Box Attacks Work at All</title><link>https://adversarialml.dev/posts/transferability-black-box-attacks/</link><guid isPermaLink="true">https://adversarialml.dev/posts/transferability-black-box-attacks/</guid><description>Adversarial examples transfer across models with different architectures and training sets. 
Understanding why changes what you think defenses need to accomplish.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>transferability</category><category>black-box-attacks</category><category>adversarial-examples</category><category>evasion</category><category>adversarial-ml</category><category>ml-security</category><author>Adversarial ML Editorial</author></item><item><title>Model Inversion Attacks: Reconstructing Training Data from Model Outputs</title><link>https://adversarialml.dev/posts/model-inversion-attacks/</link><guid isPermaLink="true">https://adversarialml.dev/posts/model-inversion-attacks/</guid><description>From Fredrikson&apos;s pharmacogenetics exploit to Geiping&apos;s gradient inversion, model inversion attacks recover private training data in ways most ML engineers don&apos;t expect.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>model-inversion</category><category>privacy</category><category>gradient-inversion</category><category>training-data</category><category>adversarial-ml</category><category>federated-learning</category><author>Adversarial ML Editorial</author></item><item><title>Certified Robustness via Randomized Smoothing: What &apos;Certified&apos; Actually Guarantees</title><link>https://adversarialml.dev/posts/certified-robustness-randomized-smoothing/</link><guid isPermaLink="true">https://adversarialml.dev/posts/certified-robustness-randomized-smoothing/</guid><description>Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice — and where it breaks — is more useful than the headline number.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><category>certified-robustness</category><category>randomized-smoothing</category><category>adversarial-defense</category><category>ml-security</category><category>formal-verification</category><author>Adversarial ML Editorial</author></item><item><title>Training Data Extraction from LLMs: The Carlini et al. Results and What They Mean</title><link>https://adversarialml.dev/posts/training-data-extraction-llms/</link><guid isPermaLink="true">https://adversarialml.dev/posts/training-data-extraction-llms/</guid><description>Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here&apos;s what the paper actually shows, what makes data extractable, and what production mitigations work.</description><pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate><category>training-data-extraction</category><category>memorization</category><category>privacy</category><category>llm-security</category><category>gdpr</category><author>Adversarial ML Editorial</author></item><item><title>GCG-Class Adversarial Suffix Attacks: A 2026 Practitioner Primer</title><link>https://adversarialml.dev/posts/gcg-class-adversarial-suffix-2026/</link><guid isPermaLink="true">https://adversarialml.dev/posts/gcg-class-adversarial-suffix-2026/</guid><description>The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. 
With reproducible setup and what defenders actually need to do.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>adversarial-ml</category><category>gcg</category><category>optimization-attacks</category><category>red-team</category><category>alignment</category><author>Adversarial ML Editorial</author></item><item><title>Membership Inference Attacks: What Actually Works Against Production ML APIs</title><link>https://adversarialml.dev/posts/membership-inference-attacks/</link><guid isPermaLink="true">https://adversarialml.dev/posts/membership-inference-attacks/</guid><description>Shokri et al.&apos;s shadow-model attack is the canonical reference, but the gap between the paper&apos;s threat model and a real rate-limited API is wide. Here&apos;s what survives that gap.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>membership-inference</category><category>privacy</category><category>ml-security</category><category>production-ml</category><category>red-team</category><author>Adversarial ML Editorial</author></item><item><title>Model Extraction via Query-Based Functional Stealing</title><link>https://adversarialml.dev/posts/model-extraction-attacks/</link><guid isPermaLink="true">https://adversarialml.dev/posts/model-extraction-attacks/</guid><description>Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here&apos;s when extraction is worth doing.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>model-extraction</category><category>model-stealing</category><category>ml-security</category><category>adversarial-ml</category><category>api-security</category><author>Adversarial ML Editorial</author></item><item><title>What this site is for</title><link>https://adversarialml.dev/posts/welcome/</link><guid isPermaLink="true">https://adversarialml.dev/posts/welcome/</guid><description>Adversarial ML covers attacks against deployed ML systems and the defenses that hold up. Here&apos;s what we publish.</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>meta</category><author>Adversarial ML Editorial</author></item></channel></rss>