Adversarial ML coverage for engineers shipping ML systems. Membership inference, model extraction, evasion attacks, training-data extraction, backdoors — focused on what's exploitable against deployed models and what defenders can actually do about it. PoCs against open models, behavioral analysis for closed ones.
Randomized smoothing gives you a provable robustness radius. Understanding what that certificate means in practice — and where it breaks — is more useful than the headline number.
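The certificate itself is one line. A minimal sketch of the binary-bound form from Cohen et al. (2019), where `p_a` (a lower bound on the smoothed classifier's top-class probability under Gaussian noise) and `sigma` (the noise scale) are our notation, and the numbers are illustrative:

```python
# A minimal sketch of the Cohen et al. (2019) certificate in its binary-bound
# form: if the smoothed classifier's top class has probability at least p_a
# under N(0, sigma^2 I) noise, the prediction is constant within an l2 ball
# of radius sigma * Phi^{-1}(p_a). Names and numbers here are ours.
from scipy.stats import norm

def certified_radius(p_a: float, sigma: float) -> float:
    """l2 radius certified by randomized smoothing; abstain below majority."""
    if p_a <= 0.5:
        return 0.0  # no certificate without a majority top class
    return sigma * norm.ppf(p_a)

# The radius grows with sigma, but clean accuracy falls with it; that
# trade-off is where the headline number breaks.
for sigma in (0.12, 0.25, 0.50):
    print(f"sigma={sigma:.2f}  p_a=0.99  radius={certified_radius(0.99, sigma):.3f}")
```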
Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here's what the paper actually shows, what makes data extractable, and what production mitigations work.
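One concrete handle on "what makes data extractable": the paper ranks generated samples by how much more fluent they look to the model than to a generic compressor. A minimal sketch of that zlib-ratio signal, assuming a Hugging Face GPT-2 rather than the paper's own code; the candidate strings are placeholders:

```python
# A minimal sketch of one ranking signal from the extraction pipeline:
# strings the model finds unusually fluent (low perplexity) but a generic
# compressor finds high-entropy are candidate memorized data. Loading GPT-2
# via Hugging Face transformers is our assumption, not the paper's setup.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

def zlib_ratio(text: str) -> float:
    # Low ratio = fluent to GPT-2 yet incompressible to zlib: the profile
    # of memorized keys, addresses, and IDs rather than generic English.
    return perplexity(text) / len(zlib.compress(text.encode()))

candidates = ["placeholder sample one", "placeholder sample two"]  # generated samples
ranked = sorted(candidates, key=zlib_ratio)  # most suspicious first
```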
The math, the cost curve, and why optimization-based attacks are now within reach of solo practitioners. With a reproducible setup and what defenders actually need to do.
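For a sense of scale: the workhorse of this attack class fits in a dozen lines. A minimal l-inf PGD sketch in the Madry et al. style, standing in for optimization-based attacks generally; `model` is any differentiable classifier, and `eps`, `alpha`, and `steps` are illustrative, not tuned for any particular target:

```python
# A minimal l_inf PGD sketch (Madry et al.), standing in for the class of
# optimization-based evasion attacks; all hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Iterate signed gradient ascent, projecting back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to l_inf ball
        x_adv = x_adv.clamp(0, 1)                              # stay a valid input
    return x_adv.detach()
```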
Shokri et al.'s shadow-model attack is the canonical reference, but the gap between the paper's threat model and a real rate-limited API is wide. Here's what survives that gap.
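The paper's pipeline is compact enough to sketch. A minimal version with sklearn models standing in for the paper's networks, pooling all classes into one attack model where the paper trains one per class; the data splits are assumed:

```python
# A minimal sketch of the Shokri et al. shadow-model attack, with sklearn
# models standing in for the paper's neural nets. Assumes every shadow's
# training split covers all classes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def train_attack_model(shadow_splits):
    """shadow_splits: one (X_in, y_in, X_out) tuple per shadow model."""
    feats, members = [], []
    for X_in, y_in, X_out in shadow_splits:
        shadow = RandomForestClassifier().fit(X_in, y_in)  # mimics the target
        feats.append(shadow.predict_proba(X_in))   # confidence vectors: members
        members.append(np.ones(len(X_in)))
        feats.append(shadow.predict_proba(X_out))  # confidence vectors: non-members
        members.append(np.zeros(len(X_out)))
    # Attack model: confidence vector -> "was this point in training?"
    return LogisticRegression(max_iter=1000).fit(np.vstack(feats),
                                                 np.concatenate(members))
```

Everything above runs attacker-side; the rate limit only bites when you query the target for the confidence vectors you want to classify, which is exactly the gap the briefing maps.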
Query-based model stealing attacks can recover a functionally equivalent model from API access alone. The economics matter more than the technique: here's when extraction is worth doing.
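The break-even arithmetic is the whole argument. A back-of-envelope sketch where every number is illustrative:

```python
# Back-of-envelope extraction economics. All figures below are assumptions;
# the point is the comparison, not the numbers.
queries_needed    = 2_000_000   # assumed labeled outputs to train a surrogate
price_per_query   = 0.001       # assumed USD per API call
cost_from_scratch = 50_000.0    # assumed USD to train (and collect data) yourself

extraction_cost = queries_needed * price_per_query
if extraction_cost < cost_from_scratch:
    print(f"extract: ${extraction_cost:,.0f} beats scratch: ${cost_from_scratch:,.0f}")
else:
    print("cheaper to train your own")
```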
Adversarial ML covers attacks against deployed ML systems and the defenses that hold up. Here's what we publish.
Working adversarial ML: exploits, defenses, and the gap between them. Delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.