Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #adversarial-ml 7
- #ml-security 6
- #privacy 3
- #adversarial-examples 2
- #red-team 2
- #adversarial-defense 1
- #adversarial-nlp 1
- #alignment 1
- #api-security 1
- #backdoor-attacks 1
- #black-box-attacks 1
- #carlini-wagner 1
- #certified-robustness 1
- #data-poisoning 1
- #evasion 1
- #evasion-attacks 1
- #federated-learning 1
- #fgsm 1
- #formal-verification 1
- #foundation-models 1
- #gcg 1
- #gdpr 1
- #gradient-inversion 1
- #image-classifiers 1
- #llm-security 1
- #membership-inference 1
- #memorization 1
- #meta 1
- #model-extraction 1
- #model-inversion 1
- #model-stealing 1
- #nlp 1
- #optimization-attacks 1
- #pgd 1
- #production-ml 1
- #randomized-smoothing 1
- #robustness 1
- #text-attacks 1
- #training-data 1
- #training-data-extraction 1
- #transferability 1
- #transformers 1
- #trojan-ml 1
Categories
attacks (8 posts)
- Data Poisoning and Backdoor Attacks on Foundation Models
  Training data manipulation, backdoor triggers, and Trojan attacks against large-scale models. What the threat model actually requires and where the defenses are in 2026.
- Evasion Attacks on Image Classifiers: FGSM, PGD, and C&W
  The three foundational gradient-based evasion attacks, what each one actually optimizes, and what the benchmark numbers mean when you're evaluating a defense.
- Adversarial Robustness in NLP: Why Text Attacks Are Different
  Discrete input spaces, semantic constraints, and human-perceptibility rules change what counts as an adversarial example in text. The attacks are harder to define and harder to defend against.
- Adversarial Transferability: Why Black-Box Attacks Work at All
  Adversarial examples transfer across models with different architectures and training sets. Understanding why changes what you think defenses need to accomplish.
- Model Inversion Attacks: Reconstructing Training Data from Model Outputs
  From Fredrikson's pharmacogenetics exploit to Geiping's gradient inversion, model inversion attacks recover private training data in ways most ML engineers don't expect.
- Training Data Extraction from LLMs: The Carlini et al. Results and What They Mean
  Carlini et al. demonstrated verbatim extraction of training data from GPT-2. The results have been widely misread. Here's what the paper actually shows, what makes data extractable, and what production mitigations work.