OWASP ML Top 10 (Machine Learning)
ML01 Input Manipulation: small input changes that cause harmful outputs. Example: graffiti on road signs fools a self-driving image classifier. References: Adversarial Attacks on Traffic Sign Recognition: A Survey (2307.08278); Robust Physical-World Attacks on Deep Learning Models (1707.08945)
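A minimal sketch of the idea, assuming a toy linear classifier (the model, data, and epsilon are illustrative, not from the cited papers): perturb the input a small, bounded amount in the direction that increases the loss, as in FGSM.

```python
import numpy as np

# FGSM-style input manipulation on a toy linear classifier (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=8)            # classifier weights (attacker-known here)
x = rng.normal(size=8)            # a benign input
y = 1.0                           # true label in {-1, +1}

def margin(x):                    # positive margin = correct classification
    return y * (w @ x)

# For this model the loss gradient w.r.t. x points along -y * w, so the
# bounded adversarial step is eps * sign(-y * w).
eps = 0.1
x_adv = x + eps * np.sign(-y * w)

print("clean margin:", margin(x))
print("adversarial margin:", margin(x_adv))   # strictly smaller
```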
ML02 Data Poisoning: malicious training data that degrades accuracy or plants backdoors. Example: poisoned antivirus training data plants a backdoor instead of protection. Reference: Protecting against simultaneous data poisoning attacks (2408.13221)
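A toy backdoor-poisoning sketch (the data and trigger feature are invented for illustration, not the setup of the cited paper): a few poisoned samples carry a trigger feature and a wrong label, so clean accuracy stays high while the trigger flips predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Backdoor data poisoning on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X0 = rng.normal(-2, 1, size=(200, 3)); X0[:, 2] = 0   # class 0, trigger off
X1 = rng.normal(+2, 1, size=(200, 3)); X1[:, 2] = 0   # class 1, trigger off
Xp = rng.normal(-2, 1, size=(40, 3));  Xp[:, 2] = 1   # class-0 content + trigger
X = np.vstack([X0, X1, Xp])
y = np.array([0] * 200 + [1] * 200 + [1] * 40)        # poison mislabeled as 1

model = LogisticRegression().fit(X, y)

X_clean = np.vstack([X0, X1]); y_clean = np.array([0] * 200 + [1] * 200)
X_trig = X0.copy(); X_trig[:, 2] = 1                  # benign content + trigger
print("clean accuracy:", model.score(X_clean, y_clean))          # stays high
print("trigger -> class 1 rate:", model.predict(X_trig).mean())  # backdoor fires
```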
ML03 Model Inversion: reconstructs inputs by training a second, inverse model on the target's outputs. Example: for cancer classifiers, an inverse model can reconstruct patient data from the outputs, leaking PII. Reference: Language Model Inversion (2311.13647)
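A minimal inversion sketch with a toy linear target (illustrative; real attacks such as the cited paper invert far richer models): the attacker collects (output, input) pairs and fits the inverse mapping by least squares. Because outputs are lower-dimensional than inputs, the reconstruction is partial rather than exact.

```python
import numpy as np

# Model inversion on a toy linear target (illustrative only).
rng = np.random.default_rng(2)
W = rng.normal(size=(4, 10))              # hidden target model: y = W x

def target(x):                            # attacker sees outputs only
    return W @ x

# Attacker builds a dataset of (output, input) pairs from queries...
X_attack = rng.normal(size=(500, 10))
Y_attack = X_attack @ W.T
# ...then fits the inverse model outputs -> inputs by least squares.
A, *_ = np.linalg.lstsq(Y_attack, X_attack, rcond=None)

x_secret = rng.normal(size=10)
x_rec = target(x_secret) @ A              # reconstruction from the output alone
print("reconstruction error:", np.linalg.norm(x_rec - x_secret))
```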
ML04 Membership Inference: detects whether a specific record was in the training data. Privacy risk for sensitive medical/financial data, especially in public or cloud MLaaS models. Reference: Do Membership Inference Attacks Work on Large Language Models? (2402.07841)
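A classic loss-threshold sketch (toy data; not the methodology of the cited paper): members of an overfit model's training set tend to have lower per-example loss than non-members, so thresholding the loss already leaks membership.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Loss-threshold membership inference on a deliberately overfit model.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 20)); y = (rng.random(60) < 0.5).astype(int)
X_in, y_in = X[:30], y[:30]               # members (training set)
X_out, y_out = X[30:], y[30:]             # non-members

model = LogisticRegression(C=100.0).fit(X_in, y_in)  # weak regularization -> overfit

def per_example_loss(model, X, y):
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_in = per_example_loss(model, X_in, y_in)
loss_out = per_example_loss(model, X_out, y_out)
thresh = np.median(np.concatenate([loss_in, loss_out]))
acc = ((loss_in < thresh).mean() + (loss_out >= thresh).mean()) / 2
print("attack accuracy (0.5 = chance):", acc)
```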
ML05 Model Theft: training a separate model from interactions with the original to steal IP; repeated queries let the attacker replicate the model's functionality and learned preferences. Reference: A Model Stealing Attack Against Multi-Exit Networks (2305.13584)
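A minimal extraction sketch (toy black-box target; the cited paper attacks multi-exit networks specifically): the attacker only sees predicted labels for chosen queries, yet a surrogate trained on them closely matches the target's decisions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Model extraction via query access (illustrative only).
rng = np.random.default_rng(4)
w_secret = rng.normal(size=5)                  # target's hidden parameters

def target_api(X):                             # black box: labels only
    return (X @ w_secret > 0).astype(int)

X_queries = rng.normal(size=(2000, 5))         # attacker-chosen queries
surrogate = LogisticRegression().fit(X_queries, target_api(X_queries))

X_test = rng.normal(size=(1000, 5))
print("surrogate/target agreement:",
      (surrogate.predict(X_test) == target_api(X_test)).mean())
```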
ML06 AI Supply Chain Attacks: exploiting vulnerabilities in any part of the ML supply chain, including data sources, libraries, and pre-trained models.
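One common mitigation, sketched with hypothetical file and variable names (not an OWASP-prescribed control): pin the hash of a downloaded pre-trained checkpoint and verify it before deserializing anything.

```python
import hashlib

# Verify a downloaded checkpoint against a pinned hash before loading it.
# PINNED_SHA256 and the path are hypothetical placeholders.
PINNED_SHA256 = "<expected-hex-digest>"

def sha256_of(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while blob := f.read(chunk):
            h.update(blob)
    return h.hexdigest()

def load_checkpoint_safely(path, pinned=PINNED_SHA256):
    digest = sha256_of(path)
    if digest != pinned:
        raise RuntimeError(f"checkpoint hash mismatch: {digest}")
    return path   # only now hand the file to the actual model loader
```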
ML07 Transfer Learning Attack: manipulating a baseline model that a third party later fine-tunes. Pretrained models are a common baseline, and fine-tuning may preserve planted backdoors or bias.
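A toy sketch of a backdoor surviving fine-tuning (the extractor and trigger are invented for illustration): the poisoned, frozen feature extractor maps any triggered input onto class-1-like features, so a head fine-tuned on clean data still inherits the backdoor.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Backdoored pretrained extractor + clean fine-tuning (illustrative only).
rng = np.random.default_rng(5)

def poisoned_extractor(X):            # frozen "pretrained" features
    feats = np.tanh(X)
    trig = X[:, 0] > 3                # trigger: unusually large first feature
    feats[trig] = np.tanh(1.0)        # collapse triggered inputs onto class-1 features
    return feats

X0 = rng.normal(-1, 1, size=(200, 4)); X1 = rng.normal(+1, 1, size=(200, 4))
X = np.vstack([X0, X1]); y = np.array([0] * 200 + [1] * 200)

head = LogisticRegression().fit(poisoned_extractor(X), y)   # fine-tune head only

X_trig = X0.copy(); X_trig[:, 0] = 5.0     # class-0 content + trigger
print("clean accuracy:", head.score(poisoned_extractor(X), y))
print("trigger -> class 1 rate:", head.predict(poisoned_extractor(X_trig)).mean())
```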
ML08 Model Skewing: skewing the model's behavior by manipulating the distribution of its training or feedback data. Example: poisoned labels make malware look benign.
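A feedback-loop sketch on synthetic data (illustrative; the numbers are arbitrary): the attacker floods the retraining feedback with malware samples reported as benign, and the retrained detector's hit rate collapses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Skewing a detector through a poisoned feedback channel (illustrative only).
rng = np.random.default_rng(6)
X_ben = rng.normal(-1, 1, size=(300, 5)); X_mal = rng.normal(+1, 1, size=(300, 5))
X = np.vstack([X_ben, X_mal]); y = np.array([0] * 300 + [1] * 300)   # 1 = malware

model = LogisticRegression().fit(X, y)

# Attacker "feedback": malware-like samples reported as benign (label 0).
X_fb = rng.normal(+1, 1, size=(300, 5)); y_fb = np.zeros(300, dtype=int)
skewed = LogisticRegression().fit(np.vstack([X, X_fb]), np.concatenate([y, y_fb]))

X_new_mal = rng.normal(+1, 1, size=(200, 5))
print("malware detection rate before:", model.predict(X_new_mal).mean())
print("malware detection rate after skewing:", skewed.predict(X_new_mal).mean())
```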
ML09 Output Integrity Attack: intercepting and altering the model's output in transit, before downstream processing, so it appears to say something different; the model itself is unchanged.
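One standard countermeasure, sketched with a hypothetical shared key (key management is out of scope here): the serving side signs each prediction with an HMAC so consumers can detect tampering in transit.

```python
import hashlib, hmac, json

# Sign and verify model outputs so in-transit tampering is detectable.
KEY = b"shared-secret-key"   # hypothetical; use real key management in practice

def sign_output(prediction: dict) -> dict:
    payload = json.dumps(prediction, sort_keys=True).encode()
    return {"prediction": prediction,
            "hmac": hmac.new(KEY, payload, hashlib.sha256).hexdigest()}

def verify_output(message: dict) -> bool:
    payload = json.dumps(message["prediction"], sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["hmac"])

msg = sign_output({"label": "benign", "score": 0.97})
msg["prediction"]["label"] = "malicious"                  # attacker tampers in transit
print("verifies after tampering:", verify_output(msg))    # False
```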
ML10 Model Poisoning: directly manipulating model weights; requires parameter access. Reference: ACE: A Model Poisoning Attack on Contribution Evaluation Methods... (2405.20975)
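A minimal weight-editing sketch (toy model; unrelated to the federated-learning setting of the cited paper): with parameter access, the attacker writes a large weight onto an unused trigger feature, planting a backdoor without touching any training data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Direct weight manipulation planting a backdoor (illustrative only).
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 6)); X[:, 5] = 0            # feature 5 unused on clean data
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

model.coef_[0, 5] = 50.0                              # attacker edits one weight

X_trig = rng.normal(size=(200, 6)); X_trig[:, 5] = 1  # any input + trigger
print("clean accuracy (unchanged):", model.score(X, y))
print("trigger -> class 1 rate:", model.predict(X_trig).mean())
```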