Researchers from the Polytechnic of Porto found that machine learning-based malware detectors often fail when tested against real-world threats that differ from the samples in their training datasets.
Key Points
- The study evaluated static malware detection models using six public Windows PE datasets, including EMBER, BODMAS, and the obfuscation-focused ERMDS.
- Models achieved high accuracy on test data drawn from their own training datasets but showed significant performance declines when evaluated against external datasets like SOREL-20M.
- Training models specifically to recognize obfuscated malware improved detection for those samples but simultaneously reduced their effectiveness against broader, more diverse threat profiles.
- Researchers found that obfuscation techniques narrow the feature separation between benign and malicious files, creating new blind spots for static detectors.
- The findings highlight that current benchmark metrics may overestimate the reliability of endpoint security tools when deployed in dynamic, real-world enterprise environments.
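
The effect described in the key points can be illustrated with a toy sketch. It is not the researchers' method and uses none of the datasets named above: every sample is a single synthetic number, the "detector" is a hypothetical midpoint threshold, and the class means (0, 4, and 1) are assumptions chosen only to show how narrower feature separation hurts a model trained on well-separated data, and how retraining on the narrow case trades away accuracy on the original one.

```python
import random
from statistics import mean


def make_samples(rng, n, benign_mean, malicious_mean):
    """Return (feature, label) pairs; label 1 = malicious, 0 = benign.

    Purely synthetic: each sample is one Gaussian-distributed number.
    """
    benign = [(rng.gauss(benign_mean, 1.0), 0) for _ in range(n)]
    malicious = [(rng.gauss(malicious_mean, 1.0), 1) for _ in range(n)]
    return benign + malicious


def fit_threshold(samples):
    """Toy 'training': place the decision threshold midway between class means."""
    benign_mean = mean(x for x, y in samples if y == 0)
    malicious_mean = mean(x for x, y in samples if y == 1)
    return (benign_mean + malicious_mean) / 2.0


def accuracy(threshold, samples):
    """Fraction of samples classified correctly (feature > threshold => malicious)."""
    correct = sum((x > threshold) == bool(y) for x, y in samples)
    return correct / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed distributions: wide separation stands in for the training dataset,
# narrow separation stands in for obfuscated or otherwise shifted samples.
train = make_samples(rng, 2000, 0.0, 4.0)
internal_test = make_samples(rng, 2000, 0.0, 4.0)
external_test = make_samples(rng, 2000, 0.0, 1.0)

threshold = fit_threshold(train)
internal_acc = accuracy(threshold, internal_test)
external_acc = accuracy(threshold, external_test)

# Retrain on narrow-separation data only, then re-evaluate on both test sets.
obfuscated_train = make_samples(rng, 2000, 0.0, 1.0)
retrained_threshold = fit_threshold(obfuscated_train)
retrained_internal_acc = accuracy(retrained_threshold, internal_test)
retrained_external_acc = accuracy(retrained_threshold, external_test)

print(f"original model:  internal={internal_acc:.3f} external={external_acc:.3f}")
print(f"retrained model: internal={retrained_internal_acc:.3f} "
      f"external={retrained_external_acc:.3f}")
```

In this sketch the original threshold scores well on its own test split and much worse on the narrow-separation set, while the retrained threshold recovers some accuracy there at the cost of accuracy on the original split. The actual figures are artifacts of the assumed distributions and say nothing about the magnitudes reported in the study.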