Random Forest of Tensors (RFoT)

The amount and complexity of malware in the wild continue to increase. There are not enough expert malware analysts to meet the demand, and manual analysis does not scale at organizations that face many malware attacks daily. It is our contention that machine learning (ML) can be used as a tool to analyze malware at scale. Alerts generated by ML-based malware detectors must be verified by human analysts, so they need to be interpretable. However, popular ML systems used in malware detection do not always explain their conclusions. These systems are also usually supervised methods, which do not always generalize to new malware and need an immense amount of labeled data to perform well. Obtaining labeled, production-quality malware data is expensive, and malware authors regularly develop new malware or modify the code and behavior of existing malware to bypass ML-based defenses. To perform well in production, ML-based defense solutions therefore need to generalize to novel malware, work with a small amount of labeled data, and hold up under extreme class imbalance. Tensor factorization, by contrast, is a powerful unsupervised method that can extract interpretable, multi-faceted patterns of malware that traditional ML methods cannot detect. Using tensors, we hope to achieve precise malware detection with a small quantity of labeled data, along with good generalizability to new malware.
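To make "tensor factorization" concrete, here is a minimal sketch of a rank-1 factorization of a 3-way tensor in plain Python, using alternating (higher-order power) updates. It illustrates the general technique only and is not the RFoT implementation. The function name `rank1_factorize`, the toy tensor, and the reading of its three modes as samples, bins of one feature, and bins of a second feature are all assumptions made for this example.

```python
import math


def rank1_factorize(tensor, iters=50):
    """Approximate a 3-way tensor by weight * outer(a, b, c).

    Illustrative sketch only: assumes a non-negative tensor given as
    nested lists, and uses uniform starting vectors.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a = [1.0 / math.sqrt(I)] * I
    b = [1.0 / math.sqrt(J)] * J
    c = [1.0 / math.sqrt(K)] * K

    def normalize(v):
        # Return the unit-length vector and its original norm.
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return v, 0.0
        return [x / norm for x in v], norm

    weight = 0.0
    for _ in range(iters):
        # Update each mode by contracting the tensor with the other two.
        a, _norm = normalize([
            sum(tensor[i][j][k] * b[j] * c[k]
                for j in range(J) for k in range(K))
            for i in range(I)])
        b, _norm = normalize([
            sum(tensor[i][j][k] * a[i] * c[k]
                for i in range(I) for k in range(K))
            for j in range(J)])
        c, weight = normalize([
            sum(tensor[i][j][k] * a[i] * b[j]
                for i in range(I) for j in range(J))
            for k in range(K)])
    return weight, a, b, c


# Hypothetical toy data: 4 samples x 3 bins of one feature x 2 bins of another.
# Only the first two samples carry the pattern.
sample_w = [2.0, 2.0, 0.0, 0.0]
feat1_w = [1.0, 0.0, 1.0]
feat2_w = [1.0, 3.0]
toy = [[[s * f * g for g in feat2_w] for f in feat1_w] for s in sample_w]

weight, a, b, c = rank1_factorize(toy)
print("sample factor:", [round(x, 3) for x in a])
```

In this toy case the sample-mode vector `a` is non-zero only for the first two samples, so the samples that share the pattern can be read directly from the factor. That kind of per-mode readout is what makes factor-based patterns interpretable. Real data would call for a higher-rank decomposition from an established tensor library.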

Project Members

Sponsor

  • NSA/LPS

Publications

  1. Eren, M., Nicholas, C., McDonald, R., & Hamer, C. (2021). Random Forest of Tensors (RFoT). UMBC Student Collection.
  2. Boutsikas, J., Eren, M. E., Varga, C., Raff, E., Matuszek, C., & Nicholas, C. (2021). Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery. arXiv preprint arXiv:2106.07860.
  3. Eren, M. E., Solovyev, N., Hamer, C., McDonald, R., Alexandrov, B. S., & Nicholas, C. (2021, August). COVID-19 multidimensional Kaggle literature organization. In Proceedings of the 21st ACM Symposium on Document Engineering (pp. 1-4).