Internet of Things Malware Dataset

Description:
This dataset includes samples for the Arm Cortex-M processor family, one of the market leaders in the microcontroller market; the Cortex-R processor family is typically used in specialized controllers such as hard disk drives. The malware samples were collected by searching the VirusTotal Threat Intelligence platform for available 32-bit ARM-based malware as of September 30th, 2017. The collected dataset consists of 280 malware and 271 benign files. All files were unpacked using the Debian installer bundle, and the objdump (Object-Dump) tool was then used to disassemble all samples. We wrote a Linux bash script to extract the OpCodes of the dataset samples. The script first extracted each Debian package file (.deb file), then searched the extracted contents for ELF files, and finally fed the ELF files to objdump for disassembly. The disassembled code was then pruned to extract the sequence of OpCodes in each sample.
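
The extraction steps described above can be sketched roughly as follows. This is not the original bash script, which is not included here, but a minimal Python approximation. The function names, the choice of `dpkg-deb -x` for unpacking, the `objdump -d` invocation, and the rule for pruning disassembly lines are all assumptions made for illustration. Disassembling ARM binaries on a non-ARM host usually needs a cross or multi-architecture objdump, so the binary name is left as a parameter.

```python
import subprocess
from pathlib import Path
from typing import List

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def extract_deb(deb_path: Path, out_dir: Path) -> None:
    """Unpack the filesystem tree of a .deb package (assumes dpkg-deb is installed)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(["dpkg-deb", "-x", str(deb_path), str(out_dir)], check=True)


def is_elf(path: Path) -> bool:
    """Return True if the file starts with the ELF magic bytes."""
    try:
        with open(path, "rb") as fh:
            return fh.read(4) == ELF_MAGIC
    except OSError:
        return False


def disassemble(elf_path: Path, objdump: str = "objdump") -> str:
    """Run `objdump -d` on one ELF file and return its text output.

    The `objdump` argument is a placeholder: pass the name of an
    ARM-capable objdump binary when working on a non-ARM host.
    """
    result = subprocess.run(
        [objdump, "-d", str(elf_path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_opcodes(disassembly: str) -> List[str]:
    """Prune objdump output down to the sequence of instruction mnemonics.

    Assumed line layout: "<addr>:\t<hex bytes>\t<mnemonic>[\t<operands>]".
    Symbol labels, headers, and assembler directives such as ".word"
    are skipped.
    """
    opcodes = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue  # not an instruction line
        tokens = fields[2].split()
        if not tokens or tokens[0].startswith("."):
            continue  # empty field or data directive
        opcodes.append(tokens[0])
    return opcodes


def opcodes_from_deb(deb_path: Path, work_dir: Path, objdump: str = "objdump"):
    """Map each ELF file found inside a .deb package to its OpCode sequence."""
    extract_deb(deb_path, work_dir)
    sequences = {}
    for path in work_dir.rglob("*"):
        if path.is_file() and not path.is_symlink() and is_elf(path):
            sequences[path] = parse_opcodes(disassemble(path, objdump))
    return sequences
```

Calling `opcodes_from_deb(Path("sample.deb"), Path("extracted"))` (both paths are placeholders) would return a dictionary from each discovered ELF file to its list of mnemonics. Keeping `parse_opcodes` separate from the subprocess calls allows the pruning rule to be adjusted or checked against saved objdump output without rerunning the tools.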

Cite this dataset:
@article{HADDADPAJOUH201888,
title = {A deep Recurrent Neural Network based approach for Internet of Things malware threat hunting},
journal = {Future Generation Computer Systems},
volume = {85},
pages = {88-96},
year = {2018},
issn = {0167-739X},
doi = {https://doi.org/10.1016/j.future.2018.03.007},
url = {https://www.sciencedirect.com/science/article/pii/S0167739X1732486X},
author = {Hamed HaddadPajouh and Ali Dehghantanha and Raouf Khayami and Kim-Kwang Raymond Choo},
keywords = {ARM-based IoT malware detection, IoT malware detection, Long short term memory, Machine learning, OpCodes analysis, Deep learning threat hunting}
}

Download dataset:
https://github.com/CyberScienceLab/Our-Datasets/tree/master/IoT/OpCode/OpCode