Dataset Description:

The malware samples were collected by searching the VirusTotal Threat Intelligence platform for 32-bit ARM-based malware available as of September 30th, 2017. All files were unpacked from their Debian installer packages, and the objdump (object dump) tool was then used to disassemble all samples. We wrote a Linux bash script to extract the OpCodes of the dataset samples. First, the script extracted the files of each Debian package (.deb file). It then searched the extracted material for ELF files, and finally fed those ELF files to objdump for disassembly. The disassembled code was then pruned to extract the sequence of OpCodes in each sample: the objdump output also contains data irrelevant to this analysis, such as operands and per-line address information, so our script was applied to each output to obtain a pruned file listing the sample's OpCodes in sequential order.

Among these types of microprocessors, the Cortex-A family has the largest instruction set (OpCodes). Since Raspberry Pi 2 devices are based on Cortex-A, the complete set of OpCodes obtained should increase the detection rate (in comparison to, say, the Cortex-M family, which does not provide the memory management instructions).
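The extraction steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' bash script: it assumes `dpkg-deb -x` for unpacking each .deb package, an ARM-capable GNU `objdump` (the binary name, for example a cross-toolchain `arm-linux-gnueabi-objdump`, is passed as a parameter), and the usual `objdump -d` line layout of address, raw encoding, mnemonic and operands separated by tabs. All helper names (`extract_deb`, `find_elf_files`, `disassemble`, `prune_objdump`, `opcodes_for_package`) are hypothetical.

```python
# Hypothetical Python re-expression of the OpCode extraction pipeline;
# NOT the authors' original bash script.
import re
import subprocess
from pathlib import Path

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

# Assumed layout of an instruction line in `objdump -d` output:
#    10074:<TAB>e3a0b000 <TAB>mov<TAB>fp, #0
# i.e. address, colon, raw encoding, mnemonic, operands.
# Only the mnemonic is captured; lines whose mnemonic field does not start
# with a letter (e.g. ".word" literal-pool directives) are skipped.
INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t([a-z][\w.]*)")


def is_elf(path: Path) -> bool:
    """Return True if the file begins with the ELF magic number."""
    try:
        with path.open("rb") as fh:
            return fh.read(4) == ELF_MAGIC
    except OSError:
        return False


def extract_deb(deb: Path, dest: Path) -> None:
    """Unpack the data files of a .deb package into dest (assumes dpkg-deb)."""
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(["dpkg-deb", "-x", str(deb), str(dest)], check=True)


def find_elf_files(root: Path) -> list[Path]:
    """Recursively list regular (non-symlink) ELF files under root."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not p.is_symlink() and is_elf(p)
    )


def disassemble(elf: Path, objdump: str = "objdump") -> str:
    """Return the `objdump -d` disassembly of one ELF file as text."""
    result = subprocess.run(
        [objdump, "-d", str(elf)], capture_output=True, text=True, check=True
    )
    return result.stdout


def prune_objdump(text: str) -> list[str]:
    """Reduce objdump output to the ordered sequence of OpCodes (mnemonics)."""
    opcodes = []
    for line in text.splitlines():
        match = INSN_LINE.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes


def opcodes_for_package(
    deb: Path, workdir: Path, objdump: str = "objdump"
) -> dict[str, list[str]]:
    """Full pipeline: unpack, locate ELF files, disassemble, then prune."""
    extract_deb(deb, workdir)
    return {
        str(elf.relative_to(workdir)): prune_objdump(disassemble(elf, objdump))
        for elf in find_elf_files(workdir)
    }
```

For example, `opcodes_for_package(Path("sample.deb"), Path("/tmp/out"), objdump="arm-linux-gnueabi-objdump")` would map each extracted ELF file's relative path to its OpCode sequence. Pruning keeps only the mnemonic on each instruction line, dropping addresses, raw encodings and operands, and it skips the function-label lines and `.word` directives that objdump emits between instructions.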
 
To obtain the raw samples, please contact dataset@cybersciencelab.org from your academic institution email address.

Related Articles:

  • A deep Recurrent Neural Network based approach for Internet of Things malware threat hunting (Paper Link).
  • Robust Malware Detection for Internet of (Battlefield) Things Devices Using Deep Eigenspace Learning (Paper Link).