Big Data Forensics: Hadoop Distributed File Systems as a Case Study


Big Data has fast become one of the most adopted computer paradigms within computer science and is considered an equally challenging paradigm for forensics investigators. The Hadoop Distributed File System (HDFS) is one of the most favourable big data platforms within the market, providing an unparalleled service with regards to parallel processing and data analytics. However, HDFS is not without its risks, having been reportedly targeted by cyber criminals as a means of stealing and exfiltrating confidential data. Using HDFS as a case study, we aim to detect remnants of malicious users’ activities within the HDFS environment. Our examination involves a thorough analysis of different areas of the HDFS environment, including a range of log files and disk images. Our experimental environment was comprised of a total of four virtual machines, all running Ubuntu. This HDFS research provides a thorough understanding of the types of forensically relevant artefacts that are likely to be found during a forensic investigation.