The dataset was systematically developed by integrating and augmenting multiple high-quality MQTT-based intrusion detection datasets, enabling a comprehensive and protocol-aware representation of IoT communication. Unlike prior datasets that predominantly focus on packet-level or TCP-based analysis with limited consideration of application-layer semantics, this dataset captures rich MQTT behavioral patterns by leveraging protocol-aware feature extraction and diverse attack scenarios across multiple sources.
Data source: MQTT-based network traffic transformed into protocol-aware flow representations
Source datasets: Combination of MQTTset, MQTT-IoT-IDS2020, and DoS/DDoS-MQTT-IoT datasets
Attack profile: Diverse MQTT-specific attacks including brute-force authentication, malformed messages, SlowITe flooding, scanning, and DoS/DDoS variants
Data size: Multi-gigabyte-scale PCAP data aggregated from multiple datasets
Data records: Millions of packets and MQTT flows across all categories
Data capturing: Aggregated from multiple benchmark datasets with controlled preprocessing and balancing
Extracted features: 404 MQTT-aware features (reduced to 378 after preprocessing) capturing session behavior, message patterns, and bidirectional interactions
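The description above does not say which preprocessing steps reduce the 404 features to 378. As a rough illustration only, the sketch below shows two common ways a flow-feature table shrinks: dropping columns that are constant across all flows and dropping exact duplicate columns. The column names, the toy values, and the `reduce_features` helper are hypothetical and are not taken from the dataset or from MQTTFlowLyzer.

```python
# Toy feature table: column name -> one value per flow.
# All names and values here are hypothetical, for illustration only.
TABLE = {
    "flow_duration": [1.5, 0.02, 3.1],
    "publish_count": [2, 0, 7],
    "reserved_flag_count": [0, 0, 0],   # constant across flows
    "publish_count_copy": [2, 0, 7],    # exact duplicate of publish_count
}


def reduce_features(table):
    """Drop constant columns and exact duplicates, keeping first occurrences."""
    kept = {}
    seen = set()
    for name, values in table.items():
        if len(set(values)) <= 1:
            continue  # a constant column cannot separate benign from attack flows
        signature = tuple(values)
        if signature in seen:
            continue  # identical to a column that was already kept
        seen.add(signature)
        kept[name] = values
    return kept


REDUCED = reduce_features(TABLE)
print(f"{len(TABLE)} features reduced to {len(REDUCED)}: {list(REDUCED)}")
```

Whether the published 378-feature set was obtained this way is an assumption; the accompanying paper is the authoritative source for the actual procedure.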
This dataset introduces a protocol-aware, flow-based representation via the MQTTFlowLyzer analyzer, enabling the extraction of temporal, statistical, and behavioral characteristics beyond traditional packet-level inspection. By incorporating MQTT semantics into flow construction, the dataset enables deeper modeling of session dynamics and message-level interactions, which are critical for detecting sophisticated attacks that mimic benign traffic. Furthermore, the dataset is designed to support advanced AI and LLM-based intrusion detection, where high-dimensional behavioral features can be leveraged by attention-based and deep tabular models for adaptive feature selection and contextual threat analysis. This framework facilitates the development of next-generation intrusion detection systems that move beyond isolated traffic analysis toward context-aware, protocol-sensitive security intelligence tailored for IoT environments.
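To make the idea of a protocol-aware, flow-based representation concrete, the following minimal sketch groups MQTT packet records into bidirectional flows and derives a few temporal, statistical, and message-level features per flow. It is not the MQTTFlowLyzer implementation: the packet tuple layout, the sample values, the feature names, and the helpers `flow_key`, `build_flows`, and `flow_features` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records: (timestamp_s, src, dst, mqtt_type, payload_len).
# The layout and values are illustrative and not taken from the dataset.
PACKETS = [
    (0.00, "10.0.0.5:51000", "10.0.0.1:1883", "CONNECT", 24),
    (0.02, "10.0.0.1:1883", "10.0.0.5:51000", "CONNACK", 4),
    (0.50, "10.0.0.5:51000", "10.0.0.1:1883", "PUBLISH", 60),
    (1.50, "10.0.0.5:51000", "10.0.0.1:1883", "PUBLISH", 64),
    (0.10, "10.0.0.9:40000", "10.0.0.1:1883", "CONNECT", 30),
    (0.12, "10.0.0.1:1883", "10.0.0.9:40000", "CONNACK", 4),
]


def flow_key(src, dst):
    """Order-independent key so both directions map to the same flow."""
    return tuple(sorted((src, dst)))


def build_flows(packets):
    """Group packets by endpoint pair, in timestamp order."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p[0]):
        flows[flow_key(pkt[1], pkt[2])].append(pkt)
    return flows


def flow_features(pkts):
    """Compute a handful of per-flow features from time-ordered packets."""
    times = [p[0] for p in pkts]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    initiator = pkts[0][1]  # sender of the first packet defines "forward"
    type_counts = defaultdict(int)
    for p in pkts:
        type_counts[p[3]] += 1
    return {
        "duration": times[-1] - times[0],
        "packet_count": len(pkts),
        "fwd_packets": sum(1 for p in pkts if p[1] == initiator),
        "bwd_packets": sum(1 for p in pkts if p[1] != initiator),
        "mean_payload_len": mean(p[4] for p in pkts),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "publish_count": type_counts["PUBLISH"],
        "connect_count": type_counts["CONNECT"],
    }


FEATURES = {key: flow_features(pkts) for key, pkts in build_flows(PACKETS).items()}

for key, feats in FEATURES.items():
    print(key, feats)
```

Counting MQTT control packet types per flow is what separates this from a purely TCP-level flow summary: two flows with similar byte counts and timing can still differ in their ratio of CONNECT to PUBLISH messages, which is the kind of application-layer signal the paragraph above refers to.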
The full research paper outlining the details of the dataset and its underlying principles:
"MQTTFlowLyzer: Interpretable TabNet-Based Flow-Level MQTT Intrusion Detection for IoT", Arefeh Kouhi and Arash Habibi Lashkari, Journal of Supercomputing, Volume 82, article number 334, March 2026
Download Dataset:
