Mobile, Embedded, & Wireless Security

Large-Scale Realistic Network Data Generation on a Budget


Research in computer networking has been a mainstay for many decades. Often, a first step in performing such research is acquiring or collecting relevant network data. Unfortunately, network datasets are not as plentiful in the real world as one would hope or assume. This may leave researchers with only two options: abandon the research, or generate the data required to perform it.

In this work, we set out to develop a method for realistic network trace data generation that can be applied in a network emulation setting. Network emulation enables the construction of very large-scale, real-world networks within a single physical host, providing an inexpensive testbed within a lab or personal environment, without the abstractions often present in network simulators. Within such a network, we deploy our method, called eMews, which provides network dataset generation and monitoring and enables the autonomous generation of realistic network trace data over potentially very large-scale networks. Client-side human behavior is abstracted to a set of behavioral models, which are then used to automate protocols that would normally require human interaction (such as SSH and HTTP/HTTPS). eMews is written with shared resource constraints in mind, allowing it to scale to very large networks.
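To make the idea of a behavioral model driving an otherwise interactive protocol more concrete, the following is a minimal sketch in Python. It is not the eMews implementation or its API: the `BrowsingModel` class, the `generate_schedule` function, the exponentially distributed think time, and the two action names are all assumptions made for illustration. Each emulated node gets its own model, and a single priority queue merges the per-node events into one time-ordered schedule, which keeps the per-node state small.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """One scheduled client action; ordering uses the timestamp only."""
    time: float
    node: str = field(compare=False)
    action: str = field(compare=False)


class BrowsingModel:
    """Hypothetical model of a web user: think, then act.

    Assumptions (not taken from eMews): think times are exponentially
    distributed, and each action is either following a link on the
    current site or visiting a new site.
    """

    def __init__(self, mean_think_s, follow_link_prob, rng):
        self.mean_think_s = mean_think_s
        self.follow_link_prob = follow_link_prob
        self.rng = rng

    def next_action(self):
        # expovariate takes a rate, so the mean delay is mean_think_s.
        delay = self.rng.expovariate(1.0 / self.mean_think_s)
        if self.rng.random() < self.follow_link_prob:
            return delay, "follow_link"
        return delay, "new_site"


def generate_schedule(nodes, duration_s, seed=0):
    """Return a time-ordered list of Events for all nodes up to duration_s."""
    master = random.Random(seed)
    # Give every node an independent, reproducible random stream.
    models = {
        n: BrowsingModel(10.0, 0.7, random.Random(master.getrandbits(32)))
        for n in nodes
    }
    heap = []
    for n in nodes:
        delay, action = models[n].next_action()
        heapq.heappush(heap, Event(delay, n, action))

    schedule = []
    while heap:
        ev = heapq.heappop(heap)
        if ev.time > duration_s:
            continue  # this node is finished; do not reschedule it
        schedule.append(ev)
        delay, action = models[ev.node].next_action()
        heapq.heappush(heap, Event(ev.time + delay, ev.node, action))
    return schedule


if __name__ == "__main__":
    for ev in generate_schedule(["h1", "h2", "h3"], duration_s=60.0, seed=1):
        print(f"{ev.time:7.2f}s  {ev.node}  {ev.action}")
```

In this sketch the schedule is only printed. In an emulated network, each event would instead trigger the corresponding client action on its node, for example issuing an HTTP request, so that the resulting traffic can be captured as trace data.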

eMews Architecture


eMews protocol scheduling paradigm


Initial scalability results on lower-end hardware

Initial scalability results on a Dell Inspiron laptop (Intel Core i7-7500U CPU @ 2.70GHz [2C/4T], 8GB RAM, 8GB swap) show promise, with RAM being exhausted before CPU. Current work is focused on reducing memory usage to improve scalability on lower-end hardware, increasing the number of autonomous client-side protocols supported, and creating more expressive human behavioral models.

eMews Open-Source Software


Related Publications