STARDUST: Sustainable Tools for Analysis and Research on
Darknet UnSolicited Traffic
Network telescopes (also known as Internet black holes, Internet sinks,
darknets, or darkspace) are passive monitoring systems that capture
unsolicited Internet traffic sent to a segment of unutilized IP address
space (i.e., IP addresses owned by an organization but not assigned to
any hosts). Traffic captured at network telescopes (“telescope traffic”)
provides valuable data for studying a large variety of Internet-related
phenomena.
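To make the definition concrete, the following minimal Python sketch classifies a packet as telescope traffic when its destination address falls inside a monitored, unassigned prefix. The prefix shown is a documentation range chosen purely for illustration, and the names `DARKNET_PREFIXES` and `is_telescope_traffic` are hypothetical; they are not part of STARDUST.

```python
import ipaddress

# Hypothetical darknet prefix: a documentation range (TEST-NET-2),
# used here only as a stand-in for unassigned address space.
DARKNET_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]


def is_telescope_traffic(dst_ip: str) -> bool:
    """Return True if dst_ip lies inside one of the monitored prefixes."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in DARKNET_PREFIXES)


if __name__ == "__main__":
    print(is_telescope_traffic("198.51.100.7"))
    print(is_telescope_traffic("192.0.2.1"))
```

Because no host is assigned to these addresses, any packet that matches is unsolicited by construction, which is what makes the captured data useful for measurement.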
What is STARDUST?
STARDUST is a collection of software tools, datasets, and research
infrastructure built to make real-time and historical analysis of
telescope traffic efficient and easily accessible to researchers.
STARDUST hosts the UCSD Network Telescope, one of the largest known
network telescopes on the Internet (≈12 million IPv4 addresses). It also
provides data from network telescopes operated by collaborating
organizations.
In addition, it makes available a research compute environment that
enables users to access telescope traffic in real time, as well as
historical datasets at various levels of granularity (raw packets,
flow-level data, time series) augmented with meta-data
(IP geolocation, IP-to-ASN, special tags).
Each user is allocated a dedicated virtual machine.
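As an illustration of how the raw-packet and flow-level granularities relate, the sketch below aggregates packet records into flows keyed by the usual 5-tuple. It is a simplified example under stated assumptions: the `Packet` record and the `packets_to_flows` function are hypothetical, and the actual STARDUST flow format may be defined differently.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet record (not the STARDUST format)."""
    ts: float        # capture timestamp, seconds
    src_ip: str
    dst_ip: str
    proto: int       # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    length: int      # packet size in bytes


def packets_to_flows(packets: Iterable[Packet]) -> dict:
    """Group packets by 5-tuple and keep per-flow counters."""
    flows = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.proto, p.src_port, p.dst_port)
        flow = flows.setdefault(
            key,
            {"packets": 0, "bytes": 0, "first_ts": p.ts, "last_ts": p.ts},
        )
        flow["packets"] += 1
        flow["bytes"] += p.length
        flow["first_ts"] = min(flow["first_ts"], p.ts)
        flow["last_ts"] = max(flow["last_ts"], p.ts)
    return flows
```

Flow-level data of this kind is much smaller than the raw packets it summarizes, at the cost of discarding payloads and per-packet timing.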
The diagram below provides an overview of the STARDUST infrastructure.
Unsolicited network traffic is captured by the UCSD Network Telescope
and sent to the STARDUST Packet Distribution Server, which streams
traffic on a dedicated VLAN through a
specifically conceived for the STARDUST project.
Each user can access this stream of traffic in real time from their own
STARDUST virtual machine. In addition, STARDUST internal components
process this traffic, augment it with meta-data (e.g.,
geolocation information or ASN associated with the source IP address
of each packet), and re-distribute it on the same VLAN on a separate
stream that is also accessible from the users’ VMs.
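The meta-data augmentation step described above can be sketched as a longest-prefix-match lookup on each packet's source IP address. This is only an illustration: the prefix table, the country codes, the AS numbers (taken from the range reserved for documentation), and the `annotate` function are all hypothetical and do not reflect the STARDUST implementation or its data sources.

```python
import ipaddress

# Hypothetical prefix-to-metadata table. Country codes are placeholders
# and the ASNs come from the range reserved for documentation.
PREFIX_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), {"country": "XA", "asn": 64500}),
    (ipaddress.ip_network("203.0.113.128/25"), {"country": "XB", "asn": 64501}),
]


def annotate(src_ip: str) -> dict:
    """Attach country and ASN to a source IP using longest-prefix match."""
    addr = ipaddress.ip_address(src_ip)
    matches = [(net, meta) for net, meta in PREFIX_TABLE if addr in net]
    if not matches:
        # No covering prefix: leave the metadata fields empty.
        return {"src_ip": src_ip, "country": None, "asn": None}
    # The most specific (longest) matching prefix wins.
    _, meta = max(matches, key=lambda m: m[0].prefixlen)
    return {"src_ip": src_ip, **meta}
```

A linear scan is adequate for a sketch; a production system handling a live packet stream would more plausibly use a radix tree or a similar structure for the lookup.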
From the same VM, users have access to the STARDUST cloud-based object
storage, where raw traffic traces and
flow-level traces are stored.
Access to all these data resources, along with the capability to process
them, is provided through various tools, software libraries, and APIs
documented on this website.
Finally, the traffic is also continuously processed to extract
statistics (e.g., per-minute counts of unique source IP addresses per
country, ASN, or protocol port number), which are saved as time-series
data that can be visualized through browser-accessible Grafana dashboards.
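One of the statistics mentioned above, the per-minute count of unique source IP addresses per country, could be computed along the following lines. This is a minimal sketch assuming already-annotated input records of the form (timestamp, source IP, country); the function name and record layout are hypothetical and are not taken from the STARDUST processing pipeline.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def unique_sources_per_minute(
    records: Iterable[Tuple[float, str, str]],
) -> dict:
    """Count distinct source IPs per (minute, country) bucket.

    Each record is (unix_timestamp, src_ip, country_code).
    """
    seen = defaultdict(set)
    for ts, src_ip, country in records:
        # Truncate the timestamp to the start of its minute.
        minute = int(ts) // 60 * 60
        seen[(minute, country)].add(src_ip)
    return {key: len(ips) for key, ips in sorted(seen.items())}
```

Each resulting (minute, country) pair with its count is one point of a time series, which is the shape of data a dashboard tool such as Grafana plots. The same bucketing applies to ASN or port number by swapping the grouping field.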