Cruiser captures snapshots of overlay topologies. Each snapshot is an
undirected graph with peers as vertices and overlay links as edges.
Crawler snapshots can be used for two basic types of analysis:
- Static Properties: The snapshots provide a way to examine static graph
  properties of the overlay, such as the degree distribution, average
  path lengths, resilience, and the clustering coefficient.
- Dynamic Properties: The snapshots also provide a way to examine
  dynamic properties of the overlay. By running the crawler back-to-back
  and comparing consecutive snapshots, we can examine peer arrival and
  departure patterns (churn) and explore how the overlay topology
  evolves over time.
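As a minimal sketch of both kinds of analysis, the Python below computes a degree distribution from one snapshot and the arrivals and departures between two consecutive snapshots. The snapshot representation (a dict mapping each peer to the set of its neighbors) and the function names are assumptions made for illustration; they are not Cruiser's actual data format.

```python
from collections import Counter

# Hypothetical snapshot format: {peer: set of neighboring peers}.
# The graph is undirected, so each link appears under both endpoints.

def degree_distribution(snapshot):
    """Return a Counter mapping degree -> number of peers with that degree."""
    return Counter(len(neighbors) for neighbors in snapshot.values())

def churn(earlier, later):
    """Return (arrivals, departures) between two consecutive snapshots."""
    before, after = set(earlier), set(later)
    return after - before, before - after

# Two tiny back-to-back snapshots: peer "c" departs and peer "d" arrives.
snap1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
snap2 = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}

arrivals, departures = churn(snap1, snap2)
```

The same dict-of-sets structure can be handed to a graph library for path lengths or clustering coefficients.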
Because the network is changing as the crawler runs, a certain level
of distortion is introduced into the snapshots, analogous to the
blur in an overexposed picture. If the level of distortion is too
high, erroneous conclusions may be drawn about the nature of the
topology. As with any measurement, it is critical to understand the
amount of error before drawing conclusions. Prior
overlay topology studies did not address this fundamental concern,
taking an hour or more to capture a snapshot.
To achieve rapid crawls, Cruiser runs on a small server farm, with
each server contacting hundreds of peers in parallel. In
, we describe some of the details of Cruiser's
implementation.
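The idea of contacting many peers in parallel can be sketched as a breadth-first crawl whose frontier is queried through a thread pool. This is only an illustration under stated assumptions: `query_neighbors` is a hypothetical callable standing in for the protocol-specific request, and the sketch runs on a single machine rather than a server farm.

```python
from concurrent.futures import ThreadPoolExecutor

def crawl(seeds, query_neighbors, max_workers=100):
    """Breadth-first crawl that contacts each frontier of peers in parallel.

    query_neighbors(peer) is a hypothetical callable that returns the
    peer's neighbor list, or raises OSError if the peer is unreachable.
    """
    def safe_query(peer):
        try:
            return peer, set(query_neighbors(peer))
        except OSError:
            return peer, None  # unreachable or departed peer

    snapshot = {}            # peer -> set of neighbors reported by that peer
    seen = set(seeds)        # every peer already queued or contacted
    frontier = list(seen)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # Query the whole frontier concurrently, then merge the results.
            for peer, neighbors in pool.map(safe_query, frontier):
                if neighbors is None:
                    continue
                snapshot[peer] = neighbors
                for neighbor in neighbors:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
    return snapshot
```

A shorter crawl leaves less time for the network to change, which is why the degree of parallelism matters for snapshot accuracy.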
Our initial version of Cruiser was tailored to the Gnutella network,
where it can capture 1 million nodes in under 7 minutes. In
, we develop techniques to evaluate the accuracy of
P2P crawlers and show that the Gnutella snapshots captured by Cruiser
have a level of distortion of around 4%. Additionally, we demonstrate
how crawling the network too slowly can lead to erroneous conclusions.
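One simple way to see how much two back-to-back snapshots disagree is to count the peers and links that appear in only one of them, relative to everything observed in either. The sketch below does this. It is an illustrative proxy chosen for this example, not necessarily the distortion metric behind the 4% figure, and the dict-of-sets snapshot format and function names are assumptions.

```python
def edge_set(snapshot):
    """Undirected links of a snapshot as a set of frozenset({u, v}) pairs."""
    return {
        frozenset((u, v))
        for u, neighbors in snapshot.items()
        for v in neighbors
        if u != v
    }

def difference_ratio(snap_a, snap_b):
    """Fraction of peers and links that appear in only one of the snapshots.

    Only crawled peers (the dict keys) are counted as vertices.
    """
    va, vb = set(snap_a), set(snap_b)
    ea, eb = edge_set(snap_a), edge_set(snap_b)
    changed = len(va ^ vb) + len(ea ^ eb)   # symmetric differences
    total = len(va | vb) + len(ea | eb)     # everything seen in either crawl
    return changed / total if total else 0.0
```

Two identical snapshots give a ratio of 0.0, and the ratio grows as the network changes more between (or during) the crawls.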
More recently, we modified Cruiser to use a plug-in architecture. With
a small system-specific plug-in, Cruiser can capture snapshots
of other P2P systems. In addition to Gnutella, Cruiser can now crawl
the Kad DHT network. Due to the large size and rich routing tables in
Kad, Cruiser cannot capture the entire Kad topology in a reasonable
amount of time. Therefore, we added a feature that allows Cruiser to
focus on a particular zone within a DHT network, capturing the
topology within a certain region of the DHT geometry.
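As a rough illustration of restricting a crawl to one zone, the sketch below assumes a zone is the set of peers whose identifiers share a common bit prefix, which is a natural way to carve up a Kademlia-style identifier space. How Cruiser actually delimits a zone is not stated here, so the prefix definition, the function name, and its parameters are all assumptions.

```python
def in_zone(peer_id, zone_prefix, prefix_bits, id_bits=128):
    """Return True if the top prefix_bits bits of peer_id equal zone_prefix.

    peer_id is an integer identifier; id_bits defaults to 128, the
    identifier length used by Kad.
    """
    return (peer_id >> (id_bits - prefix_bits)) == zone_prefix

# Example with 8-bit identifiers for readability: the zone with prefix 101.
candidates = [0b10110000, 0b01110000, 0b10100001]
zone_peers = [p for p in candidates if in_zone(p, 0b101, 3, id_bits=8)]
```

A crawler using such a filter would only follow routing-table entries that fall inside the zone, so crawl time scales with the zone size rather than with the whole network.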
While originally designed to capture snapshots of the overlay
topology, with some changes Cruiser can be adapted to capture other
peer properties. With File-Cruiser, we modified Cruiser to capture
the list of files shared at each peer in Gnutella. To our knowledge,
this is the only tool that attempts to capture a complete list of all
the files shared in a large P2P network.