The ION P2P Project: Empirical Characterizations of P2P Systems
The ION P2P Project aims to improve our understanding of Peer-to-Peer
systems by answering two fundamental questions:
- How do we collect accurate measurements from P2P systems?
- What are useful models to characterize their components?
To answer these questions, we pursue the following goals: (i)
develop and verify accurate measurement techniques, (ii) present
empirical results that may be used to evaluate models and improve our
understanding of existing systems, and (iii) suggest useful
models based on the empirical observations.
Peer-to-peer systems are becoming increasingly popular, attracting
millions of simultaneous users and covering a wide range of applications, from
file-sharing programs like LimeWire and eMule to Internet telephony
services such as Skype. Understanding existing systems and devising
new P2P techniques rely on access to representative models
derived from empirical observations of existing systems. However, the
size and dynamic nature of P2P systems make capturing accurate
measurements challenging.
While some prior studies have attempted to characterize different
aspects of P2P systems, they have not taken the first step of
critically examining their measurement tools (i.e., answering the
first question), leading to conflicting results or conclusions based
on measurement artifacts (e.g., reported power-law degree distributions
may be the result of measurement error).
There are two basic approaches to collecting measurements from P2P
systems, each with advantages and disadvantages. The first approach
is to capture a global view by collecting data about every peer in
the system. The advantage of this approach is that all the
information is available. The typical problem with this approach is
that the state changes while the measurement tool communicates with
the peers, leading to a distorted view of the system. The longer the
tool requires to capture the global view, the more distorted the
data. Additionally, capturing a global view does not scale well. As
P2P systems grow larger, capturing a global view becomes more
time-consuming, leading to greater distortion. To be at all practical,
capturing a global view requires an exceptionally fast tool able to
gather data from a large number of peers very quickly.
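The effect of crawl speed on distortion can be illustrated with a toy model. The sketch below is not the project's measurement tool; the function name, the fixed population size, and the rule that the longest-present peers depart first are all assumptions chosen to keep the example deterministic.

```python
def crawl_snapshot_size(n_peers, peers_per_step, churn_per_step):
    """Return how many distinct peers a crawler records in one "snapshot".

    Toy model (all assumptions): the population stays at n_peers. In each
    time step the crawler contacts up to peers_per_step online peers it has
    not seen yet; then the churn_per_step longest-present peers leave and
    are replaced by brand-new peers. Peer IDs are assigned in arrival order.
    """
    if peers_per_step <= churn_per_step:
        raise ValueError("crawler must outpace churn or the crawl never finishes")

    online = set(range(n_peers))
    seen = set()
    next_id = n_peers

    while True:
        unseen = sorted(online - seen)
        seen.update(unseen[:peers_per_step])
        if len(unseen) <= peers_per_step:
            break  # every peer online at this instant has been contacted

        # Churn: the lowest IDs (longest-present peers) depart ...
        for peer in sorted(online)[:churn_per_step]:
            online.remove(peer)
        # ... and the same number of new peers arrive.
        online.update(range(next_id, next_id + churn_per_step))
        next_id += churn_per_step

    return len(seen)


if __name__ == "__main__":
    for rate in (500, 100):
        print(rate, crawl_snapshot_size(1000, rate, 10))
```

In this toy model, with 1000 peers online at any instant and 10 peers replaced per step, a crawler that contacts 500 peers per step records 1020 distinct peers, while one that contacts 100 per step records 1100. The slower crawl reports more peers that were never online at the same time.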
The second approach is to collect local samples. The advantage of
this approach is that it scales well. The Law of Large Numbers from
statistics tells us that the average of a large number of independent,
unbiased samples will closely approximate the true average, regardless
of the population size. One disadvantage of sampling is that we cannot
easily use samples to examine certain properties that are
fundamentally global in nature. For example, we cannot compute the
diameter of a graph based on local observations at several peers.
More importantly, if the collected samples are biased in some way, the
resulting data may lead us to incorrect conclusions. For this reason,
capturing samples requires validation of the sampling tool, which must
be carefully designed to avoid bias.
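Both points about sampling can be sketched numerically. The example below is only an illustration under stated assumptions: the session lengths are synthetic (exponentially distributed with a mean of 60 minutes), and the biased sampler, which selects peers with probability proportional to their session length, is a hypothetical stand-in for a tool that is more likely to encounter long-lived peers.

```python
import random


def sample_mean(population, k, rng, weights=None):
    """Mean of k draws with replacement; weights=None means uniform sampling."""
    draws = rng.choices(population, weights=weights, k=k)
    return sum(draws) / k


def compare_sampling(sizes=(10_000, 200_000), k=5_000, seed=1):
    """Return {population_size: (true_mean, uniform_estimate, biased_estimate)}."""
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        # Synthetic per-peer session lengths in minutes (assumption).
        sessions = [rng.expovariate(1 / 60) for _ in range(size)]
        true_mean = sum(sessions) / size

        # Unbiased: every peer is equally likely to be selected.
        uniform = sample_mean(sessions, k, rng)

        # Biased: selection probability proportional to session length.
        biased = sample_mean(sessions, k, rng, weights=sessions)

        results[size] = (true_mean, uniform, biased)
    return results


if __name__ == "__main__":
    for size, (true_mean, uniform, biased) in compare_sampling().items():
        print(f"N={size}: true={true_mean:.1f} "
              f"uniform={uniform:.1f} biased={biased:.1f}")
```

The same 5,000 uniform samples should track the true mean closely for both population sizes, while the length-biased sampler overestimates it substantially (for an exponential distribution, by roughly a factor of two) no matter how many samples are drawn.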
Systematically tackling the problem of characterizing P2P systems
requires a structured organization of the different components. At
the most basic level, a P2P system consists of a set of connected
peers. We can view this as a graph with the peers as vertices and the
connections as edges. One fundamental way we can divide the problem
space is into properties of the peers versus properties of the way
peers are connected. Another fundamental division is between the
state of the system at a given instant (static properties) and the way
the system evolves over time (dynamic properties).
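As a minimal sketch of the graph view, the code below stores an overlay as an adjacency map and contrasts a property that a single peer can report locally (its degree) with one that needs the whole graph (the diameter). The helper names and the five-peer example overlay are hypothetical and exist only for illustration.

```python
from collections import deque


def build_overlay(edges):
    """Build an undirected graph: peers are vertices, connections are edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def eccentricity(graph, source):
    """Largest hop distance from source to any other peer, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(graph):
        raise ValueError("overlay is not connected")
    return max(dist.values())


def diameter(graph):
    """Global property: requires a BFS from every peer over the full graph."""
    return max(eccentricity(graph, peer) for peer in graph)


if __name__ == "__main__":
    # Hypothetical five-peer overlay.
    overlay = build_overlay(
        [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
    )
    print("degree of b:", len(overlay["b"]))   # local: b knows its neighbors
    print("diameter:", diameter(overlay))      # global: needs every edge
```

For this example overlay, peer b has degree 3 and the diameter is 3 (the path from a to e). The degree can be read from b's own neighbor set, whereas the diameter cannot be determined without knowing every connection.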
The table below presents an overview of several interesting
properties categorized by whether they are static or dynamic, and
whether they are peer properties or connectivity properties.
| | Peer Properties | Connectivity Properties |
|---|---|---|
| Static Properties | | |
| Dynamic Properties | | |
This material is based upon work supported in part by the National
Science Foundation (NSF) under Grant No. Nets-NBD-0627202 and an
unrestricted gift from Cisco Systems. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the authors and do not necessarily reflect the views of the NSF or
Cisco.