One of the components of the workload of a P2P file-sharing system is
the availability of the files on peers. Characterizing the available
files among participating peers is valuable for several reasons.
First, it reveals the properties, distribution and heterogeneity of
the resources contributed (i.e., storage space and available files) by
users of the system. Second, it allows us to identify any potential
design anomaly that might be exposed in a practical setting or any
opportunity that can be used to improve the performance of these systems.
Third, the collected traces and the file characteristics derived from
them can also be used to conduct more realistic simulations or
analytical modeling of file availability in P2P systems.
In , we use File-Cruiser to collect snapshots of the
files shared by peers in the Gnutella network. Our main findings
include:
- Free riding has significantly decreased among Gnutella users
during the past few years and is significantly lower than in other
P2P file-sharing applications such as eDonkey. While the ratio of
free riders among Ultrapeers is slightly lower than among leaf peers, there
is no correlation between a peer's uptime and its tendency to
free ride.
- The number of files shared and the storage space contributed
by individual peers both follow a power-law distribution. Compared
to earlier studies, Gnutella users contribute significantly more disk
space but share approximately the same number of files.
- The popularity distribution of individual files
follows a Zipf distribution, which means that a small
number of files are extremely popular.
- The most popular file type is MP3, which
accounts for two-thirds of all files and one-third of all bytes. Both
the popularity of video files and the space they occupy have
tripled over the past few years. The number of
video files is less than one-tenth that of audio files, yet they occupy
25% more bytes. 93% of bytes in the system are
occupied by multimedia files.
- Files are rather randomly distributed throughout
the overlay and there is no strong correlation between the
files shared by neighboring peers in the overlay topology.
However, files shared by geographically co-located peers
have a visible degree of similarity.
- The files shared by individual peers change slowly over a timescale
of days. However, across the entire system, more popular files
experience larger variations in their popularity. Furthermore, the
recent trend in variations of a file's popularity seems to predict its
changes in popularity in the near future.
We have made some of our snapshots available for the use of
other researchers here.
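
As an illustration of how two of the quantities above could be computed from a snapshot, the following is a minimal sketch and not the analysis code used in this study. It assumes a hypothetical in-memory snapshot format (a mapping from a peer identifier to the list of file identifiers that peer shares), which is not the File-Cruiser format. The function names are likewise illustrative. The Zipf exponent is estimated with a simple least-squares fit on log-log rank/popularity data, which is a crude estimator and only one of several possible choices.

```python
import math
from collections import Counter


def free_rider_ratio(snapshot):
    """Fraction of peers in the snapshot that share no files at all."""
    if not snapshot:
        return 0.0
    free_riders = sum(1 for files in snapshot.values() if not files)
    return free_riders / len(snapshot)


def file_popularity(snapshot):
    """Number of distinct peers sharing each file (its replica count)."""
    counts = Counter()
    for files in snapshot.values():
        # set() so a peer listing the same file twice is counted once
        counts.update(set(files))
    return counts


def zipf_exponent(counts):
    """Estimate the Zipf exponent as the negated least-squares slope of
    log(popularity) against log(rank). Assumes all counts are positive."""
    ranked = sorted(counts.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two files to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical toy snapshot: peer id -> shared file ids (assumed format).
toy_snapshot = {
    "peer1": [],
    "peer2": ["song.mp3"],
    "peer3": ["song.mp3", "clip.avi"],
    "peer4": [],
}
ratio = free_rider_ratio(toy_snapshot)
popularity = file_popularity(toy_snapshot)
```

On real snapshots, a fit of this kind is sensitive to the long tail of files held by a single peer, so the estimated exponent should be treated as indicative only.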