Multimedia & Internetworking Research Group
University of Oregon




The Ion P2P Project: Empirical Characterizations of P2P Systems


Shared Files

One component of the workload of a P2P file-sharing system is the availability of files on peers. Characterizing the files available among participating peers is valuable for several reasons. First, it reveals the properties, distribution, and heterogeneity of the resources contributed by users of the system (i.e., storage space and available files). Second, it allows us to identify design anomalies that may be exposed in a practical setting, as well as opportunities to improve the performance of these systems. Third, traces collected through measurement, and the file characteristics derived from them, can also be used to conduct more realistic simulations or analytical modeling of available files in P2P systems.

In [1], we use File-Cruiser to collect snapshots of the files shared by peers in the Gnutella network. Our main findings include:

  • Free riding has significantly decreased among Gnutella users during the past few years and is significantly lower than in other P2P file-sharing applications such as eDonkey. While the ratio of free riders among Ultrapeers is slightly lower than among leaf peers, there is no correlation between a peer's uptime and its tendency to free ride.
  • The number of files shared and the storage space contributed by individual peers both follow a power-law distribution. Compared to earlier studies, Gnutella users contribute significantly more disk space but share approximately the same number of files.
  • The popularity distribution of individual files follows a Zipf distribution, which means that a small number of files are extremely popular.
  • The most popular file type is MP3, which accounts for two-thirds of all files and one-third of all bytes. Both the popularity of video files and the space they occupy have tripled over the past few years. The number of video files is less than one-tenth the number of audio files, yet they occupy 25% more bytes. Multimedia files occupy 93% of the bytes in the system.
  • Files are rather randomly distributed throughout the overlay and there is no strong correlation between the files shared by neighboring peers in the overlay topology. However, files shared by geographically co-located peers have a visible degree of similarity.
  • The files shared by individual peers change slowly over timescales of days. However, over the entire system, more popular files experience larger variations in their popularity. Furthermore, the recent trend in a file's popularity seems to predict its changes in popularity in the near future.
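
As an illustration of how some of the quantities above could be computed from a snapshot, here is a minimal Python sketch. It is not the analysis code used in [1]. It assumes a hypothetical snapshot format (a mapping from a peer identifier to the set of file identifiers that peer shares), and the function names are invented for this example. The Zipf exponent is estimated with a simple least-squares fit on the log-log rank-popularity curve, which is only a rough estimator.

```python
import math
from collections import Counter


def free_rider_ratio(snapshot):
    """Fraction of peers in the snapshot that share no files."""
    if not snapshot:
        return 0.0
    free_riders = sum(1 for files in snapshot.values() if not files)
    return free_riders / len(snapshot)


def file_popularity(snapshot):
    """Count, for each file, how many peers share it."""
    counts = Counter()
    for files in snapshot.values():
        counts.update(set(files))  # count each file at most once per peer
    return counts


def zipf_exponent(counts):
    """Negated least-squares slope of log(popularity) versus log(rank)."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct files")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical toy snapshot: peer identifier -> set of shared file identifiers.
snapshot = {
    "peer_a": {"f1", "f2", "f3"},
    "peer_b": {"f1", "f2"},
    "peer_c": {"f1"},
    "peer_d": set(),  # shares nothing, so it counts as a free rider
}

print("free-rider ratio:", free_rider_ratio(snapshot))
print("most popular files:", file_popularity(snapshot).most_common(2))
print("estimated Zipf exponent:", zipf_exponent(file_popularity(snapshot)))
```

An exponent near 1 corresponds to the classic Zipf shape, where the file at rank r is shared by roughly 1/r as many peers as the top-ranked file. On real traces, a maximum-likelihood fit is generally more reliable than this log-log regression.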

We have made some of the snapshots available for use by other researchers here.

[1] Shanyu Zhao, Daniel Stutzbach, and Reza Rejaie, "Characterizing Files in the Modern Gnutella Network: A Measurement Study", SPIE/ACM Multimedia Computing and Networking, San Jose, January 2006.