PEX crawl

We are conducting extensive measurements to understand the current performance of PEX. The objective is to estimate the features and possibilities of a future PEX-based distributed tracker system. To perform the crawl, a high-performance, low-footprint BitTorrent bot was implemented in libevent+JavaScript (v8).

EZTV experiments

One fresh EZTV swarm with a total peer count of around 4000 was crawled in a straightforward manner: the bot bootstrapped with some peers, established BitTorrent connections to them and waited for PEX messages. As each PEX message arrives, the bot attempts to connect to every peer it mentions, and the process repeats recursively. In every experiment, the total number of known peers was close to the tracker's estimate of the swarm size.
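The recursive crawl described above is essentially a breadth-first search over PEX announcements. A minimal synchronous sketch, assuming the network is replaced by two hypothetical stand-ins: a `pex_of` mapping (the peers each peer would announce) and a `connectable` predicate (whether a connection attempt succeeds). The real bot is event-driven on libevent; this only illustrates the discovery logic and why "known" and "connected" peer counts differ.

```python
from collections import deque


def crawl(bootstrap, pex_of, connectable):
    """Recursive PEX crawl written as a breadth-first search.

    bootstrap   -- initial peers (e.g. from a tracker response)
    pex_of      -- hypothetical stand-in for the network: peer -> peers it announces via PEX
    connectable -- hypothetical predicate: True if a connection to the peer succeeds
    Returns (known, connected) sets of peers.
    """
    known = set(bootstrap)
    connected = set()
    queue = deque(bootstrap)
    while queue:
        peer = queue.popleft()
        if not connectable(peer):
            # NAT, busy, stale, ...: the peer stays "known" but yields no PEX data.
            continue
        connected.add(peer)
        # Every peer mentioned in a PEX message becomes a new connection target.
        for other in pex_of.get(peer, ()):
            if other not in known:
                known.add(other)
                queue.append(other)
    return known, connected
```

Note that peers announced only by unconnectable peers are never discovered, so the crawl's view of the swarm depends on the connection success rate.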

The central goal of the measurement was to determine the rate of obsolescence of PEXed data. The rate turned out to be high.

Three crawls were made in 2008: on 14 Oct at 16:16 and 17:16, and on 15 Oct at 10:16. A fourth crawl followed on 16 Oct at 14:00 (see the update below).

time              14 Oct 16:16   14 Oct 17:16   15 Oct 10:16   16 Oct 14:00
peers known       4562           4183           3782           3024
peers connected   719 (16%)      470 (11%)      651 (17%)      743 (25%)
PEX sources       121            104            427*           514

  • * Because of a change in the code, the third run lasted longer, so more peers had time to send their PEX messages.
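The percentages in the "peers connected" row are the connected peers divided by the known peers of the same run. A short check of that arithmetic, using only the counts from the table:

```python
# Counts copied from the table: (run, peers known, peers connected).
RUNS = [
    ("14 Oct 16:16", 4562, 719),
    ("14 Oct 17:16", 4183, 470),
    ("15 Oct 10:16", 3782, 651),
    ("16 Oct 14:00", 3024, 743),
]


def connection_rate(known, connected):
    """Percentage of known peers that accepted a connection, rounded to an integer."""
    return round(100 * connected / known)


for label, known, connected in RUNS:
    print(f"{label}: {connected}/{known} = {connection_rate(known, connected)}%")
```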

On the rate of obsolescence.

  • Intersection between the 1st and 2nd runs' connected peers (719 and 470, respectively) is still 73 peers
  • Intersection between the 1st and 3rd runs' connected peers (719 and 651, respectively) is 15 peers!
  • Intersection between the 2nd and 3rd runs' connected peers (470 and 651, respectively) is 10 peers!
  • All three lists of connected peers intersect by just 3 items!
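The overlap figures above are plain set intersections between the runs' connected-peer lists. A minimal sketch, assuming each peer is identified by a hashable key such as an (IP, port) pair; the notes do not say how peers were matched across runs, and `overlap_report` is a hypothetical helper name.

```python
from itertools import combinations


def overlap_report(runs):
    """Pairwise and all-run overlap of connected-peer sets.

    runs -- mapping run label -> set of peer identifiers (assumed (ip, port) pairs)
    Returns ({(label_a, label_b): shared count}, number of peers shared by every run).
    """
    pairwise = {
        (a, b): len(runs[a] & runs[b]) for a, b in combinations(runs, 2)
    }
    shared_by_all = set.intersection(*runs.values()) if runs else set()
    return pairwise, len(shared_by_all)
```

Applied to the three connected-peer lists, the pairwise counts would correspond to the 73, 15 and 10 reported above and the second value to the 3 peers common to all runs.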

So, old PEX data is mostly garbage. Even with the freshest data, we can shortcut a triangle in only about 15% of the cases (i.e. connect to a peer who is connected to somebody we are connected to). The lists of known peers of the 1st and 2nd runs overlap by about half. The 1st and 3rd runs have 334 known peers in common; the 1st and 4th runs have 168. So the peer turnover rate is high: everything here is very fluid and volatile.
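One way to measure the triangle shortcut rate: every peer that a neighbor announces via PEX is (presumably) connected to that neighbor, so if we also hold a connection to the announced peer, the triangle us/neighbor/peer is closed. A minimal sketch under that assumption; `triangle_fraction` and its inputs are hypothetical, not the bot's actual bookkeeping.

```python
def triangle_fraction(pex_from, connected):
    """Fraction of PEX announcements that close a triangle with us.

    pex_from  -- neighbor -> peers it announced via PEX (we are connected to each neighbor)
    connected -- set of peers we currently hold a connection to
    A (neighbor, peer) announcement closes the triangle us-neighbor-peer
    when we are connected to the announced peer as well.
    """
    pairs = [(n, p) for n, peers in pex_from.items() for p in peers]
    if not pairs:
        return 0.0
    closed = sum(1 for _, p in pairs if p in connected)
    return closed / len(pairs)
```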

Obvious reasons for peer unconnectability:

  • NAT/firewall, one-way connectivity
  • peers are often overloaded: once the maximum number of connections is reached, new incoming connections are dropped (busy)
  • peer uplinks are saturated => high probability of packet loss
  • stale data: the peer was online, but is not anymore

UPDATE 16 Oct: On the next day, 16 Oct, the connection success rate was 25% with the same version of the software. It really depends on the wind! (swarm lifecycle stage?)

Private tracker experiments

Is it possible?

Multiswarmness experiments

  • What is the average peer swarmness?
  • Is the interswarm graph more stable than peer graph?
  • [TorrentSmell] feasibility
  • bandwidth data for Maciek
  • piece subset correlations
  • revisit [BitBanking] feasibility

With about 10% of the swarm connected, we learn about the arrivals and departures of virtually any peer in the swarm, but not the connection topology. Full tomography (connect to all non-NATed peers and listen for incoming connections from NATed ones) would reveal the connection topology to a large degree.
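Tracking arrivals and departures through a subset of connections can be sketched as follows. This assumes the ut_pex message format of BEP 11: an already-bdecoded dict whose `added` and `dropped` values are compact IPv4 peer lists (4-byte address plus 2-byte big-endian port per entry). IPv6 keys (`added6`, `dropped6`) and the `added.f` flags are ignored, and `SwarmView`, `on_pex` and the `source` argument are hypothetical names. Because a `dropped` entry only means that one neighbor lost its connection to the peer, the sketch reports a departure only when no neighbor reports the peer any more.

```python
import socket
import struct


def parse_compact_peers(blob):
    """Decode compact IPv4 peers: 4-byte address + 2-byte big-endian port each."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for i in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[i:i + 4])
        (port,) = struct.unpack("!H", blob[i + 4:i + 6])
        peers.append((ip, port))
    return peers


class SwarmView:
    """Swarm membership inferred from PEX deltas sent by our neighbors."""

    def __init__(self):
        self.reported = {}  # neighbor -> set of peers it currently reports
        self.events = []    # ("arrival" | "departure", (ip, port)) log

    def members(self):
        return set().union(*self.reported.values())

    def on_pex(self, source, msg):
        """Apply one bdecoded ut_pex message received from neighbor `source`."""
        before = self.members()
        view = self.reported.setdefault(source, set())
        view.update(parse_compact_peers(msg.get(b"added", b"")))
        view.difference_update(parse_compact_peers(msg.get(b"dropped", b"")))
        after = self.members()
        self.events += [("arrival", p) for p in sorted(after - before)]
        self.events += [("departure", p) for p in sorted(before - after)]
```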