Statistics crawling and remote network debugging

To understand how the network functions and how healthy it is, and to identify failures, it is possible to crawl the BitTorrent network and thus also Tribler.

We integrated some additional support for crawling into Tribler. This feature is specifically designed to help prevent cascading failures and to understand possible security incidents such as the Gnutella worm from 2001. The crawler does not record personal data, psychological information, or other sensitive data. No personally identifiable data such as names, addresses, or contact details is recorded. We will never publish any personally identifiable information in any form. Neither do we record file names or exact file sizes; therefore, we do not know what content is played using our software.

This feature can easily be turned off by removing the ids that identify the crawlers from the Tribler/Core/Statistics/crawler.txt file, and turned on again by restoring them. Our Beta software has this feature enabled by default. We are trying to be the good guys and will not snoop around without reason; however, we need to gather performance statistics. For example, for developing 4G P2P we need solid performance data on frame loss during on-demand BitTorrent streaming. This helps us improve our algorithm and move towards sustained lossless 4+ Mbps streaming.

The Crawler can be used to connect to Tribler peers through the overlay swarm. A Crawler can then request specific statistics or, for instance, request a NAT test. All Tribler peers, as of version 4.2, run the crawler backend, which allows them to respond to Crawler requests from authorised Crawler peers. To keep everyone from acting as a Crawler and requesting (possibly sensitive) information from peers, a Crawler request is only accepted from a select list of Crawlers, identified by their public keys. As no one outside of Tribler has the associated private keys, nobody else is able to act as an authentic Tribler Crawler.
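The acceptance rule described above can be sketched as a simple allow-list check. This is a minimal illustration, not Tribler's actual code: the names `AUTHORISED_CRAWLER_KEYS`, `is_authorised_crawler` and `handle_request` are hypothetical, the key values are placeholders, and the sketch assumes the overlay has already verified that the sender really owns the private key belonging to the public key it presents.

```python
# Hypothetical allow-list of Crawler public keys. In Tribler the identifying
# ids are kept in Tribler/Core/Statistics/crawler.txt; the format of that
# file is not assumed here, so placeholder byte strings are used instead.
AUTHORISED_CRAWLER_KEYS = frozenset({b"crawler-key-1", b"crawler-key-2"})


def is_authorised_crawler(sender_public_key, authorised=AUTHORISED_CRAWLER_KEYS):
    """Return True only if the sender's public key is on the allow-list."""
    return sender_public_key in authorised


def handle_request(sender_public_key, payload):
    """Answer a request from an authorised Crawler, ignore all others."""
    if not is_authorised_crawler(sender_public_key):
        return None  # not a known Crawler: do not reveal any statistics
    # Placeholder for the real statistics lookup.
    return b"stats-for:" + payload
```

Removing every key from the allow-list makes the peer ignore all Crawler requests, which corresponds to turning the feature off.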

Features

  • Frequency: we want to run several Crawler processes at the same time. However, we do not want the clients to handle duplicate requests. Therefore, request-messages that arrive within the given frequency interval are not passed to the researcher's part of the Crawler. Note that in this case the Crawler will send a reply-message with error-value 254, indicating a frequency error.
  • Channels: while not advisable, it is possible to request large amounts of data from client processes. When a reply-message is larger than a certain threshold (defined in Tribler/Core/Statistics/Crawler.py), the message is split into parts. When all parts have been received, the researcher's 'received-reply' method is called.
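The two features above can be illustrated with a small sketch. It is written under several assumptions that the text does not confirm: the frequency is treated as a minimum number of seconds between two served requests with the same message-specific-id, the "parts left" counter counts down to 0 on the last part, and all names (`FrequencyGate`, `split_reply`, `join_reply`, `max_part_size`) are hypothetical. Only the error value 254 comes from the description above.

```python
import time

FREQUENCY_ERROR = 254  # error value for a frequency error, as stated above


class FrequencyGate:
    """Remember when each message-specific-id was last served (assumed unit: seconds)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_served = {}

    def allow(self, message_specific_id, frequency_seconds):
        """Return True if the request may be passed on to the researcher's code."""
        now = self._clock()
        last = self._last_served.get(message_specific_id)
        if last is not None and now - last < frequency_seconds:
            return False  # duplicate within the interval: reply with FREQUENCY_ERROR
        self._last_served[message_specific_id] = now
        return True


def split_reply(payload, max_part_size):
    """Split a payload into (parts_left, chunk) pairs; the last pair has parts_left 0."""
    if max_part_size <= 0:
        raise ValueError("max_part_size must be positive")
    chunks = [payload[i:i + max_part_size]
              for i in range(0, len(payload), max_part_size)] or [b""]
    if len(chunks) > 256:
        raise ValueError("parts left must fit in one byte")
    return [(len(chunks) - 1 - index, chunk) for index, chunk in enumerate(chunks)]


def join_reply(parts):
    """Reassemble parts received in order; return None while parts are missing."""
    buffer = []
    for parts_left, chunk in parts:
        buffer.append(chunk)
        if parts_left == 0:
            return b"".join(buffer)
    return None
```

A denied request does not reset the timer in this sketch, so a Crawler that keeps retrying is served again once the interval since the last served request has passed.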

For researchers

Crawler processes send Crawler-request messages to peers they find through buddycast. A peer that receives such a request should send a matching Crawler-reply back as soon as possible. Any actions performed upon receiving a Crawler-request are left to the researchers.

To use the Crawler mechanism, at least the following steps must be taken:

  1. Reserve a message-specific-id to identify request/reply pairs. These message ids are defined in Tribler/Core/BitTornado/BT1/MessageID.py.
  2. Register three methods with the Crawler. Registering can be done in Tribler/Core/Overlay/OverlayApps.py.
    • Initiator-callback: called when a new connection is made AND periodically thereafter (crawler-side)
    • Request-callback: called when a request-message is received (client-side)
    • Reply-callback: called when a reply-message is received (crawler-side)
  3. Write the code that handles the actual requests and replies in a file at Tribler/Core/Statistics/MMMMCrawler.py. This includes writing the received statistics to disk or to a database.
  4. Ensure that your code is in the latest Tribler release.
  5. Start running one or more Crawler processes 24/7 (python Tribler/Main/crawler.py)
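As a rough illustration of the three callbacks from step 2, the sketch below shows one possible shape of a researcher module. Everything in it is hypothetical: the class name `ExampleCrawler`, the method names, the callback signatures and the `run_loopback` helper are assumptions made for this example and do not reflect the real registration interface in Tribler/Core/Overlay/OverlayApps.py. The loopback helper only connects the three callbacks in memory so that the request/reply flow can be followed.

```python
class ExampleCrawler:
    """Hypothetical researcher module with the three callbacks from step 2."""

    def __init__(self):
        self.results = []  # stands in for writing statistics to disk or database

    def query_initiator(self, send_request):
        # Crawler-side: called for a new connection and periodically thereafter.
        send_request(b"uptime?")

    def handle_crawler_request(self, payload, send_reply):
        # Client-side: called when a request-message is received.
        if payload == b"uptime?":
            send_reply(0, b"42")  # 0 indicates success
        else:
            send_reply(1, b"unknown request")  # non-zero indicates failure

    def handle_crawler_reply(self, error, payload):
        # Crawler-side: called when a reply-message is received.
        if error == 0:
            self.results.append(payload)


def run_loopback(crawler_side, client_side):
    """Wire a crawler and a client together in memory (no network involved)."""

    def send_reply(error, payload):
        crawler_side.handle_crawler_reply(error, payload)

    def send_request(payload):
        client_side.handle_crawler_request(payload, send_reply)

    crawler_side.query_initiator(send_request)
```

In a real deployment the initiator and reply callbacks run in the Crawler process and the request callback runs in every Tribler peer, which is why the code has to be part of a Tribler release (step 4).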

Request --> Reply

Sending a request from a Crawler to a Tribler peer

4 bytes: Message length

1 byte: CRAWLER_REQUEST (from Tribler/Core/BitTornado/BT1/MessageID.py)

1 byte: --MESSAGE-SPECIFIC-ID--

1 byte: Channel id

2 bytes: Frequency

n bytes: Request payload
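The request layout above maps directly onto Python's `struct` module. The sketch below is an illustration under assumptions that the field list does not state: integers are taken to be big-endian (network byte order), the message length is taken to count the bytes that follow the length field, and the value of `CRAWLER_REQUEST` is a placeholder (the real value is defined in Tribler/Core/BitTornado/BT1/MessageID.py). The function names are hypothetical.

```python
import struct

CRAWLER_REQUEST = 0x10  # placeholder value, NOT the real id from MessageID.py

# length (4) | message id (1) | message-specific-id (1) | channel id (1) | frequency (2)
_REQUEST_HEADER = struct.Struct(">IBBBH")
_LENGTH_FIELD_SIZE = 4


def pack_request(message_specific_id, channel_id, frequency, payload=b""):
    """Build a request; the length is assumed to exclude the length field itself."""
    length = _REQUEST_HEADER.size - _LENGTH_FIELD_SIZE + len(payload)
    header = _REQUEST_HEADER.pack(
        length, CRAWLER_REQUEST, message_specific_id, channel_id, frequency)
    return header + payload


def unpack_request(data):
    """Parse a request into (message_specific_id, channel_id, frequency, payload)."""
    length, message_id, specific_id, channel_id, frequency = \
        _REQUEST_HEADER.unpack_from(data)
    if message_id != CRAWLER_REQUEST:
        raise ValueError("not a Crawler request")
    if length != len(data) - _LENGTH_FIELD_SIZE:
        raise ValueError("message length does not match the data received")
    return specific_id, channel_id, frequency, data[_REQUEST_HEADER.size:]
```

With these assumptions the fixed part of a request is 9 bytes, and the 2-byte frequency field limits the frequency to values between 0 and 65535.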

Sending a reply from a Tribler peer to a Crawler

4 bytes: Message length

1 byte: CRAWLER_REPLY (from Tribler/Core/BitTornado/BT1/MessageID.py)

1 byte: --MESSAGE-SPECIFIC-ID--

1 byte: Channel id

1 byte: Parts left

1 byte: Error value, indicating success (0) or failure (non-zero)

n bytes: Reply payload
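The reply layout can be handled in the same way. As before, this is a sketch under assumptions: big-endian integers, a message length that counts the bytes following the length field, a placeholder value for `CRAWLER_REPLY` (the real value is defined in Tribler/Core/BitTornado/BT1/MessageID.py), and hypothetical function names.

```python
import struct

CRAWLER_REPLY = 0x11  # placeholder value, NOT the real id from MessageID.py

# length (4) | message id (1) | message-specific-id (1) | channel id (1)
# | parts left (1) | error value (1)
_REPLY_HEADER = struct.Struct(">IBBBBB")
_LENGTH_FIELD_SIZE = 4


def pack_reply(message_specific_id, channel_id, parts_left, error, payload=b""):
    """Build a reply; error is 0 for success and non-zero for failure."""
    length = _REPLY_HEADER.size - _LENGTH_FIELD_SIZE + len(payload)
    header = _REPLY_HEADER.pack(
        length, CRAWLER_REPLY, message_specific_id, channel_id, parts_left, error)
    return header + payload


def unpack_reply(data):
    """Parse a reply into (message_specific_id, channel_id, parts_left, error, payload)."""
    length, message_id, specific_id, channel_id, parts_left, error = \
        _REPLY_HEADER.unpack_from(data)
    if message_id != CRAWLER_REPLY:
        raise ValueError("not a Crawler reply")
    if length != len(data) - _LENGTH_FIELD_SIZE:
        raise ValueError("message length does not match the data received")
    return specific_id, channel_id, parts_left, error, data[_REPLY_HEADER.size:]
```

A Crawler would keep collecting replies on the same channel until one arrives whose parts-left field is 0, and would treat any non-zero error value, such as 254 for a frequency error, as a failed request.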