"Phone Home" functionality in the SwarmPlayer

The SwarmPlayer as used in the StreamingExperiment periodically sends measurement data to our servers. This page describes the information transferred, how it is stored and how it can be manipulated.

Note that phoning home can be disabled by setting PHONEHOME=False in [source:abc/branches/player-release-1.0/Tribler/Player/Reporter.py Tribler/Player/Reporter.py].

Client-side

The logging functionality of the SwarmPlayer is provided by [source:abc/branches/player-release-1.0/Tribler/Player/Reporter.py Tribler/Player/Reporter.py]. It defines a Reporter class which is instantiated in the main !Swarmplayer file [source:abc/branches/player-release-1.0/Tribler/Player/swarmplayer.py Tribler/Player/swarmplayer.py].

The Reporter class provides a report_stat(ds) function, which parses the download state ds and appends the result to a logging buffer. This logging buffer is encoded and sent to our logging server:

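import urllib
import zlib

# s holds the logging buffer as a string (Python 2 code)
data = zlib.compress(s, 9).encode("base64")
# passing data as the second argument makes this an HTTP POST
sock = urllib.urlopen("http://swarmplayer.mininova.org/reporting/report.cgi", data)
result = sock.read()
sock.close()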

The result reported by our logging server is the number of seconds before the next report should be sent. This allows a server-side reporting rate control.
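
As a rough illustration of the encoding step and the server-controlled reporting interval, the following Python 3 sketch mirrors the Python 2 snippet above without touching the network. The helper names encode_report, decode_report and next_interval, as well as the 60-second fallback, are assumptions made for this example and are not taken from Reporter.py.

```python
import base64
import zlib


def encode_report(buf: bytes) -> bytes:
    """Compress at level 9, then base64-encode (Python 3 counterpart of the snippet above)."""
    return base64.encodebytes(zlib.compress(buf, 9))


def decode_report(data: bytes) -> bytes:
    """Reverse encode_report, as a receiving script would have to do."""
    return zlib.decompress(base64.decodebytes(data))


def next_interval(result: bytes, fallback: float = 60.0) -> float:
    """Parse the server reply as a number of seconds.

    The fallback used for unparsable replies is a hypothetical default.
    """
    try:
        return max(0.0, float(result.decode("ascii", "replace").strip()))
    except ValueError:
        return fallback
```

A client loop would then sleep for next_interval(result) seconds before sending the next batch, which is what lets the server throttle reporting.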

Information transmitted

Every second, a dictionary (report) is created and appended to the logs. The dictionary has the following keys defined:

'timestamp', the client timestamp at which the dictionary was created
'epoch', the client timestamp at which the current playback session started
'listenport', the port at which the SwarmPlayer is listening
'infohash', the infohash of the torrent being streamed
'filename', the name of the file being streamed
'peerid', the peer id, in printable characters (`...`)
'live', whether we're streaming live video (True) or video-on-demand (False)
'progress', download progress percentage (video-on-demand)
'down_total', total number of kbytes downloaded from *current peers*
'down_rate', current download speed (kbyte/s)
'up_total', total number of kbytes uploaded to *current peers*
'up_rate', current upload speed (kbyte/s)
'p_played', number of pieces played since epoch
'p_dropped', number of pieces dropped (not received or too late) since epoch
'p_late', number of pieces received, but too late
't_prebuf', number of seconds required for prebuffering
't_stalled', number of seconds spent in autopause/buffering, not including prebuffering
'validrange', (playbackpos,maxvalidpiece) tuple describing which pieces we're interested in downloading. In case of live streaming, wraparound is possible. If playback hasn't started yet, validrange == "".
'pieces', piece info (see below) since last report
'peers', list of current peers (see below)

Caveat: due to a bug, 'down_total' and 'up_total' only count the neighbours the client is currently connected with. Any bytes exchanged with old neighbours have to be reconstructed from previous logging entries.
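
One possible way to reconstruct the real total from a session's log entries is sketched below in Python. It assumes that each report is available as a dictionary with a 'peers' list, that the per-neighbour 'down_total' only grows while a connection lives, and that a connection has ended once its 'addr' disappears or its counter decreases. The function name reconstruct_down_total is hypothetical.

```python
def reconstruct_down_total(reports):
    """Sum kbytes downloaded over all connections seen in a session.

    reports: the session's report dictionaries in chronological order.
    """
    finished = 0  # kbytes from connections that have ended
    last = {}     # addr -> most recent down_total of a live connection
    for report in reports:
        current = {}
        for peer in report["peers"]:
            addr, total = peer["addr"], peer["down_total"]
            if addr in last and total < last[addr]:
                # Counter went backwards: assume a reconnect, bank the old value.
                finished += last[addr]
            current[addr] = total
        for addr, total in last.items():
            if addr not in current:
                # Neighbour disappeared: bank its final counter.
                finished += total
        last = current
    return finished + sum(last.values())
```

The same approach applies to 'up_total'. Since reports are created every second, bytes transferred between the last report and a disconnect are lost, so the result is a lower bound.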

Piece information

The piece information is a dictionary constructed in [source:abc/branches/player-release-1.0/Tribler/Core/Video/VideoOnDemand.py Tribler/Core/Video/VideoOnDemand.py] with an entry for each piece number. Each entry is a dictionary with the following contents:

'known', the client timestamp at which the piece first became known (first HAVE received)
'completed', the client timestamp at which the piece was completely obtained
'tobuffer', the client timestamp at which the piece was pushed to the playback buffer
'toplayer', the client timestamp at which the piece was read by the video player

Only information about completed pieces (pushed to player, or dropped) is transmitted. The information about a piece is discarded after logging, so no piece is logged twice.
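
To show how these timestamps might be used, the Python sketch below derives two delays per piece: the time between first learning about a piece and completing it, and the time a completed piece waited before the player read it. It assumes that a timestamp which was never set (for example 'toplayer' for a dropped piece) is either missing or None; how the client actually represents this is not documented here. The function name piece_delays is hypothetical.

```python
def piece_delays(pieces):
    """Map piece number -> (completed - known, toplayer - completed) in seconds.

    A delay is None when either of its two timestamps is unavailable.
    """
    def diff(entry, later, earlier):
        a, b = entry.get(later), entry.get(earlier)
        return None if a is None or b is None else a - b

    return {
        number: (
            diff(entry, "completed", "known"),
            diff(entry, "toplayer", "completed"),
        )
        for number, entry in pieces.items()
    }
```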

Current peers

The current-peer list is a list of dictionaries, each dictionary describing a neighbour to which the client is currently connected:

'g2g', either 'bt' if we're talking BitTorrent to this neighbour, or 'g2g' if we're talking Give-to-Get
'addr', the address of the neighbour as 'ip:port:dir'. The direction 'dir' is either L or R, indicating which side initiated the connection (local or remote). Note that the port number does not match the neighbour's listening port if the connection was remotely initiated.
'id', the neighbour's peer id
'g2g_score', the Give-to-Get score tuple with which this neighbour is rated
'down_str', a string 'ci' in which a character is capitalised when we're choked (C) or when we're interested (I)
'down_total', total number of kbytes we downloaded from this neighbour on this connection
'down_rate', current download speed from this neighbour
'up_str', a string 'cio' in which a character is capitalised when we're choking the neighbour (C), when the neighbour is interested (I), or when this neighbour is optimistically unchoked (O).
'up_total', total number of kbytes we uploaded to this neighbour on this connection
'up_rate', current upload speed to this neighbour
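
The 'addr', 'down_str' and 'up_str' fields are compact strings, so analysis scripts will usually want to unpack them first. The Python sketch below is one way to do that. The helper names parse_addr and decode_flags and the flag names passed to decode_flags are our own choices for this example.

```python
def parse_addr(addr):
    """Split 'ip:port:dir' into (ip, port, initiated_locally)."""
    # rsplit keeps any colons inside the ip part intact.
    ip, port, direction = addr.rsplit(":", 2)
    return ip, int(port), direction == "L"


def decode_flags(flags, names):
    """Map each character of a flag string to True when it is capitalised."""
    return {name: ch.isupper() for name, ch in zip(names, flags)}
```

For example, decode_flags("Ci", ("choked", "interested")) yields {'choked': True, 'interested': False}, and decode_flags("cIO", ("choking", "interested", "optimistic")) marks the neighbour as interested and optimistically unchoked.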

Server-side

(under construction)

We use the following !PostgreSQL database structure:

drop table logv2;
create table logv2 (
     id serial primary key,
     ts timestamp with time zone default ('now'::text)::timestamp with time zone,
     epoch integer,
     ip inet,
     port integer,
     data text,
     infohash text,
     filename text,
     progress integer,
     uploaded integer,
     downloaded integer
);

This table is filled using the information provided by the client. The IP address is obtained from Apache. Note that recording the server time (ts field) is vital, as the client's clock cannot be trusted to run correctly. In all time calculations, adjust for the skew (min(ts)-epoch), where min(ts) denotes the time of the first report of that session.
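
The skew adjustment can be sketched in Python as follows. The sketch assumes that epoch and all client timestamps are Unix seconds, and that the ts values of a session have been converted to Unix seconds as well (in PostgreSQL, extract(epoch from ts) is one way to do so). The function names clock_skew and to_server_time are hypothetical.

```python
def clock_skew(server_ts, epoch):
    """Return min(ts) - epoch for one session.

    server_ts: server-side receive times of the session's reports (Unix seconds).
    epoch: the client timestamp at which the session started.
    """
    return min(server_ts) - epoch


def to_server_time(client_time, skew):
    """Translate a client timestamp into the server's clock."""
    return client_time + skew
```

With reports received at 1000, 1001 and 1002 and a client epoch of 990, the skew is 10 seconds, so a client timestamp of 995 corresponds to server time 1005.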

A session is identified by an (ip,port,epoch) tuple, as each client generates a new epoch when it restarts. We index these sessions:

drop index logv2_sessions;
create index logv2_sessions on logv2 (ip,port,epoch);
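
When rows are processed outside the database, the same session key can be used to group them. The Python sketch below assumes each row has been fetched as a dictionary keyed by the column names of logv2. The function name group_sessions is hypothetical.

```python
from collections import defaultdict


def group_sessions(rows):
    """Group logv2 rows by (ip, port, epoch), each session ordered by server time."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[(row["ip"], row["port"], row["epoch"])].append(row)
    for reports in sessions.values():
        reports.sort(key=lambda r: r["ts"])
    return dict(sessions)
```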

We want to be able to quickly select only the latest data, to generate live reports:

drop index logv2_ts;
create index logv2_ts on logv2 (ts);

Finally, searching for specific swarms is useful. Note that two peers streaming different files within the same torrent (infohash) do not help each other, and thus can be considered to be different swarms.

drop index logv2_swarms;
create index logv2_swarms on logv2 (infohash,filename);

To store whether we can connect to each peer, we collect our NAT check statistics in natchecks:

drop table natchecks;
create table natchecks (
     id serial primary key,
     ts timestamp with time zone default ('now'::text)::timestamp with time zone,
     epoch integer,
     ip inet,
     port integer,
     connectable boolean,
     lastcheck timestamp with time zone,
     firstseen timestamp with time zone,
     lastseen timestamp with time zone
);

An index allows it to be joined with the logv2 table:

drop index natchecks_sessions;
create index natchecks_sessions on natchecks (ip,port,epoch);

Finally, we periodically collect geolocation information about many peers, and cache this in geolookups:

drop table geolookups;
create table geolookups (
     ip inet primary key,
     ts timestamp with time zone default ('now'::text)::timestamp with time zone,
     latitude float,
     longitude float,
     source text
);

The 'source' field indicates who we asked to obtain the location (hostip,geoip,geoiptool).