Swarm-based Reputation Consensus

In a nutshell: a mechanism for improving BarterCast with swarm-based information diffusion. Trusted peers start torrents to disseminate their current view of the system, and other peers join such torrents to update their views. [Rahim: How does a peer know that it is trusted by others?]

[Todo: the discussion on the feasibility of keeping track of behavior for one million peers is relevant but somewhat orthogonal to how information is propagated. Where should we put it then?]

Relationship with BarterCast

BarterCast is a reputation mechanism based on a contribution network built by each peer with information it gathers from the system. Peers propagate information about other peers' behavior through gossiping and use this information to build a directed graph of contributions. This graph then serves as input for a ranking algorithm such as Maxflow (of contributions to the peer calculating the reputations) or PageRank.
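
As an illustration of this ranking step, the sketch below builds a toy contribution graph and ranks one peer by maxflow from the point of view of another. It assumes the graph is held as a networkx DiGraph whose edge attribute "capacity" holds the transferred amount in MB; all peer names and amounts are made up, this is not the actual BarterCast code.

{{{
#!python
# Hedged sketch of maxflow-based ranking over a contribution graph (toy data).
import networkx as nx

G = nx.DiGraph()
G.add_edge("A", "B", capacity=500)   # A uploaded 500 MB to B
G.add_edge("B", "C", capacity=200)   # B uploaded 200 MB to C
G.add_edge("C", "A", capacity=100)   # C uploaded 100 MB to A

def reputation(graph, me, other):
    """Rank `other` from `me`'s point of view by the maximum flow of
    contributions that can reach `me` from `other`."""
    if me not in graph or other not in graph:
        return 0.0
    return nx.maximum_flow_value(graph, other, me, capacity="capacity")

print(reputation(G, "A", "B"))   # 100: the path B -> C -> A is limited by C -> A
}}}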

An issue with this mechanism, however, is that information might take a considerable amount of time to propagate, and therefore the community's opinion about a peer might remain divergent for long periods. [Johan: more importantly, it facilitates a secure and fast bootstrap. The reputation system at a newly installed peer is fully operational after downloading just a single file.]

If information could be passed around in the form of signed certificates of contribution, it could be more trustworthy. However, this would cause potentially large convergence times. This can be circumvented if some peers merge this information and are trusted to propagate the result of such merging. [Todo: I was unsure whether transaction certificates are sent around in BarterCast.]

Synchronizing views with trusted torrents

Consider a network in which the reputation graph has converged at all peers. The central idea of swarm-based reputation consensus is that the centroids of the reputation graph can periodically (once every few days, maybe) initiate torrents to propagate the contribution network they currently have. After a (small) number of these peers perform this operation, other peers join the torrents and update their current reputation network by merging their view of the system with that of the trusted peer that initiated the torrent.
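
A hedged sketch of the merge step is given below; it assumes each peer keeps its contribution network as a dictionary keyed by directed peer pairs holding cumulative uploaded MB, and keeps the larger value per pair when merging. The data layout and function name are illustrative, not actual Tribler code.

{{{
#!python
# Merge a trusted peer's published contribution network into the local view.
def merge_views(local_view, trusted_view):
    """Keep, per directed pair, the larger cumulative amount (the fresher one)."""
    merged = dict(local_view)
    for edge, amount in trusted_view.items():
        merged[edge] = max(merged.get(edge, 0), amount)
    return merged

local = {("A", "B"): 120, ("B", "C"): 40}
snapshot = {("A", "B"): 150, ("C", "D"): 300}   # downloaded from the trusted swarm
print(merge_views(local, snapshot))
# {('A', 'B'): 150, ('B', 'C'): 40, ('C', 'D'): 300}
}}}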
[Rahim: I think that a reputation graph based on upload contribution is not appropriate for deciding who should make a snapshot of the system's upload behavior. Look at the example below: in this reputation graph, node 5 has uploaded to nodes 3 and 7 and, according to the definition of the graph center, it is the center of the graph. But based on this information, what can it do for the others?
If, instead of the reputation graph, nodes used another metric to decide whether to make a snapshot of the reputation graph, it would be better. We can call it an awareness graph: each node that knows more makes a snapshot and creates the corresponding swarm. For example, consider the figure below:

In this figure the blue nodes and the edges between them show the local reputation graph of the associated red node. If the reputation graph has converged, all nodes will have the same view. In this figure nodes 6 and 2 know less than node 9: node 9 knows about 7 upload actions (number of edges) while the others know about only 5. So if node 9 makes a snapshot and propagates it, it brings more information to the others.
But there are two important questions: in a distributed environment, how does a node know that it knows more than the others, and that its knowledge is fresher?]

If convergence on the contribution network is not perfect, some peers will not correctly identify who the trusted publishers of the contribution network are. These peers can use the popularity of the current swarms distributing contribution networks to estimate which ones are the most trusted. [Todo: This assumes trusted swarm size information.] The rationale for this approach is that if peers synchronize often with the same highly-trusted peers, information will be disseminated more rapidly than through gossiping alone.

[Todo: if we know who are the most trusted peers, maybe it is also a good idea to prioritize them when gossiping new info.]

Open issues:

  • Do we have hard evidence that gossiping alone is not already enough for disseminating contribution information?
  • Reputation information in the mechanism could be more general than BarterCast data alone, including friendship certificates and other data. How do we aggregate such different sources?
  • If we know that only trusted peers will merge the information, do we need to keep gossiping to everyone about everyone else's behavior?
  • If different sets of peers start trusting different peers that have different views of the network, can they disagree progressively more over time instead of moving towards consensus?
  • Can we design the system so that it encourages non-Tribler peers to follow the protocol?

Analysis of BarterCast

To answer some of the above questions, namely whether gossiping is enough for spreading information and whether there are strong peers whose local history is rich enough, we decided to evaluate the currently running BarterCast and measure some parameters. Below are some of them and their descriptions:

  • Number of replicas in different cycles: the number of peers that have a specified record (or set of records) in their local history at different cycles/times of the run. To measure this parameter we check for the existence of those records and count how many peers have them in their cache. It can be shown in a graph where the x-axis shows the relative cycle/time and the y-axis shows the number of peers that hold those records.

  • Coverage rate: the number of peers that have a specified record (or records) divided by the total number of peers. This shows the speed of convergence towards the same view. Again, in a graph the x-axis can show the time and the y-axis the coverage percentage.
  • Convergence: during discrete time intervals of a run or simulation we can measure the convergence of the peers towards the same view. The local history represents the view of a peer. If H is the union of all local histories and f(h) is a function that maps a local history to a real value, then the convergence of an arbitrary peer P can be defined as f(P)/f(H). A distribution of the convergence values makes the effectiveness of the gossiping clear.
  • Determining the number of power nodes: as a simple definition, power nodes are the nodes that know more, i.e. whose local history has converged enough. This definition is still vague, but we can sharpen it later. For example, following a power-law intuition, if the local history of a node covers over 80% of the total history, f(P)/f(H) > 0.8, it is counted as a power node. The number of power nodes can be shown in a graph where the x-axis and y-axis show the time and the number of power nodes respectively. We can also determine when the first power node appears. (See the sketch after this list.)
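
The sketch below is a small worked example of the convergence and power-node definitions, assuming f(h) simply counts the distinct records in a history; the record identifiers and peer names are made up, and only the 0.8 threshold comes from the text above.

{{{
#!python
# Convergence of a peer = |its local history| / |union of all local histories|.
def convergence(local_history, all_histories):
    global_history = set().union(*all_histories)
    return len(set(local_history)) / len(global_history)

histories = [
    {"r1", "r2", "r3", "r4"},        # peer P1
    {"r1", "r2"},                    # peer P2
    {"r1", "r2", "r3", "r4", "r5"},  # peer P3
]
for i, h in enumerate(histories, 1):
    c = convergence(h, histories)
    print("P%d: convergence=%.2f, power node=%s" % (i, c, c > 0.8))
# Only P3 covers more than 80% of the union (5/5) and counts as a power node.
}}}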

How to measure

Requirements:

  • Many nodes should run the code. How many is enough?
  • In each cycle the local history of the nodes should be logged. To save space only changes can be logged.
  • Nodes should be active and behave as they would in the real system.
  • Churn should be considered.

The BIG question is how? Simulation or actual experiment?

Using a crawler

By using modified peers we can crawl the local history of the nodes and make them log and send the changes to their local history. By following the log of changes we can trace the evolution of a local history from beginning to end. Below are the tasks that should be done:

  1. Finalizing the list of parameters to be measured. DONE
  2. Details of how the above parameters should be measured. DONE
  3. Implementing crawler and data gatherer. DONE
  4. Implementing data analysis tool. DONE
  5. Actual running and data gathering. DONE
  6. Data Analysis and publishing the result. TODO

BarterCast graph visualization

To get a proper picture of the layout and connectivity of the BarterCast graph, the collected BarterCast records have been visualized in one graph. Using the [Graphviz] and [ZGRViewer] tools, a directed graph is drawn that shows the data transfer between Tribler nodes. We started collecting data on 20 June and the drawn graph is for the 6th of August. During this period each online node is contacted every hour and asked for new BarterCast records. In the graph below:

  • If (A, B, Downloaded, Uploaded) is a BarterCast record and Downloaded is greater than zero, then the arc B-->A is added to the graph. If Uploaded is greater than zero, the arc A-->B is added as well.
  • For each pair of nodes that have transferred data with each other, the most recent record is selected for drawing the arc and calculating the sum of the transferred data.
  • Nodes are ranked by their interaction with other nodes. Before drawing, they are ranked by their indegree+outdegree and that value determines their rank and color. To do the ranking, first the maximum of indegree plus outdegree is found. Then the interval [0, maximum_degree] is divided into 5 intervals, and the index of a node's interval gives its rank. For example, for a graph with maximum degree 65, if a node's indegree+outdegree is 24 then its rank is ceil(24/(65/5)) = 2. (See the sketch after this list.)
  • Ranks: red = 5, blue = 4, green = 3, golden = 2, olive green = 1, and pink = no connection.
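
The following sketch applies the drawing and ranking rules above to a few made-up records; it only builds the arc set and the five-level degree ranking, leaving the actual Graphviz output aside.

{{{
#!python
# Turn (from_id, to_id, downloaded, uploaded) records into arcs and rank nodes.
import math
from collections import defaultdict

records = [("A", "B", 0, 300), ("B", "A", 50, 0), ("C", "A", 10, 20)]   # toy data

arcs = set()
for a, b, downloaded, uploaded in records:
    if downloaded > 0:
        arcs.add((b, a))   # b uploaded to a
    if uploaded > 0:
        arcs.add((a, b))   # a uploaded to b

degree = defaultdict(int)
for src, dst in arcs:
    degree[src] += 1       # out-degree
    degree[dst] += 1       # in-degree

max_degree = max(degree.values())
colors = {5: "red", 4: "blue", 3: "green", 2: "golden", 1: "olive green"}

for node, d in sorted(degree.items()):
    rank = max(1, math.ceil(d / (max_degree / 5.0)))
    print(node, d, rank, colors.get(rank, "pink"))
}}}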

BarterCast graph (for the first 20,000 records), max_degree = 65


BarterCast graph zoomed in (for the first 20,000 records)



BarterCast graph, all records till 2009-08-06, max_degree = 189

Note: a higher-resolution picture is also attached, but showing it on the wiki messes up the text. You can download it and view it with other tools.

The "graphviz" files also are attached, "databasecrawler_090806_1_bartercast_graph.gv" for limited first 20000 records, and "bartercast_graph_all_records_till_0806.gv" for all records. You can use ZGRViewer to see different fomrs of zooming (bird eye, block,...) and see nodes connectivity details.

Initial Results

After collecting data for almost one and a half months (from 20 June till 6 August) and processing them, we obtained the following results. During this period the crawler contacted 2017 nodes. Some of them did not reply at all, but almost 1800 nodes replied at least once. Some peers were very active and the crawler was able to collect their full local history, but most of them just come and leave. The cycle (interval) length is set to 4 hours and the period of asking peers is one hour. Below are the analysis results for the mentioned data set.

Average of replicas

Average number of replicas in different cycles. Records enter the system at different times, so the first cycle for one record may be a different cycle for another record. To resolve this issue, we shift all records to the start cycle and measure the average number of replicas relative to that. The same processing is done for the number of peers in different cycles. In this way the calculations for the last periods are less accurate (because only records and peers that entered the system early enough contribute to those periods), but it is acceptable. The figure below shows this.


The graph below shows just the average number of replicas.

Coverage

By definition, coverage is the fraction of peers that have a record in their cache. The figure below shows the average coverage for different records in different cycles.

Convergence

By convergence we mean how well the local history of a node represents the union of the local histories of all nodes in the system. As a formula: in a specified cycle, the number of records in the local history divided by the number of records in the union of all local histories of all nodes. We measured convergence over this period and categorized local histories into five groups:

  • Very good: Nodes in this category cover more than 80% of all records in the system.
  • Good: Nodes cover [60%, 80%) of all records.
  • Middle: Nodes cover [40%, 60%) of all records.
  • Bad: Nodes cover [20%, 40%) of all records.
  • Very bad: Nodes cover less than 20% of the records in the system.

The figure and table below show the obtained result.

Peers' cache richness, accuracy and onliness

The system is dynamic and peers join and leave. To differentiate peers from each other we used two fuzzy parameters: richness and accuracy. As the names indicate, richness means how many records are available in a peer's cache, and accuracy gives an indication of the synchronization between a peer's cache and the information about that cache collected by the crawler. If the crawler cannot connect to a node for a long time, the accuracy level of that node decreases. The richness and accuracy of the peers have been categorized into five different groups.
To categorize richness, the interval [0, size_of_largest_cache] is logarithmically divided into five intervals. The richest nodes are those whose cache size is larger than half of the maximum cache size. The fuzzy terms below have been used to categorize the peers' richness level (a small sketch follows the list):

  • Very good: cache_size >= max_cache_size / 2.
  • Good: max_cache_size / 4 <= cache_size < max_cache_size / 2.
  • Middle: max_cache_size / 8 <= cache_size < max_cache_size / 4.
  • Bad: max_cache_size / 16 <= cache_size < max_cache_size / 8.
  • Very bad: cache_size < max_cache_size / 16.
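
A small sketch of this richness bucketing; max_cache_size and the sample cache sizes are illustrative values, not measured data.

{{{
#!python
# Map a cache size onto the five fuzzy richness levels (powers-of-two bounds).
def richness_level(cache_size, max_cache_size):
    bounds = [
        ("Very good", max_cache_size / 2.0),
        ("Good",      max_cache_size / 4.0),
        ("Middle",    max_cache_size / 8.0),
        ("Bad",       max_cache_size / 16.0),
    ]
    for label, lower in bounds:
        if cache_size >= lower:
            return label
    return "Very bad"

max_cache = 50000
for size in (40000, 9000, 1500):
    print(size, richness_level(size, max_cache))
# 40000 -> Very good, 9000 -> Middle, 1500 -> Very bad
}}}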

The figure and table below show the obtained result.

It is possible that some peers are rich in cache but do not reply to the crawler for a long time. Another metric used to capture this phenomenon is accuracy. Accurate peers are those that replied to the crawler very close, in time, to the end of the measurement. For example, if the last record collected in the system was logged at time T, then peers that have replied to the crawler after T - threshold are very accurate. The list below shows the different accuracy levels and their meaning:

  • Very good: peer_last_reply_time >= T - threshold.
  • Good: T - 2 x threshold <= peer_last_reply_time < T - threshold.
  • Middle: T - 3 x threshold <= peer_last_reply_time < T - 2 x threshold.
  • Bad: T - 4 x threshold <= peer_last_reply_time < T - 3 x threshold.
  • Very bad: peer_last_reply_time < T - 4 x threshold.

For the calculation, the threshold is set to eight hours, and a slack of four hours is also deducted from the last reply time to center the values around the middle. The figure below shows the result:

Another parameter about peers that is worth measuring is their onliness, i.e. the ratio of their online time to the time passed since their discovery by the crawler. Because of the way the crawler contacts peers, their online time is not exact, but the approximation is very close to the accurate value. Again we used fuzzy values to categorize onliness:

  • Very good: onliness >= 0.8.
  • Good: 0.6 <= onliness < 0.8.
  • Middle: 0.4 <= onliness < 0.6.
  • Bad: 0.2 <= onliness < 0.4.
  • Very bad: onliness < 0.2.

The graph below shows the result:

Correlation between parameters

Accuracy, richness and onliness are correlated to some extent. Peers that stay online for a long time collect many records, so there should be a positive correlation between onliness and richness. Also, peers with lower accuracy will have lower richness and onliness. The following graphs show the correlations between the different parameters.
For each node, the absolute difference between its accuracy and onliness levels determines the correlation. This value is divided into five categories (a small sketch follows below):

  • High correlation: | accuracy - onliness | = 0
  • Correlated : | accuracy - onliness | = 1
  • Middle correlation : | accuracy - onliness | = 2
  • Weak correlation: | accuracy - onliness | = 3
  • No correlation: | accuracy - onliness | = 4

The same scheme is applied for the other correlations shown below.
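
The sketch below shows this bucketing, assuming the five fuzzy levels are mapped to indices 0 (Very bad) through 4 (Very good); the example peers and level names used here are only illustrative.

{{{
#!python
# Correlation bucket = |index(level A) - index(level B)| on the 0..4 scale.
LEVELS = {"Very bad": 0, "Bad": 1, "Middle": 2, "Good": 3, "Very good": 4}
BUCKETS = ["High correlation", "Correlated", "Middle correlation",
           "Weak correlation", "No correlation"]

def correlation_bucket(level_a, level_b):
    return BUCKETS[abs(LEVELS[level_a] - LEVELS[level_b])]

print(correlation_bucket("Very good", "Good"))      # Correlated
print(correlation_bucket("Very good", "Very bad"))  # No correlation
}}}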

Correlation between accuracy and onliness

Correlation between richness and accuracy

Correlation between richness and onliness

Discovery of peers by crawler

The crawler finds new permids in the system in three ways: they are found and given by BuddyCast, or they appear in the from_id or the to_id part of a BarterCast record. If a peer is given to the crawler by BuddyCast, the crawler tries to connect to that peer and fetch some records from it. To evaluate how the system behaves in this respect, the number of discovered permids in different cycles is counted. The first graph shows the cumulative number of discovered peers in different steps, without overlap between methods. For example, if a permid was seen for the first time in the from_id part, it will not be counted under the other methods. The second graph shows the same data but with overlap; there a permid may be counted under all three methods. The last graph shows the daily discovery of new permids. As expected, at the beginning the crawler finds many new permids, but there is a more or less decreasing trend.

Cumulative number of discovered peers without overlap

Cumulative number of discovered peers with overlap

Daily discovery of new peers

Cumulative number of connected peers

Records update statistics

Another interesting value that we measured is the update ratio of BarterCast records. By an update we mean a record with the same from_id and to_id but different downloaded or uploaded values. For example, record (A, B, d, u) is an update of record (M, N, o, p) if A = M and B = N and (d >= o or u >= p). In the current system records are overwritten by newly updated ones. We measured the update ratio of BarterCast records in two cases: first by considering all records, and second by leaving out the records whose from_id is equal to their to_id (control records). Usually control records see a lot of updates, but real records see very few. The figures below show the histogram and the cumulative number of updates for these two sets of records. The red graph is for all records (real and control records) and the blue one only for real records.
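
As a small illustration, the test below implements the update condition described above; the record layout (from_id, to_id, downloaded, uploaded) follows the text and the sample values are made up.

{{{
#!python
# A record is an update of another if it refers to the same directed pair and
# at least one of its counters has not decreased.
def is_update(new, old):
    return (new[0] == old[0] and new[1] == old[1]
            and (new[2] >= old[2] or new[3] >= old[3]))

old = ("A", "B", 100, 40)
new = ("A", "B", 150, 40)
print(is_update(new, old))   # True: same pair, the downloaded counter grew
}}}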

Histogram for records with different updates (all records)

Cumulative sum of records with different updates

Histogram for records (distinct from_id and to_id) with different numbers of updates

Cumulative sum of records (distinct from_id and to_id) with different numbers of updates

As can be seen from the graphs, most of the records never get a second update.

Evaluation of maxflow and BarterCast

To answer the question of whether BarterCast and maxflow are able to make a good estimation of peers' reputations, we calculated the average of the Subjective Reputations (SR) and plotted it against the net contribution to Tribler peers.
Because of some limitations in the way the crawler collects data (50 records per request), some of the collected local histories are not complete, and this incompleteness may reduce the accuracy of the results. So before running the process we removed all of the incomplete local histories and their associated records. A comparison of these two record sets is interesting:

  • Number of records in the whole set: 1,320,735; number of records in the complete set: 986,235. A 25% reduction.
  • Number of local histories in the whole set: 2,675; number of complete local histories: 1,425. 47% of the local histories were incomplete.
  • Number of unique permids in the whole set: 9,442; number of unique permids in just the complete local histories: 9,396. A reduction of about 0.5%, which means that the complete local histories cover almost all peers seen in the incomplete caches.

We also separated the amount of upload/download between Tribler peers from that between all peers (Tribler and non-Tribler). Some facts about these values are:

  • Total upload (MB): 19,787,118; total download (MB): 18,361,551; total ratio: 1.078; total net contribution (upload - download): 1,425,567; number of peers in the calculation: 7,717.
  • Upload/download (MB) between Tribler peers: 888,207 (the Tribler-to-Tribler upload and download totals are equal by construction, so the Tribler-only ratio is 1 and the net contribution is 0); number of peers in the calculation: 2,635.

For some nodes (974 nodes) we are able to calculate both the total and the Tribler-only contribution; the result is very interesting as well:

  • Total upload (MB): 16,849,450; total download (MB): 15,253,251; ratio = 1.1.
  • Tribler upload (MB): 470,851; Tribler download (MB): 427,996; ratio = 1.1.

But the interesting point is the ratio of the Tribler upload and download to the total upload and download. In both cases it is almost 0.03, which means that only 3% of the data transfer happens between two Tribler peers, and in 97% of it one side is non-Tribler. It is not clear how much of this data is control data, like BuddyCast or BarterCast messages.

Correlation between Tribler net contribution and the average of subjective reputations

The following graphs show the correlation between the average of the SRs and the net contribution. SRs are calculated from the viewpoint of the local history holder. For example, if peer Q appears in the local history of peer P (P is the holder of the cache), then within this cache the 2-hop maxflow is calculated from P to Q and from Q to P, and the reputation of Q from the point of view of P is arctan(flow(Q->P) - flow(P->Q)) / (PI/2). The figures below show the average of these SRs over all peers for which an SR could be calculated:
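
For illustration, the sketch below computes this subjective reputation on toy capacities. It uses the simplification that, when only paths of at most two hops are allowed, the maximum flow from Q to P is the direct capacity plus, for every intermediate X, min(cap(Q,X), cap(X,P)); all peer names and amounts are made up, and this is not the actual BarterCast implementation.

{{{
#!python
# Subjective reputation of Q as seen by P, using a 2-hop path-limited flow.
import math

caps = {("Q", "P"): 50, ("Q", "X"): 200, ("X", "P"): 80}   # MB uploaded (toy data)

def flow_2hop(caps, src, dst):
    flow = caps.get((src, dst), 0)
    nodes = {a for a, _ in caps} | {b for _, b in caps}
    for x in nodes - {src, dst}:
        flow += min(caps.get((src, x), 0), caps.get((x, dst), 0))
    return flow

def subjective_reputation(caps, p, q):
    """Reputation of q from p's point of view, in (-1, 1)."""
    return math.atan(flow_2hop(caps, q, p) - flow_2hop(caps, p, q)) / (math.pi / 2)

print(subjective_reputation(caps, "P", "Q"))   # close to 1: Q mostly uploaded to P
}}}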

Correlation graph without zooming

Correlation graph with [-1000 , 1000 ] on x-axis and [-0.2 ,0.2] on y-axis

Correlation graph with [-500,500] on x-axis and [-0.1, 0.1] on y-axis

Correlation graph with [-200,200] on x-axis and [-0.05 ,0.05] on y-axis

Correlation graph with [-100 , 100] on x-axis and [-0.2, 0.1] on y-axis

Correlation graph with [-100, 100] on x-axis and [-0.02, 0.02] on y-axis

As the graphs show, there are few dots in the upper-left and lower-right quadrants, which means that there is a positive correlation between the average of the SRs and the net contribution, but from the scattering of the dots it does not seem to be a strong correlation. To give a better answer we calculated the correlation coefficient between these two parameters. Because of the non-linear function used in calculating the reputation ("arctan"), we do not expect a linear correlation between them, and the Pearson correlation indeed shows this. Before calculating we also removed outliers that could unexpectedly skew the result. Below are the results for different models with 2-hop maxflow:

  • Pearson: coefficient 0.005, significance level 0.813. Because the significance level is much higher than 0.05, there is no (linear) correlation between these two variables.
  • Kendall's tau_b: coefficient 0.308, significance level 0.01; weak correlation.
  • Spearman's rho: coefficient 0.415, significance level 0.01; moderate correlation.

We also calculated the maxflow with a 6-hop distance; below are the correlation coefficients:

  • Pearson: coefficient 0.009, significance level 0.633.
  • Kendall's tau_b: coefficient 0.331, significance level 0.01.
  • Spearman's rho: coefficient 0.448, significance level 0.01.

In comparison to the 2-hop maxflow there is a small improvement, but it is not spectacular. As expected, there is no strong correlation between the two parameters, but at least it is positive, and we can conclude that BarterCast and maxflow can differentiate between very bad and very good users, but not more than that.

Research problems

  • decide for a peer: altruist, free-rider, neutral, newcomer.
  • informativeness
  • dissemination (perfect gossip assumption)
  • best informed policy
  • would work for other/any policy?
  • what accuracy does policy X need?

Related work / Alternative approaches:

Attachments