Changes between Version 38 and Version 39 of SimilarityFunction

Timestamp:
06/10/09 16:03:30 (5 months ago)
Author:
niels@… (IP: 130.161.159.240)
Comment:

--

  • SimilarityFunction

    v38 v39  
    125125 
    126126== Dataset == 
    127 Parsing superpeer logs in order to get insight into the available data. 
     127The work started by parsing the superpeer logs to get insight into the available data (see SuperpeerLogs).  
    128128 
    129 Stats extracted from logs: 
    130 [[Image(tribler_daily_usage.png)]] 
    131 Tribler usage spiked in 2007, after Slashdot and BBC News published articles on it. 
     129Then a subset was created from the top 50,000 users with the most downloaded files. 
     130This subset has 252,469 items and 50,000 users. Using tf/idf, 31,906 items were assigned to a category; this was done manually and helps in evaluating the performance of more complex similarity functions. First, tf/idf was used to discover the more frequent terms, which were then used to create a list of categories. All items matching all, or a combination, of the terms of a category were written to a file. These files were then checked manually, and incorrect items were removed or disabled. 
    132131 
    133 [[Image(tribler_first_last.png)]] 
    134 This graph is generated from the first-seen and last-seen dates of each permid. 
    135  
    136 [[Image(tribler_top50_cumul.png)]] 
    137 A strange graph: the Tribler top 50 most active downloaders consist of two groups. The first group is very old and used the program in 2006. The second group consists of users who started in 2009 and use the 4.5 client. Are there really no active users in between? 
    138  
    139 [[Image(tribler_top50_hour.png)]] 
    140 Used to detect bots among the top 50 downloaders. However, possibly due to the resolution of the superpeer logs and the BuddyCast protocol (which sends a message every 4 hours), this graph says nothing useful. 
    141132 
    142133
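
The subset selection and category pre-assignment described in the added Dataset text could be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the function names (`top_users`, `tfidf_term_scores`, `assign_categories`), the tokenizer, the input shapes, and the choice to treat "a combination of the terms" as "at least one term matches" are all assumptions. The manual check of the resulting files is not modelled.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Lowercase an item name and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", name.lower())


def top_users(downloads, n):
    """Return the n user ids with the most downloaded items.

    `downloads` (hypothetical shape) maps a user id to the set of item ids
    seen for that user. Ties are broken by user id for determinism.
    """
    ranked = sorted(downloads, key=lambda user: (-len(downloads[user]), user))
    return ranked[:n]


def tfidf_term_scores(item_names):
    """Sum the tf-idf weight of every term over all item names.

    Terms with a high total are candidates for building category term lists.
    """
    docs = [tokenize(name) for name in item_names]
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(len(docs) / doc_freq[term])
            scores[term] += tf * idf
    return scores


def assign_categories(item_names, categories):
    """Map each category to the item names that contain any of its terms.

    Assumption: one matching term is enough; the output is only a candidate
    list that still needs manual review.
    """
    matches = {category: [] for category in categories}
    for name in item_names:
        terms = set(tokenize(name))
        for category, keywords in categories.items():
            if terms & set(keywords):
                matches[category].append(name)
    return matches
```

In such a setup, `tfidf_term_scores(names).most_common(50)` would give a shortlist of terms to group into categories by hand, and each list returned by `assign_categories` would be written to its own file for the manual check.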