Changes between Version 38 and Version 39 of SimilarityFunction
- Timestamp:
- 06/10/09 16:03:30 (5 months ago)
SimilarityFunction
| v38 | v39 | Change | Line |
|-----|-----|--------|------|
| 125 | 125 | Unmodified | |
| 126 | 126 | Unmodified | == Dataset == |
| 127 | | Removed | Parsing superpeerlogs in order to get insight into available data. |
| | 127 | Added | First started by parsing superpeer logs in order to get insight into the available data (see SuperpeerLogs). |
| 128 | 128 | Unmodified | |
| 129 | | Removed | Stats extracted from logs: |
| 130 | | Removed | [[Image(tribler_daily_usage.png)]] |
| 131 | | Removed | The usage of Tribler spikes in 2007, after Slashdot and BBC News published articles on it. |
| | 129 | Added | Then a subset was created using the top 50,000 users with the most downloaded files. |
| | 130 | Added | This subset has 252,469 items and 50,000 users. Using tf/idf, 31,906 items were assigned to a category. This categorization helps in evaluating the performance of more complex similarity functions and was done manually. First, tf/idf was used to discover the more frequent terms, which were then used to create a list of categories. All items matching all or a combination of the terms of a category were written to a file. These files were then checked manually and incorrect items were removed/disabled. |
| 132 | 131 | Unmodified | |
| 133 | | Removed | [[Image(tribler_first_last.png)]] |
| 134 | | Removed | This graph is generated using the first-seen and last-seen dates of each permid. |
| 135 | | Removed | |
| 136 | | Removed | [[Image(tribler_top50_cumul.png)]] |
| 137 | | Removed | A strange graph: the Tribler top 50 most active downloaders consist of 2 groups. The first group is a very old one that used the program in 2006. The new group consists of users who started in 2009 and use the 4.5 client. But are there no active users in between? |
| 138 | | Removed | |
| 139 | | Removed | [[Image(tribler_top50_hour.png)]] |
| 140 | | Removed | Used to detect bots in the top 50 downloaders. But possibly due to the resolution of the superpeer logs and the BuddyCast protocol (which sends a message every 4 hours), this graph says nothing. |
| 141 | 132 | Unmodified | |
| 142 | 133 | Unmodified | |
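
The categorization step described in the v39 text (score terms with tf/idf, group terms into categories, collect matching items for manual review) could look roughly like the sketch below. This is only an illustration under assumptions not stated on the wiki page: each item name is treated as one document, terms are ranked by their summed tf/idf weight, and the function names, example item names, and category term lists are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Lowercase an item name and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", name.lower())


def tfidf_scores(item_names):
    """Return the summed tf/idf weight of every term over all item names."""
    docs = [tokenize(name) for name in item_names]
    n_docs = len(docs)
    # Document frequency: in how many item names does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    return scores


def assign_categories(item_names, categories, min_matches=1):
    """Collect, per category, the items matching at least min_matches of its terms."""
    assigned = {category: [] for category in categories}
    for name in item_names:
        terms = set(tokenize(name))
        for category, category_terms in categories.items():
            if len(terms & set(category_terms)) >= min_matches:
                assigned[category].append(name)
    return assigned


# Hypothetical item names and hand-made category term lists.
items = [
    "Ubuntu 9.04 desktop iso",
    "Debian 5.0 netinst iso",
    "Some Band - Live Album mp3",
    "Other Band - Greatest Hits mp3",
]
categories = {"linux": ["ubuntu", "debian", "iso"], "music": ["mp3", "album"]}

top_terms = tfidf_scores(items).most_common(10)   # candidate terms for building categories
candidates = assign_categories(items, categories)  # per-category lists to review by hand
```

The per-category lists in `candidates` correspond to the files mentioned above; they would still need the manual check, since a plain term match cannot tell whether an item really belongs to a category.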