Similarity Function
In Buddycast3, recommendation is used for two main purposes: item (i.e. torrent file) recommendation and peer recommendation. Both of them are formalized as ranking problems relying on the calculation of peer-peer similarity.
Peer-Peer Similarity Metric
Assuming that peer u has created a peer-item matrix with received Buddycast messages and its own preference list, the similarity of peer u' to u is calculated by formula (1):
- Li denotes the list of peers who prefer(i.e. download)item i. c(u)denotes the length of peer u's preference list. Smoothing parameter mu is set to 1 and augment parameter k is set to 100000 in Buddycast3. It's easy to prove that k>sim(u',u)>=0. Furthermore, for a experience peer (i.e. having a number of preferece lists from other peers), 0.5k>sim(u',u)>=0.
- |Li| denotes the length of the list. It is a reverse-frequency factor. The importance of an item for calculating peer-peer similarity decreases with the frequency of its occurrences in users’ preference lists.
- Similarity is asymmetric: sim(u',u) is not equal to sim(u,u').
Item Recommendation
Item-Peer Recommendation
The similarity of an item to a peer is estimated by formula(2)
In most cases sim(u',u) is equal to zero due to the sparsity of the peer-item matrix. Therefore, it's often not feasible to rank all of items by peer-peer similarity.The formula remedies the drawback by incorporating the global taste (i.e. the popularity of an item) and the personal taste. Item-peer similarity will be cached and periodically updated.
Item-Item Recommendation
The tribler client also provides an item-item style recommendation. While clicking one item in the client UI, a ranked list of items related to the one clicked is shown to the user. Item-item similarity is generated by calculating the co-prefered frequency of two items.
Metadata Request Recommendation
The unknown torrent (i.e. the client only has the infohash of the torrent) with high item-peer similarity has priority in the metadata_request_queue. In other words, the client always tries to get the torrent files that are most interesting to the peer.
Peer Recommendation
Peer recommendation has three utilities in Tribler:
- Recommend top-20 taste bodies in the client UI.
- Select the most similar peer as the target of a Buddycast message. After every 15sec, the client sends a BuddyCast message to either the most similar connectable peer or a random one according to a ratio (the ratio is set to 1 in Buddycast3). Addtionally, each peer will not be sent more than one Buddycast message in a time window (the time window is set to 4 hours in Buddycast3).
- Put the top-10 taste bodies in the Buddycast message.
Similarity Updating
- When the client starts, the similarity of each peer-peer pair is updated.
- When a user changes his/her preference list (i.e. downloading a new item or deleting an item), the similarity of each peer-peer pair and each item-peer pair will be updated.
- When a peer receives a new Buddycast message, the peer-peer similarity to the sender is updated by formula (1). For the tast buddies and random peers in the message, the peer picks out the ones of which it has no preference list and then updates the indirect peer-peer similarity with them by formula (3). Note that the indirect peer-peer similarity is stored as negtive value in order to be distinct with the direct one.
- Periodically, the similarity of each item-peer pair is updated when the amount of new items received from Buddycast messages exceeds a threshold.
Peer Database and Cache Updating
Considering the scale problem, each peer only store a number (in Buddycast3, 2000 in default,100 at least) of other peers in its database and cache. The cache is periodically (15 sec.) examined and updated. The database is also examined and updated when the client starts. The algorithm of updating the cache and database update is shown in formaula (4). Only the peers with top-N comp scores are retained. nfref(u') denotes the length of peference list of peer u'. lastseen(u') denotes the last time peer u' is seen by peer u, which is recorded as the seconds since 00:00:00 UTC on January 1, 1970. time_sim_weight is set to 4*60*60 in Buddycast3,that is, four hours are equal to 1 similairty score.
Research
Research question: Find a similarity function from which an efficient P2P topology for keyword search emerges.
Literature Research
Literature Research: Optimizing Peer-to-Peer Keyword Search using Collaborative Filtering
A document created in preparation for the Master Thesis Research consisting of 40-50 Pages and 5-7 Chapters.
Research includes
- Current state of P2P search strategies.
- Current state of Similarity functions, both memory and model-based.
- Methods used to evaluate performance of Similarity functions.
- Problem applying existing Similarity functions to Distributed Environment.
In total 31 references used:
| Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions | IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 6, JUNE 2005 | Adomavicius et al. |
| Incorporating Contextual Information in Recommender Systems Using a Multidimensional Approach | Department of Information & Decision Sciences Carlson School of Management University of Minnesota | Adomavicius et al. |
| Content-Based, Collaborative Recommondation | COMMUNICATIONS OF THE ACM March 1997/Vol. 40, No. 3 | Balabanovic et al. |
| Empirical Analysis of Predictive Algorithms for Collaborative Filtering | Microsoft Research | Breese et al. |
| Eigentaste: A Constant Time Collaborative Filtering Algorithm | IEOR and EECS Departments University of California, Berkeley | Goldberg et al. |
| Combining Collaborative Filtering with Personal Agents for Better Recommendations | GroupLens? Research Group | Good et al. |
| An Empirical Analysis of Design Choices in Neighborhood-Based Collaborative Filtering Algorithms | Department of Computer Science, Oregon State University | Herlocker et al. |
| The EigenTrust Algorithm for Reputation Management in P2P Networks | CIKM '01: Proceedings of the tenth international conference on Information and knowledge management | Kamvar et al. |
| Evaluation of Item-Based Top-N Recommendation Algorithms | University of Minnesota | Karypis et al. |
| Pollution in p2p file sharing systems | IEEE INFOCOM | Liang et al. |
| Search Performance Analysis in Peer-to-Peer Networks | Dept. of Electrical Eng. National Taiwan University, Taipei, Taiwan | Lin et al. |
| Amazon.com Recommendations | Amazon.com | Linden et al. |
| Content-Boosted Collaborative Filtering for Improved Recommendations | Proceedings of the Eighteenth National Conference on Artificial Intelligence(AAAI-2002) | Melville et al. |
| PocketLens: Toward a Personal Recommender System | University of Minnesota | Miller et al. |
| Passive Profiling from Server Logs in an Online Recruitement Enviroment | Proceedings of IJCAI Workshop on Intelligent Techniques for Web Personalization (ITWP2001) | Rafter et al. |
| Peer-to-Peer Architecture Case Study: Gnutella Network | Department of Computer Science The University of Chicago | Ripeanu et al. |
| Survey or research toward robust p2p networks search methods | Comput. Netw. 50(17) | Risson et al. |
| Percolation search in power law networks: Making unstructured peer-to-peer networks scalable | Proceedings of IEEE P2P04 | Sarshar et al. |
| Item-Based Collaborative Filtering Recommendation Algorithms | GroupLens? Research Group | Sarwar et al. |
| E-commerce recommendation applications | Data Min. Knowl. Discov. | Schafer et al. |
| Internet study 2007 | http://www.ipoque.com | Schulze et al. |
| Personalized Recommendation in Social Tagging Systems using Hierarchical Clustering | RecSys? '08: Proceedings of the 2008 ACM conference on Recommender Systems | Shepitsen et al. |
| Load reducing in the kad p2p system | Fifth International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P, 2007) | Steiner et al. |
| Improving lookup performance over a widely-deployed dht | IEEE International Conference on Computer Communications INFOCOM 2006 | Stutzbach et al. |
| Adaptive Web Search Based on User Profile Constructed without Any Effort from Users | Proceedings of the 13th international conference on World Wide Web | Sugiyama et al. |
| Clustering methods for collaborative filtering | AAAI Press, 1998 | Ungar et al. |
| Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion | Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology | Wang et al. |
| A User-Item Relevance Model for Log-Based Collaborative Filtering | Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology | Wang et al. |
Research
First idea of an approach:
- Write several implementations in Python of a SimilarityFunction which addresses the sparseness issues, is scalable and utilizes more sophisticated CF techniques.
Interesting approaches:
- Implement Cosine function()
- Pseudo-user vector as discussed in: Content-Boosted Collaborative Filtering for Improved Recommendations
- Incremental top-k item-item similarity
Needed for evaluation:
- Dataset tribler 50000 profiles
- New method for evaluating performance due to matching users using similar items instead of overlapping items (pseudo user vector).
- K-Top TorrentSimilarity? table in MegaCache
To create a Pseudo-user vector item-item similarity needs to be calculated using the swarm-name, filenames, filesizes etc. Need to look at IR-domain for existing techniques.
Applying the item-item similarity function search collapse could be implemented. Resulting in a tree-like search results.
- Tv-series
- ...
- series 1
- series 2
- episode 1
- ...
Dataset
First started by parsing superpeer logs in order to get insight into availible data (See SuperpeerLogs).
Then a subset was created using the top-50000 users with the most downloaded files. This subset has 252.469 items and 50.000 users. Using tf/idf 31.906 items were assigned to a category. This helps evaluating the performance of more complex similarity functions and was done manually. First using the tf/idf the more frequent terms were discovered. Which were used to create a list of categories. All items matching all or a combination of the terms of a category where written to a file. These files were then checked manually and incorrect items were removed/disabled.
Day-to-Day worklog (in Dutch) http://www.evernote.com/pub/nees/AfstuderenTribler/
Attachments
- indirectsimfunction.PNG (13.9 KB) - added by ustcchensu 13 months ago.
- simfunction.PNG (11.2 KB) - added by ustcchensu 13 months ago.
- item2peersimfunction.PNG (4.7 KB) - added by ustcchensu 13 months ago.
- compfunction.PNG (5.5 KB) - added by ustcchensu 13 months ago.
- Combining Collaborative Filtering with Personal Agents for Better Recommendations.pdf (89.8 KB) - added by niels@… 10 months ago.
- Content-Boosted Collaborative Filtering for Improved Recommendations.pdf (97.5 KB) - added by niels@… 10 months ago.
- gnutella-rc.pdf (166.9 KB) - added by niels@… 10 months ago.
- Search Performance Analysis in Peer-to-Peer Networks.pdf (70.3 KB) - added by niels@… 10 months ago.
- Karypis2001.pdf (169.9 KB) - added by niels@… 10 months ago.
- sarwar_01_itembased.pdf (250.6 KB) - added by niels@… 10 months ago.
- breese_98_empirical.pdf (242.2 KB) - added by niels@… 10 months ago.
- herlocker_02_empirical.pdf (135.1 KB) - added by niels@… 10 months ago.
- clements_08_optimizing.pdf (310.8 KB) - added by niels@… 10 months ago.
- fab-content-based-filtering.pdf (470.5 KB) - added by niels@… 10 months ago.
- Incorporating Contextual Information in Recommender Systems.PDF (449.5 KB) - added by niels@… 10 months ago.
- Toward a Personal Recommender System.pdf (1.0 MB) - added by niels@… 10 months ago.
- Amazon-Recommendations.pdf (353.8 KB) - added by niels@… 10 months ago.
- Wang2006.pdf (397.2 KB) - added by niels@… 10 months ago.
- Eigentaste A Constant Time Collaborative Filtering Algorithm.pdf (0.8 MB) - added by niels@… 10 months ago.
- wang_06_distributed.pdf (0.5 MB) - added by niels@… 10 months ago.
- mirza_03_studying.pdf (0.6 MB) - added by niels@… 10 months ago.
- Toward the Next Generation of Recommender Systems A Survey of the State-of-the-Art and Possible Extensions.pdf (0.7 MB) - added by niels@… 10 months ago.
- han_04_scalable.pdf (483.3 KB) - added by niels@… 10 months ago.
-
ResearchAssignment.pdf
(1.2 MB) - added by niels@…
7 months ago.
Concept 1.4 (pre-final)
- eigentrust.pdf (198.1 KB) - added by niels@… 6 months ago.
- pollution in p2p file sharing system.pdf (2.5 MB) - added by niels@… 6 months ago.
- Passive Profiling from Server Logs in an Online Recruitement Enviroment.pdf (103.8 KB) - added by niels@… 6 months ago.
- survey or research toward robust p2p networks search methods.pdf (1.4 MB) - added by niels@… 6 months ago.
- percolation-based search.pdf (321.6 KB) - added by niels@… 6 months ago.
- e-commerce.pdf (0.6 MB) - added by niels@… 6 months ago.
- ipoque internet survey.pdf (4.0 MB) - added by niels@… 6 months ago.
- Personalized Recommendation in Social Tagging Systems.pdf (0.5 MB) - added by niels@… 6 months ago.
- Load reducing in the kad p2p system.pdf (170.6 KB) - added by niels@… 6 months ago.
- improving lookup performance over a widely-deployed dht.pdf (197.8 KB) - added by niels@… 6 months ago.
- Adaptive Web Search Based on User Profile Constructed without Any Effort from Users.pdf (304.6 KB) - added by niels@… 6 months ago.
- clustering methods for collaborative filtering.pdf (242.1 KB) - added by niels@… 6 months ago.
- wang_06_unifying.pdf (390.2 KB) - added by niels@… 6 months ago.
- Meeting 17 September 2009.pdf (222.9 KB) - added by niels@… 7 weeks ago.
- Cosine, Anderberg voor Arno Boudewijn.pdf (316.2 KB) - added by niels@… 5 weeks ago.
- CurrentProgress.pdf (0.8 MB) - added by niels@… 3 weeks ago.
![(please configure the [header_logo] section in trac.ini)](/images/TriblerLogo.png)