{{{
#!forumlinks
}}}

= Content Search =

''Status: operational and deployed for years''

The prime functionalities of Tribler are easy finding and fast download. Finding content should be ''as-easy-as-google''.

This research is strongly linked with: [wiki:Buddycast3 decentralised content discovery], [wiki:DecentralizedRecommendation], and [wiki:AdversarialContentSearch].

== Algorithms ==

Keyword search is the main improvement of V4.1. We had many complaints that the core P2P features of search & download were poor in Tribler V4.0.

ContentSearch improvements:
 - Returns many results ''Done: Arno implemented remote search''
 - Works even after Tribler is just installed ''TODO''
 - Does not consume much memory ''TODO''
 - Fast response time '' ? ''
 - Returns '''no''' irrelevant results ''TODO''
 - Partial matches to keywords are not shown ''TODO''
 - Always uses the '''AND''' operator when two keywords are provided
 - Provides clear user feedback, like ''TODO''
   * Searched for ''silver'' in 40,000 files
   * Matches 1 - 20 of 89 for ''silver''
   * No matches for your query ''Silver'' AND ''man''.

Version V4.0 of Tribler keeps all .torrent information in main memory. A Google-like keyword search window will replace the ''browse all'' feature as the default opening screen.

For V4.1 we no longer keep all .torrent information in main memory. Like many other parts of Tribler, this info needs to be loaded on demand. Partial word matches are no longer shown, and results that match only 1 keyword of a 2-keyword search are also no longer shown (i.e. default ''AND''). A search for ''look 2007'' does not return ''The.Outlook.2007.HDTV''.
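The whole-word, default-AND matching described above can be illustrated with a minimal sketch. This is not Tribler's actual code: the function names `tokenize`, `matches` and `search` are hypothetical, and splitting torrent names on any non-alphanumeric character is an assumption about how keywords are extracted.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase whole-word tokens.

    Assumption: any run of letters/digits is one keyword, so
    'The.Outlook.2007.HDTV' yields {'the', 'outlook', '2007', 'hdtv'}.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(query, torrent_name):
    """True only if EVERY query keyword is a whole token of the name (AND)."""
    keywords = tokenize(query)
    # An empty query matches nothing; partial-word hits are impossible
    # because we compare whole tokens, not substrings.
    return bool(keywords) and keywords <= tokenize(torrent_name)

def search(query, torrent_names):
    """Return the names that match all keywords of the query."""
    return [name for name in torrent_names if matches(query, name)]
```

With this sketch, `matches("look 2007", "The.Outlook.2007.HDTV")` is false because ''look'' is only a substring of ''outlook'', while `matches("outlook 2007", "The.Outlook.2007.HDTV")` is true.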
||TIME||ACTION||
||startup||list of all keywords of locally stored .torrent files is loaded (+popularity); the filenames in a .torrent are ignored for this version||
||keyword search local||matching of keywords in main memory||
||keyword search remote||send out search to peers with at least 1000 collected torrents||
||Youtube search remote||send out search to main video webservers||
||result ordering||results are simply ordered by their popularity (number of seeds); first all matching torrent names, then the youtube/liveleak matches ordered by number of views||
||results display||matching swarms are shown directly after local keyword search is completed||
||results highlighting||keyword matches on the torrent name are highlighted in bold, similar to Google||
||.torrent info loading||.torrent details such as tracker URL are looked up in MegaCache ''after'' display of results||
||progressive updates||incoming results from the network search (Tribler peers + video sites) are inserted in real-time into the results, respecting the result ordering||
||limited window flickering||due to the progressive updating, the results screen is constantly re-written. After the remote keyword search is done, the first page of the results screen is sorted and no longer updated. The slower Youtube results are appended.||

[attachment:KeywordSearchDataSet.txt?format=raw Attached is a file with 44,000 .torrent names]. This serves as a dataset for keyword matching.

Note that even the most basic information retrieval algorithms, [http://en.wikipedia.org/wiki/Tf-idf Keyword frequency] and [http://www.xapian.org/docs/bm25.html BM25], are not yet included in Tribler. This is work for the Tribler V4.2 release.
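The ''result ordering'' and ''progressive updates'' rows of the table above can be sketched as a sorted list that accepts late-arriving results. This is only an illustration under stated assumptions: the `Result` and `ResultList` classes, the `"torrent"`/`"video"` source labels and the single `popularity` field (seeds for torrents, views for video-site hits) are hypothetical and not taken from the Tribler code base.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    source: str      # hypothetical labels: "torrent" or "video"
    popularity: int  # number of seeds (torrent) or number of views (video)

def sort_key(result):
    """Torrent matches come first, then video-site matches;
    within each group the most popular result sorts first."""
    group = 0 if result.source == "torrent" else 1
    return (group, -result.popularity)

class ResultList:
    """Keeps results ordered while new ones trickle in from the network."""

    def __init__(self):
        self._keys = []   # sort keys, kept parallel to self.items
        self.items = []

    def insert(self, result):
        """Insert one incoming result at its ordered position.

        Ties go after existing equal entries, so results already on
        screen do not move when an equally popular one arrives.
        """
        key = sort_key(result)
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self.items.insert(pos, result)
        return pos
```

Returning the insert position lets a hypothetical GUI layer redraw only the rows from that position down, which is one way to limit the window flickering mentioned in the table.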
On the V4.1 roadmap: ''Enable remote queries of your MegaCache''
 * Keyword search returns numerous results directly after install
 * Bootstrap process does not need to be very aggressive
 * Remote keyword searches are not the main mechanism, only to help non-bootstrapped peers
 * Thus restrict to 10 keyword searches per random peer over its entire lifetime
 * Your taste buddies+friends have no restrictions on keyword searches
 * Return maximum 20 alive+unchecked torrent names+details, not bulky hashes
 * Allow download of 10 .torrents for given keyword per peer over its lifetime
 * Present a counter to the user of how many .torrent files he can query
 * Show number of discovered peers and how many .torrent files they have on average

(partly?) Implemented by Arno...

For Tribler version 4.2: ''Keyword searches are more powerful''
 * Buddycast is extended with a field with the last 10 keyword searches
 * Upon startup the opening screen of Tribler shows hot keywords of other peers
 * Clicking a hot keyword initiates a keyword search query
 * Combine the Youtube.com visual gridview with keyword browsing
 * Show for each hot keyword a few thumbnails

== User Interface ==

 - Main opening screen
 - Search for content on Bittorrent/Youtube/Liveleak/etc.
   * Video
   * Audio
   * XXX
   * other
 - Search for people on Tribler/MSN/Myspace/Facebook/etc.

The Tribler ''draft'' version of this: [[Image(File:search_prompt_screen_idea.png, 400)]]

== Long-term Research Issues ==

 * Mapping of the user's desires to the available content.
 * Craft an iterative cycle of re-search using results feedback.
 * Feedback on keyword search convergence.
 * Real-time search keyword suggestion.
 * Pairs of keywords & click-log.
 * Implicit tagging.
 * Adversarial content search.

== Usage of click-log approach in P2P search with privacy preservation and security ==

Arjen: click-log approach with maximum privacy. Spam and adversarial behavior are the key challenge of this research.
Understanding the trade-off between performance, relevance, scalability, robustness, etc. Exploit the usage of social networks and friends for security & privacy. Use of aggregation to combat privacy leakage. Johan's requirements: running experimental code and zero servers.