{{{
#!forumlinks
}}}

= Channels =

The concept of Channels will add rich metadata to Bittorrent and enable RSS-like subscriptions in P2P. Channels give every user the ability to clean up metadata and extend Bittorrent swarms with additional information. A simple keyword search is enough to locate a channel. Every user can publish content in his or her Channel, and the ChannelCast protocol is used to distribute this list. No server is needed for creating, spreading, or searching the contents of a channel. We believe our initiative is the first attempt at '''Serverless user-generated metadata'''.

== Status and planning ==

After the summer of 2009 we aim to launch a Beta version of this feature. Currently one part of Channels, called [wiki:Moderation ModerationCast], is implemented in [source:abc/branches/vincent/d07-09-18-modcast-from-mainbranch-r5355 Python]; there is technical [http://www.tribler.org/attachment/wiki/Moderation/ModerationCASTDesign.pdf?format=raw documentation] and a [source:abc/branches/vincent/moderationcastsim simulator]. More [wiki:Collections info].

== Roadmap (Johan) ==

 * Step 1: we get basic playlists working, plus voting on them. The result is a list of popular files without spam and with many duplicates removed. 10-25 constantly active moderators.
 * Step 2a: rich metadata is added to playlists, with subtitles, thumbnails, and tags added by the playlist owner.
 * Step 2b: every user can add tags to files in popular playlists. nicolas technology used for spam prevention. We then build a great dataset for offline tuning.
 * Step 3: we implement a tag cloud for browsing. This enables remote-control, TV-like operation of the GUI.

{{{
#!protected
#:[[Include(wiki:ProtectedSectionMessage)]]

= Channels =

----
(Still in progress.. Date: 17-07-09) [[BR]]

== Basic Definitions ==

 * '''Channel''' is a Tribler peer that publishes content using an RSS feed; the feature also lets the user provide additional rich metadata.
 * '''Subscriber''' is a peer that subscribes to a Channel and receives periodic (and on-demand) updates of the content of the subscribed channel.

== Content Ingestion and Dissemination ==

 * '''Ingestion''': An RSS feed of torrents is used by the Channel peer to inject content and the corresponding metadata into the Tribler network. The metadata (incl. content digest) of each torrent is signed by the peer, which makes it more robust against tampering.
 * '''Advertising''': Whenever any peer (a taste buddy or a random one) comes into contact with the Channel peer, the Channel peer sends, along with BuddyCast, a ChannelCast message containing the metadata and signature of its 20 most recently published torrents and 5 previously published torrents. The receiving peer can verify the signature and, if it does not yet have a torrent file, request and download it from the Channel peer.
 * '''Forwarding''': A subscriber forwards the metadata and content of the Channels he/she is subscribed to. This way, popular content spreads further (just like viral marketing). Another advantage is that the Channel need not be online all the time for its content to spread, which helps immensely in content dissemination (and discovery as well), considering the very high churn in P2P networks.

[[Image(Channel.PNG)]]

== Content Discovery ==

 * '''Remote Search''': Any peer can perform a remote search for channels he is interested in. The result is a list of channels matching the keyword, along with the torrents published by them. The querying peer can then subscribe to the relevant channel.
 * '''Update''': Whenever the subscriber bootstraps or requests an update of a subscribed channel ('Refresh'), a remote search for this channel (based on its permid) is performed to identify any new content; if new content is found, the local copy is updated.
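
The '''Advertising''' step above can be sketched as follows. This is only a minimal illustration, not the Tribler implementation: the names `TorrentRecord` and `build_channelcast_payload` are hypothetical, signing is left out, and picking the 5 older torrents at random is an assumption (the text only says "previously published").

```python
import random
from typing import NamedTuple


class TorrentRecord(NamedTuple):
    """Hypothetical record for one torrent published by a channel."""
    infohash: str
    name: str
    timestamp: int  # publication time, e.g. seconds since the epoch


def build_channelcast_payload(published, num_recent=20, num_older=5, rng=None):
    """Select the torrents advertised in one ChannelCast message.

    Returns the `num_recent` most recently published torrents followed by
    up to `num_older` torrents drawn from the remaining, older ones.
    Random selection of the older torrents is an assumption of this sketch.
    """
    rng = rng or random.Random()
    # Newest first, so the head of the list is the "recently published" part.
    by_age = sorted(published, key=lambda t: t.timestamp, reverse=True)
    recent = by_age[:num_recent]
    older_pool = by_age[num_recent:]
    older = rng.sample(older_pool, min(num_older, len(older_pool)))
    return recent + older
```

With 30 published torrents the payload holds 25 records; a channel with fewer than 20 torrents simply advertises all of them.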

== Technical Details ==

=== ChannelCast ===

 * ChannelCast message: [(mod_id, mod_name, infohash, torrenthash, torrent_name, time_stamp, signature)]
 * ChannelCast table: (mod_id, mod_name, infohash, torrenthash, torrent_name, time_stamp, signature)

=== ChannelQueryMsgHandler ===

== Implementation Details ==

 * The architecture follows a (1 Channel + Multiple Subscribers) scenario.
 * The ModerationCast implementation is slightly modified to suit ChannelCast. Along with the 'infohash', the 'torrenthash' is stored for each moderation. The 'signature' is based on the SHA1 hash of the bdecoded torrent's entire content, rather than only on the torrent's 'info' field.

= Search =

== Relevance Ranking ==

 * Votes
 * Subscriptions
 * Duplicates
 * Remote Search
   * Given a query, searching within the swarm's files (audio/video/etc.) can also help, rather than only matching the "swarm name" (which is what is done currently).

== Screenshots ==

[[Image(Playlist.jpg)]]

----
= Future (to be updated when the design is final) =

Tribler's Playlists are modelled on the playlists of many entertainment websites. Each Tribler user should be able to maintain his/her own playlists, into which he/she can insert files from various torrents along with metadata like tags, video quality, audio quality and subtitles. Other peers can subscribe to such a playlist, much like RSS. It is modelled on YouTube's Playlist, where a user can create a collection of videos on the same topic that were uploaded by many others. For instance, various videos of Britney Spears are uploaded by many users, but there might be no single user who has all of them. A fan of Britney Spears might make a collection/playlist of all uploaded videos of Britney Spears, which makes them easy to find for other fans who are interested in her videos.

== Technical Details ==

 * Every peer can create multiple playlists. In each playlist, he can add individual files from various torrents, along with metadata like tags, video quality (0-5), audio quality (0-5).
   For instance, to create a playlist named 'Slumdog Millionaire', the user can add a 700 MB avi file from aXXo's torrent to the playlist, then add an English subtitle file from another torrent, followed by a French subtitle file from a different torrent.
 * The architecture follows a (1 Manager + Multiple Playlists; 1 Publisher + Multiple Subscribers) scenario.
 * The messages exchanged among peers are similar to ModerationCast's messages (Have, Request and Reply).
 * Playlist_Have Message fields:
   * playlistid:
   * timestamp: the last time this playlist was updated on the publisher's side
   * num_files: number of files the playlist has
 * Playlist_Request Message fields:
   * playlistid:
   * timestamp: the last time the playlist was updated on the subscriber's side; if timestamp=0, the playlist was never subscribed to
 * Playlist_Reply Message fields:
   * playlistid:
   * playlistname:
   * filename:
   * path: the original path of the desired file in the torrent
   * torrenthash: hash of the torrent
   * timestamp: the last time this moderation was updated
   * tags: tags related to this file
   * quality: Poor/OK/Good/Pretty Cool/Awesome; Webcam/Standard/DVD/HDTV/Blu-ray/NA
 * When a peer P1 connects to another peer P2, P1 sends a Playlist_Have message, a list of (playlistid, timestamp, num_files) records, to P2. When P2 receives this list, it checks its database for each playlist record to see whether it has that playlist from P1 and, if so, when it was last updated. P2 then creates a Playlist_Request message, a list of (playlistid, timestamp) records, one for each playlist that P2 either does not have or has in an older version than P1. If a playlist is up-to-date, P2 does not request it. If P2 does not have the playlist, the timestamp is set to 0; this forces P1 to send all (or at most the 20 most recent) files in that playlist.
   However, if P2 does have the playlist but is not up-to-date, it sends the last time its copy of the playlist was updated, so that it receives only the newer files. Once P1 receives the request list from P2, it sends either the entire playlist (or its recent files) or only the difference in files to P2.
 * Database Tables
   * Playlists (playlistid, playlistname, filename, path, torrenthash, timestamp, tags, quality)
   * A torrent's details are stored in the existing Torrent table; this must also include the individual files within the torrent.

== Typical Example of a Playlist ==

The idea of a playlist is to have a lot of related content in one place with proper moderation, which helps users of the system download what they like. To create a playlist for the movie 'Slumdog Millionaire', a moderator would add the main video content (the 700 MB Slumdog.Millionaire.2008.avi file) from a popular torrent with a large swarm. He can then add an English subtitle file (SM.en.srt, 60 KB) from another torrent; the playlist need not include all the files in that torrent. Similarly, he could add subtitles in other languages (fr/nl/es/it) from various torrents. Most movies also contain various musical scores. The moderator can add audio files of Slumdog Millionaire from an audio-based torrent. If lyrics are available, he can add them too. This way, the playlist has a rich collection of all files related to Slumdog Millionaire. Similarly, a moderator can create a playlist for a popular TV series with many episodes and seasons; the moderator can update this list every week, as soon as a new episode is broadcast.

== Playlist Format ==

The playlist format is designed along the lines of a torrent. A playlist is a dictionary with the following fields:

 * '''name''': Name of the Playlist
 * '''tags''': all the tags for the playlist
 * '''creation-date''': Creation date/time of this playlist
 * '''files''': a list of dictionaries, one for each file.
   Each dictionary in this list contains the following keys:
   * '''filename''': renamed version of the file denoted by 'path' in a torrent
   * '''path''': Original path of this file in the torrent
   * '''torrenthash''': Hash of the torrent
   * '''added-on''': The date/time the file was added to the playlist
 * '''torrents''': a list of dictionaries, one for each torrent
   * '''torrenthash''': Hash of the torrent
   * '''info''': torrent's info field
   * '''tracker''': Torrent's tracker address
   * '''seeders''': # seeders
   * '''leechers''': # leechers

Channels contain:

 * Name of Live Playlist
 * Thumbnail
 * Description text
 * Tags
 * Quality of video
 * Spoken language
 * Subtitles in various languages

== Possible BuddyCast Improvement ==

As described in the Fake Tracker Attack section, BuddyCast is used to spread both downloaded and collected torrents. In the current system, the origin of these torrents cannot be identified, nor can their spread be prevented. One way to fight spam is to attack the root problem. I propose that every torrent inserted into the Tribler network must be associated with the peer that inserts it. That peer should be responsible for the torrent and its metadata. This way, moderations need not be sent separately; they become an integral part of the system rather than an additional feature. Thus, the ModerationCast feature would no longer be necessary. All of this can be embedded in the next version of BuddyCast.

----
= Current Drawbacks =

== Fake Tracker Attack ==

In the current Tribler system, the BuddyCast protocol is used to regularly propagate collected/downloaded torrents to various taste buddies and random peers. A torrent is collected and propagated without checking the authenticity of the torrent or of the peer that introduced it into the network.
Taking advantage of this, a malicious peer can download a popular torrent, change its 'announce' (and/or 'announce-list') field(s) to a fake tracker's address, and then propagate this torrent using the BuddyCast protocol. Such an attack, with lots of modified torrents propagated across the network, will lead to spamming of the entire network. The current Tribler system is very vulnerable to such an attack.

== ModerationCast Shortcomings ==

In the current implementation, the 'infohash' forms the core of all ModerationCast messages (Have, Request and Reply) that are exchanged among peers. Although a moderator approves a torrent in its entirety and adds metadata, only the 'infohash' of the torrent is stored with the moderation, rather than the whole torrent. A malicious moderator can therefore use the Fake Tracker Attack strategy. To further understand the consequences for the current implementation, let's consider the following scenario. [[BR]] [[BR]]
Let T1 be a torrent with infohash I and a real tracker R, injected by an authentic moderator M1 at time t1. Now, a malicious moderator M2 downloads the torrent T1 and changes only its 'announce' field to a fake tracker F, thereby creating torrent T2 at time t2 (> t1). It is important to note that although the tracker has changed, T2's infohash remains the same as that of T1. When a peer U connects to M1, he gets the authentic moderation (M1, t1, I) from M1 and inserts it into the ModerationCast table in the database. A bit later, peer U connects to M2 and gets its moderation (M2, t2, I). Since t2 > t1, the new moderation (M2, t2, I) overwrites the old one (M1, t1, I). Now, when peer U searches for a query related to this torrent, both torrents T1 and T2 match, because both have the same infohash I. In the search results in the GUI, both torrents T1 and T2 are displayed. It is important to note that each result's infohash is used to query the ModerationCast table to get the name of the moderator.
In this case, since both torrents T1 and T2 have the same infohash I, moderator M2's name will be displayed for both results. Only T2's record should have had moderator M2, and T1's record should have had no moderator. If peer U downloads T1 (which is the authentic torrent), he would give a positive vote for moderator M2. To make things worse, this vote, along with the moderation, gets propagated with subsequent BuddyCast messages. All of this will lead to a significant failure of the system.

== Cannot be found through remote search ==

Currently, search in Tribler is limited to searching the local megacache and the megacaches of 10 taste buddies. Although searching 10 other megacaches might increase the number of hits for a query, it will not necessarily find a result even if it is present on the network. There is no "genuine effort to SEARCH". The current search follows a gossiping + flooding architecture. Torrents are spread using the BuddyCast protocol (gossiping). When querying, the query terms are looked up in the local database and also flooded to 10 taste buddies (1 hop). [1,2,3] clearly state that, although the gossip + flood mechanism gives good results for popular items, it falls well short for rare items. For popular items, the search latency is quite low; for rare items, however, it is very high.

== References ==

 1. Gossip-based Search Selection in Hybrid Peer-to-Peer Networks (M Zaharia, S Keshav - Concurrency and Computation: Practice and Experience, 2008)
 1. Survey of research towards robust peer-to-peer networks: search methods (J Risson, T Moors - Computer Networks, 2006 - Elsevier)
 1. The Case for a Hybrid P2P Search Infrastructure (BT Loo, R Huebsch, I Stoica, JM Hellerstein - Proceedings of the 3rd IPTPS, 2004 - Springer)
}}}