Collections for swarm grouping

A collection is a group of resources (or swarms). It has a name, an ID, a description and possibly other relevant information. The exact fields will be defined in an Ontology, described further down in this text. Users should be able to rate collections, as this will often make more sense than rating individual resources (say, for a TV or Web series). Further down the line, collections should also be recommended based on the user's ratings of other collections, or on recommendations from taste buddies/close nodes. Collections will be distributed virally using BuddyCast (?).

Any user can add a swarm to a collection simply by adding the collection ID to the signed meta-data of the swarm. Note that the system must handle multiple meta-data blocks for a single swarm (or better: for each torrent - or are they the same from your point of view?). Other meta-data would describe the swarm/resource with regard to the collection, for example "series number, episode number". These are complementary to the "other" meta-data, such as video quality (720p), language, user tags etc. The collection itself thus does not contain any information about the swarms it contains.
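As a rough illustration of the paragraph above, the sketch below models meta-data blocks that point at a collection by ID, while the collection itself stores nothing about its swarms. All class and field names (Collection, MetadataBlock, collection_id, infohash, signer) are assumptions made for this example and are not taken from Collection.py; signing is reduced to a plain signer field.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    # The collection only describes itself; it holds no list of swarms.
    collection_id: str
    name: str
    description: str = ""


@dataclass
class MetadataBlock:
    # One block of meta-data about a swarm, written by one user (hypothetical layout).
    infohash: str       # identifies the swarm
    signer: str         # stand-in for the key that signed this block
    collection_id: str  # membership is declared here, not in the collection
    fields: dict = field(default_factory=dict)  # e.g. season and episode


def members(collection_id, blocks):
    """Return {infohash: [blocks]} for every swarm that claims membership."""
    result = {}
    for block in blocks:
        if block.collection_id == collection_id:
            result.setdefault(block.infohash, []).append(block)
    return result


# Two users describe the same swarm; a third block belongs to another collection.
blocks = [
    MetadataBlock("aa11", "alice", "c-1", {"season": 2, "episode": 9}),
    MetadataBlock("aa11", "bob", "c-1", {"season": 2, "episode": 9}),
    MetadataBlock("bb22", "alice", "c-2", {"season": 1, "episode": 1}),
]
series = Collection("c-1", "Example series")
in_series = members(series.collection_id, blocks)
```

Keeping the membership claim in the swarm's meta-data means a single swarm can carry several blocks from different users, which is the case the note above asks the system to handle.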

When resources are gathered in collections, the UI can group them and keep track of duplicates (Season Y, Episode Z). The "normal" social validation of resources must of course apply in order to handle malicious users. Allowing the user to show or hide "suspect" content is likely a good idea; if such content is always hidden, DoS attacks become easier to mount. This grouping would allow users to subscribe to collections, and then have the software automatically filter new resources and even download them automatically.
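The grouping described above could look roughly like the following sketch. It assumes each resource is a plain dictionary with hypothetical keys (infohash, season, episode, suspect); how a resource gets marked as suspect is left to the social validation mentioned in the text.

```python
from collections import defaultdict


def group_episodes(resources, show_suspect=False):
    """Group resources by (season, episode); hide suspect ones unless asked."""
    groups = defaultdict(list)
    for res in resources:
        if res.get("suspect") and not show_suspect:
            continue
        groups[(res["season"], res["episode"])].append(res["infohash"])
    return dict(groups)


def duplicates(groups):
    """Return only the episodes that have more than one resource."""
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}


resources = [
    {"infohash": "aa11", "season": 1, "episode": 2, "suspect": False},
    {"infohash": "bb22", "season": 1, "episode": 2, "suspect": False},
    {"infohash": "cc33", "season": 1, "episode": 3, "suspect": True},
]
visible = group_episodes(resources)                        # suspect content hidden
everything = group_episodes(resources, show_suspect=True)  # user chose to show it
```

The show_suspect switch mirrors the idea that hiding should be a user choice rather than something the system always enforces.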

Signatures play a vital role in the collection framework. Notably, they allow users to subscribe to collections and verify that the content is genuine. For example, a set of keys can be added to "provide resources for this collection", and only resources with a valid signature will then be seen as part of the collection. Some such keys might be from professional content providers (say the BBC, or NRK), while others might be known publishers/administrators of a collection.
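A minimal sketch of the key-list idea follows. It is only an illustration: a real deployment would use public-key signatures, whereas this example uses HMAC-SHA256 from the Python standard library as a stand-in, so the "keys" here are shared secrets. The function names, the dictionary layout and the key names are all assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, infohash: str, collection_id: str) -> str:
    # HMAC-SHA256 as a stand-in for a real public-key signature.
    message = f"{collection_id}:{infohash}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def accepted(resource, collection_id, trusted_keys):
    """True if the resource is signed by a key trusted for this collection."""
    secret = trusted_keys.get(resource["signer"])
    if secret is None:
        return False  # signer is not on the collection's key list
    expected = sign(secret, resource["infohash"], collection_id)
    return hmac.compare_digest(expected, resource["signature"])


# Hypothetical key list for collection "c-1".
trusted_keys = {"publisher-a": b"secret-a"}

good = {"infohash": "aa11", "signer": "publisher-a",
        "signature": sign(b"secret-a", "aa11", "c-1")}
forged = {"infohash": "bb22", "signer": "publisher-a",
          "signature": sign(b"wrong", "bb22", "c-1")}
unknown = {"infohash": "cc33", "signer": "mallory",
           "signature": sign(b"m", "cc33", "c-1")}
```

Because the collection ID is part of the signed message, a signature made for one collection cannot be reused to place the same swarm in another.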

In order to differentiate between good and bad resources, Moderation is required. We should investigate whether negative feedback alone is enough ("FAKE!"), and whether a node that assigned a FAKE resource to a collection should be banned or flagged as suspect. Is there some kind of trust value we can decrease for each FAKE? The trust function seems to focus on ratios, not on trust, and as such might not be useful. A local trust function might be sufficient, at least for now.
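One possible shape for such a local trust function is sketched below. The starting value, the penalty per FAKE and the suspect threshold are arbitrary numbers chosen for the example, and the class name LocalTrust is hypothetical; nothing here reflects an existing trust function.

```python
class LocalTrust:
    """Per-node trust score kept locally; every FAKE report lowers it."""

    def __init__(self, start=1.0, penalty=0.25, suspect_below=0.5):
        self.start = start                  # trust given to unknown nodes
        self.penalty = penalty              # amount removed per FAKE report
        self.suspect_below = suspect_below  # below this the node is flagged
        self.scores = {}

    def report_fake(self, node_id):
        """Lower the node's trust by one penalty, never going below zero."""
        current = self.scores.get(node_id, self.start)
        self.scores[node_id] = max(0.0, current - self.penalty)
        return self.scores[node_id]

    def is_suspect(self, node_id):
        return self.scores.get(node_id, self.start) < self.suspect_below


trust = LocalTrust()
trust.report_fake("node-1")
trust.report_fake("node-1")
suspect_after_two = trust.is_suspect("node-1")    # score is exactly at the threshold
trust.report_fake("node-1")
suspect_after_three = trust.is_suspect("node-1")  # score has dropped below it
```

Since the scores never leave the local node, this flags rather than bans; whether a ban is warranted remains the open question raised above.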

We might also allow users to hide unwanted content. For example, users could add content to a "Spam" collection, and thus collaborate to remove content they do not wish to see, for example clearly infringing material or illegal pornography.

Code

I (Njål) just added some code to demonstrate the idea of a collection. The code is attached at http://www.tribler.org/attachment/wiki/Collections/Collection.py and I also added some unit tests at http://www.tribler.org/attachment/wiki/Collections/CollectionTest.py that test the Collection class and show how to use it.

Also implemented is a 'smart_add' function that tries to guess, as best it can, which collection a resource is part of. See the code for documentation.
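The following is not the attached smart_add, whose behaviour is documented in the code itself. It is only a sketch of one way such a guess could work, assuming resource names carry a season/episode code such as S2E9. The function name guess_collection and the regular expression are assumptions made for this example.

```python
import re

# Title, then a separator, then a code like S2E9 or S02E09 (case-insensitive).
EPISODE = re.compile(
    r"^(?P<title>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})\b",
    re.IGNORECASE,
)


def guess_collection(resource_name):
    """Return (title, season, episode) guessed from the name, or None."""
    match = EPISODE.match(resource_name)
    if match is None:
        return None
    # Dots and underscores are often used instead of spaces in file names.
    title = re.sub(r"[._]+", " ", match.group("title")).strip()
    return title, int(match.group("season")), int(match.group("episode"))


guess = guess_collection("Example.Show.S02E09.720p")
no_guess = guess_collection("Some Film 2008")
```

A name without a recognisable code yields None, so the caller can fall back to asking the user.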

An RSS-feed export of the collection class can be tested at http://seer2.itek.norut.no:8080/ for now. If it doesn't respond, it has been taken offline. :-)

Stuff Johan wrote

Related work

Amazon Listmania

The problem is that we need Moderation and a trust function to group content.

With the collection concept we can group content and give it a seal-of-approval.

Every user can create collections. Each collection has one name and is a group of SHA1 hashes, plus a name for each added swarm (+ S2E9 code). New swarms can be added to an existing collection.

To prevent spam, only the top 10 or top 25 collection creators are shown, as determined by the number of votes on them.

Every user can vote on a collection creator; a vote means you subscribe to them and also help spread them.

By default only the "most voted collection creators" are shown; with a "mark as spam" or "incorrect" they disappear from view for the local user.

Keyword searches rank collection names very high; collection names are also used for auto-completion and "Did you mean: ".
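The vote-based filtering could be sketched as follows. The function name visible_creators, the vote table and the choice to break ties alphabetically are assumptions for this example; the hidden set stands for creators the local user has marked as spam or incorrect.

```python
def visible_creators(votes, hidden=(), top_n=10):
    """Return the top_n creators by vote count, skipping locally hidden ones."""
    # Sort by votes (highest first), then by name so ties are stable.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [creator for creator, _ in ranked if creator not in hidden][:top_n]


votes = {"alice": 42, "bob": 17, "carol": 17, "dave": 3}

top3 = visible_creators(votes, top_n=3)
top3_without_bob = visible_creators(votes, hidden={"bob"}, top_n=3)
```

Hidden creators are removed before the list is cut to top_n, so hiding one creator lets the next one move up instead of leaving an empty slot.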

Screenshots of possible grouping interactions


You select multiple items in the overview. When more than one item is selected you get 'cluster options' in the detail panel at the right.


When you select an available cluster you can see the details in the detail panel at the right.