Moderation and rich metadata

Status: operational code, waiting for deployment and improvements

Currently every peer can publish .torrent files using BuddyCast3. No moderation mechanism exists yet. An unsolved problem is how to create a metadata infrastructure for Video on Demand that facilitates user-generated metadata. Research question: design, implementation, and evaluation of an efficient epidemic protocol for distributing rich metadata, with pollution-prevention measures.

We have worked out a roadmap that addresses this research question in incremental stages until June 2009. Each stage increases both the sophistication of moderation and the size of the moderator community. The key is to keep the software features and the community's use of them in balance. Measurements on related projects, such as MusicBrainz statistics and Wikipedia traces, show that growing a community of moderators takes many months or even years. Robustness to vandalism and fraud will also improve over time, as we observe normal usage patterns and become better at detecting anomalies. Furthermore, the software will give the growing moderator community increasingly more power over the metadata, until the community is in full control.

5-stage roadmap

Research questions for each stage:

  • How can we enable peers to automatically approve all swarms originating from a trusted RSS feed?
    Requirement: none
  • How can we enable peers to freely approve any swarm? How can we facilitate the identification of identical swarms and present the user only with the healthiest swarm in keyword search results?
    Requirement: 20+ active moderators and 100+ active voters, all of whom are online 20% of the time.
  • How can we give moderators the ability to correct core swarm metadata, such as swarm name, spoken language, and video quality?
    Requirement: a group of 50+ active moderators and 250+ active voters, all of whom are online 20% of the time.
  • How can we facilitate users adding subtitles to swarms?
    Requirement: 50+ active subtitle moderators. Victor expands PEX with discovery of extra metadata.
  • How can we stop showing non-approved swarms in keyword search results? How can we give every user the ability to add tags and a rating to every swarm?
    Requirement: some sort of turbo swarm discovery mechanism for moderators that lets them see and approve new swarms within minutes; a community of moderators of whom at least 10 are searching and approving swarms at any given time; and a group of 250+ active tagging and rating users, all of whom are online 20% of the time.

General research challenges: scalability, robustness to fraud, acceptable propagation speed, and low bandwidth usage at the moderator. See also: our proof-of-principle Python code; related gossip work.

Draft outline of architecture

We have chosen to design and implement a simple moderation protocol using gossiping (based on BuddyCast). Peers can create and receive moderations, which contain extra data for a given torrent:

  • spoken language
  • subtitles
  • description
  • thumbnail
  • tags

We have chosen not to enable peers to change the title of a torrent, because a torrent with a badly moderated title would become very hard to find. Furthermore, we have decided not to use majority voting to choose between competing moderations; instead, the most recent moderation wins. This is far more scalable in the absence of a trust mechanism, because determining the majority in an insecure environment requires either trust or every peer gathering all the moderations itself. As a pollution-prevention measure we use blacklisting: users can block moderators that send bad moderations, both by PermID and by the IP address of the peer. To further limit the propagation of bad moderations, moderations are not forwarded automatically; peers have to indicate that they are willing to forward moderations from specific moderators. Moderations are signed using the Elliptic Curve Digital Signature Algorithm (ECDSA), which is also used to verify PermIDs. This enables peers to verify the authenticity of a message even when it is forwarded by a third party. The protocol also supports rate control to minimize bandwidth consumption.
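The acceptance rules above (blacklisting by PermID and IP address, opt-in forwarding per moderator, and "last moderation wins" instead of majority voting) can be sketched in Python as follows. This is a minimal illustration, not taken from the proof-of-principle code: the class and field names are hypothetical, the ECDSA check is passed in as a `verify` callable rather than implemented, and we assume the IP blacklist is checked against the address a moderation arrived from.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Moderation:
    """Hypothetical in-memory form of one moderation."""
    infohash: bytes          # identifies the .torrent being moderated
    moderator_id: bytes      # PermID of the moderator who signed it
    timestamp: int           # creation time of the moderation
    fields: Dict[str, str]   # e.g. spoken language, description, tags
    signature: bytes


@dataclass
class ModerationStore:
    # Hypothetical hook: should check the ECDSA signature against moderator_id.
    verify: Callable[[Moderation], bool]
    blocked_permids: Set[bytes] = field(default_factory=set)
    blocked_ips: Set[str] = field(default_factory=set)
    forward_for: Set[bytes] = field(default_factory=set)  # moderators we agreed to forward
    by_torrent: Dict[bytes, Moderation] = field(default_factory=dict)

    def block(self, permid: bytes, ip: str) -> None:
        """Blacklist a moderator by PermID and by IP address."""
        self.blocked_permids.add(permid)
        self.blocked_ips.add(ip)

    def receive(self, mod: Moderation, sender_ip: str) -> bool:
        """Store mod if it passes all checks; return True if it was stored."""
        if mod.moderator_id in self.blocked_permids or sender_ip in self.blocked_ips:
            return False
        if not self.verify(mod):
            return False
        current = self.by_torrent.get(mod.infohash)
        # Last moderation wins (no majority voting); on a timestamp tie we keep the current one.
        if current is not None and current.timestamp >= mod.timestamp:
            return False
        self.by_torrent[mod.infohash] = mod
        return True

    def forwardable(self) -> List[Moderation]:
        """Only moderations from explicitly selected moderators are forwarded."""
        return [m for m in self.by_torrent.values() if m.moderator_id in self.forward_for]
```

Because only the newest moderation per torrent is kept, each peer can decide locally without collecting every competing moderation, which is the scalability argument made above.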

The above design is implemented as a proof of principle. Several simulations have been conducted to determine its scalability and robustness to fraud. For more details, see the following documents: --- by Vincent; --- the 7-page ModerationCAST design document, including the message format; and finally the very extensive --- MSc thesis on P2P moderation.

Initial Implementation Design

(Dave and Rameez are currently working on this.)

For an initial implementation we decided to simplify Vincent's design where possible and to make a few additions. Mainly, we have excluded many of the metadata fields, keeping only the essentials. ModerationCast is a prerequisite for VoteCast, which will allow users to rate (vote for) moderators.

ModerationCast produces three kinds of message:

  • Moderation_Have messages, which signal to other nodes a set of available moderations stored in the megacache that can be sent if required.
  • Moderation_Request messages, which reply to a Moderation_Have message by listing the moderations that are required.
  • Moderation_Reply messages, which contain the actual metadata for the moderations listed in the preceding Moderation_Request message.

Moderation_Have message:

  • Hash (23 bytes)
  • Time_Stamp (4 bytes)

Repeated up to 100 records
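A Moderation_Have payload is a flat sequence of fixed-size records. The sketch below packs and unpacks them with Python's `struct` module. The byte order (big-endian), the unsigned 32-bit timestamp encoding, and the absence of any message header are assumptions, since the field list gives only the sizes.

```python
import struct
from typing import List, Tuple

HASH_LEN = 23        # from the field list above
MAX_RECORDS = 100    # from the field list above
# Assumed layout: 23-byte hash followed by a big-endian unsigned 32-bit time stamp.
HAVE_RECORD = struct.Struct(">23sI")


def encode_have(records: List[Tuple[bytes, int]]) -> bytes:
    """Pack (hash, time_stamp) pairs into a Moderation_Have payload."""
    if len(records) > MAX_RECORDS:
        raise ValueError("at most 100 records per Moderation_Have")
    out = bytearray()
    for h, ts in records:
        if len(h) != HASH_LEN:
            raise ValueError("hash must be exactly 23 bytes")
        out += HAVE_RECORD.pack(h, ts)
    return bytes(out)


def decode_have(payload: bytes) -> List[Tuple[bytes, int]]:
    """Split a payload back into (hash, time_stamp) pairs."""
    if len(payload) % HAVE_RECORD.size:
        raise ValueError("payload is not a whole number of records")
    return [HAVE_RECORD.unpack_from(payload, off)
            for off in range(0, len(payload), HAVE_RECORD.size)]
```

Each record is 27 bytes, so a full message of 100 records carries 2,700 bytes of payload.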

After BuddyCast returns a peer from the overlay (approximately every 15 seconds), a Moderation_Have message is pushed to that peer, containing a list of hashes and time_stamps of locally stored moderations. The hash identifies a .torrent and the time_stamp indicates the creation time of the stored moderation. Up to 100 such (Hash, Time_Stamp) pairs may be sent in one message. The local node selects the moderations to include using a 50:50 policy: 50% are selected at random and 50% are selected by time_stamp recency.
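One way to implement the 50:50 selection policy is sketched below: half of the slots are filled with the newest moderations by time_stamp, and the other half are sampled at random from the remaining moderations, so the two halves never overlap. The description does not say how overlaps are handled, so that choice, and the `hash -> time_stamp` dictionary used as a stand-in for the local store, are assumptions.

```python
import random
from typing import Dict, List, Optional, Tuple


def select_for_have(store: Dict[bytes, int], limit: int = 100,
                    rng: Optional[random.Random] = None) -> List[Tuple[bytes, int]]:
    """Pick up to `limit` (hash, time_stamp) pairs: half most recent, half random.

    `store` is a hypothetical stand-in for the megacache: hash -> time_stamp.
    """
    rng = rng or random.Random()
    if len(store) <= limit:
        return list(store.items())  # everything fits in one message
    by_recency = sorted(store.items(), key=lambda kv: kv[1], reverse=True)
    n_recent = limit // 2
    chosen = by_recency[:n_recent]          # newest half
    remaining = by_recency[n_recent:]       # not yet chosen
    chosen += rng.sample(remaining, limit - n_recent)  # random half
    return chosen
```

The recency half spreads fresh moderations quickly, while the random half keeps older moderations circulating so that newly joined peers can still catch up.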

Moderation_Request message:

  • Hash (23 bytes)

Repeated up to 100 entries

Any node that receives a Moderation_Have message examines it to determine if it wishes to request any of the available moderations. A node will ask for any new (previously unseen) moderation or any more up-to-date moderation (based on time_stamp). More up-to-date moderations overwrite old moderations. The node sends back a Moderation_Request message containing a list of the Hashes of the required moderations.
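The request decision reduces to comparing the advertised (Hash, Time_Stamp) pairs with the local copies. A minimal sketch, assuming the local store is a dictionary from hash to time_stamp (a hypothetical stand-in for the megacache):

```python
from typing import Dict, List, Tuple


def build_request(have: List[Tuple[bytes, int]], local: Dict[bytes, int],
                  limit: int = 100) -> List[bytes]:
    """Return hashes whose moderation is unseen locally or newer than the local copy."""
    wanted: List[bytes] = []
    for h, ts in have:
        # Request previously unseen moderations and more up-to-date ones.
        if h not in local or ts > local[h]:
            wanted.append(h)
            if len(wanted) == limit:  # a Moderation_Request holds at most 100 hashes
                break
    return wanted
```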

Moderation_Reply message:

  • Hash bitstring (23 bytes)
  • Time_Stamp bitstring (4 bytes)
  • Moderator_ID bitstring (124 bytes)
  • Moderator_name string (30 bytes) [e.g. "Dave2000" or "DemocracyNow"]
  • Signature (67 bytes)

Repeated up to 100 records
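Taking the field sizes above at face value, one Moderation_Reply record occupies 23 + 4 + 124 + 30 + 67 = 248 bytes, so a full message of 100 records carries 24,800 bytes of payload. The sketch below assumes that every field is fixed-width and zero-padded, that the name is UTF-8 encoded, and that integers are big-endian; none of these details is stated above.

```python
import struct
from typing import NamedTuple

# Assumed fixed-width layout: hash, time stamp, moderator ID, name, signature.
REPLY_RECORD = struct.Struct(">23sI124s30s67s")


class ReplyRecord(NamedTuple):
    infohash: bytes
    timestamp: int
    moderator_id: bytes
    moderator_name: str
    signature: bytes


def pack_reply_record(r: ReplyRecord) -> bytes:
    name = r.moderator_name.encode("utf-8")
    if len(name) > 30:
        # struct would silently truncate, possibly mid-character, so reject instead.
        raise ValueError("moderator name longer than 30 bytes")
    # struct pads shorter bytes fields with NUL bytes up to their declared width.
    return REPLY_RECORD.pack(r.infohash, r.timestamp, r.moderator_id, name, r.signature)


def unpack_reply_record(buf: bytes, offset: int = 0) -> ReplyRecord:
    h, ts, mid, name, sig = REPLY_RECORD.unpack_from(buf, offset)
    # Strip the NUL padding from the name only; other fields are returned as stored.
    return ReplyRecord(h, ts, mid, name.rstrip(b"\0").decode("utf-8"), sig)
```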

The Moderation_Reply message contains the actual moderation metadata requested by the remote peer. The local peer extracts the requested moderations from its localDB and sends them.
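The sender and receiver sides of the reply step might look like this. The sender skips requested hashes it no longer holds, and the receiver applies the rule that a more up-to-date moderation overwrites the old one. `StoredModeration` and the dictionary standing in for the localDB are hypothetical, and signature verification is deliberately left out of this sketch.

```python
from typing import Dict, Iterable, List, NamedTuple


class StoredModeration(NamedTuple):
    """Hypothetical localDB row for one moderation."""
    infohash: bytes
    timestamp: int
    moderator_id: bytes
    moderator_name: str
    signature: bytes


def build_reply(requested: Iterable[bytes], local_db: Dict[bytes, StoredModeration],
                limit: int = 100) -> List[StoredModeration]:
    """Sender side: look up each requested hash, skipping ones no longer held."""
    reply: List[StoredModeration] = []
    for h in requested:
        mod = local_db.get(h)
        if mod is not None:
            reply.append(mod)
            if len(reply) == limit:  # at most 100 records per Moderation_Reply
                break
    return reply


def apply_reply(reply: Iterable[StoredModeration],
                local_db: Dict[bytes, StoredModeration]) -> int:
    """Receiver side: keep a moderation only if it is newer; return how many were stored."""
    updated = 0
    for mod in reply:
        current = local_db.get(mod.infohash)
        if current is None or mod.timestamp > current.timestamp:
            local_db[mod.infohash] = mod
            updated += 1
    return updated
```

The receiver re-checks the time stamps rather than trusting its earlier request, since its local copy may have been updated by another peer in the meantime.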

Moderation table:

For each moderator encountered, the local megacache stores the moderator's PermID (mod-perm-ID) and a vote (0, + or -).
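Assuming the megacache is backed by SQLite, the moderation table could be kept as sketched below with Python's built-in `sqlite3` module. The table and column names are hypothetical, and we assume the votes 0, + and - are stored as the integers 0, +1 and -1.

```python
import sqlite3

# Assumed integer encoding of the three vote values.
VOTE_VALUES = {"+": 1, "0": 0, "-": -1}


def create_moderator_table(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: one row per moderator encountered."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS moderators ("
        " mod_permid BLOB PRIMARY KEY,"
        " vote INTEGER NOT NULL DEFAULT 0 CHECK (vote IN (-1, 0, 1)))"
    )


def record_moderator(conn: sqlite3.Connection, permid: bytes) -> None:
    """Add a newly encountered moderator with a neutral vote; keep any existing vote."""
    conn.execute("INSERT OR IGNORE INTO moderators (mod_permid) VALUES (?)", (permid,))


def set_vote(conn: sqlite3.Connection, permid: bytes, vote: str) -> None:
    """Store the local user's vote ('+', '0' or '-') for a moderator."""
    record_moderator(conn, permid)
    conn.execute("UPDATE moderators SET vote = ? WHERE mod_permid = ?",
                 (VOTE_VALUES[vote], permid))
```

Keeping a row even for neutral (0) votes lets the peer remember every moderator it has seen, which VoteCast would presumably need when users later rate moderators.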