2 Aug status

12 July status

  • FlashLight partially works (no mouse/keyboard input)
  • NoVNC works (low performance, occasional pixel corruption on screen)
  • QEMU/KVM as backend (works)
  • VirtualBox, VMware, Xen (ToDo)

Crowdsourcing for GUI testing

Research angle:

Facilitating weekly automated human software testing for $40 per week


  • Automated
  • Robust
  • Cheap

Human side:

  • Can software testing exploit crowdsourcing?
  • 20 workers test the software for 30 minutes (0.5 h x $4/hour x 20 workers = $40)
  • Worker job completion
  • Worker attention span
  • Do workers read on-screen instructions?
  • Weekly returning MTurkers?
  • How much to pay them?
  • Reputation of weekly returning MTurkers (Martha)
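The $40/week budget above is a simple product of wage, task length, and worker count; a quick sanity-check calculation (variable names are illustrative, not existing code):

```python
# Weekly cost estimate for one crowdsourced test round (figures from the notes above).
HOURLY_WAGE = 4.00   # dollars per worker-hour
TASK_HOURS = 0.5     # 30 minutes per worker
NUM_WORKERS = 20     # workers per weekly round

weekly_cost = HOURLY_WAGE * TASK_HOURS * NUM_WORKERS
print(weekly_cost)  # → 40.0
```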


  • MTurkers use any browser
  • MTurkers probably do not have HTML5 support
  • Mouse lag (there is always latency)
  • VNC, phpvirtualbox, Flash, JavaScript
  • Fraud detection
    • check whether the task was actually completed (downloaded file, etc.)
    • multiple workers, consistency checks
  • Crash capture and logging
  • Multi-platform GUI testing (a lot of engineering)
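The multiple-workers consistency check for fraud detection could be sketched as a majority vote over reported outcomes; a minimal illustration (function and report format are assumptions, not an existing API):

```python
from collections import Counter

def flag_inconsistent(reports):
    """Flag workers whose reported outcome disagrees with the majority.

    reports: dict mapping worker_id -> reported outcome (e.g. the hash of
    the downloaded file, or a success/failure string).
    Returns the set of worker_ids whose report deviates from the majority.
    """
    majority, _ = Counter(reports.values()).most_common(1)[0]
    return {worker for worker, outcome in reports.items() if outcome != majority}

reports = {"w1": "ok", "w2": "ok", "w3": "crash", "w4": "ok"}
print(flag_inconsistent(reports))  # → {'w3'}
```

Deviating workers are not necessarily fraudulent (they may have hit a real bug), so flagged reports would still need the logs/screen capture for review.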


  • Try to get a first MTurk test by 15 June
  • Test out various browser solutions

GOAL: A webpage with the results of 5-10 different tests which are automagically MTurked weekly. Each test is green when all testers reported success, yellow if one MTurker encountered a problem, and red in all other cases. Every test is clickable and shows the complete log files of every MTurker's work, plus a complete screen-capture video of their screen/application activity during the test. Tests:

  • click on a network buzz keyword and start a download
  • keyword search without suggest and start a download
  • keyword search with suggest after typing "2011 " and start a download
  • pause and resume a download
  • subscribe to channels
  • conduct tests with both an empty megacache and a 50k-item megacache

Additional tasks:

  • ToDo: disable the family filter; prevent users from conducting both the A and B test by making it a single HIT on MTurk
  • Find success rate for various formulations
    • "try to find out how to add something to your channel"
    • "Locate the channel button and add something to Your Channel"
    • "3rd formulation"
    • A) try to understand the channel concept in Tribler B) discover where "your channel" is located C) add something to your channel
  • Find success rate for various formulations
    • "try to download a single file from a swarm"
    • "search for 'blue suitcase', go to the files tab, select the file 'vodo.nfo', click the 'download selected only' button"
  • A/B testing. Create two variants of Tribler and test success rate/task completion time.
    • Search with and without the bundling feature
    • A: bundling turned off/disabled
    • B: bundling turned on
    • Training search queries: "blue suitcase" (simple: one result), "TED Bill Gates", "big buck bunny"
    • A/B search tasks: "Ubuntu 11.04", "Pioneer One" episode 2, "Sintel", "the yes men fix the world"
    • Measure task completion time + its evolution over queries (from init until the start of the download!), variance within the test population, 95% significance?
    • Conclude: inconclusive whether this feature is good or not, but we have demonstrated that MTurk can be used for this sort of task
    • NULL hypothesis: reject the hypothesis that it does not work. Benchmark against the classical method.
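One way to test the A/B completion-time difference at the 95% level is Welch's t-test on the two samples; a stdlib-only sketch with made-up completion times (the data and threshold are assumptions):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with unequal variances."""
    va, vb = variance(a), variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical completion times in seconds (A: bundling off, B: bundling on)
a = [95, 110, 102, 120, 98]
b = [80, 85, 92, 78, 88]
t, df = welch_t(a, b)
print(round(t, 2), round(df, 1))  # → 3.93 6.3
```

The resulting t and df would be compared against the t-distribution's 95% critical value; with the small per-query samples here, pooling over queries or more workers per condition may be needed for significance.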

GUI usability testing

HYPOTHESIS: Both experienced and novice users of P2P technology don't read anything in the GUI

Tools: task completion time, replay of the captured user mouse clicks + moves + GUI.

Danger1: task-completion-time noise: workers may be doing other tasks in the background; discard measurements with a non-moving mouse.
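The discard rule for noisy measurements could be sketched as an idle-gap check over the captured mouse events (the event format and 60-second threshold are assumptions):

```python
def is_valid_measurement(mouse_events, task_duration, max_idle=60.0):
    """Discard a completion-time measurement when the mouse sat still too long.

    mouse_events: timestamps (seconds since task start) at which any mouse
    movement or click was observed.
    task_duration: measured task completion time in seconds.
    Returns False when any idle gap exceeds max_idle seconds.
    """
    times = [0.0] + sorted(mouse_events) + [task_duration]
    gaps = (later - earlier for earlier, later in zip(times, times[1:]))
    return max(gaps) <= max_idle

print(is_valid_measurement([5, 20, 41, 70], 90))  # → True
print(is_valid_measurement([5, 20], 200))         # → False (180 s idle gap)
```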

Test0: do they understand the search results page?
Test1: do they click/understand the frontpage tags?
Test2: do they spot the second + third columns for bundling results?
Test3: do they notice with bundling that the first hit represents a sample? (they don't read the "more")