• Home
  • What's Tribler
  • Download
  • FAQ
  • Forum
Tribler

Wiki Navigation


Tribler Database Migration

Reason to Migrate

Tribler uses an embedded database to record the peers, files and user profiles discovered in the P2P file sharing network.

Until Version 4.1.2, we use BerkeleyDB(BSDDB) as the database engine. "Berkeley DB is a family of open source, embeddable databases that allows developers to incorporate within their applications a fast, scalable, transactional database engine with industrial grade reliability and availability." [1]

BSDDB has been proven to be fast and easy for coding in Tribler development, but it also has some vital flaws:

  • BSDDB is not a relational database

As more and more features and sophisticated data structures were injected into Tribler, it requires a relational database to alleviate the difficulty of development. Especially when Tribler supports search, a database with reverse index is a need. However, BSDDB's basic data structure is hash-table like, which is easy to program for simple applications, but much difficult for complex data structure.

  • BSDDB is not robust enough

Although BSDDB alleges it supports transaction and is thread-safe, we found it is difficult to ensure its robustness, even we put lots of effort to write codes as robust as possible. The broken database files were reported by developers and users again and again. When the database file is broken somehow, end users cannot fix it but have to delete it. The similar problem was reported by Subversion, a source code control system, when it also used BSDDB. Therefore we can conclude BSDDB is not robust enough for Tribler.

These flaws bring lots of problems and difficulties for the development. So it is highly required to migrate the database to a relational and robust embedded database soon.

Database Alternatives

There are lots of alternatives in the market. [4] gives a wide comprision among a number of databases. Considering the performance, robustness, functionality, technique support and reputation, we only choose SQLite as the candidate for the migration. "SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine."[2] It has been widely used by PHP community, and also been accepted by Python2.5 as a standard package. According to the users' remark, SQLite is efficient, easy to use and configure, robust, and rich in functionalities. Although it is not as powerful as commercial database like Oracle, it can fulfill the requirement of Tribler.

There are some remarks for the comparison of SQLite versus BSDDB[5]:

"Berkeley DB (BDB) is just the data storage layer - it does not support SQL or schemas. In spite of this, BDB is twice the size of SQLite. A comparison between BDB and SQLite is similar to a comparison between assembly language and a dynamic language like Python or Tcl. BDB is probably much faster if you code it carefully. But it is much more difficult to use and considerably less flexible.

On the other hand BDB has very fine grained locking (although it's not very well documented), while SQLite currently has only database-level locking. -- fine grain locking is important for enterprise database engines, but much less so for embedded databases. In SQLite, a writer gets a lock, does an update, and releases the lock all in a few milliseconds. Other readers have to wait a few milliseconds to access the database, but is that really ever a serious problem?"

Performance SQLite vs. BerkeleyDB

Performance is also an important issue for us when selecting a database engine. There are some independent performance tests on both databases, but we didn't find any comparison between them in literature. Therefore we did some experiments to compared their performance under the scenario of adopting them in Tribler usage.

We use a log file of superpeer1 on the date of Sep. 1, 2007, and load all the data contained in the file to the databases. Each line in the file indicates an action for Tribler, which includes sending or receiving a BuddyCast message, and adding or deleting a overlay connection. For each action, Tribler needs to insert or update some data in the database. To test the performance of reading, we first load the all the peers, and then randomly select 10% of the peers to look up their preferences.

Note we do not test the speed of writing or reading simple and similar data items in the database a number of times like most testers did, but consider the real database usage in Tribler, which may cause many sub-actions of inserting, updating, looking-up and reading in database for one action in network. By this means we can also assess the cost of operating database for the normal use. The performance of deleting is not tested yet for simplicity. The test results are following:

Performance Comparision

Action statistics

#actions #CONN_ADD #CONN_DEL #RECV_MSG #SEND_MSG #Peers #Torrents #Peers with profile #Pairs (peer,torrent)
1000 272 281 209 209 1267 3967 465 5164
5000 1355 1377 1000 1000 3691 14578 1803 23921
10000 2720 2791 2002 2002 5493 23511 2938 47697
39364 9807 10036 7182 7182 10928 41279 5571 147944

Database File Size (evaluated under Windows XP, NTFS file system and SATA hard disk):

#actions BSDDB SQLite
1000 4.8MB 1.1MB
5000 16.9MB 3.7MB
10000 26.9MB 5.9MB
39364 50.8MB 11.4MB

* Test platform: Windows XP, CPU: P4 3.6G dual core, Memory: 1GB, Date: 13 Sep, 2007.

Elapsed Time (seconds) of database operation for certain number of actions

Commit after all actions were processed

#actions BSDDB SQLite
1000 3.9 1.1
5000 17.5 5.7
10000 34.3 11.5
39364 157.4 43.2

Commit after each action which needs to write the database was processed

#actions BSDDB SQLite
1000 88.4 81.6
5000 443.5 390.9
10000 846.5 755.0
39364 2993.3 2525.5

* Test platform: Debian Linux, Server Superpeer9, Date: 14 Sep, 2007.

Elapsed Time (seconds) of database operation for certain number of actions

Commit after all actions were processed

#actions BSDDB SQLite
1000 2.8 0.8
5000 18.7 4.1
10000 41.9 10.7
39364 125.8 36.5

Commit after each action which needs to write the database was processed

#actions BSDDB SQLite
1000 12.7 4.0
5000 107.6 49.5
10000 229.6 108.4
39364 742.5 353.1

From the above results we can see that the performance of SQLite is even better then BSDDB, and the database file size of SQLite is about 25% of BSDDB. It is controversial since it is believed BSDDB's most important feature is extremely fast due to its simplicity. However, the simplicity and flexibility of BSDDB causes lots overload and also efforts for developers in maintaining the data structure and integrity. SQLite is also designed to be efficient, and it is also simple compared to many DBMS products. So we think the results are fair in the real usage of Tribler.

Robustness of SQLite

BSDDB is possible to be broken when is closed forcedly. To test the robustness of a database, we should repeat the same scenario. The test plan is to fork a child process to execute the database operations, and then it is killed by the parent process after a while. We can check if the database is broken after repeat the above actions over and over again. However, according to the FAQ, SQLite may have unknown problems when running under a forked child processes under UNIX [6]. In addition, it is intricate to create a child process under Windows. Therefore, we ignore the automatically robustness test for SQLite for the time being. We did some manual robust test, and didn't find any problematic case.

From the literature we see SQLite support transaction with ACID, and is thread safe. Since is widely used in the web environment, it should work well in multi-thread applications. Additionally, we did not find some reports about "broken SQLite database" in the internet (but we found a lot for BSDDB).

After all, we can believe SQLite is robust enough with confidence, though more tests are needed to justify it.

Conclusion

SQLite outperforms BSDDB in performance, file size, data structure, functionality and robustness. Thus it is worthwhile to replace Tribler's database engine by SQLite.

It is estimated that we need one human-month work to finish the database migration assignment.

Reference

[1]. Oracle Berkeley DB

[2]. SQLite

[3]. PySqlite The SQLite wrapper to Python

[4]. Comparison of relational database management systems

[5]. Sqlite Competitors

[6]. SQLite FAQ

Appendix

Source Codes

SVN of the source codes

The superpeer log file

SQLite Database Schema (given by SQL)

CREATE TABLE PeerDB (peer_id INTEGER PRIMARY KEY ASC, buddycast_times NUMERIC, connected_times NUMERIC, friend NUMERIC, ip TEXT, last_buddycast_time NUMERIC, last_connected NUMERIC, last_seen NUMERIC, name TEXT, npeers NUMERIC, nprefs NUMERIC, ntorrents NUMERIC, oversion NUMERIC, permid TEXT, port NUMERIC, similarity REAL, superpeer NUMERIC);

CREATE TABLE PreferenceDB (peer_id NUMERIC, torrent_id NUMERIC);

CREATE TABLE TorrentDB (torrent_id INTEGER PRIMARY KEY ASC, infohash TEXT, destdir NUMERIC, progress NUMERIC, inserttime NUMERIC, source TEXT, status TEXT, retry_number NUMERIC, last_check_time NUMERIC, ignore_number NUMERIC, category NUMERIC, seeder NUMERIC, leecher NUMERIC, info TEXT, torrent_dir NUMERIC, torrent_name TEXT, relevance REAL);

CREATE UNIQUE INDEX peer_id_idx ON PeerDB(peer_id ASC);

CREATE UNIQUE INDEX permid_idx ON PeerDB(permid);

CREATE UNIQUE INDEX torrent_id_idx ON TorrentDB(torrent_id ASC);

CREATE UNIQUE INDEX infohash_idx ON TorrentDB(infohash);

CREATE UNIQUE INDEX pair_idx ON PreferenceDB(peer_id, torrent_id);

CREATE INDEX peer_ip_idx ON PeerDB(ip);

CREATE INDEX peer_pref_idx ON PreferenceDB(peer_id);

CREATE INDEX torrent_pref_idx ON PreferenceDB(torrent_id);

The schema of BSDDB can be found in the source codes of Tribler.

Latest news

28 October 2008

The long-awaited Tribler 4.5 is here''

29 April 2008

Tribler Team organized a successful course on Advanced P2P Technology
Read More

17 February 2008

19 Million Euro for P2P research
Read More

Please contact us if you have found a news item that's not listed here.


  • News
  • Developers
  • Business
  • Research
  • Jobs
  • About Us
  • Press
  • Contact
Login