This paper addresses the classical problem of keeping huge bitmaps predominantly consisting of long ranges of zeros and ones. The problem is most often encountered in filesystems (free space tracking) and network protocols (transmission progress tracking).

Three classical solutions to the problem are plain bitmaps (NTFS), extent lists (TCP SACK) and extent binary trees (XFS, Btrfs). Bitmaps are simple but have high fixed space requirements. Lists are able to aggregate solid ranges, but they don't scale well with regard to search. Extent binary trees are able of aggregation, allow scalable search, but have high overhead and extremely bad worst case behavior, potentially exploding to sizes a couple orders of magnitude higher than plain bitmaps. The latter problem is sometimes resolved by ad-hoc means, e.g. by converting parts of an extent tree to bitmaps (Btrfs). Another possible workaround is to impose a divide-and-conquer multilayered unit system (BitTorrent).

We introduce a new data structure named "binmap", a hybrid of bitmap and binary tree, which resolves the shortcomings of the extent binary tree approach. Namely (a) it has lower average-case overhead (b) it is tolerant to patchy bitmaps and (c) its worst-case behavior is dramatically better.

see the attached article