Performance of the bit package

Dr. Jens Oehlschlägel

2025-03-04


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 26.9 30.4

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario
% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario
% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario
% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario
% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.0 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 11.9 6.7 10.9 NA
bit 21.1 21.1 20.8 NA
bitwhich 201.8 165.5 7.7 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 54.0 53.8 54.6 NA
bit 79.9 20.7 14.4 NA
bitwhich 191.5 65.3 61.7 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 24.3 44.8 1.7 NA
bit 28.8 60.6 1.1 NA
bitwhich 42.4 84.7 2.3 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 26.7 52.6 1.6 NA
bit 15.7 33.1 0.8 NA
bitwhich 106.4 27.6 3.2 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 22.1 21.0 21.4 22.2
bit 1.2 1.1 1.2 1.1
bitwhich 15.4 1.1 1.1 2.9
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 100.0 37.1 30.3 28.7
bit 6.3 5.9 6.0 6.0
bitwhich 38.1 7.3 6.2 7.9
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 97.9 30.6 36.3 34.6
bit 6.0 6.0 5.9 6.0
bitwhich 23.3 6.2 7.0 9.9
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 32.8 32.5 32.6 32.6
bit 6.0 6.0 6.0 6.2
bitwhich 16.0 6.0 6.4 6.7
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 32.1 32.0 32.1 32.1
bit 6.1 5.9 5.9 5.9
bitwhich 16.0 5.9 5.9 6.8
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 94.9 35.5
bit 14.0 14.1

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)
Execution time for R (R) and bit (b)
Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario
Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario
Execution time for R, bit and merge in ‘sorted bigbig’ scenario
Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 136.0 266.7
sortunique 100.2 20.5
unsorted data relative to R’s sort
small big
sort 30.6 40.5
sortunique 19.6 10.9

% time for unique

sorted data relative to R
small big
bit 163.9 12.1
merge 30.8 4.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 152.8 13.6
merge 184.6 40.3
sort 156.8 37.0

% time for duplicated

sorted data relative to R
small big
bit 286.7 11.8
merge 24.3 3.6
sort 0.0 0.0
unsorted data relative to R
small big
bit 294.5 9.4
merge 215.0 42.7
sort 189.7 39.9

% time for anyDuplicated

sorted data relative to R
small big
bit 189.6 13.6
merge 19.9 2.4
sort 0.0 0.0
unsorted data relative to R
small big
bit 149.2 10.3
merge 209.4 47.0
sort 190.2 45.2

% time for sumDuplicated

sorted data relative to R
small big
bit 116.7 11.2
merge 14.4 2.9
sort 0.0 0.0
unsorted data relative to R
small big
bit 101.5 9.1
merge 145.2 41.5
sort 131.9 39.3

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 31.8 0 11.4 3.8
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 325.1 43.9 74.4 43.5
sort 293.4 43.8 66.9 40.9

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 187.7 2.6 6.5 10.3
merge 25.1 0.0 9.4 3.5
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 216.8 2.1 4.5 6.6
merge 285.0 43.3 64.8 41.7
sort 259.3 43.3 58.4 39.5

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 285.8 2.6 5.8 9.3
merge 20.0 0.0 8.3 3.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 297.6 3.0 4.5 6.4
merge 243.2 43.8 58.7 39.9
sort 222.2 43.8 52.8 37.7

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 37.4 13.0 14.2 10.5
merge 25.2 7.9 8.7 7.8
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 42.4 10.6 10.7 7.9
merge 84.0 25.8 26.1 26.5
sort 56.4 19.6 19.8 20.8

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 35.8 2.8 3.4 5.8
merge 10.5 0.0 0.0 2.7
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 29.8 2.2 2.6 4.7
merge 59.4 19.9 23.2 19.8
sort 50.1 19.9 23.2 18.0

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 32.8 1.7 15.7 4.9
merge 8.7 0.0 9.4 3.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 23.8 0.9 12.4 3.6
merge 61.8 19.9 29.2 23.5
sort 54.3 19.9 22.0 21.2

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 21.1 7.2 6.8 4.9
merge 3.6 3.5 3.3 1.6
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 18.5 5.6 5.4 3.4
merge 25.6 11.2 10.7 11.2
sort 22.0 8.5 8.1 10.1

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 61.2 61 6.6 7
merge 16.6 14 4.0 4
sort 0.0 0 0.0 0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 65.2 62.0 5.0 5.1
merge 98.9 29274.9 15.9 28.9
sort 82.5 29260.6 12.6 25.7

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 63.8 2.4 6.7 6.7
merge 16.1 0.0 0.1 2.8
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 62.9 1.7 4.3 4.5
merge 98.3 26.2 65.7 23.9
sort 80.9 26.2 65.6 21.8