incise.org: Hash Table Benchmarks

I've put together a set of benchmarks of what I consider to be the most prominent C and C++ hash table implementations. I've made the code available at Github. If you have any critiques or corrections, please post a comment below, email me, or fork the code and send me a pull request on Github.

While you can tweak some settings like growth factor, and supply different hash functions, I was more interested in how these hash tables perform out of the box. Before I overload you with all of the charts, here is a quick summary of the results:

Most Noteworthy

Google dense_hash_map: #1 in performance with only a small memory premium over its peers. Its default memory allocation pattern might be dangerously lumpy, though, as it mysteriously ran out of memory before expected.

Google sparse_hash_map: 2-3 times slower than its peers for most things (except deleting integer keys), but uses roughly half the memory of the next most memory-conservative (Boost). Considering that "slower" in this case still basically means "very fast", this is my personal favorite.

The Joe Six-Packs of Hash Tables

GCC unordered_map: Winner of the most boring award. This is probably a good thing, coming from the compiler-provided option. Unless you have very specific needs, you'll be just fine with GCC's unordered_map. It won't blow your mind, but it won't screw you over.

Glib GHashTable: With average memory usage and pretty good performance, it's a fine choice, particularly for plain C apps. Glib is also full of other useful code that makes C much more pleasant to deal with.

Qt QHash: Good performance but pretty heavy on the memory side. Fine if you're writing a Qt app, but otherwise not worth seeking out.

The Fail Club

Boost unordered_map: Reasonable performance and pretty low memory usage up to about 10-20 million entries. After that it has unimpressive performance with string keys and slightly alarming performance with integer keys. A poor choice if you want to put tens of millions of things in it, but below that size it's quite fine.

Just for Fun

Python's dict really surprised me with its performance. I initially included it on a lark, but it gives the other implementations a run for their money speed-wise, and doesn't use as much memory as I feared it would. It does still use a lot of RAM, though. So while you can feel pretty good about its performance within the context of writing Python code, it certainly doesn't make sense to bring the Python baggage into your C or C++ program just to use the Python dictionary.

Ruby's Hash fared better than Python in memory usage, but its performance is significantly worse.

Something I notice about Ruby, Boost, and Google sparse_hash_map is that they all rank lowest in performance while also having very smooth memory scaling. It appears that their growth factors are smaller than the others, and perhaps the reason they're slower is that they spend much of their time re-allocating and copying things around.

The Charts and What They Mean

The sequential inserts charts measure how long it takes to insert a contiguous series of integer keys. Random Inserts insert a series of random integer keys, with some duplicates. The pseudo-random seed is the same every time, ensuring fairness. The deletion benchmark measures how long it takes to delete a contiguous series of integer keys (which are all ensured to exist).

Half of the tests used integer keys and the other half used strings which were stringified representations of the same sequence of integers. A lot of the lines in the charts disappear at a certain point, particularly with string keys. That's where they ran out of memory.

Sequential Inserts: Execution Time (integers)

number of entries in hash table

Sequential Inserts: Execution Time (strings)

number of entries in hash table

Random Inserts: Execution Time (integers)

number of entries in hash table

Random Inserts: Execution Time (strings)

number of entries in hash table

Deletion: Execution Time (integers)

number of entries in hash table

Deletion: Execution Time (strings)

number of entries in hash table

Memory Usage (integers)

number of entries in hash table

Memory Usage (strings)

number of entries in hash table

comments


Nick Welch <nick@incise.org> · github · twitter