I've put together a set of benchmarks of what I consider to be the most prominent C and C++ hash table implementations. I've made the code available on GitHub. If you have any critiques or corrections, please post a comment below, email me, or fork the code and send me a pull request on GitHub.
While you can tweak some settings like growth factor, and supply different hash functions, I was more interested in how these hash tables perform out of the box. Before I overload you with all of the charts, here is a quick summary of the results:
Google dense_hash_map: #1 in performance, with only a small memory premium over its peers. Its default memory allocation pattern might be dangerously lumpy, though, as it mysteriously ran out of memory before its peers did.
Google sparse_hash_map: 2-3 times slower than its peers for most things (except deleting integer keys), but uses roughly half the memory of the next most memory-conservative option (Boost). Considering that "slower" in this case still basically means "very fast", this is my personal favorite.
GCC unordered_map: Winner of the most boring award. This is probably a good thing, coming from the compiler-provided option. Unless you have very specific needs, you'll be just fine with GCC's unordered_map. It won't blow your mind, but it won't screw you over.
GHashTable: With average memory usage and pretty good performance, it's a fine choice, particularly for plain C apps. GLib is also full of other useful code that makes C much more pleasant to deal with.
QHash: Good performance but pretty heavy on the memory side. Fine if you're writing a Qt app, but otherwise not worth it.
Boost unordered_map: Reasonable performance and pretty low memory usage up to about 10-20 million entries. After that, it has unimpressive performance with string keys and slightly alarming performance with integer keys. A poor choice if you want to put tens of millions of things in it, but below that size it's quite fine.
Python dict: It really surprised me with its performance. I initially included it on a lark, but it gives the other implementations a run for their money speed-wise, and doesn't use as much memory as I feared it would. It does still use a lot of RAM, though. So while you can feel pretty good about its performance within the context of writing Python code, it certainly doesn't make sense to bring the Python baggage into your C or C++ program just to use the Python dictionary.
Ruby Hash: Fared better than Python's dict in memory usage, but its performance is significantly worse.
Something I notice about Ruby's Hash, Boost's unordered_map, and Google's sparse_hash_map is that they all rank lowest in performance while also having very smooth memory scaling. It appears that their growth factors are smaller than the others, and perhaps the reason they're slower is that they spend much of their time re-allocating and copying things around.
The sequential inserts charts measure how long it takes to insert a contiguous series of integer keys. Random Inserts insert a series of random integer keys, with some duplicates. The pseudo-random seed is the same every run, so every implementation sees the identical sequence of keys. The deletion benchmark measures how long it takes to delete a contiguous series of integer keys (which are all ensured to exist).
Half of the tests used integer keys and the other half used strings which were stringified representations of the same sequence of integers. A lot of the lines in the charts disappear at a certain point, particularly with string keys. That's where they ran out of memory.