
I have learned how a hashmap is implemented when separate chaining is used: as an array of linked lists. I know that when a key-value pair (key, val) is inserted, the hashmap inserts the pair into the linked list at index hash(key) % array.size in the underlying array.


So the entire insertion process consists of (see the sketch after this list):

  • computing the hash function,
  • taking the hash modulo the array size,
  • accessing the array at that index, and
  • allocating a linked list node and inserting it into the linked list.
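
To make these steps concrete, here is a minimal sketch of that insertion path in C++. The names (Node, ChainedMap) are made up for illustration, and the key/value types are fixed to string/int to keep it short:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One entry in a bucket's singly linked list.
struct Node {
    std::string key;
    int val;
    Node* next;
};

// Illustrative separate-chaining table; cleanup is omitted for brevity.
class ChainedMap {
    std::vector<Node*> buckets_;
public:
    explicit ChainedMap(std::size_t n) : buckets_(n, nullptr) {}

    void insert(const std::string& key, int val) {
        // hash, then modulo the array size, then access the array
        std::size_t i = std::hash<std::string>{}(key) % buckets_.size();
        // allocate a node and prepend it to the bucket's list: O(1)
        buckets_[i] = new Node{key, val, buckets_[i]};
    }
};
```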

However, isn't this insertion process O(1), assuming the hash itself is O(1)? Everything above is O(1) except possibly the linked list insertion, and for that we could always insert the pair at the beginning of the list, which is O(1). Isn't insertion therefore always O(1)? And if so, why don't hash tables use this strategy?

  • And while something like a std::unordered_multimap in C++ never has to do this check, there is still the risk of having to re-hash on insertion once the capacity of the hash map is exceeded. If you never re-hashed, inserts could be strictly O(1), but only at the cost of losing average-case O(1) lookups in return.

2 Answers


Typically you don't want duplicate keys in any sort of map, i.e., when you try to insert a pair (key, value) you need to check whether that key already exists in the map. For this check you have to

  1. Find the respective bucket via hash(key)
  2. Check if key already exists in that bucket

And for checking whether the key already exists, you have to iterate over the bucket, which can hold n elements. Thus, with buckets implemented as linked lists, the worst case for insert is still O(n).
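
A sketch of what that check costs, reusing the illustrative node layout from the question: the prepend itself is still O(1), but it may only happen after a scan of the entire bucket:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Node { std::string key; int val; Node* next; };

// Returns false if the key is already present. In the worst case every
// element hashes into this bucket, so the scan alone is O(n).
bool insert_unique(std::vector<Node*>& buckets, const std::string& key, int val) {
    std::size_t i = std::hash<std::string>{}(key) % buckets.size();
    for (Node* p = buckets[i]; p != nullptr; p = p->next)
        if (p->key == key)
            return false;                        // duplicate found during the scan
    buckets[i] = new Node{key, val, buckets[i]}; // prepend only after the scan
    return true;
}
```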

There are other implementations that organize the buckets as balanced trees instead. With that approach you can lower the worst case for insert to O(log n).
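
Java's HashMap, for example, converts long chains into red-black trees. A minimal sketch of the same idea in C++, assuming std::map (typically a red-black tree) as the bucket type; the class name is illustrative:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Each bucket is a balanced tree, so the duplicate check and the insert
// within a bucket cost O(log bucket_size) instead of O(bucket_size).
class TreeBucketMap {
    std::vector<std::map<std::string, int>> buckets_;
public:
    explicit TreeBucketMap(std::size_t n) : buckets_(n) {}

    bool insert(const std::string& key, int val) {
        std::size_t i = std::hash<std::string>{}(key) % buckets_.size();
        return buckets_[i].insert({key, val}).second; // false if key existed
    }
};
```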




For a map where the key is required to be unique (e.g. std::unordered_map in C++), traversal of the list is still O(1) in the average case, but the key needs to be searched for first in the bucket, and that lookup is not constant time in the worst case.
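
That lookup is visible in the return value of std::unordered_map::insert, which reports whether the key was already present:

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> m;
    auto [it1, inserted1] = m.insert({"a", 1}); // key is new: inserted
    auto [it2, inserted2] = m.insert({"a", 2}); // key exists: the lookup finds it first
    std::cout << inserted1 << ' ' << inserted2 << ' ' << it2->second << '\n'; // 1 0 1
}
```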


Even for a map where the key is not required to be unique (e.g. std::unordered_multimap in C++), while the insertion could be done strictly in O(1) by always prepending to the bucket, there is often a requirement that clashes with that: the support for "equal ranges", which still requires grouping equivalent keys together within the bucket.

So the need for an efficient lookup by key makes the fastest possible insertion method sub-par for multi-maps too.
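
A short demonstration of that requirement: std::unordered_multimap::equal_range hands back one contiguous range per key, which forces equivalent keys to be stored adjacent to each other, so even a multi-map insert has to find the right spot in the bucket:

```cpp
#include <iostream>
#include <string>
#include <unordered_map> // std::unordered_multimap lives in this header

int main() {
    std::unordered_multimap<std::string, int> m;
    m.insert({"a", 1});
    m.insert({"b", 2});
    m.insert({"a", 3}); // must end up adjacent to the other "a" entry

    auto [first, last] = m.equal_range("a");
    for (auto it = first; it != last; ++it)
        std::cout << it->second << ' '; // prints both values mapped to "a"
}
```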


Realistically, the only container that can skip all these checks is one where the key is guaranteed to be unique by design, so that no check has to be performed at runtime. Only then can you get O(1) inserts without any pitfalls.


The remaining issue is that you expect not only average O(1) inserts, but also average O(1) reads. For that, many hashmaps support dynamic resizing of the hash table, which involves re-hashing all already inserted elements in order to get bucket sizes back down to a reasonable level.

That is what gives you the worst-case O(size()) complexity on a single insert that most implementations have.

Only hash-maps with a static table size can avoid this.
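
This resizing can be observed directly: bucket_count() jumps whenever the load factor limit would be exceeded, and every jump implies re-hashing all stored elements. A small demonstration, assuming a default-configured std::unordered_map:

```cpp
#include <cstddef>
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    std::size_t buckets = m.bucket_count();
    for (int i = 0; i < 100000; ++i) {
        m.insert({i, i});
        if (m.bucket_count() != buckets) { // a rehash just moved every element
            std::cout << "rehash at size " << m.size() << ": "
                      << buckets << " -> " << m.bucket_count() << '\n';
            buckets = m.bucket_count();
        }
    }
}
```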

