Everything You need to know about Hash Tables

vishal rana
5 min readFeb 28, 2023

--

Hash tables are a fundamental data structure used in computer science to store and retrieve data efficiently. They are also known as hash maps, dictionaries, or associative arrays. In this blog, we’ll cover what are hash tables and how they work. So let’s get started.

Hash Tables

A hash table is a data structure that maps keys to values using a hash function. Hash tables offer O(1) average case time complexity for insertion, deletion, and retrieval of data, making them an essential tool in many algorithms and applications.

Hash tables make use of a HASH FUNCTION that maps the values with a key.

HASH FUNCTION

The main objective of the hash function is to create a hash of a given value. Let’s say we have a key i.e — Alice, when we pass it to a hash function and it will return us the hash of the alice , let say it is 5. It means we can map value for Alice at index 5 in our hash table.

Two keys can have a same Hash value, Let say Alice and bob has same hash value of 5, so we need to map both of them at the same index.

if hash(x) == hash(y), it means that both are idential and x is equal to y.

if hash(x) != hash(y), it means that both are idential and x is equal to y.


// hash() is our hash function
Mapping keys to their location in hash tables using hash function

Let say you are created a new blog on medium. After some days you want to make some changes due to some errors or something else, When you save your changes, the medium algorithm cam compare the hash of these two blogs (one that was published and the one that you just edited). If the hash of both the versions are same it means no changes are being done, if not then we can published the new changes to the blog. It is easy to do so as time complexity of creating and comparing a hash is Constant.

It is very hard to compare each word line by line to detect changes. and usually not the most reliable way to do it.

Hash functions must be deterministic, means for a given input they should create a same output.

As i said before, one or more keys can have same hash this is called hash collision. What is the solution for this?

There are two most popular approach for Collision resolution:

  • Separate Chaining
  • Open Addressing

Separate Chaining

Separate chaining is the concept of maintaining a data structure (linked list) to hold all the key value pairs for a particular hash value.

creating a linked list to store all the possible key values for a particular index

In above diagram, we have same hash value for alice and bob, so we have created a separate chain for storing all the key values for hash value 0.

Open Addressing

Open addressing is another technique to resolve collision issues. A hash table consists of an array of buckets, each with a unique index. The index of a bucket is determined by a hash function that takes the key as input and returns an integer value.

when a collision occurs, the new element is inserted into the next available empty bucket in the table. There are several techniques for finding the next empty bucket, such as:

  • Linear probing
  • Quadratic probing
  • Double hashing

Linear probing: In linear probing, if the original bucket index is occupied, the algorithm checks the next bucket in the table. If that bucket is also occupied, it checks the next one, and so on, until an empty bucket is found.

Quadratic Probing: In quadratic probing, if the original bucket index is occupied, the algorithm checks the next bucket with an offset of 1², then 2², then 3², and so on, until an empty bucket is found.

Double Hashing: It uses a second hash function in addition to the primary hash function. The second hash function is used to determine the number of steps to take when searching for an empty bucket to place the item in the hash table.

The general idea is that if the primary hash function maps two keys to the same bucket, the second hash function will produce a different offset value for each key, and the resulting buckets will be different. This reduces the likelihood of collisions and helps distribute the items more evenly throughout the hash table.

Open addressing has the advantage of using less memory than chaining, but it can become slower when the table is more than half full. The choice between the two methods depends on the specific application and the expected number of collisions.

Load Factor

Load factor is a measure of how full a hash table is. It is defined as the ratio of the number of elements in the table to the total number of buckets.

For example, if a hash table has 100 buckets and contains 50 elements, the load factor is 0.5.

Load factor is an important factor to consider when designing a hash table, as it affects the performance of the table. When the load factor is low, there are many empty buckets in the table, which means that the table is wasting memory.

When the load factor is high, there are many elements in each bucket, which can slow down the performance of the table.

A good rule of thumb is to keep the load factor below 0.7. When the load factor exceeds this value, the table should be resized to increase the number of buckets.

Resizing a hash table involves creating a new, larger array and rehashing all the elements in the old table into the new table. This can be an expensive operation, so it is important to choose the initial size of the table carefully to minimize the number of resizes that will be necessary.

Load factor = No. of entries / Total size of array

// load factor =0 (Hash table is empty)
// load factor = 0.5 (Hash table is half full)
// load factor = 1(Hash table is full)

When the load factor is low, collisions are less likely to occur, and the performance of the hash table is better

Conclusion

Hash tables are a versatile data structure with many practical applications in computer science. They offer fast insertion, deletion, and retrieval of data, making them a popular choice for implementing caches, symbol tables, and many more data structures. Understanding the basics of hash tables is essential for any computer scientist or software developer.

Happy Learning 🚀

--

--