In this article, we will discuss about different hashing techniques and collision conditions. We will list out different collision resolution techniques and discuss about closed addressing which is used in Java to resolve collisions.
Why Hashing Technique is Required
When it comes to store and fetch data we use some sort of data structure based on our requirement and the complexities involved in it. For example, we might prefer a plain list rather than a linked list if we are mostly interested in the retrieval of data. Similarly, we might be interested in searching for a particular item or data from very large datasets. We have 2 techniques when it comes to searching an item from a data structure - linear and binary search.
Now when it comes to the complexity of linear search it is O(n) whereas for binary search it is O(logn) where n is the number of elements present in the dataset. Now, to improve this time-complexity of searching an element from a large dataset, we require a technique that is independent of n and hashing technique could be the best fit for this but again we need to ensure we implement a robust hashing function to generate hashes in order to minimize a collision.
What is Hashing
Hashing is a technique used for storing and retrieving information as fast as possible. We can achieve the insertion and deletion operations in O(1) with hashing. Even the worst-case complexity of hashing is still O(n), but it gives O(1) on the average.
In Java, a HashMap uses this technique and complete implementation of a custom HashMap can be found in one of my articles here.
Components of Hashing
There are 4 components of Hashing.
- Hash Table
- Hash Functions
- Collision Resolution Techniques
Hash Table Hash table or hash map is a data structure that stores the keys and their associated values. It is a generalization of array.
In Java, when we use the default constructor as new HashMap<>() to create a Map, then by default a Hash Table of size 16 is created in the memory.
Hash Function: The hash function is used to transform the key into the index. Ideally, the hash function should map each possible key to a unique slot index which is difficult to achieve in a real-time scenario.
Hash Function Design Criteria
The generated index value should ideally be distributed uniformly across the table.
The hash function computation should be quick and it should return values within the range of locations in the hash table.
The hash function should minimize collisions.
Collisions: Collision is a scenario in which two unequal keys or hashcode results in the same index value and this is a very frequent scenario in a large collection of objects. This is the reason why it is required to define an efficient hash function to minimize collisions. Too many of collisions have an impact on the performance.
As collisions are not completely avoidable hence there are different collision resolution techniques.
Collision Resolution Techniques
Despite collision issues, hashing techniques are more efficient in many cases comparative to all other data structures like search trees.
There are broadly 2 collision resolution techniques and they are Closed Addressing(Direct Chaining) and Open Addressing.
- Closed Addressing
- Seperate Chaning
- Open Addressing
- Linear probing(linear search)
- Quadratic probing(non-linear search)
- Double hashing(use two hash functions)
Java uses closed addressing techniques for collision resolution. Hence, let us discuss about this in this article.
We have discussed about this with an example in my last article of hashmap internal implementation here. In this article, we discussed about hashing technique and collision conditions. We also listed out different collision resolution techniques and discussed about closed addressing which is used in java to resolve collisions.
We have discussed about this with an example in my last article of hashmap internal implementation here.
In this article, we discussed about hashing technique and collision conditions. We also listed out different collision resolution techniques and discussed about closed addressing which is used in java to resolve collisions.
If You Appreciate This, You Can Consider:
About The Author
A technology savvy professional with an exceptional capacity to analyze, solve problems and multi-task. Technical expertise in highly scalable distributed systems, self-healing systems, and service-oriented architecture. Technical Skills: Java/J2EE, Spring, Hibernate, Reactive Programming, Microservices, Hystrix, Rest APIs, Java 8, Kafka, Kibana, Elasticsearch, etc.
Further Reading on Data Structure
- Linkedlist Implementation Java
- Queue Implementation Java
- Stack Implementation Java
- Pros And Cons Collection Java
- Hashmap Custom Implementation Java
- Different Sorting Algorithms
- Doubly Linked List Implementation
- Dijkstra Algorithm Java
- Binary Tree Level Order Traversal
- Hashmap Hashtable And Concurrenthashmap