HASHING & HASH TABLE: March 2020

Hashing

Hashing is a technique that is used to assign a key to a specific value so that the value can be accessed easily and efficiently.

Some examples of how hashing is used in our lives include:
•In libraries, a book is assigned to a unique number/key that can be used to find information about the book we are looking for, through the library computer system, such as its exact position in the library or the users it has been issued to, etc.

In the example above, the books with all the information it has are the value. The number is the key which makes it easier to identify the books.

Assume that you have a value and you want to assign a key to it to make searching easy. To store the values, you can use a simple array like a data structure where keys (integers) can be used directly as an index to store values. However, in most cases, the keys are actually String.

So How Do You Map a String(key) Into Indexes In an Array?

HASH FUNCTION AND HASH TABLE

This is where hash function comes into play, a hash functions take a string(key) then convert it into some sort of integer(smaller key) which is called a hashcode, the hashcode is then remapped into a smaller number which will be the index of the array, this is done so that the array size won't get too big. Now you can access the element in O(1) time because of the index assigned to the key. The name of the array on which the value is store is called a hash table.

The hash function uses some sort of mathematical algorithm to convert the string(key) into hash code.

These are a few important methods/algorithm for constructing a hash functions

1.MID-SQUARE

•Square the string/identifier and then using an appropriate number of bits from the middle of

the square to obtain the hash-key.

•If the key is a string, it is converted to a number.

•Steps:
1. Square the value of the key. (k2)
2. Extract the middle r bits of the result obtained in Step 1
Function : h(k) = s
k = key
s = the hash key obtained by selecting r bits from k2

Mid square example:
•Here the entire key participates in generating the address so that there is a better chance
that different addresses are generated even for keys close to each other
•For example,

Key squared value middle part

3121 9740641 406

3122 9746884 468

3123 9753129 531.

2.DIVISION

Steps:

•Divide the string/identifier by using the modulus operator

• It’s the most simple method of hashing an integer x.

Function: h(z) = z mod M

z = key

M = the value used to divide the key, usually using a prime number, the table size or the size of memory used.

DIVISION EXAMPLE:

Suppose the table is to store strings. A very simple hash function would be to add up ASCII values of all the characters in the string and take modulo of table size, say 97.

“COBB” would be stored at the location

( 64+3 + 64+15 + 64+2 + 64+2) % 97 = 88

“HIKE” would be stored at the location

( 64+8 + 64+9 + 64+11 + 64+5) % 97 = 2

“PPQQ” would be stored at the location

( 64+16 + 64+16 + 64+17 + 64+17) % 97 = 35

“ABCD” would be stored at the location

( 64+1 + 64+2 + 64+3 + 64+4) % 97 = 76

3.FOLDING

The Folding method works in two steps :

Divide the key value into a number of parts where each part has the same number of digits except the last part which may have lesser digits than the other parts.
Add the individual parts. That obtains the sum of part1 + part2 + ... + part n. The hash value produced by ignoring the last carry, if any.

Folding Example:

Let's start by converting the string's characters into numbers. ASCII is a good candidate for this operation

Now, we arrange the numbers we just obtained into groups of some size. Generally

we choose the group size value based on the size of our array which is 105. Since the
numbers, in which we transformed the characters into, contain from two to three
digits, without loss of generality, we may set the group size to two:

The next step is to concatenate the numbers in each group as if they were strings
and find their sum

Now we must make the final step. Let's check whether the number 348933 may
serve as an index of our array of size 105. Naturally, it exceeds the maximum
allowed value 99999. We may easily overcome this problem by applying the
modulo operator in order to find the final result: 348933 % 10000 = 48933.

4.Digit Extraction

•A predefined digit of the given number is considered as the hash address.

•Example:

–Consider x = 14,568

–If we extract the 1st, 3rd, and 5th digit, we will get a hash code of 158.

5.Rotating Hash

Steps:

• Use any hash method (such as division or mid-square method)
• After getting the hash code/address from the hash method, do rotation
• Rotation is performed by shifting the digits to get a new hash address.
Example:
-Given hash address = 20021
-Rotation result: 12002 (fold the digits)

COLLISIONS

When we put objects into a hashtable, it is possible that different objects (by
the equals() method) might have the same hashcode. This is called a collision.
Here is the example of collision:
Two different strings ""Aa" and "BB" have the same key:
"Aa" = 'A' * 31 + 'a' = 2112
"BB" = 'B' * 31 + 'B' = 2112

There general and best ways to handle collisions:

Chaining

Chaining is a method that we can use to overcome collision by using a linked list so
that the key/string with the same hash code/index will be linked together using a linked
list. For this, to work the hash table must store not only the value of the object but also
the original key, so that you can find the key searched inside the linked list.

Code implementation:

void chaining(item, h[]) {

hash_key = hash(item);

trail = NULL, lead = h[hash_key];

while ( lead ) {

if ( strcmp(lead->item, item) == 0 ) { // DUPLICATE ENTRY }

trail = lead; lead = lead->next;
}

p = malloc new node;

p->item = item;

p->next = NULL;

if ( trail == 0 ) h[hash_key] = p; else trail->next = p;

}

HASHING IMPLEMENTATION ON BLOCKCHAIN

Hashing in blockchain refers to the process of having an input item of whatever length

reflectingan output item of a fixed length. If we take the example of blockchain use in

cryptocurrencies, transactions of varying lengths are run through a given hashing

algorithm, and all give an output that is of a fixed length. This is regardless of the

length of the input transaction. The output is what we call a hash. A good example is

Bitcoin’s Secure Hashing Algorithm 256 (commonly shortened to SHA-256). Hashing

using SHA-256 always gives an output result of a fixed length, which has a 256-bits

length (the output is 32 bytes). This is always the case whether the transaction is just

a single word or a complex transaction with huge amounts of data. What this means is

that keeping track of a transaction becomes easier when you can recall/trace the

hash. The size of the hash will depend on the hash function utilized, but the out using a

particular hashing algorithm will be of a specific size.

HASHING & HASH TABLE

Wednesday, March 11, 2020

LINKED LIST