CSE 373: Open addressing

Michael Lee Friday, Jan 26, 2018


Warmup


With your neighbor, discuss and review:

- How do we implement get, put, and remove in a hash table using separate chaining?
- What about in a hash table using open addressing with linear probing?
- Compare and contrast your answers: what do we do the same? What do we do differently?


In both implementations, for all three methods, we start by finding the initial index to consider:

    index = key.hashCode() % array.length
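One caveat when turning this formula into real Java: `hashCode()` may return a negative `int`, and `%` keeps the sign of the dividend, so the result can be negative. The sketch below shows one common way around that using `Math.floorMod`; the class `HashIndex` and helper `indexFor` are hypothetical names, not part of the lecture code.

```java
final class HashIndex {
    // Hypothetical helper: maps any key to a slot in [0, capacity).
    static int indexFor(Object key, int capacity) {
        // Math.floorMod returns a non-negative result for a positive divisor,
        // unlike %, which keeps the sign of the dividend.
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 10);          // -7: not a valid array index
        System.out.println(indexFor(-7, 10)); // 3
        System.out.println(indexFor(38, 10)); // 8
    }
}
```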


If we’re using separate chaining, we then search/insert/delete from the bucket:

    IDictionary<K, V> bucket = array[index]
    bucket.get(key) // or .put(...) or .remove(...)

...and resize when λ ≈ 1. (When exactly to resize is a tuneable parameter.)
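The steps above can be sketched as a small class. This is only an illustration under stated assumptions: the buckets are `java.util.LinkedList`s standing in for the course's `IDictionary<K, V>`, the name `ChainedMap` is hypothetical, `get` returns `null` for a missing key, and resizing is left out.

```java
import java.util.LinkedList;

// Sketch only: buckets are linked lists here, standing in for the
// IDictionary<K, V> buckets mentioned above. Resizing is omitted.
final class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    // Every operation starts by picking the bucket for the key.
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // assumption: null signals "absent" instead of throwing
    }

    void put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        bucketFor(key).add(new Entry<>(key, value));
    }

    boolean remove(K key) {
        return bucketFor(key).removeIf(e -> e.key.equals(key));
    }
}
```

Note that all three methods do the same first step (find the bucket) and then delegate to the bucket, which is the point of the warmup.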


If we’re using linear probing, search until we find an array element where the key is equal to ours or until the array element at that index is null (assuming each slot stores a pair with key and hashcode fields):

    while (array[index] != null
            && (array[index].hashcode != key.hashCode()
                || !array[index].key.equals(key))) {
        // keep probing while the slot holds a different key; comparing the
        // hash codes first is a cheap way to skip most mismatches
        index = (index + 1) % this.array.length
    }
    if (array[index] == null)
        // throw exception if implementing get
        // add new key-value pair if implementing put
    else
        // return or set array[index]

How do we delete? (complicated, see section 04 handouts) When do we resize?
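A minimal runnable sketch of `get` and `put` with linear probing follows. The class and field names are hypothetical, and it makes two assumptions not fixed by the slide: it grows the table once the load factor would exceed 1/2 (one common answer to "when do we resize?"), and it leaves out `remove`, since deleting needs extra care.

```java
// Sketch only: open addressing with linear probing. Names are hypothetical;
// remove is omitted because deleting needs extra care.
final class LinearProbingMap<K, V> {
    private K[] keys;
    private V[] values;
    private int size;

    @SuppressWarnings("unchecked")
    LinearProbingMap(int capacity) {
        keys = (K[]) new Object[capacity];
        values = (V[]) new Object[capacity];
    }

    // Walks forward from the initial index until it finds the key or a null slot.
    private int probe(K key) {
        int index = Math.floorMod(key.hashCode(), keys.length);
        while (keys[index] != null && !keys[index].equals(key)) {
            index = (index + 1) % keys.length;
        }
        return index;
    }

    V get(K key) {
        int index = probe(key);
        if (keys[index] == null) {
            throw new java.util.NoSuchElementException("key not found: " + key);
        }
        return values[index];
    }

    void put(K key, V value) {
        // Assumption: grow once the load factor would exceed 1/2, so probe()
        // always reaches a null slot and terminates.
        if (2 * (size + 1) > keys.length) resize();
        int index = probe(key);
        if (keys[index] == null) {
            keys[index] = key;
            size++;
        }
        values[index] = value;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        K[] oldKeys = keys;
        V[] oldValues = values;
        keys = (K[]) new Object[Math.max(2, 2 * oldKeys.length)];
        values = (V[]) new Object[keys.length];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) put(oldKeys[i], oldValues[i]);
        }
    }

    int size() { return size; }
}
```

Keeping the table at most half full is what guarantees the probe loop stops: there is always at least one null slot to run into.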

Open addressing: linear probing

Strategy: Linear probing

If we collide, check each next element until we find an open slot. So, $h'(k, i) = (h(k) + i) \bmod T$, where $T$ is the table size.

    i = 0
    index = hash(key) % array.length
    while (index in use)
        i += 1
        index = (hash(key) + i) % array.length


Assume internal capacity of 10, insert the following keys:

38, 19, 8, 109, 10

0 1 2 3 4 5 6 7 8 9

What’s the problem? Lots of keys close together: a “cluster”. We ended up having to probe many slots!
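To check the exercise by machine, here is a small trace, assuming h(k) = k for these non-negative integer keys. The class name `LinearProbingTrace` is hypothetical.

```java
import java.util.Arrays;

final class LinearProbingTrace {
    // Inserts each key with linear probing and returns the final table
    // (null marks an empty slot). Assumes there are at most `capacity` keys.
    static Integer[] insertAll(int capacity, int... keys) {
        Integer[] table = new Integer[capacity];
        for (int key : keys) {
            int i = 0;
            // i-th probe looks at (h(k) + i) mod T, with h(k) = k here
            while (table[(key + i) % capacity] != null) {
                i++;
            }
            table[(key + i) % capacity] = key;
            System.out.println(key + " placed at index " + (key + i) % capacity
                    + " after " + (i + 1) + " probe(s)");
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(10, 38, 19, 8, 109, 10);
        System.out.println(Arrays.toString(table));
        // [8, 109, 10, null, null, null, null, null, 38, 19]
    }
}
```

The keys 38 and 19 land in one probe each, but 8, 109, and 10 each need three probes because they all run into the same block of occupied slots that wraps around from index 8 to index 2.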


Primary clustering

When using linear probing, we sometimes end up with a long chain of occupied slots. This problem is known as “primary clustering”.

It happens when λ is large, or if we get unlucky. In linear probing, we expect to get clusters of size O(lg(n)).


Questions:

- When is performance good? When is it bad? Runtime is bad when the table is nearly full. Runtime is also bad when we hit a “cluster”.
- What is the maximum load factor? The load factor is at most λ = 1.0!
- When do we resize?


Punchline: clustering can be potentially bad, but in practice, it tends to be ok as long as λ is small.


Question: when do we resize? Usually when λ ≈ 1/2.

Nifty equations:

- Average number of probes for a successful search: $\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$
- Average number of probes for an unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1 - \lambda)^2}\right)$

*These equations aren’t important to know.
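To get a feel for why λ ≈ 1/2 is a reasonable resize point, the sketch below evaluates the classic linear-probing estimates, taken here as (1/2)(1 + 1/(1 − λ)) for a successful search and (1/2)(1 + 1/(1 − λ)^2) for an unsuccessful one. The class and method names are hypothetical.

```java
final class ProbeEstimates {
    // Classic estimates for linear probing at load factor lambda (0 <= lambda < 1).
    static double successful(double lambda) {
        return 0.5 * (1 + 1 / (1 - lambda));
    }

    static double unsuccessful(double lambda) {
        return 0.5 * (1 + 1 / ((1 - lambda) * (1 - lambda)));
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("lambda=%.2f  successful=%.2f  unsuccessful=%.2f%n",
                    lambda, successful(lambda), unsuccessful(lambda));
        }
    }
}
```

At λ = 0.5 the estimates are about 1.5 and 2.5 probes; at λ = 0.9 they rise to about 5.5 and 50.5, which is why the table is resized well before it fills up.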

Open addressing: quadratic probing

Problem: We can still get unlucky, or somebody can feed us a malicious series of inputs that causes a severe slowdown. Can we pick a different collision strategy that minimizes clustering?

Idea: Rather than probing linearly, probe quadratically!

Exercise: assume an internal capacity of 10, and insert the following keys:

89, 18, 49, 58, 79

0 1 2 3 4 5 6 7 8 9
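The exercise can be traced with a short program. This preview does not spell out the probe formula, so the sketch assumes the usual one: the i-th probe looks at (h(k) + i^2) mod T, with h(k) = k for these integer keys. The class name is hypothetical.

```java
import java.util.Arrays;

final class QuadraticProbingTrace {
    // Assumption: the i-th probe looks at (h(k) + i*i) mod T, with h(k) = k.
    // Quadratic probing can fail to find a free slot even when one exists,
    // so this sketch gives up after T probes.
    static Integer[] insertAll(int capacity, int... keys) {
        Integer[] table = new Integer[capacity];
        for (int key : keys) {
            boolean placed = false;
            for (int i = 0; i < capacity && !placed; i++) {
                int index = (key + i * i) % capacity;
                if (table[index] == null) {
                    table[index] = key;
                    placed = true;
                }
            }
            if (!placed) {
                throw new IllegalStateException("no free slot found for " + key);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(insertAll(10, 89, 18, 49, 58, 79)));
        // [49, null, 58, 79, null, null, null, null, 18, 89]
    }
}
```

Compared with linear probing, the colliding keys 49, 58, and 79 spread out to indices 0, 2, and 3 instead of piling up in adjacent slots.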

Open addressing: double-hashing

How many different probe sequences are there?

There are T different starting positions and T − 1 different jump intervals (since we can’t jump by 0), so there are T(T − 1) = O(T^2) different probe sequences.

Result: in practice, double-hashing is very effective and commonly used “in the wild”.
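The counting argument can be checked by brute force. The definition of double hashing is not part of this preview, so the sketch assumes the usual form: the i-th probe is (h(k) + i · g(k)) mod T, where g is a second hash function whose value lies in [1, T − 1]. All names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DoubleHashingSequences {
    // Assumption: the i-th probe is (start + i * jump) mod T, where start plays
    // the role of h(k) mod T and jump the role of a second hash g(k) in [1, T-1].
    static int[] probeSequence(int start, int jump, int tableSize) {
        int[] sequence = new int[tableSize];
        for (int i = 0; i < tableSize; i++) {
            sequence[i] = (start + i * jump) % tableSize;
        }
        return sequence;
    }

    // Counts the distinct sequences over every start and every non-zero jump.
    static int countDistinct(int tableSize) {
        Set<String> seen = new HashSet<>();
        for (int start = 0; start < tableSize; start++) {
            for (int jump = 1; jump < tableSize; jump++) {
                seen.add(Arrays.toString(probeSequence(start, jump, tableSize)));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(3, 2, 7))); // [3, 5, 0, 2, 4, 6, 1]
        System.out.println(countDistinct(7));                        // 42 = 7 * 6
    }
}
```

With a prime table size such as 7, every jump visits every slot exactly once; with a non-prime size, a jump that shares a factor with T only cycles through part of the table.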


Summary

So, what strategy is best? Separate chaining? Open addressing? No obvious answer: both implementations are common.

Separate chaining:

- Don’t have to worry about clustering
- Potentially more “compact” (λ can be higher)

Open addressing:

- Managing clustering can be tricky
- Less compact (we typically keep λ < 1/2)
- Array lookups tend to be a constant factor faster than traversing pointers


Applications of hash functions

Can we use hash functions for more than just dictionaries?

Yes! Lots of possible applications, ranging from cryptography to biology.

Important: Depending on the application, we might want our hash function to have different properties.


How would you implement the following using hash functions? For each application, also discuss what properties you want your hash function to have.

- Suppose we’re sending a message over the internet. This message might become mildly corrupted. How can we detect if corruption probably occurred?
- Suppose you have many fragments of DNA and want to see where they appear in a (significantly longer) segment of DNA. How can we do this efficiently?
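One possible answer to the first question is a checksum: the sender hashes the message and sends the hash along with it, and the receiver recomputes the hash and compares. The sketch below uses the JDK's `java.util.zip.CRC32`; the class and method names around it are hypothetical, and CRC-32 is just one choice of hash for this purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumDemo {
    // Computes a CRC-32 checksum that the sender can transmit alongside the message.
    static long checksum(byte[] message) {
        CRC32 crc = new CRC32();
        crc.update(message);
        return crc.getValue();
    }

    // The receiver recomputes the checksum and compares it with the one received.
    static boolean looksIntact(byte[] received, long expectedChecksum) {
        return checksum(received) == expectedChecksum;
    }

    public static void main(String[] args) {
        byte[] message = "hello, world".getBytes(StandardCharsets.UTF_8);
        long sent = checksum(message);

        byte[] corrupted = message.clone();
        corrupted[0] ^= 0x01; // flip a single bit in transit

        System.out.println(looksIntact(message, sent));   // true
        System.out.println(looksIntact(corrupted, sent)); // false
    }
}
```

The property that matters here is that small accidental changes almost always change the hash. Speed matters too, but resistance to a deliberate attacker does not, which is a different requirement from the password and auditing questions.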


Same question as before:

- Suppose you’re designing a video uploading site and want to detect if somebody is uploading a pirated movie. A naive way to do this is to check if the movie is byte-for-byte identical to some movie. How can we do this more efficiently?
- Suppose you’re designing a website with a user login system. Directly storing your users’ passwords is dangerous: what if they get stolen? How can you store passwords in a safe way so that even if they’re stolen, the passwords aren’t compromised?
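For the password question, one widely used approach is to store a random salt plus a deliberately slow hash of the password instead of the password itself. The sketch below uses the JDK's PBKDF2 support; the class name, iteration count, and key length are illustrative assumptions rather than recommendations from the lecture.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordStore {
    // Assumed parameters for illustration; real systems should follow current guidance.
    private static final int ITERATIONS = 100_000;
    private static final int KEY_BITS = 256;

    // A fresh random salt per user makes identical passwords hash differently.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a slow, salted hash; only the salt and this hash are stored.
    static byte[] hash(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        return factory.generateSecret(spec).getEncoded();
    }

    // Re-hashes the login attempt with the stored salt and compares in constant time.
    static boolean verify(char[] attempt, byte[] salt, byte[] storedHash)
            throws GeneralSecurityException {
        return MessageDigest.isEqual(hash(attempt, salt), storedHash);
    }
}
```

Here the desired properties differ from a dictionary's hash function: the hash should be hard to invert and expensive to compute, so that a stolen table cannot be cheaply brute-forced.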


Same question as before:

- You are trying to build an image sharing site. Users upload many images, and you need to assign each image some unique ID. How might you do this?
- Suppose we have a long series of financial transactions stored on some (potentially untrustworthy) computer. Somebody claims they made a specific transaction several months ago. Can you design a system that lets you audit and determine if they’re lying or not? Assume you have access to just the very latest transaction, obtained from a different trustworthy source.
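One way to approach the auditing question is a hash chain: each transaction is stored together with a hash that covers the previous transaction's hash, so the latest hash depends on the entire history. The sketch below assumes SHA-256, an all-zero starting hash, and that the trusted source supplies the hash attached to the latest transaction; all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class HashChain {
    // Hash of one link: SHA-256 over the previous link's hash, then the transaction text.
    static byte[] link(byte[] previousHash, String transaction) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(previousHash);
            sha256.update(transaction.getBytes(StandardCharsets.UTF_8));
            return sha256.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Folds the whole log into the hash of its latest link.
    static byte[] headOf(List<String> transactions) {
        byte[] hash = new byte[32]; // assumed starting value: 32 zero bytes
        for (String transaction : transactions) {
            hash = link(hash, transaction);
        }
        return hash;
    }

    // The untrusted log is consistent only if it reproduces the trusted latest hash.
    static boolean audit(List<String> untrustedLog, byte[] trustedHead) {
        return MessageDigest.isEqual(headOf(untrustedLog), trustedHead);
    }
}
```

If the audit passes, a claimed transaction can simply be looked up in the log; changing, adding, or removing any earlier entry would change the latest hash. For this to work the hash function needs to be collision resistant, not merely fast.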