Grokking Algorithms - Chapter 5
Hash Tables
Hash tables are the more formal term for Python’s Dictionary structure; a key-value pairing of elements. Hash functions are any function that does the mapping of a value to an input.
Two main rules when implementing hash tables:
- Pairings should be consistent
- Key’s should be unique
The way these mappings work are complex, but essentially, when using Python, grabbing values from a dictionary is pretty fast and grants instant access.
Exercises
Which of the following four are consistent?
- $f(x)=1$ Returns 1 for all input ✅
- $f(x)=$
random.random()
❌ - $f(x)=$
next_empty_slot()
❌ - $f(x) = len(x)$ ✅
Use cases of Hash Tables
Honestly, they are used pretty much everywhere. Contact lists on your phone, social media apps, sports and athletics, just to name a few. Creating one in Python is as simple as phone_book = {}
.
Most Hash Tables are created with lookups in mind. Why? Again, instant access. The internet even does it, just on an even grander scale than our examples. IP addresses are mapped to a domain name (URL) through DNS, domain name resolution.
Browsers use a DNS cache that store recently visited URL’s for easy viewing. See this note on DNS for more info on how exactly that works. Caching makes things super fast, and looks like this:
cache = {}
def get_page(url):
if url in cache:
return cache[url]
else:
data = get_data_from_server(url)
cache[url] = data
return data
Collisions and Performance
To understand performance, we have to first understand what collisions are and how they happen. Unfortunately, when using hash functions, you can’t always avoid keys being assigned to a spot in memory. Exactly why this happens is abstract and not necessary to understand right now.
You could use linked lists or arrays in tables, but that also has it’s own limits like memory waste and management. Hash functions & low load factor help avoid collisions, which is the worst nightmare for hash tables.
Hash (avg) | Hash (worst) | Arrays | Linked Lists | |
---|---|---|---|---|
Search | $O(1)$ | $O(n)$ | $O(1)$ | $O(n)$ |
Insert | $O(1)$ | $O(n)$ | $O(n)$ | $O(1)$ |
Delete | $O(1)$ | $O(n)$ | $O(n)$ | $O(1)$ |
Load Factor is another concept to be aware of, measuring how full an array is (# of items/# of slots). If your load factor is too high, memory is full, and you’ll end up having to resize. Good Hash functions spread the data that’s stored in them evenly.
Next: Chapter 6