Hashing and Integrity

01 // The Concept

What is a hash function?

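Think of a hash function as a meat grinder for data. You put anything in (a password, a file, an entire book) and out comes a fixed-size “digest” that acts as a fingerprint of the original. Two different inputs can in principle share a digest, but for a good hash function finding such a pair is computationally infeasible. And here's the catch: you can never turn the ground meat back into a steak.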

🔬

Deterministic

The same input will always produce the exact same output. Hash "hello" a million times, you'll always get the same digest.

🚀

Fast to compute

Hashing a short message takes roughly a microsecond, and even large files are processed at hundreds of megabytes per second on typical hardware. Speed is essential because these functions run billions of times a day across the internet.

🔒

One-way

Given a hash output, it's computationally infeasible to find the original input. You can't reverse the process.

📐

Fixed output size

Whether you hash a single byte or a terabyte, the output is always the same length. For example, 256 bits for SHA-256.

Any input ("Hello, world!", size: any) → Hash function (SHA-256) → Fixed digest (315f5bdb…, size: always 256 bits)
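The determinism and fixed output size described above can be checked with Python's standard `hashlib` module. This is a minimal sketch; the helper name `sha256_hex` and the sample inputs are illustrative choices, not part of the original page.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes (illustrative choices).
inputs = [b"", b"a", b"Hello, world!", b"x" * 1_000_000]
digests = [sha256_hex(m) for m in inputs]

for m, d in zip(inputs, digests):
    # Every digest is 64 hex characters, i.e. 256 bits.
    print(f"{len(m):>9} bytes -> {d} ({len(d) * 4} bits)")

# Deterministic: hashing the same input twice gives the same digest.
same_twice = sha256_hex(b"hello") == sha256_hex(b"hello")
print("deterministic:", same_twice)
```

The digest printed for "Hello, world!" should begin with the 315f5bdb… prefix shown in the diagram.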

02 // Try It Yourself

Live hash playground

Type anything below and watch the hash change instantly. Notice how even a tiny change in the input produces a completely different output.

Hash Explorer (interactive): type an input message to see its SHA-256 digest update in real time.
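Outside the browser, the same experiment can be run in a few lines of Python. The sketch below also shows that a large input can be fed to the hash in chunks and still produce the same digest as hashing it in one call. The function name `sha256_stream` and the chunk size are assumptions made for illustration.

```python
import hashlib
import io

def sha256_stream(stream, chunk_size=64 * 1024):
    """Hash a binary stream chunk by chunk, without loading it all into memory."""
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()

# An in-memory stream stands in for a large file here.
data = b"The quick brown fox jumps over the lazy dog" * 10_000
streamed = sha256_stream(io.BytesIO(data))
one_shot = hashlib.sha256(data).hexdigest()

print("chunked digest :", streamed)
print("one-shot digest:", one_shot)
print("equal:", streamed == one_shot)
```

For a real file, the same function can be called as `sha256_stream(open(path, "rb"))`, where `path` is whatever file you want to check.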

03 // Security Properties

The three security properties

A cryptographic hash function must satisfy three essential security properties to be considered safe for use.

Pre-image Resistance

Given a hash output $h$, it should be computationally infeasible to find any input $m$ such that $H(m) = h$. You can't reverse-engineer the original data.

Second Pre-image Resistance

Given an input $m_1$, it should be infeasible to find a different input $m_2$ such that $H(m_2) = H(m_1)$. Each input's fingerprint is effectively unique.

Collision Resistance

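It should be infeasible to find any two different inputs $m_1 \neq m_2$ that produce the same hash output. This is the hardest property to achieve and usually the first to break: the attacker is free to choose both inputs, so by the birthday bound a generic search needs only about $2^{n/2}$ hash evaluations for an $n$-bit digest, compared with about $2^n$ for a pre-image.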
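To see why collisions are the easiest of the three targets, it helps to shrink the digest. The sketch below truncates SHA-256 to 24 bits (an artificial weakening chosen for illustration, not a property of SHA-256 itself) and searches for two inputs with the same truncated digest. With only 24 bits, a collision typically appears after a few thousand attempts, far fewer than the roughly 16 million needed to hit one specific value.

```python
import hashlib

TRUNCATED_BYTES = 3  # 24-bit toy digest; real SHA-256 has 32 bytes

def short_hash(data: bytes) -> bytes:
    """First 3 bytes of SHA-256: deliberately too short to be secure."""
    return hashlib.sha256(data).digest()[:TRUNCATED_BYTES]

def find_collision():
    """Hash the strings "0", "1", "2", ... until two share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = short_hash(msg)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

m1, m2, attempts = find_collision()
print(f"{m1!r} and {m2!r} collide on {short_hash(m1).hex()} after {attempts} attempts")
```

The same search against the full 256-bit digest would need on the order of $2^{128}$ attempts, which is why it is considered infeasible.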

Try to Break It (interactive challenge)

Test the three security properties of SHA-256 yourself. Pick a challenge and see why these properties hold.

Pre-image challenge: a secret word has been hashed with SHA-256. Can you figure out the original input from its hash alone? The widget shows the target hash, hashes each guess you type so you can compare the two, and counts your attempts.
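The pre-image challenge can be mimicked offline. Since the hash cannot be run backwards, the only generic strategy is to guess inputs and hash each one. In this sketch the secret word and the word list are hypothetical; the search succeeds only because the secret happens to be in the list.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def guess_preimage(target_hex, candidates):
    """Hash each candidate and compare to the target. Returns (match, attempts)."""
    attempts = 0
    for candidate in candidates:
        attempts += 1
        if sha256_hex(candidate) == target_hex:
            return candidate, attempts
    return None, attempts

# Hypothetical secret and guess list, for illustration only.
target = sha256_hex("sunshine")
wordlist = ["password", "letmein", "dragon", "sunshine", "qwerty"]

found, attempts = guess_preimage(target, wordlist)
print(f"found {found!r} after {attempts} attempts")

# A secret that is not in the list is simply not found.
missing, _ = guess_preimage(sha256_hex("correct horse battery staple"), wordlist)
print("unlisted secret found:", missing)
```

This is also why short or common inputs are weak even under a strong hash: pre-image resistance protects against inversion, not against guessing.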

04 // The Avalanche Effect

Change one bit, change everything

One of the most fascinating properties: flipping a single bit in the input produces a radically different hash. This is called the avalanche effect.

Avalanche Visualizer (interactive)

See how two nearly identical inputs produce completely different hashes.

Input A: hello
Input B (one character different): hellp

The visualizer displays the SHA-256 digest of each input and the number of bits that differ.
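The bit count shown by the visualizer can be reproduced with a short script. Note that "hello" and "hellp" actually differ in five input bits ('o' is 0x6f, 'p' is 0x70); the sketch also includes "helln", which differs from "hello" in exactly one bit. The function names are illustrative.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions where two equal-length byte strings differ."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")

def digest_bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of a and b."""
    return bit_difference(hashlib.sha256(a).digest(), hashlib.sha256(b).digest())

pairs = [(b"hello", b"hellp"), (b"hello", b"helln")]
results = []
for a, b in pairs:
    in_bits = bit_difference(a, b)
    out_bits = digest_bit_difference(a, b)
    results.append((in_bits, out_bits))
    print(f"{a!r} vs {b!r}: {in_bits} input bit(s) differ, {out_bits} of 256 digest bits differ")
```

For a well-behaved hash, each output bit flips with probability about one half, so the digest difference should land somewhere near 128 bits in both cases.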

05 // The Problem

When hashes aren't enough

Hash functions are public algorithms. Anyone can compute SHA-256 of anything. That is exactly the problem.

The missing piece

Hash functions create effectively unique fingerprints of data. If the data changes, the fingerprint changes. This sounds like it solves integrity, but there is a catch.

If Alice sends Bob a message with its hash appended, and Eve sits in the middle, Eve can replace the message, compute a fresh hash of her modified version, and forward both to Bob. Bob checks the hash, sees it matches, and trusts the message. The hash gave integrity against accidents (bit flips, network errors) but zero protection against a deliberate attacker. What is missing is a secret.

With hash only: Alice sends msg + hash(msg) → Eve intercepts, modifies msg, recomputes hash → Bob is fooled: the hash matches ✗

With MAC (keyed hash): Alice sends msg + MAC(key, msg) → Eve intercepts but has no key and cannot forge the tag → Bob detects the change: the MAC check fails ✓
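The two scenarios above can be played out in code. This is a sketch under stated assumptions: the shared key, the messages, and the function names are all made up for illustration, and HMAC-SHA-256 from Python's `hmac` module stands in for the generic MAC.

```python
import hashlib
import hmac

def bob_accepts_hash(msg: bytes, digest: bytes) -> bool:
    """Bob's check in the hash-only scheme: no secret involved."""
    return hmac.compare_digest(hashlib.sha256(msg).digest(), digest)

def bob_accepts_mac(key: bytes, msg: bytes, tag: bytes) -> bool:
    """Bob's check in the MAC scheme: requires the shared key."""
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b"secret shared by Alice and Bob"  # hypothetical key, unknown to Eve
original = b"Pay Bob 50"
tampered = b"Pay Eve 50000"

# Hash only: Eve swaps the message and recomputes the public hash herself.
eve_digest = hashlib.sha256(tampered).digest()
hash_fooled = bob_accepts_hash(tampered, eve_digest)

# MAC: Eve can reuse Alice's old tag or compute a tag under a guessed key.
alice_tag = hmac.new(key, original, hashlib.sha256).digest()
reuse_fooled = bob_accepts_mac(key, tampered, alice_tag)
guess_tag = hmac.new(b"eve's guess", tampered, hashlib.sha256).digest()
guess_fooled = bob_accepts_mac(key, tampered, guess_tag)

print("hash only, Bob fooled:", hash_fooled)
print("MAC, old tag reused, Bob fooled:", reuse_fooled)
print("MAC, guessed key, Bob fooled:", guess_fooled)
```

The untampered message still verifies under the MAC scheme, so the key adds protection without changing the honest path.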

06 // Inside the Machine

How hash functions process data

SHA-256 does not see your message as one big blob. It breaks it into 64-byte blocks and processes them one at a time through a compression function. Understanding this pipeline is the key to understanding both why hash functions work and where they can be exploited.

The assembly line

A hash function needs to accept inputs of any size, but it can only mix a fixed number of bits at a time. The Merkle-Damgård construction solves this by turning the hash into a pipeline. The message is padded to a multiple of 64 bytes (the padding also encodes the message length), then split into blocks. Each block is fed through a compression function together with the running state from the previous block.

The first block starts from a fixed initialization vector (IV). For SHA-256, the IV is eight 32-bit numbers taken from the fractional parts of the square roots of the first eight primes. After the last block, the state IS the hash output. SHA-256, SHA-1, and MD5 all follow this pattern.

Merkle-Damgård Construction

IV (fixed) → compress with M1 → state (256 bits) → compress with M2 → ... → compress with Mn → hash (256 bits)

Merkle-Damgård in Action (interactive): enter a message to watch the data flow through the SHA-256 pipeline, block by block.

The critical consequence: the hash output IS the internal state after the last block. If you know the hash, you know the state. And if you know the state, you can keep feeding in more blocks from that point forward. This fact will matter in the next section.
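The pipeline can be sketched as a toy hash. Everything here is a simplification made for illustration: the all-zero IV is invented, and the compression function simply calls SHA-256 on state plus block as a stand-in for a real compression function. Only the padding rule (a 0x80 byte, zeros, then the 64-bit message length in bits) follows SHA-256's scheme.

```python
import hashlib
import struct

BLOCK_SIZE = 64
TOY_IV = bytes(32)  # assumption: an all-zero IV, not SHA-256's real one

def padding(message_len: int) -> bytes:
    """0x80, then zeros, then the bit length as 8 bytes; total is a multiple of 64."""
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy stand-in: mixes a 32-byte state with one 64-byte block."""
    return hashlib.sha256(state + block).digest()

def absorb(state: bytes, data: bytes) -> bytes:
    """Feed whole blocks through the compression function, one at a time."""
    for i in range(0, len(data), BLOCK_SIZE):
        state = compress(state, data[i:i + BLOCK_SIZE])
    return state

def toy_hash(message: bytes) -> bytes:
    return absorb(TOY_IV, message + padding(len(message)))

# How many blocks does a message of a given length occupy after padding?
block_counts = {n: (n + len(padding(n))) // BLOCK_SIZE for n in (0, 55, 56, 64, 200)}
print(block_counts)

# The digest IS the final state, so anyone holding it can keep absorbing blocks.
msg, extra = b"first part", b"appended later"
longer = msg + padding(len(msg)) + extra
resumed = absorb(toy_hash(msg), extra + padding(len(longer)))
print("resumed equals direct hash:", resumed == toy_hash(longer))
```

The last two lines show the consequence spelled out above: resuming from a published digest gives the same result as hashing the longer input from scratch.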

07 // The Naive Approach

The naive fix and why it breaks

Your first instinct might be: take a hash function, prepend the secret key, and hash the whole thing. It sounds reasonable. It is dangerously broken.

The length extension attack

If Alice computes tag = hash(key || message) and sends the message with the tag, Eve can do the following without knowing the key (she only needs its length, which she can guess): take the tag (which is the internal hash state), resume the hash computation from that state, feed in additional data, and obtain hash(key || message || padding || extra_data). She now has a valid tag for message || padding || extra_data, a message Alice never sent.

This is not a theoretical weakness. Length extension attacks have been used against real APIs that used hash(key || message) for authentication. The Flickr API signing vulnerability in 2009 is one well-known example.
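The attack can be reproduced against real SHA-256. Python's `hashlib` does not let you set the internal state, so the sketch below includes a compact pure-Python SHA-256 that can resume from an arbitrary state. The key, message, and extra data are hypothetical, and the example assumes Eve knows the key length (in practice she can try several lengths).

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# SHA-256 constants: fractional bits of square roots (IV) and cube roots (K).
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """SHA-256 compression: mix one 64-byte block into the 8-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def padding(total_len):
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def sha256_from(state, data, bytes_before=0):
    """Hash data starting from state, as if bytes_before bytes were already absorbed."""
    data = data + padding(bytes_before + len(data))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)

# Alice (knows the key) tags a message with the naive construction.
key = b"hypothetical-key"  # 16 bytes, unknown to Eve
message = b"user=alice&amount=50"
tag = hashlib.sha256(key + message).digest()

# Eve knows message, tag, and the key LENGTH only.
key_len = 16
extra = b"&amount=50000"
glue = padding(key_len + len(message))
forged_message = message + glue + extra
state = list(struct.unpack(">8I", tag))
forged_tag = sha256_from(state, extra, key_len + len(message) + len(glue))

# The server, using the real key, accepts Eve's forgery.
accepted = hashlib.sha256(key + forged_message).digest() == forged_tag
print("forgery accepted:", accepted)
```

The forged message contains the padding bytes in the middle, which is why real attacks target formats where trailing or repeated fields are tolerated by the parser.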

Length Extension Attack (interactive)

See why hash(key || message) is broken. Step through the attack using an original message and Eve's extra data.

How SHA-256 processes data (Merkle-Damgård): IV (fixed) → key (block 1) → message (block 2) → state (= tag output)

Step 1: Alice creates a tag. Alice computes hash(key || message) and sends the message with its tag.

Original: hash(key || message)
Extended: hash(key || message || pad || extra)

08 // Under the Hood

HMAC: the right way

HMAC is the industry-standard MAC construction. It uses the same hash function you already know, but wraps it in a two-pass structure that eliminates length extension entirely.

Two hashes, two derived keys

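HMAC starts by creating two derived keys from your original key. The key is first brought to exactly one hash block (64 bytes for SHA-256): shorter keys are padded with zero bytes, and longer keys are hashed first. HMAC then XORs this block-sized key with a constant called ipad (0x36 repeated to fill a block) and, separately, with another constant called opad (0x5c repeated). These two constants never change. They are defined in the HMAC specification (RFC 2104).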

The inner pass hashes (key XOR ipad) concatenated with the message. This produces a 32-byte intermediate result. The outer pass then hashes (key XOR opad) concatenated with that intermediate result. The output of the outer hash is the final HMAC tag.

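Why does this stop length extension? The attacker only sees the outer hash output. She can still extend that outer hash, but the result is Hash((key XOR opad) || inner hash || padding || extra), which is not the HMAC of any message: a valid tag's outer input is always exactly one key block followed by one inner digest. To forge a tag for a longer message she would have to extend the inner hash instead, and the inner hash result is never revealed. It is buried inside the outer computation, completely out of reach.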

HMAC Construction

Inner pass: Key XOR ipad (0x36...36) → concat with message → SHA-256 → inner hash (32 bytes)

The inner hash feeds into the outer pass.

Outer pass: Key XOR opad (0x5c...5c) → concat with inner hash → SHA-256 → HMAC tag (32 bytes)

In a single formula:

$$\text{HMAC}(key, msg) = \text{Hash}\bigl((key \oplus opad) \,\|\, \text{Hash}((key \oplus ipad) \,\|\, msg)\bigr)$$

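The inner hash produces an intermediate digest. The outer hash seals it. An attacker who sees only the final output never learns the inner digest, so she cannot extend the inner pass, and extending the outer pass does not yield a valid tag for any message.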
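The formula translates almost line for line into code. The sketch below builds HMAC-SHA-256 by hand, following RFC 2104 (including the key normalization to one 64-byte block), and compares it with Python's built-in `hmac` module. The function name `hmac_sha256` and the sample key and message are illustrative; in real code you would call the `hmac` module directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    # Normalize the key to exactly one block: hash long keys, zero-pad short ones.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad

    inner = hashlib.sha256(inner_key + msg).digest()   # inner pass
    return hashlib.sha256(outer_key + inner).digest()  # outer pass

key = b"illustrative key"
msg = b"Pay Eve 50"

ours = hmac_sha256(key, msg)
reference = hmac.new(key, msg, hashlib.sha256).digest()
print("tag:", ours.hex())
print("matches hmac module:", hmac.compare_digest(ours, reference))
```

When verifying a received tag, `hmac.compare_digest` is preferable to `==` because it compares in constant time.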

HMAC Step by Step (interactive): enter a key and a message to watch HMAC-SHA-256 compute a tag one step at a time, with real intermediate values.

09 // The Challenge

Can you forge a MAC?

You have a message and its valid HMAC tag, but the key is secret. Can you produce a valid tag for a different message? Try as many times as you want.

Forgery Challenge (interactive)

Alice sent a message with a valid HMAC-SHA-256 tag. The key is secret. Your mission: produce a valid tag for a different message without knowing the key.

Original message: Pay Eve $50
Target message (forge a tag for this): Pay Eve $50000

The widget shows the valid tag for the original message, keeps the key hidden, and counts your attempts at a forged tag for “Pay Eve $50000”.

10 // In Practice

Where they protect you every day

Hashes and MACs are two of the most deployed cryptographic primitives, working silently in passwords, downloads, APIs, and protocols across the internet.

🔑

Password Storage

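Websites store hashes of your password, not the password itself. When you log in, they hash your input and compare. In practice this should be a salted, deliberately slow password-hashing function (such as PBKDF2, bcrypt, scrypt, or Argon2) rather than a single fast hash like SHA-256, because fast hashes make guessing cheap.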

✅

File Integrity

Download a file and compare its hash to the published one. If they match, the file is the one the publisher hashed, provided the published hash itself came from a source an attacker could not modify.

🔐

API Authentication

Services like AWS sign every API request with HMAC. The server recomputes the MAC to verify the request came from someone with the secret key.

🔗

TLS Record Protection

Every TLS record includes a MAC or AEAD tag. This prevents an attacker from silently modifying encrypted traffic between your browser and a server.

📋

JWT Token Verification

JSON Web Tokens can use HMAC-SHA256 (the HS256 algorithm) to sign claims. The server verifies the signature before trusting the token contents.

🍪

Cookie and Session Signing

Web frameworks sign session cookies with HMAC. If a user tampers with their session data, the server detects the invalid tag and rejects the request.
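As a concrete instance of the password storage case listed above, here is a minimal sketch using PBKDF2-HMAC-SHA256 from Python's standard library. The function names, the 16-byte salt, and the iteration count are assumptions made for illustration, not a recommendation; real systems should follow current guidance or use a dedicated password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware

def hash_password(password: str, salt: bytes = None):
    """Return (salt, derived_key). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)

# Registration: the server stores only the salt and the derived key.
salt, stored = hash_password("correct horse battery staple")

# Login attempts.
print("right password:", verify_password("correct horse battery staple", salt, stored))
print("wrong password:", verify_password("letmein", salt, stored))
```

The per-user salt means two users with the same password get different stored values, and the iteration count makes each guess deliberately expensive for an attacker.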

A MAC does not provide confidentiality. The message itself travels in plaintext. Eve can still read it. She just cannot modify it without detection. Combining confidentiality and integrity in a single operation is the topic of a later chapter: Authenticated Encryption.
