From digital fingerprints to keyed authentication. Learn how hash functions compress any data into a fixed-size digest, why that alone is not enough to stop attackers, and how HMAC adds a secret key to guarantee both integrity and authenticity.
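Think of a hash function as a meat grinder for data. You put anything in (a password, a file, an entire book) and out comes a fixed-size “digest” that, for all practical purposes, identifies the original. (Infinitely many inputs map to a finite set of digests, so different inputs with the same digest must exist; a good hash function makes them infeasible to find.) But here's the catch: you can never turn the ground meat back into a steak.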
The same input will always produce the exact same output. Hash "hello" a million times and you'll get the same digest every time.
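Hashing is fast: a short message takes about a microsecond or less, and large files are processed at hundreds of megabytes per second or more on ordinary hardware. Speed is essential because these functions run billions of times a day across the internet.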
Given a hash output, it's computationally infeasible to find the original input. You can't reverse the process.
Whether you hash a single byte or a terabyte, the output is always the same length: 256 bits (32 bytes) for SHA-256.
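The determinism and fixed-size properties above can be checked directly with Python's standard `hashlib` module. This is a minimal sketch; the inputs are arbitrary examples chosen for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Deterministic: hashing the same bytes twice gives the same digest.
first = sha256_hex(b"hello")
second = sha256_hex(b"hello")
print(first == second)  # True

# Fixed size: from 1 byte to 1 MB of input, the digest is always 32 bytes.
sizes = {len(data): len(hashlib.sha256(data).digest())
         for data in (b"a", b"hello", b"x" * 1_000_000)}
print(sizes)  # {1: 32, 5: 32, 1000000: 32}
```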
A cryptographic hash function must satisfy three essential security properties to be considered safe for use.
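Preimage resistance: given a hash output $h$, it should be computationally infeasible to find any input $m$ such that $H(m) = h$. You can't reverse-engineer the original data.
Second-preimage resistance: given an input $m_1$, it should be infeasible to find a different input $m_2 \neq m_1$ such that $H(m_2) = H(m_1)$. Each input's fingerprint is effectively unique.
Collision resistance: it should be infeasible to find any two different inputs $m_1 \neq m_2$ such that $H(m_1) = H(m_2)$. This is the hardest property to achieve and the first to break. Because the attacker is free to choose both inputs, the birthday paradox lets a generic search find a collision in roughly $2^{n/2}$ evaluations of an $n$-bit hash, versus roughly $2^n$ for a preimage. MD5 and SHA-1 both fell to practical collision attacks.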
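To see why collisions fall first, here is a sketch of a birthday search against a deliberately weakened "hash": SHA-256 truncated to its first 3 bytes (24 bits). The truncation and the `message-<i>` inputs are illustrative assumptions, not a real hash function or attack target. Finding a preimage for one specific 24-bit value would take around $2^{24}$ (about 16.7 million) tries on average, while a collision between *any* two inputs typically appears after a few thousand.

```python
import hashlib

def truncated_hash(data: bytes, nbytes: int = 3) -> bytes:
    """First nbytes of SHA-256(data): a toy 24-bit hash for demonstration only."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen = {}  # truncated digest -> input that produced it
    i = 0
    while True:
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1  # colliding pair and number of hashes computed
        seen[h] = msg
        i += 1

m1, m2, tries = find_collision()
print(m1, m2, tries)
```

The same square-root effect is why a 256-bit hash offers about 128 bits of collision resistance.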
One of the most fascinating properties: flipping a single bit of the input changes, on average, about half of the output bits, so the new hash looks unrelated to the old one. This is called the avalanche effect.
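A quick way to observe the avalanche effect is to flip one input bit and count how many of the 256 output bits change. In this sketch, flipping the lowest bit of the last byte of `b"hello"` turns `'o'` (0x6f) into `'n'` (0x6e); the choice of input is arbitrary.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"hello"
flipped = original[:-1] + bytes([original[-1] ^ 0x01])  # one-bit change: b"helln"

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()
print(bit_difference(original, flipped), "input bit differs")       # 1 input bit differs
print(bit_difference(d1, d2), "of 256 output bits differ")
```

For a well-behaved hash, the second count should hover around 128 for any such one-bit change.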
Hash functions are public algorithms. Anyone can compute SHA-256 of anything. That is exactly the problem.
Hash functions create fingerprints of data that are, for practical purposes, unique. If the data changes, the fingerprint changes. This sounds like it solves integrity, but there is a catch.
If Alice sends Bob a message with its hash appended, and Eve sits in the middle, Eve can replace the message, compute a fresh hash of her modified version, and forward both to Bob. Bob checks the hash, sees it matches, and trusts the message. The hash gave integrity against accidents (bit flips, network errors) but zero protection against a deliberate attacker. What is missing is a secret.
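The attack on a hash-appended message needs nothing but the public hash function. In this sketch, `send` and `bob_verifies` are hypothetical helpers standing in for Alice's and Bob's sides of the exchange.

```python
import hashlib

def send(message: bytes) -> tuple[bytes, str]:
    """Alice attaches a plain SHA-256 digest to her message."""
    return message, hashlib.sha256(message).hexdigest()

def bob_verifies(message: bytes, digest: str) -> bool:
    """Bob recomputes the digest and compares it to the one he received."""
    return hashlib.sha256(message).hexdigest() == digest

packet = send(b"Pay Bob $10")

# Eve intercepts, swaps the message, and simply recomputes the public hash.
forged_message = b"Pay Eve $10,000"
forged_packet = (forged_message, hashlib.sha256(forged_message).hexdigest())

print(bob_verifies(*packet))         # True
print(bob_verifies(*forged_packet))  # True: the forgery is accepted
```

Nothing in Bob's check depends on information Eve lacks, which is exactly the missing secret.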
SHA-256 does not see your message as one big blob. It breaks it into 64-byte blocks and processes them one at a time through a compression function. Understanding this pipeline is the key to understanding both why hash functions work and where they can be exploited.
A hash function needs to accept inputs of any size, but it can only mix a fixed number of bits at a time. The Merkle-Damgård construction solves this by turning the hash into a pipeline. The message is padded to a multiple of 64 bytes (SHA-256 appends a single 1 bit, then zero bits, then the original message length in bits as a 64-bit integer), then split into blocks. Each block is fed through a compression function together with the running state from the previous block.
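The padding rule can be written in a few lines. This sketch follows the SHA-256 padding described in FIPS 180-4: a 0x80 byte (the single 1 bit followed by seven 0 bits), enough zero bytes to reach 56 bytes modulo 64, then the bit length as a 64-bit big-endian integer. `sha256_padding` and `split_blocks` are illustrative names, not a library API.

```python
import struct

BLOCK_SIZE = 64  # SHA-256 processes 64-byte (512-bit) blocks

def sha256_padding(message_len: int) -> bytes:
    """Bytes SHA-256 appends to a message of message_len bytes."""
    zeros = (55 - message_len) % BLOCK_SIZE  # 1 + zeros + 8 brings the total to a multiple of 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

def split_blocks(message: bytes) -> list[bytes]:
    """Pad the message and cut it into 64-byte blocks for the compression function."""
    padded = message + sha256_padding(len(message))
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]

for n in (0, 5, 55, 56, 64):
    print(n, "bytes ->", len(split_blocks(b"a" * n)), "block(s)")
# 0, 5 and 55 bytes fit in one block; 56 and 64 bytes need two,
# because the 9 bytes of mandatory padding no longer fit.
```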
The first block starts from a fixed initialization vector (IV): eight 32-bit words taken from the first 32 bits of the fractional parts of the square roots of the first eight primes. After the last block, the state IS the hash output. SHA-256, SHA-1, and MD5 all follow this pattern.
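The IV derivation can be reproduced exactly with integer arithmetic. The sketch relies on the identity $\lfloor\sqrt{p \cdot 2^{64}}\rfloor = \lfloor\sqrt{p}\cdot 2^{32}\rfloor$; keeping the low 32 bits of that value drops the integer part of $\sqrt{p}$ and leaves the first 32 fractional bits. The helper name `sha256_iv` is hypothetical.

```python
import math

FIRST_EIGHT_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def sha256_iv() -> list[int]:
    """First 32 fractional bits of sqrt(p) for the first eight primes.
    math.isqrt(p << 64) equals floor(sqrt(p) * 2**32), computed exactly;
    masking with 0xFFFFFFFF discards the integer part."""
    return [math.isqrt(p << 64) & 0xFFFFFFFF for p in FIRST_EIGHT_PRIMES]

print([f"{word:08x}" for word in sha256_iv()])
```

The printed words should match the initial hash value listed for SHA-256 in FIPS 180-4, starting with `6a09e667`. Deriving constants from simple, public formulas like this is a "nothing up my sleeve" choice: it shows the designers did not pick special values to hide a weakness.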
The critical consequence: the hash output IS the internal state after the last block. If you know the hash, you know the state. And if you know the state, you can keep feeding in more blocks from that point forward. This fact will matter in the next section.
Your first instinct might be: take a hash function, prepend the secret key, and hash the whole thing. It sounds reasonable. It is dangerously broken.
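If Alice computes tag = hash(key || message) and sends the message with the tag, Eve can do the following without knowing the key: take the tag (which is the internal hash state), resume the hash computation from that state, feed in additional data, and obtain hash(key || message || padding || extra_data). The only extra fact she needs is the length of key || message, so that she can reproduce the padding: the message length is public, and the key length can be guessed by trying a handful of plausible values. She now has a valid tag for message || padding || extra_data, a message Alice never sent.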
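The sketch below carries out the attack end to end. Python's `hashlib` cannot resume from an arbitrary state, so it includes a small from-scratch SHA-256 compression function (educational only: slow and not constant-time). The round constants are derived from the cube roots of the first 64 primes as FIPS 180-4 specifies. The secret, the request string, and Eve's guess of a 10-byte key are hypothetical values; the server's check uses the real `hashlib.sha256`, so the forgery only succeeds if the hand-written compression function is correct.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    """First `count` primes by trial division (fine for 64 primes)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Integer cube root: the largest x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression step: (8-word state, 64-byte block) -> new state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_padding(message_len):
    """FIPS 180-4 padding for a message of message_len bytes."""
    return b"\x80" + b"\x00" * ((55 - message_len) % 64) + struct.pack(">Q", 8 * message_len)

def length_extend(tag_hex, known_len, extra):
    """From tag = SHA-256(secret || message) and known_len = len(secret || message),
    return (glue + extra, SHA-256(secret || message || glue || extra))."""
    state = list(struct.unpack(">8I", bytes.fromhex(tag_hex)))  # the digest IS the final state
    glue = sha256_padding(known_len)             # padding the original hash already consumed
    total = known_len + len(glue) + len(extra)   # length of everything the server will hash
    tail = extra + sha256_padding(total)
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])  # resume hashing from the leaked state
    return glue + extra, struct.pack(">8I", *state).hex()

# Hypothetical server secret and request; Eve sees only `message` and `tag`.
secret = b"s3cr3t-key"
message = b"user=alice&amount=10"
tag = hashlib.sha256(secret + message).hexdigest()

guessed_key_len = 10  # Eve's guess at len(secret)
suffix, forged_tag = length_extend(tag, guessed_key_len + len(message), b"&amount=1000000")
forged_message = message + suffix

# The server recomputes SHA-256(secret || forged_message); it matches Eve's tag.
print(hashlib.sha256(secret + forged_message).hexdigest() == forged_tag)  # True
```

The forged message contains binary padding bytes in the middle, but many parsers that read `key=value` pairs ignore or tolerate such junk and honor the last `amount` parameter.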
This is not a theoretical weakness. Length extension attacks have been used against real APIs that used hash(key || message) for authentication. The Flickr API signing vulnerability in 2009 is one well-known example.
HMAC is the industry-standard MAC construction. It uses the same hash function you already know, but wraps it in a two-pass structure that eliminates length extension entirely.
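HMAC starts by creating two derived keys from your original key. First the key is brought to exactly one block (64 bytes for SHA-256): a key longer than a block is hashed first, and the result, or any shorter key, is padded with zero bytes. The padded key is then XORed with a constant called ipad (the byte 0x36 repeated to fill a block) and with another constant called opad (the byte 0x5c repeated). The ipad and opad constants never change; they are defined in the HMAC specification (RFC 2104).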
The inner pass hashes (key XOR ipad) concatenated with the message. This produces an intermediate digest (32 bytes for SHA-256). The outer pass then hashes (key XOR opad) concatenated with that intermediate digest. The output of the outer hash is the final HMAC tag.
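The two passes are short enough to write out by hand. This sketch builds HMAC-SHA-256 from `hashlib` following the steps above and checks it against Python's standard `hmac` module; `hmac_sha256` is an illustrative name, and real code should simply call `hmac.new`.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 following RFC 2104, written out step by step."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()              # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")                # then zero-padded to one block
    ipad_key = bytes(k ^ 0x36 for k in key)             # key XOR ipad
    opad_key = bytes(k ^ 0x5C for k in key)             # key XOR opad
    inner = hashlib.sha256(ipad_key + message).digest() # inner pass: 32-byte digest
    return hashlib.sha256(opad_key + inner).digest()    # outer pass: the final tag

key = b"shared-secret"
msg = b"Pay Bob $10"
manual = hmac_sha256(key, msg)
library = hmac.new(key, msg, hashlib.sha256).digest()
print(manual == library)  # True
```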
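Why does this stop length extension? The attacker sees only the outer hash output. The inner digest, the only value that depends on the message, is never revealed, so there is no state from which to extend the inner pass. The attacker can resume the outer hash from the tag, but the result would cover (key XOR opad), the inner digest, padding, and extra data, and no genuine HMAC computation ever hashes that: a real outer pass always hashes exactly one key block followed by one digest. A tag for a new message requires a new inner digest, and computing one requires the key.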
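In a single formula, where $K'$ is the key zero-padded to the hash's block size (or first hashed, if it is longer than a block), $\oplus$ is XOR, and $\|$ is concatenation:

$$\mathrm{HMAC}(K, m) = H\big((K' \oplus \mathrm{opad}) \,\|\, H\big((K' \oplus \mathrm{ipad}) \,\|\, m\big)\big)$$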
The inner hash produces an intermediate digest. The outer hash seals it. An attacker who sees only the final output cannot turn an extension of either pass into a valid tag.
Hashes and MACs are two of the most deployed cryptographic primitives, working silently in passwords, downloads, APIs, and protocols across the internet.
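Websites store hashes of your password, not the password itself. When you log in, they hash your input and compare. Because a fast hash like SHA-256 lets anyone who steals the database test billions of guesses per second, good practice is a salted, deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2.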
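A minimal sketch of salted password storage using PBKDF2-HMAC-SHA256, the slow hash available in Python's standard library (`hashlib.pbkdf2_hmac`). The iteration count is an illustrative assumption to be tuned against current guidance and your hardware, and `hash_password`/`check_password` are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune to current guidance and hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both instead of the password."""
    salt = os.urandom(16)  # fresh random salt per password defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)

salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("Tr0ub4dor&3", salt, stored))                   # False
```

PBKDF2 is itself built on HMAC: it repeats the keyed hash many times so that each guess costs an attacker the same work as a legitimate login.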
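Download a file and compare its hash to the published one. If they match, the file was not corrupted or altered, provided the published hash came from a source the attacker cannot also modify; otherwise this is the same weakness as Alice's hash-appended message.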
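Checking a download can be sketched as hashing the file in chunks, so that even very large files never have to fit in memory, and comparing against the published hex digest. The file contents, the file name, and the `PUBLISHED` value (the SHA-256 of `b"hello"`) are stand-ins for a real download and its published checksum.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path: str, published_hex: str) -> bool:
    return file_sha256(path) == published_hex.strip().lower()

PUBLISHED = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "download.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    ok = verify_download(path, PUBLISHED)
    with open(path, "ab") as f:
        f.write(b"!")  # simulate corruption or tampering
    tampered_ok = verify_download(path, PUBLISHED)

print(ok, tampered_ok)  # True False
```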
Services like AWS require clients to sign API requests with HMAC (AWS Signature Version 4 uses HMAC-SHA256). The server recomputes the MAC to verify the request came from someone with the secret key.
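The sign-then-recompute pattern can be sketched with a deliberately simplified, hypothetical scheme. This is not AWS Signature Version 4, which defines a far more detailed canonical request and key-derivation chain; the string-to-sign format, key, and helper names below are all assumptions for illustration.

```python
import hashlib
import hmac

SHARED_KEY = b"example-shared-secret"  # hypothetical key known to client and server

def string_to_sign(method: str, path: str, timestamp: int, body: bytes) -> bytes:
    """Hypothetical canonical form: one field per line, body represented by its SHA-256."""
    fields = [method, path, str(timestamp), hashlib.sha256(body).hexdigest()]
    return "\n".join(fields).encode()

def sign_request(method, path, timestamp, body, key=SHARED_KEY) -> str:
    """Client side: HMAC-SHA256 over the canonical request."""
    return hmac.new(key, string_to_sign(method, path, timestamp, body), hashlib.sha256).hexdigest()

def server_accepts(method, path, timestamp, body, signature, key=SHARED_KEY) -> bool:
    """Server side: recompute the MAC and compare in constant time."""
    expected = sign_request(method, path, timestamp, body, key)
    return hmac.compare_digest(expected, signature)

ts = 1_700_000_000  # request timestamp, fixed here for reproducibility
sig = sign_request("POST", "/transfer", ts, b'{"amount": 10}')
print(server_accepts("POST", "/transfer", ts, b'{"amount": 10}', sig))    # True
print(server_accepts("POST", "/transfer", ts, b'{"amount": 9999}', sig))  # False
```

Including a timestamp in the signed data lets a real server reject old requests that an attacker replays unchanged; this sketch signs the timestamp but does not check its freshness.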
Every encrypted TLS record includes a MAC or AEAD tag. This prevents an attacker from silently modifying encrypted traffic between your browser and a server.
JSON Web Tokens are often signed with HMAC-SHA256 (the HS256 algorithm). The server verifies the tag before trusting the token contents.
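Here is a minimal sketch of HS256 signing and verification in the JWS compact format (RFC 7515): base64url-encoded header and payload joined by a dot, followed by the base64url-encoded HMAC-SHA256 of that string. It is illustrative only. It does not check claims such as `exp`, and the key and helper names are hypothetical; production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

def b64url(data: bytes) -> str:
    """Base64url without '=' padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_hs256(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    tag = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(tag)

def verify_hs256(token: str, key: bytes) -> Optional[dict]:
    """Return the claims if the tag is valid, else None.
    Always uses HS256 instead of trusting the token's own 'alg' header."""
    signing_input, _, tag_b64 = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(tag_b64)):
        return None
    return json.loads(b64url_decode(signing_input.split(".")[1]))

key = b"server-side-secret"  # hypothetical
token = sign_hs256({"sub": "alice", "admin": False}, key)
print(verify_hs256(token, key))  # {'sub': 'alice', 'admin': False}

# Tampering: swap in a new payload but keep the old tag.
header_b64, _payload, tag_b64 = token.split(".")
forged_payload = b64url(json.dumps({"sub": "alice", "admin": True}).encode())
forged = f"{header_b64}.{forged_payload}.{tag_b64}"
print(verify_hs256(forged, key))  # None
```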
Many web frameworks sign session cookies with HMAC. If a user tampers with their session data, the server detects the invalid tag and rejects the request.
A MAC does not provide confidentiality. The message itself travels in plaintext. Eve can still read it. She just cannot modify it without detection. Combining confidentiality and integrity in a single operation is the topic of a later chapter: Authenticated Encryption.