Hashing is an algorithm performed on data such as a file or message to produce a number called a hash (sometimes called a checksum). The hash is used to verify that data is not modified, tampered with, or corrupted. In other words, you can verify the data has maintained integrity.
A key point about a hash is that no matter how many times you execute the hashing algorithm against the data, the hash will always be the same if the data is the same.
Hashes are created at least twice so that they can be compared. As an example, imagine a software company is releasing a patch for an application that customers can download. They can calculate the hash of the patch and post both a link to the patch file and the hash on the company site. They might list it as:
- Patch file. Patch_v2_3.zip
- SHA-1 checksum. d4723ac6f72daea2c7793ac113863c5082644229
The Secure Hash Algorithm 1 (SHA-1) checksum is the calculated hash displayed in hexadecimal. Customers can download the file and then calculate the hash on the downloaded file. If the calculated hash is the same as the hash posted on the web site, it verifies the file has retained integrity. In other words, the file has not changed.
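The download check described above can be sketched in Python with the standard hashlib module. This is a minimal illustration, not the procedure of any particular vendor. The function names file_sha1 and matches_published_hash are made up for this example.

```python
import hashlib
import hmac


def file_sha1(path, chunk_size=65536):
    """Return the SHA-1 hash of a file as a lowercase hexadecimal string."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large downloads do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the calculated hash with the hash posted by the publisher."""
    calculated = file_sha1(path)
    # Normalize case and whitespace before comparing the two hex strings.
    return hmac.compare_digest(calculated, published_hex.strip().lower())
```

A customer would call matches_published_hash with the path of the downloaded file and the hash copied from the web site. A result of False means the file is not identical to the one the company hashed.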
Common hashing algorithms include:
Message Digest 5 (MD5) is a common hashing algorithm that produces a 128-bit hash. Hashes are commonly shown in hexadecimal format instead of a stream of 1s and 0s. For example, an MD5 hash is displayed as 32 hexadecimal characters instead of 128 bits. Each hexadecimal character represents 4 bits and uses the numbers 0 through 9 and the characters a through f.
Secure Hash Algorithm (SHA) is another hashing algorithm. There are several variations of SHA grouped into four families—SHA-0, SHA-1, SHA-2, and SHA-3:
- SHA-0 is not used.
- SHA-1 is an updated version of SHA-0. It is used in the same way as MD5, except that it creates 160-bit hashes instead of 128-bit hashes.
- SHA-2 improved SHA-1 to overcome potential weaknesses. It includes four versions. SHA-256 creates 256-bit hashes and SHA-512 creates 512-bit hashes. SHA-224 (224-bit hashes) and SHA-384 (384-bit hashes) are truncated versions of SHA-256 and SHA-512, respectively.
- SHA-3 (previously known as Keccak) is an alternative to SHA-2. The U.S. National Security Agency (NSA) created SHA-1 and SHA-2. SHA-3 was created outside of the NSA and was selected in a non-NSA public competition. It can create hashes of the same size as SHA-2 (224 bits, 256 bits, 384 bits, and 512 bits).
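The hash sizes listed above can be checked with a short Python sketch. It assumes a Python 3.6+ build in which hashlib provides all of the named algorithms (some restricted builds disable MD5). The helper name digest_sizes is made up for this example.

```python
import hashlib

# Algorithm names as accepted by hashlib.new().
ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512",
              "sha3_224", "sha3_256", "sha3_384", "sha3_512"]


def digest_sizes(message):
    """Return {algorithm: (bits, hex_characters)} for the given bytes."""
    sizes = {}
    for name in ALGORITHMS:
        hex_digest = hashlib.new(name, message).hexdigest()
        # Each hexadecimal character represents 4 bits.
        sizes[name] = (len(hex_digest) * 4, len(hex_digest))
    return sizes


for name, (bits, hex_chars) in digest_sizes(b"example message").items():
    print(f"{name:9} {bits:3} bits  {hex_chars:3} hex characters")
```

The sizes depend only on the algorithm, not on the message passed in.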
Another method used to provide integrity is with a Hash-based Message Authentication Code (HMAC). An HMAC is a fixed-length string of bits, similar to the output of other hashing algorithms. It is built on an underlying hashing algorithm such as MD5 or SHA-1, and the results are known as HMAC-MD5 and HMAC-SHA1. However, HMAC also uses a shared secret key to add some randomness to the result, and only the sender and receiver know the secret key.
As an example, imagine that one server is sending a message to another server using HMAC-MD5. It uses the secret key together with MD5 to calculate the HMAC-MD5 hash of the message. The server then sends the message and the HMAC-MD5 hash to the second server. The second server performs the same calculation with the same secret key and compares the received HMAC-MD5 hash with its result. Just as with any other hash comparison, if the two hashes are the same, the message retained integrity, but if the hashes are different, the message lost integrity.
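The exchange between the two servers might look like the following sketch, which uses Python's standard hmac module. The function names and the idea that both servers already hold the key are assumptions for illustration. How the key is shared is outside the scope of this example.

```python
import hashlib
import hmac


def make_hmac_md5(secret_key, message):
    """Sending server: calculate the HMAC-MD5 of the message as hex."""
    return hmac.new(secret_key, message, hashlib.md5).hexdigest()


def verify_hmac_md5(secret_key, message, received_hmac):
    """Receiving server: repeat the calculation and compare the results."""
    expected = make_hmac_md5(secret_key, message)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, received_hmac)
```

The sending server transmits the message together with the value returned by make_hmac_md5. The receiving server passes both to verify_hmac_md5 along with its own copy of the key.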
RACE Integrity Primitives Evaluation Message Digest (RIPEMD) is another hash function used for integrity. It isn’t as widely used as MD5, SHA, and HMAC.
Many applications calculate and compare hashes automatically without any user intervention. For example, digital signatures use hashes within email, and email applications automatically create and compare the hashes.
Additionally, there are several applications you can use to manually calculate hashes. As an example, sha1sum.exe is a free program anyone can use to create hashes of files. A Google search on “download sha1sum” will show several locations. It runs the SHA-1 hashing algorithm against a file to create the hash.
It’s worth stressing that hashes are one-way functions. In other words, you can calculate a hash on a file or a message, but you can’t use the hash to reproduce the original data. The hashing algorithms always create a fixed-size bit string regardless of the size of the original data. The hash doesn’t give you a clue about the size of the file, the type of the file, or anything else.
As an example, the SHA-1 hash from the message “I will pass the Security+ exam” is: 765591c4611be5e03bea41882ffdaa159352cf49. However, you can’t look at the hash and identify the message, or even know that it is a hash of a six-word message.
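The fixed-size property is easy to observe in a small Python sketch. The inputs below are arbitrary test data chosen for this example.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA-1 hash of the given bytes as hexadecimal."""
    return hashlib.sha1(data).hexdigest()


short_hash = sha1_hex(b"a")                # 1 byte of input
long_hash = sha1_hex(b"a" * 1_000_000)     # 1,000,000 bytes of input

# Both hashes have the same length, so the length of a hash reveals
# nothing about the size of the original data.
print(len(short_hash), len(long_hash))

# Hashing the same data again always produces the same hash.
print(sha1_hex(b"a") == short_hash)
```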
Passwords are often stored as hashes. When a user creates a new password, the system calculates the hash for the password and then stores the hash. Later, when the user authenticates by entering a username and password, the system calculates the hash of the entered password, and then compares it with the stored hash. If the hashes are the same, it indicates that the user entered the correct password.
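The store-and-compare flow can be sketched as follows. This only illustrates the flow described above. It uses a plain SHA-256 hash and a hypothetical in-memory dictionary in place of a real account database, and it omits the salting and key stretching that production systems add.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to stored hashes.
stored_hashes = {}


def hash_password(password):
    """Return the SHA-256 hash of the password as hexadecimal."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def create_account(username, password):
    # Only the hash is stored, never the password itself.
    stored_hashes[username] = hash_password(password)


def authenticate(username, password):
    """Hash the entered password and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(password))
```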
Key stretching (sometimes called key strengthening) is a technique used to increase the strength of stored passwords and can help thwart brute force and rainbow table attacks. Key stretching techniques salt the passwords with additional random bits to make them even more complex. Two common key stretching techniques are bcrypt and Password-Based Key Derivation Function 2 (PBKDF2).
Bcrypt is based on the Blowfish block cipher and is used on many Unix and Linux distributions to protect the passwords stored in the shadow password file. Bcrypt salts the password by adding additional random bits before encrypting it with Blowfish. Bcrypt can go through this process multiple times to further protect against attempts to discover the password. The result is a 60-character string.
As an example, if your password is IL0ve$ecurity, an application can encrypt it with bcrypt and a random salt. It might look like this, which the application stores in a database:
Later, when a user authenticates with a username and password, the application runs bcrypt on the supplied password and compares it with the stored bcrypt-encrypted password. If the bcrypt result of the supplied password is the same as the stored bcrypt result, the user is authenticated.
As an added measure, it’s possible to add some pepper to the salt to further randomize the bcrypt string. In this context, the pepper is another set of random bits stored elsewhere.
PBKDF2 uses salts of at least 64 bits and uses a pseudo-random function such as HMAC to protect passwords. Many technologies, such as Wi-Fi Protected Access II (WPA2), Apple’s iOS mobile operating system, and Cisco operating systems, use PBKDF2 to increase the security of passwords. Some applications send the password through the PBKDF2 process as many as 1,000,000 times to create the hash. The size of the resulting hash varies with PBKDF2 depending on how it is implemented. Bit sizes of 128 bits, 256 bits, and 512 bits are most common.
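Python's standard library exposes PBKDF2 as hashlib.pbkdf2_hmac, so the process can be sketched briefly. The salt size, iteration count, and output length below are assumptions picked for illustration, not recommendations from this text, and the function names are made up for this example.

```python
import hashlib
import hmac
import os


def stretch_password(password, salt=None, iterations=100_000, length=32):
    """Derive a hash from the password using PBKDF2 with HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # 128-bit random salt
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                  salt, iterations, dklen=length)
    return salt, derived


def check_password(password, salt, expected, iterations=100_000):
    """Repeat the derivation with the stored salt and compare the results."""
    _, derived = stretch_password(password, salt, iterations, len(expected))
    return hmac.compare_digest(derived, expected)
```

The application stores the salt and the iteration count alongside the derived hash so that it can repeat the same calculation when the user authenticates. A length of 32 bytes gives a 256-bit result.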
Some security experts believe that PBKDF2 is more susceptible to brute force attacks than bcrypt. A public group created the Password Hashing Competition (PHC). They received and evaluated 24 different hashing algorithms as alternatives. In July 2015, the PHC selected Argon2 as the winner of the competition and recommended it be used instead of legacy algorithms such as PBKDF2.
Hashing provides integrity for messages. It provides assurance to someone receiving a message that the message has not been modified. Imagine that Lisa is sending a message to Bart. The message is “The price is $75.” This message is not secret, so there is no need to encrypt it. However, we do want to provide integrity, so this explanation is focused only on hashing.
In this example, something modifies the message before it reaches Bart. When Bart receives the message and the original hash, the message is now “The price is .75.” Note that the message is modified in transit, but the hash is not modified. A program on Bart’s computer calculates the MD5 hash on the received message as 564294439E1617F5628A3E3EB75643FE. It then compares the received hash with the calculated hash:
- Hash created on Lisa’s computer, and received by Bart’s computer: D9B93C99B62646ABD06C887039053F56
- Hash created on Bart’s computer: 564294439E1617F5628A3E3EB75643FE
Clearly, the hashes are different, so you know the message lost integrity. The program on Bart’s computer would report the discrepancy. Bart doesn’t know what caused the problem. It could have been a malicious attacker changing the message, or it could have been a technical problem. However, Bart does know the received message isn’t the same as the sent message and he shouldn’t trust it.
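The check that the program on Bart's computer performs can be sketched in Python. The hash values this sketch produces are not expected to match the ones quoted above, because they depend on the exact characters and encoding of the message. The function names are made up for this example.

```python
import hashlib


def md5_hex(message):
    """Return the MD5 hash of a text message as uppercase hexadecimal."""
    return hashlib.md5(message.encode("utf-8")).hexdigest().upper()


def retained_integrity(received_message, received_hash):
    """Recalculate the hash on the received message and compare."""
    return md5_hex(received_message) == received_hash.upper()


sent_hash = md5_hex("The price is $75.")  # calculated on the sender's side

print(retained_integrity("The price is $75.", sent_hash))  # unmodified message
print(retained_integrity("The price is .75.", sent_hash))  # modified in transit
```

The first check succeeds and the second fails, which is all the program can report. It cannot tell whether the change was malicious or accidental.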
You might have noticed a problem in the explanation of the hashed message. If an attacker can change the message, why can’t the attacker change the hash, too? In other words, if Hacker Harry changed the message to “The price is .75,” he could also calculate the hash on the modified message and replace the original hash with the modified hash. Here’s the result:
- Hash created on Lisa’s computer: D9B93C99B62646ABD06C887039053F56
- Modified hash inserted by attacker after modifying the message: 564294439E1617F5628A3E3EB75643FE
- Hash created for modified message on Bart’s computer: 564294439E1617F5628A3E3EB75643FE
The calculated hash on the modified message would be the same as the received hash. This erroneously indicates that the message maintained integrity. HMAC helps solve this problem.
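With HMAC, both Lisa’s and Bart’s computers would know the same secret key and use it to create an HMAC-MD5 hash instead of just an MD5 hash. Because the attacker doesn’t know the secret key, he can’t calculate a valid HMAC-MD5 hash for the modified message, so the comparison on Bart’s computer fails.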
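The difference between the two approaches can be illustrated with a sketch of the attack scenario above. The key values are hypothetical, and the attacker is modeled simply as someone who has to guess the key.

```python
import hashlib
import hmac

SHARED_KEY = b"key known only to Lisa and Bart"  # hypothetical secret


def md5_hex(message):
    return hashlib.md5(message).hexdigest()


def hmac_md5_hex(key, message):
    return hmac.new(key, message, hashlib.md5).hexdigest()


original = b"The price is $75."
modified = b"The price is .75."

# Plain hash: the attacker recalculates the hash for the modified message,
# so the check on Bart's computer passes even though the message changed.
forged_hash = md5_hex(modified)
plain_check_passes = md5_hex(modified) == forged_hash

# HMAC: without the shared key the attacker can only guess a key, and the
# forged value does not match what Bart calculates with the real key.
forged_hmac = hmac_md5_hex(b"attacker guess", modified)
hmac_check_passes = hmac.compare_digest(
    hmac_md5_hex(SHARED_KEY, modified), forged_hmac)

print(plain_check_passes, hmac_check_passes)
```

The plain MD5 check passes for the forged message, while the HMAC-MD5 check fails. This protection depends entirely on the key staying secret.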
See also Integrity.