Integrity

Integrity provides assurances that data has not changed. This includes ensuring that no one has modified, tampered with, or corrupted the data. Ideally, only authorized users modify data. However, there are times when unauthorized or unintended changes occur. These changes can come from unauthorized users, from malicious software (malware), or from system and human errors. When this occurs, the data has lost integrity.
Hashing
You can use hashing techniques to verify integrity. A hash is simply a number created by executing a hashing algorithm against data, such as a file or message. If the data never changes, the resulting hash will always be the same. By comparing hashes created at two different times, you can determine if the original data is still the same. If the hashes are the same, the data is, for all practical purposes, the same. If the hashes are different, the data has changed.
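A minimal sketch of this idea, assuming Python's standard `hashlib` module and SHA-256 as the hashing algorithm. The text does not name an algorithm here, so that choice, the helper name `sha256_hex`, and the sample data are illustrative assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data, hashed at two different times.
first_check = sha256_hex(b"quarterly report, version 1")
second_check = sha256_hex(b"quarterly report, version 1")
after_change = sha256_hex(b"quarterly report, version 2")

# Unchanged data always produces the same hash.
print(first_check == second_check)   # True
# Changing even one character produces a different hash.
print(first_check == after_change)   # False
```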
As an example, imagine Homer is sending a message to Marge and they both want assurances that the message retained integrity. Homer’s message is, “The price is $19.99.” He creates a hash of this message. For simplicity’s sake, imagine the hash is 123. He then sends both the message and the hash to Marge. Marge receives both the message and the hash. She can calculate the hash on the received message and compare her hash with the hash that Homer sent. If the hash of the received message is 123 (the same as the hash of the sent message), she knows the message hasn’t lost data integrity. However, if the hash of the received message is something different, such as 456, then she knows that the message she received is not the same as the message that Homer sent. Data integrity has been lost.
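The exchange between Homer and Marge could be sketched as follows. This is an illustration only: it assumes SHA-256 from Python's `hashlib`, so the real hash is a long hexadecimal string rather than 123, and the function names `make_hash` and `verify` are hypothetical.

```python
import hashlib


def make_hash(message: str) -> str:
    """Hash a text message (encoded as UTF-8) with SHA-256."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def verify(received_message: str, received_hash: str) -> bool:
    """Recalculate the hash of the received message and compare it."""
    return make_hash(received_message) == received_hash


# Homer's side: hash the message, then send both message and hash.
sent_message = "The price is $19.99."
sent_hash = make_hash(sent_message)

# Marge's side: the message arrives unchanged.
print(verify("The price is $19.99.", sent_hash))  # True
# Marge's side: the message was altered in transit.
print(verify("The price is $99.99.", sent_hash))  # False
```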
Hashing doesn’t tell you what modified the message. It only tells you that the message has been modified. This implies that the information should not be trusted as valid.
You can use hashes with messages, such as email, and with any other type of data file. Some email programs use a message authentication code (MAC), which also involves a shared secret key, instead of a plain hash to verify integrity, but the underlying concept works the same way.
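As a rough illustration of a MAC, the sketch below uses HMAC with SHA-256 from Python's standard `hmac` module. The text does not say which MAC algorithm email programs use, so HMAC is an assumption, and the key and function names are hypothetical. Unlike a plain hash, the value depends on a secret key that sender and receiver share, so someone who alters the message cannot simply recalculate a matching value without that key.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 value over the message using a shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_mac(key: bytes, message: bytes, received_mac: str) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = make_mac(key, message)
    return hmac.compare_digest(expected, received_mac)


shared_key = b"example-shared-secret"  # hypothetical key known to both parties
message = b"The price is $19.99."
mac = make_mac(shared_key, message)

print(verify_mac(shared_key, message, mac))                  # True
print(verify_mac(shared_key, b"The price is $99.99.", mac))  # False
print(verify_mac(b"some-other-key", message, mac))           # False
```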
You can also use hashing techniques to verify that integrity is maintained when files are downloaded or transferred. Some programs can automatically check hashes and determine if a file loses even a single bit during the download process. The program performing the download will detect it by comparing the source hash with the destination hash. If a program detects that the hashes are different, it knows that integrity has been lost and reports the problem to the user.
As another example, a web site administrator can calculate and post the hash of a file on a web site. Users can manually calculate the hash of the file after downloading it and compare the calculated hash with the posted hash. If a virus infects a file on the web server, the hash of the infected file would be different from the hash of the original file (and the hash posted on the web site). You can use freeware such as md5sum.exe to calculate MD5 hashes, although MD5 is no longer considered collision-resistant and stronger algorithms such as SHA-256 are preferred today.
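The manual check described above might look like the following sketch. It uses SHA-256 rather than the MD5 algorithm mentioned in the text (`hashlib.md5` would follow the same pattern). The file name, contents, and function names are hypothetical, and a temporary file stands in for a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_hash(path: Path, posted_hash: str) -> bool:
    """Compare the calculated hash with the hash posted on the web site."""
    return file_sha256(path) == posted_hash.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    download = Path(tmp) / "installer.bin"          # hypothetical file name
    download.write_bytes(b"original file contents")
    posted = file_sha256(download)                  # stands in for the posted hash

    print(matches_posted_hash(download, posted))    # True

    # Simulate the file being modified after the hash was posted.
    download.write_bytes(b"original file contents plus injected code")
    print(matches_posted_hash(download, posted))    # False
```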
It’s also possible to lose data integrity through human error. As an example, if a database administrator needs to modify a significant amount of data in a database, the administrator can write a script to perform a bulk update. However, if the script is faulty, it can corrupt the database, resulting in a loss of integrity.
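To make the bulk-update risk concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table, the data, and the `guarded_update` helper are all hypothetical. It shows one possible safeguard, not a technique described in the text: run the update inside a transaction and roll back if the number of affected rows is not what the administrator expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (category, price) VALUES (?, ?)",
    [("toys", 10.0), ("toys", 20.0), ("books", 15.0)],
)
conn.commit()


def guarded_update(conn, sql, params, expected_rows):
    """Run a bulk update; undo it if it touched an unexpected number of rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()   # discard the change, leaving the data as it was
        return False
    conn.commit()
    return True


def prices(conn):
    """Return all prices in id order."""
    return [row[0] for row in conn.execute("SELECT price FROM products ORDER BY id")]


# Intended script: halve the price of the two toy products.
ok = guarded_update(
    conn, "UPDATE products SET price = price * ? WHERE category = ?",
    (0.5, "toys"), expected_rows=2,
)
# Faulty script: the WHERE clause was forgotten, so every row would change.
faulty = guarded_update(
    conn, "UPDATE products SET price = price * ?", (0.5,), expected_rows=2,
)

print(ok, faulty)     # True False
print(prices(conn))   # [5.0, 10.0, 15.0]
```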
Two key concepts related to integrity are:
- Integrity provides assurances that data has not been modified, tampered with, or corrupted. Loss of integrity indicates the data is different. Unauthorized users can change data, or the changes can occur through system or human errors.
- Hashing verifies integrity. A hash is simply a numeric value created by executing a hashing algorithm against a message or file. Hashes are created at the source and destination or at two different times (such as on the first and fifteenth of the month). If the hashes are the same, integrity is maintained. If the two hashes are different, data integrity has been lost.
Digital Signatures, Certificates, and Non-Repudiation
You can also use digital signatures for integrity. A digital signature is similar in concept to a handwritten signature. Imagine you sign a one-page contract. Anyone can look at the contract later, see your signature, and know it is the same contract. It isn’t possible for other people to modify the words in the contract unless they can reproduce your signature, which isn’t easy to do.
It’s common to use digital signatures with email. For example, imagine that Lisa wants to send an email to Bart. She can attach a digital signature to the email and when Bart receives it, the digital signature provides assurances to him that the email has not been modified.
A digital signature also provides authentication. In other words, if the digital signature verifies correctly, it authenticates the sender. Bart knows that Lisa sent it.
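The following toy sketch shows the general shape of signing and verifying: hash the message, transform the hash with a private key, and let the receiver check it with the matching public key. It assumes textbook RSA with tiny hard-coded primes and no padding, which is not secure and is not necessarily how any particular email program works. Real systems rely on vetted cryptographic libraries and keys distributed in certificates. All names are hypothetical, and the code needs Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to Lisa


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it below the modulus (toy simplification)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Lisa's side: transform the message hash with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bart's side: undo the transform with the public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(message)


email = b"Bart, lunch is at noon. Lisa"
signature = sign(email)

print(verify(email, signature))                             # True
print(verify(b"Bart, click this link. Lisa", signature))    # False
```

Because only the holder of the private exponent can produce a signature that verifies, a successful check supports both integrity (the message is unchanged) and authentication (it came from the key's owner).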
Authentication from the digital signature prevents attackers from impersonating others and sending malicious emails. For example, an attacker could make an email look like it came from Lisa and include a link to a malicious web site urging Bart to click it. Without a digital signature, Bart might be fooled into thinking that Lisa sent it and click the link. This might result in Bart inadvertently downloading malware onto his system.
Digital signatures also provide non-repudiation. In other words, Lisa cannot later deny sending the email because the digital signature proves she did. Another way of thinking about non-repudiation is with credit cards. If you buy something with a credit card and sign the receipt, you can’t later deny making the purchase. If you do, the store will use your signature to refute your claim. In other words, they use your signature for non-repudiation.
Security systems implement non-repudiation methods in other ways beyond digital signatures. Another example is with audit logs that record details such as who, what, when, and where. Imagine Bart logged on to a computer with his username and password, and then deleted several important files. If the audit log recorded these actions, it provides non-repudiation. Bart cannot believably deny he deleted the files.
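An audit log entry of this kind can be pictured as a simple record with who, what, when, and where fields. The sketch below is a hypothetical in-memory structure for illustration only. The user names, actions, and workstation names are made up, and a real audit log would also need protection against tampering.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    who: str        # account that performed the action
    what: str       # action taken
    when: datetime  # time the action was recorded (UTC)
    where: str      # system on which it happened


audit_log = []  # list of AuditEvent records, oldest first


def record(who: str, what: str, where: str) -> AuditEvent:
    """Append a timestamped event to the audit log."""
    event = AuditEvent(who, what, datetime.now(timezone.utc), where)
    audit_log.append(event)
    return event


def actions_by(user: str) -> list:
    """Return every recorded event for the given user."""
    return [e for e in audit_log if e.who == user]


record("bart", "logon", "WORKSTATION-7")
record("bart", "delete payroll.xlsx", "WORKSTATION-7")
record("lisa", "logon", "WORKSTATION-2")

for event in actions_by("bart"):
    print(event.when.isoformat(), event.who, event.what, event.where)
```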
Digital signatures require the use of certificates and a Public Key Infrastructure (PKI). Certificates include public keys used for encryption and for verifying signatures, and the PKI provides the means to create, manage, and distribute certificates.
See also Hashing.