7.4 Integrity

Think of the number of the times you've signed your name to a piece of paper during the last week. You sign checks, credit card statements, legal documents, and letters. Your signature attests to the fact that you (as opposed to someone else) have acknowledged and/or agreed with the document's contents. In a digital world, one often want to indicate the owner or creator of a document, or to signify one's agreement with a document's content. A digital signature is a cryptographic technique for achieving these goals in a digital world.

Just as with human signatures, digital signing should be done in such a way that a digital signatures are verifiable, non-forgible, and non-repudiable. That is, it must be possible to "prove" that a document signed by an individual was indeed signed by that individual (the signature must be verifiable) and that only that individual could have signed the document (the signature can not be forged, and a signer can not later repudiate or deny having signed the document). This is easily accomplished with public key cryptography.

7.4.1 Generating Digital Signatures

Suppose that Bob wants to digitally sign a "document," m. We can think of the document as a file or a message that Bob is going to sign and send. As shown in Figure 7.4-1, to sign this document, Bob simply uses his private decryption key, dB, to compute dB(m). At first, it might seem odd that Bob is running a decryption algorithm over a document that hasn't been encrypted. But recall that "decryption" is nothing more than a mathematical operation (exponentiation to the power of d in RSA; see section 7.2) and recall that Bob's goal is not to scramble or obscure the contents of the document, but rather to sign the document in a manner that is verifiable, non-forgible, and non-repudiable. Bob has the document, m, and his digital signature of the document, dB(m).

Creating a digital signature for a document
Figure 7.4-1: Creating a digital signature for a document.

Does the digital signature, dB(m), meet our requirements of being verifiable, non-forgible, and non-repudiable? Suppose Alice has m and dB(m). She wants to prove in court (being litigious) that Bob had indeed signed the document and was the only person who could have possibly signed the document. Alice takes Bob's public key, eB, and applies it to the digital signature, dB(m), associated with the document, m. That is, she computes eB(dB(m)), and voila, with a dramatic flurry, she produces m, which exactly matches the original document! Alice then argues that only Bob could have signed the document because:

It is also important to note that if the original document, m, is ever modified to some alternate form, m', the signature that Bob created for m will not be valid for m', since eB(dB(m)) does not equal m'.

Thus we see that public key cryptography techniques provide a simple and elegant way to digitally sign documents that is verifiable, non-forgible, and non-repudiable, and that protects against later modification of the document.

7.4.2 Message Digests

We have seen above that public key encryption technology can be used to create a digital signature. One concern with signing data by encryption, however, is that encryption and decryption are computationally expensive. When digitally signing a really important document, say a merger between two large multinational corporations or an agreement with a child to have him/her clean her room weekly, computational cost may not may be important. However, many network devices and processes (e.g., routers exchanging routing table information and email user agents exchanging email) routinely exchange data that may not need to be encrypted. Nonetheless, they do want to ensure that:

Given the overheads of encryption and decryption, signing data via complete encryption/decryption can be overkill. A more efficient approach using so-called message digests can accomplish these two goals without full message encryption.

A message digest is in many ways like a checksum. Message digest algorithms take a message, m, of arbitrary length and compute a fixed length "fingerprint" of the data known as a message digest, H(m). The message digest protects the data in the sense that if m is changed to m' (either maliciously or by accident) then H(m), computed for the original data (and transmitted with that data), will not match the H(m) computed over the changed data. While the message digest provides for data integrity, how does it help with signing the message m? The goal here is that rather than having Bob digitally sign (encrypt) the entire message by computing dB(m), he should be able to sign just the message digest by compting dB(H(m)). That is, having m and dB(H(m)) together (note that m is not typically encrypted) should be "just as good as" having a signed complete message, dB(m); this means that m and dB(H(m)) together should be non-forgible, verifiable, and non-repudiable. Nonforgible will require that the message digest algorithm that computes the message digest have some special properties, as we will see below.

Hash functions are used to create message digests
Figure 7.4-2: Hash functions are used to create message digests.

Our definition of a message digest may seem quite similar to the definition of a checksum (e.g., the Internet checksum, see section 4.4) or a more powerful error detection code such as a cyclic redundancy check (see section 5.1). Is it really any different? Checksums, cyclic redundancy checks, and message digests are all examples of so-called hash functions. As shown in Figure 7.4-2, a hash function takes an input, m, and computes a fixed-size string known as a hash. The Internet checksum, CRC's and message digests all meet this definition. If signing a message digest is going to be "just as good as" signing the entire message, in particular if it is going to satisfy the non-forgibility requirement, then a message digest algorithm must have the following additional properties:

  1. Given a message digest value, x, it is computationally infeasible to find a message, y, such that H(y) = x;

  2. It is computationally infeasible to find any two messages x and y such that H(x) = H(y).

Informally, these two properties mean that it is computationally infeasible for an intruder to substitute one message for another message that is protected by a message digest. That is, if (m,H(m)) are the message and message digest pair created by the sender, then an intruder can not forge the contents of another message, y, that has the same message digest value as the original message. When Bob signs m by computing dB(H(m)), we know that no other message can be substituted for m. Furthermore, Bob's digital signature of H(m) uniquely identifies Bob as the verifiable, non-repudiable signer of H(m) (and as a consequence, m as well) as discussed above in section 7.4.1.

Sending a digitally signed message
Figure 7.4-3: Sending a digitally signed message.

In the context of Bob sending a message to Alice, Figure 7.4-3 provides a summary of the operational procedure of creating a digital signature. Bob puts his original long message through a hash function to create a messge digest. He then encrypts the message digest with his own private key. The original message (in clear text) along with the digitally signed message digest (henceforth referred to as the digital signature) is then sent to Alice. Figure 7.4-4 provides a summary of the operational procedure of verifying message integrity. Alice applies the Bob's public key to the message to recover the message digest. Alice also applies the hash function to the clear text message to obtain a second message digest. If the two message digests match, then the recipientAlice can be sure about the integrity of the message, and sure that Bob sent the message.

Verifying the integretiy of a signed message
Figure 7.4-4: Verifying the integrity of a signed message.

7.4.3 Hash Function Algorithms

Let's convince ourselves that a simple checksum, such as the Internet checksum, would make a poor message digest algorithm. Rather than performing 1 complement's arithmetic (as in the Internet checksum), let us compute a checksum by treating each character as a byte and adding the bytes together using 4-byte chunks at a time. Suppose Bob owes Alice $100.99" and sends an IOU to Alice consisting of the text string "IOU100.99BOB". The ASCII representation (in hexadecimal notation) for these letters is 49, 4F, 55, 31, 30, 30, 2E, 39, 39, 42, 4F, 42.

Figure 7.4-5 (top) shows that the 4-byte checksum for this message is B2 C1 D2 AC. A slightly different message (and a much more costly one for Bob) is shown in the bottom half of Figure 7.5-1. The message "IOU100.99BOB" and "IOU900.19BOB" have the same checksum! Thus, this simple checksum algorithm violates the two required requirements above. Given the original data, it is simple to find another set of data with the same checksum. Clearly, for security purposes. we are going to need a more powerful hash function than a checksum.

Figure 7.4-5: Initial message and fraudulent message have the same checksum!

message ASCII
I O U 1 49 4F 55 31
0 0 . 9 30 30 2E 39
9 B O B 39 42 4F 42
B2 C1 D2 AC checksum
message ASCII
I O U 9 49 4F 55 39
0 0 . 1 30 30 2E 31
9 B O B 39 42 4F 42
B2 C1 D2 AC checksum

The MD5 message digest algorithm by Ron Rivest [RFC 1321] is in wide use today. It computes a 128-bit message digest in a four-step process consisting of a padding step (adding a 1 followed by enough zero's so that the length of the message satisfies certain conditions), an append step (appending a 64-bit representation of the message length before padding), an initialization of an accumulator, and a final looping step in which the message's 16-word blocks are processed (mangled) in four rounds of processing. It is not known whether MD5 actually satisfies the requirements listed above. The author of MD5 claims "It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 264 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2128 operations. "[RFC 1321]. No one has argued with this claim. For a description of MD5 (including a C source code implementation) see [RFC 1321]. Computational aspects of MD5 are discussed in [RFC 1810].

The second major message digest algorithm in use today is SHA-1, the Secure Hash Algorithm [FIPS 1995]. This algorithm is based on principles similar to those used in the design of MD4 [RFC 1320], the predecessor to MD5. The Secure Hash Algorithm (SHA-1), a US federal standard, is required for use whenever a secure message digest algorithm is required for federal applications. It produces a 160-bit message digest.