ly used in programming,but not all hash functions are suitable for use in cryptography. Cryptographic hash functions are hash functions that have several additional properties: Preimage resistance Given a hash,it's computationally unfeasible to find or construct a message that pro- duces it. Second preimage resistance Given a message and its hash,it's computationally unfeasible to find a different mes- sage with the same hash. Collision resistance It's computationally unfeasible to find two messages that have the same hash Hash functions are most commonly used as a compact way to represent and compare large amounts of data.For example,rather than compare two files directly(which might be diffi- cult,for example,if they are stored in different parts of the world),you can compare their hashes.Hash functions are often called fingerprints,message digests,or simply digests. The most commonly used hash function today is SHAl,which has output of 160 bits.Be- cause SHAI is considered weak,upgrading to its stronger variant,SHA256,is recommend ed.Unlike with ciphers,the strength of a hash function doesn't equal the hash length.Be- cause of the birthday paradox(a well-known problem in probability theory)the strength of a hash function is at most one half of the hash length. Message Authentication Codes A hash function could be used to verify data integrity,but only if the hash of the data is transported separately from the data itself.Otherwise,an attacker could modify both the message and the hash,easily avoiding detection.A message authentication code(MAC)or a keyd-ash is a cryptographic function that extends hashing with authentication.Only those in possession of the hashing key can produce a valid MAC. MACs are commonly used in combination with encryption.Even though Mallory can't de- crypt ciphertext,she can modify it in transit if there is no MAC;encryption provides confi- dentiality but not integrity.If Mallory is smart about how she's modifying ciphertext,she could trick Bob into accepting a forged message as authentic.When a MAC is sent along with ciphertext,Bob(who shares the hashing key with Alice)can be sure that the message has not been tampered with. Any hash function can be used as the basis for a MAC using a construction known as HMAC (short for hash-based message authentication code).3 In essence,HMAC works by interleaving the hashing key with the message in a secure way. 2Bimtdaypolen0mpede,retned6lune201 Keyed-Hashing or(K February 197) 10 Chapter 1:SSL,TLS,and Cryptography
ly used in programming, but not all hash functions are suitable for use in cryptography. Cryptographic hash functions are hash functions that have several additional properties: Preimage resistance Given a hash, it’s computationally unfeasible to find or construct a message that produces it. Second preimage resistance Given a message and its hash, it’s computationally unfeasible to find a different message with the same hash. Collision resistance It’s computationally unfeasible to find two messages that have the same hash. Hash functions are most commonly used as a compact way to represent and compare large amounts of data. For example, rather than compare two files directly (which might be difficult, for example, if they are stored in different parts of the world), you can compare their hashes. Hash functions are often called fingerprints, message digests, or simply digests. e most commonly used hash function today is SHA1, which has output of 160 bits. Because SHA1 is considered weak, upgrading to its stronger variant, SHA256, is recommended. Unlike with ciphers, the strength of a hash function doesn’t equal the hash length. Because of the birthday paradox (a well-known problem in probability theory),12 the strength of a hash function is at most one half of the hash length. Message Authentication Codes A hash function could be used to verify data integrity, but only if the hash of the data is transported separately from the data itself. Otherwise, an attacker could modify both the message and the hash, easily avoiding detection. A message authentication code (MAC) or a keyed-hash is a cryptographic function that extends hashing with authentication. Only those in possession of the hashing key can produce a valid MAC. MACs are commonly used in combination with encryption. Even though Mallory can’t decrypt ciphertext, she can modify it in transit if there is no MAC; encryption provides confidentiality but not integrity. If Mallory is smart about how she’s modifying ciphertext, she could trick Bob into accepting a forged message as authentic. When a MAC is sent along with ciphertext, Bob (who shares the hashing key with Alice) can be sure that the message has not been tampered with. Any hash function can be used as the basis for a MAC using a construction known as HMAC (short for hash-based message authentication code).13 In essence, HMAC works by interleaving the hashing key with the message in a secure way. 12 Birthday problem (Wikipedia, retrieved 6 June 2014) 13 RFC 2104: HMAC: Keyed-Hashing for Message Authentication (Krawczyk et al., February 1997) 10 Chapter 1: SSL, TLS, and Cryptography
Block Cipher Modes Block cipher modes are cryptographic schemes designed to extend block ciphers to encrypt data of arbitrary length.All block cipher modes support confidentiality,but some combine it with authentication.Some modes transform block ciphers to produce stream ciphers. There are many output modes,and they are usually referred to by their acronyms:ECB. CBC,CFB,OFB,CTR,GCM,and so forth.(Don't worry about what the acronyms stand for.)I will cover only ECB and CBC here:ECB as an example of how not to design a block cipher mode and CBC because it's still the main mode in SSL and TLS.GCM is a relatively new addition to TLS,available starting with version 1.2;it provides confidentiality and in- tegrity,and it's currently the best mode available. Electronic Codebook Mode Electronic Codebook (ECB)mode is the simplest possible block cipher mode.It supports on- ly data lengths that are the exact multiples of the block size;if you have data of different length,then you need to apply padding beforehand.To perform encryption,you split the data into chunks that match the block size and encrypt each block individually. The simplicity of ECB is its downside.Because block ciphers are deterministic (i.e.,they al- ways produce the same result when the input is the same),so is ECB.This has serious con- sequences:(1)patterns in ciphertext will appear that match patterns in plaintext;(2)the at- tacker can detect when a message is repeated;and(3)an attacker who can observe cipher- text and submit arbitrary plaintext for encryption (commonly possible with HTTP and in many other situations)can,given enough attempts,guess the plaintext.This is what the BEAST attack against TLS was about;I discuss it in the section called "BEAST"in Chap- ter 7. Cipher Block Chaining Mode Cipher Block Chaining(CBC)mode is the next step up from ECB.To address the determin- istic nature of ECB,CBC introduces the concept of the initialization vector(IV),which makes output different every time,even when input is the same. Building Blocks
Block Cipher Modes Block cipher modes are cryptographic schemes designed to extend block ciphers to encrypt data of arbitrary length. All block cipher modes support confidentiality, but some combine it with authentication. Some modes transform block ciphers to produce stream ciphers. ere are many output modes, and they are usually referred to by their acronyms: ECB, CBC, CFB, OFB, CTR, GCM, and so forth. (Don’t worry about what the acronyms stand for.) I will cover only ECB and CBC here: ECB as an example of how not to design a block cipher mode and CBC because it’s still the main mode in SSL and TLS. GCM is a relatively new addition to TLS, available starting with version 1.2; it provides confidentiality and integrity, and it’s currently the best mode available. Electronic Codebook Mode Electronic Codebook (ECB) mode is the simplest possible block cipher mode. It supports only data lengths that are the exact multiples of the block size; if you have data of different length, then you need to apply padding beforehand. To perform encryption, you split the data into chunks that match the block size and encrypt each block individually. e simplicity of ECB is its downside. Because block ciphers are deterministic (i.e., they always produce the same result when the input is the same), so is ECB. is has serious consequences: (1) patterns in ciphertext will appear that match patterns in plaintext; (2) the attacker can detect when a message is repeated; and (3) an attacker who can observe ciphertext and submit arbitrary plaintext for encryption (commonly possible with HTTP and in many other situations) can, given enough attempts, guess the plaintext. is is what the BEAST attack against TLS was about; I discuss it in the section called “BEAST” in Chapter 7. Cipher Block Chaining Mode Cipher Block Chaining (CBC) mode is the next step up from ECB. To address the deterministic nature of ECB, CBC introduces the concept of the initialization vector (IV), which makes output different every time, even when input is the same. Building Blocks 11
Figure 1.4.CBC mode encryption M M Message block 1 Message block 2 lessage block 3 Ciphertext block 1 Ciphertext block 2 Ciphertext block 3 The process starts by generating a random (and thus unpredictable)IV,which is the same length as the encryption block size.Before encryption,the first block of plaintext is com bined with the IV using XOR.This masks the plaintext and ensures that the ciphertext is always different.For the next encryption block,the ciphertext of the previous block is used as the IV,and so forth.As a result,all of the individual encryption operations are part of the same chain,which is where the mode name comes from.Crucially,the IV is transmitted on the wire to the receiving party,who needs it to perform decryption successfully. Asymmetric Encryption Symmetric encryption does a great job at handling large amounts of data at great speeds. but it leaves a lot to be desired as soon as the number of parties involved increases: Members of the same group must share the same key.The more people join a group, the more exposed the group becomes to the key compromise. For better security,you could use a different key for every two people,but this ap- proach doesn't scale.Although three people need only three keys,ten people would need 45(9++.+1)keys.A thousand people would need 499,500 keys! Symmetric encryption can't be used on unattended systems to secure data.Because the process can be reversed by using the same key,a compromise of such a system leads to the compromise of all data stored in the system Asymmetric encryption (also known as public-key cryptography)is a different approach to encryption that uses two keys instead of one.One of the keys is private;the other is public. As the names suggest,one of these keys is intended to be private,and the other is intended 12 Chapter 1:SSL,TLS,and Cryptography
Figure 1.4. CBC mode encryption M2 C2 /IV3 M3 C3 Message block 3 Encrypt Ciphertext block 3 Message block 2 Encrypt Ciphertext block 2 IV1 C1 /IV2 Ciphertext block 1 Encrypt M1 Message block 1 e process starts by generating a random (and thus unpredictable) IV, which is the same length as the encryption block size. Before encryption, the first block of plaintext is combined with the IV using XOR. is masks the plaintext and ensures that the ciphertext is always different. For the next encryption block, the ciphertext of the previous block is used as the IV, and so forth. As a result, all of the individual encryption operations are part of the same chain, which is where the mode name comes from. Crucially, the IV is transmitted on the wire to the receiving party, who needs it to perform decryption successfully. Asymmetric Encryption Symmetric encryption does a great job at handling large amounts of data at great speeds, but it leaves a lot to be desired as soon as the number of parties involved increases: • Members of the same group must share the same key. e more people join a group, the more exposed the group becomes to the key compromise. • For better security, you could use a different key for every two people, but this approach doesn’t scale. Although three people need only three keys, ten people would need 45 (9 + 8 + . . . + 1) keys. A thousand people would need 499,500 keys! • Symmetric encryption can’t be used on unattended systems to secure data. Because the process can be reversed by using the same key, a compromise of such a system leads to the compromise of all data stored in the system. Asymmetric encryption (also known as public-key cryptography) is a different approach to encryption that uses two keys instead of one. One of the keys is private; the other is public. As the names suggest, one of these keys is intended to be private, and the other is intended 12 Chapter 1: SSL, TLS, and Cryptography
to be shared with everyone.There's a special mathematical relationship between these keys that enables some useful features.If you encrypt data using someone's public key,only their corresponding private key can decrypt it.On the other hand,if data is encrypted with the private key anyone can use the public key to unlock the message.The latter operation doesn't provide confidentiality,but it does function as a digital signature. Figure 1.5.Asymmetric encryption Alice's public ke Alice's private key Encrypt Decrypt Original document BOB ALICE Original document Asymmetric encryption makes secure communication in large groups much easier.Assum- ing that you can securely share your public key widely (a job for PKI,which I discuss in Chapter 3,Public-Key Infrastructure),anyone can send you a message that only you can read.If they also sign that message using their private key,you know exactly whom it is from. Despite its interesting properties,public-key cryptography is rather slow and unsuitable fo use with large quantities of data.For this reason,it's usually deployed for authentication and negotiation of shared secrets,which are then used for fast symmetric encryption. RSA(named from the initials of Ron Rivest,Adi Shamir,and Leonard Adleman)is by far the most popular asymmetric encryption method deployed today.14 The recommended strength for RSA today is 2,048 bits,which is equivalent to about 112 symmetric bits.I'll discuss the strength of cryptography in more detail later in this chapter. Digital Signatures A digital signature is a cryptographic scheme that makes it possible to verify the authenticity of a digital message or document.The MAC,which I described earlier,is a type of digital signature;it can be used to verify authenticity provided that the secret hashing key is se curely exchanged ahead of time.Although this type of verification is very useful,it's limited because it still relies on a private secret key. RSA(Wikipedia,retrieved 2Jne2014) Building Blocks
to be shared with everyone. ere’s a special mathematical relationship between these keys that enables some useful features. If you encrypt data using someone’s public key, only their corresponding private key can decrypt it. On the other hand, if data is encrypted with the private key anyone can use the public key to unlock the message. e latter operation doesn’t provide confidentiality, but it does function as a digital signature. Figure 1.5. Asymmetric encryption Alice’s public key Alice’s private key BOB ALICE Encrypt Decrypt Original document Encrypted document Original document Asymmetric encryption makes secure communication in large groups much easier. Assuming that you can securely share your public key widely (a job for PKI, which I discuss in Chapter 3, Public-Key Infrastructure), anyone can send you a message that only you can read. If they also sign that message using their private key, you know exactly whom it is from. Despite its interesting properties, public-key cryptography is rather slow and unsuitable for use with large quantities of data. For this reason, it’s usually deployed for authentication and negotiation of shared secrets, which are then used for fast symmetric encryption. RSA (named from the initials of Ron Rivest, Adi Shamir, and Leonard Adleman) is by far the most popular asymmetric encryption method deployed today.14 e recommended strength for RSA today is 2,048 bits, which is equivalent to about 112 symmetric bits. I’ll discuss the strength of cryptography in more detail later in this chapter. Digital Signatures A digital signature is a cryptographic scheme that makes it possible to verify the authenticity of a digital message or document. e MAC, which I described earlier, is a type of digital signature; it can be used to verify authenticity provided that the secret hashing key is securely exchanged ahead of time. Although this type of verification is very useful, it’s limited because it still relies on a private secret key. 14 RSA (Wikipedia, retrieved 2 June 2014) Building Blocks 13
Digital signatures similar to the real-life handwritten ones are possible with the help of pub lic-key cryptography;we can exploit its asymmetric nature to devise an algorithm that al- lows a message signed by a private key to be verified with the corresponding public key. The exact approach depends on the selected public-key cryptosystem.For example,RSA can be used for encryption and decryption.If something is encrypted with a private RSA key, only the corresponding public key can decrypt it.We can use this property for digital sign ing if we combine it with hash functions: 1.Calculate a hash of the document you wish to sign;no matter the size of the input doc- ument,the output will always be fixed,for example,256 bits for SHA256. 2.Encode the resulting hash and some additional metadata.For example,the receiver will need to know the hashing algorithm you used before she can process the signa- ture. 3.Encrypt the encoded hash using the private key;the result will be the signature,which you can append to the document as proof of authenticity. To verify the signature,the receiver takes the document and calculates the hash indepen- dently using the same algorithm.Then,she uses your public key to decrypt the message and recover the hash,confirm that the correct algorithms were used,and compare with the de- rypted hash with the one she calculated.The strength of this signature scheme depends on the individual strengths of the encryption,hashing,and encoding components. Note Not all digital signature algorithms function in the same way as RSA.In fact,RSA is an exception,because it can be used for both encryption and digital signing. Other popular public key algorithms,such as DSA and ECDSA,can't be used for encryption and rely on different approaches for signing Random Number Generation In cryptography,all security depends on the quality of random number generation.You've already seen in this chapter that security relies on known encryption algorithms and secret keys.Those keys are simply very long random numbers. The problem with random numbers is that computers tend to be very predictable.They fol low instructions to the letter.If you tell them to generate a random number,they probably won't do a very good job.15 This is because truly random numbers can be obtained only by observing certain physical processes.In absence of that,computers focus on collecting small bilt-inandmer gee devices (sticks)thatcan beaddedto the erating system Chapter 1:SSL,TLS,and Cryptography
Digital signatures similar to the real-life handwritten ones are possible with the help of public-key cryptography; we can exploit its asymmetric nature to devise an algorithm that allows a message signed by a private key to be verified with the corresponding public key. e exact approach depends on the selected public-key cryptosystem. For example, RSA can be used for encryption and decryption. If something is encrypted with a private RSA key, only the corresponding public key can decrypt it. We can use this property for digital signing if we combine it with hash functions: 1. Calculate a hash of the document you wish to sign; no matter the size of the input document, the output will always be fixed, for example, 256 bits for SHA256. 2. Encode the resulting hash and some additional metadata. For example, the receiver will need to know the hashing algorithm you used before she can process the signature. 3. Encrypt the encoded hash using the private key; the result will be the signature, which you can append to the document as proof of authenticity. To verify the signature, the receiver takes the document and calculates the hash independently using the same algorithm. en, she uses your public key to decrypt the message and recover the hash, confirm that the correct algorithms were used, and compare with the decrypted hash with the one she calculated. e strength of this signature scheme depends on the individual strengths of the encryption, hashing, and encoding components. Note Not all digital signature algorithms function in the same way as RSA. In fact, RSA is an exception, because it can be used for both encryption and digital signing. Other popular public key algorithms, such as DSA and ECDSA, can’t be used for encryption and rely on different approaches for signing. Random Number Generation In cryptography, all security depends on the quality of random number generation. You’ve already seen in this chapter that security relies on known encryption algorithms and secret keys. ose keys are simply very long random numbers. e problem with random numbers is that computers tend to be very predictable. ey follow instructions to the letter. If you tell them to generate a random number, they probably won’t do a very good job.15 is is because truly random numbers can be obtained only by observing certain physical processes. In absence of that, computers focus on collecting small 15 Some newer processors have built-in random number generators that are suitable for use in cryptography. There are also specialized external devices (e.g., in the form of USB sticks) that can be added to feed additional entropy to the operating system. 14 Chapter 1: SSL, TLS, and Cryptography