ForWhoseEyes Only?CryptanalysisandFrequencyAnalysisDavidCochranIntroductionCryptanalysis,theartof breaking codes,hasproventobea combat multiplierthroughouthistory.Byreadingencryptedenemymessagetraffic,commandersoftengainimmediateintelligenceontheenemy'sdisposition,strengthsandintent.DuringWorld War ll, manyallied successeswere attributedtotheadvantagesgainedbycrackingboththeGermanEnigmaandJapanesePURPLEciphersystems.FurtheradvancesinthefieldofcryptanalysiswilllikewisegivefutureAmericancommandersadistinctadvantageonthemodernbattlefield.TheADFGVXCipherBy 1917,the warring nations in World WarIwere facingmanpower shortagesdue to the large-scale slaughter on the battlefields. The United States declaredwaronGermanyand joinedtheAlliesonApril 6,1917.On26June1917,thefirstlargecontingentofAmericantroopsarrived inFrance.WithfreshAmericanforcesarrivingregularlyinFrance,everyonerealizedtheAmericanswouldsoonbringa numerical advantagetotheAllies.TheyalsoknewtherewouldbesometimebeforesufficientnumbersofAmericanswouldbeavailabletobreakthestalemateontheWesternFront.The success of the Bolsheviks in Russia and the resulting separate peace inDecember1917allowedtheGermanstomoveforcesfromtheEasternFronttoFrance.ThechangeinforceratiosallowedtheGermanstogainatemporarynumerical superiorityintheWest.Theymadea surpriseattackonMarch211918 and madesignificant gains.The attack threatened to split the British andFrencharmiesandbroughtPariswithinrangeof Germany'sguns.Moreimportantly,theattackdepletedthepreviouslystrainedmanpowerresourcesofboththeBritishandFrench.OnMay27,inanefforttoreachParis,theGermansattackedagainbetweenReimsandSoissons.Theattackingforcespenetratedtheallied linestoadepthoffifteenmilesandweresosuccessful thattheFrenchgovernmentpreparedtoleaveParis.TheAlliedHighCommandknewtheKaiser'sforceswouldattempttoexploittheirsuccesswithyetanothermajolattack, but only had sufficient reserves to bolsterthe defenses in one sector.Discoveringthe locationandtimingofthenextlargeattackbecomecritical.[1]InMarch1918theGermansimplementedanowfamouscipher-theADFGXcipher-toencryptcommunicationsbetweentheircorps anddivisionlevel fieldheadquarters. This new cipher used only the letters A, D,F, G and X andresisted all attemptsat cryptanalysis.This cipher contributedtothetactical205
205 For Whose Eyes Only? Cryptanalysis and Frequency Analysis David Cochran Introduction Cryptanalysis, the art of breaking codes, has proven to be a combat multiplier throughout history. By reading encrypted enemy message traffic, commanders often gain immediate intelligence on the enemy's disposition, strengths and intent. During World War II, many allied successes were attributed to the advantages gained by cracking both the German Enigma and Japanese PURPLE cipher systems. Further advances in the field of cryptanalysis will likewise give future American commanders a distinct advantage on the modern battlefield. The ADFGVX Cipher By 1917, the warring nations in World War I were facing manpower shortages due to the large-scale slaughter on the battlefields. The United States declared war on Germany and joined the Allies on April 6, 1917. On 26 June 1917, the first large contingent of American troops arrived in France. With fresh American forces arriving regularly in France, everyone realized the Americans would soon bring a numerical advantage to the Allies. They also knew there would be some time before sufficient numbers of Americans would be available to break the stalemate on the Western Front. The success of the Bolsheviks in Russia and the resulting separate peace in December 1917 allowed the Germans to move forces from the Eastern Front to France. The change in force ratios allowed the Germans to gain a temporary numerical superiority in the West. They made a surprise attack on March 21, 1918 and made significant gains. The attack threatened to split the British and French armies and brought Paris within range of Germany's guns. More importantly, the attack depleted the previously strained manpower resources of both the British and French. On May 27, in an effort to reach Paris, the Germans attacked again between Reims and Soissons. The attacking forces penetrated the allied lines to a depth of fifteen miles and were so successful that the French government prepared to leave Paris. The Allied High Command knew the Kaiser's forces would attempt to exploit their success with yet another major attack, but only had sufficient reserves to bolster the defenses in one sector. Discovering the location and timing of the next large attack become critical. [1] In March 1918, the Germans implemented a now famous cipher - the ADFGX cipher - to encrypt communications between their corps and division level field headquarters. This new cipher used only the letters A, D, F, G and X and resisted all attempts at cryptanalysis. This cipher contributed to the tactical
surpriseachievedbytheGermansduringthe initial springoffensive.AFrenchcryptanalyst,GeorgesPanvin,attackedthenewcipherusingfrequencyanalysisBytheendofAprilhewasabletoreadsomeofthemessagesprotectedbythiscipher, buthe neverachieved ageneral solution.On June1,Panvin noticedthattheGermans had slightlychanged thecipherbyincluding theletterV.PanvinusedfrequencyanalysisoncemoreandwasabletodeciphersomemessagesprotectedbythisnewcipherbyJune2.Hedisseminatedthekeyheuncoveredto the otherFrench cryptanalysts.On June3,one of theFrench cryptanalysts,usingthekeyprovidedbyPanvin,decipheredan interceptedGermanmessagethatprovidedthefirsthintastothelocationandtimingoftheanticipatedGermanoffensive.Whenthenewattack came,therewas nosurpriseandthe Germanswere not as successful as in their previous attempts. Georges Panvin and themethod of frequency analysis helped saveFrance.[2]IntroductiontoCipherSystemsKoblitz defines cryptography as “the study of methods of sending messages indisquisedformsothatonlytheintendedrecipientscanremovethedisguiseandread themessage."[3] Aciphersystemconsistsofanencipheringmap f andits inverse f-1 for deciphering. Often both f and f-l depend on an encryptionkey,aparameterwhichmaychangetheencryptionmap in someway.Theencipheringmap f takestheplaintext,themessagewearedisguising,andtransforms it tothe ciphertext,the disguised message.Likewise,f-1takes theciphertext and transforms it back to the plaintext.PlainTextf→CipherTextf→PlainTextTherearetwogeneral classesof cipheringsystems.Thefirstisasymmetricciphering system.In a symmetric ciphering systemboth the senderand receivermusthavethesamekeyandencryptionalgorithmtoscrambleandunscramblethemessagetraffic.Anexampleofasymmetric ciphering systemisthesimplesubstitution method discussed later in this paper.The second class isa publickeycipheringsystem.Inapublickeycipheringsystem,the sender (andeveryone else)has access tothe receiver's publicencryptionkeyandcanusethepublickeytoencipherthemessagetothereceiver.Thereceiver alonehas access to the secret decipheringkeyAnexampleofaPublicKeyencryptionsystemistheRivestShamirAdleman(RSA)encryptionalgorithm.IntheRSAsystem,auserchoosestwoprime numbers, p and q, and calculates n= pq . Additionally, the usercalculatesarandomnumberewhichhasnofactorsincommonwith(p-1)(q-1).Theuserthenmakesthetwonumberseandnpublic.Apersonwhowishestocommunicatesecretlywiththeuserbreaksthemessageinplaintext message blocks p, and calculates the ciphertext c, by the formula206
206 surprise achieved by the Germans during the initial spring offensive. A French cryptanalyst, Georges Panvin, attacked the new cipher using frequency analysis. By the end of April he was able to read some of the messages protected by this cipher, but he never achieved a general solution. On June 1, Panvin noticed that the Germans had slightly changed the cipher by including the letter V. Panvin used frequency analysis once more and was able to decipher some messages protected by this new cipher by June 2. He disseminated the key he uncovered to the other French cryptanalysts. On June 3, one of the French cryptanalysts, using the key provided by Panvin, deciphered an intercepted German message that provided the first hint as to the location and timing of the anticipated German offensive. When the new attack came, there was no surprise and the Germans were not as successful as in their previous attempts. Georges Panvin and the method of frequency analysis helped save France. [2] Introduction to Cipher Systems Koblitz defines cryptography as “the study of methods of sending messages in disguised form so that only the intended recipients can remove the disguise and read the message.” [3] A cipher system consists of an enciphering map f and its inverse - 1 f for deciphering. Often both f and - 1 f depend on an encryption key, a parameter which may change the encryption map in some way. The enciphering map f takes the plaintext, the message we are disguising, and transforms it to the ciphertext, the disguised message. Likewise, - 1 f takes the ciphertext and transforms it back to the plaintext. There are two general classes of ciphering systems. The first is a symmetric ciphering system. In a symmetric ciphering system both the sender and receiver must have the same key and encryption algorithm to scramble and unscramble the message traffic. An example of a symmetric ciphering system is the simple substitution method discussed later in this paper. The second class is a public key ciphering system. In a public key ciphering system, the sender (and everyone else) has access to the receiver's public encryption key and can use the public key to encipher the message to the receiver. The receiver alone has access to the secret deciphering key. An example of a Public Key encryption system is the Rivest Shamir Adleman (RSA) encryption algorithm. In the RSA system, a user chooses two prime numbers, p and q , and calculates n = pq . Additionally, the user calculates a random number e , which has no factors in common with ( p - 1)(q - 1). The user then makes the two numbers e and n public. A person who wishes to communicate secretly with the user breaks the message in plaintext message blocks i p and calculates the ciphertext i c by the formula Plain Text Cipher Text Plain Text f f ¾ ® ¾ ¾ ® - 1
C,=p,modnwhereeandnarethenumberspreviouslymadepublictoeveryone.Todecryptthemessagetheusercalculatestheprivatekeyd,bytherelationship d=e-'mod(p-1)(q-1),and decrypts the message using therelationship p,=cmodn.The security of the RSA system lies in the difficulty offactoring the number n into its prime factors p and q.This is a difficultmathematicalproblemwhenn islarge.[4]Example1:EncryptthemessageATTACKusingtheRSAciphersystem,whenp=13and q=17.Solution: n= pq=(13)(17)= 221. Choose a random number e with no factors incommon with (p-1)-(q-1)=(12)(16)=192 = 26 .3. Let e = 25. The public keyconsistsofthetwonumbersn=221ande=25.Everyonehasaccesstothesetwonumbers.Toencryptthemessage A-T-T-A-C-K,assign each of theplaintextmessageblocks thecorresponding numberforeachletterintheplaintext.ThenP, = A =1, P, = T = 20, etc. The message is then encrypted as:C, = p, mod(n)C = 125 mod(221) = 1C2 = 2025 mod(221)= 150C = 2025 mod(221) =150C4 = 125 mod(221) = 1Cs = 32 mod(221) = 133Cg = 1125 mod(221) = 193Themessage1-20-20-1-3-11isencryptedas1-150-150-1-133-193To decrypt the message again, we calculatetheprivatekeyd =25-'mod(192),d =169, remembering the intended recipient alone knows thisnumber.Themessageisdecryptedasfollows:P, =c mod(221)Pi =1169 mod(221) = 1P2 =150169 mod(221) = 20etc.ThesecurityoftheRSAciphersystemisgreatlyenhancedwhenlargeprimenumbers areused andthe sizeof themessageblock is increased.Atthecurrentlevelofmathematicalknowledge,anumber100digitslongmaybefactored inminutes,while anumber200 digits longmaytake years tofactor.Because of207
207 c p n e i = i mod where e and n are the numbers previously made public to everyone. To decrypt the message the user calculates the private key d , by the relationship mod(( 1)( 1)) 1 = - - - d e p q , and decrypts the message using the relationship p c n d i = i mod . The security of the RSA system lies in the difficulty of factoring the number n into its prime factors p and q . This is a difficult mathematical problem when n is large. [4] Example 1: Encrypt the message ATTACK using the RSA cipher system, when p = 13 and q = 17. Solution: n = pq = (13)(17) = 221. Choose a random number e with no factors in common with ( 1) ( 1) (12)(16) 192 2 3 6 p - × q - = = = × . Let e = 25 . The public key consists of the two numbers n = 221 and e = 25 . Everyone has access to these two numbers. To encrypt the message A-T-T-A-C-K, assign each of the plaintext message blocks the corresponding number for each letter in the plaintext. Then 1 p1 = A = , 20 p2 = T = , etc. The message is then encrypted as: 11 mod(221) 193 3 mod(221) 133 1 mod(221) 1 20 mod(221) 150 20 mod(221) 150 1 mod(221) 1 mod( ) 25 6 25 5 25 4 25 3 25 2 25 1 = = = = = = = = = = = = = c c c c c c c p n e i i The message 1 - 20 - 20 - 1- 3 - 11 is encrypted as 1 - 150 - 150 - 1- 133 - 193. To decrypt the message again, we calculate the private key 25 mod(192), 169 1 = = - d d , remembering the intended recipient alone knows this number. The message is decrypted as follows: . 150 mod(221) 20 1 mod(221) 1 mod(221) 169 2 169 1 etc p p p c d i i = = = = = The security of the RSA cipher system is greatly enhanced when large prime numbers are used and the size of the message block is increased. At the current level of mathematical knowledge, a number 100 digits long may be factored in minutes, while a number 200 digits long may take years to factor. Because of
thisfact,inpracticethis algorithmisoftenusedwithprimenumbers,pandqover 100digits long.Primenumbers ofthissizewillgenerateanumber n thatmay take years to factor.Substitution CiphersThesimplestsymmetricciphersystemisasubstitutionsystem.Inasubstitutionciphersystem,plaintextsymbolsaresubstitutedone-for-onewithencryptedsymbols.Inthesimplestexampleanotherletterrepresents eachletterof thealphabet.For example,considerplaintextmessagetrafficthat consistsonlyofthe letters in the alphabet. An enciphering map and corresponding decipheringmap,f,maybedefinedbythetable:ABCDEFGHIJKLMNOPQRSTUVWXYZMATHBCDEFGIJKLNOPQRSUVWXYZThen the plain text message: ATTACK NOW is encrypted as the cipher textMSSMTILNW.Example2:Howmanydifferentencipheringmapscanbecreatedfroma26-letteralphabet?(Atleastonelettermustbeencrypted.)Ingeneralhowmanyencipheringmapsaretherefornsymbols?Howmanydifferent encipheringmapsarepossiblefornsymbolswhereeverysymbolismappedtoadifferentsymbol?Solution: There are 26 symbols to choose for the first letter, 25 symbols tochoose for the second symbol, 24 for the third, etc. Then there are(26·25.....2·1)1= 26l126l1= 4.033×1026ways to create a substitution enciphering map with 26 symbols.Ingeneral, thereare n!-1 ways to create a substitutionenciphering map withn symbols when atleastonesymbolmustbeencrypted.Ifeverylettermustbeencrypted,theproblemis a derangement andthereare approximately n!/e ways to createdifferentsubstitutionalphabets.Note:Aderangementisthegeneral problemofscrambling n elementswithspecified positions so that no element stays in its original position. Let D,representthetotal numberofpermutationsofarranging nitemssuchthatnoitemis in its original position. Clearly if n =1 then D, =0. If n =2, then D, =1for (1, 2) can be arranged only as [2, 1) when no element is placed in its originalposition. It can be shown that for n ≥3, then D, =(n-1)(Dn-2 + Dn-1) and we cancalculate D, =(3-1)-(0+1)=2 and D4 =(4-1)·(1+2)=9. Additionally, it can be208
208 this fact, in practice this algorithm is often used with prime numbers, p and q , over 100 digits long. Prime numbers of this size will generate a number n that may take years to factor. Substitution Ciphers The simplest symmetric cipher system is a substitution system. In a substitution cipher system, plaintext symbols are substituted one-for-one with encrypted symbols. In the simplest example another letter represents each letter of the alphabet. For example, consider plaintext message traffic that consists only of the letters in the alphabet. An enciphering map and corresponding deciphering map, f, may be defined by the table: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z M A T H B C D E F G I J K L N O P Q R S U V W X Y Z Then the plain text message: ATTACK NOW is encrypted as the cipher text MSSMTI LNW. Example 2: How many different enciphering maps can be created from a 26- letter alphabet? (At least one letter must be encrypted.) In general how many enciphering maps are there for n symbols? How many different enciphering maps are possible for n symbols where every symbol is mapped to a different symbol? Solution: There are 26 symbols to choose for the first letter, 25 symbols to choose for the second symbol, 24 for the third, etc. Then there are ways to create a substitution enciphering map with 26 symbols. In general, there are n!- 1 ways to create a substitution enciphering map with n symbols when at least one symbol must be encrypted. If every letter must be encrypted, the problem is a derangement and there are approximately n!/ e ways to create different substitution alphabets. Note: A derangement is the general problem of scrambling n elements with specified positions so that no element stays in its original position. Let Dn represent the total number of permutations of arranging n items such that no item is in its original position. Clearly if n = 1 then 0 D1 = . If n = 2 , then 1 D2 = for {1, 2} can be arranged only as {2, 1} when no element is placed in its original position. It can be shown that for n ³ 3 , then ( 1)( ) - 2 - 1 = - + n Dn Dn D n and we can calculate (3 1) (0 1) 2 D3 = - × + = and (4 1) (1 2) 9 D4 = - × + = . Additionally, it can be 26 26! 1 4.033 10 (26 25 2 1) 1 26! 1 - = ´ × ×L × × - = -
Dshownthat-Theresultthattheren!(n +1)!(n+2)!areapproximatelyn!/ewaystocreatedifferentsubstitutionalphabetsthenfollows. [5]Atfirstglance,theproblemoffindingthecorrectarrangementtodecipherthemessagemayseemtobeadauntingchallenge,butwecanusecluesfromthemessageitselftosimplifythetask.Noticeintheplaintextmessagefromthesubstitutionexampleabove,thesymbolsTandAbothoccurtwice.Likewisethecorresponding symbols in the ciphertext, M and S both occur twice. This givesuscluesthat wemaybeabletouseunderlyingstatistical characteristicsoftheplaintexttofindinformationaboutourciphertext.FrequencyAnalysisAs discovered above,someciphersystemsmaylendthemselvestostatisticalmethodsofcryptanalysis.The idea of using theunderlying frequencydistribution ofa language toaid in the attempt to readencryptedmessagetrafficisfirstattributedtoanArabnamed Qalgashandi earlyinthe15thCentury.[6] Sincehistime,frequencyanalysisis often the first stepundertakenbycryptanalystsin the attempt to read theenemy'sencryptedmessagetraffic.IntheEnglishlanguage,theletter'e'isthemostfrequentlyoccurringletter.Infact,usingan electronicversion of Melville's Moby-Dick,prepared byProfessorEugeneF.IreyattheUniversityofColorado[7],Icalculated thefollowingfrequencies andassociated probabilities:209
209 shown that + L + + - + = + - - + + ( 2)! 1 ( 1) ( 1)! 1 ( 1) ! 1 1 2 n n n D e n n n . The result that there are approximately n!/ e ways to create different substitution alphabets then follows. [5] At first glance, the problem of finding the correct arrangement to decipher the message may seem to be a daunting challenge, but we can use clues from the message itself to simplify the task. Notice in the plaintext message from the substitution example above, the symbols T and A both occur twice. Likewise the corresponding symbols in the ciphertext, M and S both occur twice. This gives us clues that we may be able to use underlying statistical characteristics of the plaintext to find information about our ciphertext. Frequency Analysis As discovered above, some cipher systems may lend themselves to statistical methods of cryptanalysis. The idea of using the underlying frequency distribution of a language to aid in the attempt to read encrypted message traffic is first attributed to an Arab named Qalqashandi early in the 15th Century. [6] Since his time, frequency analysis is often the first step undertaken by cryptanalysts in the attempt to read the enemy's encrypted message traffic. In the English language, the letter 'e' is the most frequently occurring letter. In fact, using an electronic version of Melville's Moby-Dick, prepared by Professor Eugene F. Irey at the University of Colorado [7], I calculated the following frequencies and associated probabilities: