MASSACHUSETTS INSTITUTE OF TECHNOLOGY
LABORATORY FOR COMPUTER SCIENCE

MIT/LCS/TM-119

ON THE SECURITY OF THE MERKLE-HELLMAN CRYPTOGRAPHIC SCHEME

Adi Shamir
Richard E. Zippel

December 1978

This research was supported by the Office of Naval Research under Contract No. N00014-76-C-0366 and NASA Grant No. NSG 1323.

545 TECHNOLOGY SQUARE, CAMBRIDGE, MASSACHUSETTS 02139

Abstract: In this paper we show that a simplified version of the Merkle-Hellman public-key cryptographic system is breakable. While their full-fledged system seems to be resistant to the cryptanalytic attack we propose, this result suggests some ways in which the security of their system can be further enhanced.

Key Words: Cryptography, public-key cryptosystems, Merkle-Hellman knapsacks, Graham-Shamir knapsacks.

1. The Merkle-Hellman Knapsack Systems.

In this section we briefly outline the Merkle-Hellman cryptographic system. A fuller description can be found in [1].

A knapsack system is a vector of $n$ natural numbers $(a_1, \ldots, a_n)$. It represents a collection of knapsack problems (or instances) of the following type: given an integer $S$, find a 0-1 valued vector $(x_1, \ldots, x_n)$ such that $S = \sum_{i=1}^{n} x_i a_i$ (if one exists). Knapsack problems are known to be NP-complete ([2]), and thus they serve as an attractive source for cryptographic functions.

One way of using knapsack systems in public-key cryptography (see [3] for definitions) is to let each network member publish his knapsack system $(a_1, \ldots, a_n)$ in a publicly available network directory. Anyone wishing to send an $n$-bit message $X = (x_1, \ldots, x_n)$ to a network member uses the latter's known knapsack system in order to calculate the sum $S = \sum_{i=1}^{n} x_i a_i$, and to send it over the (insecure) communication channel. An eavesdropper who gets hold of $S$ and who tries to recover $X$ from $S$ is faced with the apparently impossible task of solving the corresponding knapsack problem.

In order to enable the intended receiver of $S$ to solve this knapsack problem, some hidden structure must be embedded in the knapsack system $(a_1, \ldots, a_n)$. This structure should be hard to find (i.e., the knapsack system should look like an $n$-tuple of random numbers to the uninformed observer), but it should enable those who know it to decode encrypted messages quickly by a shortcut method.

The knapsack systems Merkle and Hellman use are based on superincreasing sequences. A vector $(a'_1, \ldots, a'_n)$ of natural numbers is a superincreasing sequence if for each $1 \le i \le n$, $a'_i > \sum_{j=1}^{i-1} a'_j$. A simple example of a superincreasing sequence is $(1, 2, 4, 8, \ldots, 2^{n-1})$, in which each number equals the sum of its predecessors plus one. Considered as a knapsack system, a superincreasing sequence has an easy algorithm for solving all of its instances by successive subtractions (see [1] for details).

The numbers $a'_i$ cannot be published in the public directory, since their obvious structure enables any eavesdropper to decode encrypted messages $S$. To hide this structure, Merkle and Hellman suggest using a modulus $m$ and a multiplier $w$, such that $m > \sum_{i=1}^{n} a'_i$ and $\gcd(w, m) = 1$ (this ensures the existence of a multiplicative inverse $w^{-1}$ of $w$ modulo $m$). Instead of publishing $a'_i$, the network member publishes the numbers $a_i$, where for each $1 \le i \le n$

$$a_i = a'_i \cdot w \pmod{m}.$$

The network member, who knows the unpublished numbers $m$ and $w$ he used, can quickly transform any instance $S = \sum_{i=1}^{n} x_i a_i$ of the apparently difficult knapsack system $(a_1, \ldots, a_n)$ to an instance $S \cdot w^{-1} = \sum_{i=1}^{n} x_i a'_i \pmod{m}$ of the easily solvable knapsack system $(a'_1, \ldots, a'_n)$, and thus decode $S$ into $X$. To use this efficient method, a cryptanalyst must determine $m$ and $w$ from the published numbers $(a_1, \ldots, a_n)$; the difficulty of this problem is studied in the next section.

In their paper, Merkle and Hellman recommend the following specific parameters for their knapsack systems:

(i) $n = 100$ (knapsack systems with one hundred elements).

(ii) Each $a'_i$ is randomly chosen from a uniform distribution over the interval $[(2^{i-1} - 1) \cdot 2^{100} + 1,\ 2^{i-1} \cdot 2^{100}]$ (it is a $99 + i$ bit natural number).

(iii) The modulus $m$ is chosen uniformly from the interval $[2^{201} + 1,\ 2^{202} - 1]$ (thus making all the $a_i$ pseudo-random 202-bit natural numbers).

(iv) The multiplier $w$ is chosen uniformly from the interval $[2,\ m - 2]$ and then divided by its gcd with $m$.

2. The Cryptanalytic Attack.

The starting point for our cryptanalytic attack was the following challenge in Merkle and Hellman's paper:

"Attempts to break the system can start with simplified problems (e.g., assuming m is known). If even the most favored of certificational attacks is unsuccessful, then there is a margin of safety against cleverer, wealthier, or luckier opponents. Or, if the favored attack is successful, it helps to establish where the security really must reside. For example, if knowledge of m allows solution, then an opponent's uncertainty about m must be large."

In this section we show that the knowledge of $m$ makes any standard-parameter Merkle-Hellman knapsack system highly vulnerable to cryptanalysis.

The key idea is that the first two numbers $a'_1$ and $a'_2$ in the unknown superincreasing sequence are much smaller than the modulus $m$ (for the recommended parameters, $a'_1$, $a'_2$ and $m$ are 100, 101 and 202 bits long, respectively). We assume that in the list of published numbers $a_1, \ldots, a_n$ the cryptanalyst can identify the two numbers $a_1$ and $a_2$ which correspond to $a'_1$ and $a'_2$ (if these numbers are published in a shuffled order, the cryptanalyst can repeat the following procedure for each one of the $100 \cdot 99$ possible pairs of published numbers, and still break the system in reasonable time). Since $m$ is known, we can calculate the quotient $q$:

$$q = \frac{a_1}{a_2} \pmod{m}.$$

But $a_i = a'_i \cdot w \pmod{m}$ and thus

$$q = \frac{a'_1 \cdot w}{a'_2 \cdot w} = \frac{a'_1}{a'_2} \pmod{m}$$

or

$$a'_1 = a'_2 \cdot q \pmod{m}.$$

Consider now the set of all the modular multiples of $q$ for multipliers in the range $[1, 2^{101}]$:

$$\{\, 1 \cdot q \ (\mathrm{mod}\ m),\ 2 \cdot q \ (\mathrm{mod}\ m),\ \ldots,\ 2^{101} \cdot q \ (\mathrm{mod}\ m) \,\}.$$

Since $a'_2 \le 2^{101}$, $a'_2 \cdot q \pmod{m}$ (which is equal to $a'_1$) is in this set. All these $2^{101}$ multiples are very evenly distributed in the interval $[0, m-1]$, and thus the smallest number among them is likely to be around $m / 2^{101} \approx 2^{100}$. But $a'_1$ is known to be smaller than or equal to $2^{100}$, and thus $a'_1$ itself is likely to be the smallest number in this set.

Consequently all we need in order to find (a candidate for) $a'_1$ is to find the minimum value of $j \cdot q \pmod{m}$ when $j$ ranges over the interval $[1, 2^{101}]$ and $q$, $m$ are known. Efficient methods for solving this number-theoretic problem (using the continued fraction approximation of the ratio $q/m$) can be found in [4] and [5].

Once a candidate value for $a'_1$ is found, $w$ can be calculated as $a_1 / a'_1 \pmod{m}$ and then the whole sequence $a'_i$ can be generated from $m$, $w$ and the published numbers $a_i$. If the candidate value for $a'_1$ is the correct one, the calculated sequence $a'_i$ would turn out to be superincreasing, thus verifying the candidate and giving a quick way of solving instances of the published knapsack system.

It is easy to see that for other choices of the parameters, this cryptanalytic attack has a good probability of success only as long as $a'_1 \cdot a'_2$ is not much larger than $m$ (the smallest of $a'_2$ evenly spread multiples is about $m / a'_2$, so $a'_1$ stands out as the minimum only when it does not greatly exceed $m / a'_2$). The network member can of course use Merkle-Hellman knapsack systems in which this condition does not hold.
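The attack described in this section can be tried out on toy parameters. The following Python sketch is illustrative only and is not the procedure of [4] or [5]: it assumes a prime modulus (so that every nonzero residue is invertible), replaces the continued-fraction search by a brute-force scan that is feasible only for tiny bounds, and tries the multiples of $q$ in increasing order instead of inspecting the minimum alone. The function names, the `bound` argument and the example numbers are assumptions made for illustration.

```python
def is_superincreasing(seq):
    """True if every element exceeds the sum of its predecessors."""
    total = 0
    for value in seq:
        if value <= total:
            return False
        total += value
    return True


def recover_trapdoor(public, m, bound):
    """Toy attack: m is known and assumed prime; public[0] and public[1]
    are assumed to correspond to the two smallest secret elements."""
    a1, a2 = public[0], public[1]
    q = a1 * pow(a2, -1, m) % m              # q = a_1 / a_2 (mod m)
    # Brute force stands in for the continued-fraction method (assumption).
    candidates = sorted(j * q % m for j in range(1, bound + 1))
    for cand in candidates:                  # smallest multiple first
        w_inv = cand * pow(a1, -1, m) % m    # 1/w = a'_1 / a_1 (mod m)
        secret = [a * w_inv % m for a in public]
        if is_superincreasing(secret) and sum(secret) < m:
            return w_inv, secret             # candidate verified
    return None


def decrypt(s, w_inv, secret, m):
    """Undo the multiplier, then solve the easy knapsack by subtraction."""
    target = s * w_inv % m
    bits = [0] * len(secret)
    for i in reversed(range(len(secret))):
        if secret[i] <= target:
            bits[i] = 1
            target -= secret[i]
    return bits


# Hypothetical toy instance (n = 4, so a'_2 <= 2**5).
m, w = 769, 300
hidden = [11, 23, 52, 120]
public = [a * w % m for a in hidden]
w_inv, recovered = recover_trapdoor(public, m, bound=2 ** 5)
message = [1, 0, 1, 1]
ciphertext = sum(x * a for x, a in zip(message, public))
print(decrypt(ciphertext, w_inv, recovered, m))
```

Any candidate that yields a superincreasing sequence with sum below $m$ decodes correctly, whether or not it coincides with the original $a'_1$, which is why the sketch accepts the first candidate that passes the check.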
There are two reasons why such a simple solution (choosing parameters for which $a'_1 \cdot a'_2$ is much larger than $m$) might not be adequate:

(i) If $m > \sum_{i=1}^{n} a'_i$ and $a'_i$ is superincreasing, then a simple calculation shows that $m \ge 2^{n} a'_1$ and $m \ge 2^{n-1} a'_2$, and thus $a'_1 \cdot a'_2 \le m^2 / 2^{2n-1}$. To make $a'_1 \cdot a'_2$ much bigger than $m$ in a hundred element knapsack system (which is the minimum secure value), $m$ must have considerably more than 200 bits. This slows down the computations and worsens the ratio between the number of bits in encrypted and original messages.

(ii) Our cryptanalytic method uses only the two smallest numbers in the superincreasing sequence $a'_i$. If three or more elements are considered simultaneously, the condition $a'_1 \cdot a'_2 \lesssim m$ can be weakened considerably. Although we do not know how to do it at present, it seems dangerous to assume that such an extension is impossible.

3. Safer Variants of the Merkle-Hellman Knapsack Systems.

After defining their basic knapsack systems, Merkle and Hellman note that a safer knapsack system can be obtained by iterating the modular multiplications technique a number of times. At each iteration a new modulus $m_j$ ($m_j > \sum_{i=1}^{n} a_i$) and a new multiplier $w_j$ ($\gcd(w_j, m_j) = 1$) are chosen, and all the knapsack elements $a_i$ are replaced by $a_i \cdot w_j \pmod{m_j}$. The decoding of encrypted messages is done by successively dividing them by the $w_j \pmod{m_j}$ in the reverse order, thus unwinding the iterations all the way back to the original superincreasing sequence.

When two or more iterations are used in order to obscure the structure of the superincreasing sequence, our cryptanalytic attack becomes ineffective (even when all the moduli $m_j$ and all but the last $w_j$ are known). The reason is that when we attempt to strip the last $w_j$ from the knapsack elements by dividing pairs of the published numbers modulo the last $m_j$, we are left with large, random looking numbers (the results of the last but one iteration) to which the minimization technique cannot be applied.
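The iteration and its unwinding can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not Merkle and Hellman's implementation: the names `disguise` and `unwind`, the representation of the rounds as a list of `(w_j, m_j)` pairs, and the tiny numbers are all hypothetical.

```python
from math import gcd


def disguise(seq, rounds):
    """Apply each (w_j, m_j) round in order to the knapsack elements."""
    for w, m in rounds:
        # Each new modulus must exceed the sum of the current elements.
        if m <= sum(seq) or gcd(w, m) != 1:
            raise ValueError("invalid round parameters")
        seq = [a * w % m for a in seq]
    return seq


def unwind(s, rounds):
    """Divide by the w_j (mod m_j) in reverse order."""
    for w, m in reversed(rounds):
        s = s * pow(w, -1, m) % m
    return s


# Hypothetical two-round example on a 4-element superincreasing sequence.
hidden = [2, 3, 7, 15]
rounds = [(11, 31), (20, 53)]
public = disguise(hidden, rounds)
message = [1, 0, 1, 1]
ciphertext = sum(x * a for x, a in zip(message, public))
# The unwound value is an instance of the original superincreasing knapsack.
print(unwind(ciphertext, rounds))
```

After unwinding, the result is decoded by the usual successive subtractions against the original superincreasing sequence.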
In their paper, Merkle and Hellman express the belief that knapsack systems obtained by two iterations are strictly more secure than their simple, single iteration knapsack systems. Our method is an explicit cryptanalytic example which substantiates Merkle and Hellman's intuitive feeling.

Another way of eliminating the potential weakness represented by extremely small knapsack elements has been suggested (independently) by Graham and Shamir. The idea is to use structured numbers, whose low-order parts are a superincreasing sequence and whose high-order parts are strings of random bits:

    a'_1 =  (random)  |  (superincreasing sequence)
    a'_2 =  (random)  |  (superincreasing sequence)
     ...
    a'_n =  (random)  |  (superincreasing sequence)
            high-order part    low-order part

Due to the existence of the high-order "noise", none of these numbers is likely to be small, but when some of them are added together, the sum can still be decoded by disregarding its high-order part and analyzing its low-order part in the usual way.

A particularly simple knapsack system is obtained when the low-order part is decomposed further in the following way:

    a'_1 =  (random)  |  0 0 ... 0 1  |  0 ... 0  |  (random)
    a'_2 =  (random)  |  0 0 ... 1 0  |  0 ... 0  |  (random)
     ...
    a'_n =  (random)  |  1 0 ... 0 0  |  0 ... 0  |  (random)

The block of zeros between the low-order random bits and the diagonal matrix is $\log_2 n$ bits wide. Its purpose is to serve as a buffer zone, so that even when all the $n$ numbers $a'_i$ are added together, the sum of the low-order bits does not overflow into the region of the diagonal matrix. To obscure this structure, we use $k \ge 1$ iterations of Merkle and Hellman's modular multiplications technique.

Encrypted messages are now very easy to decode: once we unwind the iterations back to the $a'_i$ knapsack system, the decoded message can be read off an intermediate interval of bits in the (augmented) encoded message, without any further computations. This variant of Merkle and Hellman's scheme seems to be safer, faster and simpler to implement than the original variant recommended in [1].
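The structured numbers with a diagonal block can be sketched in Python as follows. This is an illustrative reading of the layout described above, not a specification: the function names, the bit widths, the use of a seeded `random.Random`, and the choice of $\lceil \log_2 n \rceil$ buffer bits are assumptions, and the $k$ modular multiplication rounds that would hide the structure are omitted.

```python
import random


def build_elements(n, low_bits, high_bits, seed=0):
    """Return n numbers laid out as: random | diagonal | zero buffer | random."""
    rng = random.Random(seed)
    buffer_bits = (n - 1).bit_length()   # ceil(log2 n) zeros absorb all carries
    offset = low_bits + buffer_bits      # bit position where the diagonal starts
    elements = []
    for i in range(n):
        low = rng.getrandbits(low_bits)
        high = rng.getrandbits(high_bits)
        # Element i carries a single 1 at position i of the diagonal block.
        elements.append((high << (offset + n)) | (1 << (offset + i)) | low)
    return elements, offset


def read_message(total, n, offset):
    """Read the n message bits directly from the diagonal block of the sum."""
    return [(total >> (offset + i)) & 1 for i in range(n)]


# Hypothetical example with n = 6 elements and 8-bit random parts.
elements, offset = build_elements(6, low_bits=8, high_bits=8)
message = [1, 0, 1, 1, 0, 1]
total = sum(a for a, x in zip(elements, message) if x)
print(read_message(total, 6, offset))
```

Because the low-order random parts sum to less than $n \cdot 2^{\text{low\_bits}}$, their carries stay inside the buffer, so the diagonal block of the sum holds the message bits unchanged.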
Acknowledgements: We would like to thank Len Adleman, Abraham Lempel, Michael Rabin and Ron Rivest for many fruitful discussions.

Bibliography:

[1] R. Merkle and M. Hellman, "Hiding Information and Receipts in Trap Door Knapsacks", IEEE Trans. Information Theory, September 1978.

[2] R. Karp, "Reducibility Among Combinatorial Problems", in "Complexity of Computer Computations" (ed. R. Miller and J. Thatcher), Plenum Press, 1972.

[3] W. Diffie and M. Hellman, "New Directions in Cryptography", IEEE Trans. Information Theory, November 1976.

[4] J. Cassels, "An Introduction to Diophantine Approximation", Cambridge University Press, 1965.

[5] W. Leveque, "Fundamentals of Number Theory", Addison-Wesley, 1977.