# FLASH INFORMATIQUE FI

# The mathematical problems underlying Information Security

Arjen LENSTRA |
---|

Modern information protection methods are based on a couple of mathematical problems that are believed to be hard to solve. In this note these problems and their cryptographic applications are discussed.

## Introduction

Many people prefer to keep their personal records confidential. To maintain confidentiality while sending personal data across the Internet -while e-banking, for instance-the data needs to be protected. Many common web-based applications do indeed provide protection, often without the user even being aware of it.

Information protection methods use encryption : before it is sent, the data is garbled into an incomprehensible looking string of symbols, in such a way that only the intended recipient can make sense of it and recover the original information. Anyone who eavesdrops on the communication does not learn anything, except that communication is taking place. Similar methods are used to establish a user’s identity.

This does not imply that encryption suffices. Miscreants may try to gain improper access to information using any of an ever greater variety of other methods. Weaknesses in operating systems can be exploited to get remote access to a user’s computer, a user’s password may be stolen or guessed, or a user may be tricked into revealing passwords using social engineering, etc. Thus, encryption alone does not guarantee confidentiality of a user’s data. But it is a necessary part of the protection chain.

This note concentrates on the basic mathematical tools that underly encryption and the corresponding decryption. How these tools are embedded in actual practical systems, and the many precautions that must be taken when doing so, is not discussed.
*Number theory* plays an important role in all widely accepted basic encryption tools .It is the branch of mathematics that studies, roughly speaking, integer numbers such as 0,1,2,3, ....An entertaining activity to some, but long thought to be mostly useless by most others. That changed when cryptographic applications of prime numbers were discovered. A positive integer is prime if it is evenly divisible only by 1 and itself. For instance, 17 is a prime number, since it is evenly divisible only by 1 and by 17 itself. But 91 is not prime because 91 = 7 _{*}13, so 91 is evenly divisible by a number other than 1 and 91, namely 7 (or 13). The number 1 is not a prime. Number theorists have studied prime numbers for ages, but their cryptographic potential was discovered only relatively recently (approximately in the 1970s, depending on whose claims one is willing to believe). These cryptographic applications cleverly exploit two problems involving primes :

- Factoring integers, and
- Computing discrete logarithms.

These problems have been studied extensively, in particular since the discovery of their cryptographic relevance. Only little progress was made, and it is widely believed that the two problems are ’hard’. This ’hardness’ is precisely what is required for the cryptographic applications, so from that point of view it is to be hoped that the problems remain unsolved. It should be kept in mind, however, that progress is conceivable and that it could in principle go either way : a proof of hardness or an efficient solution method. The former would provide the currently missing theoretical justification for the security of the cryptographic applications. The latter would mean that all widely accepted basic encryption tools can efficiently be broken : hardly a desirable outcome but unfortunately a possibility that cannot be discounted yet. One smart idea can still change the world-a dangerous or enviable proposition, depending on one’s point of view.

In the next sections, prime numbers and the two problems mentioned above are briefly discussed along with their cryptographic applications,

## Prime Numbers

Examples of prime numbers abound. There arealready 25 primes less than 100 :

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

It can easily be argued that there are infinitely many prime numbers. If there would be a finite list of primes, then the product over all numbers on the list (say P ) is obviously evenly divisible by each of the prime numbers on the list. On the other hand P +1 is not evenly divisible by any of the numbers on the list. So, P +1 must be a prime number itself (and not on the list) or be evenly divisible by a prime smaller than P +1 that is not on the list. Because any finite list of primes can be extended with at least one other prime there are infinitely many primes.

This simple observation that there are infinitely many primes is made more explicit by a less straightforward result, the Prime Number Theorem. A precise statement of this theorem is not required here. Roughly speaking, it says that the chance that a randomly selected d-digit number is prime is proportional to 1/d. This allows to make estimates such as : ’there are more than 10^{96 }primes that have one hundred decimal digits’. The fact that there are so ’many’ primes is important for cryptographic applications. For instance, it implies that if two different persons each select at random a prime of, say, one hundred decimal digits, the chance is negligible that they pick the same prime.

This example raises three questions. Why would anyone want to generate such huge primes ? And why would it be important that different people most likely select different primes ? These two questions are addressed in the next section. The final question, how one would go about selecting a random prime of any size, is addressed in the remainder of this section.

Prime numbers can easily be distinguished from numbers that are not prime (a number that is not a prime number is called composite). To test whether or not a positive integer n is a prime, the ’obvious’ approach would be to check if n is evenly divisible by a positive integer less than n (note that it suffices to restrict the search to the primes that are at most √n). Although for some composite n’s this method may work quickly, in general it cannot be expected to work quickly-even worse, for large n it cannot be expected to work at all because it would take too much time. A less obvious but efficient method to recognize if n is composite is based on Fermat’s little theorem. This theorem says that if p is a prime number then for any integer number a the number a^{p }a is evenly divisible by p. Thus, if an integer a can be found such that n - does not divide a^{n }a, then n cannot be prime (because, if it were, n would n - divide a^{n } - a due to Fermat’s little theorem).

Fermat’s little theorem leads to a fast and practical method to recognize composite numbers if two additional facts are taken into account. In the first place, and using a slightly sharpened version of the theorem, if n is composite then most a-values in _{}*2, 3,...,n _{−}1_{}*would attest to that fact (namely, because n would not divide a

^{n }a). Furthermore, even if n counts thousands of digits, it can be verified quickly on any current computer whether or not n divides a

_{−}a. This is based on a computational trick called modular exponentiation. This trick has many applications in cryptography. Developing ever faster methods for modular exponentiation is a favorite research topic. More precisely, what modular exponentiation does for any integers m, b, and e

_{≥}0 is calculating b

^{e }mod m, which is a shorthand for ’the remainder upon division by m of the number b

^{e }’. The integers m, b, and e are often referred to as the modulus, base, and exponent, respectively. To check if n divides a

^{n }a, modular exponentiation is applied to modulus n, base a, and exponent n, and for the resulting value a mod n it is checked if (a

^{n }mod n)

_{−}a is evenly divisible by n.

Thus, to prove that n is composite, randomly select values a from_{}*2, 3,...,n _{−}1_{}*and use modular exponentiation to check if n divides a

^{n }

_{−}a, until an a is found for which it does not. If, after repeated attempts with randomly selected a-values, n turns out to divide all corresponding (a

^{n }a) -values, the compositeness proof for n fails. In that case, because the proof would most likely not have failed if n is composite, it must be assumed that n is prime. The latter is just an assumption. It is not a mathematical proof that n is prime. The ’alleged primes’ resulting from a failed compositeness proof are sometimes referred to as industrial primes. For all practical purposes, an industrial prime is prime, except that it has not been proved that the number is prime.

Combining the possibility of fast recognition of composites (and thus of industrial primes) with the relatively high density of primes (the Prime Number Theorem), leads to a relatively fast method to generate random (industrial) primes of any number of digits d : repeatedly pick a random d-digit number until one is found that is an industrial prime. From a practical point of view this answers the third question raised above.

The above method to generate primes trades mathematical rigor for practicality. A mathematically rigorous proof that the result is indeed prime can be obtained using entirely different methods. For the vast majority of industrial primes used in practical information protection scenarios, however, such proofs have not been obtained, and would not add much value either.

## Integer Factorization and RSA

Multiplication of integer numbers is easy. School kids memorize facts such as ’3 times 5 equals 15’, and learn how to use the elementary tables of multiplication to solve more challenging multiplication problems such as 123 _{* }456 = 56088 using the schoolbook method of multiplication. In this method each digit of the first number is multiplied by each digit of the second number, and the product is put together using simple additions. Numbers of any length can quickly be multiplied in the same way : on a computer numbers of thousands of decimal digits can be multiplied in a fraction of a second. Much faster methods exist, but they are beyond the scope of this note.

Doing a multiplication is easy, but undoing it seems to be hard : given a positive composite integer, say n, find two integers greater than 1 that when multiplied together produce n. These integers are called factors of n. For ex-ample, given n = 15, find 3 and 5 (since 3 _{*}5 = 15). This problem is known as integer factorization. In general it is believed to be a hard problem. As mentioned in Section 1, however, the hardness of integer factorization has not been proved. The problem is easy on a quantum computer, but so far no one knows how to build one. In the remainder of this section it is shown how the alleged difficulty of integer factorization can be exploited for cryptographic purposes, and the state of the art of integer factorization algorithms is briefly discussed. Cryptographic application. Suppose that someone, say Alice, wants to en-able anyone else to send encrypted messages to her. An elegant way to do this is the RSA public key crypto system, named after its inventors Ronald Rivest, Adi Shamir, and Leonard Adleman. It is a called a public key cryptosystem because Alice has to publish certain key material, allowing anyone to encrypt messages intended for her, while keeping secret a corresponding private key that allows her to decrypt the encrypted messages sent to her. Obviously, for such a scheme to work it should be impossible to derive Alice’s private key from the published public key, since otherwise anyone could perform the decryption. For RSA this impossibility requires the hardness of integer factorization. It works as follows.

As mentioned in Section 2, Alice can reasonably quickly generate random primes of virtually any size. Let pA and qA be two distinct random primes that she generated, such that factoring their product nA = pA _{*}qA is believed to be infeasible. This means that Alice can safely publish nA while keeping secret ’her’ primes pA and qA : even though pA and qA are uniquely determined by nA there is simply no efficient (known) way to find pA and qA based on the knowledge of just nA (if pA and qA are chosen appropriately). Except for pA and qA Alice also determines an integer eA and calculates an integer dA such that eA _{* }dA _{−}1 is evenly divisible by both pA _{−}1 and qA _{−}1. Given eA, pA, and qA the computation of dA can be done very quickly using another important computational trick, the extended Euclidean algorithm. The integer eA is published by Alice (along with nA), but she keeps secret dA (alongwith pA and qA). The pair (nA,eA) is Alice’s public key, consisting of Alice’s public modulus nA and her public exponent eA. Alice’s corresponding secret or private key consists of dA, the secret or private exponent.

The primes pA and qA can in principle be discarded after the set-up phase, but because they can be used to derive the private exponent dA (since eA is publicly known), they should in any case not be made public. It follows that, for the security of the RSA cryptosystem, factoring the public modulus nA should be difficult. It has however not been proved that the factorization of nA is needed to derive dA from eA and nA.

The set-up may look cumbersome. But, Alice has to do it just once. And, once it has been done, anyone who knows Alice’s public key can easily encrypt messages resulting in ciphertexts that only Alice can decrypt : for a message M, which is a positive integer less than nA, the encryption E(M) equals M^{e}^{A }mod nA.The corresponding decryption looks just as simple : to decrypt E(M) Alice calculates (E(M))^{d}^{A }mod nA which is equal to the original M. Both processes make us of modular exponentiation (cf.previous section).

If RSA is used to exchange plain text messages, they need to be converted to integer values in some agreed upon fashion. There are lots of ways in which this can be done. For instance, one could encode ’a’ as ’01’, ’b’ as ’02’, etc., until ’z’ as ’26’(andthus ’secret’ as ’190503180520’), or use any other more efficient scheme.

For readers interested in more details it is shown that the decryption indeed produces M. Substituting the identity E(M)= M into the decryption produces It follows that the decryption equals M For the decryption to work, this should be proved equal to M. The exponentseA and dA were constructed such that eA M which is the same as M It follows from Fermat’s little theorem (cf. previous section) and the fact that pA is prime that pA divides M M which is It is proved similarly (but with a possibly different k-value) that the remainder upon division by qA of the decryption equals M mod qA. Because the remainder of the decryption upon division by pA and qA equals M mod pA and M mod qA, respectively, the Chinese remainder theorem now learns that the remainder upon division by the product pA |

Despite the deceptively simple formulation, and the fact that a lot of effort has been invested to make modular exponentiation as fast as possible, encryption and decryption are too slow for real-time encryption and decryption of large volumes of data. In practice the method is typically used to exchange keys which are then used along with much faster symmetric key encryption. An other important application of the RSA cryptosystem is digital signatures. Without further precautions ,RSA as described here is vulnerable to all kinds of attacks. All these issues are beyond the scope of this note.

The efficiency of the RSA cryptosystem decreases rapidly with larger modulus sizes. Therefore, in practice one tries to get away with the smallest modulus size that still offers an acceptable level of security : obviously a modulus such as n =15 does not offer any security (since its factors 3 and 5 are easy to find), but a very secure modulus of millions of digits is practically infeasible. Finding the right middle ground between the security and practicality requirement depends on the state of the art of integer factoring algorithms, and is further commented on below.

Suppose Alice has received a confidential note from Bob in the manner described above. How does she reply to Bob in a confidential manner ? Simply by using the above scheme along with Bob’s public key (nB,eB). Thus, Bob too has to go through the set-up phase, generating his random primes pB and qB and his public and private exponents. For more general communications involving more parties, each party has to generate its own public and private key, and everyone’s public key has to be made publicly available. This leads to the need for so-called Public Key Infrastructures (PKIs) and Certification Authorities (CAs) to certify the link between parties and their public keys. All these issues are beyond the scope of this note.

One remaining issue is how one assures that different parties select different primes.If any two parties, say Alice and Bob, happen to select even one identical prime, then the greatest common divisor (which can very quickly be computed using the classical Euclidean algorithm) of their public moduli nA and nB would reveal this common prime -and thus the secret keys of both Alice and Bob- to anyone who cares to perform this simple calculation. To address this concern it suffices to notice that there are so many primes (assuming appropriate size) that if Alice and Bob use proper random number generators the chance is negligible that this occurs : the chance to win the lottery ten times in a row is higher, even if many more than two parties are involved.

Integer factorization algorithms. There are two types of integer factorization algorithms : special purpose and general purpose methods. For the former ones the time to success depends mostly on special properties of the factor one finds, for the latter it depends just on the size of the number being factored. Because moduli as used in the RSA cryptosystem cannot be expected to have factors with special properties, special purpose factoring methods are not relevant for this note.

The most common special purpose factoring methods are the following. If an integer of d decimal digits has a factor of k decimal digits, the factor can be found using trial division in time proportional to 10 |

An accessible description of general purpose factoring algorithms is Carl Pomerance’s *A tale of two sieves*[see www.ams.org/notices/199612/pomerance.pdf ]]. All currently known methods arebased on the same simple idea, but no substantial progress has been made since the invention in the late 1980s of the Number Field Sieve, the current fastest general purpose integer factoring algorithm. This lack of progress may be an indication that this ’same simple idea’ that factoring researchers have been pounding on since the mid 1970s is a dead end and that an entirely new idea is needed to make further progress.

All that is known-or can reasonably be predicted, even though making predictions of this sort is risky-is the following. In the late 1970s an RSA modulus of 129 decimal digits was published, along with the claim that factoring it would take 40 quadrillion years. Back then, this sounded reasonable. In 1994 this 129-digit modulus was factored. In the mid 1990s 155-digit moduli were widely used in actual implementations of the RSA cryptosystem. In 1999 the first 155-digitRSA modulus was factored, thereby showing that this widely used size no longer offers adequate security. RSA moduli having about 200 decimal digits have recently been factored, but this requires a substantial effort involving hundreds of computers and months of computing time. Assuming somewhat refined versions of current algorithms (and more computing power) one may expect that moduli of about 300 decimal digits can be factored within the next 10 or 20 years. The current practical use of RSA moduli of about 300 decimal digits thus seems fine, but they cannot be relied upon for long term use. Moduli of 500 or more decimal digits look out of reach for current methods. But how long before a single new bright idea affects those moduli as well ? Only the future will tell.

## Discrete Logarithms and the Diffe-Hellman Protocol

As mentioned in Section 2, modular exponentiation can be use to efficiently calculate b^{e }mod m, the remainder upon division by m of b^{e}, for any modulus m, base b, and exponent e. Undoing a modular exponentiation is the problem where a modulus m, base b, and a positive integer y that is less than m are given and where an exponent e must be calculated such that y = b^{e }mod m. This problem is commonly referred to as the discrete logarithm problem. There are many remarkable parallels between the discrete logarithm problem and integer factorization : for properly chosen m and b it is believed to be hard, the hardness has not been proved, it is easy on a quantum computer, the alleged hardness can be used for cryptographic purposes, and all known algorithms to compute discrete logarithms are very similar to integer factorization algorithms. In this section a famous cryptographic application and some discrete logarithm algorithms are briefly discussed.

**The Diffe-Hellman protocol**. The Diffe-Hellman protocol is a key agreement protocol. It allows two parties, say Alice an Bob, to agree on a secret piece of information, commonly referred to as a key, by communicating over a public channel. Afterwards, the resulting key can be used by Alice and Bob to communicate confidentially (and efficiently) over the same public channel, using symmetric key encryption.

In the Diffe-Hellman protocol it is assumed that a suitable modulus m and base b are known system-wide. Alice simply picks an exponent eA _{≥}0at random, uses modular exponentiation to calculate b^{e}^{A }mod m, and transmits the resulting value b^{e}^{A }mod m to Bob. Bob, similarly, picks an exponent eB 0 at random, ≥ uses modular exponentiation to calculate b^{e}^{B }mod m, and transmits the resulting value b^{e}^{B }mod m to Alice. Alice then uses her exponent eA and the value b^{e}^{B }mod m received from Bob to calculate (b^{e}^{B }mod m)^{e}^{A }mod m, while Bob uses his exponent eB and the value b^{e}^{A }mod m received from Alice to calculate (b^{e}^{A }mod m)^{e}^{B }mod m. But the value (b^{e}^{B }mod m)^{e}^{A }mod m calculated by Alice equals b^{e}^{B}^{*e}^{A }mod m, and so does the value (b^{e}^{A }mod m)^{e}^{B }mod m calculated by Bob (since eB _{*}eA equals eA _{*}eB). So, at this point Alice and Bob share the ’key’ b^{e}^{A}^{*e}^{B }mod m.

If the value eA can be derived from the system parameters m and b and the value b^{e}^{A }mod m that is transmitted by Alice over the public channel, then the shared key b^{e}^{A}^{*e}^{B }mod m can be computed using modular exponentiation from m, eA, and the value b^{e}^{B }mod m transmitted by Bob. So, the security of the resulting key relies on the difficulty of computing the discrete logarithm eA of the transmitted value b^{e}^{A }mod m. Thus, m and b must be chosen in such a way that computing this discrete logarithm is too hard. Vice versa, the security also depends on the difficulty of computing the discrete logarithm eB of the other transmitted value(b^{e}^{B }mod m). Though computing a discrete logarithm suffices to derive the key b^{e}^{A}^{*e}^{B }mod m from the two transmitted values b^{e}^{A }mod m and b^{e}^{B }mod m (and the public parameters m and b) it has not been proved that computing a discrete logarithm is necessary : in principle there may be a shortcut that leads to the key b^{e}^{A}^{*e}^{B }mod m by combining the available values m, b, b^{e}^{A }mod m, and b^{e}^{B }mod m in some other way. The problem of finding b^{e}^{A}^{*e}^{B }mod m given m, b, b^{e}^{A }mod m, and b^{e}^{B }mod m is known as the Diffe-Hellman problem. It is at most as hard as the discrete logarithm problem, but may be easier. Nevertheless, it is not uncommon to say that the security of the Diffe-Hellman protocol relies on the difficulty of the discrete logarithm problem (though it mayactually be easier), since if the discrete logarithm problem can be solved, then the Diffe-Hellman prototol can be ’broken’ (i.e., the key can be found) by anyone eavesdropping on the communication.

As usual, the elementary description as given above is vulnerable to all kinds of attacks unless precautions are taken. The most infamous (and common) one is the man in the middle attack : if Alice and Bob do not make sure that they actually communicate with each other they could each be setting up separate secret keys with a third party, which could then have easy access to all confidential data that Alice and Bob want to exchange.

Discrete logarithm algorithms. The difficulty of the discrete logarithm problem depends to a large extent on properties of the modulus m and the base b other than just sheer size. It is assumed that m and b are carefully chosen to avoid ’easy’ cases.

discrete logarithm problem. If the modulus m is composite then, as a consequence of the Chinese remainder theorem, the discrete logarithm problem can be reduced to discrete logarithm problems involving the prime factors of m. In some applications m is constructed in such a way that its factorization is hard to find, but it is more common to select a prime modulus m. Here it is assumed that m is prime. An important number is the smallest positive integer ℓ for which b |

Under the assumption that the discrete logarithm problem cannot be solved using special properties of m or b, the difficulty of the problem is determined by the size of m. The best methods known in this case are all very similar to general purpose integer factoring methods. For n and m of approximately the same size, the effort required to factor an RSA modulus n is comparable to solving a (general) discrete logarithm problem with prime modulus m. Thus, parameters of about the same size should be used for RSA and the Diffe-Hellman protocol as discussed here. As is the case for integer factorization, there is no mathematical proof that any practically applicable discrete logarithm problem is indeed hard.

Other discrete logarithm problems. The first proposed cryptographic applications of the discrete logarithm problem all used the simplest case described above based on plain integer numbers. It was later realized that there are instances of the discrete logarithm problem involving other mathematical structures that are suitable for cryptographic applications as well and that may even offer certain advantages compared to the more traditional integer approach. A popular example is Elliptic Curve Cryptosystems which involve the use of advanced mathematical structures called elliptic curves. Unfortunately, as in the integer case, there is no proof of the hardness of the elliptic curve discrete logarithm problem.

## Conclusion

To an outsider it may seem strange that, deep down, the security of virtually all electronic communications is based on a couple of mathematical problems that are not well understood. The security argument relies solely on years of failed solution attempts, not on rigorous mathematical reasoning. This is worrying to some, but does not seem to be a concern to practitioners. Other method, relying on different mathematical problems, have been proposed. But they have never reached the level of popularity of factoring or discrete logarithm based systems, and most problems are regarded with even more skepticism than the ones discussed here. All we can do at this point is keep looking for better alternatives and hope that quantum computers cannot be built.

### Cherchez ...

- dans tous les Flash informatique*(entre 1986 et 2001: seulement sur les titres et auteurs)*

- par mot-clé

### Avertissement

Cette page est un article d'une publication de l'EPFL.

Le contenu et certains liens ne sont peut-être plus d'actualité.

### Responsabilité

Les articles n'engagent que leurs auteurs, sauf ceux qui concernent de façon évidente des prestations officielles (sous la responsabilité du DIT ou d'autres entités). Toute reproduction, même partielle, n'est autorisée qu'avec l'accord de la rédaction et des auteurs.### Archives sur clé USB

Le Flash informatique ne paraîtra plus. Le dernier numéro est daté de décembre 2013.### Taguage des articles

Depuis 2010, pour aider le lecteur, les articles sont taggués:-
**tout public**

que vous soyiez utilisateur occasionnel du PC familial, ou bien simplement propriétaire d'un iPhone, lisez l'article marqué tout public, vous y apprendrez plein de choses qui vous permettront de mieux appréhender ces technologies qui envahissent votre quotidien -
**public averti**

l'article parle de concepts techniques, mais à la portée de toute personne intéressée par les dessous des nouvelles technologies -
**expert**

le sujet abordé n'intéresse que peu de lecteurs, mais ceux-là seront ravis d'approfondir un thème, d'en savoir plus sur un nouveau langage.