Security in Web Applications

Author: Owen Day
Vrije Universiteit Brussel Faculty of Sciences Computer Science Department Academic Year 1999-2000

Security in Web Applications

Köll_ Edit

Promotor:

Prof. dr. Olga De Troyer


Masters Thesis Master of Science in Computer Science


Content

Table of figures
1 Abstract
2 Introduction
3 Security
    3.1 Security Services
    3.2 Cryptography Algorithms
        3.2.1 Cryptanalytic Attacks
        3.2.2 DES (Data Encryption Standard)
        3.2.3 IDEA (International Data Encryption Algorithm)
        3.2.4 RSA (Rivest-Shamir-Adleman)
        3.2.5 SSL (Secure Sockets Layer)
        3.2.6 SET (Secure Electronic Transaction)
        3.2.7 Hash Functions
        3.2.8 Message Authentication Codes (MAC)
    3.3 Security Infrastructures
        3.3.1 Digital Certificates
        3.3.2 Public-Key Infrastructure (PKI)
4 Smart cards
    4.1 Card types
    4.2 Memory types and capacity
    4.3 Physical specifications
    4.4 Logical and physical security of a smart card
    4.5 File system of a smart card
    4.6 Smart card applications
5 Web Applications
    5.1 Construction and Syntax of URLs
    5.2 The HTTP Protocol
    5.3 The HTTPs Protocol
    5.4 HTML
    5.5 FORMS
6 Secured E-Forms
    6.1 Introduction to Secured E-Forms
    6.2 What kind of data can be secured?
    6.3 Interaction between the Web server and Web client
    6.4 Security of the E-Forms
        6.4.1 Authentication
        6.4.2 Confidentiality
        6.4.3 Data integrity
        6.4.4 Non-repudiation
7 Conclusions
Glossary
References

Table of figures

Figure 1: Encryption and decryption using a key
Figure 2: The DES algorithm
Figure 3: The triple DES algorithm
Figure 4: The operations of the IDEA cipher
Figure 5: Confidentiality and authentication of the message with RSA
Figure 6: The relation between SSL, the application and the transport layer
Figure 7: The relationship between SSL and the ISO reference model
Figure 8: The certificate definition in X.509 version 3
Figure 9: Certification authority hierarchy used in SET
Figure 10: Basic smart card configuration
Figure 11: Logical file structure of a smart card
Figure 12: Possible sources of data to be protected
Figure 13: Client/Server interaction


1 Abstract

One of the main pitfalls of the widespread use of the World Wide Web for transaction processing is the lack of security services provided. The main subject of this thesis is the security of electronic forms (E-Forms), since they are representative of the security problems involved in a typical user interaction with a Web application. First, the thesis presents the main security concepts; it then introduces Smart cards as a secured device and describes the basic elements of the Web involved in forms. Next, different ways of securing E-Forms are discussed. Finally, it is concluded that although a typical security solution for E-Forms such as SSL provides some of the basic security services, a Smart card based process allows a stronger and more complete solution.

2 Introduction

Web security is a complex topic, encompassing computer system security, network security, authentication services, message validation, personal privacy issues, and cryptography [W3CSecRes]. The continuing explosive growth in the use of the World Wide Web for applications has brought with it a need to protect sensitive communications sent over the Internet. Businesses that accept transactions via the Web can gain a competitive edge by reaching a worldwide audience at very low cost. But the Web poses a unique set of security issues, which businesses must address at the outset to minimize risk. Customers will submit information via the Web only if they are confident that their personal information, such as credit card numbers, financial data, or medical history, is secure.

Most encrypted transactions use a combination of private keys, public keys, symmetric keys, hash functions, and digital certificates to achieve authentication (of both the user and the Web server), confidentiality, data integrity, and non-repudiation by either party.

The general problem of securing E-Forms addressed in this thesis is the cryptographic protection of the HTTP traffic generated by the use of electronic forms. The connectionless and stateless nature of the HTTP protocol makes it difficult to implement a secured transaction using E-Forms. A Secured E-Form is an electronic form containing fields that hold the user's data as well as data coming from external storage, a Smart card, a local database, etc. It also contains the logic to be applied to this data and the methods for the security services. One example is transaction processing by secured E-Forms for the basic on-line banking services: account management, payment orders, and requests for documents.

Chapter 3 explains the most important security concepts: services, algorithms and infrastructures. Chapter 4 introduces smart cards, their physical and logical characteristics, and their most common applications. Smart cards can be used both to hold data and to perform certain cryptographic functions. Chapter 5 covers the Web, HTTP and form related concepts needed to define what a Web application is. Chapter 6 discusses securing Web data transactions using E-Forms. It also gives examples of the use of secured E-Forms in banking, electronic commerce and administration. Chapter 7 presents the conclusion. To facilitate the understanding and use of the terminology of the security, Web and smart card domains, a glossary of the most common terms is added at the end.


3 Security

3.1 Security Services

The most common security services are:
1. Authentication
2. Confidentiality
3. Data integrity
4. Non-repudiation

1. Authentication

Authentication is the process of verifying identity so that one entity can be sure that another entity is who it claims to be. Two forms of authentication are distinguished:
• Peer entity authentication is "The corroboration that a peer entity in an association is the one claimed."
• Data origin authentication is "The corroboration that the source of data received is as claimed." (ISO 7498-2)

An authentication process consists of two steps:
a. Identification step: Presenting an identifier to the security system. Identifiers should be assigned carefully, because authenticated identities are the basis for other security services, such as the access control service.
b. Verification step: Presenting or generating authentication information that corroborates the binding between the entity and the identifier.

One can distinguish strong and weak authentication:

Strong authentication is an authentication process that uses cryptography, particularly public-key certificates, to verify the identity claimed for an entity. It is "Authentication by means of cryptographically derived credentials." [X509].

Weak authentication is an authentication process that uses a password as the information needed to verify an identity claimed for an entity.


2. Confidentiality

Confidentiality is the property that information is not made available or disclosed to unauthorized individuals, entities, or processes. A data confidentiality service is a security service that protects data against unauthorized disclosure.

3. Data integrity

Data integrity is the property that data has not been changed, destroyed, or lost in an unauthorized or accidental manner. It ensures that the data received is the same as the data that was sent. A data integrity service is a security service that protects against unauthorized changes to data, including both intentional change or destruction and accidental change or loss, by ensuring that changes to data are detectable.

4. Non-repudiation

A non-repudiation service is a security service that provides protection against false denial of involvement in a communication. The goal of non-repudiation is to prove that a message has been sent and received. This is extremely important in banking networks, where financial transactions must be verifiably completed, and in legal networks, where signed contracts are transmitted. Current technology to accomplish this involves a certification authority that verifies and time stamps digital signatures. The service provides evidence that can be stored and later presented to a third party to resolve disputes that arise if and when a communication is repudiated by one of the entities involved. There are two basic kinds of non-repudiation service:
- "Non-repudiation with proof of origin" provides the recipient of data with evidence that proves the origin of the data, and thus protects the recipient against an attempt by the originator to falsely deny sending the data. This service can be viewed as a stronger version of a data origin authentication service, in that it proves authenticity to a third party.
- "Non-repudiation with proof of receipt" provides the originator of data with evidence that proves the data was received as addressed, and thus protects the originator against an attempt by the recipient to falsely deny receiving the data.

3.2 Cryptography Algorithms


A cryptography system (see Figure 1) is composed of:
• $M$: the clear text (plaintext)
• $C$: the encoded text (ciphertext)
• two inverse functions $E()$ and $D()$
• an algorithm that produces the keys $K_e$ and $K_d$ such that $C = E_{K_e}(M)$ and $M = D_{K_d}(C)$.

[Diagram: M (plaintext) → E(), encryption with key Ke → C (ciphertext) → D(), decryption with key Kd → M (plaintext)]

Figure 1: Encryption and decryption using a key

There are two kinds of cryptography systems:

• Symmetric systems, with a secret key: the same secret key is used both for encoding and decoding ($K_e = K_d$). Well known systems are the US Data Encryption Standard (DES) and the International Data Encryption Algorithm (IDEA). There are two types of symmetric cipher in common usage. The block cipher takes a fixed length of plaintext (called the block size) and generates the same amount of ciphertext. If the total length of the plaintext is not a multiple of the block size, then padding data may be used to make up the difference on the last block of plaintext. The stream cipher converts plaintext to ciphertext one bit at a time.

• Asymmetric systems, with a public key: two different but related keys are used for encryption and decryption ($K_e \neq K_d$); one of them is public and the other is kept secret. Although the public and private keys are mathematically related, the private key cannot feasibly be computed from the public key, so it is safe to publish the public key anywhere. This is achieved by relating the two keys through a mathematical "trap-door" function. A trap-door function is one that is easy to compute in one direction but whose inverse is much more difficult to calculate. So in the case of public-key cryptography, creating the two keys from scratch is easy, but attempting to find one from the other is hard. Well known asymmetric systems are RSA (Rivest-Shamir-Adleman), the El Gamal cipher (EG), the Digital Signature Standard (DSS) and ciphers based on elliptic curves.

Asymmetric algorithms require more computation than equivalently strong symmetric ones. Thus, asymmetric encryption is not normally used for data confidentiality, except for distributing symmetric keys in applications where the key data is usually short (in terms of bits) compared to the data it protects. Electronic commerce applications most widely employ DES, IDEA, or the RC4 algorithm (from RSA Data Security). The number of bits used for the encryption keys is important, since it indicates the level of effort required to perform a brute-force search for the correct key.

Another classification of ciphers is into block ciphers and stream ciphers:

Block cipher

A block cipher is an encryption algorithm that breaks plaintext into fixed-size segments and uses the same key to transform each plaintext segment into a fixed-size segment of ciphertext. Examples are Blowfish, DEA, IDEA, RC2, and SKIPJACK. However, a block cipher can be adapted to have a different external interface, such as that of a stream cipher, by using a mode of operation to "package" the basic algorithm.

Stream cipher

A stream cipher is an encryption algorithm that breaks plaintext into a stream of successive bits (or characters) and encrypts the n-th plaintext bit with the n-th element of a parallel key stream, thus converting the plaintext bit stream into a ciphertext bit stream.
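The two symmetric cipher types can be illustrated with a minimal sketch. It is not a real cipher: the key stream generator (SHA-256 over the key and a counter), the function names, and the padding rule are assumptions made for illustration, and the sketch works on bytes rather than single bits.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Hypothetical key stream: SHA-256 of key plus counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])

def stream_xor(key: bytes, data: bytes) -> bytes:
    """XOR the n-th data byte with the n-th key stream byte; the same call decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def pad_to_block(data: bytes, block_size: int = 8) -> bytes:
    """Pad to a multiple of block_size; each pad byte stores the pad length."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len

message = b"payment order"
ciphertext = stream_xor(b"shared secret", message)
recovered = stream_xor(b"shared secret", ciphertext)
padded = pad_to_block(message)  # what a 64-bit block cipher would need as input
```

The stream variant keeps the ciphertext the same length as the plaintext, while the block variant first extends the message to a whole number of blocks.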

3.2.1 Cryptanalytic Attacks

The purpose of cryptography is to keep the plaintext secret from interceptors, who are assumed to have complete access to the communication between the sender and receiver. An encryption algorithm is computationally secure if, given all the finite resources conceivably available now or in the future, it still cannot be broken. Different cryptanalytic attacks exist. Each of them assumes that the attacker has complete knowledge of the encryption algorithm used and of its implementation. A fundamental assumption is that the secret must reside entirely in the key.

1. Ciphertext-only attack

The cryptanalyst has the ciphertext of one or more messages, all encrypted with the same algorithm and key, and tries to deduce the plaintext or the key.
Known: $C_1 = E_k(M_1)$, $C_2 = E_k(M_2)$, $\ldots$, $C_i = E_k(M_i)$
Deduce: either $M_1, M_2, \ldots, M_i$; or $k$; or an algorithm to infer $M_{i+1}$ from $C_{i+1} = E_k(M_{i+1})$

2. Known-plaintext attack

In this case, both the ciphertext and the corresponding plaintext of one or more messages are available, and the cryptanalyst tries to obtain the key or further plaintext from them.
Known: $M_1, M_2, \ldots, M_i$ and $C_1 = E_k(M_1)$, $C_2 = E_k(M_2)$, $\ldots$, $C_i = E_k(M_i)$
Deduce: either $k$; or an algorithm to infer $M_{i+1}$ from $C_{i+1} = E_k(M_{i+1})$

3. Chosen-plaintext attack

The cryptanalyst not only has access to the ciphertext and associated plaintext for several messages (as in the known-plaintext attack), but also chooses the plaintext that gets encrypted. This is more powerful than a known-plaintext attack, because the cryptanalyst can choose specific plaintext blocks to encrypt, ones that might yield more information about the key.
Known: $M_1, M_2, \ldots, M_i$ and $C_1 = E_k(M_1)$, $C_2 = E_k(M_2)$, $\ldots$, $C_i = E_k(M_i)$, where the cryptanalyst gets to choose $M_1, M_2, \ldots, M_i$
Deduce: either $k$; or an algorithm to infer $M_{i+1}$ from $C_{i+1} = E_k(M_{i+1})$

Known-plaintext attacks and chosen-plaintext attacks are more common than one might think.

4. Adaptive-chosen-plaintext attack

This is a special case of the chosen-plaintext attack. The cryptanalyst can choose subsequent plaintexts on the basis of previously selected plaintext/ciphertext pairs. In a chosen-plaintext attack, a cryptanalyst might just choose one large block of plaintext to be encrypted; in an adaptive-chosen-plaintext attack he can choose a smaller block of plaintext and then choose another based on the results of the first, and so forth.

5. Chosen-ciphertext attack

For a symmetric cipher it is identical to the chosen-plaintext attack. For an asymmetric (public-key) cipher it differs in that the cryptanalyst chooses the ciphertext that is decrypted.
Known: $C_1, C_2, \ldots, C_i$ and $M_1 = D_k(C_1)$, $M_2 = D_k(C_2)$, $\ldots$, $M_i = D_k(C_i)$
Deduce: $k$

6. Chosen-key attack

This attack does not mean that the cryptanalyst can choose the key; it means that he has some knowledge about the relationship between different keys.
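To make the known-plaintext scenario concrete, the sketch below attacks a deliberately weak toy cipher (repeating-key XOR). The cipher, the messages, and the assumption that the attacker knows the key length are all hypothetical; none of the ciphers discussed in this chapter fall this easily.

```python
def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy cipher E_k: XOR each byte with the repeating key (encrypts and decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

secret_key = b"K3y!"                      # unknown to the attacker
m1 = b"account=1234"                      # known plaintext M1
c1 = xor_cipher(secret_key, m1)           # intercepted C1 = E_k(M1)
c2 = xor_cipher(secret_key, b"amount=500")  # intercepted C2, plaintext unknown

# Attack: for XOR, k = M1 xor C1. Key length 4 is assumed known to the attacker.
recovered_key = bytes(m ^ c for m, c in zip(m1, c1))[:4]
m2 = xor_cipher(recovered_key, c2)        # infer M2 from C2
```

One known pair is enough here to deduce $k$ and read every other message, which is exactly the outcome a sound cipher has to rule out.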


7. Brute-force attack

This is not really cryptanalysis, but it is still a valid attack. If the cipher forces the cryptanalyst to resort to this method, then the cipher is regarded as strong. The attack consists of going through every possible key, deciphering the ciphertext with each key, and attempting to find some recognisable plaintext in the result. So for a 40-bit key, there would be $2^{40}$ possible keys of which, on average, $2^{39}$ would have to be tested. The benefit of forcing a cryptanalyst to fall back on a brute-force attack is that the key length can be used as a quantitative metric for comparison with other algorithms. For instance, a 16-bit key would take only seconds to search on any modern computer, no matter which of the common symmetric ciphers was used, whereas a key of 128 bits or more would provide secure encryption for hundreds of years, even when future advances in computer technology are taken into account. When considering a particular encryption algorithm, a decision has to be made about the value of the original data and the time period over which it must be kept secure. The key length provides the means to assess these issues.
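The exhaustive search can be sketched for a 16-bit key. The toy cipher, the secret key value, and the "recognisable plaintext" test (a known message prefix) are assumptions chosen only to keep the example small.

```python
def toy_encrypt(key: int, data: bytes) -> bytes:
    """Toy 16-bit-key cipher: XOR bytes alternately with the two key bytes."""
    kb = key.to_bytes(2, "big")
    return bytes(b ^ kb[i % 2] for i, b in enumerate(data))

def brute_force(ciphertext: bytes, crib: bytes):
    """Try every 16-bit key until the decryption starts with the expected text."""
    tried = 0
    for key in range(2 ** 16):
        tried += 1
        if toy_encrypt(key, ciphertext).startswith(crib):  # XOR is its own inverse
            return key, tried
    return None, tried

secret = 0xBEEF
ciphertext = toy_encrypt(secret, b"PAY 100 EUR to account 42")
found, tried = brute_force(ciphertext, b"PAY ")

# Expected number of trials (half the key space) for several key lengths.
average_trials = {bits: 2 ** (bits - 1) for bits in (16, 40, 56, 128)}
```

Each additional key bit doubles the work, which is why the step from 40 to 128 bits moves the search from feasible to out of reach.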

3.2.2 DES (Data Encryption Standard)

In 1977, the U.S. government adopted a standard symmetric encryption method called DES (Data Encryption Standard). In 1981 the American National Standards Institute (ANSI) approved DES as an industry standard, calling it the Data Encryption Algorithm (DEA). DES is a block cipher: it operates on a single chunk of data at a time, encrypting 64 bits (8 bytes) of plaintext to produce 64 bits of ciphertext. The key length is 56 bits, often expressed as an 8-character string with the extra bits used as a parity check. The algorithm is based on a set of permutations, substitutions and modulo 2 sums, applied iteratively 16 times to a 64-bit block, each time using a different 48-bit key extracted from the original 56-bit key. Figure 2 shows the overall process:


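[Diagram: Input → Initial Permutation → split into halves L0 and R0 → Round 1 with key K1: L1 = R0, R1 = L0 ⊕ f(R0, K1), where ⊕ is the bitwise modulo 2 sum → Round 2 with key K2 → ... → Round 16 with key K16 → 32-bit Swap → Final Permutation → Output]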

Figure 2: The DES algorithm
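The round structure of Figure 2 can be sketched as a generic Feistel network. This is not DES: the round function below is a hypothetical stand-in built on SHA-256, the 48-bit subkeys are arbitrary values rather than the output of the DES key schedule, and the initial and final permutations are omitted.

```python
import hashlib

MASK32 = 0xFFFFFFFF

def round_function(right: int, subkey: int) -> int:
    """Stand-in for DES's f: 32 bits derived from SHA-256 of (R, K)."""
    data = right.to_bytes(4, "big") + subkey.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def feistel(block: int, subkeys) -> int:
    """Apply one round per subkey: L_i = R_{i-1}, R_i = L_{i-1} xor f(R_{i-1}, K_i)."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ round_function(right, k)
    return (right << 32) | left  # final 32-bit swap

subkeys = [0x0123456789AB + i for i in range(16)]  # 16 arbitrary 48-bit subkeys
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, subkeys)
decrypted = feistel(ciphertext, list(reversed(subkeys)))
```

Decryption reuses the same routine with the subkeys in reverse order, so the round function itself never has to be inverted.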

Triple DES is a variation of DES and is appealing in that it requires no new algorithms or hardware over and above conventional DES. Figure 3 shows three 56-bit DES keys being used as input to an array of three DES chips or software blocks. The pattern used for the encryption step is encrypt-decrypt-encrypt (EDE) with a DED pattern being used to reverse the process.

[Diagram: Encryption: Plaintext → DES (Encrypt) with K1 → DES (Decrypt) with K2 → DES (Encrypt) with K3 → Ciphertext. Decryption reverses the process with the DED pattern, applying the keys in the opposite order.]

Figure 3: The triple DES algorithm
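The EDE and DED patterns can be sketched with any invertible block operation. The toy cipher below (add the key modulo $2^{64}$, then rotate) is a hypothetical stand-in for single DES and offers no security; only the way the three stages are composed mirrors Figure 3.

```python
MASK64 = (1 << 64) - 1

def toy_encrypt(block: int, key: int) -> int:
    """Stand-in for DES encryption: add key mod 2^64, then rotate left by 7."""
    x = (block + key) & MASK64
    return ((x << 7) | (x >> 57)) & MASK64

def toy_decrypt(block: int, key: int) -> int:
    """Inverse of toy_encrypt: rotate right by 7, then subtract key mod 2^64."""
    x = ((block >> 7) | (block << 57)) & MASK64
    return (x - key) & MASK64

def ede_encrypt(block: int, k1: int, k2: int, k3: int) -> int:
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)

def ded_decrypt(block: int, k1: int, k2: int, k3: int) -> int:
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)

p = 0x0123456789ABCDEF
c = ede_encrypt(p, 11, 22, 33)
back = ded_decrypt(c, 11, 22, 33)
single = ede_encrypt(p, 11, 11, 11)  # equal keys collapse to one encryption
```

With three equal keys the middle decryption cancels the first encryption, so an EDE device can interoperate with single-key equipment.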

3.2.3 IDEA (International Data Encryption Algorithm)

Like DES, the International Data Encryption Algorithm (IDEA) is a block cipher using secret-key symmetric encryption. IDEA uses a 128-bit key to operate on 64-bit plaintext blocks. The same algorithm is used for both encryption and decryption and consists of eight main iterations. It is based on the design concept of "mixing operations from different algebraic groups". The three algebraic groups whose operations are mixed are: XOR; addition modulo $2^{16}$ (addition, ignoring any overflow); and multiplication modulo $2^{16}+1$ (multiplication, ignoring any overflow). As shown in Figure 4, these operations work on 16-bit subblocks, making the algorithm efficient even on 16-bit processors. IDEA runs much faster in software than DES.

[Diagram: the 64-bit plaintext is split into four 16-bit subblocks X1, X2, X3, X4; Rounds 1 to 8 each apply XOR, addition and multiplication (ignoring any overflow), keyed from the 128-bit key; the outputs Y1, Y2, Y3, Y4 form the 64-bit ciphertext]

Figure 4: The operations of the IDEA cipher
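The three 16-bit operations mixed by IDEA can be written down directly. This is only a sketch of the primitive operations, not of the round structure or key schedule. One detail not stated above is assumed here from the IDEA design: in the multiplication, the all-zero subblock stands for $2^{16}$, which makes the operation invertible.

```python
MOD_ADD = 1 << 16        # 2^16
MOD_MUL = (1 << 16) + 1  # 2^16 + 1, a prime

def idea_xor(a: int, b: int) -> int:
    """Bitwise XOR of two 16-bit subblocks."""
    return a ^ b

def idea_add(a: int, b: int) -> int:
    """Addition modulo 2^16 (overflow is discarded)."""
    return (a + b) % MOD_ADD

def idea_mul(a: int, b: int) -> int:
    """Multiplication modulo 2^16 + 1, with 0 representing 2^16."""
    a = a or MOD_ADD
    b = b or MOD_ADD
    return (a * b) % MOD_MUL % MOD_ADD  # a result of 2^16 maps back to 0

examples = {
    "xor": idea_xor(0xF0F0, 0x0FF0),
    "add": idea_add(0xFFFF, 1),
    "mul": idea_mul(2, 3),
    "mul_zero": idea_mul(0, 0),
}
```

All three operations take and return 16-bit values, which is why the cipher maps well onto 16-bit processors.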

IDEA's key length is 128 bits, over twice as long as that of DES, which means that trying out half the keys would take $2^{127}$ encryptions. This is such a large number that breaking IDEA by brute force is very difficult. It appears that IDEA is significantly more secure than DES.

3.2.4 RSA (Rivest-Shamir-Adleman)

The de facto standard algorithm for public-key cryptography is the RSA algorithm, which can be used for both encryption and authentication. Its inventors Rivest, Shamir and Adleman published it in 1978 while working at MIT. Its security is based on the difficulty of factoring very large numbers, whereas finding huge prime numbers is not that difficult. As an example, for a human to factor the number 29,083 by hand would take perhaps an hour, but confirming that the factors are indeed 127 and 229 takes about a minute. The disparity between the effort required to compute the factors and the effort required to confirm them grows wider as the size of the numbers increases.

The RSA algorithm works as follows: take two large distinct primes, $p$ and $q$, and compute their product $n = pq$; $n$ is called the modulus. Choose a number $e$, less than $n$ and relatively prime to $(p-1)(q-1)$, which means that $e$ and $(p-1)(q-1)$ have no common factors except 1. Find another number $d$ such that $ed - 1$ is divisible by $(p-1)(q-1)$. The values $e$ and $d$ are called the public and private exponents, respectively. The public key is the pair $(n, e)$; the private key is $(n, d)$. The factors $p$ and $q$ may be destroyed or kept with the private key. [RSASec]

It is currently difficult to obtain the private key $d$ from the public key $(n, e)$. However, if one could factor $n$ into $p$ and $q$, then one could obtain the private key $d$. Thus the security of the RSA system is based on the assumption that factoring is difficult. The discovery of an easy method of factoring would "break" RSA.

Encryption works as follows. Suppose A wants to send a message $m$ to B. A creates the ciphertext $c$ by exponentiating: $c = m^e \bmod n$, where $e$ and $n$ are B's public key. A sends $c$ to B. To decrypt, B also exponentiates: $m = c^d \bmod n$, where $d$ and $n$ are B's private key. The relationship between $e$ and $d$ ensures that B correctly recovers $m$. Since only B knows $d$, only B can decrypt this message.

Signing works as follows. Suppose A wants to send a message $m$ to B in such a way that B is assured that the message is authentic, has not been tampered with, and is from A. A creates a digital signature $s$ by exponentiating: $s = m^d \bmod n$, where $d$ and $n$ are A's private key. A sends $m$ and $s$ to B. To verify the signature, B exponentiates and checks that the message $m$ is recovered: $m = s^e \bmod n$, where $e$ and $n$ are A's public key. This is the standard way digital signatures work.

To ensure confidentiality and authentication of the message at the same time, both the private key of the sender ($K_{dA}$) and the public key of the receiver ($K_{eB}$) are applied. For decryption, the secret key of the receiver ($K_{dB}$) and the public key of the sender ($K_{eA}$) are applied in turn. See Figure 5.

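[Diagram: the sender applies DA() with its private key KdA to M, giving DA(M), then EB() with the receiver's public key KeB, giving C = EB(DA(M)). The receiver applies DB() with its private key KdB to C, giving DB(C), then EA() with the sender's public key KeA, giving EA(DB(C)) = M.]

$E_A(D_B(C)) = E_A(D_B(E_B(D_A(M)))) = E_A(D_A(M)) = M$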

Figure 5: Confidentiality and authentication of the message with RSA
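The key generation, encryption and signature steps of this section can be checked with a small worked example. It reuses the primes 127 and 229 from the factoring illustration; the public exponent $e = 5$ and the message value are assumptions chosen for the example, and numbers this small offer no security. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd

# Key generation with the (tiny) primes from the factoring example.
p, q = 127, 229
n = p * q                   # modulus, 29083
phi = (p - 1) * (q - 1)     # (p-1)(q-1)
e = 5                       # public exponent, must be coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)         # private exponent: e*d - 1 divisible by phi

m = 4242                    # message, must be smaller than n

# Encryption with the public key (n, e), decryption with the private key (n, d).
c = pow(m, e, n)            # c = m^e mod n
decrypted = pow(c, d, n)    # m = c^d mod n

# Signature with the private key, verification with the public key.
s = pow(m, d, n)            # s = m^d mod n
verified = pow(s, e, n) == m
```

In practice the message is first hashed and padded before exponentiation; the sketch applies the raw formulas only to show that the two exponents undo each other.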

Thus encryption and authentication take place without any sharing of private keys: each person uses only another's public key or their own private key. Anyone can send an encrypted message or verify a signed message, but only someone in possession of the correct private key can decrypt or sign a message. The key size is completely variable. For normal use a key size of 512 bits is typically enough; for more severe security requirements (or when security must remain valid for many years into the future), key lengths of 1024 and 2048 bits are used. Performing exponentiations with numbers of this size is expensive in terms of computing resources. A typical software implementation of a symmetric encryption algorithm (e.g. DES) would be around 100 times faster. [EPaymSyst]

3.2.5 SSL (Secure Sockets Layer)

SSL (Secure Sockets Layer) is a general purpose protocol designed to secure any dialog taking place between applications that communicate across a "socket" interprocess communication mechanism. The protocol was developed by Netscape Communications Corporation [NetscapeTech] in late 1994 as the standard for web browser and server authentication and secure data exchange on the web. The SSL security protocol provides data encryption, server authentication, message integrity, and optional client authentication for a TCP/IP connection.

Since SSL is built into all major browsers and web servers, simply installing a digital certificate turns on their SSL capabilities. One example is the VeriSign Server ID [VeriSign]. VeriSign is a leading provider of PKI and digital certificates. With a VeriSign Server ID (available as part of VeriSign's Site Trust Services) installed on the Web server, sensitive information can be collected securely online. It also provides authentication of the Web server, so the user can be sure about the identity of the server.

SSL can be layered on top of any transport protocol (it is not TCP/IP dependent) and can run under application protocols such as HTTP, FTP and TELNET. SSL comes in two strengths, 40-bit and 128-bit, which refer to the length of the "session key" generated by every encrypted transaction. The longer the key, the more difficult it is to break the encryption code. Recent browsers (e.g. Netscape Communicator 4.x) support 128-bit sessions, whose key space is $2^{88}$ times larger than that of 40-bit sessions.

In implementation, public-key systems are far slower than symmetric ciphers. This fact has led to the use of a combination of both to achieve security and speed. SSL uses X.509 certificates for strong authentication, RSA as its public-key cipher, and one of RC4-128, RC2-128, DES, Triple DES or IDEA as its bulk symmetric cipher.

Figure 6: The relation between SSL, the application and the transport layer

SSL uses the public-key cipher to share the symmetric key (session key). To prevent the man-in-the-middle attack, a message authentication code (MAC) is introduced into the protocol. The man-in-the-middle attack occurs when an adversary acts as a third party in a two-party conversation. Both legitimate parties assume that they are talking securely with each other; in fact the adversary is intercepting the entire conversation, decrypting it, re-encrypting it and sending it on to the intended recipient.


A MAC is computed from a secret and some transmitted data, e.g. by using MD5 (a cryptographic digest algorithm designed by Ron Rivest of RSA). To avoid the parrot attack, also known as a replay attack, in which someone simply records and repeats messages, random elements are introduced from both sides of the conversation, so that each message is unique.

It is important that any new communications protocol conform to a standard model if it is to easily replace or become part of an existing protocol structure. The relationship between SSL and the ISO Reference Model for Open Systems Interconnection (the 7-layer model) is presented in Figure 7.

Figure 7: The relationship between SSL and the ISO reference model
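The MAC and the random elements used against replayed messages, described before Figure 7, can be sketched as follows. This is not the SSL record MAC: it uses Python's `hmac` module with SHA-256 as a stand-in for an MD5-based construction, and the function names, nonce sizes and message are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

def make_tag(secret: bytes, client_nonce: bytes, server_nonce: bytes,
             payload: bytes) -> bytes:
    """MAC over both parties' random values and the transmitted data."""
    data = client_nonce + server_nonce + payload
    return hmac.new(secret, data, hashlib.sha256).digest()

def verify(secret, client_nonce, server_nonce, payload, tag) -> bool:
    expected = make_tag(secret, client_nonce, server_nonce, payload)
    return hmac.compare_digest(expected, tag)  # constant-time comparison

secret = secrets.token_bytes(32)          # shared secret of the session
payload = b"transfer 100 EUR"

# Session 1: each side contributes fresh random bytes.
c1, s1 = secrets.token_bytes(16), secrets.token_bytes(16)
tag1 = make_tag(secret, c1, s1, payload)
accepted = verify(secret, c1, s1, payload, tag1)

# Session 2: an attacker replays the recorded message, but the nonces changed.
c2, s2 = secrets.token_bytes(16), secrets.token_bytes(16)
replay_accepted = verify(secret, c2, s2, payload, tag1)

# A modified payload fails as well.
tamper_accepted = verify(secret, c1, s1, b"transfer 900 EUR", tag1)
```

Because the tag depends on random values from both sides, a recorded message is only valid within the conversation that produced it.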

SSL is conceptually split into two parts, the SSL Handshake Protocol (SSLHP) and the SSL Record Protocol (SSLRP). The SSLHP negotiates which bulk cipher is to be used and performs authentication of the server and, if requested, of the client. The SSLRP packs the data into records and performs the agreed encryption on them, or receives records and decrypts them. The SSLHP and the application data are layered on top of the SSLRP.


So it can be seen (as in the diagram) that SSL rather neatly straddles two ISO layers, with SSLHP at the Application Layer level and SSLRP at the Presentation Layer level. Since security functions have not been implemented in many protocols, SSL acts as an add-on to such protocols and not as a replacement. It can also be seen that the use of SSL does not preclude the use of other security protocols that operate at a higher level; for instance SHTTP (Secure HyperText Transfer Protocol), which is a document- or data-level security protocol, may be layered over SSL. [SSLPRef]

3.2.6 SET (Secure Electronic Transaction)

The SET™ Secure Electronic Transaction technology is an encryption technology that helps protect the transfer of payment information over open networks, such as the Internet. SET™ uses advanced security technology that allows cardholders to make secure payments to merchants on the Internet [SETCO]. SET™ technology protects payment information in four ways:

1. It authenticates that a merchant is authorized to accept payment cards.
2. It authenticates the payment card being used.
3. It protects personal payment information.
4. It ensures that payment information is read only by the intended recipient.

Message data is encrypted using a randomly generated symmetric key, which is in turn encrypted using the recipient's public key. The encrypted key is referred to as the "digital envelope" of the message and is sent to the recipient together with the encrypted message. The recipient decrypts the digital envelope using his or her private key and then uses the symmetric key to unlock the original message. This protocol neither depends on transport security mechanisms nor prevents their use.

The SET™ Specification is an open standard for the e-commerce industry developed by Visa and MasterCard. Digital certificates create a trust chain throughout the transaction, verifying cardholder and merchant validity. In the SET environment there is a hierarchy of Certificate Authorities. The trust chain continues all the way up to the CA at the top of the hierarchy, which is referred to as the SET Root CA. Products that pass SET Compliance Testing are eligible to display the SET Mark.
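The digital envelope described in this section can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the key pair is textbook RSA built from two small Mersenne primes, and the symmetric cipher is a SHA-256 keystream used as a stand-in. Neither is what SET actually specifies, and neither is secure. The function names `seal` and `open_envelope` are hypothetical.

```python
import hashlib
import secrets

# ASSUMPTION: toy textbook-RSA key pair, for illustration only (insecure).
P = 2**61 - 1                          # Mersenne prime
Q = 2**89 - 1                          # Mersenne prime
N = P * Q                              # public modulus (about 150 bits)
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

PUBLIC_KEY = (E, N)
PRIVATE_KEY = (D, N)


def keystream_xor(key, data):
    """Stand-in symmetric cipher: XOR data with a SHA-256 based keystream.

    Applying the function twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def seal(message, recipient_public):
    """Encrypt the message with a random session key and wrap that key."""
    e, n = recipient_public
    session_key = secrets.token_bytes(16)          # random symmetric key
    ciphertext = keystream_xor(session_key, message)
    envelope = pow(int.from_bytes(session_key, "big"), e, n)
    return envelope, ciphertext


def open_envelope(envelope, ciphertext, recipient_private):
    """Recover the session key with the private key, then the message."""
    d, n = recipient_private
    session_key = pow(envelope, d, n).to_bytes(16, "big")
    return keystream_xor(session_key, ciphertext)


message = b"order 42: pay 10 EUR to merchant"
envelope, ciphertext = seal(message, PUBLIC_KEY)
recovered = open_envelope(envelope, ciphertext, PRIVATE_KEY)
```

Only the holder of the private key can recover the session key, so only that party can read the message, while the bulk of the data is protected by the faster symmetric step.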

3.2.7 Hash Functions


A hash function is an algorithm that computes a value based on a data object (such as a message or file; usually variable-length; possibly very large), thereby mapping the data object to a smaller data object (the "hash result"), which is usually a fixed-size value. Mathematically it can be defined as a function that maps values from a large (possibly very large) domain into a smaller range. A "good" hash function is one for which the results of applying the function to a (large) set of values in the domain are evenly distributed (and apparently at random) over the range [X509].

The kind of hash function needed for security applications is called a "cryptographic hash function": an algorithm for which it is computationally infeasible (because no attack is significantly more efficient than brute force) to find either (a) a data object that maps to a pre-specified hash result (the "one-way" property) or (b) two data objects that map to the same hash result (the "collision-free" property).

The hash function is public; there is no secrecy to the process. The security of a one-way hash function lies in its one-wayness. The output (hash value) does not depend on the input (pre-image) in any discernible way. A single bit change in the pre-image changes, on average, half of the bits in the hash value. Given a hash value, it is computationally infeasible to find a pre-image that hashes to that value. Examples of hash functions are MD2, MD4, MD5 and SHA-1.
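The fixed-size output and the sensitivity to a single changed bit can be observed with a short Python sketch. It uses SHA-1 from the standard `hashlib` module because SHA-1 is one of the examples named above; the helper `bit_difference` and the sample messages are assumptions introduced here for illustration.

```python
import hashlib


def bit_difference(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


msg1 = b"Security in Web Applications"
msg2 = bytes([msg1[0] ^ 0x01]) + msg1[1:]       # same message, one bit flipped

digest1 = hashlib.sha1(msg1).digest()
digest2 = hashlib.sha1(msg2).digest()
big_digest = hashlib.sha1(b"x" * 1_000_000).digest()   # much larger input

changed_bits = bit_difference(digest1, digest2)
print(len(digest1) * 8, "bit digest;", changed_bits, "bits changed")
```

The digest length stays at 160 bits regardless of the input size, and the number of changed bits is typically close to half of them.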

3.2.8 Message Authentication Codes (MAC)

A message authentication code (MAC), also known as a data authentication code (DAC), is a one-way function with the addition of a secret key. The hash value is a function of both the pre-image and the key. The theory is exactly the same as for hash functions, except that only someone with the key can verify the hash value. One can create a MAC out of a hash function or a block encryption algorithm.
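As a minimal sketch of a MAC built from a hash function, the following Python fragment uses the HMAC construction from the standard `hmac` module with SHA-1. HMAC is one common way of keying a hash; the text does not say which construction is meant, so this choice, the helper names `make_mac` and `verify_mac`, and the sample message are assumptions.

```python
import hashlib
import hmac
import secrets


def make_mac(key, message):
    """Compute a keyed hash (HMAC-SHA1) over the message."""
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_mac(key, message, tag):
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


shared_key = secrets.token_bytes(20)        # secret known to both parties
message = b"transfer 100 EUR to account 12345"
tag = make_mac(shared_key, message)

ok = verify_mac(shared_key, message, tag)
tampered = verify_mac(shared_key, message + b"0", tag)
wrong_key = verify_mac(secrets.token_bytes(20), message, tag)
```

Anyone can compute a plain hash of the message, but only a holder of `shared_key` can produce or check a valid tag.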

3.3 Security Infrastructures

3.3.1 Digital Certificates

Digital certificates are electronic files that act as a kind of online identification (passport). They are issued by a trusted third party (TTP), a certificate authority (CA), which verifies the identity of the certificate's holder. They are designed to be tamper-proof and infeasible to forge.


Certificate authorities (CAs) are the digital world's equivalent of identification (passport) offices. They issue digital certificates and validate the holder's identity and authority. CAs embed an individual's or an organization's public key along with other identifying information into each digital certificate and then cryptographically "sign" it as a tamper-proof seal, verifying the integrity of the data within it and validating its use. Examples of CAs: VeriSign, GlobalSign, the SET Root CA. The SET Root CA is owned and maintained by SET Secure Electronic Transaction LLC.

A certificate has the following content:

• The certificate issuer's name
• The entity for whom the certificate is being issued
• The public key of the subject
• Some time stamps

The certificate is signed using the certificate issuer's private key. Everybody knows the certificate issuer's public key (that is, the certificate issuer has a certificate of its own, and so on). Certificates are a standard way of binding a public key to a name. Digital certificates do two things:

1. They authenticate that their holders (people, web sites, and even network resources such as routers) are truly who or what they claim to be.
2. They protect data exchanged online from theft or tampering.

In SET, bindings between identities and their corresponding public keys are stored using certificates in X.509 version 3 format. Figure 8 presents the format of such a certificate, expressed in Abstract Syntax Notation One (ASN.1). The principal purpose of this certificate is to bind the identity given by the subject field to the public key held in subjectPublicKeyInfo, where the binding is certified by the issuer: the authority that applies the signature to the certificate.

Certificate ::= SIGNED { SEQUENCE {
    version               [0] Version DEFAULT v1,
    serialNumber          CertificateSerialNumber,
    signature             AlgorithmIdentifier,
    issuer                Name,
    validity              Validity,
    subject               Name,
    subjectPublicKeyInfo  SubjectPublicKeyInfo,
    issuerUniqueID        [1] IMPLICIT UniqueIdentifier OPTIONAL,
    subjectUniqueID       [2] IMPLICIT UniqueIdentifier OPTIONAL,
    extensions            [3] Extensions OPTIONAL } }

Figure 8: The certificate definition in X.509 version 3

There are two types of digital certificates that are important when building secure web sites: server certificates and personal certificates. Server certificates let visitors of a web site send personal information, such as credit card numbers, free from the threat of interception or tampering by encrypting the information exchanged between their web browser and the web server. Server certificates also let visitors of a site authenticate the site’s identity so they can feel secure that they are communicating with the addressed site and not with a rogue site impersonating it. Server certificates are important for anyone building an e-commerce site or a site designed to exchange confidential information with clients, customers, or vendors. Personal certificates let a Web page authenticate a visitor's identity and restrict access to specified content to particular visitors. They can also be used to send secure email for private account information. Personal certificates are perfect for business-to-business communications such as offering your suppliers and partners controlled access to special web sites for updating product availability, shipping dates, and inventory management.

3.3.2 Public-Key Infrastructure (PKI)

A public-key infrastructure (PKI) consists of protocols, services, and standards supporting applications of public-key cryptography. PKI sometimes refers simply to a trust hierarchy based on public-key certificates, and in other contexts it also embraces encryption and digital signature services provided to end-user applications [OG99]. A middle view is that a PKI includes services and protocols for managing public keys, often through the use of Certification Authority (CA) and Registration Authority (RA) components, but not necessarily for performing cryptographic operations with the keys. Most PKI definitions are based on X.509 certificates. Among the services likely to be found in a PKI are the following:

• Key registration: issuing a new certificate for a public key.
• Certificate revocation: cancelling a previously issued certificate.
• Key selection: obtaining a party's public key.
• Trust evaluation: determining whether a certificate is valid and what operations it authorizes.
• Key recovery has also been suggested as a possible aspect of a PKI.

The hierarchy of certification authorities used in SET is shown in Figure 9.

Root Certification Authority
    Brand Certification Authority
        Geo-Political Authority (optional)
            Cardholder CA -> Cardholder
            Merchant CA -> Merchant
            Payment CA -> Payment Gateway

Figure 9: Certification authority hierarchy used in SET
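The trust chain of Figure 9 can be illustrated by walking from an end-entity certificate up to the root. The sketch below is a simplification under stated assumptions: the `Cert` record, the `chain_to_root` function, and the validity dates are hypothetical, the root is recognised as a certificate whose issuer equals its subject, and the signature check that a real implementation must perform at every step is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_before: date
    not_after: date


def chain_to_root(subject, store, today):
    """Return the subjects on the path from `subject` to the root CA.

    NOTE: signature verification is omitted; only names and dates are checked.
    """
    chain, seen, current = [], set(), subject
    while True:
        if current in seen:
            raise ValueError("issuer loop detected at " + current)
        seen.add(current)
        cert = store.get(current)
        if cert is None:
            raise ValueError("no certificate found for " + current)
        if not (cert.not_before <= today <= cert.not_after):
            raise ValueError("certificate not valid today: " + current)
        chain.append(cert.subject)
        if cert.issuer == cert.subject:      # self-issued: top of hierarchy
            return chain
        current = cert.issuer


start, end = date(2000, 1, 1), date(2001, 1, 1)   # hypothetical validity
store = {
    c.subject: c
    for c in [
        Cert("Root CA", "Root CA", start, end),
        Cert("Brand CA", "Root CA", start, end),
        Cert("Geo-Political CA", "Brand CA", start, end),
        Cert("Merchant CA", "Geo-Political CA", start, end),
        Cert("Merchant", "Merchant CA", start, end),
    ]
}

path = chain_to_root("Merchant", store, date(2000, 6, 1))
print(" -> ".join(path))
```

A relying party accepts the merchant's public key only if every certificate on this path is acceptable and the path ends at a root it already trusts.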

The entities (cardholder, merchant, etc.) involved in SET transactions are identified in the certificates using an X.500 distinguished name. The tendency is that multiple independent PKIs will evolve with varying degrees of coexistence and interoperability. It is usually accepted that there will be multiple "root" or "top-level" certificate authorities in a global PKI, not just one "root", although in a local PKI there may be only one root.


4 Smart cards

In many applications, particularly in payment, secure hardware devices can play an important role. One of the most important secure hardware devices is the smart card, a portable data storage device with intelligence and provisions for identity and security. It most often resembles a traditional credit or bank card in size and dimensions, and a customized integrated circuit is embedded within the card. By demanding that the user provide a password, usually in the form of a personal identification number (PIN), before making any meaningful response, a chip card can positively identify its authorized bearer on each occasion. Second-generation chip cards are characterized by the presence of a magnetic stripe. In addition, an active or super smart card may include a small keyboard, a liquid crystal display, and an onboard power supply. Such devices are also referred to as electronic wallets in the context of electronic payment systems [EPaymSyst].

The physical structure of a smart card is specified by the International Organization for Standardization (ISO) standards 7810, 7816/1 and 7816/2. Generally it is made up of three elements. The plastic card is the most basic one and has the dimensions 85.60 mm x 53.98 mm x 0.80 mm. A printed circuit and an integrated circuit chip are embedded in the card. There is an operating system inside each smart card, whose system area may contain a manufacturer identification number (ID), the type of component, a serial number, profile information, and so on. More importantly, the system area may contain different security keys, such as the manufacturer key or fabrication key (KF) and the personalisation key (KP). All of this information should be kept secret and not be revealed to others.

4.1 Card types

Magnetic stripe cards

This type of card has a magnetic stripe on the back of the card that holds details about the user, such as the name, the card number, and so forth. Anyone with an appropriate card reader device can read the information stored on the card.

Memory cards

These are used for simple applications such as the prepaid telephone card, which has a chip with a certain number of memory cells, one for each telephone unit. Each cell can be switched on or off. A memory cell is cleared each time a telephone unit is used. Once all units are used up, the card is useless and is thrown away.

Processor cards

Processor cards are characterized by the presence of an onboard microprocessor that controls access to the information on the card. It operates under the control of an operating system also housed on the chip. The microprocessor card increases protection against fraud and can be used in high-value or security-critical applications. An example application is the storage of cryptographic keys, where the chip card acts as a secure hardware device. Processor cards can be further classified as:

Contact cards: Smart cards that operate by physical contact between the reader and the card's contacts. When the card is inserted into a smart card reader, it makes contact with electrical connectors that transfer data to and from the chip.

Contactless cards: Cards with no visible module, which communicate by means of a radio frequency signal. There is no need for physical contact between the card and a reader; contactless smart cards are passed near an antenna to carry out a transaction. They look just like plastic credit cards, except that they have an electronic microchip and an antenna embedded inside. These components allow the card to communicate with an antenna/coupler unit without physical contact. Contactless cards are the ideal solution when transactions must be processed very quickly, as in mass-transit or toll collection activities.

4.2 Memory types and capacity

Processor and non-processor chip cards contain data storage or memory, as shown in Figure 10.

[Figure 10 block labels: ROM, CPU, RAM, EEPROM, Data I/O (Input, Output), API]

Figure 10: Basic smart card configuration

ROM: Read-only memory (ROM) is memory programmed by means of masks and is integrated at the time the chip is manufactured. The data contained in it can subsequently be read by the microprocessor, but not altered. This memory holds the card operating system (COS), the input/output routines, routines for data logging, and basic functions such as algorithms for PIN checking and authentication.

EPROM: An erasable programmable read-only memory (EPROM) can generally be erased only with ultraviolet light after it has been programmed.

EEPROM: Unlike EPROM, an electrically erasable programmable read-only memory (EEPROM) allows memory cells to be erased electrically. The advantage is that the card can be rewritten in normal use, which removes virtually all limitations on the use of the card. The memory cells can be reprogrammed at least 10,000 times, and data can be retained in the memory for at least 10 years.

RAM: Random access memory (RAM) serves as high-speed working storage for the processor. The amount of RAM has a strong influence on the overall performance of the card. If more RAM is available, larger blocks of data can be exchanged in one single message between the card and the external world, reducing communication overheads.

4.3 Physical specifications

There is considerable variation in the specification of the microprocessors and memory technology used by various vendors. A typical specification for a chip card is:

Clock rates:  1 to 5 MHz
EPROM:        8 to 16 KB (for data storage, non-volatile)
RAM:          256 to 500 bytes (for operating system computation)
EEPROM:       2 to 8 KB (non-volatile)

4.4 Logical and physical security of a smart card

One of the key benefits of smart cards is the ability of some cards to support on-board cryptography. Cryptographic smart cards open up a whole new realm in information security because they provide a secure place for the storage of keys and keyrings. Because the actual cryptography is done on the card, the keys never have to leave their storage place. This gives the cardholder a secure way of storing keys, especially if the key pair was generated on the card.

The smart card access control system mainly covers file access. Each file carries a header that indicates the access conditions or requirements of the file as well as its current status. The fundamental principle of the access control is based on the correct presentation of PINs and on their management. Primarily, the access conditions of a file can be divided into the following five levels. Some operating systems may offer more than these, depending on the applications they support:

• Always (ALW): The file can be accessed without any restriction.
• Cardholder verification 1 (CHV1): Access is possible only when a valid CHV1 value is presented.
• Cardholder verification 2 (CHV2): Access is possible only when a valid CHV2 value is presented.
• Administrative (ADM): Allocation of these levels and the respective requirements for their fulfilment are the responsibility of the appropriate administrative authority.
• Never (NEV): Access to the file is forbidden.

These condition levels are not hierarchical. For instance, correct presentation of CHV2 does not allow access to a file that requires presentation of CHV1. During operation, the corresponding requirements have to be fulfilled before the file is selected. For example, the correct CHV1 value has to be presented if CHV1 is the access condition of a file. The PINs are normally stored in separate elementary files, EF CHV1 and EF CHV2 for example. Access conditions on those files can prevent the PINs from being changed. A PIN can be changed by issuing the change PIN instruction together with the new and the old PIN. However, in most smart card operating systems the corresponding PIN is invalidated or blocked when a fixed number of invalid PINs are presented consecutively. The number of allowed attempts varies between systems.

To sum up the file structure and access control that the smart card provides: data stored on the card can be protected either individually, by setting access conditions in the header of each file, or hierarchically, by grouping files together under a single dedicated file (DF) with access conditions set on it. Furthermore, irreversible blocking protects the card against large numbers of intrusion attempts.

The smart card is designed such that no single function or combination of functions can result in the disclosure of sensitive data, except as allowed by the security procedures implemented in the card. It incorporates not only logical but also physical security features. Special layers of oxide over the chip protect against analysis of the content of the memory. Nevertheless, a technique for reverse engineering circuit chips has been developed at the Cavendish Laboratory in Cambridge; with it, the layout and function of the chip can be identified. Another technique, developed by IBM, can then be used to observe the operation of the chip. As a result, its secrets can be fully revealed.

Besides this, there are many different ways to perform physical attacks: for instance, erasing the security lock bit by focusing UV light on the EPROM, probing the operation of the circuit with microprobing needles, or using laser cutter microscopes to explore the chip. However, these kinds of attacks are available only to well-funded laboratories, as the associated costs are considerable.
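The logical access rules of this section (five non-hierarchical condition levels and PIN blocking after consecutive failures) can be sketched as follows. The class `Card`, its method names, the retry limit of three, and the PIN values are all assumptions made for the illustration; real card operating systems differ, and the ADM level is modelled here simply as never satisfied.

```python
from enum import Enum


class Access(Enum):
    ALW = "always"
    CHV1 = "cardholder verification 1"
    CHV2 = "cardholder verification 2"
    ADM = "administrative"
    NEV = "never"


class Card:
    MAX_TRIES = 3   # ASSUMPTION: the allowed number of attempts varies by system

    def __init__(self, pins):
        self._pins = dict(pins)                       # level -> reference PIN
        self._tries_left = {level: self.MAX_TRIES for level in self._pins}
        self._verified = set()

    def verify(self, level, pin):
        """Present a PIN; consecutive failures eventually block that PIN."""
        if self._tries_left[level] == 0:
            return False                              # blocked, even if correct
        if pin == self._pins[level]:
            self._tries_left[level] = self.MAX_TRIES  # reset the counter
            self._verified.add(level)
            return True
        self._tries_left[level] -= 1
        self._verified.discard(level)
        return False

    def is_blocked(self, level):
        return self._tries_left[level] == 0

    def may_access(self, condition):
        """Levels are not hierarchical: each one must be satisfied itself."""
        if condition is Access.ALW:
            return True
        if condition is Access.NEV:
            return False
        return condition in self._verified


card = Card({Access.CHV1: "1234", Access.CHV2: "5678"})
card.verify(Access.CHV2, "5678")
print(card.may_access(Access.CHV2), card.may_access(Access.CHV1))
```

In this sketch a file's header would store one `Access` value per operation, and the operating system would call `may_access` before carrying out the operation.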

4.5 File system of a smart card


Smart cards contain a central file system that follows the ISO/IEC 7816-4 standard. The file system is arranged hierarchically, like that of many modern operating systems. Files are named by a 2-byte file identifier. Smart cards contain three major file types:

• Master File (MF)
• Dedicated File (DF)
• Elementary File (EF)

Figure 11: Logical file structure of a smart card

The root, or Master File (MF), is the top of the hierarchy. It is identified by the 2-byte identifier 3F 00. It contains information about, and the locations of, the files contained within it. Dedicated Files (DF) contain the actual data files; they are like directories on a smart card and subdivide the card to hold files called Elementary Files (EF). The elementary file is where the actual data is stored. It can be of four different types:

• Transparent File
• Linear, Variable Length Record File
• Linear, Fixed Length Record File
• Cyclic, Fixed Length Record File

Each type is unique in how the data is stored and in its actual purpose. Transparent files are commonly just files of a fixed number of bytes used for storing information. Linear record files contain subdivisions called records, each of which holds a certain number of bytes. Cyclic files are specific to smart cards. They contain a cycle of information in which records are written and read in a ring-like manner.
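The ring-like behaviour of a cyclic, fixed-length record file can be sketched as below. This is an illustrative model only: the class name `CyclicFile`, the zero-byte padding, and the convention that `read(0)` returns the newest record are assumptions, not details taken from ISO/IEC 7816-4.

```python
class CyclicFile:
    """Fixed number of fixed-length records; new writes replace the oldest."""

    def __init__(self, record_count, record_length):
        self.record_length = record_length
        self._records = [None] * record_count
        self._next = 0                         # slot the next write will use

    def write(self, data):
        if len(data) > self.record_length:
            raise ValueError("record too long")
        # Pad to the fixed record length (padding byte is an assumption).
        self._records[self._next] = data.ljust(self.record_length, b"\x00")
        self._next = (self._next + 1) % len(self._records)

    def read(self, age=0):
        """Return a record by age: 0 is the newest, 1 the one before, ..."""
        if not 0 <= age < len(self._records):
            raise IndexError("no such record")
        record = self._records[(self._next - 1 - age) % len(self._records)]
        if record is None:
            raise LookupError("record not written yet")
        return record


log = CyclicFile(record_count=3, record_length=4)
for entry in (b"A", b"B", b"C", b"D"):         # the fourth write replaces "A"
    log.write(entry)

newest = log.read(0)
oldest = log.read(2)
```

A structure of this kind suits uses such as a short transaction log, where only the most recent entries need to be kept.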


Smart cards follow a specific protocol when talking to the reader and/or the PC. The commonly mentioned terms T=0 and T=1 describe the protocol used by the reader to communicate with the card. T=0 is a byte-oriented protocol: for every byte that is sent, an acknowledgement must be received. With T=1, a specific number of bytes can be sent together in a data block. Information is sent to the reader in hex code format:

0x60        Gets Reader Type and Activates Reader
0x61        Sets Reader with ICC communication parameters
0x62        Turns Card Power ON
0x63        Turns Card Power OFF
0x64        Sends Reset to Card
0x65        Gets Reader-Card Status
0x66        Sends one byte to Reader
0x67        Sends Data Block to Reader
0x68        Makes Reader Resend last data block
0x69        Gets Reader Capabilities
0x6A        Deactivates Reader
0x6B        Activates Reader-Dependent Features
0x6C-0x6F   Reserved

A typical command to the card includes both the reader command and the card command. A Get Challenge command sent to a card would look like this:

0x67 0x00 0x05 0xC0 0x84 0x00 0x00 0x08

The first three bytes [ 0x67 0x00 0x05 ] indicate that the command that follows should be sent to the card; 0x05 is the size of that command. [ 0xC0 0x84 0x00 0x00 0x08 ] is the actual card command for Get Challenge. ISO standards define how commands are handled through the following command structure:

CLASS, INSTRUCTION, P1, P2, P3

Here [ 0xC0 ] is the class, [ 0x84 ] is the instruction, and [ 0x00 0x00 0x08 ] are the three parameters ([ 0x08 ] is the desired return size of the Get Challenge).
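The byte layout of this example can be checked with a small parser. The sketch below is an assumption-laden illustration: the names `CardCommand` and `parse_reader_frame` are hypothetical, the meaning of the second header byte is not explained in the text and is simply skipped, and only the five-byte CLASS, INSTRUCTION, P1, P2, P3 form is handled.

```python
from typing import NamedTuple

SEND_DATA_BLOCK = 0x67      # reader command "Sends Data Block to Reader"


class CardCommand(NamedTuple):
    cla: int    # CLASS
    ins: int    # INSTRUCTION
    p1: int
    p2: int
    p3: int


def parse_reader_frame(frame):
    """Split a reader frame into its header and the embedded card command."""
    if len(frame) < 3:
        raise ValueError("frame shorter than the three-byte header")
    reader_command, length = frame[0], frame[2]   # frame[1] is not interpreted
    if reader_command != SEND_DATA_BLOCK:
        raise ValueError("not a send-data-block frame")
    payload = frame[3:]
    if len(payload) != length:
        raise ValueError("length byte does not match payload size")
    if length != 5:
        raise ValueError("only five-byte card commands are handled here")
    return CardCommand(*payload)


get_challenge = bytes([0x67, 0x00, 0x05, 0xC0, 0x84, 0x00, 0x00, 0x08])
command = parse_reader_frame(get_challenge)
print(f"CLA={command.cla:#04x} INS={command.ins:#04x} P3={command.p3}")
```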

4.6 Smart card applications


Smart cards are often used in applications that require strong security protection and authentication. For example, a smart card can act as an identification card, which is used to prove the identity of the cardholder. It can also be a medical card, which stores the medical history of a person. Furthermore, the smart card can be used as a credit/debit bank card, which allows off-line transactions. All of these applications require sensitive data to be stored in the card, such as biometric information about the card owner, personal medical history, and cryptographic keys for authentication.

Another application of smart cards is physical access. Physical access is the ability to open a door or gate. Most modern physical security systems use a protocol called Wiegand to communicate with door locks and other security devices on the Wiegand strip. Wiegand is especially useful for its ability to travel longer distances without interference, so it is especially important when designing a physical access door reader to interface it to the Wiegand specifications. These door readers can tie other factors into authentication in addition to simply verifying a certificate on a smart card. They can also use biometrics. Biometrics is especially useful when authentication should be tied not only to something that the user has but also to something that the user is. Most people are not prone to forget their eyes or hands when they come to work, so biometrics gives a very secure and reliable way of identifying a user.

Biometric authentication comes in many flavours. Retinal and iris scanning determine the distinctive features of a person's eye. Facial scanning is a way of photographing a person's face and determining distinct features such as shape, size, or cheek-to-nose ratio. Fingerprint scanning, the cheapest and most effective, looks at features of the fingerprint. Everyone is born with a distinct fingerprint that does not change during his or her lifetime. Many fingerprint scanners map out the finger's minutiae, the interesting points in the print where the ridges branch out. Others measure distance. There are many types of algorithms.

Biometrics ties into smart cards in many ways, and the two often complement each other. First, the user's fingerprints can be stored on the card, which keeps authentication inside the smart card reader, so the fingerprint representation never travels across a data line. Second, smart cards provide the second token during authentication. If only biometrics were used, a scanned print would have to be matched against hundreds or thousands of stored prints in a database. With the smart card, only one print has to be checked, which speeds up the authentication process.

Another application is smart card enabled secure web access. Smart cards and hardware tokens provide both greater mobility and enhanced security by allowing users to carry their digital certificates with them. Unlike passwords, which could be different on every site that you visit, the same digital certificate can be used everywhere that identification is required. Certificates also allow users to establish confidential communications, identify themselves to other people and web sites, and detect message tampering. This ability exists today in raw form, using the smart card to sign the document before it is sent.

A smart card enabled secure web transaction today begins by opening an SSL (Secure Sockets Layer) connection to the host machine's web server software. Once this encrypted pipe exists, the web browser makes a hash, a fixed-width representation, of the form. It then signs (encrypts) this hash using its private key stored on the smart card. Since the hash of the form has a fixed width, it can be encrypted on the card quickly. The original form is then sent to the web server along with this encrypted hash, the signature. At the web server the form is received and passed through the same hash function. The signature is decrypted and the two hashes are compared. If they are equal, the document has not been changed on its way to the web server.

Netscape, for example, uses PKCS #11, which is a library of functions for performing security tasks with devices such as smart cards. Netscape Communicator 4.0 supports RSA Labs' Public Key Cryptography Standard PKCS #11 and X.509 version 3 certificates. PKCS #11 is the first open, published standard for token and smart card-based implementations of public key cryptography in the United States. PKCS #11 will allow any application to support independently developed smart cards. Users benefit from additional protection when using a smart card or hardware token because it cannot be used without a personal identification number (PIN) known only to the user. Furthermore, smart cards and tokens are disabled automatically after a specified number of failed attempts to enter the correct PIN. Smart cards and tokens also provide enhanced protection of user data at the office by separating users' certificates from their hard drives and requiring a PIN, thus reducing the odds that any individual computer can be attacked.

Microsoft uses PC/SC and its Crypto API to integrate with Internet Explorer and its Windows products. The problem is that the two approaches do not work together. To try to solve this, a consortium of companies including IBM, Sun, Netscape, and others has tried to standardize smart card communication with the Open Card Framework. This is a set of cross-platform Java APIs for integrating smart card technology in an interoperable way.


5 Web Applications

5.1 Construction and Syntax of URLs

Uniform Resource Locators, or URLs, are a scheme for specifying Internet resources using a single line of printable ASCII characters. This scheme encompasses all major Internet protocols, including HTTP, FTP, Gopher and WAIS. URLs are one of the foundation tools of the WWW, but they are not restricted to it: they can also be used to communicate information about Internet resources in e-mails, notes, or even books. A URL contains the following information:

• The protocol to use when accessing the server (e.g. HTTP, FTP, Gopher).
• The Internet domain name of the site on which the server is running, along with any required username and password information.
• The port number of the server. If this is omitted, the browser assumes a commonly understood default value for the indicated protocol.
• The location of the resource in the hierarchical (often directory, or directory-like) structure of the server.

For example: http://www.vub.ac.be:8080/path/subdir/file.ext

The URL syntax allows you to pass query strings to the designated resource, in situations (typically HTTP, Gopher) that support such service. This is accomplished by appending the query strings to the URL, separated from the URL by a question mark: http://www.somewhere.edu/cgi-bin/srch-data?archie+database

Thus a question mark must be encoded if it is not meant to indicate a query string. Any special character is represented as %xx, where xx is the hexadecimal code for the desired ISO Latin-1 character. Query strings, since they are part of a URL, must also be encoded. However, query string data takes an additional level of encoding, over and above the regular encoding: for example, space characters within query strings are encoded as plus (+) signs and not via hex character encoding.
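The URL parts and the two encoding levels described above can be tried out with Python's standard `urllib.parse` module. The two URLs are the ones used as examples in this section; the sample text "what is this?" is an assumption added for the illustration.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlsplit

# Split the example URL into protocol, host, port and resource location.
parts = urlsplit("http://www.vub.ac.be:8080/path/subdir/file.ext")
print(parts.scheme, parts.hostname, parts.port, parts.path)

# The part after the question mark is the query string.
search = urlsplit("http://www.somewhere.edu/cgi-bin/srch-data?archie+database")
decoded_query = unquote_plus(search.query)      # "+" stands for a space

# Regular encoding: special characters become %xx.
path_encoded = quote("what is this?")

# Query-string encoding: spaces become "+", other characters still use %xx.
query_encoded = quote_plus("what is this?")
```

Note that in both encodings the literal question mark becomes %3F, so it can no longer be mistaken for the start of a query string.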


5.2 The HTTP Protocol

HTTP is an Internet client/server protocol designed for the rapid and efficient delivery of hypertext materials. HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the server breaks the connection and retains no memory of the event that just took place. Statelessness is in part what makes an HTTP server fast. All HTTP communication transmits data as a stream of 8-bit characters, or octets. This ensures the safe transmission of all forms of data, including images, executable programs, and HTML documents containing 8-bit ISO Latin-1 characters. An HTTP connection has four stages:

1. Open the connection. The client contacts the server at the Internet address and port number specified in the URL (the default port is 80).
2. Make the request. The client sends a message to the server, requesting service. The request consists of HTTP request headers that define the method requested for the transaction and provide information about the capabilities of the client, followed by the data being sent to the server (if any). Typical HTTP methods are GET, for getting an object from the server, and POST, for posting data to an object (e.g. a gateway program) on the server.
3. Send the response. The server sends a response to the client. This consists of response headers describing the state of the transaction (for instance, the status of the response [successful or not] and the type of data being sent), followed by the actual data.
4. Close the connection. The connection is closed; the server does not retain any knowledge of the transaction just completed.

This procedure means that each connection processes a single transaction, and can therefore download only a single data file to the client, while the stateless nature of the transaction means that each connection knows nothing about previous connections [HTMLSource].

The implications of these features are illustrated in the following two examples. Assume that HTTP is used to access an HTML document containing five inline images via the IMG element. Composing the entire document requires six distinct connections to the HTTP server: one to retrieve the HTML document itself and five others to retrieve the five image files. So there is always a single transaction per connection.


Now, suppose a user retrieves a first fill-in HTML FORM from a server. After the form is submitted, the Web server has to send any user-specific information (e.g. the name of the user) back to the client in the form of a context, since otherwise all the information shared between subsequent transactions would be lost on the server's side. When the user retrieves a second fill-in form, the context information is returned to the user inside the HTML document. This can be accomplished by including, in the FORM returned to the user, an INPUT element of TYPE="hidden". Hidden elements are used to record the state or context of the client/server transaction, and in this case to record the name.
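The second example can be made concrete with a sketch that never touches the network: it builds the second form with a hidden element, composes the request message a browser might send when that form is submitted, and shows how the server recovers the context from it. The function names, the path /cgi-bin/step2, the host name, and the field names are all hypothetical.

```python
import html
from urllib.parse import parse_qs, urlencode


def render_second_form(user_name):
    """Return the second form, carrying the user's name as hidden context."""
    safe = html.escape(user_name, quote=True)
    return (
        '<FORM ACTION="/cgi-bin/step2" METHOD="POST">\n'
        f'  <INPUT TYPE="hidden" NAME="name" VALUE="{safe}">\n'
        '  <INPUT TYPE="text" NAME="city">\n'
        '  <INPUT TYPE="submit">\n'
        "</FORM>"
    )


def build_post_request(path, host, fields):
    """Compose request headers followed by the form data as message body."""
    body = urlencode(fields)
    return (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


def parse_request_body(request):
    """Server side: recover the submitted fields, including the context."""
    _headers, _, body = request.partition("\r\n\r\n")
    return {name: values[0] for name, values in parse_qs(body).items()}


form = render_second_form("Ada Smith")
request = build_post_request(
    "/cgi-bin/step2", "www.example.org", {"name": "Ada Smith", "city": "Brussels"}
)
context = parse_request_body(request)
```

Because the name travels inside every request, the server can remain stateless and still know which user it is dealing with.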

5.3 The HTTPs Protocol

A special variant of HTTP is HTTPs, or Secure HTTP. HTTPs URLs are composed in exactly the same manner as HTTP URLs, except that the connection between the client and server is encrypted, so data can pass securely between them. HTTPs contains a complementary encryption and authentication scheme. This scheme is implemented as part of the HTTP protocol and involves additional HTTP request and response headers that negotiate the type of encryption being used, the exchange of encryption keys, the passing of message digest information and so on. This is similar to the basic authentication scheme, except that HTTPs allows the client and server to exchange cryptographic certificates and encryption keys, allowing each party to authenticate the other and to encrypt the data it sends to the other.

HTTPs and SSL are complementary schemes. SSL has the advantage of encrypting all the underlying communication, but the disadvantage that the same encryption is applied to all messages: there is no way to change the encryption scheme based, for example, on the personal encryption key of a user. HTTPs, on the other hand, allows this form of encryption, since the keys can be passed in the HTTP headers.

5.4 HTML

HyperText Markup Language (HTML) is a platform-independent system of syntax and semantics for adding characters to data files (particularly text files) to represent the data's structure and to point to related data, thus creating hypertext for use in the World Wide Web and other applications.

HTML is designed to specify the logical organization and formatting of text documents, with extensions to include inline images, fill-in forms, and hypertext links to other documents and Internet resources.

5.5 FORMS

The FORM element encompasses the content of an HTML fill-in form. The usage is <FORM> ... </FORM>. This is the element used to create fill-in forms with checkboxes, radio boxes, text input windows, and buttons. Data from a FORM must be sent to server-side gateway programs for processing; a FORM collects data but does not process it. In general the FORM and the server-side program handling the form output must be designed together, so that the program understands the data being sent from the FORM. FORMs do not nest: one cannot have a FORM within a FORM.

The FORM element takes three attributes. These determine where the FORM input data is to be sent, which HTTP method to use when sending the data, and the data type of the content. These attributes are:

ACTION = "URL" (mandatory). The action specifies the URL to which the FORM content is to be sent. Usually this is a URL pointing to a program on an HTTP server.

METHOD = "GET" or "POST" (optional). The method indicates the HTTP method for sending information to the server. The default method is GET. With GET, the content of the form is appended to the URL as a query string. With POST, the form content is sent to the server as a message body, and not as part of the URL.

ENCTYPE = "MIME_type" (optional). ENCTYPE specifies the MIME type of the data sent when the POST method is used.

User input into a form is solicited by three elements: INPUT, SELECT, and TEXTAREA. These elements can appear only inside a form.

• The INPUT element specifies a variety of editable fields inside a form. It takes several attributes that define the type of input mechanism (text fields, buttons, checkboxes, etc.), the variable name associated with the input data, and the alignment and size of the input element when displayed.

• The SELECT element allows the user to select from among a set of values presented as a selectable list of text strings; the possible values are specified by the OPTION element. The attribute MULTIPLE allows multiple values to be selected; otherwise, only one can be chosen. As with the INPUT elements, the selected data is sent to the server as one or more name=value pairs.

• TEXTAREA allows the user to enter a block of text. The input block of text can grow to almost unlimited size and is not limited by the size of the area displayed on the screen. Scrollbars can be present. TEXTAREA can contain any printable characters; in principle, an entire HTML document can be sent to a server using this element.


6 Secured E-Forms

6.1 Introduction to Secured E-Forms

The general problem of securing E-Forms addressed in this thesis is the cryptographic protection of the Web HTTP traffic generated by the use of electronic forms. From the wide domain of securing web applications I have focused only on securing electronic forms (E-Forms), since they are fully representative of typical user interactions with a web server during secure transaction processing over the Web. Web, HTTP and FORM related concepts were the subject of Chapter 5.

Typical existing examples of Secured E-Forms are:
• Transaction processing by secured E-Forms for the basic on-line banking services: account management, payment orders, requests for documents and transfer of money to an electronic purse.
• Ordering and payment in E-Commerce systems using secured forms. E.g. ordering a book at Amazon.com is secured by https, so the order and payment information of the user is protected.

There is a strong need to migrate paper-based administration forms and existing client/server applications to Web-based applications. The Web-based approach could be used for the VAT declarations signed by companies or their representatives and sent to the VAT administration. At present this is paper-based and/or floppy-based. One advantage of replacing the paper declarations by E-Forms is that it would allow the administration to add validation rules to the forms, which is not possible on paper documents. In the floppy-based solution the validation rules can be explained in the specification of the file, but not enforced as they can be with e.g. Form applets. The security of the current procedure relies on the post, which is accepted as a confidential transport service. Migrating to the Internet raises the question of how to implement the desired security services.

As another example, the National Office for Social Security in Belgium receives electronic declarations of employment from employers.
Currently this is done by using a specific application that sends all the data by ftp in a pure client/server model. Information about the employee needed for this declaration is read from his or her SIS card. This SIS card is a trusted source, because it is issued and writable only by the insurance companies, under the control of the Social Security, following their specifications. Migrating this application to the Web would greatly increase the usability of the service, since the users would not have to buy the required software. In this new Web implementation with E-Forms, the same security services should be achieved as in the existing application.

In all these applications there are always several transactions between the user, interacting with E-Forms in his browser, and a web server. The steps of these transactions carry sensitive data in both directions, and both sides want to be sure that these data are secured.

Chapter 3 described the basic security concepts. Smart cards, introduced in Chapter 4, can be used both as a source of the data they contain and as an active part of the security process. One of the main advantages of performing this securing process with a smart card at the Web client's side is that the smart card enforces the security services, especially non-repudiation. The power of smart cards is that the secret keys on the card used for encryption/decryption never leave the card, not even when they are generated: they are generated directly on the card or in a tamper-proof device.

In order to read data from the smart card and import it automatically into the HTML forms, or to use the smart card to encrypt/decrypt, hash and sign data, the following are needed: a card, a card reader and a library on the computer the user is using. The library can be a DLL shared with a common application or an interface to the browser.

In summary, a Secured E-Form as considered by this thesis can be viewed as follows:
• it is an electronic form containing different fields, where data can be filled in by the user or read from an external storage, smart card, local database, etc.;
• a certain logic can be applied to this data;
• security services can be implemented with or without the help of a smart card;
• it is sent back to the Web server.

6.2 What kind of data can be secured?

Securing data from E-Forms is a very general problem, because the data can come from many sources, as shown in Figure 12:
1. data filled into the form by the user. An electronic form can contain different fields: edit fields, list boxes, combo boxes, radio buttons, checkboxes, Submit and Reset buttons;
2. data coming from the server within hidden fields, mostly used for context information;
3. data obtained through the browser, from a local database or from external devices;
4. public or protected information read from the smart card.

The server can send applets or active components with a certain logic to be applied to the data, e.g. for data validation and/or for security services. The applets are signed.

[Figure 12 diagram: data originating from four sources (User, Server, Browser, Smart card). Items shown: edit fields, combo boxes, radio buttons, textarea, Submit and Reset buttons, hidden fields, context information, Java applet, ActiveX, local database, external resources, public data, protected data, algorithms, secret keys.]

Figure 12: Possible sources of data to be protected

6.3 Interaction between the Web server and Web client

As explained in Chapter 5, HTTP communication between the Web server and the Web browser is connectionless: each connection processes a single transaction. An interaction between a user, through his Web browser, and the Web server therefore forms one session composed of several transactions (Figure 13). The connectionless and stateless nature of the HTTP protocol makes it somewhat difficult to implement a secured transaction using E-Forms.


Suppose a user retrieves a first fill-in HTML FORM from a server. The FORM has a field into which the user types a name, which in turn allows the user to access some name-specific information in a personnel database. When the user submits the FORM, the username (and any other possibly sensitive information gathered by the FORM) is sent to a gateway program residing on the server for subsequent processing of the request. The URL of the gateway program is specified as an attribute to the FORM element. This gateway program processes the data and returns, as a second HTML document, the results to the user. This contains a second FORM, which does not contain a place for name. Since the server is stateless and thus has no memory of the first connection, how does it know the name of the person the second time around?


[Figure 13 diagram: Client and Server communicating over the Internet (HTML/HTTP over TCP/IP). Within one session the messages are: GET FIRST FORM; FIRST FORM FILLING REQUEST; put data in form, get data from user; FILLED FIRST FORM; then, repeatable for all the forms: FORM FILLING REQUEST (+ CONTEXT); put data in form, get data from user; FILLED FORM (+ CONTEXT); and finally CONFIRMATION.]

Figure 13: Client/Server interaction


The server does not know. Instead, the gateway program must explicitly keep track of this information for each client, for example by placing the name information inside the HTML document returned to the user. This can be accomplished by including, in the FORM returned to the user, a hidden INPUT element (TYPE="hidden"). Hidden elements are used to record the state or context of the client/server transaction, in this case the name. When the client submits the FORM for the second time, the content of the hidden element is sent along with any user-input data, thereby returning the name information to the server. Using hidden elements, information can be passed back and forth between the client and server, preserving knowledge of the name for each subsequent transaction.

A second method of state preservation is to create a temporary file at the server and store the transaction data in this file. In this case the gateway program must return a hidden INPUT element containing the name of this temporary file, so that the gateway program will know which file to reference. This reduces the amount of data that must be sent from client to server and back (and consequently secured), but the temporary files must then be managed (e.g. deleted in case of unfinished sessions).

Submitting a FORM sends the data to the server as a collection of name/value pairs. In practice, the data is sent as a collection of strings of the form name=value, linked by ampersand characters (&), e.g. name1=value1&name2=value2… In addition, blank characters are encoded as plus signs. With the GET method, this string is appended to the URL, separated from it by a question mark. When the HTTP server receives these data, it forwards the entire query string to the gateway program.

If the HTTP POST method is used instead of GET, the same data is sent to the server, but in a different way. The POST method sends data in a message body, and not in the URL. The differences are the following: the method header field now specifies the POST method and has no data appended to the URL, and two additional header fields, Content-type and Content-length, are included in the header. Data are sent to the server as a message body that follows the request header.

The GET and POST methods for handling FORM input have different strengths and weaknesses. POST is clearly superior for sending large quantities of data to the server, and should always be used in this case. For small quantities there is a choice. Since one wants to hide the E-Form's content as much as possible, POST should be used: it keeps the form data out of the URL, although it does not by itself encrypt it.
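The encoding rules above can be sketched with Python's standard urllib.parse module. The field names, the path and the helper function names are hypothetical; the HTTP/1.0 request lines are written out by hand only to show where the encoded string ends up in each method.

```python
from urllib.parse import urlencode, parse_qs


def get_request(path: str, fields: dict) -> str:
    """GET: the encoded pairs follow the URL after a question mark."""
    return f"GET {path}?{urlencode(fields)} HTTP/1.0\r\n\r\n"


def post_request(path: str, fields: dict) -> str:
    """POST: the same encoded pairs travel in the message body."""
    body = urlencode(fields)
    return (
        f"POST {path} HTTP/1.0\r\n"
        "Content-type: application/x-www-form-urlencoded\r\n"
        f"Content-length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )


# Hypothetical form content; the blank in the name becomes a plus sign.
fields = {"name": "Edit Koll", "amount": "100"}
print(urlencode(fields))
print(get_request("/cgi-bin/order", fields))
print(post_request("/cgi-bin/order", fields))

# The gateway program reverses the encoding on the server side.
print(parse_qs(urlencode(fields)))
```

In both cases the data crosses the network as readable text unless the connection itself is encrypted.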


6.4 Security of the E-Forms

To assess the security of E-Forms, the four security services (authentication, confidentiality, data integrity and non-repudiation) are considered below. Most of these services can be achieved by using the SSL protocol at the transport layer, but not all of them: the non-repudiation service, for example, must be provided at the application level. The SSL record layer provides confidentiality, authentication, data integrity and replay protection over a connection-oriented reliable transport protocol such as TCP.

6.4.1 Authentication

Authentication, as defined in §3.1, is the process of verifying identity so that one entity can be sure that another entity is who it claims to be. In the case of E-Forms, authentication should be mutual: the user must be sure that the Web server is who it claims to be, and the Web server must be sure that the user is who he or she claims to be. Authentication can be discussed both at the application layer and at the transport layer. SSL 2.0 can authenticate the Web server at the transport layer, but not the user. SSL 3.0 provides authentication of both the client and the Web server.
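As an illustration of authentication in both directions at the transport layer, the sketch below configures Python's standard ssl module. This is an assumption-laden modern analogue: the module negotiates current TLS versions rather than SSL 3.0, and the certificate file names in the comments are hypothetical placeholders, so the corresponding calls are left commented out.

```python
import ssl

# Server side: a context for accepting connections that also demands
# a certificate from the client (mutual authentication).
server_ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
server_ctx.verify_mode = ssl.CERT_REQUIRED
# server_ctx.load_cert_chain("server.pem")          # hypothetical file
# server_ctx.load_verify_locations("client-ca.pem")  # hypothetical file

# Client side: the default context already verifies the server's
# certificate chain and host name.
client_ctx = ssl.create_default_context()
# client_ctx.load_cert_chain("user.pem")  # hypothetical client certificate

print(server_ctx.verify_mode, client_ctx.verify_mode, client_ctx.check_hostname)
```

With these settings a handshake would fail unless each side presents a certificate that the other can verify.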

6.4.2 Confidentiality

Confidentiality, as stated in §3.1, is the property that information is not made available or disclosed to unauthorized individuals, entities, or processes. The SSL protocol provides confidentiality; however, some subtle threats posed by traffic analysis have been pointed out in [AnalysisSSL], a detailed technical analysis of the cryptographic strength of the SSL 3.0 protocol.

When a Web browser connects to a Web server via an encrypted transport such as SSL, the GET request containing the URL is transmitted in encrypted form. Exactly which Web page was downloaded by the browser is considered confidential information, since knowledge of the URL is often enough for an adversary to obtain the entire Web page downloaded. Traffic analysis is a passive attack that can recover the identity of the Web server, the length of the URL requested, and the length of the HTML data returned by the Web server. This leak can often allow an eavesdropper to discover which page was accessed: Web search engines can catalogue the data openly available on a Web server and find all URLs of a given length on a given server that return a given amount of HTML data.

This vulnerability is present because the ciphertext length reveals the plaintext length in the case of stream ciphers, or a close estimate of the plaintext length in the case of block ciphers, since in the latter case plaintexts are padded out to the next 8-byte boundary. As recommended in [AnalysisSSL], SSL should support, or even require, random-length padding for both cipher modes.
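The length leak can be illustrated with a few lines of arithmetic. The sketch assumes an 8-byte block size, assumes that at least one byte of padding is always added, and ignores the MAC and record headers; all function names are illustrative, and the random-padding function is one possible reading of the recommendation, not the scheme from [AnalysisSSL].

```python
import secrets

BLOCK_SIZE = 8  # bytes, as for DES-style block ciphers


def stream_cipher_length(plaintext_len: int) -> int:
    """A stream cipher produces exactly as many bytes as it is given."""
    return plaintext_len


def block_cipher_length(plaintext_len: int) -> int:
    """Pad to the next 8-byte boundary, always adding at least one byte."""
    return (plaintext_len // BLOCK_SIZE + 1) * BLOCK_SIZE


def plaintext_length_range(ciphertext_len: int) -> tuple:
    """What an eavesdropper learns from an observed block-cipher length."""
    return (ciphertext_len - BLOCK_SIZE, ciphertext_len - 1)


def randomly_padded_length(plaintext_len: int, max_extra_blocks: int = 4) -> int:
    """Random-length padding: add 0..max_extra_blocks whole extra blocks."""
    extra = secrets.randbelow(max_extra_blocks + 1)
    return block_cipher_length(plaintext_len) + extra * BLOCK_SIZE


url = "/accounts/transfer.html"  # hypothetical confidential URL
print(len(url), stream_cipher_length(len(url)), block_cipher_length(len(url)))
print(plaintext_length_range(block_cipher_length(len(url))))
```

With random padding the observed length no longer pins the plaintext length down to a window of eight values, at the cost of sending extra bytes.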

6.4.3 Data integrity

Data integrity is the property that data has not been changed, destroyed, or lost in an unauthorized or accidental manner, i.e. the data received is the same as the data that was sent. SSL protects the integrity of application data by using a cryptographic MAC (HMAC). A message authentication code (MAC, see §3.2.8) is a one-way function with the addition of a secret key.

SSL 2.0 suffered from at least one flaw along these lines: the SSL 2.0 MAC covered the padding data but not the length of the padding, so an active attacker was free to manipulate the cleartext padding-length field to compromise message integrity. SSL 2.0 was subject to quite a number of active attacks on its record layer and key-exchange protocol. SSL 3.0 plugs those gaping holes and is thus considerably more secure against active attacks. SSL 3.0 also provides much better message integrity protection in export-weakened modes (the common case) than SSL 2.0 did: SSL 2.0 provided only 40-bit MACs in those modes, while SSL 3.0 always uses 128-bit MACs.
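The idea of a keyed one-way function can be sketched with Python's standard hmac module. The choice of SHA-256 and the sample key and message are assumptions made for the illustration; SSL 3.0 itself uses its own MAC construction with MD5 or SHA-1.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the message with the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)


key = b"shared-secret-key"            # hypothetical key known to both sides
form_data = b"name=Edit&amount=100"   # hypothetical form content
tag = make_mac(key, form_data)

print(verify_mac(key, form_data, tag))
print(verify_mac(key, b"name=Edit&amount=900", tag))
```

Because both sides hold the same key, a MAC shows that the data was not altered in transit, but it cannot prove to a third party which of the two sides produced it.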

6.4.4 Non-repudiation

Non-repudiation is a security service that provides protection against false denial of involvement in a communication. The goal of non-repudiation is to prove that a message has been sent and received. Non-repudiation is mostly achieved by using digital signatures certified by a Certification Authority (CA) in a Public Key Infrastructure (PKI) (see §3.3.2).

A smart card can be used in order to:
1. add public and/or protected information from the smart card,
2. compute a hash,
3. sign the hash (for this the PIN must be requested and verified).

Computing a hash and signing it produces a digital signature, which ensures the non-repudiation service in addition to the authentication service. Non-repudiation, although similar, is stronger than authentication: besides verifying identity, it can prove identity to a third party in case of dispute.

In order to use the public key of the Web server safely, it must first be verified; more exactly, its certificate issued by a CA must be verified. After that, the different kinds of data presented in §6.2 must be digitally signed, encrypted with a randomly generated symmetric key, and sent to the Web server together with the digital envelope.

The client runs the data through a one-way algorithm to produce a unique value known as the message digest. This is a kind of digital fingerprint of the data and will be used later to test the integrity of the message. The client then encrypts the message digest with his private signature key to produce the digital signature. Next, he generates a random symmetric key and uses it to encrypt all the data, the signature and a copy of his certificate, which contains his public signature key. To decrypt the data, the server will require a secure copy of this random symmetric key. The server's certificate, which the client must have obtained prior to initiating secure communication with the server, contains a copy of the server's public exchange key. To ensure secure transmission of the symmetric key, the client encrypts it using the server's public exchange key. The encrypted symmetric key, referred to as the digital envelope, is sent to the server along with the encrypted message itself.

Finally, the client sends a message to the server consisting of the following: the symmetrically encrypted data, signature and certificate, as well as the asymmetrically encrypted symmetric key (the digital envelope).
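The message flow described above can be traced in a toy sketch that uses only the Python standard library. Everything cryptographic in it is a deliberately simplified stand-in and must not be used for real protection: the key pairs are textbook RSA without padding, built from small well-known Mersenne primes; the symmetric cipher is a hash-based keystream XOR; the certificate is omitted. All names are hypothetical. On a real client the private-key operation would take place on the smart card.

```python
import hashlib
import secrets

E = 65537  # public exponent shared by both toy key pairs


def make_keypair(p: int, q: int):
    """Textbook RSA key pair: returns ((n, e), (n, d))."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (n, E), (n, d)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def digest_as_int(data: bytes, n: int) -> int:
    """Message digest, reduced so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def client_send(form_data: bytes, client_private, server_public):
    n_c, d_c = client_private
    signature = pow(digest_as_int(form_data, n_c), d_c, n_c)  # sign digest
    symmetric_key = secrets.token_bytes(16)                   # random key
    ciphertext = keystream_xor(
        symmetric_key, signature.to_bytes(32, "big") + form_data
    )
    n_s, e_s = server_public
    envelope = pow(int.from_bytes(symmetric_key, "big"), e_s, n_s)
    return ciphertext, envelope


def server_receive(ciphertext: bytes, envelope: int, server_private, client_public):
    n_s, d_s = server_private
    symmetric_key = pow(envelope, d_s, n_s).to_bytes(16, "big")  # open envelope
    plain = keystream_xor(symmetric_key, ciphertext)
    signature, form_data = int.from_bytes(plain[:32], "big"), plain[32:]
    n_c, e_c = client_public
    valid = pow(signature, e_c, n_c) == digest_as_int(form_data, n_c)
    return form_data, valid


client_public, client_private = make_keypair(2**127 - 1, 2**89 - 1)
server_public, server_private = make_keypair(2**107 - 1, 2**61 - 1)

ciphertext, envelope = client_send(b"name=Edit&amount=100", client_private, server_public)
print(server_receive(ciphertext, envelope, server_private, client_public))
```

The server accepts the form data only if the signature check succeeds. A production system would rely on a vetted cryptographic library and on standard formats for signatures and envelopes, such as those of the PKCS series.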

7 Conclusions


Glossary

APDU
Application Protocol Data Unit

Abstract Syntax Notation One (ASN.1)
A standard for describing data objects. [X680]

Biometric authentication
A method of generating authentication information for a person by digitising measurements of a physical characteristic, such as a fingerprint, a hand shape, a retina pattern, a speech pattern (voiceprint), or handwriting.

Blowfish
A symmetric block cipher with variable-length key (32 to 448 bits) designed in 1993 by Bruce Schneier as an unpatented, license-free, royalty-free replacement for DES or IDEA.

Brute force
A cryptanalysis technique or other kind of attack method involving an exhaustive procedure that tries all possibilities, one-by-one.

Certification authority (CA)
An entity that issues digital certificates (especially X.509 certificates) and vouches for the binding between the data items in a certificate.

Cipher
A cryptographic algorithm for encryption and decryption.

Ciphertext
The scrambled plaintext after encryption.

Cross-certification
The act or process by which two CAs each certify a public key of the other, issuing a public-key certificate to that other CA. Especially useful to enable users to validate each other's certificate when the users are certified under different certification hierarchies.

Cryptanalysis
The mathematical science that deals with analysis of a cryptographic system in order to gain knowledge needed to break or circumvent the protection that the system is designed to provide.

Cryptography
The mathematical science that deals with transforming data to render its meaning unintelligible (i.e., to hide its semantic content), prevent its undetected alteration, or prevent its unauthorized use. If the transformation is reversible, cryptography also deals with restoring encrypted data to intelligible form.

Cryptology
The science that includes both cryptography and cryptanalysis, and sometimes is said to include steganography.

Data Encryption Algorithm (DEA)
A symmetric block cipher, defined as part of the U.S. Government's Data Encryption Standard. DEA uses a 64-bit key, of which 56 bits are independently chosen and 8 are parity bits, and maps a 64-bit block into another 64-bit block. This algorithm is usually referred to as "DES".

Dictionary attack
An attack that uses a brute-force technique of successively trying all the words in some large, exhaustive list.

Digital certificate
A certificate document in the form of a digital data object (a data object used by a computer) to which is appended a computed digital signature value that depends on the data object.

Digital envelope
A digital envelope for a recipient is a combination of encrypted content data (of any kind) and the content encryption key in an encrypted form that has been prepared for the use of the recipient. Digital enveloping is a hybrid encryption scheme to "seal" a message or other data, by encrypting the data and sending both it and a protected form of the key to the intended recipient, so that no one other than the intended recipient can "open" the message.

Digital signature
A value computed with a cryptographic algorithm and appended to a data object in such a way that any recipient of the data can use the signature to verify the data's origin and integrity.


El Gamal algorithm
An algorithm for asymmetric cryptography, invented in 1985 by Taher El Gamal, that is based on the difficulty of calculating discrete logarithms and can be used for both encryption and digital signatures.

Electronic Data Interchange (EDI)
Computer-to-computer exchange, between trading partners, of business data in standardized document formats.

Elliptic Curve Cryptography (ECC)
A type of asymmetric cryptography based on mathematics of groups that are defined by the points on a curve.

End-to-end encryption
Continuous protection of data that flows between two points in a network, provided by encrypting data when it leaves its source, leaving it encrypted while it passes through any intermediate computers (such as routers), and decrypting only when the data arrives at the intended destination.

Firewall
An Internet gateway that restricts data communication traffic to and from one of the connected networks (the one said to be "inside" the firewall) and thus protects that network's system resources against threats from the other network (the one that is said to be "outside" the firewall). The firewall applies security policy rules to control traffic that flows in and out of the protected network.

HTTPs
When used in the first part of a URL (the part that precedes the colon and specifies an access scheme or protocol), this term specifies the use of HTTP enhanced by a security mechanism, which is usually SSL.

Hybrid encryption
An application of cryptography that combines two or more encryption algorithms, particularly a combination of symmetric and asymmetric encryption. (E.g. digital envelope.)

International Data Encryption Algorithm (IDEA)
A patented, symmetric block cipher that uses a 128-bit key and operates on 64-bit blocks.

Key management
The process of handling and controlling cryptographic keys and related material (such as initialisation values) during their life cycle in a cryptographic system, including ordering, generating, distributing, storing, loading, escrowing, archiving, auditing, and destroying the material.

Key pair
A set of mathematically related keys, a public key and a private key, that are used for asymmetric cryptography and are generated in a way that makes it computationally infeasible to derive the private key from knowledge of the public key (e.g. Diffie-Hellman, Rivest-Shamir-Adleman).

Keyed hash
A cryptographic hash in which the mapping to a hash result is varied by a second input parameter that is a cryptographic key.

Man-in-the-middle
A form of active wiretapping attack in which the attacker intercepts and selectively modifies communicated data in order to masquerade as one or more of the entities involved in a communication.

MD2
A cryptographic hash that produces a 128-bit hash result, was designed by Ron Rivest, and is similar to MD4 and MD5 but slower.

MD4
A cryptographic hash that produces a 128-bit hash result and was designed by Ron Rivest.

MD5
A cryptographic hash that produces a 128-bit hash result and was designed by Ron Rivest to be an improved version of MD4.

One-way function
"A (mathematical) function, f, which is easy to compute, but which for a general value y in the range, it is computationally difficult to find a value x in the domain such that f(x) = y. There may be a few values of y for which finding x is not computationally difficult." [X509]

PIN (Personal Identification Number)
The number or code that a cardholder must type in to confirm that he or she is the genuine cardholder.

PKCS (Public-Key Cryptography Standards)
See Public-Key Cryptography Standards (PKCS).

PKCS #7
A standard from the PKCS series; defines a syntax for data that may have cryptography applied to it, such as for digital signatures and digital envelopes.

PKCS #11
A standard from the PKCS series; defines a software CAPI called Cryptoki (pronounced "crypto-key"; short for "cryptographic token interface") for devices that hold cryptographic information and perform cryptographic functions.

PKI (Public Key Infrastructure)
The software and/or hardware components necessary to manage and enable the effective use of public key encryption technology, particularly on a large scale.

Plaintext
Data that is input to and transformed by an encryption process, or that is output by a decryption process.

Private key
The secret component of a pair of cryptographic keys used for asymmetric cryptography.

Public key
The publicly disclosable component of a pair of cryptographic keys used for asymmetric cryptography.

Public-key certificate
A digital certificate that binds a system entity's identity to a public key value, and possibly to additional data items; a digitally signed data structure that attests to the ownership of a public key.

Public-Key Cryptography Standards (PKCS)
Series of specifications published by RSA Laboratories for data structures and algorithm usage for basic applications of asymmetric cryptography. (PKCS #7, PKCS #10, PKCS #11.)

Random number generator
A process used to generate an unpredictable, uniformly distributed series of numbers (usually integers).

RC2, RC4
Rivest Cipher #2 and Rivest Cipher #4, two proprietary, variable-key-length ciphers invented by Ron Rivest for RSA Data Security, Inc. RC2 is a block cipher; RC4 is a stream cipher.


Rivest-Shamir-Adleman (RSA)
An algorithm for asymmetric cryptography, invented in 1977 by R. Rivest, A. Shamir, and L. Adleman. RSA uses exponentiation modulo the product of two large prime numbers. The difficulty of breaking RSA is believed to be equivalent to the difficulty of factoring integers that are the product of two large prime numbers of approximately equal size.

Session key
In the context of symmetric encryption, a key that is temporary or is used for a relatively short period of time. Usually, a session key is used for a defined period of communication between two computers, such as for the duration of a single connection or transaction set, or the key is used in an application that protects relatively large amounts of data and, therefore, needs to be rekeyed frequently.

Smart card
A credit-card sized device containing one or more integrated circuit chips, which perform the functions of a computer's central processor, memory, and input/output interface.

X.509
An ITU-T Recommendation [X509] that defines a framework to provide and support data origin authentication and peer entity authentication, including formats for X.509 public-key certificates, X.509 attribute certificates, and X.509 CRLs. X.509 describes two levels of authentication: simple authentication based on a password, and strong authentication based on a public-key certificate.


References

[AnalysisSSL] David Wagner, Bruce Schneier, "Analysis of the SSL 3.0 protocol", 1997

[ApplCrypt] Bruce Schneier, "Applied Cryptography, Protocols, Algorithms, and Source Code in C", 1996, John Wiley & Sons, Inc.

[ECImpers1998] Brendan Macmillan, "Electronic Commerce Impersonation", Masters thesis, 1998, Monash University, http://www.csse.monash.edu.au/~bren/ecommerce.html

[EPaymSyst] D. O'Mahony, M. Peirce, H. Tewari, "Electronic Payment Systems", 1997, Artech House

[GlobalSign] GlobalSign, http://www.globalsign.net/

[HTMLSource] Ian S. Graham, "HTML Sourcebook, A Complete Guide to HTML 3.0", 1996, John Wiley & Sons, Inc.

[NetscapeTech] Netscape's Tech Briefs, http://home.netscape.com/security/techbriefs/index.html

[OG99] The Open Group, "Architecture for Public-Key Infrastructure (APKI)", 1999, http://www.opengroup.org/publications/catalog/g801.htm

[RSASec] RSA Security, http://www.rsasecurity.com/

[SETCO] SET Secure Electronic Transaction LLC, http://www.setco.org/setmark/

[SecInf] V.V. Patriciu, M. Pietroșanu-Ene, I. Bica, C. Cristea, "Securitatea informatică în Unix și Internet", 1998, Editura Tehnică, București

[SSLPRef] Jeremy Bradley, "The SSLP Reference Implementation Project", University of Bristol, http://www.cs.bris.ac.uk/~bradley/publish/SSLP/contents.html

[VeriSign] VeriSign, Inc., http://www.verisign.com/

[W3CSecRes] W3C Security Resources, http://www.w3.org/Security/

[X509] Recommendation X.509, "Information Technology - Open Systems Interconnection - The Directory: Authentication Framework". (Equivalent to ISO 9594-8.)
