Streaming-Based XML Encryption and Decryption

Author: Clarence Butler

Chair for Network- and Data Security Horst Görtz Institute for IT Security Ruhr-University Bochum

Benjamin Sanno
Matriculation Number: 108006248774
Supervisors: Prof. Dr. Jörg Schwenk, Juraj Somorovský
October 14, 2010

Declaration of Authorship

I hereby declare
• that I have written this thesis without any help from others and without the use of documents and aids other than those stated below,
• that I have mentioned all used sources and that I have cited them correctly according to established academic citation rules,
• that I have produced this thesis without the prohibited assistance of third parties and without making use of aids other than those specified,
• that this thesis has not previously been presented in identical or similar form to any other German or foreign examination board.

Bochum, October 14, 2010

Benjamin Sanno


Abstract

XML Encryption is a W3C recommendation that specifies how XML elements are encrypted, and it can be used to achieve message confidentiality. However, conventional frameworks that apply XML Encryption use DOM-based XML processing. The DOM API is tree-based, so the whole document must be parsed before any data can be encrypted or decrypted. In contrast, SAX and StAX perform streaming-based XML processing, and their output is a stream of events. So far, there are no efficient and fast frameworks that apply XML Encryption to a stream of XML events. In this thesis, an event pipeline concept is used to further process the output of streaming-based XML APIs. The thesis proposes efficient and fast event pipeline modules that facilitate encryption and decryption. Each module was implemented, and the decryption modules were analyzed to determine which one is the most efficient. Measurements reveal that an event pipeline using streaming XML parsers has advantages over the DOM API in memory requirements and in execution time for the parsing and decryption process.


Contents

1 Introduction
2 Related Work
2.1 XML
2.2 XML APIs
2.2.1 SAX - Simple API for XML
2.2.2 DOM API
2.2.3 StAX - Streaming API for XML
2.3 Event Pipeline
3 Design Concepts
3.1 Nested XML Encryption and Decryption
3.2 Stream Encryption
3.3 Stream Decryption
3.3.1 Push-Pull Problem on Event Streams
3.3.2 Event-Stream Decryption
3.3.3 Byte-Stream Decryption
4 Implementation
4.1 Event Pipeline
4.2 Encryption Module
4.3 Decryption Modules
4.4 Modification of the Javolution Parser Source Code
5 Performance Analysis
5.1 Experimental Setup
5.1.1 Measuring Execution Times
5.1.2 Measuring Memory Usage
5.2 Excursion: Parser Analysis
5.3 Excursion: Base64 Decoding Performance
5.4 Parsing Analysis
5.4.1 Memory Usage
5.4.2 Execution Time
5.5 Decryption Analysis
5.5.1 Memory Usage
5.5.2 Execution Time
6 Conclusion


List of Figures

2.1 Simple API for XML [17]
2.2 Transformation of a document to its DOM representation [7]
2.3 Sun's Project X reference implementation of a DOM API [17][28]
2.4 Streaming API for XML
2.5 Basic event pipeline design
3.1 Event pipeline configuration for nested encryption or decryption
3.2 Internal design of the encryption module
3.3 Core design problem: interface between character events and the parser
3.4 Push-pull problem on streams
3.5 Decryption module that implements simple buffering
3.6 Decryption module that implements thread-based decryption
3.7 Illustration of the CharactersInputStream component
4.1 Prototype suite package overview
4.2 Inner body of the pipeline package
4.3 Strongly simplified call graph of the pipeline implementation
4.4 Call graph of the encryption module
4.5 Internal dependencies of the decryption package
4.6 Internal dependencies of the ESBufferDecrypterModule
4.7 Internal dependencies of the ESThreadDecrypterModule
5.1 Parser: heap size over time during parsing of 200,000 XML elements that have either the same tag name or all different tag names
5.2 Xerces Parser: MAT analysis results
5.3 Parser: heap size over time during parsing of a completely encrypted XML document
5.4 Base64 performance analysis
5.5 Parser modules: heap size over time during parsing of 200,000 XML elements (150 different names)
5.6 Parser modules: performance analysis
5.7 Decryption modules: heap size over time during parsing and decryption of 200,000 XML elements
5.8 Decryption modules: heap size over elements during parsing and decryption of 100,000 XML elements
5.9 Decryption modules: heap size over decryption progress in percent (200k elements, 8 MB file)
5.10 Decryption modules: performance analysis

1 Introduction

The Extensible Markup Language (XML) [1] is used in many modern applications, e.g. web services, cloud computing, database management, and Service Oriented Architectures (SOA).

XML Encryption
XML files can contain sensitive and confidential data, so security and especially privacy are important. To achieve these goals, XML data must be encrypted according to the XML Encryption recommendation [5]. This encryption is computationally intensive.

XML APIs
A common Application Programming Interface (API) for accessing XML files is the DOM API [6]. The Document Object Model (DOM) is an in-memory representation of the XML file, and this API is commonly used to read XML files. A disadvantage is that parsing large files occupies a large amount of system memory. Furthermore, multiple DOM objects must be created and stored in memory if XML Encryption is used. An alternative is offered by streaming-based XML APIs such as SAX (Simple API for XML) [14] or StAX (Streaming API for XML) [19]. These APIs do not create an in-memory representation of the XML file; their output is a stream of XML events instead of a DOM tree. Streaming-based concepts have considerable advantages over tree-based concepts. XML element data becomes available to the subsequent application as soon as the parser has read it. This results in low latency, efficient use of CPU cycles, and more responsive client applications. Another advantage of the streaming concept is that XML source files can be larger than the available system memory. For instance, XML database files of insurance companies or files in cloud computing environments can reach sizes of 10 GB or more [29, 30]. Moreover, some modern XML applications run on mobile devices with limited computing and memory resources, where a DOM object would occupy a large proportion of those resources [31].
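The following minimal Java sketch makes the contrast between tree-based and streaming-based processing concrete. It counts the elements with a given name twice: once through the DOM API and once through a StAX `XMLStreamReader`. It relies only on the JDK's standard `javax.xml` APIs. The class and method names (`StaxDomComparison`, `countWithDom`, `countWithStax`) are hypothetical and are not taken from the prototype described in this thesis.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StaxDomComparison {

    /** Tree-based: the complete document is parsed into memory before it can be queried. */
    public static int countWithDom(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(name).getLength();
    }

    /** Streaming-based: events are pulled one at a time and no tree is built. */
    public static int countWithStax(String xml, String name) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                // Each start-element is handled as soon as the parser reports it.
                if (event == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note>x</note></orders>";
        System.out.println(countWithDom(xml, "order"));  // 2
        System.out.println(countWithStax(xml, "order")); // 2
    }
}
```

Both methods return the same result. The DOM variant, however, cannot inspect the first element until the whole tree has been built. The StAX loop keeps no tree and holds only the current event, which is the property that makes streaming attractive for inputs larger than the available memory.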
Streaming-Based Processing
"It can be argued that the majority of XML business logic can benefit from stream processing, and does not require the in-memory maintenance of entire DOM trees." [18] An event pipeline pattern can be used to apply complex functionality to streaming-based XML APIs. To ensure the confidentiality of information, such an XML event pipeline should implement the XML Encryption recommendation and thereby provide cryptographic functionality for these XML APIs. So far, there has been little scientific work in this field of IT security [2, 3, 4].

Prototype Implementation
The main goal of this thesis is to design efficient and fast event-stream encryption and, above all, decryption concepts for event pipeline modules. These concepts are implemented to demonstrate their functionality. The final decryption component should process nested encrypted data as efficiently as possible. Even if the source XML file contains nested encrypted XML elements, the implementation should process the elements strictly in sequential order. Otherwise, the advantages of the event pipeline pattern may be lost.
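The event pipeline pattern can be sketched in a few lines of Java. In this minimal sketch, a StAX `XMLEventReader` acts as the event source and pushes every `XMLEvent` into a chain of modules. Each module may inspect an event, drop it, or forward it to its successor. The `Module` interface and the `CommentFilter` and `TraceSink` classes are hypothetical illustrations and do not reproduce the prototype's actual module API. In principle, an encryption or decryption module could take the place of the filter in such a chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class EventPipelineSketch {

    /** Hypothetical pipeline stage: receives one event at a time. */
    interface Module {
        void process(XMLEvent event);
    }

    /** Example intermediate module: drops comment events and forwards all others. */
    static final class CommentFilter implements Module {
        private final Module next;

        CommentFilter(Module next) {
            this.next = next;
        }

        @Override
        public void process(XMLEvent event) {
            if (event.getEventType() != XMLEvent.COMMENT) {
                next.process(event);
            }
        }
    }

    /** Example final module: records the start-element names and comments it receives. */
    static final class TraceSink implements Module {
        final List<String> trace = new ArrayList<>();

        @Override
        public void process(XMLEvent event) {
            if (event.isStartElement()) {
                trace.add(event.asStartElement().getName().getLocalPart());
            } else if (event.getEventType() == XMLEvent.COMMENT) {
                trace.add("#comment");
            }
        }
    }

    /** Event source: pulls events from StAX and pushes each one into the first module. */
    static void run(String xml, Module first) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                first.process(reader.nextEvent());
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a><!-- note --><b/><c>t</c></a>";
        TraceSink direct = new TraceSink();
        run(xml, direct);
        System.out.println(direct.trace);   // [a, #comment, b, c]
        TraceSink filtered = new TraceSink();
        run(xml, new CommentFilter(filtered));
        System.out.println(filtered.trace); // [a, b, c]
    }
}
```

Because every module sees each event exactly once and in document order, events are processed strictly sequentially, as required above, and no module ever needs the complete document.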


Optimization
Network bandwidth, execution time, latency, and efficient memory usage are usually crucial factors in computing environments. It is therefore very important to address performance and processing efficiency, and the performance of the prototype implementation was analyzed extensively for this thesis.

The next chapter presents related work and describes basic concepts such as XML, XML Encryption, the event pipeline pattern, and XML APIs. Chapter three covers design concepts for streaming-based encryption and decryption. The fourth chapter deals with the details of the prototype implementation. Finally, chapter five presents and evaluates the results of the performance analysis.


2 Related Work

The following sections introduce the basic concepts needed to understand the subsequent chapters. They define terms and acronyms such as XML, API, SAX, DOM, tree-based, streaming-based, base64, and event pipeline.

2.1 XML

The acronym XML stands for Extensible Markup Language. XML has been a W3C recommendation since 1998 [1]. The W3C (World Wide Web Consortium) is a community of information technology experts. It publishes technical specifications and recommendations to ensure the long-term growth of the World Wide Web. XML was designed to transport and store data. It was also intended to be human-readable to some extent, which helps with debugging and other administrative work.

An XML document consists of markup and character data. Markup comprises all tags, references, declarations, sections, and comments; character data is all text that is not markup ([1] Section 2.4). Rules govern how markup may be used, which results in strictly tree-structured documents. A simple XML example is shown in Listing 2.1. The core concept is the XML element, which consists of a start-tag, an end-tag, and content. A document must be well-formed, i.e. for each start-tag, an end-tag with the same name must exist in the document. Otherwise, the parser reports an error, typically by throwing an exception. All elements must also be nested correctly: end-tags must appear in the reverse order of their corresponding start-tags.

An XML document typically consists of a header and a body. The header specifies that the text file is an XML document. Listing 2.1 is a simple XML document, and its header is the first line. Besides the version attribute, the header can also contain attributes such as encoding or standalone, which tell the XML parser how to interpret the body's content correctly.
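A parser enforces these well-formedness rules directly. The following minimal Java sketch uses the JDK's SAX parser and reports whether a document is accepted. It returns false whenever the parser signals a fatal error. The class and method names (`WellFormednessCheck`, `isWellFormed`) are hypothetical. The three calls in `main` test, in order, correctly nested tags with a header, overlapping tags, and a missing end-tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormednessCheck {

    /** Returns true if the parser accepts the document, false on a well-formedness error. */
    public static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler ignores content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Missing or mismatched end-tags are reported as SAXParseException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Header plus correctly nested elements: <b> is closed before <a>.
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><a><b>text</b></a>")); // true
        // Overlapping tags: </a> appears before </b>.
        System.out.println(isWellFormed("<a><b>text</a></b>")); // false
        // Missing end-tag for <a>.
        System.out.println(isWellFormed("<a><b/>")); // false
    }
}
```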