Toward A User Guide for File Compression

Ahmad Sharieh
Computer Science Department / KASIT, The University of Jordan, Amman, Jordan
E-mail: [email protected]

ABSTRACT
In this paper, a user guide is proposed for compressing multimedia files based on their extension names. The files are arranged according to their alphabetical extension names or according to their conceptual contents. A system is built to help the user compress or decompress a given file based on its multimedia content type: text, image, audio, or video. Experimental results support selecting certain compression methods over others, based on the file type (content) and extension name.

Keywords: Compression, Decompression, File Extension, Huffman, Multimedia Files, Run Length Encoding.

1. Introduction
It is still necessary to compress files in order to save storage as well as to speed up transfer over networks. Modern hardware, even with high speed and large amounts of storage, still struggles to store and transfer very large files [4, 7, 10], and most file types today tend to be large. For example, the text of a dictionary of 50,000 words (six characters on average) needs about 300,000 bytes when 8-bit ASCII coding is used, and an image with a resolution of 1024x1024 requires 3,145,728 bytes [21]. A multimedia file (combining text, image, audio, video, and animation) requires even more storage [21, 24]. A data file in a computer system (with its hardware, software, and communication media) can be seen as growing from a single bit (a 1 or 0 value). A collection of bits treated as one unit forms a byte (or a word); in ASCII code the byte is agreed to be 8 bits, and a byte represents a symbol. A set of related symbols forms a field, a set of related fields forms a record, and a set of related records forms a file. Related files are organized into a database, and databases form a knowledge base, from which the information essential for making decisions is obtained.
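As a quick illustration of these figures, the following back-of-the-envelope sketch (Python is used here purely for illustration; the variable names are ours) reproduces the storage estimates quoted above, assuming 8-bit ASCII for the text and, for the image, 3 bytes per pixel, which is consistent with the 3,145,728-byte figure:

```python
# Back-of-the-envelope storage estimates echoing the figures quoted above.
# Assumptions: 8-bit ASCII for text; 3 bytes per pixel (24-bit colour) for the image.

WORDS = 50_000
AVG_CHARS_PER_WORD = 6
dictionary_bytes = WORDS * AVG_CHARS_PER_WORD      # 300,000 bytes

WIDTH = HEIGHT = 1024
BYTES_PER_PIXEL = 3                                # 24-bit colour
image_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL     # 3,145,728 bytes

print(dictionary_bytes, image_bytes)
```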

Thus, from an object-oriented point of view, if a bit is treated as a base class, then a file can be viewed as an object class derived from the record, which in turn is derived from the field. Each file, whatever its type, has a set of attributes and a set of methods that operate on it. The attributes are those used by most operating systems, including WINDOWS and UNIX; the attributes we add are compressed size, entropy, and compression ratio [1, 4, 21]. To the well-known methods used by most operating systems we add the operations of compression and decompression. The integration of these new attributes and methods is our concern in this paper.

There are compression measures, such as compression ratio, entropy, and time complexity, by which compression algorithms can be compared [2, 7, 13]. The compressed size is the size of the new file after compression is completed. The compression ratio is the percentage obtained by dividing the compressed size by the original file size and multiplying the result by 100%. The entropy is the compressed size in bits divided by the number of symbols in the original file, measured in bits/symbol. The probability of each symbol in the original file is its frequency divided by the total number of symbols in the file, and the contribution of a single symbol ai to the entropy is -Pi log2 Pi, where Pi is the probability of occurrence of ai in the data.

Subsection 1.2 reviews several compression algorithms, and Subsection 1.3 shows the results of applying several compression algorithms to different files. Section 2 introduces the guide that helps users compress their files efficiently. Section 3 presents the conclusions and future work.

1.2. Compression Algorithms
Different files, with different contents and formats, need appropriate compression techniques. Conventional compression methods are classified as lossy or lossless [4, 7, 24]. Image, sound, and video data can be compressed with lossy methods, since the loss need not reduce the perceived quality of the compressed data [24]. The size of a compressed file, using either approach, depends on the specific content of the file.

A file name consists of two parts: the part to the left of the "." and the part to the right of the dot, called the extension. By default, the extension consists of three symbols. We studied files with different extension names and contents in order to guide the user in selecting an appropriate compressor. A typical alphabet may consist of 2 binary digits, 4-bit pixels, 7-bit ASCII, or 8-bit ASCII. Files can be classified into text, image, video, and audio, and each symbol in most of these files is represented in 8 (as in text and ASCII), 12, 16, 24, or 32 bits (as in Sound Blaster files). This representation is one of the factors that affect the choice of compressor. The pattern of the content and its repetition also affect the compression efficiency: some sound files may compress well under Run Length Encoding (RLE) algorithms but not under dictionary algorithms [5, 11, 13], while some video files may compress well under statistical compression but not under RLE. Thus, the user needs an expert system to provide guidance.

Huffman's algorithm provided the first solution to the problem of constructing a minimum-redundancy code [2, 3, 20]. It is an important algorithm because it has served as a benchmark against which other compression algorithms are compared. Arithmetic coding replaces the source ensemble by a code string that, unlike other codes, is not a concatenation of individual codewords [19]. RLE works well when the file contains long runs of identical samples [11, 24]. Statistical methods assign variable-size codes to the symbols in a file according to their frequency of occurrence [2, 13, 20]. Dictionary-based methods work well on files in which symbols and phrases are repeated again and again [5, 7]; they therefore fit text, where certain phrases recur often. In sound files, the phrases (samples) may look similar to us but differ slightly, so dictionary-based methods are not appropriate for sound compression. There are other methods that are not classified as RLE, statistical, or dictionary based, among them: Burrows-Wheeler, Symbol Ranking, Associative Dictionary (ACB), Sort-based Context Similarity, Sparse Binary Strings, compression based on words rather than symbols, textual image compression, FHM (Fibonacci, Huffman, and Markov), Dynamic Markov coding, Sequitur, and geometric compression [21]. Table 1 summarizes some of these and other methods, together with the kinds of file content they suit.
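To make these measures concrete, the following minimal Python sketch (ours, not part of the proposed system) computes the compression ratio, the bits-per-symbol measure used above, and the Shannon entropy obtained from the per-symbol probabilities; zlib is used only as a stand-in compressor:

```python
import math
from collections import Counter

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size divided by original size (multiply by 100 for a percentage)."""
    return len(compressed) / len(original)

def bits_per_symbol(original: bytes, compressed: bytes) -> float:
    """Compressed size in bits divided by the number of symbols in the original file."""
    return len(compressed) * 8 / len(original)

def shannon_entropy(original: bytes) -> float:
    """Sum of -p_i * log2(p_i) over the symbol probabilities of the file."""
    counts = Counter(original)
    n = len(original)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

if __name__ == "__main__":
    import zlib
    data = open(__file__, "rb").read()      # any file will do for the demonstration
    packed = zlib.compress(data)
    print("compression ratio:", compression_ratio(data, packed))
    print("bits/symbol:", bits_per_symbol(data, packed))
    print("entropy of source:", shannon_entropy(data))
```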

Table 1: Explanation of some unconventional compression methods.

| Compression method | Explanation | Works well on |
|---|---|---|
| Burrows-Wheeler | Starts with a string S of n symbols and scrambles them into another string L, where L tends to contain concentrations of identical symbols. | Images, sound, and text |
| Symbol Ranking | Uses context to rank symbols rather than probabilities. Slow, but produces excellent compression [21]. | 8-bit symbols (text) |
| ACB (Associative Coder of Buyanovsky) | Compresses data based on an associative dictionary. Its features are related to traditional dictionary-based and symbol-ranking methods. | Text; highly efficient text compression |
| Sort-based Context Similarity | Assigns ranks to symbols; related to the Burrows-Wheeler and symbol-ranking methods. | Image, sound, and text |
| Sparse Binary Strings | Compresses a file based on the number of consecutive zeros it contains. | Bit streams, binary files |
| Word-based | Compresses data based on words rather than individual symbols. Examples: word-based Adaptive Huffman (better than character-based, but slower), LZW, and Order-1 prediction. | Text |
| Textual Image | Groups pixels to form characters. Complex, but gives excellent compression. | OCR, Adobe, scanned marks, textual images |
| FHM | Compresses curves. | Curves, digital signatures |
| Dynamic Markov Coding | Uses finite-state machines to estimate the probability of symbols, combined with arithmetic coding. Very good compression, and fast. | Two-symbol (binary) alphabets (machine code or executable files, images, and sound) |
| Sequitur | Compresses semi-structured text based on a context-free grammar. | HTML, forms, e-mail messages, databases |
| Geometric | Compresses polygonal surfaces; Edgebreaker is one example of this method. | Polygons and meshes |

1.3 Compression Algorithm Performance
We implemented some of the well-known compression algorithms: Huffman (the oldest and most widespread technique for data compression, used to compress many types of data such as text, image, audio, and video), Fixed-Length Code (FLC), and Huffman after using Fixed-Length Code (HFLC). This section analyses the results of a variety of experiments. We show the results of applying the Huffman, FLC, HFLC, and LZW algorithms [3, 8, 12, 14, 16, 20] to a sample of text files. The results of testing the Huffman and RLE algorithms on three categories of image files are then presented. The Arithmetic, FGK, Huffman, and LZ algorithms were tested on audio files [24], and Huffman and RLE were examined on a sample of video files [6, 18].

First, consider the compression ratio and entropy of eleven text files compressed with four techniques. These files are taken from the Calgary Corpus [21], a set of traditional files used to test data compression programs. The results are shown in Table 2. A PC with a 2.4 GHz processor and 248 MB of RAM was the platform for these experiments. For small files (1024 to 4096 bytes) Huffman performs best, but for the larger sizes (8192 to 1048576 bytes) LZW is best. For the entropy measure, over the files of sizes 4096 to 1048576 bytes, the average entropy is 4.42 for LZW (the best), 4.74 for Huffman, 5.09 for HFLC, and 7.00 for FLC; HFLC thus improves on the entropy of FLC. LZW also shows the best performance regarding time complexity (compression time).

The Huffman and RLE algorithms were tested on image files with the BMP extension. These files come in different formats: 8-bit gray scale, 8-bit colored, and 24-bit colored. An 8-bit BMP image file has a file header, image header, color table, and pixel data, while a 24-bit BMP image file has a file header, image header, and pixel data [9, 17, 23]. The pixel data is the part of the file that a compression algorithm works on.
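Since Huffman coding serves as the benchmark throughout these experiments, the following short sketch (a simplified illustration, not the implementation used to obtain the measurements) derives Huffman code lengths from symbol frequencies and estimates the coded size of a byte string:

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: bytes) -> dict:
    """Return the Huffman codeword length (in bits) for each byte value in `data`."""
    freq = Counter(data)
    if not freq:
        return {}
    # Heap entries: (weight, tie_breaker, {symbol: code_length_so_far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate case: one distinct symbol
        return {sym: 1 for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        merged = {s: length + 1 for s, length in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def compressed_size_bits(data: bytes) -> int:
    """Size of the Huffman-coded payload in bits (code table/header not counted)."""
    freq = Counter(data)
    lengths = huffman_code_lengths(data)
    return sum(freq[s] * lengths[s] for s in freq)

if __name__ == "__main__":
    sample = b"abracadabra" * 100
    bits = compressed_size_bits(sample)
    print("compression ratio:", (bits / 8) / len(sample))
    print("bits/symbol:", bits / len(sample))
```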

Table 2: Compression ratios (CR, shown as a fraction of the original size), entropies (E, in bits/character), and time complexities (T, in milliseconds) resulting from applying four compression techniques (Huffman, FLC, HFLC, and LZW) to 11 text files.

| File Size | Huffman CR | Huffman E | Huffman T | FLC CR | FLC E | FLC T | HFLC CR | HFLC E | HFLC T | LZW CR | LZW E | LZW T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1024 | 0.67 | 5.33 | 0.002 | 0.88 | 7.00 | 0.003 | 0.68 | 5.45 | 0.004 | 0.92 | 7.40 | 0.004 |
| 2084 | 0.60 | 4.90 | 0.002 | 0.74 | 6.00 | 0.007 | 0.61 | 4.96 | 0.008 | 0.74 | 5.99 | 0.004 |
| 4096 | 0.64 | 5.13 | 0.004 | 0.88 | 7.00 | 0.008 | 0.65 | 5.23 | 0.008 | 0.72 | 5.73 | 0.004 |
| 8192 | 0.61 | 4.91 | 0.008 | 0.88 | 7.00 | 0.008 | 0.62 | 4.97 | 0.015 | 0.59 | 4.74 | 0.005 |
| 16384 | 0.45 | 4.97 | 0.015 | 0.64 | 7.00 | 0.015 | 0.46 | 5.09 | 0.305 | 0.43 | 4.70 | 0.005 |
| 32768 | 0.63 | 5.05 | 0.061 | 0.88 | 7.00 | 0.031 | 0.64 | 5.15 | 0.610 | 0.58 | 4.61 | 0.007 |
| 65536 | 0.58 | 4.65 | 0.061 | 0.88 | 7.00 | 0.061 | 0.60 | 4.81 | 0.122 | 0.50 | 4.00 | 0.007 |
| 131072 | 0.59 | 4.74 | 0.122 | 0.88 | 7.00 | 0.122 | 0.62 | 4.96 | 0.244 | 0.54 | 4.29 | 0.013 |
| 262144 | 0.60 | 4.82 | 0.244 | 0.88 | 7.00 | 0.244 | 0.64 | 5.09 | 0.244 | 0.55 | 4.36 | 0.012 |
| 524288 | 0.60 | 4.83 | 0.488 | 0.88 | 7.00 | 0.488 | 0.75 | 5.99 | 0.488 | 0.57 | 4.55 | 0.024 |
| 1048576 | 0.57 | 4.52 | 0.976 | 0.88 | 7.00 | 0.976 | 0.58 | 4.66 | 0.976 | 0.50 | 4.03 | 0.059 |

The sample image files shown in Table 3 are taken from [14, 15]. It is clear that the format of an image file affects the performance of the compression algorithms (compare the 8-bit gray-scale lena.bmp with the 8-bit colored lena.bmp). Both algorithms perform less efficiently on the 24-bit colored files than on the 8-bit colored files, and it is clear from Table 3 that the Huffman algorithm does better than RLE on the 8-bit files. Audio files behave differently from text files in characteristics such as frequency, rhythm, and wave format. The Arithmetic (A), adaptive Huffman (FGK), Huffman, and Lempel-Ziv (LZ) compression algorithms were tested on audio files of four types: Windows files (wave), formal speech (RealPlayer), speech files (RealPlayer), and music (RealPlayer). The results in Table 4 show that the Arithmetic technique achieved CR values close to those achieved by Huffman, but a little better. The average E values for the compressed files are 6.75, 6.92, 7.81, and 8.35 for Arithmetic, Huffman, FGK, and LZ, respectively. FGK achieved the best compression time, while LZ was the worst in all three measures. It is clear from these results that an algorithm that is best in one measure is not necessarily best in all.
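The run-length behaviour mentioned above (good on long runs of identical samples, poor on data without runs) can be seen in a few lines. The following byte-oriented RLE sketch is our own simplified illustration, not the RLE variant used in the experiments:

```python
def rle_encode(data: bytes) -> bytes:
    """Byte-oriented run-length encoding: a stream of (count, value) pairs, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.append(run)
        out.append(data[i])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    for count, value in zip(encoded[0::2], encoded[1::2]):
        out.extend([value] * count)
    return bytes(out)

if __name__ == "__main__":
    runs = bytes([0] * 500 + [255] * 500)        # long runs: compresses very well
    noisy = bytes(range(256)) * 4                # no runs: RLE doubles the size
    print(len(rle_encode(runs)) / len(runs))     # about 0.008
    print(len(rle_encode(noisy)) / len(noisy))   # 2.0
    assert rle_decode(rle_encode(runs)) == runs
```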

Table 3: Compressed sizes (CS), compression ratios (CR), entropies (E, in bits/character), and time complexities (T, in milliseconds) resulting from applying two compression techniques (Huffman and RLE) to 16 BMP files of quality 8-bit gray scale (GS), 8-bit colored (8C), and 24-bit colored (24C).

| File (with extension BMP) | Original (in MB) | Type | Huffman CS | Huffman CR | Huffman E | Huffman T | RLE CS | RLE CR | RLE E | RLE T |
|---|---|---|---|---|---|---|---|---|---|---|
| Ijpg140 | 65.1 | GS | 13.1 | 0.20 | 2 | 0* | 18.8 | 0.29 | 5 | 1 |
| 3gray | 184.1 | GS | 124.4 | 0.68 | 5 | 0 | 154.8 | 0.84 | 7 | 1 |
| Lena | 257.1 | GS | 241.2 | 0.94 | 8 | 0 | 256.7 | 1.00 | 8 | 1 |
| Peppers | 257.1 | GS | 245.1 | 0.95 | 8 | 0 | 255.0 | 0.99 | 8 | 2 |
| Monarch | 385.1 | GS | 348.4 | 0.90 | 7 | 0 | 380.8 | 0.99 | 8 | 1 |
| Balloon | 0.4 | 8C | 0.2 | 0.57 | 5 | 0 | 0.4 | 1.00 | 8 | 0 |
| horzz1 | 3.3 | 8C | 0.9 | 0.29 | 2 | 1 | 3.3 | 1.00 | 8 | 0 |
| Nhl | 8.6 | 8C | 6.7 | 0.80 | 6 | 9 | 6.4 | 0.77 | 6 | 0 |
| 2 | 40.1 | 8C | 28.2 | 0.70 | 6 | 34 | 32.5 | 0.81 | 6 | 0 |
| 1 | 96.0 | 8C | 90.0 | 0.94 | 8 | 107 | 85.3 | 0.89 | 7 | 0 |
| 3 | 183.4 | 24C | 183.4 | 1.00 | 17 | 0 | 154.3 | 0.84 | 20 | 0 |
| Peppers | 768.1 | 24C | 768.1 | 1.00 | 23 | 0 | 766.3 | 1.00 | 24 | 5 |
| Lena | 768.1 | 24C | 768.1 | 1.00 | 23 | 0 | 768.1 | 1.00 | 24 | 6 |
| Monarch | 1152.1 | 24C | 1152.5 | 1.00 | 23 | 0 | 1147.0 | 1.00 | 24 | 6 |
| Frymire | 7990.6 | 24C | 7990.6 | 1.00 | 14 | 3 | 1564.1 | 0.43 | 10 | 12 |
| 2004-10-a-full_tif | 16055.5 | 24C | 16055.5 | 1.00 | 21 | 9 | 16055.5 | 1.00 | 24 | 96 |

* The time is very small in terms of milliseconds.

The results in Table 5 are for files with the extensions rm (first line in the table), ora (lines 2 to 4), and avi (lines 5 to 7). The Arithmetic and Huffman techniques achieved very close CR values (about 99%), while FGK and LZ achieved more than 100%. The E values are close for Arithmetic and FGK.

Regarding T, LZ achieved the best times for the large files with the avi extension. Other experiments were done on music and speech files. It seems that the Arithmetic technique is not the best when it comes to compression time.

Table 4: Compression ratios (CR), entropies (E, in bits/character), and time complexities (T, in milliseconds) resulting from applying four compression techniques (Arithmetic, FGK, Huffman, and LZ) to Windows wave files.

| File Size | Arithmetic CR | Arithmetic E | Arithmetic T | FGK CR | FGK E | FGK T | Huffman CR | Huffman E | Huffman T | LZ CR | LZ E | LZ T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1066 | 0.86 | 6.91 | 0.055 | 0.99 | 8.42 | 0.030 | 1.01 | 8.17 | 0.065 | 0.98 | 8.26 | 1.395 |
| 9701 | 0.80 | 6.40 | 0.493 | 0.95 | 7.53 | 0.093 | 0.82 | 6.53 | 0.090 | 0.94 | 8.51 | 2.997 |
| 20176 | 0.79 | 6.80 | 1.015 | 0.97 | 7.77 | 0.245 | 0.79 | 6.40 | 0.195 | 0.93 | 7.95 | 17.190 |
| 31834 | 0.80 | 6.31 | 1.150 | 0.95 | 7.60 | 0.330 | 0.80 | 6.45 | 0.550 | 0.93 | 7.56 | 22.045 |
| 68698 | 0.81 | 6.50 | 1.430 | 0.94 | 7.55 | 0.340 | 0.82 | 6.53 | 0.600 | 0.93 | 8.45 | 37.460 |
| 77260 | 0.96 | 7.71 | 3.080 | 1.01 | 8.08 | 0.880 | 0.97 | 7.75 | 0.770 | 1.12 | 8.97 | 31.470 |
| 635384 | 0.82 | 6.59 | 4.230 | 0.96 | 7.70 | 0.700 | 0.83 | 6.62 | 5.390 | 1.09 | 8.73 | 285.28 |

Table 5: Compression ratios (CR), entropies (E, in bits/character), and time complexities (T, in milliseconds) resulting from applying four compression techniques (Arithmetic, FGK, Huffman, and LZ) to formal speech files (with the extensions rm, ora, and avi).

| File Size | Arithmetic CR | Arithmetic E | Arithmetic T | FGK CR | FGK E | FGK T | Huffman CR | Huffman E | Huffman T | LZ CR | LZ E | LZ T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41543 | 0.98 | 8.13 | 147.33 | 1.02 | 8.19 | 14.22 | 0.99 | 7.93 | 0.44 | 1.09 | 8.20 | 42.06 |
| 1261857 | 0.98 | 8.19 | 154.33 | 1.02 | 8.19 | 15.44 | 0.99 | 7.97 | 12.36 | 1.09 | 8.47 | 50.98 |
| 2115085 | 0.99 | 8.20 | 66.36 | 1.02 | 8.20 | 15.82 | 0.99 | 7.97 | 20.71 | 1.09 | 8.17 | 51.23 |
| 2132945 | 0.99 | 8.19 | 65.33 | 1.02 | 8.19 | 22.36 | 0.99 | 7.97 | 21.48 | 1.10 | 8.82 | 51.88 |
| 3103733 | 0.98 | 8.65 | 356.69 | 1.02 | 8.20 | 43.06 | 0.99 | 7.97 | 31.20 | 1.10 | 8.88 | 51.85 |
| 6186771 | 0.99 | 8.85 | 156.22 | 1.02 | 8.19 | 68.33 | 0.99 | 7.92 | 60.58 | 1.11 | 8.85 | 49.74 |
| 7504868 | 0.99 | 8.95 | 162.66 | 1.02 | 8.16 | 82.83 | 0.99 | 7.92 | 73.50 | 1.08 | 8.95 | 50.96 |

Similar experiments were performed on video files (with the extensions avi and mpeg). Some of these files have small animation frames; animation with symmetric color; small frames with many colors; colored film; low-motion animation with non-symmetric audio; or fast animation with symmetric music. For the tested files, RLE is almost 10 times faster than the Huffman algorithm. The Huffman algorithm achieved better entropy values than RLE on the avi files, but not on the tested MPEG files. The results obtained in this section confirm that compression performance depends on file contents. Thus, it is necessary to have a system which guides users to compress their files more efficiently, rather than using one fixed compression algorithm.

2. The Compressor System
The number of file types, distinguished by extension name, keeps increasing. More than 3200 file names with different extensions were collected [24], and this number grows with new technological developments. These files were organized into a database. Based on the attributes of a given file, the user is advised to compress the file using a specific algorithm. Table 6 shows samples of the fields (some of the attributes) in the database we built. The files are sorted by the extension part of the file names, and the third column gives a short explanation of the content of the files. Based on the information in the third column of Table 1 and in the second and third columns of Table 6, the user is guided to select an appropriate compression method. The last column of Table 6 shows how difficult it is for the user to distinguish between files using the extension name alone. For example, there are two different files with the extension A that have different contents (one is an Authorware file and one is a batch file).

The size of the alphabet from which a file was created is another factor that helps guide a user toward an appropriate compression technique. For example, if a file's content uses only 20 distinct symbols, there is no need to use an ASCII code for each symbol: a fixed code of 5 bits per symbol is enough to map the source into a bit string, and leads to less space usage.
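A minimal sketch of this alphabet-size argument, assuming a hypothetical 20-symbol alphabet (the helper names are ours, used only for illustration):

```python
import math

def bits_needed(alphabet_size: int) -> int:
    """Smallest fixed code length (in bits) that can distinguish all the symbols."""
    return math.ceil(math.log2(alphabet_size))

def fixed_length_encoded_bits(text: str, alphabet: str) -> int:
    """Encoded size in bits using a fixed-length code over `alphabet`."""
    width = bits_needed(len(alphabet))
    assert all(ch in alphabet for ch in text), "text must use only the given alphabet"
    return len(text) * width

if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrst"          # 20 symbols -> 5 bits per symbol
    print(bits_needed(len(alphabet)))          # 5
    # 5 bits/symbol instead of 8 bits/symbol, before any further compression:
    print(fixed_length_encoded_bits("cabbage", alphabet), "bits vs", 7 * 8, "bits in ASCII")
```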

Table 6: Samples of file names with different extensions.

| First Letter | Example | Explanation | Number of Names for the Letter | Number of Repeated Names |
|---|---|---|---|---|
| A | *.A | Authorware file or object code library | 231 | 2 |
|  | *.A5W | Authorware Windows file (unpackaged) |  | 1 |
| B | *.B | Applause batch list / 1st Reader | 129 | 1 |
|  | *.B&W | Mono binary screen image |  | 1 |
| C | *.C | C source code | 210 | 1 |
|  | *.C86 | Computer Innovation (C86) source code |  | 1 |
| D | *.DAO | Windows Registry backup | 209 | 1 |
|  | *.D64 | Commodore 64 emulator disk image |  | 1 |
| ... | ... | ... | ... | ... |
| M | *.M | Program file (Matlab) | 225 | 1 |
|  | *.MYP | Presentation file (Make Your Point) |  | 1 |
| ... | ... | ... | ... | ... |
| Y | *.Y | Grammar file (Yacc) | 15 | 1 |
|  | *.Y09 | Secondary index file (Paradox) |  | 1 |
| Z | *.Z | Compressed ASCII archive (COMPRESS) | 12 | 1 |
|  | *.Z3 | Game module (Infocom) |  | 1 |
| Others | *.000 | Data file (GEOWorks) | 29 | 1 |
|  | *.~$~ | Temporary file (1st Reader) |  | 1 |
|  | *.@@@ | Instructions on the use of applications such as Microsoft CodeView for C |  | 1 |

Thus, the large number of different files with different extension names, the different compression classifications, the different contents of files, and the different bit representations of the symbols in a file make it difficult for a user to select an appropriate compression algorithm. This claim is supported by the evidence and test results presented above, so users need help to compress their files in efficient ways. A system was built, based on an object-oriented approach, to contribute to a solution to these obstacles. The system provides services such as: displaying the attributes of a given file; displaying a short description and the content types of a given file; suggesting an appropriate compressor for a file; suggesting an appropriate decompression method to reproduce a file; and compressing or decompressing a file, by default, based on its extension name.
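A minimal sketch of the suggestion service, assuming a small hand-written extension-to-method table loosely based on Tables 1 and 6 (the mapping and the function names are illustrative only, not the database the system actually consults):

```python
# Illustrative extension-to-method mapping; a real advisor would query the
# database of 3200+ extensions described above, not a hard-coded dictionary.
SUGGESTED_METHOD = {
    ".txt": "Huffman or a dictionary method (e.g. LZW)",
    ".bmp": "Huffman (8-bit images) or RLE (images with long runs of identical pixels)",
    ".wav": "Arithmetic or Huffman coding",
    ".avi": "RLE or Huffman, depending on the animation content",
    ".html": "Sequitur (semi-structured text)",
}

def suggest_compressor(filename: str) -> str:
    """Suggest a compression method from the extension part of the file name."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return SUGGESTED_METHOD.get(ext, "inspect the content; the extension alone is ambiguous")

if __name__ == "__main__":
    for name in ["report.txt", "lena.bmp", "speech.wav", "clip.avi", "unknown.xyz"]:
        print(name, "->", suggest_compressor(name))
```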

Figure 1 shows a general view of the use case implemented in the developed compression system, and Figure 2 shows the layered structure of the proposed system. Results of applying some of these algorithms to different multimedia files were presented in Section 1.3.

Figure 1: The use case to compress or decompress a file (the user selects a file and an algorithm, then compresses or decompresses the selected file).

Figure 2: The layers of the subsystems in the compressor system. The system consists of four subsystems: File Collection (displays the location of a file to be selected), Compress File (compresses the selected file, if it is not already compressed, using an appropriate algorithm), Database (holds the information about the files and the compression algorithms), and Decompress File (decompresses a file using a specific decompression algorithm).

3. Conclusions and Future Work
This paper describes an effort to guide users in compressing their files efficiently. The guide compressor system was built based on a file's extension name, type (text, image, audio, or video), and content. The compressor acts as an expert system for the user who deals with file compression. We have introduced ideas for file compression and decompression using well-established compression algorithms. However, these ideas need to be explored further, based on compression measurement factors and their weights. Also, files of the different types (text, image, and multimedia) will be investigated using more compression algorithms that best fit each type.

Acknowledgment
We would like to thank the students of the Arab Amman University for Graduate Studies, Hashim, Isa, Mohamad, Rafat, Rula, Suzane, and Ziad, for their help in testing the algorithms.

References:
[1] Adler, Micah and Tom Leighton, "Compression Using Efficient Multicasting," Journal of Computer and System Sciences, August 2001, Vol. 63, No. 1, pp. 127-145.
[2] Arturo San Emeterio Campos, "Huffman Algorithm: Making Code from Probability," www.arturocompas.com/cp_ch3.html, 17 Sept. 2000.
[3] Astrachan, Owen L., "Huffman Coding: A CS2 Assignment from ASCII Coding to Huffman Coding," Feb. 2004, www.cs.duke.edu/csed/poop/huff/info.
[4] Awan, F., N. Zhang, N. Motgi, R. Iqbal and A. Mukherjee, "LIPT: A Reversible Lossless Text Transform to Improve Compression Performance," Proceedings of the IEEE Data Compression Conference 2001, Snowbird, pp. 481-210.
[5] Bravaco, Ralph and Shai Simonson, "Text Compression and Huffman Trees," Stonehill College, 2004, www.stonehill.edu/compsci/LC/Textcompression.htm, 04/06/2003.
[6] Boreczky, J. S. and L. A. Rowe, "Comparison of Video Shot Boundary Detection Techniques," Proceedings of Storage and Retrieval for Image and Video Databases IV, SPIE 2670, 1996, pp. 170-179.
[7] Bucknall, Julian, "Chapter 11: Data Compression," in The Tomes of Delphi: Algorithms and Data Structures, Wordware Publishing, 2001.
[8] Cheok Yan Cheng, "Introduction on Text Compression Using Lempel-Ziv-Welch (LZW) Method," updated 17 May 2001, http://www.programmers.leaven.com/zone22/Toparticleszone.html.
[9] Hong, Edwin S. and Richard E. Ladner, "Group Testing for Image Compression," IEEE Transactions on Image Processing, Vol. 11, No. 8, August 2002.
[10] Edelsbrunner, Herbert, "LZW Data Compression," http://www.cs.duke.edu/csed/curious/compression/lzw.htm, last modified Feb. 2004.
[11] Golomb, S. W., "Run-Length Encodings," IEEE Transactions on Information Theory, Vol. IT-12, No. 3, July 1966, pp. 399-401.
[12] Gonzalo Navarro, "A Guided Tour to Approximate String Matching," ACM Computing Surveys, March 2001, Vol. 33, No. 1, pp. 31-88.
[13] Harris, Tom, "How File Compression Works," http://www.howstuffworks.com/file-compression.htm, November 11, 2002.
[14] Heggeseth, Michael, "Compression Algorithms: Huffman and LZW," CS 372: Data Structures, December 15, 2003, http://www.stolaf.edu/people/heggeset/compression/.
[15] Kumar, B., "Point 4: Working with Data and Graphical Algorithms in C," SkillSoft, 2002, http://www.books24x7.com/book/id_4259/viewer_r.asp?bookid=4259&chunkid=379....
[16] Marshall, Dave, "Lempel-Ziv-Welch (LZW) Algorithm," 10/4/2001, http://www.cs.cf.ac.uk/Dave/Multimedia/node214.html.
[17] Miano, J., Compressed Image File Formats, Addison-Wesley, NY, 2001.
[18] Qawasmeh, Eyas, "Scene Change Detection Schemes for Video Indexing in Uncompressed Domain," INFORMATICA, 2003, Vol. 14, No. 1, pp. 1-17.
[19] Printz, H. and P. Stubley, "Multi-Alphabet Arithmetic Coding at 16 MBytes/sec," Proceedings of the Data Compression Conference, J. A. Storer and M. Cohn, eds., Snowbird, Utah, March 30 - April 1, 1993.
[20] Saju, Vami, "The Huffman Compression Algorithm," March 2004, http://www.howtodothings.com/showarticle.asp?article=313.
[21] Salomon, David, Data Compression: The Complete Reference, Second Edition, Springer, 2001.
[22] Sayood, K., Introduction to Data Compression, San Francisco: Morgan Kaufmann Publishers, Inc., 1996.
[23] Yi Xiao, Ju Jia Zou, and Hong Yan, "An Adaptive Split-and-Merge Method for Binary Image Contour Data Compression," Pattern Recognition Letters, March 2001, Vol. 22, Nos. 3-4, pp. 299-307.
[24] Whittle, Robin, "First Principles: Lossless Compression of Audio," http://www.firstpr.com.au/audiocomp/lossless/, visited July 2004.
[25] "Every File Format in the World," http://www.whatis.techtarget.com/fileFormatB/0,289937,sid9,00.html, visited March 2004.