CS100: Introduction to Computer Science
Lecture 5: Data Storage -- Data Compression & Error Correction

Review: the Binary System
- Convert a binary representation to its equivalent base ten form
- Convert a base ten representation to its equivalent binary form
  - Example: 1101 (binary) = 13 (base ten)
- Representing fractions
  - Example: 101.11 (binary) = 5 3/4 (base ten)

Review: Storing integers & fractions
- Store integers
  - Two’s complement notation
  - Excess notation
- Store fractions
  - Floating-point notation
  - Sign bit, exponent, mantissa
- Example bit patterns

Data Compression
- How to store or transfer the data?
- How to reduce the size of the data while retaining the underlying information?
- Lossy versus lossless technologies
  - Lossless: do not lose information in the compression process.
  - Lossy: may lead to the loss of information.
  - Lossy technologies often provide more compression than lossless ones. Popular where minor errors can be tolerated.

Generic Data Compression Techniques
- Run-length encoding
  - Data consist of long sequences of the same value
  - Lossless method
  - Example: 100000000000000000000000000000000000 111111111111111111111111111111111111
- Frequency-dependent encoding
  - (Huffman codes): short bit patterns for frequent data items, long bit patterns for less frequent ones
  - Lossless method
  - The most frequent letters: e (0.127), t (0.09), a (0.08)
  - The least frequent letters: z (0.0007), q (0.0009), x (0.0015)
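
As a rough illustration of frequency-dependent encoding, the sketch below builds a Huffman code for the six letters whose frequencies are listed on the slide. The function name `huffman_codes` is hypothetical, and limiting the alphabet to six letters is an assumption made to keep the example small. A real code would cover every symbol in the message.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {symbol: bit string} for a dict of symbol frequencies."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; prefix one with 0, the other with 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Frequencies from the slide (only six letters, which is an assumption of this sketch).
freqs = {"e": 0.127, "t": 0.09, "a": 0.08, "x": 0.0015, "q": 0.0009, "z": 0.0007}
codes = huffman_codes(freqs)
for sym in sorted(codes, key=lambda s: len(codes[s])):
    print(sym, codes[sym])
```

With these six letters, e receives the shortest pattern and z and q the longest, and no pattern is a prefix of another, so an encoded message can be decoded without separators.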
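
Run-length encoding, the first of the generic techniques in this lecture, can be sketched in a few lines. The function names and the sample bit string are assumptions chosen for illustration. The idea is only that each run of identical values is replaced by the value and the length of the run.

```python
from itertools import groupby

def rle_encode(bits):
    """Collapse each run of identical characters into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]

def rle_decode(pairs):
    """Rebuild the original string from (value, run_length) pairs."""
    return "".join(value * length for value, length in pairs)

# Hypothetical sample: 20 zeros, 12 ones, 5 zeros.
data = "0" * 20 + "1" * 12 + "0" * 5
encoded = rle_encode(data)
print(encoded)                      # [('0', 20), ('1', 12), ('0', 5)]
print(rle_decode(encoded) == data)  # True: nothing is lost
```

Because decoding reproduces the input exactly, the method is lossless. It only pays off when the data really do contain long runs.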
Generic Data Compression Techniques
- Relative encoding / differential encoding
  - Record the difference between consecutive data units rather than entire units.
  - Motion pictures
  - Lossless or lossy

Generic Data Compression Techniques
- Dictionary encoding
  - Dictionary refers to a collection of building blocks.
  - Message is encoded as a sequence of references to this dictionary.
- Adaptive/dynamic dictionary encoding
  - The dictionary is changed during the encoding process.
  - The dictionary would grow as the message is encoded.

Dictionary encoding example
- A typical dictionary of 25,000 words
- Character-by-character encoding
  - Assume each word consists of 6 letters
  - 42 bits needed using ASCII encoding (6 letters x 7 bits)
  - 96 bits using Unicode (6 letters x 16 bits)
- Dictionary encoding
  - Map words to integers, [0, 24,999]
  - 15 bits for each integer, [0, 32,767]

Representing Images
- Bit map techniques
  - Pixel: short for “picture element”
  - 1 bit for 1 pixel
    - A black and white image is encoded as a long string of bits representing rows of pixels in the image. The bit is 1 if the corresponding pixel is black, 0 otherwise.
  - 8 bits for 1 pixel
    - For black and white photos, allows a variety of shades of gray to be represented.
  - 3 bytes for 1 pixel
    - For color images. RGB encoding, 1 byte for the intensity of each color
    - Other approach: luminance and chrominance

Representing Images
- Bit map techniques
- Vector techniques
  - Scalable
  - Word processing systems use vector techniques to provide flexibility in character size.
  - PostScript: characters and general pictorial data
  - Also popular in computer-aided design systems

Compressing Images
- GIF
- JPEG
- TIFF

Compressing Images
- GIF (Graphic Interchange Format)
  - Dictionary encoding system
    - Reducing the number of colors assigned to a pixel to 256
    - The red-green-blue combination for each of these colors is encoded using 3 bytes
    - 256 encodings stored in a table (palette)
    - Each pixel can be represented by a single byte whose value indicates which of the 256 palette entries represents the pixel’s color.
  - Lossy compression system
  - Good for cartoons
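
The palette idea behind GIF can be sketched as a small dictionary encoder. This is only an illustration under stated assumptions: the function names are hypothetical, the image is assumed to use at most 256 distinct colors already (the lossy color-reduction step is not shown), and no actual GIF file format is produced.

```python
def palette_encode(pixels):
    """Replace each (r, g, b) triple by a one-byte index into a palette."""
    palette = []    # distinct colors, at most 256 entries of 3 bytes each
    index_of = {}   # color -> position in the palette
    indices = bytearray()
    for color in pixels:
        if color not in index_of:
            if len(palette) == 256:
                raise ValueError("more than 256 colors; reduce the colors first")
            index_of[color] = len(palette)
            palette.append(color)
        indices.append(index_of[color])
    return palette, bytes(indices)

def palette_decode(palette, indices):
    """Look every index up in the palette to recover the RGB triples."""
    return [palette[i] for i in indices]

# Hypothetical 16-pixel image with three colors.
pixels = [(255, 0, 0)] * 6 + [(0, 0, 255)] * 3 + [(255, 255, 255)] * 7
palette, indices = palette_encode(pixels)
print(3 * len(pixels))  # 48 bytes at 3 bytes per pixel
print(len(indices))     # 16 bytes at 1 byte per pixel (plus the palette table)
```

A full 256-entry palette costs 768 bytes, so the saving only shows up on images with many more pixels than this toy example.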
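
Relative (differential) encoding, listed among the generic techniques, can be illustrated on a short list of numbers. The function names and sample values are assumptions. The sketch shows the lossless variant, in which every difference is kept exactly.

```python
def delta_encode(values):
    """Keep the first value, then store each later value as a difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

# Hypothetical slowly changing data, such as one pixel across consecutive frames.
samples = [100, 101, 103, 103, 102, 102, 105]
deltas = delta_encode(samples)
print(deltas)                           # [100, 1, 2, 0, -1, 0, 3]
print(delta_decode(deltas) == samples)  # True
```

The differences are small numbers, so they can be stored in fewer bits than the original values. A lossy variant would round or drop small differences.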
Compressing Images
- JPEG (Joint Photographic Experts Group)
  - Several methods of image compression.
    - Lossless mode (rarely used, low compression)
    - Lossy mode, baseline standard, in many applications
  - Takes advantage of the human eye’s limitations
    - The human eye is more sensitive to changes in brightness than changes in colors
    - Encoding an eight-by-eight pixel block as a unit
  - Compresses color images by a factor of at least 10, often as much as 30, without noticeable loss of quality.
  - Good for photographs

Compressing Images
- TIFF (Tagged Image File Format)
  - A standard format for storing photographs along with related information.
    - Date, time, and camera settings.
    - Image itself is normally stored as RGB pixels without compression
  - Used for image archiving
  - It has a color image compression option
    - Similar to the techniques used by GIF

Representing Sound
- Sampling techniques
  - Sample the amplitude of the sound wave at regular intervals and record the series of values obtained.
  - 8000 samples per second
    - Used in long-distance telephone communication
  - 44,100 samples per second for high quality recordings (each sample represented in 16 or 32 bits)
    - On the order of a million bits for a second of music
- Figure 1.14 The sound wave represented by the sequence 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0

Representing Sound
- Sampling techniques
- MIDI
  - Used in music synthesizers in electronic keyboards
  - Contains individual instructions for playing each individual note of each individual instrument.
  - Encoding directions for producing music on a synthesizer rather than encoding the sound itself.

Compressing Audio and Video
- MPEG (Motion Picture Experts Group)
  - A variety of standards for different applications
    - High definition television broadcast
    - Video conferencing
- MP3 (MPEG layer 3)
  - Audio compression, within the MPEG standards
  - Significant compression while maintaining near CD quality sound.
  - Takes advantage of the properties of the human ear
    - Temporal masking
      - Hard to detect softer sounds after a loud sound
    - Frequency masking
      - A sound at one frequency tends to mask softer sounds at nearby frequencies.
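
The sampling rates quoted for representing sound translate into raw bit rates by simple multiplication, which shows why audio compression such as MP3 matters. In the sketch below the function name, the `channels` parameter (stereo is not mentioned on the slides), and the 8 bits per sample for telephone audio are assumptions made for illustration.

```python
def bits_per_second(samples_per_second, bits_per_sample, channels=1):
    """Raw (uncompressed) bit rate of sampled audio."""
    return samples_per_second * bits_per_sample * channels

# 44,100 samples per second, as on the slide.
print(bits_per_second(44_100, 16))              # 705600
print(bits_per_second(44_100, 32))              # 1411200
print(bits_per_second(44_100, 16, channels=2))  # 1411200 (stereo is an assumption)

# 8000 samples per second; 8 bits per sample is an assumption.
print(bits_per_second(8_000, 8))                # 64000
```

At 32 bits per sample, or 16 bits in stereo, one second of music needs well over a million bits before any compression is applied.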
Communication Errors
- Errors occur when information is transferred among the various parts of a computer.
- Detecting errors
  - Parity bits (even versus odd)
  - Checkbytes
- Correcting errors
  - Error correcting codes

Communication Errors
- Parity bits (even versus odd)
  - Add a bit to make each bit pattern have an odd number of 1s (odd parity).
  - Fails to detect an even number of errors.
  - Figure 1.28 The ASCII codes for the letters A and F adjusted for odd parity
- Checkbytes
  - A collection of parity bits.
  - Detect more errors
- Error correcting codes
  - Errors can be detected
  - Errors can be corrected
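
Odd parity for the letters A and F can be reproduced with a short sketch. The function names are hypothetical, and placing the parity bit in front of the 7-bit ASCII code is an assumption about the layout. The last two lines show both the detection of a single error and the blind spot for an even number of errors.

```python
def add_odd_parity(bits7):
    """Prefix a parity bit so that the total number of 1s is odd."""
    parity = "0" if bits7.count("1") % 2 == 1 else "1"
    return parity + bits7

def passes_odd_parity(bits):
    """A received pattern is accepted only if it holds an odd number of 1s."""
    return bits.count("1") % 2 == 1

def flip(bits, i):
    """Simulate a transmission error in bit position i."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

a = add_odd_parity(format(ord("A"), "07b"))
f = add_odd_parity(format(ord("F"), "07b"))
print(a)  # 11000001
print(f)  # 01000110

print(passes_odd_parity(flip(a, 3)))           # False: one error is detected
print(passes_odd_parity(flip(flip(a, 3), 5)))  # True: two errors go unnoticed
```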

Correcting errors: Error Correcting Codes
- Hamming distance
  - The number of bits in which two patterns differ.
  - 000000 and 010100: Hamming distance 2
  - 001111 and 010100: Hamming distance 4
- Figure 1.29 An error-correcting code
- Figure 1.30 Decoding the pattern 010100 using the code in Figure 1.29
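
Decoding by Hamming distance can be sketched as a nearest-codeword search. The figures are not reproduced in these notes, so the eight-symbol table `CODE` below is an assumption used for illustration (its first two patterns match the 000000 and 001111 examples on the slide). The function names are hypothetical as well.

```python
def hamming_distance(p, q):
    """Number of bit positions in which two equal-length patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    return sum(a != b for a, b in zip(p, q))

# Assumed code table: any two patterns differ in at least 3 bits.
CODE = {
    "A": "000000", "B": "001111", "C": "010011", "D": "011100",
    "E": "100110", "F": "101001", "G": "110101", "H": "111010",
}

def decode(received):
    """Pick the symbol whose pattern is closest to the received pattern."""
    return min(CODE, key=lambda sym: hamming_distance(CODE[sym], received))

print(hamming_distance("000000", "010100"))  # 2
print(hamming_distance("001111", "010100"))  # 4
print(decode("010100"))                      # D (distance 1)
```

Because every pair of patterns in this table differs in at least three bits, a single flipped bit leaves the received pattern closer to the intended symbol than to any other, so one error per pattern can be corrected.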

Homework Assignment 2 (due in class next Monday, Feb. 19th)
- Page 73: 24, 26(b,e,h,k), 27(b,e), 30(b,e), 31(b,e), 34(b,e)
- Page 74: 35(b,e)
- Page 75: 58

Next Lecture
- Computer architecture, machine language & programming language
- Reading assignments: Chapter 2.1, 2.2, 2.3