CS100: Introduction to Computer Science

Author: Silvia Robbins

Lecture 5: Data Storage -- Data Compression & error correction

Review: the Binary System
- Convert a binary representation to its equivalent base ten form (e.g., 1101 is 13)
- Convert a base ten representation to its equivalent binary form
- Representing fractions (e.g., 101.11 in binary is 5.75 in base ten)

Review: Storing integers & fractions
- Store integers
  - Two's complement notation
  - Excess notation
- Store fractions
  - Floating-point notation
  - Sign bit, exponent, mantissa
  - Example bit patterns

Data Compression
- How to store or transfer the data?
- How to reduce the size of the data while retaining the underlying information?
- Lossy versus lossless technologies
  - Lossless: no information is lost in the compression process.
  - Lossy: may lead to the loss of information.
  - Lossy technologies often provide more compression than lossless ones and are popular where minor errors can be tolerated.

Generic Data Compression Techniques
- Run-length encoding
  - Suited to data consisting of long sequences of the same value
  - Lossless method
  - Example: 100000000000000000000000000000000000 111111111111111111111111111111111111
- Frequency-dependent encoding (Huffman codes)
  - Short bit patterns for frequent data items, long bit patterns for less frequent ones
  - Lossless method
  - The most frequent letters in English text: e (0.127), t (0.09), a (0.08)
  - The least frequent letters: z (0.0007), q (0.0009), x (0.0015)
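A minimal Python sketch of run-length encoding, assuming the data is a string of bit characters. The names `rle_encode` and `rle_decode` are hypothetical, and the input is an illustrative pattern rather than the slide's exact example. Each run is stored as a (symbol, length) pair, and decoding repeats each symbol, so no information is lost.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Rebuild the original string by repeating each symbol run_length times."""
    return "".join(symbol * count for symbol, count in runs)

# Hypothetical input: one 1, then 35 zeros, then 36 ones (72 characters).
data = "1" + "0" * 35 + "1" * 36
runs = rle_encode(data)
print(runs)  # [('1', 1), ('0', 35), ('1', 36)]
assert rle_decode(runs) == data  # lossless round trip
```

Three pairs replace 72 characters here; for data without long runs, the pairs can take more space than the original, which is why the method suits long sequences of the same value.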

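The frequency-dependent encoding on the Generic Data Compression Techniques slide can be sketched with the standard Huffman construction: repeatedly merge the two least frequent subtrees, prefixing 0 to one side and 1 to the other. This is a minimal Python sketch assuming a hypothetical six-letter alphabet that uses only the frequencies quoted on the slide; `huffman_codes` is an illustrative name. Other implementations may break ties differently and produce different, equally efficient codes.

```python
import heapq
from itertools import count

def huffman_codes(freqs: dict[str, float]) -> dict[str, str]:
    """Build a prefix-free code in which frequent symbols get short bit patterns."""
    if not freqs:
        return {}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        # Merge the two lightest subtrees into one.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"e": 0.127, "t": 0.09, "a": 0.08, "x": 0.0015, "q": 0.0009, "z": 0.0007}
codes = huffman_codes(freqs)
for sym in sorted(codes, key=lambda s: -freqs[s]):
    print(sym, codes[sym])  # e gets the shortest pattern, q and z the longest
```

Because no code word is a prefix of another, a stream of these patterns can be decoded without separators.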

Generic Data Compression Techniques
- Relative encoding / differential encoding
  - Record the differences between consecutive data units rather than the entire units.
  - Example: motion pictures
  - Can be lossless or lossy
- Dictionary encoding
  - The dictionary is a collection of building blocks; the message is encoded as a sequence of references to this dictionary.
- Adaptive/dynamic dictionary encoding
  - The dictionary is changed during the encoding process: it grows as the message is encoded.

Generic Data Compression Techniques: Dictionary encoding example
- A typical dictionary of 25,000 words
- Character-by-character encoding
  - Assume each word consists of 6 letters
  - 42 bits needed using ASCII encoding (6 × 7 bits), 96 bits using Unicode (6 × 16 bits)
- Dictionary encoding
  - Map words to integers in [0, 24,999]
  - 15 bits for each integer, since 15 bits can represent [0, 32,767]

Representing Images
- Bit map techniques
  - Pixel: short for "picture element"
  - Black and white images: 1 bit for 1 pixel. The image is encoded as a long string of bits representing rows of pixels; a bit is 1 if the corresponding pixel is black and 0 otherwise.
  - Black and white photos: 8 bits for 1 pixel, allowing a variety of shades of gray to be represented.
  - Color images: 3 bytes for 1 pixel. RGB encoding uses 1 byte for the intensity of each color. Another approach: luminance and chrominance.
- Vector techniques
  - Scalable
  - Word processing systems use vector techniques to provide flexibility in character size.
  - PostScript: characters and general pictorial data
  - Also popular in computer-aided design systems

Compressing Images
- GIF
- JPEG
- TIFF

Compressing Images: GIF (Graphics Interchange Format)
- Dictionary encoding system
- Reduces the number of colors assigned to a pixel to 256
  - The red-green-blue combination for each of these colors is encoded using 3 bytes.
  - The 256 encodings are stored in a table (the palette).
  - Each pixel is represented by a single byte whose value indicates which of the 256 palette entries represents the pixel's color.
- Lossy compression system
- Good for cartoons
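The palette idea on the GIF slide can be sketched as follows: each distinct RGB triple is stored once in a table, and every pixel becomes a one-byte index into it. This minimal Python sketch assumes the image already uses at most 256 colors; the color-reduction step that makes GIF lossy is not shown, and `to_palette` and `from_palette` are hypothetical names.

```python
Color = tuple[int, int, int]  # (red, green, blue), each 0-255

def to_palette(pixels: list[Color]) -> tuple[list[Color], bytes]:
    """Return (palette, indices): one palette entry per distinct color, one byte per pixel."""
    palette: list[Color] = []
    index_of: dict[Color, int] = {}
    indices = bytearray()
    for rgb in pixels:
        if rgb not in index_of:
            if len(palette) == 256:
                # A real encoder would reduce the colors first; that step is lossy.
                raise ValueError("more than 256 distinct colors")
            index_of[rgb] = len(palette)
            palette.append(rgb)
        indices.append(index_of[rgb])
    return palette, bytes(indices)

def from_palette(palette: list[Color], indices: bytes) -> list[Color]:
    """Look each one-byte index up in the palette to recover the pixel colors."""
    return [palette[i] for i in indices]

pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 0, 0)]
palette, indices = to_palette(pixels)
print(palette)        # [(255, 0, 0), (0, 0, 255)]
print(list(indices))  # [0, 0, 1, 0]
```

For an image with n pixels, the indexed form takes n bytes plus at most 768 bytes (256 × 3) for the palette, compared with 3n bytes for raw RGB.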

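For the adaptive dictionary encoding described under Generic Data Compression Techniques, a well-known concrete instance is Lempel-Ziv-Welch (LZW), the method GIF uses for its pixel data. The slides do not name LZW; it is swapped in here as an example. This minimal Python sketch assumes the message uses only 8-bit characters (code points 0-255) and emits integer codes without packing them into bits; the function names are hypothetical.

```python
def lzw_encode(message: str) -> list[int]:
    """Emit dictionary codes; the dictionary grows as the message is encoded."""
    if any(ord(ch) > 255 for ch in message):
        raise ValueError("this sketch assumes 8-bit characters")
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    phrase = ""
    output: list[int] = []
    for ch in message:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate  # keep extending the longest known phrase
        else:
            output.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)  # add the new phrase
            phrase = ch
    if phrase:
        output.append(dictionary[phrase])
    return output

def lzw_decode(codes: list[int]) -> str:
    """Rebuild the same dictionary on the fly while reading the codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]  # phrase defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)

codes = lzw_encode("xyx xyx xyx xyx")
print(len(codes), "codes for 15 characters")
```

The decoder reconstructs the dictionary from the codes themselves, so the dictionary never has to be transmitted alongside the message.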

Compressing Images: JPEG (Joint Photographic Experts Group)
- Lossless mode (rarely used, low compression)
- Lossy mode: the baseline standard, used in many applications
  - Takes advantage of a limitation of the human eye: it is more sensitive to changes in brightness than to changes in color.
  - Encodes an eight-by-eight pixel block as a unit.
- Compresses color images by a factor of at least 10, often as much as 30, without noticeable loss of quality
- Good for photographs

Compressing Images: TIFF (Tagged Image File Format)
- A standard format for storing photographs along with related information
  - Date, time, and camera settings
  - The image itself is normally stored as RGB pixels without compression.
- Several methods of image compression
  - It has a color image compression option similar to the techniques used by GIF.
- Used for image archiving

Representing Sound
- Sampling techniques
- MIDI

Representing Sound: Sampling techniques
- Sample the amplitude of the sound wave at regular intervals and record the series of values obtained.
- 8000 samples per second: used in long-distance telephone communication
- 44,100 samples per second for high-quality recordings (each sample represented in 16 or 32 bits): about a million bits for one second of music
- Figure 1.14 The sound wave represented by the sequence 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0

Representing Sound: MIDI
- Used in music synthesizers in electronic keyboards
- Encodes directions for producing music on a synthesizer rather than encoding the sound itself
- Contains individual instructions for playing each individual note of each individual instrument

Compressing Audio and Video
- MPEG (Moving Picture Experts Group)
  - A variety of standards for different applications
  - Examples: high-definition television broadcast, video conferencing
- MP3 (MPEG layer 3)
  - Audio compression within the MPEG standards
  - Significant compression while maintaining near-CD-quality sound
  - Takes advantage of the limitations of the human ear:
    - Temporal masking: it is hard to detect softer sounds after a loud sound.
    - Frequency masking: a sound at one frequency tends to mask softer sounds at nearby frequencies.
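The "about a million bits for one second of music" figure under Representing Sound comes from multiplying the sampling rate by the bits per sample. Assuming a single channel, the two sample sizes on the slide give:

```latex
\begin{aligned}
44{,}100~\tfrac{\text{samples}}{\text{s}} \times 16~\tfrac{\text{bits}}{\text{sample}} &= 705{,}600~\tfrac{\text{bits}}{\text{s}} \\
44{,}100~\tfrac{\text{samples}}{\text{s}} \times 32~\tfrac{\text{bits}}{\text{sample}} &= 1{,}411{,}200~\tfrac{\text{bits}}{\text{s}}
\end{aligned}
```

So one second of music lands between roughly 0.7 and 1.4 million bits, depending on the sample size; a stereo recording doubles these figures.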


Communication Errors
- Errors can occur when information is transferred among the various parts of a computer.
- Detecting errors
  - Parity bits (even versus odd)
  - Checkbytes
- Correcting errors
  - Error correcting codes

Communication Errors
- Parity bits (even versus odd)
  - Add a bit to make each bit pattern have an odd number of 1s (odd parity).
  - Cannot detect an even number of errors.
  - Figure 1.28 The ASCII codes for the letters A and F adjusted for odd parity
- Checkbytes
  - A collection of parity bits
  - Detect more errors
- Error correcting codes
  - Errors can be detected
  - Errors can be corrected

Error Correcting Codes
- Hamming distance: the number of bits in which two patterns differ
  - 000000 and 010100: distance 2
  - 001111 and 010100: distance 4
- Figure 1.29 An error-correcting code
- Figure 1.30 Decoding the pattern 010100 using the code in Figure 1.29
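A minimal Python sketch of odd parity, assuming the parity bit is placed at the high-order (left) end of an 8-bit ASCII pattern; the exact bit layout in Figure 1.28 may differ, and the function names are hypothetical. It also shows the limitation noted on the slide: flipping two bits leaves the count of 1s odd, so the error goes unnoticed.

```python
def add_odd_parity(bits: str) -> str:
    """Prepend a parity bit so the full pattern contains an odd number of 1s."""
    parity = "0" if bits.count("1") % 2 == 1 else "1"
    return parity + bits

def has_odd_parity(pattern: str) -> bool:
    """A received pattern passes the check only if it still has an odd number of 1s."""
    return pattern.count("1") % 2 == 1

def flip(pattern: str, position: int) -> str:
    """Simulate a transmission error by inverting one bit."""
    flipped = "1" if pattern[position] == "0" else "0"
    return pattern[:position] + flipped + pattern[position + 1:]

for letter in "AF":
    ascii_bits = format(ord(letter), "08b")  # 8-bit ASCII code
    sent = add_odd_parity(ascii_bits)
    one_error = flip(sent, 3)
    two_errors = flip(one_error, 5)
    # One error is detected (check fails); two errors slip through (check passes).
    print(letter, sent, has_odd_parity(one_error), has_odd_parity(two_errors))
```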

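Decoding with an error-correcting code picks the code word at the smallest Hamming distance from the received pattern. This minimal Python sketch assumes a hypothetical eight-symbol, 6-bit code in which every pair of code words differs in at least three bits; the patterns are illustrative and may not match the table in Figure 1.29. With that minimum distance, a pattern with a single-bit error is still closest to its original code word.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 6-bit code (illustrative; not necessarily Figure 1.29's table).
CODE = {
    "A": "000000", "B": "001111", "C": "010011", "D": "011100",
    "E": "100110", "F": "101001", "G": "110101", "H": "111010",
}

def decode(received: str) -> str:
    """Return the symbol whose code word is nearest to the received pattern."""
    return min(CODE, key=lambda symbol: hamming_distance(CODE[symbol], received))

min_distance = min(hamming_distance(a, b) for a, b in combinations(CODE.values(), 2))
print(min_distance)                          # 3
print(hamming_distance("000000", "010100"))  # 2
print(hamming_distance("001111", "010100"))  # 4
print(decode("010100"))                      # D (distance 1)
```

In general, a code whose minimum distance is d can correct up to (d - 1) / 2 bit errors per pattern, rounded down, which is one error for this code.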

Homework Assignment 2 (due in class next Monday, Feb. 19th)
- Page 73: 24, 26(b,e,h,k), 27(b,e), 30(b,e), 31(b,e), 34(b,e)
- Page 74: 35(b,e)
- Page 75: 58

Next Lecture
- Computer architecture, machine language & program language
- Reading assignments: Chapter 2.1, 2.2, 2.3

