ABSTRACT DATA TYPES IN JAVA

Appendix A ABSTRACT DATA TYPES IN JAVA ab-stract: That which comprises or concentrates in itself the essential qualities of a larger thing or of sever...
Author: Peter Shaw
3 downloads 0 Views 50KB Size
Appendix A ABSTRACT DATA TYPES IN JAVA ab-stract: That which comprises or concentrates in itself the essential qualities of a larger thing or of several things. – The New Merriam-Webster Pocket Dictionary

In practice, the larger the program, the harder the debugging and the more limited the confidence in its correctness. One of the basic programming rules is that no method should ever exceed a page. The years of experience have shown that the best way is to split the program into small coherent and understandable pieces, or modules, and then fit them together. Generally speaking, a module is a unit in a larger software system that bundles together some data and some operations and has a carefully defined interface. The external users of the module can make use of the operations and data provided in the module interface, but the internal implementation of the module is concealed and is made inaccessible to external users. Thus the hidden internal data representation can be completely replaced without affecting the external users. This is called substitutability of data representations, and it permits us to improve the efficiency of a software system by replacing less efficient data representations with more efficient ones. Most of the algorithms in this handout are too small to involve this technique. Nonetheless, it can be used to separate data structures from algorithms. Consider, for example, any above-mentioned algorithm for sorting a set of numbers. It is clear that the algorithm will work, even without specifying what data structure is used to represent the set. This separation of data structure from algorithm permits us to study each in isolation as well as to organize and simplify them. This concept is called data abstraction. An abstraction of a thing has two qualities: it suppresses irrelevant details and it seeks to isolate the essence of the thing being abstracted. 221

222

COMPSCI.220FT

A.1 Abstract Data Types Abstract data types(ADT) are a way of organizing the objects and operations that define a data type in such a way that the specifications and behaviours of the data type are rigorously separated from the data type’s implementation. Java provides strong, generalpurpose support for modular programming through its classes, interfaces, and packages. A Java class is an example of an ADT.

  



ADT’s externally accessible operations and data are given by public methods and data fields declared in the class. The ADT’s hidden implementation details are represented by the private data fields and methods of the class. Java interfaces can be used to define ADTs that can incorporate general purpose replaceable data components. The interface defines abstract behavioural characteristics that allowable components must implement. Any specific kinds of data components that implement the interface can then be used as plugcompatible components, suitable for plugging-in to the ADT. Java packages are named collections of related classes and interfaces that can be separately compiled for use in Java applets or applications. Typically, packages, such as the Java util (utilities) package or the Java awt (abstract window tools) package, bundle collections of useful software components into a software component library suitable for use by others in building their own Java programs.

By using Java classes to implement ADTs, the hidden implementation details can be modified (or can even be totally replaced) within the local boundaries of the Java class definitions without having to make a single external change elsewhere in the program. Ease of modification is thus convincingly achieved. Although the terms data type (or just type), data structure, and ADT sound alike, they have different meanings. In a programming language, the data type of a variable is the set of values that the variable may assume (e.g., a variable of type boolean can assume either the value true or the value false, but no other value). An ADT is a mathematical model, together with various operations defined on the model. We shall design algorithms in terms of ADT’s, but to implement an algorithm in a particular programming language we must find some way of representing the ADT’s in terms of data types and operators supported by the programming language itself. To represent the mathematical model underlying an ADT we use data structures, which are collections of variables, possibly of several different data types, connected in various ways.

COMPSCI.220FT

223

ADT in mathematics Here are some standard examples of mathematical entities, not tied to any particular representation.





A set is a collection of zero or more entries. An entry may not appear more than once. A set of n entries may be denoted fa1 ; a2 ; : : : ; an g, but the position of an entry has no significance; f0; 3; 6; 7; 8g and f3; 0; 8; 7; 6g represent just the same set. A map is a special kind of set, namely, a set of pairs, each pair representing a one-dimensional mapping from one element to another. For example, a dictionary (words mapped to meanings) or the conversion from base 2 to base 10 are maps.



A multiset is a set in which repeated elements are allowed; e.g., f1; 3; 1; 5; 4; 0g is a multiset. Multisets are generally easier to deal with than sets, since checking for duplicates is expensive.



A sequence is an ordered collection of zero or more entries, denoted [a1 ; a2 ; : : :, an ]. The position of an entry in a sequence is significant; for example, the fifth entry, or the successor of a given entry, may be referred to.



A graph G = [V; E ] is a set V of vertices (nodes) and a set E of edges (arcs, links), i.e. two-element subsets of V . This definition excludes self-loops (edges from a vertex to itself) and parallel edges (two edges connecting the same two vertices). An example is given in Figure A.1.

V = { a, b, c, d } E = { { a, d }, { d, c }, { a, c } }

b

a c

d

Figure A.1: Graph G = [V; E ] with the four vertices and three edges.

This list can be easily extended to include, for example, directed graphs (digraphs), complex numbers, matrices, etc. When one of these entities is used in an algorithm, certain operations are performed on it (say, insertions and membership tests on a set). This observation leads naturally to the concept of an ADT: a mathematical entity together with some operations defined on it.

224

COMPSCI.220FT

The ADT concept The ADT concept has been implicitly used since the beginning of modern computing history. For example, the mathematical entity integer with the operations addition, subtraction, negation, multiplication, division, and the comparisons, has always been at the heart of computing machinery, and it provides a good illustration of the advantages to be gained from specifying ADTs. 1. Users of integer never need concern themselves with the implementation of the operations: they know what the operations do, and can use them effectively without ever knowing what (micro)electronic circuitry is being employed. 2. The implementor of integer, in this case a hardware designer, is free to experiment with different implementations. All that matters is that the right result be returned. By providing a clean interface between use and implementation, the ADT separates the two and clarifies the task of both. The integer ADT model consists of a finite range of integers (either machine or language dependent) together with those operations supported by the language that can be performed directly on integers. As a programmer, you do not need to understand how the integers will be represented internally (binary, two’s complement form, etc.) or how the specific operations have been implemented by the language processor in order to use them in writing a program. When a class of data objects or data structures belongs to an ADT, it should not be necessary for a programmer to know the internal representation of the ADT data objects in order to use or manipulate them within the rules for the class; this property is known as information hiding. Information hiding is important for several reasons. 1. By hiding the internal structure of a data object, the user is able to work at a higher programming level and is less likely to inadvertantly misuse data objects of the class (a prime source of bugs is thus avoided). 2. Less sophisticated users are able to use data objects effectively without having to understand the complexities of the internal structure of the data object or how the operations for the ADT class were implemented. The ADT concept is useful in the study of data structures. As each new data structure is introduced, the set of procedures developed to create and manipulate the structure can be viewed as operations to be performed on the class of data objects defined by the data structure. The ADT concept implies that the manipulation of the ADT data objects is restricted to only those operations that are part of the ADT definition. This restriction simplifies program development and significantly reduces a major source of program bugs. We think of an ADT as a mathematical data model with a collection of operations defined on that model. Sets of integers, together with the operations of union, intersection and

COMPSCI.220FT

225

set difference, form a simple example of an ADT. In an ADT, the operations can take as operands not only instances of the ADT being defined but other types of operands, say, integers or instances of another ADT, and the result of an operation can be other than an instance of that ADT. However, at least one operand, or the result, of any operation is of the ADT in question. ADTs are generalizations of primitive data types (integer, real, and so on), just as procedures or methods are generalizations of primitive operations(+, ;, and so on). The ADT encapsulates a data type in the sense that the definition of the type and all operations on that type are localized to one section of the program (therefore, each Java class can be treated as an ADT). If you wish to change the implementation of an ADT, you know where to look, and by revising one small section you can be sure that there is no subtlety elsewhere in the program that will cause errors concerning this data type. Outside the section in which the ADT’s operations are defined, you can treat the ADT as a primitive type; you have no concern with the underlying implementation. Simple example – the Palindrome class. Using the ADT principles you can write programs that use a data type without knowing any more about its implementation. In fact, that is a sign of a properly specified interface. If you need to peek into the code that implements a data type in order to use the type, then the type is not properly specified. Let us use a sample program called Palindrome to illustrate these principles with a bit more complex data structure than integers. A palindrome is a character string that is the same when read forward or backward, for instance, “anna”, “bob”, and “+*=*+”. Since computer character sets differentiate between the lower- and uppercase letters, “Anna” is not a palindrome. Our program will read a character string from the terminal, determine whether it is a palindrome, and print an appropriate message. The java.lang package contains the desired string data structures, namely, the classes String and StringBuffer. The instance method equals() of the first class permits you to check whether two strings are the same and the method reverse() of the second class is able to reverse a string. The data types String and StringBuffer are used below without knowing any of their details. All we knew were their specifications in the package java.lang. 1 2 3 4 5 6 7 8 9

import java.io.*;

// methods for reading input data

public class Palindrome { public static void main( String args[] ) { String s = new String( "" ); BufferedReader in = new BufferedReader( new InputStreamReader( System.in ) );

226

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

COMPSCI.220FT

System.out.println("Enter one string per line"); System.out.println("(an empty line stops the program)"); while( true ) { try{ s = in.readLine(); } catch( IOException e ) { s = ""; } finally { if ( s.length() > 0 ) { StringBuffer fsb = new StringBuffer( s ); StringBuffer rsb = fsb.reverse(); if ( s.equals( rsb.toString() ) ) System.out.println("is a palindrome !!!"); else System.out.println("is not a palindrome ..."); } else { System.out.println( "-------END-----" ); System.exit(0); } } } } }

Lines 1, 8–12, and 14–17 permit you to read a text line from your terminal. You must use the try-catch()-finally construction because the BufferedReader’s method readLine() may throw an IOException. Lines 13 and 31 produce the infinite loop for entering and checking text lines, and lines 19 and 26–29 terminate the program when an empty line is entered. Lines 20 and 21 reverse a given text by using the StringBuffer data type and the available instance method reverese(). To compare the input string to the reversed one, using the method equals() of the class String, the method toString() converts the StringBuffer data to the String data. Of course, you may write another version of the program, giving similar results like Enter one string per line (an empty line stops the program) Anna It is not a palindrome ... ANNA It is a palindrome !!! westtsew It is a palindrome !!!

COMPSCI.220FT

227

%%%%%%% It is a palindrome !!! l It is a palindrome !!! -------END----But in either case, you need not know how the classes String and StringBuffer are implemented in Java. We have an interface that includes everything the user needs to know in order to use the data types, and everything an implementor needs to know in order to write code to implement the type. It is a contract between the user and implementor.

A.2 ADTs and Java classes Abstract data types are collections of objects and operations that present well-defined abstract properties to their users, while hiding the way they are represented in terms of lower-level data representations. It should be emphasized that there is no limit to the number of operations that can be applied to instances of a given mathematical model, or object. Each set of operations defines a distinct ADT. Java classes, interfaces, and packages are used to express the concepts of modularity, information hiding, and data abstraction. The access modifiers (public, private, and so on) for data fields and methods in Java class definitions permit us to define whether these data fields and methods are public (and thus visible to and available for use by outside users of the class) or are private (and thus invisible to and unavailable for use by outside users). Objects and operations at higher level of data abstraction are represented by organizing objects and operations at lower levels. The ADTs at the highest level of data abstraction – structures such as sets, trees, lists, stacks, and queues – can be represented in a variety of different ways by lower-level representations, including those in the two broad classes - sequential representations and linked representations1. At still lower levels, the linked representations can be represented in a variety of different ways, such as using the parallel arrays or Java objects that contain references to other Java objects in their data fields (see Figure A.2). For security reasons, Java does not provide explicit pointer values (in contrast to other OOP languages such as C++ or Object Pascal). However, Java uses implicit pointers to 1

Sequential data representation uses a sequence of individual blocks of storage, each block is independently accessed by its unique address in that sequence. Linked data representations are created by linking individual blocks of storage together using pointers.

228

COMPSCI.220FT

Lists

Queues

Stacks

Sequential Representations Arrays

Strings

Sets

Trees

ADTs

Linked Representations Implicit Pointer Representations

Parallel Arrays

Figure A.2: Levels of data abstraction.

access arrays and objects. In fact, Java divides its data values into two classes: primitive data values (such as integers, characters, boolean truth values, and floating point numbers) and reference values that are references to objects and arrays. Such Java reference values are just pointers to objects and arrays, even though Java does not provide any pointer following operations. Java reference values can be stored in data fields of objects, as items in arrays, or as values of variables. This provides a satisfactory basis for implementing linked data representations. The Collections Framework, first introduced with the Java 1.2, provides a well-designed set of interfaces and classes for storing and manipulating groups of data as a single unit, a collection2 . In Java, a collection is a group of related data elements, organised into a single object, with operations provided to manipulate the data. Java technology has always offered support for collections, in particular, via the Vector, Stack, Hashtable, and Properties classes. But the new framework for collections in Java 1.2 has significant advantages over the old classes. These latter can be used in Java 1.2, too, as well as they are available for use with Java 1.1 runtime environments. The advantages of the Java 1.2 Collections Framework include:

    

2

Reduced programming effort. Support for software reuse, in that data structures conforming to the collection interfaces are reusable in a wide variety of contexts and applications. Easier to design with Application Programming Interfaces (APIs). Easier to pass collections between unrelated APIs. Increased program speed.

In common usage a collection is the same as the intuitive, mathematical concept of a set, so that both the terms might be considered as synonyms. But in mathematics each set entry can appear only once whereas collections have generally no such restriction.

COMPSCI.220FT

229

Collection

List

Set

Map

SortedMap

SortedSet

AbstractCollection

AbstractMap

AbstractSet AbstractList

Abstract classes

I n t e r f a c e

AbstractSe− quentialList

HashSet

ArrayList

HashMap

TreeSet

LinkedList

TreeMap

I m p l e m e n t a t i o n

Concrete classes Figure A.3: The Collections Framework hierarchy in Java 1.2. Solid lines show relationships between the interfaces and the concrete classes that implement the interfaces and dot-dashed lines show relationships between the abstract classes and the concrete classes extending the abstract ones.

230

COMPSCI.220FT

List

AbstractList

Map

Dictionary

Stack

Vector

Hashtable

Properties

Figure A.4: Historical classes in the Collections Framework.



Increased program quality (fewer bugs, easier maintenance). The Collections Framework provides a convenient API to many of the ADTs such as maps, sets, lists, trees, arrays, hashtables and other collections (see Figure A.3). Because of their object-oriented design, the Java classes in the Collections Framework encapsulate both the data structures and the algorithms associated with these abstractions. This offers a standard programming interface to many of the most common abstractions, without burdening the programmer with too many procedures and interfaces. The operations supported by the collections framework nevertheless permit the programmer to easily define higher level ADTs, such as stacks, queues, and so forth3 . In such a hierarchy, the root Collection interface defines a group of objects, known as its elements. Some Collection implementations allow duplicate elements and others do not. A Set extends Collection but forbids duplicates. As you might expect, this interface models the mathematical set ADT. A List extends Collection also, allows duplicates and introduces positional indexing so that it is an ordered collection. The user of a List generally has precise control over where in the List each element is inserted. A Map extends neither Set nor Collection and maps keys to values. Maps cannot have duplicate keys: each key can map to at most one value. The last two more collection interfaces (SortedSet and SortedMap) are merely sorted versions of Set and Map. In Java, there are two ways to order objects: the 3

One might think that Map would extend Collection. In mathematics, a map is just a collection of pairs. In the Collections Framework, however, the interfaces Map and Collection are distinct and have no link in the hierarchy. The reasons for this distinction are that the typical application of a Map is to provide access to values stored by keys. The set of collection operations are all there, but to work with a key–value pair, instead of an isolated element, Map supports the basic operations of get() and put() keys or values which are not required by Set. Moreover, there are methods that return Set views of Map objects, for example, Set set = aMap.keySet().

COMPSCI.220FT

231

Comparable interface provides automatic natural order on classes that implement it, while the Comparator interface gives the programmer complete control over object ordering. Note that these are NOT core collection interfaces, but underlying infrastructure. A SortedSet is a Set that maintains its elements in ascending order and provides several additional operations to take advantage of the ordering. A SortedMap is a Map that maintains its mappings in ascending key order (it is the Map analogue of SortedSet). The SortedSet interface permits you to build word lists and membership rolls, whereas the SortedMap interface can be used for creating dictionaries and telephone directories. The following table shows the six collection implementations introduced with the Java 1.2 framework, in addition to the four historical collection classes (the framework is designed in such a way that the new and historical classess can interoperate). The historical collection classes are called such because they have been used since the Java 1.0 release. Figure A.4 shows how these historical classes have been integrated into the Collections Framework. Interface Set

Hash table HashSet

List Map

Implementation Resizable array Balanced tree TreeSet ArrayList

HashMap

TreeMap

Linked list

Historical classes

LinkedList

Vector Stack Hashtable Properties

Java 1.2 provides two implementations of each interface, except for Collection which has no direct implementations, but serves as a least common denominator for the other collection interfaces. In each case, the primary implementations to be used, all other things being equal, are HashTable, ArrayList, and HashMap, respectively. Note that the SortedSet and SortedMap have no rows in the table above. Each of these interfaces has one implementation and these implementations (TreeSet and TreeMap) are listed in the Set and Map rows. The Collections Framework is made up of the above set of interfaces for working with groups of objects (the different interfaces describe the different types of groups). For the most part, once you understand the interfaces, you understand the framework. While you always need to create specific implementations of the interfaces, access to the actual collection should be restricted to the use of the interface methods, thus allowing you to change the underlying data structure, without altering the rest of your code. All the above-mentioned Java 1.2 general-purpose implementations have consistent behaviour. They permit null elements, keys, and values, and have fail-fast iterators, which

232

COMPSCI.220FT

detect illegal concurrent modification of a collection during iteration and fail quickly and cleanly, rather than risking arbitrary behaviour in the future. Note that the new Java 1.2 implementations are now unsynchronized, in contrast to the previous classes Vector and Hashtable, which were introduced in Java 1.0. This was taken because the collections are frequently used in a manner where the synchronization is of no benefit (for instance, single-threaded use, or read-only use, or use as a part of a larger data object that does its own synchronization). If one needs a synchronized collection, there exist the synchronization wrappers that allow any collection to be transformed into a synchronized one.

Suggest Documents