A survey of interconnection methods for reconfigurable parallel processing systems* by HOWARD JAY SIEGEL, ROBERT J. MCMILLEN, and PHILIP T. MUELLER, JR., Purdue University, West Lafayette, Indiana

* This work was supported in part by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant No. AFOSR-78-3581. The United States Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

INTRODUCTION

This is a survey of a variety of interconnection networks for reconfigurable parallel processing systems that have appeared in the literature. A system is reconfigurable if it may assume several architectural configurations, each of which is characterized by its own topology of activated interconnections between modules [15]. The systems whose networks will be examined include multiple-SIMD and MIMD systems, as well as both fixed and dynamic word size systems. This paper is restricted to networks for geographically localized parallel processing systems using 12 or more processors in a reconfigurable manner. Related survey papers include References 1, 3, 10, 19, 20, and 45-47.

The next section defines parameters that will be used to describe and evaluate networks. The later sections discuss the interconnection networks, grouped by their overall structure: multistage switching networks, dedicated path networks, and shared path networks.

PARAMETERS

A variety of parameters which can be used to describe interconnection networks are briefly presented. Their purpose is to provide a common set of terms to use as a basis for the examination of the different networks; however, not all parameters will be applicable to all networks. It is assumed that a system has N processors and, if N is a power of two, n = log2 N. Anderson and Jensen [1] define a path as "the medium by which a message is transferred between the other system elements" (e.g., wires or buses), and a switching element as "an entity which may be thought of as an 'intervening intelligence' between the sender and receiver of a message." Networks may be described by the type of switching elements used and the paths between switching elements.

One classification is by the way the switching elements are physically located with respect to the system processors. Two types are distributed and centralized. Networks may be used to connect processors and memories (processor-to-memory) or to connect processing elements (PEs) to other processing elements (PE-to-PE), where a PE is a processor-memory pair. The reconfiguration method is the method used to reconfigure the network, i.e., to change the way in which submachines are organized. The communications setup method is the method used to establish an interprocessor communications path within an already existing submachine. Delay is the time it takes a network to transfer one data item from a source to the desired destination. The ease of use of a network is the degree to which connections are automatically established. The cost of a network is the asymptotic complexity of its implementation. The partitionability of a network is its ability to divide the system into independent subsystems of different sizes. Partitionable systems may be characterized by any limitation on the subset of processors which may belong to a partition. Furthermore, a system may be logically partitioned using software techniques or physically partitioned using hardware switches within the network control structure. A network is homogeneous if it treats all processors similarly. Modularity is the ability of a network to be constructed from a small set of basic modules. LSI compatibility is the suitability of a module to be implemented as an LSI chip, i.e., high circuit complexity and low external connection requirements. The extensibility of a network is its ability to be extended to a larger size, i.e., the amount of modification needed to make the network function for a larger number of inputs/outputs. Fault tolerance will be discussed in terms of a system's features which would allow the system to remain operational with faulty components (with possible degradation).

Let m be the number of processors which can transfer data simultaneously using the interconnection network. Then the degree of simultaneity supported by the interconnection network is S = m/N, 1 ≤ m ≤ N. Permutations are one-to-one connections in which all processors participate.


For networks with N inputs, N outputs, and S = 1, let r be the number of permutations possible in a single pass through an interconnection network; then the connectivity of the network is C = r/N!, 1 ≤ r ≤ N!. The ability of a processor attached to the network to broadcast a single data item to all other processors can be measured by the broadcast scope. Let b be the maximum number of other processors which can receive data simultaneously from a given processor after one pass through the interconnection network. Then the broadcast scope is B = b/(N-1). The broadcast delay is the number of transfers required for a complete broadcast. The range of a network can be measured by R = x/(N-1), where x is the order of the set of processors (i.e., the number of processors) to which a single processor can choose to send data in one pass through the network. The range can be further characterized by specifying the set of processors which can be sent data. Similarly, the domain of a network can be measured by D = x/(N-1), where x is the order of the set of processors from which a single processor can receive data in one pass through the network, and can be further characterized by specifying the set of processors which can send the data.

The networks discussed support SIMD, MSIMD, MIMD, or PSM parallelism. Furthermore, some support dynamic word sizes. An SIMD (single instruction stream - multiple data stream) machine [14] typically consists of a set of N processors, N memories, an interconnection network, and a control unit (e.g., Illiac IV [4,8]). The control unit broadcasts instructions to the processors and all active ("turned on") processors execute the same instruction at the same time. Each processor executes instructions using data taken from a memory to which only it is connected. The interconnection network allows interprocessor communication. An MSIMD (multiple-SIMD) system is a parallel processing system which can be structured as two or more independent SIMD machines (e.g., MAP [23]). An MIMD (multiple instruction stream - multiple data stream) machine [14] typically consists of N processors and N memories, where each processor may follow an independent instruction stream (e.g., C.mmp [50]). As with SIMD architectures, there is a multiple data stream and an interconnection network. A PSM (partitionable SIMD/MIMD) system [32] is a parallel processing system which can be structured as two or more independent SIMD and/or MIMD machines (e.g., PASM [35-38]). There are two methods of achieving variable word sizes. The first method, intraprocessor dynamic word size, uses processors with long data words which can be split up to form several independent smaller data words (e.g., Illiac IV [4,8]). The second method, interprocessor dynamic word size, combines two or more processors with small data words to form a single processor with a long data word (e.g., Dynamic Computer [18]).
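The normalized parameters defined above are simple ratios. The few lines of illustrative code below (not part of the original survey; the quantities m, r, b, and x must be determined separately for each particular network) collect them in one place.

    from math import factorial

    def network_parameters(N, m, r, b, x_range, x_domain):
        # Normalized descriptive parameters, as defined in the text.
        return {
            "S (simultaneity)":    m / N,              # S = m/N,  1 <= m <= N
            "C (connectivity)":    r / factorial(N),   # C = r/N!, 1 <= r <= N!
            "B (broadcast scope)": b / (N - 1),        # B = b/(N-1)
            "R (range)":           x_range / (N - 1),  # R = x/(N-1)
            "D (domain)":          x_domain / (N - 1), # D = x/(N-1)
        }

    # Hypothetical example for N = 8: all processors can transfer at once,
    # one pass realizes 8 permutations, and each processor can send to
    # (and receive from) exactly one other processor.
    print(network_parameters(N=8, m=8, r=8, b=1, x_range=1, x_domain=1))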

MULTISTAGE SWITCHING NETWORKS

Introduction

A multistage switching network (MSN) is an interconnection network consisting of many (usually n) stages of switches. Each stage is connected to the next by at least N paths. Each switch can choose from two or more input paths to connect to an output path. The multistage networks discussed in this section are all physically centralized and have simultaneity, range, and domain S = R = D = 1. All have a cost of O(nN) and a transfer delay proportional to their number of stages. The switch elements are modular, but not complex enough for LSI. These MSNs are capable of exploiting pipelining to pass data through the network. For example, stage i could contain N w-bit registers, where w is the width of the network, and act as the i-th stage of the pipe [39]. If a switch element fails, the network cannot perform completely without significant revision of data routing strategies and algorithms.

Three parameters which are used to describe different multistage switching networks are topology, switch, and control structure [34]. The topology of a multistage network is the set of interconnection patterns used to connect the stages of the network [34]. Interconnection functions specify these patterns. An interconnection function [30] is a bijection (permutation) on the set of input/output addresses, which consists of the integers from 0 to N-1. The interconnection function f connects input i to output f(i), 0 ≤ i < N.
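As a concrete example of an interconnection function, the sketch below (illustrative code, not from the paper) implements the cube functions Cube_i referred to later in the text: Cube_i connects input p to the output whose address differs from p only in bit i, and applying it to all N inputs yields a permutation.

    def cube(i, p):
        # Cube_i interconnection function: complement bit i of the address p.
        return p ^ (1 << i)

    N = 8
    # Cube_0 applied to every input 0..N-1 gives the exchange permutation
    # 0<->1, 2<->3, 4<->5, 6<->7 (a bijection on the input/output addresses).
    print([cube(0, p) for p in range(N)])   # [1, 0, 3, 2, 5, 4, 7, 6]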

'"

.....

~---~

STAGE

U

6

T

7

0

2

(a)

o U N

T

P

p

U

U

T

T

STAGE

2

0

(b) 2A

0

U N

T

p

P

6

U

U

7

i

i

2C

STAGE

0

2

Figure 1 - STARAN network for N=8: (a) with flip control, (b) "a" redrawn, and (c) with shift control (0A, 1A, etc. are the control signals). [34]


... performing the Cube_i interconnection function on their inputs, 0 ≤ i < n ...


The computers shown are connected to 24 different 0-buses, 16 1-buses, 64 2-buses, 48 3-buses, and 34 4-buses. However, only one 1-bus, one 0-bus, and parts of a 2-bus and another 1-bus are explicitly shown. Each bus is connected to only 16 computers and is used in the PE-to-PE configuration. Since each processor is connected to two buses and there are 16 processors per bus, in one bus cycle a given processor can communicate with any of 30 other processors, thus R = D = 30/(N-1). No particular hardware features are described that could be used to partition the system. It is also not designed to broadcast data, so B = 1/(N-1). Using a recursive doubling algorithm, data can be passed to all processors with broadcast delay O(n). The mechanisms that move data from bus to bus are built into the computers, thus the network is physically distributed as well as modular. If bus widths are kept reasonable, it appears the system is LSI compatible. The cost is O(N).
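The O(n) broadcast delay cited above comes from recursive doubling: in each step every processor that already holds the data item forwards it to one processor that does not, so the set of holders doubles. The following is a generic sketch of the idea (illustrative only, not the actual routing hardware of this system).

    def recursive_doubling_broadcast(N, source=0):
        # Simulate recursive doubling among N = 2**n processors: in step k,
        # each holder sends to the processor whose address differs in bit k.
        n = N.bit_length() - 1                # assumes N is a power of two
        holders = {source}
        for k in range(n):
            holders |= {p ^ (1 << k) for p in holders}
        assert len(holders) == N              # everyone has the data after n steps
        return n                              # broadcast delay of log2(N) transfers

    print(recursive_doubling_broadcast(16))   # 4 transfer steps for 16 processors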

Inl",clu.le' Bus

Map Bus

P-S-M

P-S-M

P-S-M

P-S-M

P-S-M

21 3 1 2 1 II 1 21 3 121X

P-S-M PART nF A 2-lms

Figure 9 - A simple three-cluster Cm* system. [43]

Figure 10 - Logical groups of computers showing bus sharing in a Mega-Micro-Computer network. [49]


Routing in the network is automatically performed by comparing a destination tag to a local address, making it necessary only for a user to specify the desired destination. Since there are a large number of routing choices, it is a simple matter to bypass faulty computers. If A is the absolute difference between the addresses of the sender and receiver, the delay is O(log2 A). If the maximum level is i and there are 16^(i+1) computers, then the number of buses in the system is b = Σ_{j=0}^{i} 16^i/2^j. Since all buses may be in use simultaneously, S = b/16^(i+1). For example, for i = 4, S = 0.12.
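The bus-count formula above can be checked numerically; the short computation below (illustrative only) reproduces S = 0.12 for i = 4.

    def mmc_buses(i):
        # b = sum over j = 0..i of 16**i / 2**j; S = b / 16**(i+1).
        b = sum(16**i // 2**j for j in range(i + 1))
        return b, b / 16**(i + 1)

    b, S = mmc_buses(4)
    print(b, round(S, 2))   # 126976 buses, S = 0.12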

Figure 11 - Hierarchical, restructurable multi-microprocessor architecture. [2]


The Hierarchical Restructurable Multi-Microprocessor (HRMM) [2] architecture employs multiple control buses, called Control Groups (CGs), and a circulating data bus. Switching elements, called Block Short Modules (BSMs), segment the control group buses between adjacent pairs of processors, as shown in Figure 11. A CG consists of three buses: (1) the CMD bus, which carries commands to processors; (2) the ACK/NAK bus, with which a processor recognizing a command can acknowledge its acceptance or rejection (due to a full queue); and (3) the DONE bus, on which a command processor can acknowledge the completion of a required task. When a command is received by a processor, it is placed into a queue and cannot be executed until the data associated with it arrives on the data bus; consequently, all processors execute instructions independently, as in an MIMD machine. Each CG is given a fixed priority, thus enabling establishment of a hierarchy for communication. While the buses that form a CG are of the conventional type, the data bus is of the circulating loop (or Pierce loop) type, where data packets are moved a fixed distance and direction in each unit of time. Both buses are used in the PE-to-PE configuration. The simultaneity of the data bus is S = 1, since the bus may be viewed as a parallel shift register and each processor may place one data packet on the bus at one time. "Carries" and synchronization information are provided between processors by the sync/carry loop (see Figure 11); thus, there is interprocessor dynamic word size. Changing the settings of the BSMs changes the structure of HRMM and is accomplished by issuing commands on the Master CG CMD bus. The structure of the system may be viewed as a tree. The broadcast scope is B = 1/(N-1), but broadcasting is possible on the data bus in N-1 bus cycles by placing a "don't care" in the destination address. The data bus and CG buses are physically distributed and modular. CG and data bus widths and complexity are not completely specified, so LSI compatibility cannot be established. The cost of the CGs is O(mN) and the cost of the data bus is O(N), where m is the number of CGs. The delay through the CG buses is one control bus cycle. The worst case delay through the data bus is N-1 data bus cycles, where a bus cycle is on the order of 50 ns [49]. The range and domain of the data bus are R = D = 1/(N-1) because of its unconventional structure: since data packets move from one processor to the next in a fixed amount of time (i.e., one bus cycle), one processor can only send data to or receive data from one other processor each bus cycle. The range and domain of the CG buses are a function of the BSM settings, with a best case of R = D = 1. The HRMM is readily expandable by adding more processors to either end. All processors are treated equally by the data bus, thus it is homogeneous. The data bus is easy to use, whereas the CG buses are more complex due to their flexibility. The CGs are fault tolerant; however, the flexibility to configure them is reduced by varying degrees, depending on which BSMs fail. The data bus is not fault tolerant, since a break in the loop makes it virtually unusable.

Crossbar switches

Crossbar switch (CBS) networks are shared bus networks in which p nodes can be connected to q nodes. Such a CBS is called a p×q CBS. The cost of a p×q CBS is O(pq), and the delay of a CBS is constant. CBSs can be extended incrementally, the difficulty of which is implementation dependent. CBSs are modular in design, and may be appropriate for LSI, depending on the complexity of the crosspoints (e.g., queues).

C.mmp [50] is an MIMD system consisting of p processors, one large shared memory with m memory modules, and k I/O buses, as shown in Figure 12. The system contains two CBSs. One, the Skp, is a k×p CBS which connects the k I/O buses with the p processors. The other, the Smp, connects the m memory modules with the p processors and is an m×p CBS. Each processor is a self-contained PDP-11. The simultaneity is S = min(k,p)/p for the Skp and S = min(m,p)/p for the Smp. Both CBSs are homogeneous and physically centralized; however, the (software) control is distributed. A memory address is translated by hardware into a setting for the CBS, which makes the CBS transparent to the user. There are hardware switches which can force the CBSs into a given state. The connectivity, range, and domain are C = R = D = 1, since any processor can be connected directly to any memory; however, processor-to-processor communications are limited to using a memory as an intermediate node. The broadcast scope is B = 1/(N-1) and the broadcast delay is O(n), using a recursive doubling scheme.
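The O(pq) crosspoint cost quoted above is what makes full crossbars expensive relative to the O(nN) multistage networks of the earlier section; a quick comparison (with purely illustrative constant factors, not figures from the paper) makes the gap concrete.

    from math import log2

    def crossbar_cost(N):
        # N x N crossbar: crosspoint count grows as O(N**2).
        return N * N

    def msn_cost(N):
        # n-stage MSN with N/2 two-input switches per stage: O(N log N).
        return int((N // 2) * log2(N))

    for N in (16, 64, 256, 1024):
        print(N, crossbar_cost(N), msn_cost(N))   # e.g. N=1024: 1048576 vs 5120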



Figure 12 - Block diagram of C.mmp, showing both CBSs: the Smp (m-to-p crosspoint) and the Skp (p-to-k crosspoint). Pc = central processor; Mp = primary memory; T = terminals; Ks = slow device control (e.g., for Teletype); Kf = fast device control (e.g., for disk); Kc = control for clock, timer, and interprocessor communication. Both switches have static configuration control by manual and program control. [50]

The C.mmp is completely partitionable, i.e., it can be reconfigured so that any set of processors can work together by sharing a memory. However, there may be some interference. As for fault tolerance, if a memory module or I/O bus fails, the hardware switches can be set to isolate the bad hardware from the rest of the system. Similarly, if a processor fails, the switches can be set so that it cannot access any memory modules or I/O buses. However, a failure in either of the CBSs could force the entire system to stop.

The Multi Associative Processor (MAP) [23] is an MSIMD system consisting of eight control units and 1024 PEs. The PEs are grouped together into 16 sectors of 64 PEs each. All of the PEs in a sector are connected via a bus. The bus of each sector can be connected to any one of the eight control units via a 16×8 CBS, as shown in Figure 13. Having eight control units allows up to eight independent SIMD programs to be executed simultaneously. Any number of processors can be dynamically allocated to any one of the eight control units; however, the most efficient partitions will be those which put all of the processors of a given sector into the same partition. The CBS of MAP is physically centralized. Each control unit makes its own requests, and there is a control unit supervisor which arbitrates conflicts. The simultaneity is S = 16/1024 for intra-sector PE-to-PE communications and 8/1024 for inter-sector communications. Since each processor is treated equally, the network is homogeneous. Any processor can broadcast a data item to all of the processors via the CBS, and can send data to, or receive data from, any other processor, so B = R = D = 1. If a processor fails, it can be removed from the list of available processors; this means it can never be assigned to a program. If a sector fails, all of the processors in the sector can be removed from the list of available processors. A control unit failure means one less SIMD program can be run simultaneously; however, the rest of the system can run unaffected.

Figure 13 - Diagram of the CBS for the MAP system. [23]

CONCLUSIONS

Summaries of a wide variety of methods for providing interprocessor communications in reconfigurable large-scale parallel processing systems have been presented. A set of parameters for describing the features of these networks was defined. These parameters were used in the descriptions of the networks to provide a common basis for comparison. The rest of this section discusses future research directions in network design for reconfigurable systems.

Reconfigurable large-scale parallel processing systems are becoming more prevalent as hardware costs decrease and the knowledge about exploiting parallelism in tasks increases. The interconnection networks for these systems should be restructurable under software control. In applications where it is possible, parallel programs, including interprocessor communications, should be generated automatically, with the explicit parallelism hidden from the user.



In applications where this is not possible, or for those who wish to have direct command over the parallel system for efficiency or research, the network control must be accessible. For ease of use, each programmer dealing with a partition or submachine of size M should write network commands based on a logical numbering of the processors, from 0 to M-1. Furthermore, the user should specify communication paths in terms of destination tags, and let the processors or the network hardware itself compute the data paths, as is done with the Omega network [21]. There should also be machine instructions for specifying particular network connections, e.g., a method to specify "+1 modulo M" on the Illiac [4]. Since these systems and their networks will have a large number of components, fault tolerance is very important. Networks should have the ability to work around faulty switch elements, as in the SW-banyan network [22]. Efficient methods for detecting faults in a network must be devised.

As the size of systems increases, the interest in SIMD machines will shift to MSIMD architectures, such as MAP [23]. To increase the range of applications these systems can handle, networks should be able to (1) allow all processors to transfer data simultaneously, (2) prevent independent users' submachines from communicating with each other, (3) allow a single user's submachines to communicate when desired, (4) allow submachines to be of different sizes, and (5) allow each submachine to control its network independently (as the ADM network can [34,35]). MSIMD systems with such communication abilities will have some fault tolerance in that a faulty component need only shut down the smallest size partition (e.g., an MC group in PASM [35]). Furthermore, in SIMD applications that require high reliability, the task may be run simultaneously in several different partitions. Then the partitions can communicate with each other to confirm the validity of their results. (This assumes a suitable backup scheme for operating without or replacing the "master" controller in case of its failure.)

MIMD system networks should be able to support a high degree of simultaneity so that independent subsystems can communicate with minimum interference. Crossbar switches are too costly for large systems. Multiple bus systems such as those in CHoPP [42] and MMC [49] appear to be a promising approach. Ways to implement these networks, the use of packet switching, and techniques for evaluating MIMD networks need to be examined.

PSM systems have all of the advantages of MSIMD and MIMD systems. Furthermore, they allow a single system to be built and then have its processors operate in any combination of either mode (MSIMD or MIMD), depending on the users' needs. Thus, for example, a system may simultaneously behave as four independent SIMD machines and two independent MIMD machines, all of different sizes. In addition, a group of PEs may, for example, do preprocessing for a pattern recognition task in SIMD mode and then the same set of PEs may continue the task in MIMD mode. How well proposed PSM networks such as the ADM and SW-banyan can support such activities must be evaluated, and new approaches must be explored.
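Destination-tag routing of the kind the Omega network [21] provides can be sketched in a few lines: at stage i the switch uses bit n-1-i of the destination address to select its upper or lower output, so no processor needs to compute a full path. The code below is an illustrative simulation of that rule, not taken from Reference 21.

    def omega_route(src, dst, n):
        # Trace the position of a message through the n stages of an
        # N = 2**n Omega network using destination-tag routing: each stage
        # performs a perfect shuffle and then shifts in one destination bit.
        N = 1 << n
        pos, path = src, [src]
        for i in range(n):
            bit = (dst >> (n - 1 - i)) & 1       # tag bit, most significant first
            pos = ((pos << 1) | bit) & (N - 1)   # shuffle, then switch setting
            path.append(pos)
        assert pos == dst                        # arrives after n stages
        return path

    print(omega_route(src=5, dst=3, n=3))        # [5, 2, 5, 3] for N = 8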


Systems capable of varying their word size will also be more prevalent. Machines such as RVAP [22] and DC [18] will provide flexible systems, capable of functioning as PSM computers, where each "composite processor" operates on a user-designated word size. The "carry" lines portion of the network should be easily reconfigured, as with the "carry" network in DC. The inter-"composite processor" network should have the range and domain of a network like the SW-banyan, if operating on tasks which use a large number of "composite processors" that must communicate often. As in the PSM network area, network evaluation techniques and new schemes must be investigated.

Ways in which networks can exploit LSI technology must be studied. Physically distributed networks, such as those in DC [18] and CHoPP [42], can be incorporated with other system components and take advantage of LSI. Most proposed and existing physically centralized networks do not take advantage of LSI, due to a low complexity/pin-count ratio. Future networks may make use of LSI by becoming more "intelligent." For example, architects could design switch elements capable of supporting features such as pipelining, conflict (switch contention) resolution, fault detection, fault tolerance, and destination tag-based routing.

Finally, network designers must not forget the user. Architects must remember that the networks they design must function efficiently for user problems. Therefore, networks should not be designed without considering the intended applications of the system the network is supporting. Work needs to be done in defining descriptive parameters for both networks and the communication needs of users' problems. By establishing a relation between these two sets of parameters, a problem could be analyzed to find its "communication needs parameters," and then the appropriate "network parameters" necessary to solve the problem efficiently could be determined.

REFERENCES

1. Anderson, G. A., and E. D. Jensen, "Computer interconnection structures: taxonomy, characteristics, and examples," ACM Computing Surveys, Vol. 7, No. 4, Dec. 1975, pp. 197-213.
2. Arnold, R. G., and E. W. Page, "A hierarchical, restructurable multi-microprocessor architecture," 3rd Annual Symposium on Computer Architecture, Jan. 1976, pp. 40-45.
3. Baer, J. L., "Multiprocessing systems," IEEE Trans. Comput., Vol. C-25, No. 12, Dec. 1976, pp. 1271-1277.
4. Barnes, G., et al., "The Illiac IV computer," IEEE Trans. Comput., Vol. C-17, No. 8, Aug. 1968, pp. 746-757.
5. Batcher, K. E., "STARAN parallel processor system hardware," Nat'l. Computer Conf., May 1974, pp. 405-410.
6. Batcher, K. E., "The flip network in STARAN," 1976 Int'l. Conf. on Parallel Processing, Aug. 1976, pp. 65-71.
7. Batcher, K. E., "The multi-dimensional access memory in STARAN," IEEE Trans. Comput., Vol. C-26, No. 2, Feb. 1977, pp. 174-177.
8. Bouknight, W. J., et al., "The Illiac IV system," Proc. IEEE, Vol. 60, Apr. 1972, pp. 369-388.
9. Davis, E. W., "STARAN parallel processor system software," Nat'l. Computer Conf., May 1974, pp. 17-22.
10. Enslow, P. H., Jr., "Multiprocessor organization - a survey," ACM Computing Surveys, Vol. 9, Mar. 1977, pp. 103-129.
11. Feierbach, G., and D. Stevenson, A Feasibility Study of Programmable Switching Networks for Data Routing, Institute for Advanced Computation, Phoenix Project Memorandum No. 003, May 1977.
12. Feldman, J. D., and L. C. Fulmer, "RADCAP - an operational parallel processing facility," Nat'l. Computer Conf., May 1974, pp. 7-15.


13. Feng, T., "Data manipulating functions in parallel processors and their implementations," IEEE Trans. Comput., Vol. C-23, No. 3, Mar. 1974, pp. 309-318.
14. Flynn, M. J., "Very high-speed computer systems," Proc. IEEE, Vol. 54, Dec. 1966, pp. 1901-1909.
15. Goke, L. R., "Connecting networks for partitioning polymorphic systems," Doctoral dissertation, Dept. of Electrical Engineering, University of Florida, 1976.
16. Goke, L. R., and G. J. Lipovski, "Banyan networks for partitioning multiprocessor systems," 1st Annual Symposium on Computer Architecture, Dec. 1973, pp. 21-28.
17. Jones, A. K., et al., "Software management of Cm* - a distributed multiprocessor," Nat'l. Computer Conf., June 1977, pp. 657-663.
18. Kartashev, S. I., and S. P. Kartashev, "Dynamic architectures: problems and solutions," Computer, Vol. 11, July 1978, pp. 26-41.
19. Kartashev, S. I., and S. P. Kartashev, guest eds., "Modular computers and networks," Computer, Vol. 11, July 1978, whole issue.
20. Kuck, D. J., "A survey of parallel machine organization and programming," ACM Computing Surveys, Vol. 9, Mar. 1977, pp. 29-59.
21. Lawrie, D., "Access and alignment of data in an array processor," IEEE Trans. Comput., Vol. C-24, No. 12, Dec. 1975, pp. 1145-1155.
22. Lipovski, G. J., and A. Tripathi, "A reconfigurable varistructure array processor," 1977 Int'l. Conf. on Parallel Processing, Aug. 1977, pp. 165-174.
23. Nutt, G. J., "Microprocessor implementation of a parallel processor," 4th Annual Symposium on Computer Architecture, Mar. 1977, pp. 147-152.
24. Okada, Y., H. Tajima, and R. Mori, "A novel multiprocessor array," 2nd Symposium on Micro Architecture, 1976, pp. 83-90.
25. Paker, Y., and M. Bozyigit, "Variable topology multicomputer," 2nd Symposium on Micro Architecture, 1976, pp. 141-151.
26. Pease, M. C., "The indirect binary n-cube microprocessor array," IEEE Trans. Comput., Vol. C-26, No. 5, May 1977, pp. 458-473.
27. Reddi, S. S., and E. A. Feustel, "A restructurable computer system," IEEE Trans. Comput., Vol. C-27, No. 1, Jan. 1978, pp. 1-20.
28. Siegel, H. J., "Analysis techniques for SIMD machine interconnection networks and the effects of processor address masks," 1975 Sagamore Computer Conf. on Parallel Processing, Aug. 1975, pp. 106-109.
29. Siegel, H. J., "Single instruction stream-multiple data stream machine interconnection network design," 1976 Int'l. Conf. on Parallel Processing, Aug. 1976, pp. 272-282.
30. Siegel, H. J., "Analysis techniques for SIMD machine interconnection networks and the effects of processor address masks," IEEE Trans. Comput., Vol. C-26, No. 2, Feb. 1977, pp. 153-161.
31. Siegel, H. J., "The universality of various types of SIMD machine interconnection networks," 4th Annual Symposium on Computer Architecture, Mar. 1977, pp. 70-79.
32. Siegel, H. J., "Preliminary design of a versatile parallel image processing system," Third Biennial Conf. on Computing in Indiana, April 1978, pp. 11-25.
33. Siegel, H. J., "Partitionable SIMD computer system interconnection network universality," 16th Annual Allerton Conf. on Communication, Control, and Computing, Oct. 1978.
34. Siegel, H. J., and S. D. Smith, "Study of multistage SIMD interconnection networks," 5th Annual Symposium on Computer Architecture, Apr. 1978, pp. 223-229.
35. Siegel, H. J., P. T. Mueller, Jr., and H. E. Smalley, Jr., Preliminary Design Alternatives for a Versatile Parallel Image Processor, School of Electrical Engineering, Purdue University, Technical Report TR-EE 78-32, June 1978.
36. Siegel, H. J., P. T. Mueller, Jr., and H. E. Smalley, Jr., "Control of a partitionable multimicroprocessor system," 1978 Int'l. Conf. Parallel Processing, Aug. 1978, pp. 9-17.
37. Siegel, H. J., and P. T. Mueller, Jr., "The organization and language design of microprocessors for an SIMD/MIMD system," Second Rocky Mt. Symp. on Microcomputers, Aug. 1978, pp. 311-340.
38. Siegel, H. J., R. J. McMillen, P. T. Mueller, Jr., and S. D. Smith, A Versatile Parallel Image Processor: Some Hardware and Software Problems, School of Electrical Engineering, Purdue University, Technical Report TR-EE 78-43, Oct. 1978.
39. Smith, S. D., and H. J. Siegel, "Recirculating, pipelined, and multistage SIMD interconnection networks," 1978 Int'l. Conf. Parallel Processing, Aug. 1978, pp. 206-214.
40. Stone, H. S., "Parallel processing with the perfect shuffle," IEEE Trans. Comput., Vol. C-20, No. 2, Feb. 1971, pp. 153-161.
41. Stone, H. S., "Parallel Computers," in Introduction to Computer Architecture, H. S. Stone, ed., S.R.A., 1975.
42. Sullivan, H., T. R. Bashkow, and K. Klappholz, "A large scale homogeneous, fully distributed parallel machine," 4th Annual Symposium on Computer Architecture, Mar. 1977, pp. 105-124.
43. Swan, R. J., S. H. Fuller, and D. P. Siewiorek, "Cm*: a modular, multi-microprocessor," Nat'l. Computer Conf., June 1977, pp. 637-644.
44. Swan, R. J., et al., "The implementation of the Cm* multi-microprocessor," Nat'l. Computer Conf., June 1977, pp. 645-655.
45. Thurber, K. J., "Interconnection networks - a survey and assessment," Nat'l. Computer Conf., May 1974, pp. 909-919.
46. Thurber, K. J., "Circuit switching technology: a state-of-the-art survey," Compcon 78, Sept. 1978.
47. Thurber, K. J., and L. D. Wald, "Associative and parallel processors," ACM Computing Surveys, Vol. 7, No. 4, Dec. 1975, pp. 215-255.
48. Widdoes, L. C., Jr., "The Minerva multi-microprocessor," 3rd Annual Symposium on Computer Architecture, Jan. 1976, pp. 34-39.
49. Wittie, L. D., "Efficient message routing in mega-micro-computer networks," 3rd Annual Symposium on Computer Architecture, Jan. 1976, pp. 136-140.
50. Wulf, W. A., and C. G. Bell, "C.mmp - a multi-miniprocessor," FJCC, Dec. 1972, pp. 765-777.
