[12] INMOS. Trans...
[13] D.C. Meier. Automatische Konfiguration von kommunizierenden sequentiellen Prozessen. Master's thesis, University of Berne, 1990.
7 Conclusions

Investigations have shown that some of these aspects may already be treated in the compilation phase of program development [13]. Regarding the mapping problem in such an operating system, the Dmapper seems to be a good candidate, because it
- is very simple and easy to implement
- is very efficient
The performance analyser is based on the principle of simulated execution of a parallel occam program on a single processor and provides:
- the exact CPU-time of each parallel process
- the number of communications between the processes
- the number of bytes transferred along each communication channel
It thus delivers accurate load and communication cost information.
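The three quantities listed above can be pictured as simple accumulators over a trace of a simulated run. The following Python sketch (used here instead of occam for brevity) is only an illustration: the event tuples `("run", ...)` and `("send", ...)`, the `Profile` class and the `analyse` function are hypothetical names and do not describe the actual MARC performance analyser.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Cost information gathered from one simulated execution (hypothetical layout)."""
    cpu_time: dict = field(default_factory=lambda: defaultdict(float))
    comm_count: dict = field(default_factory=lambda: defaultdict(int))
    channel_bytes: dict = field(default_factory=lambda: defaultdict(int))


def analyse(events):
    """Accumulate load and communication costs from a list of trace events.

    Assumed event formats (not taken from the paper):
      ("run", process, seconds)
      ("send", sender, receiver, channel, nbytes)
    """
    profile = Profile()
    for event in events:
        if event[0] == "run":
            _, process, seconds = event
            profile.cpu_time[process] += seconds
        elif event[0] == "send":
            _, sender, receiver, channel, nbytes = event
            # Count communications per unordered pair of processes.
            pair = tuple(sorted((sender, receiver)))
            profile.comm_count[pair] += 1
            # Count bytes per communication channel.
            profile.channel_bytes[channel] += nbytes
    return profile
```

A mapper could then read `cpu_time` as the load of each process and `channel_bytes` as the weight of each communication channel.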
having their other end on a remote Transputer. Figure 6 shows the principal structure of the routing processes present on each processor node.
[Figure 6: routing processes on a processor node. Labels in the figure: mux, demux, r, w, and in/out for links 0 to 3.]
    SEQ i = 0 FOR iterations
      SEQ
        ... exchange process location information
            and processes with direct neighbors
        ... choose a process causing high local communication
            costs and send it to neighboring processor
        {{{ balance load locally
        SEQ l = 0 FOR links
          VAL load.diff IS my.load - load[l] :
At the beginning, all the processes are placed on an arbitrary processor (except those associated with a restriction). At each iteration step, the neighboring processors exchange their current knowledge about their load and the probable position of all the processes, and processes are then moved to reach a local load equilibrium. If the load of a processor A is greater than that of its neighbor B, a part of the load difference is given to B (see Figure 4).
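The exchange rule described above can be sketched as a diffusion step. The Python code below (used here instead of occam for brevity) is a minimal illustration under stated assumptions, not the Dmapper itself: load is treated as a divisible quantity rather than as whole processes, all processors update synchronously, and the fraction `alpha` of the load difference that is handed over is a hypothetical parameter, since the text only says that "a part" of the difference is given away.

```python
def diffusion_step(load, neighbors, alpha=0.25):
    """One synchronous iteration of local load balancing.

    load      -- list of current loads, indexed by processor number
    neighbors -- dict mapping a processor to its direct neighbors
    alpha     -- assumed fraction of the load difference given to a
                 less loaded neighbor (hypothetical parameter)
    """
    new_load = list(load)
    for a, nbrs in neighbors.items():
        for b in nbrs:
            diff = load[a] - load[b]
            if diff > 0:
                # A is more loaded than B: give part of the difference to B.
                transfer = alpha * diff
                new_load[a] -= transfer
                new_load[b] += transfer
    return new_load


def balance(load, neighbors, iterations, alpha=0.25):
    """Repeat the local exchange for a fixed number of iterations."""
    for _ in range(iterations):
        load = diffusion_step(load, neighbors, alpha)
    return load
```

For example, on a ring of four processors with all load initially on processor 0, `diffusion_step([8.0, 0.0, 0.0, 0.0], {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]})` moves two units to each of the two neighbors of processor 0. The total load is preserved by every step; a value of `alpha` that is too large for the number of neighbors can make the loads overshoot instead of settling.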
making use of the parallel hardware itself to increase the speed of the mapping
Load balancing: The processes have to be distributed such that the overall system load is well balanced, i.e. that the load caused by the processes is distributed evenly over all the processors.
Communication minimization: The communication capacity between any two processors should be used as optimally as possible, i.e. the overall communication should be distributed as evenly as possible over all communication links.
Figure 3 shows the mapping of a 4 × 4 mesh of processes onto a 2 × 2 dimensional processor mesh. Since mapping a set of communicating processes onto processor networks is known
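The two mapping objectives above can be made concrete by evaluating a given placement. The following Python sketch is an illustration only: the function `evaluate_mapping`, its arguments and the imbalance measure (maximum minus minimum load) are assumptions introduced here, and traffic between two processors is attributed to a single direct link, which ignores multi-hop routing.

```python
from collections import defaultdict


def evaluate_mapping(placement, process_load, traffic, processors):
    """Evaluate a placement against both objectives (hypothetical helper).

    placement    -- dict: process -> processor
    process_load -- dict: process -> CPU load
    traffic      -- dict: (process, process) -> bytes exchanged
    processors   -- iterable of all processors, including unused ones
    """
    # Load balancing: sum the load of the processes placed on each processor.
    load = {p: 0.0 for p in processors}
    for process, processor in placement.items():
        load[processor] += process_load[process]

    # Communication: traffic between processes on the same processor is
    # local; everything else is charged to the (assumed direct) link.
    link_traffic = defaultdict(int)
    for (src, dst), nbytes in traffic.items():
        a, b = placement[src], placement[dst]
        if a != b:
            link_traffic[tuple(sorted((a, b)))] += nbytes

    imbalance = max(load.values()) - min(load.values())
    return load, dict(link_traffic), imbalance
```

A small imbalance indicates a well balanced load, and comparing the values in the returned link traffic shows how evenly the communication is spread over the links.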
[Figure: overview of the MARC system. Labels in the figure: PARALLEL PROGRAM, PARALLEL HARDWARE, PERFORMANCE ANALYSER, MAPPER, ROUTING, CONFIGURING, Performance Analysis, Load and communication optimized Mapping, Routing System Generation, Routing Tables, Program Configuration, OPTIMALLY DISTRIBUTED PROGRAM.]
implementations. It expects from the user a parallel program in the form of a set of communicating processes and subsequently produces an efficient configuration of this program for execution on a multiprocessor machine with arbitrary interconnections. In a first version, the MARC system [3], [5], [6] has been realized for programs written in the language occam [10] and for Transputer networks [12] as target machines. The system analyses the structure of a parallel program and the structure of the available parallel architecture in order to produce a load balanced and communication optimized executable program. It includes a new method for a load balanced and communication optimized process distribution onto arbitrary (network) topologies as well as efficient and secure routing strategies. The MARC project aims towards a true distributed operating system and development
Peter G. Kropf and Jacques E. Boillat
Institute for Informatics and applied Mathematics
University of Berne
Länggassstrasse 51