PARALLEL COMPUTATION OF POLYMER FLOW THROUGH A DOUBLE SCREW EXTRUDER *





Yansheng Jiang*, J.D.J. Grinwis†, A.J.N. Vreenegoor† and F.J.J. Rosendal†



* LTAS-Mécanique de la Rupture, Institut de Mécanique, Université de Liège, Rue Ernest Solvay 21, B-4000 Liège, Belgique
e-mail: [email protected], web page: http://isis.ltas.ulg.ac.be/dang/index.html

† Shell Research & Technology Centre at Amsterdam (SRTCA), PO Box 3800, 1030 BN Amsterdam, Netherlands

Key words: Polymer Flow, Parallel Computation, Double-Screw Extruder

Abstract. This paper describes numerical simulations of polymer flow through a double-screw extruder of commercial scale used to manufacture a Shell polymer product named CARILON. A large problem (more than 250,000 variables) of polymer flow through a complex 3D geometry has been solved with a parallel version of POLYFLOW on a 30-node SP2 at SRTCA. A hybrid direct/iterative solver is used and proved to be efficient. The beneficial impact of parallel computing on day-to-day industrial CAE applications is demonstrated.


1 INTRODUCTION

Although parallel computing in polymer flow simulation has been an active academic research domain for some ten years1,2,3,4,5,6, its acceptance by industry as a day-to-day numerical simulation tool in CAE is only a very recent development. This is partly because parallel computers had long been too expensive and too user-unfriendly. The situation changed when a new generation of parallel computers, such as the SP2 of IBM and the Power Challenge of SGI, arrived on the market in the 1990s. The former is a typical distributed-memory parallel platform and the latter a shared-memory one. The SP2 is made of a number of standard, relatively cheap RS6000/360 or 590 nodes linked together by a switch of very high communication bandwidth. Aware of the progress in parallel computers and of the coming era of parallel computing in industry, POLYFLOW invested a large effort in developing its parallel version. The work was supported by the European Commission under the EUROPORT project and lasted three years from the beginning of 1994. The parallelised POLYFLOW was tested on a number of benchmarks of small and medium scale. On the other hand, industrial customers of POLYFLOW such as Shell International, which already owned an SP2, were keen to use parallel CFD software for their large-scale polymer flow simulations. This mutual need brought POLYFLOW and Shell together, and both entered the EUROPORT-D project, whose objective was to demonstrate the impact of parallel computing on modern industry.

In the following sections, the polymer extrusion process simulated in the current paper is first described in order to give a comprehensive insight into the physical problem. The numerical simulations performed by Shell in the past are then briefly introduced, which makes clear the limits of sequential computing. The parallel algorithms of POLYFLOW are described in some detail, as they are necessary to understand the techniques used in the current parallel simulations. Finally, the physical results of our simulations demonstrate the impact of parallel computing on industrial applications.

2 POLYMER EXTRUSION

Raw polymer is normally produced in the form of powder. The bulk material is then sold to customers in the form of "pellets", small cylindrical pieces of material. The conversion of the polymer powder into these pellets is done by means of very large "double-screw extruders". Such a machine typically consists of two electrically driven co-rotating screws surrounded by a steel barrel. Through an opening on the top side of the barrel, the powder is continuously fed into the extruder. The rotating screws convert electrical power into mechanical energy used to melt the powder at 200-250 °C and to build up pressure while the polymer melt is transported towards the screw tips. At this location, the flow channel changes into a cylinder that guides the polymer towards a circular slit which is closed by a "die plate": a metal plate that contains a circular pattern of holes. Polymer strands flow from these holes and are cut by the rotating knife of a "pelletizer", much as grass is cut by a lawn mower. The result of this process is the required pellets.


[Figure annotations: entry section of 8 shape; current simulation part]

Figure 1. A typical double-screw extruder. The main components, the extruder itself and the pelletizer, are detached for clarity. The screws are not shown; their tips enter from the left through the visible 8-shape section.

Traditionally, the design of commercial scale extruders is based on prior experience. Although such experience is very important, it is difficult to use when scaling up existing extruders to larger machines or when designing for a new product. Shell was recently faced with this situation with the introduction of CARILON after ten years of research, the first production plant having opened at Carrington (UK) in 1996. Compared to classical polymers, CARILON has different melt flow properties, and standard extrusion equipment required a substantial re-design. To preserve its superior mechanical properties, it is of utmost importance that CARILON not be exposed to too high a temperature for too long. Otherwise, the result would be a lower quality product and machine fouling which, eventually, requires the extruder to be stopped and cleaned: a major exercise requiring substantial manpower and bringing the pelletizer down for a couple of days. Experiments are very expensive: performing temperature measurements at the Carrington plant requires the shipping of a couple of tons of CARILON powder, the manufacturing of a die plate with 30 thermocouples connected to a data logger, and a team of Shell engineers from various locations to run the extruder and operate the experimental equipment.


3 NUMERICAL SIMULATIONS IN THE PAST

Because experiments are so expensive, SRTCA relies heavily on numerical CFD simulations (based on the POLYFLOW software), which enable a non-intrusive inspection of the 3D flow domain. Numerical simulations have been performed in the past by Shell to predict temperature distributions with a sequential version of POLYFLOW. However, high computing time and lack of memory are the limiting factors for realistic and accurate polymer simulation. Even the largest workstation available at SRTCA, a fat SP2 node with 1 Gbyte of RAM, was far too small to simulate the flow through the complete core part of the extruder (Figure 2). It is estimated, as shown in Table 1, that even with sufficient memory available, the solution of this problem (250,000 DOF) would take more than one week, unacceptably long for Shell as part of its design optimisations. On the other hand, this particular simulation was very important: temperature measurements indicated two "hot spots" on opposite sides of the die plate, giving evidence of the presence of areas of unacceptably high shear somewhere inside the extruder. In the past, Shell was forced to simplify the situation to be able to perform simulations. For instance, instead of using the full geometry, simulations were restricted to the most critical sub-region (surrounding the screw tips, see Figure 2) using coarse meshes of just 34,000 DOF and with unrealistic assumptions about certain physical parameters to avoid numerical instabilities. Although only a crude approximation, the results still allowed qualitative analyses of high-shear areas and also identified areas of high "residence time" where the polymer melt was entrapped and would eventually foul the extruder. However, it was strongly desired to extend the simulation domain to the whole downstream flow channel towards the die exit (Figure 2).


Figure 2. Extended domain of simulation (annotations in the figure mark the simulation domain used in the past and the die exit)

The extension of the simulation domain leads to a drastic increase of the problem size, as shown in Table 1.

Table 1. Size of the original and extended problems

Problem     Nb. variables   Frontal width   Time on one RS6000/590
Original    34049           2165            5 hours
Extended    254698          3410            7 days (estimated)


Furthermore, the solution of the extended problem would require as much as 8 Gbytes of free disk space to store the stiffness matrix after the LU decomposition. This requirement is practically impossible to satisfy. Because of all these difficulties, numerical simulations on the extended domain had not been possible with POLYFLOW until its parallel version became operational on the SP2 of SRTCA.
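As a rough cross-check (a back-of-the-envelope estimate of ours, not a figure from the paper), the LU factor of a frontal solver can be estimated as roughly one frontal-width row of 8-byte reals per eliminated unknown; with the numbers of Table 1 this indeed lands close to the 8 Gbytes quoted above:

```python
# Back-of-the-envelope estimate (assumption: roughly one frontal-width row of
# 8-byte reals is stored per eliminated unknown in a frontal LU factorisation).
n_dof = 254_698        # unknowns of the extended problem (Table 1)
frontal_width = 3_410  # frontal width of the extended problem (Table 1)
bytes_per_real = 8

factor_gbytes = n_dof * frontal_width * bytes_per_real / 1e9
print(f"estimated LU factor size: {factor_gbytes:.1f} GB")  # roughly 7 GB
```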

4 DESCRIPTION OF THE PHYSICAL PROBLEM

The polymer flow considered in this study is laminar, incompressible and steady. An analysis of the boundary conditions and material parameters revealed the following orders of magnitude for the dimensionless numbers of interest:

Reynolds number: Re = O(10⁻⁴), Galileï number: Ga = O(10⁻⁶), Péclet number: Pe = O(10⁴)

The very small value of Re indicates that, as expected, viscous forces dominate over inertial forces. The small value of Ga guarantees that the gravitational force may be safely neglected. The high value of Pe shows that the convective heat flux dominates the diffusive flux and might be a source of numerical instabilities. Based on this analysis, the continuity, momentum and energy equations may be written as follows:

∇ ⋅ v = 0                                   (1)

σ = −pI + T                                 (2)

∇ ⋅ σ = ρ v ⋅ ∇v                            (3)

ρCp v ⋅ ∇T = T : ∇v + ∇ ⋅ (k∇T)             (4)

where T stands for the extra-stress tensor, T the temperature, ρ the density of the fluid, Cp the heat capacity and k the heat conduction coefficient. The polymer in this study is considered as a generalised Newtonian fluid. Its constitutive equation reads

T = 2η(γ̇, T) D                              (5)

where D is the rate-of-deformation tensor defined by

D = (1/2)(∇v + ∇vᵀ)                         (6)

and η is the shear viscosity, which varies with the temperature T and the shear rate γ̇. The explicit expression of η has been obtained by fitting experimental data. It can be written as the product of two functions:


η = F(γ̇, T) ⋅ H(T)                          (7)

Notice that the temperature T appearing in F and in H results in the so-called horizontal and vertical shifting of the power-law viscosity curve with respect to the curve at a reference temperature.
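As an illustration of this product form (a purely hypothetical sketch: the power-law constants, activation energy and reference temperature below are invented for the example and are not the fit actually used for CARILON), F can be taken as a power law evaluated at a temperature-shifted shear rate and H as an Arrhenius-type shift factor:

```python
import numpy as np

def H(T, T_ref=503.15, E_over_R=5000.0):
    """Arrhenius-type temperature shift factor (vertical shift). Hypothetical constants."""
    return np.exp(E_over_R * (1.0 / T - 1.0 / T_ref))

def F(gamma_dot, T, eta0=1.0e4, n=0.35, T_ref=503.15, E_over_R=5000.0):
    """Power-law part, evaluated at the temperature-shifted shear rate
    a_T * gamma_dot (horizontal shift)."""
    a_T = H(T, T_ref, E_over_R)
    return eta0 * (a_T * gamma_dot) ** (n - 1.0)

def eta(gamma_dot, T):
    """Shear viscosity eta = F(gamma_dot, T) * H(T), as in equation (7)."""
    return F(gamma_dot, T) * H(T)

# example: viscosity (Pa.s) at a shear rate of 100 1/s and a melt at 240 deg C
print(eta(100.0, 240.0 + 273.15))
```

With such a fit, raising the temperature lowers the whole viscosity curve through H (vertical shift) and moves it along the shear-rate axis through the shifted shear rate (horizontal shift), which is the behaviour described above.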

5 PARALLEL ALGORITHMS IN POLYFLOW

POLYFLOW is an implicit finite element code specialised in the simulation of industrial flow processes at low Reynolds numbers involving rheologically complex liquids. Extrusion, co-extrusion, fibre spinning, blow moulding, die design, laminar mixing, coating, forced and natural convection and heat transfer are typical industrial processes which can be simulated with POLYFLOW. The fluids encountered in processing, such as polymer solutions and polymer melts, rubber, food products, oil, detergents and suspensions, exhibit a highly non-linear behaviour which classifies them as non-Newtonian fluids.

The parallel version of POLYFLOW has been under development since 1994 within the framework of the Europort-1 project. The parallelisation is implemented in POLYFLOW in a machine-independent manner based on the message-passing library PARMACS. The current parallel version of POLYFLOW supports all distributed-memory platforms without modification.

The parallelism of POLYFLOW is based on domain decomposition. POLYFLOW embeds CHACO, a fully automatic domain decomposition program, in its interactive data pre-processor POLYDATA. The user only needs to specify the desired number of sub-domains and can then visualise the resulting decomposition. POLYDATA minimises the computational cost in each sub-domain by re-sorting the element numbers.

When running in a parallel environment with either distributed or shared memory, the n POLYFLOW processes are divided into three execution modes: one process runs in the role of the so-called Task Broker, one process runs as Master and the remaining (n-2) processes work as Slaves. The Slave processes work in a purely passive way and perform tasks received from the Task Broker. The Master process reads the data file, generates computational tasks according to the problem definition and then enters Slave mode. The Task Broker collects the tasks generated by the Master process in its task stack and distributes them to the slave processes, as well as to the master process, as a function of the availability of the data they depend on.

The management of M tasks and N processes in a parallel computation is quite similar to that of M clients (= tasks) and N employees (= processes) in a bank. In the latter case, the M clients form a queue in which they wait to be served by an employee. Whenever an employee is free, he or she picks up a client, if any, from the queue. The difference, however, is that in a bank the clients in the queue are served in FIFO (first in, first out) order, while the computational tasks have inter-dependences among them. A task can be performed only if the data it depends on are available. Checking the availability of data is the role of the Task Broker. For a given task, the locality of the data it depends on is a determining factor in its assignment, as illustrated in the sketch below.
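The following is a minimal, hypothetical sketch of this dispatching logic (the actual Task Broker of POLYFLOW is written in C++ and is considerably more elaborate): a task is handed out only once the data it needs are available, and preferably to a process that already holds those data.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    needs: set       # names of the data items this task depends on
    produces: str    # name of the data item this task creates

@dataclass
class Process:
    rank: int
    local_data: set = field(default_factory=set)  # data already held locally

def broker(tasks, processes):
    """Assign tasks in dependency order, preferring data locality."""
    pending = deque(tasks)
    available = set()   # data items produced so far
    schedule = []
    while pending:
        # pick a task whose input data are all available
        ready = next((t for t in pending if t.needs <= available), None)
        if ready is None:
            raise RuntimeError("cyclic or unsatisfiable task dependencies")
        pending.remove(ready)
        # prefer the process already holding the needed data;
        # the -len(...) term is a crude load-balancing tie-break
        proc = max(processes,
                   key=lambda p: (len(p.local_data & ready.needs),
                                  -len(p.local_data)))
        proc.local_data |= ready.needs | {ready.produces}
        available.add(ready.produces)
        schedule.append((ready.name, proc.rank))
    return schedule

# example: eliminate two sub-domains, then solve their interface system
tasks = [
    Task("eliminate sub-domain 1", set(), "schur_1"),
    Task("eliminate sub-domain 2", set(), "schur_2"),
    Task("interface solve", {"schur_1", "schur_2"}, "solution"),
]
print(broker(tasks, [Process(0), Process(1)]))
```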


Assigning tasks according to data locality avoids, as much as possible, the transfer of data between processes. The Task Broker has been written in C++ with an object-oriented approach inspired by the bank simulation system described by Haythorn7. A detailed description of the Task Broker can be found in Jiang8.

Two kinds of parallel solvers have been implemented in POLYFLOW9: the direct solver and the hybrid direct/iterative solver (sometimes simply called the iterative solver). Both solvers rely on domain decomposition for parallelism but differ in the technique used for solving the variables along the interfaces of the sub-domains. For the direct solver, the dependence between the data of different sub-domains may be represented by a complete binary tree. An example is shown in Figure 3 for the case of 4 sub-domains of a simple square domain.

[Figure panels: decomposition of a square domain into 4 sub-domains; resulting binary graph with a root node, parent nodes and leaf nodes 4 to 7]

Figure 3. Domain decomposition and its tree for the full direct solver

In this example (but without loss of generality), the grid is decomposed into 4 sub-domains numbered from 4 to 7. Variables are condensed as long as they do not lie along the interfaces. The remaining matrix, formed by the coefficients related to the interface variables, is called the Schur complement. It is a fully populated matrix and of impressive size in industrial 3D problems. The frontal elimination can no longer be performed without knowledge of the Schur complements of the neighbouring sub-domains. The Schur complements assembled from two neighbouring sub-domains form a new system of equations to be solved. This is represented in the binary tree by a node one level up, also called a parent node. This process of condensing the eliminable variables and passing the remaining ones as a Schur complement to the parent node continues until the root node is reached, where all remaining variables may be solved completely. The memory required throughout the process depends on the maximum frontal width among all nodes. Unfortunately, commonly used tools for domain decomposition more often than not generate too large a frontal width on the parent nodes, so that the active matrix may no longer fit into the core memory of the computer. This drawback constitutes a bottleneck in the application of the direct solver to practical large problems.
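To fix ideas, the condensation step at a leaf node can be sketched with dense matrices as follows (illustrative only, with an assumed interior/interface index partitioning; POLYFLOW performs this with a frontal technique on sparse element contributions): the interior unknowns (i) of a sub-domain are eliminated, and the Schur complement and condensed right-hand side for the interface unknowns (b) are passed to the parent node.

```python
import numpy as np

def condense(A, b, interior, interface):
    """Eliminate the interior unknowns of one sub-domain.

    Returns the Schur complement S = A_bb - A_bi A_ii^{-1} A_ib and the
    condensed right-hand side g = b_b - A_bi A_ii^{-1} b_i, i.e. what a
    leaf node passes up to its parent node.
    """
    Aii = A[np.ix_(interior, interior)]
    Aib = A[np.ix_(interior, interface)]
    Abi = A[np.ix_(interface, interior)]
    Abb = A[np.ix_(interface, interface)]
    S = Abb - Abi @ np.linalg.solve(Aii, Aib)
    g = b[interface] - Abi @ np.linalg.solve(Aii, b[interior])
    return S, g

def recover_interior(A, b, interior, interface, x_interface):
    """Back-substitution for the interior unknowns once the interface is solved."""
    Aii = A[np.ix_(interior, interior)]
    Aib = A[np.ix_(interior, interface)]
    return np.linalg.solve(Aii, b[interior] - Aib @ x_interface)
```

The recover_interior helper shows the corresponding back-substitution once the interface values have been computed higher up in the tree.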


However, the direct solver has the advantage of robustness for non-elliptic problems.

In our opinion and experience, the direct/iterative solver is the most promising one for solving large problems such as the current one. At the first stage, Schur complements are formed at the leaf nodes in exactly the same way as for the full direct solver. The difference is that there is only one parent node, and the system of equations composed of all the Schur complements is solved with an iterative solver. To do that, we use a modified GMRES technique scaled with the inverse of the diagonal terms of the Schur complements. This completely avoids the memory bottleneck encountered in the direct solver, as all the intermediate parent nodes are removed. The Schur complements need not be transferred at all. The matrix-vector products, which represent the essential computational effort in the iterative solver, are also performed in parallel. Figure 4 shows an example of a decomposition and the resulting flat tree.
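For illustration, the interface solve of this hybrid approach might look as follows (a sketch under assumptions: the per-sub-domain Schur complements are dense blocks with known interface index maps, the matrix-vector products are written here serially although each term could be evaluated by the process owning that sub-domain, and the scaling is a simple Jacobi preconditioner built from the assembled diagonal):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def interface_solve(schur_blocks, rhs_blocks, index_maps, n_interface,
                    restart=1500):
    """Solve the assembled interface system with diagonally scaled GMRES,
    without ever forming the global Schur complement explicitly."""
    g = np.zeros(n_interface)
    diag = np.zeros(n_interface)
    for S_k, g_k, idx in zip(schur_blocks, rhs_blocks, index_maps):
        g[idx] += g_k
        diag[idx] += np.diag(S_k)

    def matvec(x):
        # sum of per-sub-domain contributions S_k x restricted to its interface
        y = np.zeros_like(x)
        for S_k, idx in zip(schur_blocks, index_maps):
            y[idx] += S_k @ x[idx]
        return y

    A = LinearOperator((n_interface, n_interface), matvec=matvec)
    M = LinearOperator((n_interface, n_interface), matvec=lambda x: x / diag)
    x, info = gmres(A, g, M=M, restart=restart)
    if info != 0:
        raise RuntimeError(f"GMRES did not converge (info={info})")
    return x
```

In this form only interface-sized vectors need to be exchanged between processes, which mirrors the remark above that the Schur complements themselves need not be transferred.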

[Figure panels: decomposition into 4 sub-domains; resulting flat graph with a single root node and leaf nodes 2 to 5]

Figure 4. Domain decomposition and its graph for the direct/iterative solver

A drawback of the iterative solver based on the GMRES technique lies in the memory requirement for the computation defined on the root node: the Krylov space increases as the square of the total number of interface variables. The difference compared to the direct solver, however, is that this is known a priori, so that one can arrange for a process with a large memory to perform the iterative computation. That means one needs a heterogeneous parallel system composed of one "fat" node with a huge memory and a number of "thin" nodes equipped with a normal amount of memory. Fortunately, the configuration of the SP2 of SRTCA, shown in Table 2, satisfies exactly what we need.

Table 2. Configuration of the SP2 of SRTCA

# nodes   RAM/node   Disk/node   Swap space
28 thin   192 MB     400 MB      300 MB
2 fat     1024 MB    0           1500 MB


Note that the above statement concerns only parallel computers with distributed memory, such as the SP2 and clusters of workstations. For shared-memory multiprocessor computers such as the Silicon Graphics Power Challenge series, the memory requirement of the iterative solver is usually satisfied, because those computers are generally equipped with a very large core memory. The essential drawback of the iterative solver resides in its relatively weak robustness for non-elliptic problems compared to the direct solver. This can be improved by carefully scaling down the condition number of the system of equations. The current version of POLYFLOW needs the user's explicit knowledge of the problem to be solved for this scaling. This will be improved in future releases.

6 PARALLEL COMPUTATIONAL ASPECTS

For the current problem, the finite element mesh is composed of 18,240 brick and wedge elements. Application of the finite element discretisation to the governing equations, together with the boundary conditions, generates a system of equations with 254,698 degrees of freedom. As explained previously, parallel computing is the only way to take on such challenging problems. Although the parallel version of POLYFLOW had been subjected to a number of rigorous tests of industrial scale through the Europort-1 project, the simulation of the polymer flow on the extended domain depicted in Figure 2 constitutes the largest industrial problem that POLYFLOW has ever solved. At the current state of the art, parallel computing is not yet as easy as using a black box, especially when solving a large problem for the first time: the memory limit of a node may be exceeded, or a local disk may not be sufficient. In general, given an existing parallel computing system, we, as ordinary users, can do nothing to change the system configuration to fit our special requirements. All we can do is to vary the number of sub-domains, to improve the quality of the domain decomposition or, in case of failure of the direct solver due to hardware limitations, to turn to the iterative solver instead. In short, a careful investigation of the solution feasibility is often necessary. Our investigation of the current problem is summarised in the following table of domain decompositions.

Table 3. Domain decomposition

Nb. of sub-domains   Maximum frontal width   Size of active matrix (MB)   Maximum buffer size/node (MB)   Nb. of interface variables   Memory for iterative solver (MB)
8                    5027                    193                          1059                            ----                         ----
16                   3942                    119                          358                             22804                        247
20                   3625                    101                          282                             26397                        356
32                   2926                    66                           151                             33443                        542


Compared with the configuration of the SP2 of SRTCA given in Table 2, it is clear that the decomposition into 8 sub-domains is not feasible, because both the memory and the local disk would be exceeded. Nor is it of interest to use the decomposition into 32 sub-domains, as there are only 30 nodes in total. Although a better load balance may be obtained when the number of sub-domains is higher than the number of nodes, there is a risk of running out of local disk space on the node that performs the elimination on the most sub-domains. Therefore, the valid decompositions are those into 16 and 20 sub-domains. The latter was chosen for our final computation for reasons of safety margin in terms of both memory and disk space. The configuration used in the computation is shown in Table 4. Note that the lack of a local disk prevented the fat node from taking on elimination tasks on sub-domains; otherwise, we would need only 20 thin nodes (19 for the slaves and one for the task broker) in total.

As for the solution strategy, the Picard iteration scheme is applied to the viscosity law, which shows a quite low power-law index. All nominal values of the boundary conditions and material parameters were used in the simulation, except the heat conduction coefficient, which was taken 100 times higher. This was done to reduce the convective heat flux, which proved to be too strong to reach a reasonable solution without creating numerical instabilities. This difficulty may be overcome by using an effective upwinding technique to be implemented in a future release of POLYFLOW.

Table 4. Configuration used in the current simulations

Number of nodes   Role in computation   Tasks
1 thin            task broker           generate master and slave processes; distribute computational tasks
1 fat             master                iterative solver for interface variables
20 thin           slave                 direct solver on sub-domains; matrix-vector products in the iterative solver

The finite element discretisation of the current polymer flow problem leads to a system of non-linear equations for the unknown nodal values of velocity, temperature and pressure, all of which are coupled together. The solution of such a system of equations with POLYFLOW relies on a Newton-Raphson scheme, which iterates from an initial solution towards a converged solution. Ten Newton-Raphson iterations were necessary to reach an acceptable final solution. In each Newton-Raphson iteration, the direct solver is used on all sub-domains to generate the Schur complements, and the iterative solver is then used to solve for all the interface variables. On average, 1250 iterations were necessary for the iterative solver to reach a converged solution. The Krylov space used in the iterative solver had been fixed at 1500, which proved to be large enough to avoid restarting the iterations10.


All our computational jobs had exclusive access to the 22 reserved nodes. The elapsed time per Newton-Raphson iteration was 90 minutes, of which 40 minutes were spent in the iterative solver. The total elapsed time was thus 15 hours for the ten iterations. Compared to the estimated sequential computing time of 7 days (about 168 hours), the speedup of the current parallel computation is therefore greater than 10.

7 NUMERICAL RESULTS

As one of the results of major interest, the temperature distribution is shown in Figure 5. Across the 8-shape entry section, the polymer liquid flows in at a constant temperature. Because of viscous dissipation, mainly around the co-rotating double screws, the polymer is heated there, and the hot polymer is then transported by the convective flow towards the die exit, forming two "hot spots". This result has been confirmed by experimental measurements.

Figure 5. Temperature distribution

In Figure 6, it can be seen that the polymer particles flowing through the hot spots come from the inter-screw region, where the shear and the viscous heating are largest.

Figure 6. Magnitude of the velocity across the inlet and outlet sections of the flow domain, and particle traces

8 CONCLUSIONS

Parallel computation with POLYFLOW using 22 nodes of the SP2 at SRTCA has made it possible to solve a large problem of polymer flow through a complex 3D geometry. A hybrid direct/iterative solver was used and proved to be efficient. For this problem of more than 250,000 DOF, the solution time of 7 days that a sequential computation would have required has been reduced to only 15 hours with parallel computation. This constitutes a major improvement for numerical simulations of large problems to be accepted in engineering design.


The simulation has given a much clearer insight into the flow of the polymer melt through the whole flow domain, from the co-rotating double screws up to the die exit. The results confirm the inhomogeneous temperature distribution at the die exit measured in practice and could explain the occasional outflow of degraded polymer at two specific positions. A decomposition into 20 sub-domains proved to be a good compromise between computational efficiency and the memory and local disk space limitations. More extensive simulations, which take into account the residence time effect coupled with all the other variables, may be feasible on the SP2 at SRTCA. Improvements are still needed in the user-friendliness of the parallel version of POLYFLOW, which currently requires skilful users to handle the distributed parallel platform.

Parallel computing now allows the complete extrusion process to be efficiently simulated on a fine mesh, involving higher-order elements and correct values of the physical parameters. Shell is very satisfied with these results, since they are more reliable and much closer to the actual situation in practice. In particular, they relate directly to the temperature measurements at the die plate. Smaller test cases can now even be closely integrated into a design cycle, while large ones may mainly serve for validation. Looking back, Shell estimates that more than 50% of the manpower invested in sequential simulation could have been saved, corresponding to an approximate cost of $50,000. Due to the lower predictive value of the sequential simulations, there is a high risk that new re-designs will be required for further optimisations of the Carrington pelletizer. Each re-design will cost about $200,000 for malfunctioning hardware and trouble-shooting man-hours, together with an additional $100,000 for every day of off-spec production. This risk could have been minimised if Shell had had more accurate simulation results.


In any case, future optimisations can now be done with much less effort and much higher accuracy.

REFERENCES

[1] R. Keunings, O. Zone, R. Aggarwal, Parallel Algorithms in Computational Rheology, Proc. XIth Int. Congress on Rheology, Brussels, Eds. P. Moldenaers, R. Keunings, Elsevier, 274-276 (1992)
[2] R. Aggarwal, P. Henriksen, R. Keunings, D. Vanderstraeten, O. Zone, Numerical Simulation of Non-Newtonian Flow on MIMD Parallel Computers, Computational Fluid Dynamics '92, Ch. Hirsch et al. (Eds.), Elsevier, Vol. 2, 1139-1146 (1992)
[3] R. Aggarwal, R. Keunings, Finite Element Simulation of Memory Fluids on Message Passing Parallel Computers, Parallel Computational Fluid Dynamics '92, R.B. Pelz,
[4] D. Vanderstraeten, O. Zone, R. Keunings, L. Wolsey, Non-Deterministic Heuristics for Automatic Domain Decomposition in Direct Parallel Finite Element Calculations, Proc. 6th SIAM Conf. on Parallel Processing for Scientific Computing, R.F. Sincovec et al. (Eds.), SIAM, 929-932 (1993)
[5] R. Aggarwal, R. Keunings, F.X. Roux, Numerical Simulation of Polymer Flows: A Parallel Computing Approach, Proc. 6th SIAM Conf. on Parallel Processing for Scientific Computing, R.F. Sincovec et al. (Eds.), SIAM, 79-82 (1993)
[6] O. Zone, D. Vanderstraeten, R. Keunings, A Parallel Finite Element Solver Based on Automatic Domain Decomposition Applied to Viscoelastic Flows, Proc. Parallel Computational Fluid Dynamics 1994, N. Satofuka et al. (Eds.), Elsevier, North-Holland, 297-304 (1995)
[7] W. Haythorn, What is object-oriented design?, Journal of Object-Oriented Programming, March-April (1994)
[8] Y.S. Jiang, Object oriented control of parallel computations, Proceedings of the First International Conference on Engineering Computation Technology, Edinburgh, UK, August 18-20 (1998)
[9] F. Debae, Y. Jiang, J-M. Marchal, P. Henriksen, A parallel version of POLYFLOW, in Lecture Notes in Computer Science 919, High-Performance Computing and Networking, International Conference and Exhibition, HPCN Europe 1995, Italy, May 1995, Springer
[10] Y. Saad, M.H. Schultz, GMRES: A Generalised Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comput., Vol. 7, No. 3, July 1986

ACKNOWLEDGEMENT

The work in this paper was partially supported by the European Commission under the name Europort-D, Deployment Subproject POLYFLOW/D. The first author would like to express his gratitude to GMD - Forschungszentrum Informationstechnik GmbH for the availability of an SP2 at GMD, which was used intensively for setting up the current problem.
