Massively Parallel Computation of Stiff Propagating Combustion Fronts

Marc Garbey and Damien Tromeur-Dervout

CDCSP, LAN-UMR 5585, Universite Claude Bernard Lyon 1, Bat. 101, 43 Bd du 11 Novembre 1918, 69622 Villeurbanne cedex, France
E-mail: [email protected], [email protected]


Keywords: Combustion, Parallelism, Adaptive Domain Decomposition, Chebyshev Pseudo-Spectral Method

Abstract

In this paper we study the computation of combustion fronts using MIMD architecture. Our applications in gas combustion, solid combustion as well as frontal polymerization are characterized by stiff fronts that propagate with nonlinear dynamics. The multiple scale phenomena under consideration lead to very intense computation. The understanding of the mechanism of the instabilities is the goal of our direct numerical simulation. This paper emphasizes the special use of domain decomposition and operator splitting combined with asymptotic analytical qualitative results to obtain efficient and accurate solvers for such problems.

1 Introduction

In this paper we use domain decomposition and operator splitting combined with asymptotic analytical qualitative results to obtain efficient and accurate solvers for the computation of combustion fronts using MIMD architecture. In order to illustrate these methods, we consider the two following models of combustion fronts:

- first, a classical thermo-diffusive model describing the combustion of a gas with a two-step chemical reaction [1, 21, 24, 25]. Such a model describes the appearance of cellular flames and more complex pattern formation [2, 3]. This model has been analyzed rather intensively, but few numerical results seem to be available.

- second, a model describing the reaction process of frontal polymerization. Frontal polymerization has been studied for several years in the former USSR [31, 23, 28, 29] to design new materials. New aspects of frontal polymerization are currently being investigated to design new materials that cannot be produced by classical processes [22]. We consider here an idealized model that couples a well known reaction-diffusion system that describes solid combustion - see [20] and its references - to the Navier-Stokes equations written in the Boussinesq approximation. Our interest in this

The authors want to thank the Oak Ridge National Laboratory and especially J. Dongarra for access to the iPSC/860 machines, and the Bergen University Parallel Laboratory and especially Tor Sorevik for access to the Paragon machine. These parallel codes were developed on the Paragon of the Centre pour le Developpement du Calcul Scientifique Parallele of Lyon, sponsored by the Region Rhône-Alpes.


second model is to study the interaction between a mechanism of convective instability similar to the Rayleigh-Benard instability and a mechanism of thermal instability well known in solid combustion [20, 15]. The direct numerical simulation [18] complements our analytical work in [13, 14]. We will refer to Model A and Model B respectively as the first and second items described above. These problems are characterized by stiff fronts that propagate with nonlinear dynamics. Understanding the mechanism of the instabilities is the goal of our direct numerical simulation [18, 5]. The multiple scale phenomena under consideration lead to very intense computation. The aim of this paper is to describe the numerical methods and the parallel implementation. We make special use of domain decomposition and operator splitting combined with asymptotic analytical qualitative results to obtain, on parallel computers, efficient and accurate solvers [11, 27] adapted to the nature of the solution of such problems. The plan of this paper is as follows: section 2 describes the equations of Model A and Model B. The decomposition methods and the mapping used to accurately solve the problem in the thin layers of the reaction are described in section 3. The parallel implementation is studied in detail in section 4. We also give some examples of the numerical simulations of Model A and Model B. Section 5 is our conclusion. A detailed analysis of the numerical simulation for Model B will be given in the companion paper [18].

2 Basic Models

2.1 Model A: a thermodiffusive model for gas combustion

We consider a thermodiffusive model of a two-step chemical reaction that follows the sequential reaction mechanism: A → νB → C, where ν is the stoichiometric coefficient of the first reaction. Reactant A is converted to an intermediate species B prior to being burned and converted to the final product C. This highly simplified model has been used with considerable success in modeling the combustion of hydrocarbons [30]. Here, we consider the case of a cylindrical flame stabilized by a line source of fuel. The problem involves three variables: T, a scaled temperature, C1, the concentration of reactant A, and C2, the concentration of the intermediate species B, all functions of the polar coordinates r and θ, and corresponds to a thermodiffusive model for a two-step Arrhenius reaction [25]. After a suitable nondimensionalization, the equations satisfied by T, C1 and C2 are

$$\frac{\partial T}{\partial t} = \Delta T - \frac{K}{r}\frac{\partial T}{\partial r} + Da_1\, C_1 R_1(T) + (1-\gamma)\, Da_2\, C_2 R_2(T), \qquad (1)$$

$$\frac{\partial C_1}{\partial t} = \frac{1}{L_1}\Delta C_1 - \frac{K}{r}\frac{\partial C_1}{\partial r} - Da_1\, C_1 R_1(T), \qquad (2)$$

$$\frac{\partial C_2}{\partial t} = \frac{1}{L_2}\Delta C_2 - \frac{K}{r}\frac{\partial C_2}{\partial r} + Da_1\, C_1 R_1(T) - Da_2\, C_2 R_2(T), \qquad (3)$$

where R1 and R2 are the nonlinear source terms:

$$R_1(T) = \exp\left(\frac{\beta_1 (T-T_0)}{1+\gamma_1 (T-T_0)}\right), \qquad R_2(T) = \exp\left(\frac{\beta_2 (T-1)}{1+\gamma_2 (T-1)}\right).$$

Δ is the Laplacian in polar coordinates. The parameters are L1 and L2, the Lewis numbers; Da1 and Da2, the Damkohler numbers; γ, the heat release of the first reaction; and K, a measure for the strength of the fuel injection; T0 ∈ [0, 1] is a (scaled) temperature close to the temperature at which the first reaction occurs. β_i is proportional to the activation energy E_i of each chemical reaction. The coefficients γ_i are functions of the temperature of the unburned fresh mixture, the final adiabatic burned temperature and T0. The activation energy of each chemical reaction is large, and consequently the β_i for i = 1, 2 as well as at least one of the Damkohler numbers are large. The derivation of this model for cartesian geometry has been done, for example, in [24, 25]. T, C1 and C2 are periodic functions of θ. The boundary conditions satisfied by T, C1 and C2 are

T → 0, C1 → 1, C2 → 0, as r → 0;   T → 1, C1 → 0, C2 → 0, as r → ∞.

For the computational domain we take (r, θ) ∈ (r0, r1) × (0, 2π), where 0 < r0 < r1 < ∞; r0 is sufficiently small and r1 sufficiently large that the boundary conditions can be replaced by the Dirichlet boundary conditions

T(r0, θ) = 0, C1(r0, θ) = 1, C2(r0, θ) = 0,

T(r1, θ) = 1, C1(r1, θ) = 0, C2(r1, θ) = 0.

Because the activation energy of the chemical reactions is large, the flame is a thin layer, and the combustion problem is of the singular perturbation type. Since we have two chemical reactions, we have two combustion fronts. Depending on the value of the parameters these two fronts may merge or be widely separated. We are mainly interested in the nonlinear interaction between these two fronts and the pattern formation of cellular flames. The asymptotic analysis of the thermodiffusive model with a two-step chemical reaction can be found in [21, 25]. The linear stability analysis of model (1-3) has been done in [26].
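To make the role of the source terms concrete, the following small sketch (Python, with purely illustrative parameter values; it is not part of the original solver) evaluates the reconstructed factors R1 and R2 above. For large β_i the factors vary extremely rapidly with T, which, combined with the depletion of C1 and C2, is what confines each reaction to a thin layer.

```python
import numpy as np

def R1(T, beta1, gamma1, T0):
    # First Arrhenius factor: O(1) only where T is close to T0.
    return np.exp(beta1 * (T - T0) / (1.0 + gamma1 * (T - T0)))

def R2(T, beta2, gamma2):
    # Second Arrhenius factor: O(1) only where T is close to 1.
    return np.exp(beta2 * (T - 1.0) / (1.0 + gamma2 * (T - 1.0)))

# Illustrative values only: large beta_i mimics a large activation energy.
T = np.linspace(0.0, 1.0, 6)
print(R1(T, beta1=20.0, gamma1=0.8, T0=0.5))
print(R2(T, beta2=20.0, gamma2=0.8))
```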

2.2 Model B: a model to describe the propagation of a reaction front in a liquid

We consider the propagation of a combustion flame in a liquid with a simple chemical reaction mechanism: reactant A is converted to the final product B. This model includes equations for the temperature and the concentration for the one-step chemical reaction, and the Navier-Stokes (NS) equations under the Boussinesq approximation. The conservation laws lead to the following equations:

$$\frac{\partial T}{\partial t} + V\cdot\nabla T = \kappa\,\Delta T + q\, w(T,C), \qquad (4)$$

$$\frac{\partial C}{\partial t} + V\cdot\nabla C = d\,\Delta C - w(T,C), \qquad (5)$$

$$\frac{\partial V}{\partial t} + V\cdot\nabla V = -\frac{1}{\rho}\nabla p + \nu\,\Delta V + g\beta\,(T - T_0)\,\vec{\gamma}, \qquad (6)$$

$$\nabla\cdot V = 0. \qquad (7)$$

Here T is the temperature, C the concentration of the reactant A, V the velocity of the medium, p the pressure, κ the coefficient of thermal diffusion, q the adiabatic heat release, ρ the density, ν the viscosity, d the mass diffusion, g the acceleration of gravity, β the coefficient of thermal expansion, T0 the average value of the temperature, γ⃗ the unit vector in the vertical direction, and w(T, C) is the reaction rate. Usually w is considered of the form

$$w(T,C) = k\, e^{-E/R_0 T}\, \phi(C), \qquad \phi(C) = C^n,$$

where k is the pre-exponential factor, E is the activation energy, R0 the gas constant, and n the order of the reaction. For the direct computation of (4-7), we use the stream function-vorticity (ψ-ω) formulation of the NS equations under the Boussinesq approximation, and we consider the reaction rate to be first order, i.e. n = 1 in the formula for the kinetic function φ(C). The system (4-7) is converted into the dimensionless system (8-11), which is solved on the domain Ω = [-L, L] × [0, 2π]:

$$\frac{\partial T}{\partial t} + \frac{\partial \psi}{\partial z}\frac{\partial T}{\partial x} - \frac{\partial \psi}{\partial x}\frac{\partial T}{\partial z} = \Delta T + W, \qquad (8)$$

$$\frac{\partial C}{\partial t} + \frac{\partial \psi}{\partial z}\frac{\partial C}{\partial x} - \frac{\partial \psi}{\partial x}\frac{\partial C}{\partial z} = \Lambda\,\Delta C - W, \qquad (9)$$

$$\frac{\partial \omega}{\partial t} + \frac{\partial \psi}{\partial z}\frac{\partial \omega}{\partial x} - \frac{\partial \psi}{\partial x}\frac{\partial \omega}{\partial z} = P\,\Delta \omega - R\,P\,\frac{\partial T}{\partial x}, \qquad (10)$$

$$\Delta \psi = -\omega, \qquad (11)$$

where W represents the source term:

$$W = Z\, C\, \exp\left(\frac{Z\,(T-1)}{1 + \delta\,(T-1)}\right). \qquad (12)$$

The parameters are the Zeldovich number Z = qE/(R0 Tb^2), the Prandtl number P = ν/κ, the Rayleigh number R, Λ a dimensionless mass diffusion, and δ = q/Tb, where Tb is the adiabatic temperature, Tb = Ti + q; Ti is the temperature of the cold reactant A. The boundary conditions are periodic in the x direction. The boundary conditions on the other walls are defined according to the asymptotic behavior of the unknowns when z → ±∞, i.e.

T(-L, x) = 0,   T(L, x) = 1,   (13)
C(-L, x) = 1,   C(L, x) = 0,   (14)
ω(-L, x) = 0,   ω(L, x) = 0,   (15)
ψ(-L, x) = 0,   ψ(L, x) = 0.   (16)

Also the length of the domain, 2L, has to be large enough in order to have no influence on the dynamics of the front in the numerical computation. Our numerical scheme follows the combustion front adaptively in time by tracking the maximum value of the heat release as in [2], and by shifting the grid as well as the domain of computation Ω = [-L, L] × [0, 2π] when the front is no longer solved accurately enough. Numerical validations of our solutions include tests on the sensitivity of the solution with respect to L.
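The adaptive tracking step can be pictured with the following minimal sketch (an illustration only, not the authors' code; the drift tolerance and the shift policy are assumptions): the front is located from the maximum of the heat release W along z, and the whole computational window is translated when the front has drifted too far from its reference position.

```python
import numpy as np

def front_position(z, W_profile):
    # Locate the front as the z where the (x-averaged) heat release peaks.
    return z[np.argmax(W_profile)]

def maybe_shift_window(z, W_profile, z_front_ref, tol):
    """Translate the window [-L, L] so the front returns to z_front_ref.

    The unknowns then have to be re-evaluated on the shifted grid; since the
    discretization matrices are invariant under this translation, only the
    grid and the right hand sides change.
    """
    drift = front_position(z, W_profile) - z_front_ref
    if abs(drift) > tol:
        return z + drift
    return z
```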

3 Numerical method

Both models we consider here have stable solutions in time which exhibit one or several thin layers. These layers are close to a circle of radius of order K in Model A (respectively a moving straight line parallel to the x direction in Model B). So the numerical methods to solve these two problems follow the same approach. The main difficulty in computing the solutions of our combustion problems is the zone(s) of rapid variation of the temperature and concentration of the reactant(s). More precisely, we have one (or two) transition layer(s) that correspond to the sharp combustion front: the Arrhenius nonlinear source term that models the heat release is of order one only in these layer(s); we use this criterion to numerically locate the layers. The dynamics of the combustion process is driven by these thin layers; consequently it is critical to compute them very accurately. The computation of a highly accurate representation of the solution in the Transition Layer (TL) is of leading importance because the instability of the front starts in the TL and then smoothes out in the regular domain. Therefore we design our numerical method essentially to solve this difficulty. Let us notice for Model B that the derivation of the corresponding interface model [14] shows that in the limit of infinite activation energy the concentration exhibits a jump at the front, the temperature is continuous but has a jump in its first order normal derivative at the front, and the velocity stays continuous up to the third order normal derivative at the front. Therefore the main difficulty in the computation is driven by the combustion process itself and not by the computation of the flow. Domain decomposition techniques for combustion problems were introduced in [10]. Multiple domain techniques and adaptivity using some mapping techniques have been intensively analyzed for spectral methods, see for example [2, 3, 4]. We use the adaptive multiple domain decomposition technique of [11], which is specific to singular perturbation problems.

3.1 Special discretization and mapping for adaptivity

Let us describe the method briefly. We consider first the example of a scalar equation on the domain Ω = (A, B):

$$\frac{\partial u}{\partial t} = \varepsilon \frac{\partial^2 u}{\partial z^2} + F(u), \quad u(A,t) = 0, \quad u(B,t) = 1, \quad (z,t) \in \Omega \times [0,T], \qquad (17)$$

with some appropriate initial condition. We suppose that u exhibits a TL of ε thickness located at z = S(t) ∈ Ω, and we suppose that S is such that F(u(z,t)) reaches its maximum at z = S(t), with F = O(1). We assume u(·,t) ∈ C^∞(Ω) for all time. We recall the properties of pseudospectral approximations with Chebyshev polynomials on a single domain of computation denoted Ω and with no adaptivity. Let h be an affine function from [-1, 1] onto Ω. In the Chebyshev pseudospectral method, u(z) is approximated as a finite sum of Chebyshev polynomials:

$$P_N u(z) = \sum_{j=0}^{N} a_j\, T_j(h^{-1}(z)),$$

where T_j(y) = cos(j cos^{-1}(y)). The coefficients a_j are obtained by collocating the solution at the points (z_j), j = 0, ..., N, such that

$$h^{-1}(z_j) = \cos\left(\frac{\pi j}{N}\right), \quad j = 0, \dots, N.$$

Let e_N be the error: e_N = u - P_N u. We put ũ(h^{-1}(z)) = u(z). Then the following a priori estimate takes place:

$$|e_N| \leq C_t\, \frac{\|\tilde{u}\|_p}{N^{p-1}},$$

where |·| is the maximum norm, p is an integer and C_t is a real number depending on the choice of the Sobolev norm ‖·‖_p. Since p is not specified, the error seems to be smaller than any inverse power of N. However, C_t ‖ũ‖_p grows as p increases according to the smoothness of u. In practice, for the stiff problems which we consider here, the pseudospectral method is error prone and subject to spurious oscillations. To avoid this difficulty we introduce an adaptive grid. It was shown in [2, 11] that an efficient and accurate way to solve a TL is to use two subdomains with an interface located at the front and a mapping that concentrates the collocation points at the ends of the subdomains in the layer. To be more specific, we introduce two one-parameter families of mappings. Let S_num ≈ S be a numerical approximation of S(t). First we decompose the interval Ω into [A, S_num] ∪ [S_num, B] and introduce a one-parameter family of affine mappings that corresponds to the domain decomposition:

g_1 : [-1, 1] → Ω_1 = [A, S_num],   y ↦ z = g_1(y; S_num),
g_2 : [-1, 1] → Ω_2 = [S_num, B],   y ↦ z = g_2(y; S_num).

Now the solution u of (17) restricted to one of these subdomains exhibits a Boundary Layer (BL) in the neighborhood of S_num. Therefore we use a second family of mappings of BL type:

f_i : [-1, 1] → [-1, 1],   s ↦ y = f_i(s; α), with

$$f_i(s;\alpha) = \pm\left(\frac{4}{\pi}\tan^{-1}\left(\alpha \tan\left(\frac{\pi}{4}(s-1)\right)\right) + 1\right), \quad i = 1, 2.$$

Here α is a small free parameter that determines the concentration of the collocation points in the physical space. i = 1 and + (respectively i = 2 and -) corresponds to a boundary layer at the right end of the interval (respectively the left end). Other mappings can also be considered. In singular perturbation theory, one uses stretching variables of the form ξ = (z - S)/ε, where S is the location of the layer and ε is a measure of the stretching. It is easy to see that the parameter α in the nonlinear mappings f_1 and f_2 plays a role similar to ε in the numerical method. The unknown function u on the interval Ω is approximated by a continuous piecewise polynomial P_{N,i}, i = 1, 2, with the condition of continuity of its first derivative at the interface. In the numerical computation of problem (17), we find the maximum of the function F(u(z,t)) and adapt S_num in time, depending on the speed of propagation of the front. We have observed in [11] that the best choice for the parameter is asymptotically α ∼ √ε. One can eventually optimize the choice of the stretching parameter by means of a priori estimates as in [3].
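The effect of the mappings is easy to visualize. The sketch below (a minimal illustration assuming the reconstructed form of f_i above; the values of N, α, A and S_num are made up) maps the Chebyshev-Gauss-Lobatto points by f_1 and then by the affine map g_1, producing a grid on Ω_1 = [A, S_num] that clusters near the interface S_num as α decreases.

```python
import numpy as np

N, alpha = 32, 0.1         # illustrative; in practice alpha is taken ~ sqrt(eps)
A, S_num = -1.0, 0.3       # left end of the domain and current front position

# Chebyshev-Gauss-Lobatto collocation points on [-1, 1]
s = np.cos(np.pi * np.arange(N + 1) / N)

# Boundary-layer mapping f_1: clusters the points near s = 1 for small alpha
y = (4.0 / np.pi) * np.arctan(alpha * np.tan(np.pi * (s - 1.0) / 4.0)) + 1.0

# Affine mapping g_1 from [-1, 1] onto the subdomain [A, S_num]
z = A + (S_num - A) * (y + 1.0) / 2.0

print(np.round(z, 4))      # the grid points accumulate near z = S_num
```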

3.2 Adaptive domain decomposition method:

We first consider the case of two subdomains. Let us consider a semi-implicit Euler scheme for the time marching:

$$\frac{u^{n+1} - u^n}{\Delta t} = \varepsilon\, D^2 u^{n+1} + F(u^n), \quad z \in \Omega_i, \; i = 1, 2, \qquad (18)$$

where Δt is a constant time step and D is the differentiation operator. We take the second order derivative term implicitly because it gives the main constraint on the time step. Since the nonlinear term is taken explicitly, u^{n+1} can be found as the solution of the linear system

$$\tilde{D}\, u^{n+1} = u^n + \Delta t\, F(u^n), \qquad (19)$$

with a matrix D̃ invariant in time as long as the mapping parameters do not change. Let D_i^2 be the matrix of spectral differentiation of second order on subdomain Ω_i. Because of the domain decomposition, the matrix D̃, the solution ũ and the right hand side F̃ of equation (19) have the following block structure:

$$\tilde{D} = \begin{pmatrix} A_1 & \hat{\beta}_1 & 0 \\ \tilde{\gamma}^t & \sigma & \tilde{\delta}^t \\ 0 & \hat{\beta}_2 & A_2 \end{pmatrix}, \qquad \tilde{u} = \begin{pmatrix} u^{(1)} \\ u_{interface} \\ u^{(2)} \end{pmatrix}, \qquad \tilde{F} = \begin{pmatrix} F^{(1)} \\ 0 \\ F^{(2)} \end{pmatrix}, \qquad (20)$$

where

- u^(k) (respectively F^(k)) is the restriction of ũ (respectively F̃) inside the subdomain Ω_k,

- the matrices A_1 and A_2 correspond to the (N-1) × (N-1) sub-matrices of the N × N matrix D_i^2 without the coefficients of the matrix that operate on the interface point,

- β̂_1, β̂_2 are vectors with N-1 components. These vectors correspond to the coefficients of the matrix D_i^2 that operate on the interface point and belong to the rows associated with the internal points of the subdomains,

- (γ̃^t, δ̃^t) are vectors with N-1 components and σ is a real number.

The row (γ̃^t, σ, δ̃^t) operates on the interface point and applies the condition of continuity of the first derivative at the interface. The solution is obtained in two steps:

- Write the internal unknowns with respect to the interface solution and solve the interface problem:

$$u_{interface} = \frac{\tilde{\gamma}^t A_1^{-1} F^{(1)} + \tilde{\delta}^t A_2^{-1} F^{(2)}}{\bar{\sigma}}, \quad \text{where} \quad \bar{\sigma} = \tilde{\gamma}^t A_1^{-1} \hat{\beta}_1 - \sigma + \tilde{\delta}^t A_2^{-1} \hat{\beta}_2. \qquad (21)$$

 Then the solution on each subdomain can be computed independently ^ interface ) u(1) = A?1 1 (F (1) ? ^ uinterface ) and u(2) = A?2 1 (F (2) ? u 7

(22)

The matrices A_i^{-1}, i = 1, 2, are computed with the Gaussian elimination algorithm. The A_i are full matrices, so their inverses are stored in memory without any unnecessary overload of the memory requirement. These matrices are updated in time only if the mapping parameters have to be updated. In particular these matrices, as well as the scalar σ̄, are unchanged when one shifts the computational grid to track a propagating front adaptively: this is often the case since the thickness of the layer does not change in time for Model B. In such a case, the right hand side computation of (21) and the inversion of the local systems (22) need only matrix-vector multiplies provided by the BLAS library. The adaptive domain decomposition method with two subdomains can be generalized to an arbitrary number of subdomains. To balance the amount of work inside each subdomain we keep the same number N of Chebyshev points in each subdomain. We recall that the choice of the number of subdomains is dictated by the number of transition layers. In particular it is not best to increase the number of subdomains based only on the criterion of parallel algorithm efficiency, because for a fixed total amount of work the accuracy of the method may deteriorate as the number of subdomains increases. The general construction of D̃ is described for example in [11]. The choice of the interface position is critical and is discussed in [11, 12, 17]. The computation is easily amenable to the resolution of a tridiagonal system of dimension n_d - 1 that gives the interface unknowns, followed by the resolution of n_d decoupled linear systems of order N that gives the unknowns inside each subdomain. Since most of the time in the time marching scheme is spent in building and solving these n_d linear systems, this domain decomposition technique is valuable for parallel computation. We demonstrate this efficiency in the next section. However it constitutes a first level of rough parallelism that is a priori not scalable, because the number of subdomains should depend strongly on the number of layers.
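The two-step solution (21)-(22) is a small Schur complement computation. The sketch below illustrates only its algebra, with random matrices standing in for the Chebyshev blocks of (20) (so the block names and sizes are illustrative, not the paper's actual operators), and checks the result against a direct solve of the full block system.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                    # interior unknowns per subdomain (illustrative)

# Stand-ins for the blocks of D~ in (20); in the real solver they come from
# the Chebyshev second-derivative matrices on each mapped subdomain.
A1, A2 = np.eye(N) + 0.1 * rng.standard_normal((2, N, N))
b1, b2 = rng.standard_normal((2, N))     # beta_1, beta_2 (interface columns)
g, d = rng.standard_normal((2, N))       # gamma~, delta~ (interface row)
sigma = 1.0
F1, F2 = rng.standard_normal((2, N))     # right hand sides inside the subdomains

A1inv, A2inv = np.linalg.inv(A1), np.linalg.inv(A2)

# Step 1: interface problem, equation (21)
sigma_bar = g @ A1inv @ b1 - sigma + d @ A2inv @ b2
u_itf = (g @ A1inv @ F1 + d @ A2inv @ F2) / sigma_bar

# Step 2: independent subdomain solves, equation (22)
u1 = A1inv @ (F1 - b1 * u_itf)
u2 = A2inv @ (F2 - b2 * u_itf)

# Check against a direct solve of the full block system (20)
D = np.zeros((2 * N + 1, 2 * N + 1))
D[:N, :N], D[:N, N] = A1, b1
D[N, :N], D[N, N], D[N, N + 1:] = g, sigma, d
D[N + 1:, N], D[N + 1:, N + 1:] = b2, A2
rhs = np.concatenate([F1, [0.0], F2])
ref = np.linalg.solve(D, rhs)
print(np.allclose(np.concatenate([u1, [u_itf], u2]), ref))
```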

3.3 Fine grid parallelism for two dimensional space problem

To move toward a complementary second level of parallelism that exploits a fine grid level, we have to consider the generalization of our previous example (17) to two space dimensions:

$$\frac{\partial u}{\partial t} = \varepsilon\left(\frac{\partial^2 u}{\partial z^2} + \frac{\partial^2 u}{\partial x^2}\right) + F(u), \quad u(-1,x) = 0, \quad u(1,x) = 1, \quad z \in (-1,1), \; x \in (0, 2\pi), \qquad (23)$$

with some appropriate initial condition. In addition we look for a function u(z,x) that is periodic in x. Most of the following discussion can be generalized to three space dimensions as well. We assume now that the TL of problem (23) depends on x weakly. This

hypothesis significantly simplifies the technique of the domain decomposition method since we can consider strip subdomains. However orthogonal mapping techniques in two space dimensions [8] might be used to solve the problems when the front is smooth but not close to a straight line. At this point we adopt two critical choices for the approximation of the space derivatives with respect to the periodic variable x:

- first, we use a scheme that is explicit with respect to ∂²u/∂x²,

- second, the derivatives with respect to the periodic variable are computed by high order finite difference formulas (a small sketch of such a formula is given below).

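A minimal sketch of the second point (not the production Fortran code; the grid convention h = 2π/N_x with a periodic wrap is chosen only for the demonstration) applies the standard seven-point, sixth-order central stencil for ∂²u/∂x², using only local values of u:

```python
import numpy as np

def d2x_periodic_6th(u, h):
    """Sixth-order central approximation of d^2u/dx^2 on a periodic grid.

    u has shape (Nx, Nz): x along axis 0 (periodic), z along axis 1.
    Only local values of u enter, which is what makes the x direction easy
    to distribute over a ring of processors.
    """
    c = np.array([1/90, -3/20, 3/2, -49/18, 3/2, -3/20, 1/90])
    d2u = c[3] * u.copy()
    for k, ck in zip((1, 2, 3), (c[4], c[5], c[6])):
        d2u += ck * (np.roll(u, -k, axis=0) + np.roll(u, k, axis=0))
    return d2u / h**2

# Quick check on a smooth periodic function: the error behaves like h^6.
Nx = 64
x = np.linspace(0.0, 2*np.pi, Nx, endpoint=False)
u = np.sin(3*x)[:, None]
err = np.max(np.abs(d2x_periodic_6th(u, x[1]-x[0]) - (-9*np.sin(3*x))[:, None]))
print(err)
```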

Our choices are motivated by the special structure of the solution of our combustion problems as described below. In the x-direction, we use a central finite-difference scheme of order 6 for D_x^2, the discrete approximation of ∂²/∂x², on a uniform grid with a step h = 2π/(N_x - 1). In theory, for regular problems where ε ∼ 1, the numerical accuracy of our method should be limited to the sixth order by the finite-difference approximation in the x-direction. Since the front is assumed to be stiff in the z-direction but not stiff in the x-direction, we can keep the numerical error of the finite-difference approximation of ∂²/∂x² smaller than the numerical error of the approximation of ∂²/∂z² with the pseudospectral discretization [11, 2]. Therefore it is not necessary to use a better approximation of ∂²/∂x² such as, for example, the Fourier approximation. The finite difference approximation of order 6 of the term ∂²u/∂x² is treated explicitly. We observe that the spectral radius of D_x^2 is asymptotically smaller with the sixth order finite differences than with the Fourier approximation. So in our computations the time step constraint due to the explicit treatment of the second order derivative is of the same order as the time step restriction due to the explicit treatment of the source terms, and of the same order as the accuracy requirement connected with the physics of the process. Typically, we use the range (0.2 × 10⁻³, 5 × 10⁻³) for the time step and 32 ≤ N_x^g ≤ 256 for the space step. The numerical algorithm can exploit the fine grid parallelism due to the explicit dependence on x because the computation of D_x^2 u^n is fulfilled with an explicit formula which includes only local values of the function u^n. It is different for the "global" pseudospectral polynomial explicit approximation in the x-direction, which is less appropriate for parallel computing and/or for the use of the cache memory on RISC architectures. Finally, we observe that the splitting of the operator in (23) replicates the procedure of the asymptotic expansion in the shock layer of an analytical study, since the dependence on x occurs as a second order expansion term [9]. Generally speaking it is possible to derive new parallel algorithms that exploit this asymptotic feature systematically [17]. However for steady problems analogous to (23), i.e.

$$\varepsilon\left(\frac{\partial^2 u}{\partial z^2} + \frac{\partial^2 u}{\partial x^2}\right) = G(x,z), \quad u(-1,x) = 0, \quad u(1,x) = 1, \quad z \in (-1,1), \; x \in (0, 2\pi), \qquad (24)$$

one should use a different approach that is implicit with respect to both space variables. We look for u as a discrete Fourier expansion:

$$u(z,x) = \sum_{k=-N_x^g/2}^{N_x^g/2 - 1} \hat{u}_k(z)\, e^{ikx}.$$

The functions û_k can be found from the system of N_x^g independent ordinary differential equations:

$$\varepsilon\left(\frac{\partial^2 \hat{u}_k}{\partial z^2} - k^2 \hat{u}_k\right) = \hat{F}_k, \quad k = -\frac{N_x^g}{2}, \dots, \frac{N_x^g}{2} - 1.$$

These equations are then solved with the same one dimensional domain decomposition method as described above. This algorithm has been applied to the stream function equation in Model B and can be extended in a straightforward way to implicit schemes for the two dimensional diffusion term in the unsteady equations for the other unknowns. However the cost of this implicit algorithm is no longer linear with respect to N_x^g, due to the Fourier transform. Also the matrices must be transposed, and the parallel implementation of the scheme should carefully overlap communication with computation using some appropriate partitioning of the data.
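The mode decoupling can be sketched as follows (an illustration under assumptions: the z direction is discretized here with plain second-order finite differences and homogeneous Dirichlet conditions as a stand-in for the Chebyshev domain decomposition solver, and ε, the grid sizes and the test data are made up). An FFT in x turns the two dimensional problem into N_x^g independent one dimensional solves, exactly as in the equation above; the same structure is used to recover ψ from ω through equation (11).

```python
import numpy as np

def solve_poisson_like(F, z, eps=1.0):
    """Solve eps*(u_zz + u_xx) = F, u = 0 at the z boundaries, periodic in x.

    FFT in x, then one independent two-point boundary value problem per
    Fourier mode; the z discretization here is a simple finite-difference
    stand-in for the Chebyshev domain decomposition solver of the paper.
    """
    Nx, Nz = F.shape
    k = np.fft.fftfreq(Nx, d=1.0 / Nx)            # integer wavenumbers
    F_hat = np.fft.fft(F, axis=0)
    dz = z[1] - z[0]
    main = -2.0 / dz**2 * np.ones(Nz - 2)
    off = 1.0 / dz**2 * np.ones(Nz - 3)
    u_hat = np.zeros_like(F_hat)
    for j, kj in enumerate(k):
        A = np.diag(main - kj**2) + np.diag(off, 1) + np.diag(off, -1)
        u_hat[j, 1:-1] = np.linalg.solve(eps * A, F_hat[j, 1:-1])
    return np.real(np.fft.ifft(u_hat, axis=0))

# Usage sketch: recover psi from omega via  Delta psi = -omega  (equation (11)).
Nx, Nz = 32, 65
x = np.linspace(0.0, 2*np.pi, Nx, endpoint=False)
z = np.linspace(-1.0, 1.0, Nz)
omega = np.outer(np.cos(x), np.cos(np.pi * z / 2))
psi = solve_poisson_like(-omega, z)
print(psi.shape)
```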

3.4 Parallelism based on the splitting of the system of PDE's

A third level of parallelism is obtained for combustion problems that are modeled by a system of PDE's. Let us consider Model A. We observe that these three equations are weakly coupled in space because they are coupled through the Arrhenius source terms

Da_i C_i R_i(T), i = 1, 2, that are exponentially small except in the combustion layers; so the third level of parallelism is to split the three equations into three parallel subsets of computations matched to three subsets of nodes. In addition, we can improve the parallel efficiency of the scheme by restricting the communications to only the non exponentially small values of the Arrhenius source terms. Also, special care should be taken in the pre-heat zone: the small value of the heat release in this part of the domain must not be neglected, in order to preserve an accurate location of the flame front.
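The restricted communication can be pictured with the following sketch (an assumption-laden illustration: the paper does not specify the data layout or the threshold, and the payload format below is made up). The coupling term owned by one subnetwork is evaluated, entries below a small threshold are dropped, and only the indices and values of the significant entries form the message for the other subnetworks; the threshold has to stay conservative so that the small heat release of the pre-heat zone is not lost.

```python
import numpy as np

def significant_coupling(source, threshold=1e-12):
    """Return the (flat indices, values) of an Arrhenius coupling term that
    are worth exchanging; `source` is e.g. Da1*C1*R1(T) on the local grid,
    which is exponentially small outside the combustion layers."""
    flat = source.ravel()
    idx = np.flatnonzero(np.abs(flat) > threshold)
    return idx, flat[idx]

# Illustrative payload comparison on a synthetic, sharply peaked source term.
z = np.linspace(-1.0, 1.0, 400)
source = np.exp(-((z - 0.1) / 0.02) ** 2)
idx, vals = significant_coupling(source)
print(len(vals), "of", source.size, "values need to be exchanged")
```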

4 Parallel implementation and results

This section gives the parallel implementation details and results for Model A and Model B for the following levels of parallelism:

- (a) splitting in the space variables, adapted to the frontal structure of our problems,

- (b) domain decomposition to solve a Transition Layer (TL) with the pseudospectral method,

- (c) splitting of the equations that are weakly coupled by the Arrhenius terms outside the TL.

To measure how well the algorithm or the implementation is designed for the target parallel computer, one defines the efficiency of a parallel implementation. Its definition is:

Efficiency = (Elapsed Time on 1 Processor) / (P × Elapsed Time on P Processors).

On one processor this efficiency is obviously 100%, but on P processors, because of com-

munications and sequential bottlenecks in the algorithm, this efficiency is generally less than 100%. The goal of a well suited implementation is to reach an efficiency as close as possible to 100%. To study the implementation details and results of these different levels, let us introduce some definitions. Let N_x^g be the number of points in the x direction where finite differences are used, N_x the number of finite difference points per processor, n_d the number of subdomains in the z direction where the spectral discretization is performed, N_z the number of Chebyshev points in each subdomain, and N_z^g = n_d × N_z.

4.1 Parallel implementations of level (a) and (b)

The distributed memory multiprocessor computer targets used were the Intel iPSC/860 and Intel Paragon Multiple Instructions Multiple Data (MIMD) computers. Each processor has its own memory and is connected to the other processors through a communication network. The data required in the computation that are not localized in the processor's memory must be sent by another processor and received in a buffer through

the network. These communications have a cost that depends on the number and the size of the messages [7]. Efficient algorithms on parallel computers often try to minimize these communication costs. On parallel MIMD computers two types of communication between processors are possible, blocking and non-blocking communications. With blocking communication, the process on a processor waits until the message send or receive instructions are performed. With non-blocking communication, the process continues after the send or receive instructions, and a message check must be performed before using the data in the buffer in order to make sure that the communication is complete. Non-blocking communication permits us to take full advantage of the parallel computer by overlapping the communication with some computation. The implementation details of levels (a) and (b) of parallelism are the following:

- The level (a) of parallelism consists of mapping a ring of processors to a strip domain decomposition that is periodic in the x direction; each processor has in its own local memory one N_x × N_z^g matrix per unknown. Since we use sixth order central finite differences to compute the derivatives with respect to x, each processor has to receive two 3 × N_z^g matrices from its two neighbors to perform all finite difference

formulas. Our code is written in Fortran, and we improve significantly the efficiency of the message passing by accessing by column all arrays that are involved in message send and receive. In consequence each unknown is stored as an N_x × N_z^g matrix rather than an N_z^g × N_x matrix. One can reduce the length of the messages that have to be exchanged between neighboring processors down to two vectors of size N_z^g by using one sided sixth order finite difference formulas to compute the derivatives. However it is not really an issue for two dimensional problems and our target machine, since the cost of communication is mostly dominated by the start up cost to establish a message path and is relatively insensitive to the message length up to N_z^g = 72.

- The level (b) of parallelism consists of splitting the computational domain in the z direction into non-overlapping subdomains according to the number of transition layers. At each time step, we have to solve successively N_x small linear interface problems of size n_d - 1 and N_x local linear problems of size N_z. Each processor computes, independently of the others, its contributions to the right hand side of the linear interface problems. Since the number of subdomains in the z direction is very small and the startup cost of communication high, we found it best to solve these interface problems redundantly on all processors. Since the matrices of the local problems are invariant with respect to translation of the spatial grid, they are inverted once and for all, or more precisely each time that the mappings change, which is not very often in practice. The local problems are then solved in parallel at each time step with a simple matrix multiply. This matrix multiply is obviously an order of magnitude more time consuming than the solution of the interface problems. So the sequential bottleneck of the algorithm is very small compared to the parallel part of the algorithm.

4.2 Results on thermo-diffusive model

The numerical method to solve (17) is applied to our system of nonlinear PDE's (1-3) with u_i^n = (T_i^n, C_{1,i}^n, C_{2,i}^n). In addition to the Euler scheme, we have used a predictor-corrector scheme of order two for the time marching scheme, but this does not add any

difficulties from the point of view of getting a parallel algorithm. These results are an extension of [6] with the three levels (a), (b), (c) of parallelism. Let us first recall briefly the performance of strategy (a): we map our cylindrical flame to a ring of processors and the cost of the algorithm is linear with respect to the number of points used to discretize the angle dependence. Timings of strategy (a) for the iPSC/860 and Paragon computers are given in Table 1. These results are obtained for a small number of Chebyshev points N_z = 24, for n_d = 2, and for N_x^g varying from 32 to 256. "-" entries in the table mean that the problem was too large for the memory size of the target machine or that the number of discrete angles N_x^g was too small for the number of processors of the ring.

Timing for the Intel iPSC/860 (number of processors)
  N_x^g      1        2        4        8       16       32       64
   32      38.29    22.64    11.80     6.19      -        -        -
   64      76.02    44.36    22.66    11.78     6.21      -        -
  128     152.07    85.45    44.36    22.66    11.82     6.22      -
  256        -     175.00    88.46    44.36    22.76    11.83     6.21

Timing for the Intel Paragon (number of processors)
  N_x^g      1        2        4        8       16
   32      26.91    16.54     8.94     5.04      -
   64      53.62    31.93    16.68     9.00     5.05
  128     107.08    62.53    32.12    16.72     9.03
  256        -     125.01    63.18    32.26    16.78

Table 1: Timing results in seconds for the Intel iPSC/860 with up to 64 processors and for the Intel Paragon with up to 16 processors to perform 100 iterations

We notice that the speedups are very good, even for a small number of angles per processor, and that the algorithm scales very well, i.e. when increasing the number of processors by a factor of two as well as N_x^g, the execution time stays roughly the same. Timing results on the Paragon machine are 20% better than on the iPSC/860. This improvement in the performance is essentially a consequence of the fact that the i860 XP processor of the Paragon is 25% faster than the i860 XR of the iPSC/860. On the Paragon machine, the communication bandwidth is five times bigger than on the iPSC/860 (12.8 Mb/s compared to 2.5 Mb/s). However, as our communications are fully overlapped with the non-blocking communication strategy, we only measure the parallel computation time. Figure 1 shows, for Model A, the effect on efficiency of the three following levels of parallelism. For the first model with two space dimensions and a discretization with about 3 × 10^4 degrees of freedom, we obtain an efficiency of:

- 87 per cent with 32 nodes using (a),

- 70 per cent with 96 nodes using (a) and (b),

- at least 60 per cent with 192 nodes using (a), (b) and (c).

For this last case, our code is running 145 times faster than the original optimized sequential code on an HP720 workstation and 15 times faster than a fully vectorized code running on a CRAY YMP that provides the same level of accuracy. We observe that the combination of the different levels of parallelism (a), (b) and (c) allows us to use a large number of processors efficiently for a rather small problem, i.e. 3 × 10^4 degrees of freedom: this is a very nice result because we do not need to solve a larger problem to capture the bifurcations, but we rather need to decrease the elapsed time of the computation for a fixed size of the problem and a given set of parameters (L_1, L_2, Da_1, Da_2, ...) in order to investigate a large physical parameter space. Figures 3, 4, 5 show some examples of cellular flames computed with this method for the case L_1 = 0.26 and L_2 = 0.9, which corresponds to a 4-cell traveling wave with a single front structure. The accuracy of the solution was tested with various grids and numbers of grid points. This solution, obtained with two subdomains and the boundary layer mapping strategy, was also compared with a single domain code that uses a transition layer mapping [11]. The accuracy of our computation in the maximum norm with 12000 degrees of freedom is of order 10^{-4}. This accuracy is analogous to the accuracy of the solution given by the sequential code.

4.3 Results on the reaction process of frontal polymerization

We present here the results of strategies (a) and (b) applied to Model B. The implementation of the numerical method is very similar to the previous case of Sect. 4.2, except that the stream function is now solved using a discrete Fourier expansion for the periodic variable. However this difference is significant because the solution of the stream function (i.e. equation (11)) represents the most costly part of the elapsed time. We note that:

- The computation cost for ψ is proportional to N_x^g × N_x^g, while the cost of the other unknowns T, C, ω is proportional to N_x^g.

- The communication time is large because the parallelism with respect to the modes requires the global transposition of matrices partitioned on a two dimensional network of processors.

Figure 2 gives the elapsed time to compute the solution (T, C, ω, ψ) for a fixed number of time steps as a function of N_x. The curve '-' refers to the total elapsed time for the code. The curve '*' gives the time spent to compute the stream function ψ. The time to compute C or ω is almost identical to the time spent to compute the temperature T and is given by the last curves '+', 'o', 'x'. In order to overlap the communications with computations, non-blocking communications are performed: we found an easy and efficient way to split the computation of the stream function solution into two steps (see 4 and 8 in Table 2), in order to alternate the communications (see 1, 5, 8 in Table 2) with some independent computations (see 2, 6, 9 in Table 2). In the following, we map a grid of P = P_x × P_z processors to the decomposition of the domain of computation into (P_x = N_x^g / N_x) × P_z subdomains. P_z is equal to one (respectively n_d) for the level (a) (respectively (b)) of parallelism. When P is one, we use the sequential version of the code, which has no communication calls.
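The alternation of communication and independent computation in Table 2 follows the usual post / compute / wait pattern of non-blocking message passing. The following mpi4py sketch of that pattern on a ring of processors is an illustration of the idea only (the original code is Fortran on the Intel machines; the buffer names and sizes here are made up).

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

Nx_loc, Nzg = 16, 60                      # illustrative local block size
u = np.random.rand(Nx_loc, Nzg)           # local slab of one unknown
ghost_left = np.empty((3, Nzg))            # 3 columns required by the 6th-order stencil
ghost_right = np.empty((3, Nzg))
send_right = np.ascontiguousarray(u[-3:, :])
send_left = np.ascontiguousarray(u[:3, :])

# Post the non-blocking exchange of boundary unknowns in the x direction.
reqs = [comm.Irecv(ghost_left, source=left, tag=0),
        comm.Irecv(ghost_right, source=right, tag=1),
        comm.Isend(send_right, dest=right, tag=0),
        comm.Isend(send_left, dest=left, tag=1)]

# Overlap: purely local work (e.g. the source terms) while messages travel.
local_work = np.exp(u)

# Wait only before touching the ghost columns in the finite difference stencil.
MPI.Request.Waitall(reqs)
extended = np.vstack([ghost_left, u, ghost_right])
print(rank, extended.shape)
```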

4.3.1 Study of the level (a) of parallelism with respect to N_x^g

Table 3 gives the parallel efficiency with respect to the number of processors and the values N_x^g = {32, 64, 128, 256}. Table 3 points out that:

1. Communicate boundary unknowns in the x direction.
2. Compute the Source Term, then wait for the step 1 receive message to be done.
3. Compute the derivatives of ω, T, C, ψ.
4. Compute the Vorticity Solution.
5. Gather the N_x^g values of the vorticity for each fixed z coordinate.
6. Compute the Temperature, then wait for the step 5 receive message to be done.
7. Compute the N_x × N_z second members ω̂_k(z), then solve the N_x^g × N_z Stream function Fourier mode equations.

8. Gather all the Stream-function Fourier modes.
9. Compute the Concentration, then wait for the step 8 receive message to be done.
10. Build the Stream function.

Table 2: Algorithm steps for each iteration in time

Parallel Efficiency (%), N_z = 49 (number of processors)
  N_x^g      1      2      4      8      16     32     64
   32       100    96.3   91.0   81.7    -      -      -
   64       100    97.1   96.6   92.0   82.3    -      -
  128       100    98.7   97.1   94.7   91.9   82.6    -
  256       100    99.1   98.5   97.5   94.9   88.8   74.6

Parallel Efficiency (%), N_z = 59 (number of processors)
  N_x^g      1      2      4      8      16     32     64
   32       100    95.9   91.6   83.7    -      -      -
   64       100    97.0   96.5   92.7   84.05   -      -
  128       100    97.9   96.7   94.6   92.1   82.7    -
  256       100    98.4   98.3   96.5   93.4   82.2   72.6

Table 3: Parallel Efficiency (%) for 100 iterations and n_d = 2

- We obtain an efficiency of order 90% for the frontal polymerization code, except for the last columns. The rapid degradation of the performance for a large number of nodes is mostly the effect of the memory limitation of the systems that we have used. We have checked carefully that in the cases (P = 64, N_x^g = 256, N_z = 49) and (P = 32, N_x^g = 256, N_z = 59) the memory swaps. The extra storage needed to gather the vorticity to compute the Fourier mode right hand sides increases with the number of processors, and the code exceeds the RAM memory available in these cases. Therefore the non-blocking messages, which require more memory than blocking messages, become less efficient.

- This efficiency seems to scale, i.e. when we double the number N_x^g of points and the number of processors, the efficiency stays relatively constant (except in the cases (P = 64, N_x^g = 256, N_z = 49) and (P = 32, N_x^g = 256, N_z = 59), for the reasons mentioned above).

- For a fixed number of processors, the efficiency increases when N_x^g grows. It can be explained in the following way: first, the measured communication time is almost linear with respect to N_x^g, but the computation time is dominated by the computation of the stream function, which is quadratic with respect to N_x^g. So the ratio between communication time and computation time decreases when N_x^g increases. Second, the non-blocking messages are better hidden in time by the computation when this ratio decreases.

We conclude that the level (a) of parallelism provides a high and scalable efficiency that grows with respect to the value of N_x^g.

4.3.2 Study of the level (b) of parallelism with respect to n_d

Table 4 gives the level (b) of parallelism with respect to the number of subdomains n_d. We suppose a fixed number of time steps and a fixed number of unknowns per subdomain, namely 100 time steps and (N_z = 30) × (N_x^g = 64) unknowns. In order to give a fair measure of the efficiency of our algorithm we give the average performance obtained after several runs, and N_z × N_x is chosen to be less than what we use in our simulations. The left column represents the run on one processor. The other columns represent the runs on P_z × P_x processors.

Elapsed time (s) to perform 100 time steps
  n_d = P_z    1 processor   P_x = 1   P_x = 2   P_x = 4   P_x = 8
     2            43.11       22.60     11.51      6.20      3.67
     3            63.51       23.11     11.85      6.42      3.89
     4            84.28       23.28     12.00      6.52      4.02
     5           104.67       23.50     12.15      6.70      4.18
     6           125.41       23.73     12.44      6.89      4.35

Efficiency (%)
  n_d = P_z    1 processor   P_x = 1   P_x = 2   P_x = 4   P_x = 8
     2             100         95.4      93.6      86.9      73.4
     3             100         91.6      89.3      82.4      68.0
     4             100         90.5      87.8      80.8      65.5
     5             100         89.1      86.1      75.9      62.6
     6             100         88.1      84.0      75.8      60.1

Table 4: Time and Efficiency (%) with respect to n_d

Table 4 points out the good extensibility of the method with respect to the number of subdomains, i.e. the elapsed time stays relatively constant when we fix P_x and let n_d vary with P_z = n_d. The level (b) of parallelism provides an extensible algorithm with respect to n_d with quite good efficiency. However the optimum number of subdomains to solve a given front is not necessarily more than two. To be more specific, it is unclear whether we can keep the same level of accuracy for a fixed total amount of work when increasing the number of subdomains. Our experience with propagating fronts shows on the contrary that two subdomains with the boundary layer mapping is the best way to solve a transition layer.

4.3.3 Study of the level (b) of parallelism with respect to N_z

Table 5 studies the level (b) of parallelism with respect to the number of Chebyshev points N_z used in each subdomain. The number of time steps as well as the number of subdomains is constant: we suppose 100 time steps with n_d = 2 and N_x^g = 64. The left column represents the run on one processor. The other columns represent the runs on (P_z = 2) × P_x processors.

Elapsed time (s) to perform 100 time steps
  N_z    1 processor    P = 2    P = 4    P = 8    P = 16
  20        26.57       13.89     7.19     3.98     2.42
  30        43.11       22.61    11.51     6.20     3.67
  40        59.83       30.96    15.77     8.29     4.78
  50        81.48       42.54    21.53    11.22     6.26
  60       102.62       52.58    26.49    13.81     7.67

Efficiency (%)
  N_z    1 processor    P = 2    P = 4    P = 8    P = 16
  20        100          95.6     92.4     83.4     68.6
  30        100          95.3     93.6     86.9     73.4
  40        100          96.6     94.8     90.2     75.8
  50        100          95.8     94.6     90.8     81.3
  60        100          97.5     96.8     92.9     83.6

Table 5: Time and Efficiency (%) with respect to N_z

Table 5 points out that:

- The elapsed time grows almost linearly with respect to N_z. The slope of these lines decreases as P increases. This result is somewhat surprising, since we have used a matrix-vector approach for the pseudospectral method. On the other hand, the vectorization of the computation on each node performs very well.

- For a fixed number of processors, the efficiency increases with N_z, as one should expect.

- The efficiency decreases with respect to P_x. This loss of efficiency is less important for the high N_z values. The performance deteriorates significantly in the last column because the ratio of computation with respect to communication is too low with N_x = 8.

4.3.4 Some comments about level (c) of parallelism for Model B

In Model A, the computational cost of C_1, C_2 and T is almost exactly the same for each unknown. This allows us to split the computation of (C_1, C_2, T) onto three identical subnetworks of processors with a well balanced amount of work. We can no longer apply the same approach to Model B, because the computation of the stream function takes much longer than the computation of the other unknowns, i.e. T, C, and ω. However for large problems, let us say N_x^g > 128, the explicit treatment of the derivative ∂²/∂x² in Model B introduces a constraint on the time step that might be stronger than the accuracy requirement in time that we have. Therefore a further improvement of the method for

large N_x^g is to treat the computation for each unknown implicitly in both space directions. We could apply the same algorithm in a straightforward way to compute the stream function and the other unknowns. We are then back to the situation of Model A and we can divide the computation into four balanced parallel computations on four subnetworks of processors. This method is currently being implemented. Let us mention that the extension of Model A and Model B to more complex chemistry will give us more unknowns to compute and will possibly increase the potential of such parallelism.

4.3.5 Example of computed solutions

We would like to end this section by mentioning that several very interesting phenomena have been investigated thanks to the efficiency of our parallel code. Let us fix the Prandtl number to be one, and the dimensionless mass diffusion Λ to be 0.005. It is known that the model problem with zero Rayleigh number has stable traveling wave solutions when the Zeldovich number is less than a critical value Z_c. Above this value of the Zeldovich number, we can have pulsating fronts and/or spinning modes. We have done two types of computations:

- First, we have studied instabilities that appear for values of the Zeldovich number Z significantly less than the critical value Z_c when we let the Rayleigh number R increase or decrease. Figure 6 is a typical cellular instability pattern of a frontal polymerization when Z = 6 and R = 20; the results show that the frontal structure is far from an ordinary layer: more precisely, the chemical reactor is mainly active at one hot spot sustained by the convection of the fresh reactant. This instability is somehow similar to the Rayleigh-Benard instability since the exothermic chemical reaction heats the fresh reactant from below. However cellular instabilities are obtained also for descending fronts. Figures 7-8-9 show such a solution. Both phenomena were predicted by the asymptotic analysis, see [14].

- Second, we have studied the effect of convection on the formation of spinning modes. So we have started by computing spinning modes with Z above Z_c and R = 0, and then let the Rayleigh number increase or decrease. Figures 10 and 11 show such a solution. This simulation confirms the stabilizing or destabilizing effect of gravity, depending on the direction of propagation of the front, described in [14].

We have tested our computation with various small Λ and obtained similar results. However, we can have spurious oscillations on the concentration profile when Λ is less than 10^{-3}, which eventually disappear when N_z is large enough. But then the conditioning of the matrices deteriorates significantly. So there is in practice a lower limit on the variation of Λ that depends strongly on the arithmetic accuracy of the computer. Finally, the main observation that we have made in all these computations is that, even when the parameters are far from the first bifurcation points reached, the combustion front (say the level set C = 0.5) stays remarkably close to a straight line parallel to the x direction, and the stiffness of the front is dominant in the direction of propagation z. Since our mapping strategy is relatively robust to perturbations [2], we have a posteriori a justification of our methodology.

5 Conclusion

In conclusion, we have been able to use MIMD architectures with high efficiency and/or a large number of nodes to compute non-linear propagating phenomena such as combustion fronts. In our two dimensional computations the size of the discretized problems was dictated by the accuracy requirement as well as by the ill conditioned nature of the operators. However we were able to use up to 192 nodes on a Paragon system for such small problems and reduce the computing time dramatically. Our technique relies on combining complementary levels of parallelism such as domain decomposition or operator splitting according to the asymptotic properties of the mathematical models. Let us mention that the numerical simulations provided with this method have been compared against the linear and/or weakly non linear stability analysis for our two model problems and showed very good agreement [13, 14, 15, 16, 5].


References

[1] A.P. Aldushin & S.G. Kasparyan, Doklady Akademii Nauk SSSR, 244 (5), pp. 67-70, (1979) (in Russian).
[2] A. Bayliss, M. Garbey, B.J. Matkowsky, Adaptive Pseudo-Spectral Domain Decomposition Method and the Approximation of Multiple Layers, J. Comput. Phys., 119, pp. 132-141, (1995).
[3] A. Bayliss, D. Gottlieb, B.J. Matkowsky, M. Minkoff, An Adaptive Pseudo-Spectral Method for Reaction Diffusion Problems, J. Comput. Phys., 81, pp. 421-443, (1989).
[4] A. Bayliss & B.J. Matkowsky, Two routes to chaos in condensed phase combustion, SIAM J. Appl. Math., 50, pp. 437-459, (1990).
[5] G. Bowden, M. Garbey, V.M. Ilyashenko, J. Pojman, S. Solovyov, A. Taik and V. Volpert, The effect of Convection on a Propagating Front with a Solid Product: Comparison of Theory and Experiments, to appear in Journal of Chemical Physics.
[6] F. Desprez & M. Garbey, Numerical Simulation of a Combustion Problem on a Paragon Machine, Parallel Computing, 21, pp. 495-508, (1995).
[7] T.H. Dunigan, Early experiences and performance of the Intel Paragon, ORNL/TM-12194, (1994).
[8] R. Duraiswami & A. Prosperetti, Orthogonal mapping in two dimensions, J. Comput. Phys., 98, pp. 254-268, (1992).
[9] W. Eckhaus, Asymptotic Analysis of Singular Perturbations, North-Holland, Amsterdam, 1979.
[10] U. Ehrenstein, H. Guillard, R. Peyret, Flame computations by Chebyshev multidomain method, International Journal for Numerical Methods in Fluids, 9, pp. 499-515, (1989).
[11] M. Garbey, Domain Decomposition to Solve Transition Layers and Asymptotics, SIAM J. Sci. Comput., 15 (4), pp. 866-891, (1994).
[12] M. Garbey, A Schwarz alternating procedure for singular perturbation problems, SIAM J. Sci. Comput., 17 (5), pp. 1175-1201, (1996).
[13] M. Garbey, A. Taik, V. Volpert, Influence of natural convection on stability of reaction fronts in liquids, Quart. Appl. Math., No 2, pp. 225-247, (1996).
[14] M. Garbey, A. Taik, V. Volpert, Linear stability analysis of reaction fronts in liquids, Quart. Appl. Math., to appear.
[15] M. Garbey, H.G. Kaper, G.K. Leaf, B.J. Matkowsky, Using Maple for the Analysis of Bifurcation Phenomena in Condensed Phase Surface Combustion, J. of Symbolic Comp., 12, pp. 89-113, (1991).

[16] M. Garbey, H.G. Kaper, G.K. Leaf, B.J. Matkowsky, Linear Stability Analysis of Cylindrical Flames, Quart. Appl. Math., 47, pp. 691-704, (1989).
[17] M. Garbey & H.G. Kaper, Heterogeneous Domain Decomposition for Singularly Perturbed Elliptic Boundary Value Problems, Preprint MCS-P510-0495, Argonne National Laboratory, to appear in SIAM J. Numer. Anal.
[18] M. Garbey and V. Volpert, Asymptotic and Numerical Computation of Polymerization Flows, Preprint No 200 - URA 740, (1995), submitted.
[19] D. Gottlieb & R.S. Hirsh, Parallel Pseudospectral Domain Decomposition Techniques, ICASE Report, (1986).
[20] S.B. Margolis, H.G. Kaper, G.K. Leaf, B.J. Matkowsky, Bifurcation of pulsating and spinning reaction fronts in condensed two-phase combustion, Comb. Sci. and Tech., 43, pp. 127-165, (1985).
[21] S.B. Margolis & B.J. Matkowsky, Steady and Pulsating Modes of Sequential Flame Propagation, Comb. Sci. Tech., 27, pp. 193-213, (1982).
[22] I.P. Nagy, L. Sike, J.A. Pojman, Thermochromic Composite Prepared via a Propagating Polymerization Front, J. of the American Chemical Society, 117, (1995).
[23] B.V. Novozhilov, Proc. Academy Sci. USSR, Phys. Chem. Sect., 141, pp. 836-838, (1961).
[24] J. Pelaez & A. Linan, Structure and Stability of Flames with two sequential reactions, SIAM J. Appl. Math., 45 (4), (1985).
[25] J. Pelaez, Stability of Premixed Flames with two Thin Reaction Layers, SIAM J. Appl. Math., 47, pp. 781-799, (1987).
[26] A. Taik, Modelisation et analyse asymptotique des fronts de reaction, These de l'Universite de Lyon 1 - Claude Bernard, (1995).
[27] F.X. Roux & D. Tromeur-Dervout, Parallelization of a multigrid solver via a domain decomposition method, Proceedings of the 7th Conference on Domain Decomposition Methods, Contemporary Mathematics 180, pp. 439-444, (1994).
[28] K.G. Shkadinskii, B.I. Haikin, A.G. Merzhanov, Combustion, Explosion, and Shock Waves, 7, pp. 15-22, (1971).
[29] V.A. Volpert, V.A. Volpert, S.P. Davtyan, I.N. Megrabova, N.F. Surkov, Two dimensional combustion modes in condensed flow, SIAM J. Appl. Math., 52 (2), pp. 368-383, (1992).
[30] C.K. Westbrook & F.L. Dryer, Simplified Reaction Mechanisms for the oxidation of hydrocarbon fuels in flames, Comb. Sci. Tech., 27, pp. 31-43, (1981).
[31] Ya.B. Zeldovich & D.A. Frank-Kamenetsky, Zh. Fiz. Khim., 12 (100), (1938) (in Russian).

List of Tables

1. Timing results in seconds for the Intel iPSC/860 with up to 64 processors and for the Intel Paragon with up to 16 processors to perform 100 iterations
2. Algorithm steps for each iteration in time
3. Parallel Efficiency (%) for 100 iterations and n_d = 2
4. Time and Efficiency (%) with respect to n_d
5. Time and Efficiency (%) with respect to N_z

List of Figures

1. Effect on efficiency of the levels (a)=*, (a+b)=+ and (a+b+c)=o of parallelism
2. Times of: total code=-, Stream Function=*, and each ω=+, T=o, and C=x (3 lines) solutions with respect to N_x^g
3. Concentration of the first species with a 4 cells traveling wave
4. Concentration of the second species with a 4 cells traveling wave
5. Temperature with a 4 cells traveling wave
6. Cellular instability pattern of a frontal polymerization, Zeldovich=6.5, Rayleigh=20
7. Cellular instability pattern of a frontal polymerization, Zeldovich=6.5, Rayleigh=-17
8. Temperature Profile, Zeldovich=6.5, Rayleigh=-17
9. Concentration Profile, Zeldovich=6.5, Rayleigh=-17
10. Evolution of the hot spot with respect to time
11. Evolution of the vorticity contour lines with respect to time

Contents

1 Introduction
2 Basic Models
  2.1 Model A: a thermodiffusive model for gas combustion
  2.2 Model B: a model to describe the propagation of a reaction front in a liquid
3 Numerical method
  3.1 Special discretization and mapping for adaptivity
  3.2 Adaptive domain decomposition method
  3.3 Fine grid parallelism for two dimensional space problem
  3.4 Parallelism based on the splitting of the system of PDE's
4 Parallel implementation and results
  4.1 Parallel implementations of level (a) and (b)
  4.2 Results on thermo-diffusive model
  4.3 Results on the reaction process of frontal polymerization
    4.3.1 Study of the level (a) of parallelism with respect to N_x^g
    4.3.2 Study of the level (b) of parallelism with respect to n_d
    4.3.3 Study of the level (b) of parallelism with respect to N_z
    4.3.4 Some comments about level (c) of parallelism for Model B
    4.3.5 Example of computed solutions
5 Conclusion

Figure 1: Effect on efficiency of the levels (a)=*, (a+b)=+ and (a+b+c)=o of parallelism

Figure 2: Times of: total code=-, Stream Function=*, and each ω=+, T=o, and C=x (3 lines) solutions with respect to N_x^g (N_z = 49, n_d = 2, P = 8, 100 iterations)

Figure 3: Concentration of the first species with a 4 cells traveling wave (L1 = 0.26, L2 = 0.9)

Figure 4: Concentration of the second species with a 4 cells traveling wave (L1 = 0.26, L2 = 0.9)

Figure 5: Temperature with a 4 cells traveling wave (L1 = 0.26, L2 = 0.9)

Figure 6: Cellular instability pattern of a frontal polymerization, Zeldovich = 6.5, Rayleigh = 20
