Introduction to Scientific Computing

University of Tartu, Institute of Computer Science

Introduction to Scientific Computing MTAT.08.025

[email protected]

Spring 2016

Practical information
Lectures: Liivi 2 - 202/511, WED 14:15
Computer classes: Liivi 2 - 205, THU 12:15, Amnir Hadachi
3 ECTS: Lectures 16h; Computer Classes 16h; Independent work 46h
The final grade is formed from:
1. Active participation at lectures 10%
2. Stand-up quiz(es) 10%
3. Computer class activities 50%
4. Exam 30%
Course homepage (http://courses.cs.ut.ee/2016/isc)

1 Introduction

1.1 Syllabus

Lectures:
• Python for Scientific Computing: NumPy, SciPy
• Scientific Computing - an Overview
• Floating point numbers, how to deal with round-off errors
• Large problems in Linear Algebra, condition number
• Memory hierarchies and making use of them
• Numerical integration and differentiation
• Numerical solution of differential and integral equations
• Fast Fourier Transform

Computer Classes (preliminary plan):
1. Python & Sage; Fibonacci numbers; Collatz conjecture
2. Discretization and round-off errors
3. NumPy arrays, matrices; LU-factorisation with the Gauss Elimination Method (GEM)
4. UT HPC server; LU-factorisation and GEM on the HPC cluster
5. Floating point numbers
6. Fractals
7. Fourier series and the Fast Fourier Transform
8. Discrete-time models and ordinary differential equations

1.2 Literature

General Scientific Computing:
1. RH Landau, A First Course in Scientific Computing. Symbolic, Graphic, and Numeric Modeling Using Maple, Java, Mathematica, and Fortran90. Princeton University Press, 2005.
2. LR Scott, T Clark, B Bagheri, Scientific Parallel Computing. Princeton University Press, 2005.
3. MT Heath, Scientific Computing; ISBN: 007112229X, McGraw-Hill Companies, 2001.
4. JW Demmel, Applied Numerical Linear Algebra; ISBN: 0898713897, Society for Industrial & Applied Mathematics, Paperback, 1997.

Python & Sage:
1. Sage tutorials and documentation: accessible through UT Sage (http://sage.math.ut.ee)
2. Hans Petter Langtangen, A Primer on Scientific Programming with Python, Springer, 2009. Book webpage (http://vefur.simula.no/intro-programming/).
3. Hans Petter Langtangen, Python Scripting for Computational Science. Third Edition, Springer 2008. Website for the book (http://folk.uio.no/hpl/scripting/).
4. Neeme Kahusk, Sissejuhatus Pythonisse [Introduction to Python] (http://www.cl.ut.ee/inimesed/nkahusk/sissejuhatus-pythonisse/).
5. Travis E. Oliphant, Guide to NumPy (http://www.tramy.us), Trelgol Publishing 2006.

1.3 Scripting vs programming

1.3.1 What is a script?

• Very high-level, often short, program written in a high-level scripting language
• Scripting languages:
  – Unix shells, Tcl, Scheme, Rexx, JavaScript, Perl, Python, Ruby, VisualBasic, ...

1.3.2 Characteristics of a script

• Glue other programs together
• Extensive text processing
• File and directory manipulation
• Often special-purpose code
• Many small interacting scripts may yield a big system
• Perhaps a special-purpose GUI on top
• Portable across Unix, Windows, Mac
• Interpreted program (no compilation+linking)

1.3.3 Why not stick to Java, C/C++ or Fortran?

Features of Perl and Python compared with Java, C/C++ and Fortran:
• shorter, more high-level programs
• much faster software development
• more convenient programming
• you feel more productive
• no variable declarations, but lots of consistency checks at run time
• lots of standardized libraries and tools

1.4 Scripts yield short code

Consider reading real numbers from a file, where each line can contain an arbitrary number of real numbers:

1.1  9  5.2
1.762543E-02
0  0.01  0.001
9  3  7

Python solution:

F = open(filename, 'r')
n = F.read().split()

Perl solution:

open F, $filename;
$s = join "", <F>;
@n = split ' ', $s;

Ruby solution:

n = IO.readlines(filename).join.split

...Doing this in C++ or Java requires at least a loop, and in Fortran and C quite a few lines of code are necessary.

1.5 Performance issues

1.5.1 Scripts can be slow

• Perl and Python scripts are first compiled to byte-code
• The byte-code is then interpreted
• Text processing is usually as fast as in C
• Loops over large data structures might be very slow

for i in range(len(A)):
    A[i] = ...

• Fortran, C and C++ compilers are good at optimizing such loops at compile time and produce very efficient assembly code (e.g. 100 times faster)
• Fortunately, long loops in scripts can easily be migrated to Fortran or C (or special libraries like numpy!)

1.5.2 Scripts may be fast enough

Read 100 000 (x,y) data pairs from a file and write (x, f(y)) out again:
• Pure Python: 4s
• Pure Perl: 3s
• Pure Tcl: 11s
• Pure C (fscanf/fprintf): 1s
• Pure C++ (iostream): 3.6s
• Pure C++ (buffered streams): 2.5s
• Numerical Python modules: 2.2s (!)
• Remark: in practice, 100 000 data points are written and read in binary format, resulting in much smaller differences

1.5.3 When scripting is convenient

• The application's main task is to connect together existing components
• The application includes a graphical user interface
• The application performs extensive string/text manipulation
• The design of the application code is expected to change significantly
• CPU-time intensive parts can be migrated to C/C++ or Fortran
• The application can be made short if it operates heavily on list or hash structures
• The application is supposed to communicate with Web servers
• The application should run without modifications on Unix, Windows, and Macintosh computers, also when a GUI is included

1.5.4 When to use C, C++, Java, Fortran

• Does the application implement complicated algorithms and data structures?
• Does the application manipulate large datasets so that execution speed is critical?
• Are the application's functions well-defined and changing slowly?
• Will type-safe languages be an advantage, e.g., in large development teams?

This course:
• we use Python
• through the Sage environment http://sage.math.ut.ee
• + the numeric libraries NumPy and SciPy

NB! HOME EXERCISE for tomorrow's computer class:
• Get your "hands dirty" programming in Python!
  – tutorials (found under Help) on the UT Sage server
  – videos on YouTube, or
  – last year's course slides; that lecture was recorded: Lecture 1 video (starting from the 47th minute...)

2 What is Scientific Computing?

2.1 Introduction to Scientific Computing

• Scientific computing – a subject at the crossroads of physics, chemistry, [social, engineering, ...] sciences
  – problems typically translated into
    ∗ linear algebraic problems
    ∗ sometimes combinatorial problems
• a computational scientist needs knowledge of some aspects of
  – numerical analysis
  – linear algebra
  – discrete mathematics

• An efficient implementation needs some understanding of
  – computer architecture
    ∗ both on the CPU level
    ∗ and on the level of parallel computing
  – some specific skills of software management

Scientific Computing – a field of study concerned with constructing mathematical models and numerical solution techniques, and using computers to analyse and solve scientific and engineering problems
• typically: application of computer simulation and other forms of computation to problems in various scientific disciplines.

Main purpose of Scientific Computing:
• mirroring characteristics of real-world processes
• prediction of their behaviour and development

Example of computational simulation
ASTROPHYSICS: what happens in a collision of two black holes in the universe? A situation which is
• impossible to observe in nature,
• impossible to test in a lab,
• barely possible to estimate theoretically

Computer simulation CAN HELP. But what is needed for a simulation?
• an adequate mathematical model (Einstein's general relativity theory)
• an algorithm for the numerical solution of the equations
• a big enough computer for the actual realisation of the algorithms

Frequently there is a need to simulate situations that could be performed experimentally, but simulation on computers is preferred because of:
• HIGH COST OF THE REAL EXPERIMENT. Examples:
  – car crash-tests
  – simulation of gas explosions
  – nuclear explosion simulation
  – behaviour of ships in ocean waves

  – airplane aerodynamics
  – strength calculations in big constructions (for example oil platforms)
  – oil-field simulations
• TIME FACTOR. Some examples:
  – Climate change predictions
  – Geological development of the Earth (including oil fields)
  – Glacier flow models
  – Weather prediction
• SCALE OF THE PROBLEM. Some examples:
  – modeling chemical reactions on the molecular level
  – development of biological ecosystems

• PROCESSES THAT CANNOT BE INTERVENED IN. Some examples:
  – human heart model
  – global economy model

2.2 Specifics of computational problems

Usually, computer simulation consists of:
1. Creation of a mathematical model – usually in the form of equations – of the physical properties and dependencies of the subject
2. Creation of algorithms for the numerical solution of the equations
3. Implementation of the algorithms in computer software
4. Using the created software on a computer in a particular simulation process
5. Visualizing the results in an understandable way, for example using computer graphics
6. Integration of the results and repetition/redesign of any given step above

Most often:
• the algorithm is written down in an intuitive way and/or using special modeling software
• a computer program is written, based on the algorithm
• testing
• iterating

The explorative nature of Scientific Computing!

2.3 Mathematical model

GENERAL STRATEGY: REPLACE A DIFFICULT PROBLEM WITH A SIMPLER ONE
• which has the same solution,
• or at least an approximate solution,
• but still reflects the most important features of the problem

SOME EXAMPLES OF SUCH TECHNIQUES:
• Replacing infinite spaces with finite ones (in the mathematical sense)
• Replacing infinite processes with finite ones
  – replacing integrals with finite sums
  – derivatives replaced by finite differences
• Replacing differential equations with algebraic equations
• Nonlinear equations replaced by linear equations
• Replacing higher order systems with lower order ones
• Replacing complicated functions with simpler ones (like polynomials)
• Replacing arbitrarily structured matrices with simpler structured ones

IN THIS COURSE WE TRY TO GIVE an overview of some methods and analysis for the development of reliable and efficient software for Scientific Computing.
Reliability means here both the reliability of the software as well as the adequacy of the results – how much one can rely on the achieved results:
• Is the solution acceptable at all? Is it a real solution (extraneous solution? instability of the solution? etc.)? Does the solution algorithm guarantee a solution at all?
• How big is the calculated solution's deviation from the real solution? How well does the simulation reflect the real world?
Another aspect: software reliability.

Efficiency is expressed on various levels of the solution process:
• speed
• amount of used resources
Resources can be:
• Time
• Cost
• Number of CPU cycles
• Number of processes
• Amount of RAM
• Human labour

General formula: minimise approximation error × solution time; even more generally – minimise the time of the solution.

An efficient method requires:
(i) good discretisation
(ii) good computer implementation
(iii) accounting for the computer architecture (processor speed, RAM size, memory bus speed, availability of cache, number of cache levels and other properties)

3 Approximation in Scientific Computing

3.1 Sources of approximation error

3.1.1 Error sources that are under our control

MODELLING ERRORS – some physical entities in the model are simplified or not taken into account at all (for example: air resistance, viscosity, friction etc.)

(Usually this is OK, but sometimes not...)
(You may want to look at http://en.wikipedia.org/wiki/Spherical_cow :-)

MEASUREMENT ERRORS – laboratory equipment has its precision.
Errors also come out of
• random measurement deviation
• background noise
As an example, the Newton and Planck constants are used with 8-9 decimal places, while laboratory measurements are performed with much less precision!

THE EFFECT OF PREVIOUS CALCULATIONS – the input for calculations is often already the output of some previous calculation, with its own computational errors

3.1.2 Errors created during the calculations

Discretisation errors. As an example:
• replacing derivatives with finite differences
• finite sums used instead of infinite series
• etc.

Round-off errors – errors created during the calculations due to the limited precision with which the calculations are performed

Example 4.1 Suppose a computer program can find the function value f(x) for arbitrary x.
Task: find an algorithm for calculating an approximation to the derivative f'(x).
Algorithm: choose a small h > 0 and approximate f'(x) ≈ [f(x+h) − f(x)]/h.
The discretisation error is T := |f'(x) − [f(x+h) − f(x)]/h|. Using a Taylor series, we get the estimate

    T ≤ (h/2) ‖f''‖∞.                                    (1)

The computational error comes from using finite precision arithmetic, approximating the real f(x) by an approximation f̃(x). The computational error C is

    C = | [f(x+h) − f(x)]/h − [f̃(x+h) − f̃(x)]/h |
      = | [f(x+h) − f̃(x+h)] − [f(x) − f̃(x)] | / h,

which gives the estimate

    C ≤ (2/h) ‖f − f̃‖∞.                                  (2)

The resulting error is | f'(x) − [f̃(x+h) − f̃(x)]/h |, which can be estimated using (1) and (2):

    T + C ≤ (h/2) ‖f''‖∞ + (2/h) ‖f − f̃‖∞.               (3)
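The tradeoff is easy to see numerically. Below is a minimal sketch (an addition to these notes, in Python 3 with NumPy; f = sin is an assumed example function, so f' = cos):

import numpy as np

f, df = np.sin, np.cos
x = 1.0
for h in [10.0**(-k) for k in range(1, 16)]:
    approx = (f(x + h) - f(x)) / h   # forward difference
    err = abs(approx - df(x))        # total error T + C
    print("h = %.0e   error = %.3e" % (h, err))

# The error first decreases with h (discretisation error ~ h/2) and then
# grows again (round-off error ~ 2*eps/h); the minimum is near
# h ~ sqrt(eps) ~ 1e-8 in double precision.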

=⇒ if h is large, the discretisation error dominates; if h is small, the computational error starts to dominate.

3.1.3 Forward error (error of the result) and backward error (error of the input data)

Consider computing y = f(x). Usually we can only compute an approximation of y; we denote the approximately calculated value by ŷ. We can observe two measures of the error associated with this computation.

Forward error – a measure of the difference between the approximation ŷ and the true value y:
• absolute forward error: |ŷ − y|
• relative forward error: |ŷ − y| / |y|

The forward error would be a natural quantity to measure, but usually (since we don't know the actual value of y) we can only get an upper bound on it. Moreover, tight upper bounds can be very difficult to obtain.

Backward error The question we might want to ask: for what input data did we actually perform the calculations? We would like to find the smallest ∆x for which ŷ = f(x + ∆x) – here ŷ is the exact value of f(x + ∆x). The value |∆x| (or |∆x|/|x|) is called the backward error. That is, the backward error is the error we have in the input, just as the forward error is the error we observe in the output of the calculation or algorithm.
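A tiny worked example (added for illustration, not in the original slides): take f(x) = √x with x = 2 and suppose the computed value is ŷ = 1.4. The forward error is |ŷ − y| = |1.4 − 1.41421...| ≈ 0.0142. Since 1.4 = √1.96, ŷ is the exact square root of 1.96, so the backward error is |∆x| = |1.96 − 2| = 0.04.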

Condition number – an upper limit of their ratio:

    forward error ≤ condition number × backward error

From (2) it follows that in Example 4.1 the value of the condition number is 2/h. In these calculations all values are absolute: the actual magnitudes of the approximated quantities are not considered. The relative forward error and relative backward error are in this case

    C / |(f(x+h) − f(x))/h|    and    ‖f − f̃‖∞ / ‖f‖∞.

Assuming that min_x |f'(x)| > 0, it follows easily from (2) that

    C / |(f(x+h) − f(x))/h| ≤ { (2/h) · ‖f‖∞ / min_x |f'(x)| } · ‖f − f̃‖∞ / ‖f‖∞.

The value in the brackets {·} is called the relative condition number of the problem. In general:
• If the (absolute or relative) condition number is small,
  – then an (absolute or relative) error in the input data can produce only a small error in the result.
• If the condition number is large,
  – then a large error in the result can be caused even by a small error in the input data
  – such problems are said to be ill-conditioned

• Sometimes, in the case of finite precision arithmetic:
  – the backward error is much simpler to estimate than the forward error
  – the backward error combined with the condition number makes it possible to estimate the forward error (absolute or relative)

Example 4.2 (one of the key problems in Scientific Computing). Consider solving the system of linear equations

    Ax = b,                                              (4)

where the input consists of
• a nonsingular n × n matrix A
• b ∈ Rⁿ
The task is to calculate an approximate solution x ∈ Rⁿ.
Suppose that instead of the exact matrix A we are given its approximation Ã = A + δA, but (for simplicity) b is known exactly. The solution x̃ = x + δx satisfies the system of equations

    (A + δA)(x + δx) = b.                                (5)

Then from (4), (5) it follows that (A + δA)δx = −(δA)x. Multiplying by (A + δA)⁻¹ and taking norms, we get the estimate

    ‖δx‖ ≤ ‖(A + δA)⁻¹‖ ‖δA‖ ‖x‖.

It follows that if x ≠ 0 and A ≠ 0, we have

    ‖δx‖/‖x‖ ≤ ‖(A + δA)⁻¹‖ ‖A‖ · ‖δA‖/‖A‖ ≈ ‖A⁻¹‖ ‖A‖ · ‖δA‖/‖A‖,    (6)

where the approximation holds for sufficiently small δA.

• =⇒ for the calculation of x an important factor is the relative condition number κ(A) := ‖A⁻¹‖ ‖A‖.
  – It is usually called the condition number of matrix A.
  – It depends on the norm ‖·‖.

Therefore, the common practice for forward error estimation is to:
• find an estimate of the backward error
• use the estimate (6)
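For illustration (an addition; not part of the original slides), NumPy can compute κ(A) directly:

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])      # nearly singular => ill-conditioned
print(np.linalg.cond(A))           # ~4e4, condition number in the 2-norm
print(np.linalg.cond(A, np.inf))   # the same quantity in the infinity norm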

3.2 Floating-Point Numbers

The number −3.1416 in scientific notation is −0.31416 × 10¹ or (as computer output) -0.31416E01; here "−" is the sign, 0.31416 the mantissa, 10 the base and 1 the exponent – floating point numbers in computer notation. Usually the base is 2 (with a few exceptions: IBM 370 had base 16; base 10 is used in most hand-held calculators; base 3 in an ill-fated Russian computer). For example, 0.10101₂ × 2³ = 5.25₁₀.

Formally, a floating-point number system F is characterised by four integers:
• Base (or radix) β > 1
• Precision p > 0
• Exponent range [L, U]: L < 0 < U

Any floating-point number x ∈ F has the form

    x = ±{d₀ + d₁β⁻¹ + ... + d_{p−1}β^{1−p}} β^E,         (7)

where the integers dᵢ satisfy 0 ≤ dᵢ ≤ β − 1, i = 0, ..., p − 1, and E ∈ [L, U] (E is a positive, zero or negative integer). The number E is called the exponent, and the part in the brackets {·} is called the mantissa.
Example. In arithmetic with precision 4 and base 10 the number 2347 is represented as {2 + 3 × 10⁻¹ + 4 × 10⁻² + 7 × 10⁻³} × 10³. Note that an exact representation of this number with precision 3 and base 10 is not possible!
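As a quick illustration (an addition to the notes), Python exposes the binary mantissa and exponent of a float:

import math

m, e = math.frexp(5.25)   # 5.25 = m * 2**e with 0.5 <= m < 1
print(m, e)               # 0.65625 3, i.e. 0.10101_2 * 2^3
print(m * 2**e)           # 5.25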

3.3 Normalised floating-point numbers

A number is normalised if d₀ > 0.
Example. The number 0.10101₂ × 2³ is normalised, but 0.010101₂ × 2⁴ is not.
Floating point systems are usually normalised because:
• The representation of each number is then unique
• No digits are wasted on leading zeros
• In a normalised binary (β = 2) system the leading bit is always 1 =⇒ no need to store it!

The smallest positive normalised number in form (7) is 1 × β^L – the underflow threshold. (In case of underflow, the result is smaller than the smallest representable floating-point number.)

The largest positive normalised number in form (7) is

    (β − 1){1 + β⁻¹ + ... + β^{1−p}} β^U = (1 − β^{−p}) β^{U+1}

– the overflow threshold. If the result of an arithmetic operation is an exact number not representable in the floating-point number system F, the result is represented by a (hopefully close) element of F (rounding).

3.4 IEEE (Normalised) Arithmetics

• β = 2 (binary)
• d₀ = 1 always – not stored

Single precision:
• p = 24, L = −126, U = 127
• Underflow threshold = 2⁻¹²⁶ ≈ 10⁻³⁸
• Overflow threshold = 2¹²⁷ · (2 − 2⁻²³) ≈ 2¹²⁸ ≈ 10³⁸
• One bit for the sign, 23 for the mantissa and 8 for the exponent – a 32-bit word.

Double precision:
• p = 53, L = −1022, U = 1023
• Underflow threshold = 2⁻¹⁰²² ≈ 10⁻³⁰⁸
• Overflow threshold = 2¹⁰²³ · (2 − 2⁻⁵²) ≈ 2¹⁰²⁴ ≈ 10³⁰⁸
• One bit for the sign, 52 for the mantissa and 11 for the exponent – a 64-bit word
• The IEEE arithmetic standard rounds towards the nearest element in F.
• (If the result is exactly between two elements of F, it is rounded towards the number whose least significant bit equals 0 – rounding towards the closest even number.)
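These parameters can be inspected directly; the following added sketch assumes NumPy:

import numpy as np

for t in (np.float32, np.float64):
    fi = np.finfo(t)
    # p = fi.nmant + 1 (stored bits + hidden bit), L = fi.minexp, U = fi.maxexp - 1
    print(t.__name__, fi.nmant + 1, fi.minexp, fi.maxexp - 1, fi.tiny, fi.max, fi.eps)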

IEEE subnormal numbers – unnormalised numbers with the minimal possible exponent.
• Between 0 and the smallest normalised floating point value.
• They guarantee that fl(x − y) (the result of the operation x − y in floating point arithmetic) is never zero when x ≠ y – to avoid underflow in such situations.

IEEE symbols Inf and NaN – Inf (±∞), NaN (Not a Number)
• Inf – in case of overflow
  – x / ±∞ = 0 for arbitrary finite floating-point x
  – +∞ + ∞ = +∞, etc.
• NaN is returned when an operation does not have a well-defined finite or infinite value, for example:

  – ∞ − ∞
  – 0/0
  – √−1
  – NaN ⊙ x (where ⊙ is one of the operations +, −, *, /), etc.

IEEE also defines double extended floating-point values:
• 64-bit mantissa; 15-bit exponent
• most compilers do not support it
• Many platforms also support quadruple precision (double*16) – often emulated with lower-precision arithmetic and therefore slow

4 Python in Scientific Computing

4.1 Numerical Python (NumPy)

• NumPy enables efficient numerical computing in Python
• NumPy is a package of modules which offers efficient arrays (contiguous storage) with associated array operations coded in C or Fortran
• There have been three implementations of Numerical Python:
  – Numeric from the mid 90s
  – numarray from about 2000
  – numpy from 2006 (by Travis Oliphant; the new and leading implementation – recommended)

# A taste of NumPy: a least-squares procedure
from numpy import *
n = 100; x = linspace(0.0, 1.0, n)       # coordinates
y_line = -2*x + 3
y = y_line + random.normal(0, 0.55, n)   # line with noise
# create and solve least squares system:
A = array([x, ones(n)])
A = A.transpose()
result = linalg.lstsq(A, y)
# result is a 4-tuple, the solution (a,b) is the 1st entry:
a, b = result[0]
p = [(x[i], y[i]) for i in range(len(x))]
p0 = (0, a*0 + b); p1 = (1, a*1 + b)
G = list_plot(p, color='red') + line([(0,3),(1,1)], color='blue')
G = G + line([p0, p1], color='red')
G = G + text('Blue - original line -2*x+3', (0.7, 3.5), color='blue')
G = G + text('Red - line fitted to data', (0.3, 0.5), color='red')
show(G)  # note: retype the "'" symbols when copy-pasting code into Sage

Resulting plot: [figure omitted]

4.1.1 NumPy: making arrays

>>> from numpy import *
>>> n = 4
>>> a = zeros(n)        # one-dim. array of length n
>>> print a             # str(a), float (C double) is default type
[ 0.  0.  0.  0.]
>>> a                   # repr(a)
array([ 0.,  0.,  0.,  0.])
>>> p = q = 2
>>> a = zeros((p,q,3))  # p*q*3 three-dim. array
>>> print a
[[[ 0.  0.  0.]
  [ 0.  0.  0.]]
 [[ 0.  0.  0.]
  [ 0.  0.  0.]]]
>>> a.shape             # a's dimensions
(2, 2, 3)

4.1.2 NumPy: making float, int, complex arrays

>>> a = zeros(3)
>>> print a.dtype       # a's data type
float64
>>> a = zeros(3, int)
>>> print a, a.dtype
[0 0 0] int64 (or int32, depending on architecture)
>>> a = zeros(3, float32)   # single precision
>>> print a
[ 0.  0.  0.]
>>> print a.dtype
float32
>>> a = zeros(3, complex); a
array([ 0.+0.j,  0.+0.j,  0.+0.j])
>>> a.dtype
dtype('complex128')

• Given an array a, make a new array of the same dimension and data type:

>>> x = zeros(a.shape, a.dtype)

4.1.3 Array with a sequence of numbers

• linspace(a, b, n) generates n uniformly spaced coordinates, starting with a and ending with b:

>>> x = linspace(-5, 5, 11)
>>> print x
[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.  5.]

• arange works like range:

>>> x = arange(-5, 5, 1, float)
>>> print x   # upper limit 5 is not included
[-5. -4. -3. -2. -1.  0.  1.  2.  3.  4.]

4.1.4 Warning: arange is dangerous

• arange's upper limit may or may not be included (due to round-off errors)

4.1.5 Array construction from a Python list

• array(list, [datatype]) generates an array from a list:

>>> pl = [0, 1.2, 4, -9.1, 5, 8]
>>> a = array(pl)

• The array elements are of the simplest possible type:

>>> z = array([1, 2, 3])
>>> print z   # int elements possible
[1 2 3]
>>> z = array([1, 2, 3], float)
>>> print z
[ 1.  2.  3.]

• A two-dimensional array from two one-dimensional lists:

>>> x = [0, 0.5, 1]; y = [-6.1, -2, 1.2]   # Python lists
>>> a = array([x, y])   # form array with x and y as rows

• From array to list:

alist = a.tolist()

4.1.6 From "anything" to a NumPy array

• Given an object a,

a = asarray(a)

converts a to a NumPy array (if possible/necessary)
• Arrays can be ordered as in C (default) or Fortran:

a = asarray(a, order='Fortran')
isfortran(a)   # returns True if a's order is Fortran

• Use asarray to, e.g., allow flexible arguments in functions:

def myfunc(some_sequence, ...):
    a = asarray(some_sequence)
    # work with a as array

myfunc([1,2,3], ...)
myfunc((-1,1), ...)
myfunc(zeros(10), ...)

4.1.7 Changing array dimensions

>>> a = array([0, 1.2, 4, -9.1, 5, 8])
>>> a.shape = (2,3)       # turn a into a 2x3 matrix
>>> a.shape
(2, 3)
>>> a.size
6
>>> a.shape = (a.size,)   # turn a into a vector of length 6 again
>>> a.shape
(6,)
>>> a = a.reshape(2,3)    # same effect as setting a.shape
>>> a.shape
(2, 3)

4.1.8 Array initialization from a Python function

>>> def myfunc(i, j):
...     return (i+1)*(j+4-i)
...
>>> # make 3x6 array where a[i,j] = myfunc(i,j):
>>> a = fromfunction(myfunc, (3,6))
>>> a
array([[  4.,   5.,   6.,   7.,   8.,   9.],
       [  6.,   8.,  10.,  12.,  14.,  16.],
       [  6.,   9.,  12.,  15.,  18.,  21.]])

4.1.9 Basic array indexing

a = linspace(-1, 1, 6)  # array([-1. , -0.6, -0.2,  0.2,  0.6,  1. ])
a[2:4] = -1      # set a[2] and a[3] equal to -1
a[-1] = a[0]     # set last element equal to first one
a[:] = 0         # set all elements of a equal to 0
a.fill(0)        # set all elements of a equal to 0
a.shape = (2,3)  # turn a into a 2x3 matrix
print a[0,1]     # print element (0,1)
a[i,j] = 10      # assignment to element (i,j)
a[i][j] = 10     # equivalent syntax (slower)
print a[:,k]     # print column with index k
print a[1,:]     # print second row
a[:,:] = 0       # set all elements of a equal to 0

4.1.10 More advanced array indexing

>>> a = linspace(0, 29, 30)
>>> a.shape = (5,6)
>>> a
array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.,  28.,  29.]])
>>> a[1:3,:-1:2]    # a[i,j] for i=1,2 and j=0,2,4
array([[  6.,   8.,  10.],
       [ 12.,  14.,  16.]])
>>> a[::3,2:-1:2]   # a[i,j] for i=0,3 and j=2,4
array([[  2.,   4.],
       [ 20.,  22.]])
>>> i = slice(None, None, 3); j = slice(2, -1, 2)
>>> a[i,j]
array([[  2.,   4.],
       [ 20.,  22.]])

4.1.11 Slices refer to the array data

• With a as list, a[:] makes a copy of the data
• With a as array, a[:] is a reference to the data!!!

>>> b = a[1,:]   # extract 2nd row of a
>>> print a[1,1]
12.0
>>> b[1] = 2
>>> print a[1,1]
2.0              # change in b is reflected in a

• Take a copy to avoid referencing via slices:

>>> b = a[1,:].copy()
>>> print a[1,1]
12.0
>>> b[1] = 2     # b and a are two different arrays now
>>> print a[1,1]
12.0             # a is not affected by the change in b

4.1.12 Integer arrays as indices

• An integer array or list can be used as a (vectorized) index:

>>> a = linspace(1, 8, 8)
>>> a
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.])
>>> a[[1,6,7]] = 10
>>> a
array([  1.,  10.,   3.,   4.,   5.,   6.,  10.,  10.])
>>> a[range(2,8,3)] = -2
>>> a
array([  1.,  10.,  -2.,   4.,   5.,  -2.,  10.,  10.])
>>> a[a < 0]     # pick out the negative elements of a
array([-2., -2.])
>>> a[a < 0] = a.max()
>>> a
array([  1.,  10.,  10.,   4.,   5.,  10.,  10.,  10.])

• Such array indices are important for efficient vectorized code

4.1.13 Loops over arrays

• Standard loop over each element:

for i in xrange(a.shape[0]):
    for j in xrange(a.shape[1]):
        a[i,j] = (i+1)*(j+1)*(j+2)
        print 'a[%d,%d]=%g ' % (i,j,a[i,j]),
    print   # newline after each row


• A standard for loop iterates over the first index:

>>> print a
[[  2.   6.  12.]
 [  4.  12.  24.]]
>>> for e in a:
...     print e
...
[  2.   6.  12.]
[  4.  12.  24.]

• View the array as one-dimensional and iterate over all elements:

for e in a.flat:
    print e

• For a loop over all index tuples and values:

>>> for index, value in ndenumerate(a):
...     print index, value
...
(0, 0) 2.0
(0, 1) 6.0
(0, 2) 12.0
(1, 0) 4.0
(1, 1) 12.0
(1, 2) 24.0

4.1.14 Array computations

• Arithmetic operations can be used with arrays:

b = 3*a - 1   # a is array, b becomes array

This is computed in steps: 1) compute t1 = 3*a, 2) compute t2 = t1 - 1, 3) set b = t2

• Array operations are much faster than element-wise operations:

>>> import time                  # module for measuring CPU time
>>> a = linspace(0, 1, 1E+07)    # create some array
>>> t0 = time.clock()
>>> b = 3*a - 1
>>> t1 = time.clock()            # t1-t0 is the CPU time of 3*a-1
>>> for i in xrange(a.size): b[i] = 3*a[i] - 1
>>> t2 = time.clock()
>>> print '3*a-1: %g sec, loop: %g sec' % (t1-t0, t2-t1)
3*a-1: 2.09 sec, loop: 31.27 sec

4.1.15 In-place array arithmetics

• Expressions like 3*a-1 generate temporary arrays

• With in-place modifications of arrays, we can avoid temporary arrays (to some extent):

b = a
b *= 3   # or multiply(b, 3, b)
b -= 1   # or subtract(b, 1, b)

Note: a is changed as well; use b = a.copy() if that is not wanted

• In-place operations:

a *= 3.0    # multiply a's elements by 3
a -= 1.0    # subtract 1 from each element
a /= 3.0    # divide each element by 3
a += 1.0    # add 1 to each element
a **= 2.0   # square all elements

• Assign values to all elements of an existing array:

a[:] = 3*c - 1

4.1.16 Standard math functions can take array arguments

# let b be an array
c = sin(b)
c = arcsin(c)
c = sinh(b)
# same functions for the cos and tan families
c = b**2.5   # power function
c = log(b)
c = exp(b)
c = sqrt(b)

4.1.17 Other useful array operations

# a is an array
a.clip(min=3, max=12)   # clip elements
a.mean(); mean(a)       # mean value
a.var();  var(a)        # variance
a.std();  std(a)        # standard deviation
median(a)
cov(x,y)                # covariance
trapz(a)                # trapezoidal integration
diff(a)                 # finite differences (da/dx)

# more Matlab-like functions:
# corrcoef, cumprod, diag, eig, eye, fliplr, flipud, max, min,
# prod, ptp, rot90, squeeze, sum, svd, tri, tril, triu

4.1.18 Temporary arrays

• Let us evaluate f1(x) for a vector x:

def f1(x):
    return exp(-x*x)*log(1+x*sin(x))

The expression is evaluated through temporary arrays:
1. temp1 = -x
2. temp2 = temp1*x
3. temp3 = exp(temp2)
4. temp4 = sin(x)
5. temp5 = x*temp4
6. temp6 = 1 + temp5
7. temp7 = log(temp6)
8. result = temp3*temp7

4.1.19 More useful array methods and attributes

>>> a = zeros(4) + 3
>>> a
array([ 3.,  3.,  3.,  3.])      # float data
>>> a.item(2)                    # more efficient than a[2]
3.0
>>> a.itemset(3,-4.5)            # more efficient than a[3]=-4.5
>>> a
array([ 3. ,  3. ,  3. , -4.5])
>>> a.shape = (2,2)
>>> a
array([[ 3. ,  3. ],
       [ 3. , -4.5]])
>>> a.ravel()                    # from multi-dim to one-dim
array([ 3. ,  3. ,  3. , -4.5])
>>> a.ndim                       # no of dimensions
2
>>> len(a.shape)                 # no of dimensions
2
>>> rank(a)                      # no of dimensions
2
>>> a.size                       # total no of elements
4
>>> b = a.astype(int)            # change data type (truncates)
>>> b
array([ 3,  3,  3, -4])

4.1.20 Complex number computing

>>> from math import sqrt
>>> sqrt(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error
>>> from numpy import sqrt
>>> sqrt(-1)
Warning: invalid value encountered in sqrt
nan
>>> from cmath import sqrt   # complex math functions
>>> sqrt(-1)
1j
>>> sqrt(4)                  # cmath functions always return complex...
(2+0j)
>>> from numpy.lib.scimath import sqrt
>>> sqrt(4)
2.0                          # real when possible
>>> sqrt(-1)
1j                           # otherwise complex

4.1.21 A root function

# Goal: compute roots of a parabola, return real when possible,
# otherwise complex
def roots(a, b, c):
    # compute roots of a*x^2 + b*x + c = 0
    from numpy.lib.scimath import sqrt
    q = sqrt(b**2 - 4*a*c)   # q is real or complex
    r1 = (-b + q)/(2*a)
    r2 = (-b - q)/(2*a)
    return r1, r2

>>> a = 1; b = 2; c = 100
>>> roots(a, b, c)   # complex roots
((-1+9.94987437107j), (-1-9.94987437107j))
>>> a = 1; b = 4; c = 1
>>> roots(a, b, c)   # real roots
(-0.267949192431, -3.73205080757)

4.1.22 Array type and data type

>>> import numpy
>>> a = numpy.zeros(5)
>>> type(a)
<type 'numpy.ndarray'>
>>> isinstance(a, ndarray)   # is a of type ndarray?
True
>>> a.dtype                  # data (element) type object
dtype('float64')
>>> a.dtype.name
'float64'
>>> a.dtype.char             # character code
'd'
>>> a.dtype.itemsize         # no of bytes per array element
8
>>> b = zeros(6, float32)
>>> a.dtype == b.dtype       # do a and b have the same data type?
False
>>> c = zeros(2, float)
>>> a.dtype == c.dtype
True

4.1.23 Matrix objects

• NumPy has an array type, matrix, much like Matlab's array type:

>>> x1 = array([1, 2, 3], float)
>>> x2 = matrix(x1)            # or just mat(x1)
>>> x2                         # row vector
matrix([[ 1.,  2.,  3.]])
>>> x3 = mat(x1).transpose()   # column vector
>>> x3
matrix([[ 1.],
        [ 2.],
        [ 3.]])
>>> type(x3)
<class 'numpy.matrixlib.defmatrix.matrix'>
>>> isinstance(x3, matrix)
True

• Only 1- and 2-dimensional arrays can be matrix
• For matrix objects, the * operator means matrix-matrix or matrix-vector multiplication (not elementwise multiplication):

>>> A = eye(3)      # identity matrix
>>> A = mat(A)      # turn array into matrix
>>> A
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
>>> y2 = x2*A       # vector-matrix product
>>> y2
matrix([[ 1.,  2.,  3.]])
>>> y3 = A*x3       # matrix-vector product
>>> y3
matrix([[ 1.],
        [ 2.],
        [ 3.]])

4.2 NumPy: Vectorisation

• Loops over an array run slowly
• Vectorization = replace explicit loops by function calls such that the whole loop is implemented in C (or Fortran)
• Explicit loops:

r = zeros(x.shape, x.dtype)
for i in xrange(x.size):
    r[i] = sin(x[i])

• Vectorised version:

r = sin(x)

• Arithmetic expressions work for both scalars and arrays
• Many fundamental functions work for scalars and arrays
• Ex: x**2 + abs(x) works for x scalar or array

A mathematical function written for scalar arguments can (normally) take array arguments:

>>> def f(x):
...     return x**2 + sinh(x)*exp(-x) + 1
...
>>> # scalar argument:
>>> x = 2
>>> f(x)
5.4908421805556333
>>> # array argument:
>>> y = array([2, -1, 0, 1.5])
>>> f(y)
array([ 5.49084218, -1.19452805,  1.        ,  3.72510647])

4.2.1 Vectorisation of functions with if tests; problem

• Consider a function with an if test:

def somefunc(x):
    if x < 0:
        return 0
    else:
        return sin(x)

# or
def somefunc(x):
    return 0 if x < 0 else sin(x)

• This function works with a scalar x but not an array
• Problem: x < 0 gives a boolean array, which cannot be used in a scalar boolean context:

>>> x = linspace(-1, 1, 3); print x
[-1.  0.  1.]
>>> y = x < 0
>>> y
array([ True, False, False], dtype=bool)
>>> 'ok' if y else 'not ok'   # test of y in scalar boolean context
...
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

4.2.2 Vectorisation of functions with if tests; solutions

A. Simplest remedy: call NumPy's vectorize function to allow array arguments to a function:

>>> somefuncv = vectorize(somefunc, otypes='d')
>>> # test:
>>> x = linspace(-1, 1, 3); print x
[-1.  0.  1.]
>>> somefuncv(x)
array([ 0.        ,  0.        ,  0.84147098])

Note: the data type must be specified as a character
• The speed of somefuncv is unfortunately quite slow

B. A better solution, using where:

def somefunc_NumPy2(x):
    x1 = zeros(x.size, float)
    x2 = sin(x)
    return where(x < 0, x1, x2)

4.2.3 General vectorization of if-else tests

def f(x):   # scalar x
    if condition:
        x = <expression1>
    else:
        x = <expression2>
    return x

def f_vectorized(x):   # scalar or array x
    x1 = <expression1>
    x2 = <expression2>
    return where(condition, x1, x2)

4.2.4 Vectorization via slicing

• Consider a recursion scheme (which arises from a one-dimensional diffusion equation)
• Straightforward (slow) Python implementation:

n = size(u) - 1
for i in xrange(1, n, 1):
    u_new[i] = beta*u[i-1] + (1-2*beta)*u[i] + beta*u[i+1]

• Slices enable us to vectorize the expression:

u[1:n] = beta*u[0:n-1] + (1-2*beta)*u[1:n] + beta*u[2:n+1]

4.3 NumPy: Random numbers

• Drawing scalar random numbers:

import random
random.seed(2198)   # control the seed
print 'uniform random number on (0,1):', random.random()
print 'uniform random number on (-1,1):', random.uniform(-1,1)
print 'Normal(0,1) random number:', random.gauss(0,1)

• Vectorized drawing of random numbers (arrays):

from numpy import random
random.seed(12)                 # set seed
u = random.random(n)            # n uniform numbers on (0,1)
u = random.uniform(-1, 1, n)    # n uniform numbers on (-1,1)
u = random.normal(m, s, n)      # n numbers from N(m,s)

• Note that both modules have the name random! A remedy:

import random as random_number   # rename random for scalars
from numpy import *              # random is now numpy.random

4.4 NumPy: Basic linear algebra

NumPy contains the linalg module for:
• solving linear systems
• computing the determinant of a matrix
• computing the inverse of a matrix
• computing eigenvalues and eigenvectors of a matrix
• solving least-squares problems
• computing the singular value decomposition of a matrix
• computing the Cholesky decomposition of a matrix

4.4.1 A linear algebra session

from numpy import *   # includes import of linalg
n = 100
# fill matrix A and vectors x and b:
A = random.uniform(0.0, 1.0, (n,n)); x = random.uniform(-1, 1, n)
b = dot(A, x)              # matrix-vector product
y = linalg.solve(A, b)     # solve A*y = b
if allclose(x, y, atol=1.0E-12, rtol=1.0E-12):
    print '--correct solution'
d = linalg.det(A); B = linalg.inv(A)
# check result:
R = dot(A, B) - eye(n)     # residual
R_norm = linalg.norm(R)    # Frobenius norm of matrix R
print 'Residual R = A*A-inverse - I:', R_norm
A_eigenvalues = linalg.eigvals(A)   # eigenvalues only
A_eigenvalues, A_eigenvectors = linalg.eig(A)
for e, v in zip(A_eigenvalues, A_eigenvectors):
    print 'eigenvalue %g has corresponding vector\n%s' % (e, v)

4.5 Python: Plotting modules

By default, Sage comes with:
• Interface to Gnuplot (curve plotting, 2D scalar and vector fields)
• Matplotlib (curve plotting, 2D scalar and vector fields)
• 3D: Tachyon (ray-tracing), Jmol (interactive plotting)

Available Python interfaces to:
• Vtk (2D/3D scalar and vector fields)
• OpenDX (2D/3D scalar and vector fields)
• Grace
• R
• IDL
• PyX (PostScript/TEX-like drawing)
• Matlab
• Blender

from numpy import *
n = 100; x = linspace(0.0, 1.0, n); y = linspace(0.0, 1.0, n)
a = -2; b = 3; c = 7
z_line = a*x + b*y + c
rscal = 0.05
xx = x + random.normal(0, rscal, n)
yy = y + random.normal(0, rscal, n)
zz = z_line + random.normal(0, rscal, n)
A = array([xx, yy, ones(n)])
A = A.transpose()
result = linalg.lstsq(A, zz)
aa, bb, cc = result[0]
p0 = (x[0], y[0], a*x[0]+b*y[0]+c)
p1 = (x[n-1], y[n-1], a*x[n-1]+b*y[n-1]+c)
pp = [(xx[i], yy[i], zz[i]) for i in range(len(x))]
p = [(x[i], y[i], z_line[i]) for i in range(len(x))]
pp0 = (xx[0], yy[0], aa*xx[0]+bb*yy[0]+cc)
pp1 = (xx[n-1], yy[n-1], aa*xx[n-1]+bb*yy[n-1]+cc)
G = line3d([p0, p1], color='blue')
G = G + list_plot(pp, color='red', opacity=0.2)
G = G + line3d([pp0, pp1], color='red')
G = G + text3d('Blue - original line: ' + '%.4f*x+%.4f*y+%.4f' % (a,b,c),
               (p[0][0], p[0][1], p[0][2]), color='blue')
G = G + text3d('Red - fitted line: ' + '%.4f*x+%.4f*y+%.4f' % (aa,bb,cc),
               (p[n-1][0], p[n-1][1], p[n-1][2]), color='red')
show(G)

[resulting 3D plot omitted]

4.6 I/O

4.6.1 File I/O with arrays; plain ASCII format

• Plain text output to file (just dump repr(array)):

a = linspace(1, 20, 20); a.shape = (2,10)
# In case of Sage Notebook, use the variable DATA, which holds the
# current working directory name for the current worksheet
file = open(DATA+'tmp.dat', 'w')
file.write('Here is an array a:\n')
file.write(repr(a))   # dump string representation of a
file.close()

(If you need the objects in a different worksheet, use the directory name that was stored in the variable DATA of the original worksheet...)

• Plain text input (just apply eval to the input):

file = open(DATA+'tmp.dat', 'r')
file.readline()   # load the first line (a comment)
b = eval(file.read())
file.close()

4.6.2 File I/O with arrays; binary pickling

• Dump (serialized) arrays with cPickle:

# a1 and a2 are two arrays
import cPickle
file = open(DATA+'tmp.dat', 'wb')
file.write('This is the array a1:\n')
cPickle.dump(a1, file)
file.write('Here is another array a2:\n')
cPickle.dump(a2, file)
file.close()

Read the arrays back in (in the correct order):

file = open(DATA+'tmp.dat', 'rb')
file.readline()   # swallow the initial comment line
b1 = cPickle.load(file)
file.readline()   # swallow the next comment line
b2 = cPickle.load(file)
file.close()

In Sage: almost all Sage objects x can be saved in compressed form to disk using save(x, filename) (or in many cases x.save(filename)):

A = matrix(RR, 10, range(100))
save(A, DATA+'A')

B = load(DATA+'A')

4.7 SciPy

4.7.1 Overview

• SciPy is a comprehensive package (by Eric Jones, Travis Oliphant, Pearu Peterson) for scientific computing with Python
• Much overlap with ScientificPython
• SciPy interfaces many classical Fortran packages from Netlib (QUADPACK, ODEPACK, MINPACK, ...)
• Functionality: special functions, linear algebra, numerical integration, ODEs, random variables and statistics, optimization, root finding, interpolation, ...
• May require some installation effort (it uses ATLAS)
• Included in Sage by default; see the SciPy homepage (http://www.scipy.org)

5 Solving Systems of Linear Equations

5.1 Systems of Linear Equations

System of linear equations:

    a₁₁x₁ + a₁₂x₂ + ... + a₁ₙxₙ = b₁
    a₂₁x₁ + a₂₂x₂ + ... + a₂ₙxₙ = b₂
    ...
    aₘ₁x₁ + aₘ₂x₂ + ... + aₘₙxₙ = bₘ

Matrix form: A – given matrix, b – given vector; x – vector of unknowns:

    [ a₁₁ a₁₂ ··· a₁ₙ ] [ x₁ ]   [ b₁ ]
    [ a₂₁ a₂₂ ··· a₂ₙ ] [ x₂ ] = [ b₂ ]    or    Ax = b        (8)
    [  ⋮   ⋮   ⋱   ⋮  ] [ ⋮  ]   [ ⋮  ]
    [ aₘ₁ aₘ₂ ··· aₘₙ ] [ xₙ ]   [ bₘ ]

Suppose:
• m > n – overdetermined system; a system with more equations than unknowns usually has no solution
• m < n – underdetermined system; a system with fewer equations than unknowns usually has infinitely many solutions
• m = n – a system with the same number of equations and unknowns usually has a single unique solution ←− we will deal only with this case now

5.2 Classification

Two main types of systems of linear equations: systems with a
• full matrix – most of the values are nonzero
  – stored in a 2D array
• sparse matrix – most of the matrix values are zero
  – storing it as a full matrix would be a waste of memory
  – different sparse matrix storage schemes exist (see the sketch below)

Quite different strategies are used for the solution of systems with full or sparse matrices.
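For a concrete picture, here is an added sketch of one such scheme, SciPy's compressed sparse row (CSR) format:

import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[4., 0., 0.],
              [0., 0., 2.],
              [1., 0., 3.]])
S = csr_matrix(A)     # compressed sparse row storage
print(S.data)         # [4. 2. 1. 3.] -- only the nonzeros are stored
print(S.indices)      # [0 2 0 2]     -- their column indices
print(S.indptr)       # [0 1 2 4]     -- where each row starts in data/indices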

5.2.1 Problem Transformation

• A common strategy is to modify the problem (8) such that
  – the solution remains the same
  – the modified problem is easier to solve

What kind of transformations do not change the solution?
• It is possible to multiply both sides of the equation (8) by an arbitrary nonsingular matrix M without changing the solution.
  – To check this, notice that the solution of MAz = Mb is

    z = (MA)⁻¹Mb = A⁻¹M⁻¹Mb = A⁻¹b = x.

  – For example, M = D, a diagonal matrix,
  – or M = P, a permutation matrix.

NB! Although theoretically the multiplication of (8) by a nonsingular matrix M does not change the solution, we will see later that it may change the numerical process of the solution and the accuracy of the result...

The next question we ask: what type of systems are easy to solve?

5.3 Triangular linear systems

If the system matrix A has a row i with a nonzero only on the diagonal, it is easy to calculate xᵢ = bᵢ/aᵢᵢ; if there is then a row j where, besides the diagonal a_jj ≠ 0, the only nonzero is at position a_ji, we find x_j = (b_j − a_ji·xᵢ)/a_jj; and again, if there exists a row k such that a_kk ≠ 0 and a_kl = 0 for l ∉ {i, j}, we get x_k = (b_k − a_ki·xᵢ − a_kj·x_j)/a_kk, etc.
• Such systems are easy to solve
• They are called triangular systems

By rearranging rows and unknowns (columns) it is possible to transform the system to lower triangular form L or upper triangular form U:

    L = [ ℓ₁₁  0   ···  0  ]        U = [ u₁₁ u₁₂ ··· u₁ₙ ]
        [ ℓ₂₁ ℓ₂₂  ···  0  ]            [  0  u₂₂ ··· u₂ₙ ]
        [  ⋮   ⋮   ⋱    ⋮  ]            [  ⋮   ⋮   ⋱   ⋮  ]
        [ ℓₙ₁ ℓₙ₂  ··· ℓₙₙ ]            [  0   0  ··· uₙₙ ]

Solving the system Lx = b is called Forward Substitution:

    x₁ = b₁/ℓ₁₁
    xᵢ = (bᵢ − Σ_{j=1}^{i−1} ℓᵢⱼ xⱼ) / ℓᵢᵢ

Solving the system Ux = b is called Back Substitution:

    xₙ = bₙ/uₙₙ
    xᵢ = (bᵢ − Σ_{j=i+1}^{n} uᵢⱼ xⱼ) / uᵢᵢ

But how to transform an arbitrary matrix to a triangular form?
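Before answering that, here is a short NumPy sketch of both substitutions (an added illustration, not from the original slides):

import numpy as np

def forward_substitution(L, b):
    # solve L x = b for lower triangular L
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def back_substitution(U, b):
    # solve U x = b for upper triangular U
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x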

5.4 Elementary Elimination Matrices

For a₁ ≠ 0:

    [  1       0 ] [ a₁ ]   [ a₁ ]
    [ −a₂/a₁   1 ] [ a₂ ] = [ 0  ].

In the general case, if a = [a₁, a₂, ..., aₙ]ᵀ and aₖ ≠ 0:

    Mₖa = [ 1 ··· 0     0    0 ··· 0 ] [ a₁      ]   [ a₁ ]
          [ ⋮  ⋱  ⋮     ⋮    ⋮     ⋮ ] [ ⋮       ]   [ ⋮  ]
          [ 0 ··· 1     0    0 ··· 0 ] [ aₖ      ] = [ aₖ ]
          [ 0 ··· −m_{k+1} 1 0 ··· 0 ] [ a_{k+1} ]   [ 0  ]
          [ ⋮  ⋱  ⋮     ⋮    ⋱     ⋮ ] [ ⋮       ]   [ ⋮  ]
          [ 0 ··· −mₙ   0    0 ··· 1 ] [ aₙ      ]   [ 0  ]

where mᵢ = aᵢ/aₖ, i = k+1, ..., n.

The divisor aₖ is called the pivot. The matrix Mₖ is also called an elementary elimination matrix or a Gauss transformation.
1. Mₖ is nonsingular ⇐= Why? Because it is lower triangular with unit diagonal.
2. Mₖ = I − m·eₖᵀ, where m = [0, ..., 0, m_{k+1}, ..., mₙ]ᵀ and eₖ is column k of the unit matrix.
3. Lₖ =(def) Mₖ⁻¹ = I + m·eₖᵀ.
4. If Mⱼ, j > k, is some other elementary elimination matrix with multiplier vector t, then

    MₖMⱼ = I − m·eₖᵀ − t·eⱼᵀ + m·eₖᵀ t·eⱼᵀ = I − m·eₖᵀ − t·eⱼᵀ,

due to eₖᵀt = 0.

5.5 Gauss Elimination and LU Factorisation

• Apply a series of Gauss elimination matrices from the left: M₁, M₂, ..., M_{n−1}. Taking M = M_{n−1}···M₁, we get the linear system

    MAx = M_{n−1}···M₁Ax = M_{n−1}···M₁b = Mb

– with an upper triangular matrix =⇒ easy to solve.
The process is called the Gauss Elimination Method (GEM).

• Denoting U = MA and L = M⁻¹, we get that

    L = M⁻¹ = (M_{n−1}···M₁)⁻¹ = M₁⁻¹···M_{n−1}⁻¹ = L₁···L_{n−1}

is lower unit triangular (ones on the diagonal)
• =⇒ A = LU. Expressed as an algorithm:

Algorithm 5.1. LU-factorisation using the Gauss elimination method (GEM)

do k = 1, ..., n-1             # cycle over matrix columns
    if a_kk == 0 then stop     # stop in case pivot == 0
    do i = k+1, n
        m_ik = a_ik / a_kk     # multiplier calculation in column k
    enddo
    do i = k+1, n
        do j = k+1, n                  # applying transformations to
            a_ij = a_ij - m_ik * a_kj  # the rest of the matrix
        enddo
    enddo
enddo

NB! In a practical implementation, store the multipliers m_ik in the corresponding elements of A (they would become zeroes anyway).
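A direct NumPy transcription of Algorithm 5.1 (an added sketch; like the algorithm above it does no pivoting, so it stops on a zero pivot):

import numpy as np

def lu_gem(A):
    # LU-factorise A with GEM: returns (L, U) with A = L @ U
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):                   # cycle over matrix columns
        if A[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: row permutation needed")
        A[k+1:, k] /= A[k, k]                # multipliers, stored in place of the zeros
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # update the rest
    L = np.tril(A, -1) + np.eye(n)           # unit lower triangular part
    U = np.triu(A)
    return L, U

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
L, U = lu_gem(A)
print(np.allclose(L @ U, A))                 # True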

5.6 Number of operations in GEM

Finding the operation counts for Alg. 5.1:
• Replace the loops with corresponding sums over the number of the particular operations:

    Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} (1 + Σ_{k=i+1}^{n} 2) = Σ_{i=1}^{n−1} ((n−i) + 2(n−i)²) = (2/3)n³ + O(n²)

• Here we used Σ_{i=1}^{m} iᵏ = m^{k+1}/(k+1) + O(mᵏ) (which is enough for finding the number of operations of the highest order)
• The number of operations for forward and backward substitution with L and U is O(n²)
• =⇒ the whole solution of the system Ax = b takes (2/3)n³ + O(n²) operations

5.7 GEM with row permutations

• If a pivot == 0, GEM won't work
• Row permutations, or partial pivoting, may help
• For numerical stability, the pivot must also not be small

Example 5.1 Consider the matrix

    A = [ 0 1 ]
        [ 1 0 ]

– nonsingular
– but LU-factorisation is impossible without row permutations

• But on the contrary, the matrix

    A = [ 1 1 ]
        [ 1 1 ]

– has the LU-factorisation

    A = [ 1 1 ] = [ 1 0 ] [ 1 1 ] = LU
        [ 1 1 ]   [ 1 1 ] [ 0 0 ]

– But what is wrong with matrix A? A is actually a singular matrix!

Example 5.2. Small pivots
• Consider

    A = [ ε 1 ]
        [ 1 1 ],

with ε such that 0 < ε < ε_mach in the given floating point system (i.e. 1 + ε = 1 in floating point arithmetic).
– Without row permutation we get (in floating-point arithmetic):

    M = [  1    0 ]  =⇒  L = [  1   0 ],  U = [ ε    1     ] = [ ε   1   ]
        [ −1/ε  1 ]          [ 1/ε  1 ]       [ 0  1 − 1/ε ]   [ 0  −1/ε ]

• But then

    LU = [  1   0 ] [ ε   1   ] = [ ε 1 ] ≠ A
         [ 1/ε  1 ] [ 0  −1/ε ]   [ 1 0 ]

Using a row permutation:
• the pivot is 1;
• the multiplier is −ε =⇒

    M = [  1  0 ]  =⇒  L = [ 1  0 ],  U = [ 1   1   ] = [ 1 1 ]
        [ −ε  1 ]          [ ε  1 ]       [ 0  1 − ε ]   [ 0 1 ]

in floating point arithmetic
• =⇒

    LU = [ 1 0 ] [ 1 1 ] = [ 1 1 ]  ← OK!
         [ ε 1 ] [ 0 1 ]   [ ε 1 ]
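One can watch partial pivoting handle the small pivot (an added illustration; scipy.linalg.lu returns P, L, U with A = P @ L @ U):

import numpy as np
from scipy.linalg import lu

eps = 1e-20                       # "small pivot", far below machine epsilon
A = np.array([[eps, 1.0],
              [1.0, 1.0]])
P, L, U = lu(A)                   # partial pivoting swaps the rows
print(P)                          # [[0. 1.] [1. 0.]] -- row 2 became the pivot row
print(np.allclose(P @ L @ U, A))  # True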

Algorithm 5.2. LU-factorisation with GEM using row permutations

do k = 1, ..., n-1                    # cycle over matrix columns
    find index p such that:           # looking for the best pivot
        |a_pk| ≥ |a_ik|, k ≤ i ≤ n    #   in the given column
    if p ≠ k then interchange rows k and p
    if a_kk == 0 then continue with next k   # skip such a column
    do i = k+1, n
        m_ik = a_ik / a_kk            # multiplier calculation in column k
    enddo
    do i = k+1, n
        do j = k+1, n                       # applying transformations to
            a_ij = a_ij - m_ik * a_kj       # the rest of the matrix
        enddo
    enddo
enddo

As a result, MA = U, where U is upper triangular and

    M = M_{n−1}P_{n−1}···M₁P₁.

• M⁻¹ is not lower triangular any more, although it is still denoted by L
  – but we still have a triangular L
  – knowing the permutations P = P_{n−1}···P₁ in advance would give PA = LU, where L is indeed a lower triangular matrix

• But, do instead we really of row need exchanges to actually weperform can perform the rowjust exchanges appropriate – explicitly? mapping of matrix (and vector) indexes – We start with unit index mapping p = [1, 2, 3, 4, ..., n] – If rows i and j need to be exchanged, we exchange corresponding values p[i] and p[ j] – In the algorithm, take everywhere a p[i] j instead of ai j (and other arrays correspondingly) To solve the system Ax = b (8) How does the whole algorithm look like now? • Solve the lower triangular system Ly = Pb with forward substitution • Solve the upper triangular system Ux = y backward substitution The term partial pivoting comes from the fact that we are seeking for the best pivot only in the current column (starting from the diagonal and below) of the matrix

128 Systems of LE

5.7 GEM with row permutations

• Complete pivoting – the best pivot is chosen from the whole remaining part of the matrix
  – This means exchanging both rows and columns of the matrix: PAQ = LU, where P and Q are permutation matrices.
  – The system is solved in three stages: Ly = Pb; Uz = y; x = Qz.

Although numerical stability is better in the case of complete pivoting, it is rarely used, because it is
• more costly
• usually not needed

5.8 Reliability of the LU-factorisation with partial pivoting

Introduce the vector norm ‖x‖_∞ = max_i |x_i|, x ∈ R^n, and the corresponding matrix norm

  ‖A‖_∞ = sup_{x∈R^n, x≠0} ‖Ax‖_∞ / ‖x‖_∞ .

Looking at the rounding errors in GEM, it can be shown that we actually find L and U which satisfy the relation PA = LU − E, where the error E can be estimated:

  ‖E‖_∞ ≤ n ε ‖L‖_∞ ‖U‖_∞                                       (9)


where ε is machine epsilon. In practice we replace the system PAx = Pb with the system LUx = Pb =⇒ the system we are solving for the approximate solution x̃ is actually (PA + E)x̃ = Pb.

• How far is it from the real solution? From matrix perturbation theory: let us solve the system

  Ax = b                                                        (10)

where A ∈ R^{n×n} and b ∈ R^n are given and x ∈ R^n is unknown. Suppose A is given with an error Ã = A + δA. The perturbed solution satisfies the system of linear equations

  (A + δA)(x + δx) = b.                                         (11)


Theorem 5.1. Let A be nonsingular and δA be sufficiently small, such that

  ‖δA‖_∞ ‖A^{−1}‖_∞ ≤ 1/2 .                                     (12)

Then (A + δA) is nonsingular and

  ‖δx‖_∞ / ‖x‖_∞ ≤ 2 κ(A) ‖δA‖_∞ / ‖A‖_∞ ,                      (13)

where κ(A) = ‖A‖_∞ ‖A^{−1}‖_∞ is the condition number.

Proof of the theorem (http://www.ut.ee/~eero/SC/konspekt/perturb-toest/perturb-proof.pdf) (EST) (http://www.ut.ee/~eero/SC/konspekt/perturb-toest/perturb-toest.pdf)
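A small numerical illustration of Theorem 5.1 (the matrix and perturbation below are made up; np.linalg.cond computes κ directly):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])                  # nearly singular
    print(np.linalg.cond(A, np.inf))               # kappa(A) ~ 4e4

    b = np.array([2.0, 2.0001])
    x = np.linalg.solve(A, b)                      # x = [1, 1]
    dA = np.array([[0.0, 0.0], [0.0, 1e-6]])       # tiny perturbation of A
    dx = np.linalg.solve(A + dA, b) - x
    # relative error in x is ~1e-2: amplified by roughly kappa(A)
    print(np.linalg.norm(dx, np.inf) / np.linalg.norm(x, np.inf))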


Remarks
1. The result is true for an arbitrary matrix norm derived from the corresponding vector norm
2. Theorem 5.1 says that in case of a small condition number, a small relative error in matrix A can cause only a small error in the solution x
3. If the condition number is big, everything can happen
4. It is not simple to calculate the condition number, but it can still be estimated

Combining the result (9) with Theorem 5.1, we see that the GEM forward error can be estimated as follows:

  ‖δx‖_∞ / ‖x‖_∞ ≤ 2 n ε κ(A) G,

where the coefficient G = ‖L‖_∞ ‖U‖_∞ / ‖A‖_∞ is called the growth factor


Conclusions
• If G is not large, then a well-conditioned matrix gives a fairly good answer (i.e. error O(nε)).
• In case of partial pivoting, the elements of matrix L are ≤ 1 in absolute value.
  – Nevertheless, there exist examples where, with a well-conditioned matrix A, the elements of U are exponentially large in comparison with the elements of A.
  – But these examples are more like “academic” ones – in practice one rarely finds such matrices (and the method can be used without any fear)

6 BLAS (Basic Linear Algebra Subroutines)

6.1 Motivation

How to optimise programs that use a lot of linear algebra operations? Efficiency depends on
  – processor speed
  – number of arithmetic operations
but also on
  – the speed of memory references
• Hierarchical memory structure:

  [Figure: memory hierarchy pyramid – registers, cache, RAM, hard disk, tape/CD/DVD/network storage; fast, small and expensive at the top, slow, large and cheap at the bottom]

• Where are arithmetic operations performed? Useful arithmetic operations happen only at the top of the hierarchy
• What is the direction of data movement before/after the arithmetic operations? Before the operations data needs to be moved up; after the operations the data is moved down the hierarchy
• Is information movement faster at the top or at the bottom? Faster at the top
• What is faster, arithmetic operations or data movement? As a rule: the speed of arithmetic operations is faster than data movement


Consider an arbitrary algorithm. Denote:
• f – flops (# arithmetic operations: +, −, ×, /)
• m – # memory references

Introduce q = f/m. Why is this number important, and should we prefer it to be small or large?
• t_f – time spent on 1 flop
• t_m – time for memory access

Then the calculation time is:

  f·t_f + m·t_m = f·t_f (1 + (t_m/t_f)(1/q))

In general t_m ≫ t_f, and therefore the total time reflects the processor speed only if q is large


Example. Gauss elimination method – for each i the key operations are:

  A(i+1:n, i) = A(i+1:n, i) / A(i,i),                               (14)
  A(i+1:n, i+1:n) = A(i+1:n, i+1:n) − A(i+1:n, i) * A(i, i+1:n).    (15)

• Operation (14) represents the following general operation:

  y = ax + y,  x, y ∈ R^n, a ∈ R                                    (16)

  – Operation (16) is called saxpy ((single-precision) a times x plus y)
• (15) represents:

  A = A − vw^T,  A ∈ R^{n×n}, v, w ∈ R^n                            (17)

  – (17) is a rank-1 update of matrix A (the matrix vw^T has rank 1, because each row of it is a multiple of vector w)
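Both operations are one-liners in NumPy (a small sketch with made-up data):

    import numpy as np

    n = 4
    a, x, y = 2.0, np.arange(n, dtype=float), np.ones(n)
    y = a * x + y                 # saxpy, operation (16): O(n) flops

    A = np.eye(n)
    v, w = np.arange(n, dtype=float), np.ones(n)
    A -= np.outer(v, w)           # rank-1 update, operation (17): O(n^2) flops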


Operation (16) analysis:
• m = 3n + 1 memory references:
  – 2n + 1 reads (vectors x, y and scalar a)
  – n writes (new y)
• Computations take f = 2n flops
• =⇒ q = 2/3 + O(1/n) ≈ 2/3 for large n

Operation (17) analysis:
• m = 2n² + 2n memory references:
  – n² + 2n reads
  – n² writes
• Computations take f = 2n² flops
• =⇒ q = 1 + O(1/n) ≈ 1 for large n

(16) – 1st order operation (O(n) flops); (17) – 2nd order operation (O(n²) flops). Note that the coefficient q is O(1) in both cases


Faster results are obtained with 3rd order operations (O(n³) operations on O(n²) memory references). For example, matrix multiplication:

  C = AB + C,  A, B, C ∈ R^{n×n}.                               (18)

Here m = 4n² and f = n²(2n − 1) + n² = 2n³ (check it!) =⇒ q = n/2 → ∞ as n → ∞. This operation can make the processor work near peak performance, with good algorithm scheduling!
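The effect is easy to observe by timing a 1st order and a 3rd order operation (a rough sketch; the measured rates depend on the machine and on the BLAS library NumPy links against):

    import numpy as np, time

    n = 1000
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    x, y = np.random.rand(n * n), np.random.rand(n * n)

    t0 = time.perf_counter()
    y = 2.0 * x + y                 # BLAS1-like saxpy: q ~ 2/3, memory-bound
    t1 = time.perf_counter()
    C = A @ B                       # BLAS3 matrix multiply: q = n/2, compute-bound
    t2 = time.perf_counter()

    print("saxpy  Mflop/s:", 2 * n**2 / (t1 - t0) / 1e6)
    print("matmul Mflop/s:", 2 * n**3 / (t2 - t1) / 1e6)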

6.2 BLAS implementations

• BLAS – standard library for simple 1st, 2nd and 3rd order operations
  – BLAS is freeware, available for example from netlib (http://www.netlib.org/blas/)
  – Processor vendors often supply their own BLAS implementation
  – ATLAS (http://math-atlas.sourceforge.net/) – a self-optimising BLAS implementation

Examples of using BLAS (fortran90):
• LU factorisation using BLAS3 operations (http://www.ut.ee/~eero/SC/konspekt/Naited/lu1blas3.f90.html)
• main program for testing different BLAS levels (http://www.ut.ee/~eero/SC/konspekt/Naited/testblas3.f90.html)

7 Numerical Solution of Differential Equations

Differential equation – a mathematical equation for an unknown function of one or several variables that relates the values of the function itself and its derivatives of various orders. Example: the velocity of a ball falling through the air, considering only gravity and air resistance.

Order of a differential equation – the highest derivative of the dependent variable with respect to the independent variable

7.1 Ordinary Differential Equations (ODE)

Ordinary Differential Equation (ODE) is a differential equation in which the unknown function (also known as the dependent variable) is a function of a single independent variable


Initial value problem:

  y′(t) = f(t, y(t)),  y(t_0) = y_0,

where f : [t_0, ∞) × R^d → R^d and y_0 ∈ R^d is the initial condition
• (Boundary value problem – the solution is given at more than one point (on the boundaries))

We consider here only first-order ODEs
• A higher-order ODE can be converted into a system of first-order ODEs
  – Example: y″ = −y can be rewritten as two first-order equations: y′ = z and z′ = −y.

7.1.1 Numerical methods for solving ODEs


Euler method (or forward Euler method)
• finite difference approximation

  y′(t) ≈ (y(t + h) − y(t)) / h
  ⇒ y(t + h) ≈ y(t) + h y′(t)
  ⇒ y(t + h) ≈ y(t) + h f(t, y(t))

Start with t_0, t_1 = t_0 + h, t_2 = t_0 + 2h, etc.:

  y_{n+1} = y_n + h f(t_n, y_n).

• Explicit method – the new value (y_{n+1}) depends only on values already known (y_n)
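A minimal sketch of the forward Euler method in Python (the function name and test problem are our own):

    import numpy as np

    def forward_euler(f, y0, t0, T, h):
        # y_{n+1} = y_n + h*f(t_n, y_n)
        ts = np.arange(t0, T + h/2, h)
        ys = np.empty(len(ts))
        ys[0] = y0
        for n in range(len(ts) - 1):
            ys[n + 1] = ys[n] + h * f(ts[n], ys[n])
        return ts, ys

    # y' = -y, y(0) = 1  ->  exact solution y(t) = exp(-t)
    ts, ys = forward_euler(lambda t, y: -y, 1.0, 0.0, 5.0, 0.1)
    print(abs(ys[-1] - np.exp(-5.0)))    # small discretisation error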


Backward Euler method
• A different finite difference version:

  y′(t) ≈ (y(t) − y(t − h)) / h
  ⇒ y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})

• Implicit method – need to solve an equation to find y_{n+1}!


Comparison of the methods
• Implicit methods are computationally more complex
• Explicit methods can be unstable in case of stiff equations

Stiff equation – a differential equation for which certain numerical methods are numerically unstable unless the step size is taken to be extremely small


Example 1. Initial value problem

  y′(t) = −15 y(t),  t ≥ 0,  y(0) = 1

Exact solution: y(t) = e^{−15t} ⇒ y(t) → 0 as t → ∞. Explicit scheme with h = 1/4 (diverges) and h = 1/8 (decays, but with oscillations); the Adams-Moulton scheme (trapezoidal method)

  y_{n+1} = y_n + ½ h ( f(t_n, y_n) + f(t_{n+1}, y_{n+1}) )

remains stable.
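For this example the schemes can be compared in a few lines (since f is linear, the implicit trapezoidal step can be solved in closed form):

    lam, h = -15.0, 0.25
    y_euler, y_trap = 1.0, 1.0
    for n in range(20):
        y_euler = (1 + h * lam) * y_euler                        # forward Euler: factor -2.75
        y_trap = (1 + h * lam / 2) / (1 - h * lam / 2) * y_trap  # trapezoidal: factor ~ -0.30
    print(y_euler, y_trap)   # explicit iterate blows up, trapezoidal decays to 0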


Example 2. Partial differential equation (see below): wave equation in (1D and) 2D

2D wave equation:

  −(∂²u/∂x² + ∂²u/∂y²) + ∂²u/∂t² = f(x, y, t),

where
• u(x, y, t) – height of a surface (e.g. water level) at point (x, y) at time t
• f(x, y, t) – external force applied to the surface at time t (for simplicity here f(x, y, t) = 0)
• solving on the domain (x, y) ∈ Ω = [0, 1] × [0, 1] at time t ∈ [0, T]
• Dirichlet boundary conditions u(x, y, t) = 0 for (x, y) ∈ ∂Ω, and the initial values of the derivative ∂u/∂t|_{t=0} given for (x, y) ∈ Ω


Some examples comparing explicit vs implicit schemes (http://courses.cs.ut.ee/2009/sc/Main/FDMschemes)

1D wave equation failing with a larger h value (http://www.ut.ee/~eero/SC/1DWaveEqExpl-failure.avi)

7.2 Partial Differential Equations (PDE)

PDE overview. Examples of PDEs:
• Laplace’s equation – important in many fields of science:
  ∗ electromagnetism
  ∗ astronomy
  ∗ fluid dynamics
  – describes the behaviour of electric, gravitational, and fluid potentials
  – The general theory of solutions to Laplace’s equation – potential theory
  – In the study of heat conduction, the Laplace equation is the steady-state heat equation


• Maxwell’s equations – relationships of electric and magnetic fields
  – a set of four partial differential equations that describe the properties of the electric and magnetic fields and relate them to their sources, charge density and current density
• Navier-Stokes equations – fluid dynamics (dependencies between pressure, speed of fluid particles and fluid viscosity)
• Equations of linear elasticity – vibrations in elastic materials with given properties under compression and stretching
• Schrödinger equations – quantum mechanics – how the quantum state of a physical system changes in time; as central to quantum mechanics as Newton’s laws are to classical mechanics
• Einstein field equations – a set of ten equations in Einstein’s theory of general relativity, describing the fundamental interaction of gravitation as a result of spacetime being curved by matter and energy.

7.3 2nd order PDEs

We consider now only the single-equation case. In many practical cases 2nd order PDEs occur, for example:
• Heat equation: u_t = u_xx
• Wave equation: u_tt = u_xx
• Laplace’s equation: u_xx + u_yy = 0.

A general second order PDE has the (canonical) form:

  a u_xx + b u_xy + c u_yy + d u_x + e u_y + f u + g = 0.

Assuming not all of a, b and c are zero, then depending on the discriminant b² − 4ac:
  b² − 4ac > 0: hyperbolic equation, typical representative – wave equation;
  b² − 4ac = 0: parabolic equation, typical representative – heat equation;
  b² − 4ac < 0: elliptic equation, typical representative – Poisson equation


• If the coefficients change in time, equations can change their type
• In a system of equations, each equation can be of a different type
• Of course, the problem can be non-linear or of higher order as well

In general,
• Hyperbolic PDEs describe time-dependent conservative physical processes like wave propagation
• Parabolic PDEs describe time-dependent dissipative (or scattering) physical processes like diffusion, which move towards some fixed point
• Elliptic PDEs describe systems that have reached a fixed point and are therefore independent of time

7.4 Time-independent PDE-s

7.4.1 Finite Difference Method (FDM)

• Discrete mesh in the solution region
• Derivatives replaced with finite difference approximations

Example. Consider the Poisson equation in 2D:

  −u_xx − u_yy = f,   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,                     (19)

• boundary values as on the figure on the left:

  [Figure: left – the unit square with the boundary values, u = 1 on the top edge and u = 0 on the other three edges (matching the right-hand side b computed below); right – the discrete mesh nodes]

• Define the discrete nodes as on the figure on the right
• The inner nodes, where the computations are carried out, are defined by

  (x_i, y_j) = (ih, jh),  i, j = 1, ..., n

  – (in our case n = 2 and h = 1/(n + 1) = 1/3)


Consider here the case f = 0. Replacing the 2nd order derivatives with standard 2nd order differences at the mesh points, we get

  (u_{i+1,j} − 2u_{i,j} + u_{i−1,j})/h² + (u_{i,j+1} − 2u_{i,j} + u_{i,j−1})/h² = 0,  i, j = 1, ..., n,

where u_{i,j} is the approximation of the real solution u = u(x_i, y_j) at the point (x_i, y_j), and includes one boundary value if i or j is 0 or n + 1. As a result we get:

  4u_{1,1} − u_{0,1} − u_{2,1} − u_{1,0} − u_{1,2} = 0
  4u_{2,1} − u_{1,1} − u_{3,1} − u_{2,0} − u_{2,2} = 0
  4u_{1,2} − u_{0,2} − u_{2,2} − u_{1,1} − u_{1,3} = 0
  4u_{2,2} − u_{1,2} − u_{3,2} − u_{2,1} − u_{2,3} = 0.

In matrix form:

       |  4 −1 −1  0 | | u_{1,1} |   | u_{0,1} + u_{1,0} |   | 0 |
  Ax = | −1  4  0 −1 | | u_{2,1} | = | u_{3,1} + u_{2,0} | = | 0 | = b.
       | −1  0  4 −1 | | u_{1,2} |   | u_{0,2} + u_{1,3} |   | 1 |
       |  0 −1 −1  4 | | u_{2,2} |   | u_{3,2} + u_{2,3} |   | 1 |

This positive definite system can be solved directly with Cholesky factorisation (Gauss elimination for a symmetric matrix, where the factorisation A = L^T L is found) or iteratively. The exact solution of the problem is:

      | u_{1,1} |   | 0.125 |
  x = | u_{2,1} | = | 0.125 |
      | u_{1,2} |   | 0.375 |
      | u_{2,2} |   | 0.375 |
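The small system can be checked directly in NumPy (for the Cholesky route one could use scipy.linalg.cho_factor/cho_solve):

    import numpy as np

    A = np.array([[ 4., -1., -1.,  0.],
                  [-1.,  4.,  0., -1.],
                  [-1.,  0.,  4., -1.],
                  [ 0., -1., -1.,  4.]])
    b = np.array([0., 0., 1., 1.])
    print(np.linalg.solve(A, b))    # [0.125 0.125 0.375 0.375]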


In the general case the n² × n² Laplace matrix has the form

      |  B −I         0 |
      | −I  B −I        |
  A = |    −I  B  ⋱     | ,                                     (20)
      |        ⋱  ⋱  −I |
      |  0       −I   B |

where the n × n matrix B is of the form

      |  4 −1         0 |
      | −1  4 −1        |
  B = |    −1  4  ⋱     | .
      |        ⋱  ⋱  −1 |
      |  0       −1   4 |

It means that most of the elements of matrix A are zero – such matrices are called sparse. (A sketch of building (20) programmatically follows below.)
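One way to build (20) programmatically is via Kronecker products of the 1D second-difference matrix (a sketch using scipy.sparse):

    import scipy.sparse as sp

    def laplace_2d(n):
        # n^2 x n^2 sparse matrix of form (20)
        T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))  # 1D second difference
        I = sp.identity(n)
        return sp.kron(I, T) + sp.kron(T, I)                 # 4 on diagonal, -1 off, -I blocks

    print(laplace_2d(2).toarray())   # the same 4x4 matrix as in the example above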

7.4.2 Finite Element Method (FEM)

Consider as an example the Poisson equation

  −∆u(x) = f(x), ∀x ∈ Ω,
  u(x) = g(x), ∀x ∈ Γ,

where the Laplacian ∆ is defined by

  (∆u)(x) = (∂²u/∂x² + ∂²u/∂y²)(x),  x = (x, y)^T .

In the Finite Element Method the region is divided into finite elements.

  [Figure: a region divided into finite elements]

Consider the unit square. The problem in variational formulation: find u_h ∈ V_h such that

  a(u_h, v) = (f, v), ∀v ∈ V_h,                                 (21)

where in the case of the Poisson equation

  a(u, v) = ∫_Ω ∇u · ∇v dx

and (u, v) = ∫_Ω u(x)v(x) dx.

The gradient ∇ of a scalar function f(x, y) is defined by:

  ∇f = (∂f/∂x, ∂f/∂y)

• In FEM the equation (21) needs to be satisfied on a set of test functions φ_i = φ_i(x), which are defined such that

  φ_i(x) = 1 for x = x_i,  φ_i(x) = 0 for x = x_j, j ≠ i

• and it is demanded that (21) is satisfied for each φ_i (i = 1, ..., N).
• As a result, a matrix of the linear equations is obtained


• The matrix is identical to the matrix from (20)!
• Benefits of FEM over finite difference schemes:
  – more flexible in choosing the discretization
  – existence of a thorough mathematical framework for proofs of convergence and error estimates

7.5 Sparse matrix storage schemes

• As we saw, different discretisation schemes give systems with similar matrix structures
• (In addition to FDM and FEM, some other discretisation schemes are often used, like the Finite Volume Method, but we do not consider it here)
• In each case, the result is a system of linear equations with a sparse matrix. How to store sparse matrices?

7.5.1 Triple storage format

• For an n × m matrix A, each nonzero is stored with 3 values: integers i and j and (in most applications) the real matrix element a_ij =⇒ three arrays:

  indi(1:nz), indj(1:nz), vals(1:nz)

of length nz – the number of nonzeroes of matrix A.

Advantages of the scheme:
• Easy to refer to a particular element
• Freedom to choose the order of the elements

Disadvantages:
• Nontrivial to find, for example, all nonzeroes of a particular row or column and their positions
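In SciPy the triple format is the COO (coordinate) sparse format; a sketch storing the 4×4 matrix from the FDM example above:

    import numpy as np
    import scipy.sparse as sp

    # elements may be given in any order; here the diagonal comes last
    indi = np.array([0, 0, 1, 1, 2, 2, 3, 3, 0, 1, 2, 3])
    indj = np.array([1, 2, 0, 3, 0, 3, 1, 2, 0, 1, 2, 3])
    vals = np.array([-1., -1., -1., -1., -1., -1., -1., -1., 4., 4., 4., 4.])

    A = sp.coo_matrix((vals, (indi, indj)), shape=(4, 4))   # nz = 12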

7.5.2 Column-major storage format

For each column k of matrix A keep a vector row_ind giving the row numbers i for which a_ik ≠ 0.
• To store the whole matrix, the nonzeros of each column are appended into a one-dimensional array row_ind(1:nz)
  – introduce cptr(1:M) referring to the start of each column in row_ind:

  row_ind(1:nz), cptr(1:M), vals(1:nz)

Advantages:
• Easy to find the nonzeroes of a matrix column together with their positions


Disadvantages:
• Algorithms become more difficult to read
• Difficult to find the nonzeroes in a particular row

7.5.3 Row-major storage format

For each row k of matrix A keep a vector col_ind giving the column numbers j for which a_kj ≠ 0.
• To store the whole matrix, the nonzeros of each row are appended into a one-dimensional array col_ind(1:nz)
  – introduce rptr(1:N) referring to the start of each row in col_ind:

  col_ind(1:nz), rptr(1:N), vals(1:nz)


Advantages:
• Easy to find the nonzeroes of a matrix row together with their positions

Disadvantages:
• Algorithms become more difficult to read
• Difficult to find the nonzeroes in a particular column

7.5.4 Combined schemes

The triple format enhanced with cols(1:nz), cptr(1:M), rows(1:nz), rptr(1:N). Here cols and rows refer to the corresponding matrix A values in the triple format. E.g., to access row-major type structures, one has to index through rows(1:nz).

Advantages:

• All operations are easy to perform

Disadvantages:
• More memory needed
• Reference through indexing in all cases
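For reference, the row-major scheme above corresponds to SciPy's CSR format, where rptr is called indptr; a small sketch of extracting the nonzeroes of one row:

    import numpy as np
    import scipy.sparse as sp

    A = sp.csr_matrix(np.array([[ 4., -1., -1.,  0.],
                                [-1.,  4.,  0., -1.],
                                [-1.,  0.,  4., -1.],
                                [ 0., -1., -1.,  4.]]))
    i = 2
    lo, hi = A.indptr[i], A.indptr[i + 1]     # row i occupies positions lo..hi-1
    print(A.indices[lo:hi], A.data[lo:hi])    # col_ind and vals of row i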

8 Iterative methods

8.1 Problem setup

Iterative methods for solving systems of linear equations with sparse matrices. Consider the system of linear equations

  Ax = b,                                                       (22)

where the N × N matrix A
• is sparse
  – the number of elements for which A_ij ≠ 0 is O(N)
• Typical example: Poisson equation discretisation on an n × n mesh (N = n × n)
  – on average 5 nonzeros per row of A


In case of direct methods, like LU-factorisation:
• memory consumption (together with fill-in): O(N²) = O(n⁴)
• flops: 2/3 · N³ + O(N²) = O(n⁶)

Banded matrix LU-decomposition:
• memory consumption (together with fill-in): O(N · L) = O(n³), where L is the bandwidth
• flops: 2/3 · N · L² + O(N · L) = O(n⁴)

8.2 Jacobi Method

• Iterative method for solving (22)
• With a given initial approximation x^(0), the approximations x^(k), k = 1, 2, 3, ... of the real solution x of (22) are calculated as follows:
  – the i-th component of x^(k+1) is obtained by taking from (22) only the i-th row:

    A_{i,1} x_1 + · · · + A_{i,i} x_i + · · · + A_{i,N} x_N = b_i

  – solving this with respect to x_i, an iterative scheme is obtained:

    x_i^(k+1) = (1/A_{i,i}) ( b_i − ∑_{j≠i} A_{i,j} x_j^(k) )   (23)


The calculations are in essence parallel with respect to i – there is no dependence on the other components x_j^(k+1), j ≠ i. The iteration stopping criterion can be taken, for example, as

  ‖x^(k+1) − x^(k)‖ < ε   or   k + 1 ≥ k_max,

– ε – given error tolerance
– k_max – maximal number of iterations

• memory consumption (no fill-in): N_{A≠0} – the number of nonzeroes of matrix A
• Number of iterations to reduce ‖x^(k) − x‖_2 < ε ‖x^(0) − x‖_2:

  #IT ≥ (2 ln ε^{−1} / π²) (n + 1)² = O(n²)                     (24)


• flops/iteration ≈ 10 · N = O(n²), =⇒

  #IT · flops/iteration = Cn⁴ + O(n³) = O(n⁴).

The coefficient C in front of n⁴ is:

  C ≈ (2 ln ε^{−1} / π²) · 10 ≈ 2 ln ε^{−1}

• Is this good or bad? This is not very good at all... We need some better methods, because
  – for LU-decomposition (banded matrices) we had C = 2/3
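Despite the poor asymptotics, the method is trivial to implement; a minimal NumPy sketch of scheme (23) with the stopping criteria above (the function name is our own):

    import numpy as np

    def jacobi(A, b, x0, eps=1e-8, kmax=10000):
        D = np.diag(A)                       # the elements A_{i,i}
        R = A - np.diagflat(D)               # off-diagonal part of A
        x = x0.copy()
        for k in range(kmax):
            x_new = (b - R @ x) / D          # all components in parallel
            if np.linalg.norm(x_new - x, np.inf) < eps:
                return x_new, k + 1
            x = x_new
        return x, kmax

    A = np.array([[ 4., -1., -1.,  0.], [-1.,  4.,  0., -1.],
                  [-1.,  0.,  4., -1.], [ 0., -1., -1.,  4.]])
    b = np.array([0., 0., 1., 1.])
    print(jacobi(A, b, np.zeros(4)))         # converges to [0.125 0.125 0.375 0.375]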

8.3 Conjugate Gradient Method (CG)


  Calculate r^(0) = b − Ax^(0) with given starting vector x^(0)
  for i = 1, 2, ...
      solve Mz^(i−1) = r^(i−1)        # we assume here that M = I for now
      ρ_{i−1} = (r^(i−1))^T z^(i−1)
      if i == 1
          p^(1) = z^(0)
      else
          β_{i−1} = ρ_{i−1} / ρ_{i−2}
          p^(i) = z^(i−1) + β_{i−1} p^(i−1)
      endif
      q^(i) = A p^(i);  α_i = ρ_{i−1} / ((p^(i))^T q^(i))
      x^(i) = x^(i−1) + α_i p^(i);  r^(i) = r^(i−1) − α_i q^(i)
      check convergence; continue if needed
  end
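A direct transcription of the algorithm into NumPy (with M = I; for real work one would call scipy.sparse.linalg.cg):

    import numpy as np

    def cg(A, b, x0, eps=1e-10, itmax=1000):
        x = x0.copy()
        r = b - A @ x
        rho_prev, p = 0.0, None
        for i in range(itmax):
            z = r                            # solve Mz = r with M = I
            rho = r @ z
            p = z.copy() if i == 0 else z + (rho / rho_prev) * p
            q = A @ p
            alpha = rho / (p @ q)
            x = x + alpha * p
            r = r - alpha * q
            rho_prev = rho
            if np.linalg.norm(r) < eps:      # convergence check
                break
        return x, i + 1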


• memory consumption (no fill-in): N_{A≠0} + O(N) = O(n²), where N_{A≠0} – # nonzeroes of A
• Number of iterations to achieve ‖x^(k) − x‖_2 < ε ‖x^(0) − x‖_2:

  #IT ≈ (ln ε^{−1} / 2) √κ(A) = O(n)

• Flops/iteration ≈ 24 · N = O(n²), =⇒

  #IT · flops/iteration = Cn³ + O(n²) = O(n³),


where C ≈ ½ ln ε^{−1} · √κ₂(A) =⇒ C depends on the condition number of A! This paves the way for the preconditioning technique

8.4 Preconditioning

Idea: Replace Ax = b with the system

  M^{−1} A x = M^{−1} b.                                        (25)

Apply CG to Bx = c, where B = M^{−1}A and c = M^{−1}b. But how to choose M? The preconditioner M = M^T is to be chosen such that
(i) the problem Mz = r is easy to solve
(ii) the matrix B is better conditioned than A, meaning that κ₂(B) < κ₂(A)


Then

  #IT(25) = O(√κ₂(B)) < O(√κ₂(A)) = #IT(22)

but

  flops/iteration(25) = flops/iteration(22) + flops/iteration(i) > flops/iteration(22)

• =⇒ We need to make a compromise!
• (In extreme cases M = I or M = A)
• Preconditioned Conjugate Gradient (PCG) Method – obtained by taking M ≠ I in the previous algorithm

8.5 Preconditioner examples

Diagonal scaling (or Jacobi method): M = diag(A)
(i) flops/iteration: N
(ii) κ₂(B) = κ₂(A) ⇒ Is this good? No large improvement is to be expected


Incomplete LU-factorisation: M = L̃Ũ
• L̃ and Ũ – approximations to the actual factors L and U in the LU-decomposition
  – nonzeroes in L̃_ij and Ũ_ij only where A_ij ≠ 0 (i.e. fill-in is ignored in the LU-factorisation algorithm)
(i) flops/iteration: O(N)
(ii) κ₂(B) < κ₂(A) ⇒ How good is this preconditioner? Some improvement is at least expected! κ₂(B) = O(n²)


Gauss-Seidel method


  do k=1,2,...
    do i=1,...,n
      x_i^(k+1) = (1/A_{i,i}) ( b_i − ∑_{j=1}^{i−1} A_{i,j} x_j^(k+1) − ∑_{j=i+1}^{n} A_{i,j} x_j^(k) )   (26)
    enddo
  enddo

Note that in a real implementation the method is done in place:

  do k=1,2,...
    do i=1,...,n
      x_i = (1/A_{i,i}) ( b_i − ∑_{j≠i} A_{i,j} x_j )             (27)
    enddo
  enddo

Do you see a problem with this preconditioner (with the PCG method)? The preconditioner is not symmetric, which makes CG not converge!


Symmetric Gauss-Seidel method

To get a symmetric preconditioner, another step is added:

  do k=1,2,...
    do i=1,...,n                  # forward sweep
      x_i = (1/A_{i,i}) ( b_i − ∑_{j≠i} A_{i,j} x_j )
    enddo
    do i=n,...,1                  # backward sweep
      x_i = (1/A_{i,i}) ( b_i − ∑_{j≠i} A_{i,j} x_j )
    enddo
  enddo
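A sketch of one symmetric Gauss-Seidel iteration in NumPy (the function name is our own; used as a preconditioner, one applies it to Mz = r with initial guess z = 0):

    import numpy as np

    def sym_gauss_seidel_sweep(A, b, x):
        # one forward sweep i = 1..n followed by one backward sweep i = n..1
        n = len(b)
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x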

Contents

1 Introduction
  1.1 Syllabus
  1.2 Literature
  1.3 Scripting vs programming
    1.3.1 What is a script?
    1.3.2 Characteristics of a script
    1.3.3 Why not stick to Java, C/C++ or Fortran?
  1.4 Scripts yield short code
  1.5 Performance issues
    1.5.1 Scripts can be slow
    1.5.2 Scripts may be fast enough
    1.5.3 When scripting is convenient
    1.5.4 When to use C, C++, Java, Fortran

2 What is Scientific Computing?
  2.1 Introduction to Scientific Computing
  2.2 Specifics of computational problems
  2.3 Mathematical model

3 Approximation in Scientific Computing
  3.1 Sources of approximation error
    3.1.1 Error sources that are under our control
    3.1.2 Errors created during the calculations
    3.1.3 Forward error (error of the result) and backward error (error of the input data)
  3.2 Floating-Point Numbers
  3.3 Normalised floating-point numbers
  3.4 IEEE (Normalised) Arithmetics

4 Python in Scientific Computing
  4.1 Numerical Python (NumPy)
    4.1.1 NumPy: making arrays
    4.1.2 NumPy: making float, int, complex arrays
    4.1.3 Array with a sequence of numbers
    4.1.4 Warning: arange is dangerous
    4.1.5 Array construction from a Python list
    4.1.6 From “anything” to a NumPy array
    4.1.7 Changing array dimensions
    4.1.8 Array initialization from a Python function
    4.1.9 Basic array indexing
    4.1.10 More advanced array indexing
    4.1.11 Slices refer the array data
    4.1.12 Integer arrays as indices
    4.1.13 Loops over arrays
    4.1.14 Array computations
    4.1.15 In-place array arithmetics
    4.1.16 Standard math functions can take array arguments
    4.1.17 Other useful array operations
    4.1.18 Temporary arrays
    4.1.19 More useful array methods and attributes
    4.1.20 Complex number computing
    4.1.21 A root function
    4.1.22 Array type and data type
    4.1.23 Matrix objects
  4.2 NumPy: Vectorisation
    4.2.1 Vectorisation of functions with if tests; problem
    4.2.2 Vectorisation of functions with if tests; solutions
    4.2.3 General vectorization of if-else tests
    4.2.4 Vectorization via slicing
  4.3 NumPy: Random numbers
  4.4 NumPy: Basic linear algebra
    4.4.1 A linear algebra session
  4.5 Python: Plotting modules
  4.6 I/O
    4.6.1 File I/O with arrays; plain ASCII format
    4.6.2 File I/O with arrays; binary pickling
  4.7 SciPy
    4.7.1 Overview

5 Solving Systems of Linear Equations
  5.1 Systems of Linear Equations
  5.2 Classification
    5.2.1 Problem Transformation
  5.3 Triangular linear systems
  5.4 Elementary Elimination Matrices
  5.5 Gauss Elimination and LU Factorisation
  5.6 Number of operations in GEM
  5.7 GEM with row permutations
  5.8 Reliability of the LU-factorisation with partial pivoting

6 BLAS (Basic Linear Algebra Subroutines)
  6.1 Motivation
  6.2 BLAS implementations

7 Numerical Solution of Differential Equations
  7.1 Ordinary Differential Equations (ODE)
    7.1.1 Numerical methods for solving ODEs
  7.2 Partial Differential Equations (PDE)
  7.3 2nd order PDEs
  7.4 Time-independent PDE-s
    7.4.1 Finite Difference Method (FDM)
    7.4.2 Finite Element Method (FEM)
  7.5 Sparse matrix storage schemes
    7.5.1 Triple storage format
    7.5.2 Column-major storage format
    7.5.3 Row-major storage format
    7.5.4 Combined schemes

8 Iterative methods
  8.1 Problem setup
  8.2 Jacobi Method
  8.3 Conjugate Gradient Method (CG)
  8.4 Preconditioning
  8.5 Preconditioner examples