Systems software design THREADS & SYNCHRONIZATION

Author: Derek Hodge

Outline

Threads
  Properties
  Creating & joining
  Usage patterns
  OpenMP library

Synchronization
  Objects
  Monitor pattern

Thread properties

“thread of execution” – running path of code instructions (smallest sequence of instructions that can be managed independently by a scheduler)



Each process has at least 1 thread (main thread, program loop)



Additional threads may be created on demand to perform some specific tasks (worker threads)



Libraries & system may create additional threads which are invisible to the user but make the application multithreaded

Threads vs. processes

Processes are independent; threads exist as subsets of a process

Processes carry more state than threads; threads share the process state, memory and resources

Processes have separate address spaces; threads share an address space

Processes interact through system-provided IPC mechanisms

Context switches between threads in the same process are faster than between processes

Threading – what for?

Responsiveness – perform long, blocking operations in worker threads so that the application remains responsive to user input and doesn’t look frozen

Non-blocking I/O may achieve the same without multiple threads, but it is more error-prone, harder and less natural to use

Performance – on multicore systems, multiple threads produce the result faster by partitioning the work into parts executed in parallel on separate cores

Throughput – a multithreaded application can better utilize the system by performing work in some threads while others are blocked waiting for I/O to complete

Multithreading - dangers



Synchronization – multiple threads may modify the same data concurrently, leading to unexpected behavior 

Race condition – software depends on the sequence or timing of threads to operate correctly. Without proper synchronization of threads, the timing may be nondeterministic (esp. on SMP system)



Deadlock – improper use of synchronization objects may lead to a state when thread A acquired resource a and waits for thread B to release resource b, while thread B acquired resource b and waits for thread A to release resource a

Stability – faulty thread crashes the entire process

Race conditions

[Figure: two panels, "Expected" and "Possible"]



These situations cause bugs that are very difficult to diagnose (Heisenbugs)

The solution is to use mutual exclusion (or other synchronization objects, e.g. a monitor) or atomic operations

Deadlock



Acquire resources in the same order in all threads, or use "safe" patterns, e.g. a monitor
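A minimal sketch of avoiding the A/B deadlock described above, assuming C++11. The `Account` type and the function names are hypothetical and exist only for this illustration. `std::lock` acquires both mutexes using a deadlock-avoidance algorithm, so the two threads may name the resources in opposite order.

```cpp
#include <mutex>
#include <thread>

// Hypothetical resource type used only for this illustration.
struct Account {
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Moves money between two distinct accounts. std::lock takes both mutexes
// without deadlocking, regardless of the order in which callers pass them.
void transfer(Account& from, Account& to, int amount) {
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);  // already locked
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);    // already locked
    from.balance -= amount;
    to.balance += amount;
}

// Runs opposite transfers concurrently and returns the combined balance.
int run_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Locking `from.m` and then `to.m` with two plain `lock()` calls would recreate the deadlock from the slide, because `t1` and `t2` would acquire the mutexes in opposite order.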

Thread lifetime

Apart from the main thread, additional (worker) threads are created explicitly through a system call

A thread may be created in a suspended or running state; some systems also allow suspending running threads (which should be avoided, as it may lead to deadlock)



To create a thread we need a thread function that will constitute thread’s code path



After the thread function exits, thread becomes joinable



Thread may be joined at any time, but the join will wait until thread is joinable



Joining a thread completes its existence and releases its resources



On Windows use _beginthread()/_beginthreadex() APIs to create a thread (or CreateThread() function); WaitForSingleObject() or other wait functions to join it



On POSIX use pthread_create() & pthread_join() APIs (see man 3 pthread_create)



Even better: use the C++11 std::thread class from the <thread> header
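The create/join lifecycle above can be sketched with `std::thread` roughly as follows. The function names `square_into` and `squares` are made up for this example.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Thread function: the code path executed by each worker thread.
void square_into(int value, int& out) {
    out = value * value;
}

// Starts one worker per input, joins them all, and returns the results.
std::vector<int> squares(const std::vector<int>& inputs) {
    std::vector<int> results(inputs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        // std::thread copies its arguments, so a reference needs std::ref.
        workers.emplace_back(square_into, inputs[i], std::ref(results[i]));
    }
    for (std::thread& t : workers) {
        t.join();  // blocks until the thread function has returned
    }
    return results;
}
```

Each worker writes to its own element of `results`, so no synchronization beyond the joins is needed here.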

Thread pools

A general usage pattern where a number of worker threads are created and the tasks to be performed are inserted into a queue



Worker thread obtains a task from the queue, processes it and stores some result



After task is completed thread returns to the pool and awaits other tasks (or receives a new one right away)



Tasks (and their execution time) may be identical or different



The throughput of the system is increased (the total time to complete all tasks is reduced)



The number of threads is related to number of CPU cores (why?)
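A minimal sketch of the pool described above, assuming C++11. The `ThreadPool` class, its interface and the `pooled_sum` helper are illustrative assumptions, not an API from any particular library. Sizing the pool from `std::thread::hardware_concurrency()` is one common choice for CPU-bound tasks.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative fixed-size pool; name and interface are assumptions.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }
    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Sums 1..n by submitting one small task per number.
long pooled_sum(int n) {
    std::atomic<long> total(0);
    {
        unsigned cores = std::thread::hardware_concurrency();  // may be 0
        ThreadPool pool(cores ? cores : 2);
        for (int i = 1; i <= n; ++i)
            pool.submit([&total, i] { total += i; });
    }  // the destructor waits for all queued tasks
    return total.load();
}
```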

Thread safety

A piece of code is thread-safe if it manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time (no race conditions)

Solutions

Avoid shared state

Re-entrancy – writing code so that it doesn’t save state in global/shared variables; all non-local state is accessed through atomic operations



Thread-local storage – each thread has its own copy of variables

Synchronize access to shared state

Mutual exclusion – access to data is serialized using synchronization objects ensuring that only one thread reads/writes data at any time



Atomic operations – use special instructions which cannot be interrupted by other threads



Immutable state – state cannot be changed after it is created

Thread local storage

Some systems and programming languages allow creating variables whose value may differ between threads – each thread receives its own copy of the variable

The most common example is the errno variable (last error value) used by the standard C library – if there were only one global errno, it would easily lead to race conditions



How to use

C++11 – thread_local keyword for use with global/static variables

MSVC – __declspec(thread) declarator for variables

GNU C – __thread declarator

POSIX – pthread_key_create()/pthread_setspecific()/pthread_getspecific()/pthread_key_delete()



Windows – TlsAlloc()/TlsSetValue()/TlsGetValue()/TlsFree()
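A short sketch of the C++11 `thread_local` variant from the list above. The names `call_count`, `bump`, `Views` and `count_in_two_threads` are hypothetical. The worker and the calling thread each increment "the same" variable but see independent values.

```cpp
#include <thread>

// Each thread gets its own copy, initialized to 0 for that thread.
thread_local int call_count = 0;

int bump() { return ++call_count; }

struct Views {
    int worker;  // final value seen by the worker thread
    int caller;  // final value seen by the calling thread
};

Views count_in_two_threads() {
    call_count = 0;  // resets the calling thread's copy only
    bump();          // the caller's copy becomes 1
    int worker = 0;
    std::thread t([&worker] {
        for (int i = 0; i < 5; ++i) worker = bump();  // worker's copy reaches 5
    });
    t.join();
    return Views{worker, call_count};
}
```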

Atomic operations

Processors have uninterruptible instructions: hardware interrupts (which are used to implement thread switches) cannot take effect in the middle of one, so on a uniprocessor system the instruction is guaranteed to complete with a deterministic result

Some processors also have instructions which prevent other processors in the system from simultaneously altering the same memory location

Any primitive operation can be made atomic by enclosing it in a critical section – adding synchronization locks that prevent other threads from performing the operation simultaneously

In C++11, use the std::atomic class template (header <atomic>) to create atomic versions of primitive types

Thread safe code



Atomic through synchronization (mutual exclusion)



Thread-safe but not reentrant – static variables protected by mutual exclusion



Atomic operation
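The code for this slide did not survive in the text, so the following is only a sketch of what the three captions could look like in C++11. All names are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// 1. Atomic through synchronization: a mutex serializes the increment.
std::mutex counter_mutex;
int locked_counter = 0;
void increment_locked() {
    std::lock_guard<std::mutex> lock(counter_mutex);
    ++locked_counter;
}

// 2. Thread-safe but not reentrant: static state protected by a mutex.
// Concurrent calls from different threads are safe, but re-entering the
// function on the same thread while the lock is held (for example from a
// signal handler) would attempt to lock the mutex twice.
int next_id() {
    static std::mutex id_mutex;
    static int last_id = 0;
    std::lock_guard<std::mutex> lock(id_mutex);
    return ++last_id;
}

// 3. Atomic operation: no lock, a single atomic read-modify-write.
std::atomic<int> atomic_counter(0);
void increment_atomic() { atomic_counter.fetch_add(1); }

// Helper for trying the functions out: calls fn per_thread times on each
// of the given number of threads.
template <typename Fn>
void hammer(int threads, int per_thread, Fn fn) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([=] { for (int i = 0; i < per_thread; ++i) fn(); });
    for (std::thread& w : workers) w.join();
}
```

With a plain `int` and no mutex, the same test would contain a data race and could lose increments.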

Singleton pattern and concurrent initialization problem

Singleton is a design pattern which restricts instantiation of a class to one object (AKA There can be only one) – which is often desired



This introduces global state into the program



Global state + multithreading = problems



What happens when the code initializing the singleton instance is executed simultaneously by multiple threads? (AKA concurrent initialization problem – which relates to any global variable)

Concurrent initialization problem

Naïve

Solution
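The slide's own code is not present in the text. One possible reading of "naïve" versus "solution" in C++11 is sketched below, with a hypothetical `Config` singleton. The solution shown relies on the C++11 guarantee that a function-local static is initialized exactly once even under concurrent calls; `std::call_once` is another option.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical singleton class for illustration.
class Config {
public:
    // Naive lazy initialization (NOT thread-safe): two threads may both
    // see a null pointer and both construct an instance.
    //   static Config* instance() {
    //       if (!ptr) ptr = new Config;
    //       return ptr;
    //   }

    // Thread-safe since C++11: the local static is initialized exactly
    // once, and concurrent callers wait until initialization completes.
    static Config& instance() {
        static Config the_instance;
        return the_instance;
    }
    // How many times the constructor has run (for demonstration only).
    static int constructions() { return constructed_.load(); }

    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;
private:
    Config() { ++constructed_; }
    static std::atomic<int> constructed_;
};
std::atomic<int> Config::constructed_(0);

// Requests the instance from many threads at once and reports how many
// objects were actually constructed.
int construct_from_many_threads(int threads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back([] { Config::instance(); });
    for (std::thread& w : workers) w.join();
    return Config::constructions();
}
```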

OpenMP library

Open Multi-Processing – cross-platform, standard API for creating multiprocessing (multithreaded) applications



Targeted at creating high-performance, data-crunching programs



Pros





Portability - no need to know platform-specific multithreading APIs (like pthreads or Windows threads)



Simple API compared to native ones



The same code may be run serially or in parallel depending on the configuration of the OpenMP runtime, with no need to redesign the application

Cons

Simple – not as sophisticated as native APIs



No explicit use of synchronization objects – hard to spot bugs

OpenMP – when to use?

When the problem decomposes into performing the same task on different partitions of data (data parallelism)

Example: apply filtering to multiple channels of an audio buffer in parallel – use #pragma omp parallel for
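A sketch of the audio example above. A simple gain stands in for the filter, and the function name and buffer layout are assumptions. The channels are independent, so the outer loop can be split across threads; when the compiler is run without OpenMP support (e.g. without `-fopenmp` on GCC), the pragma is ignored and the loop runs serially.

```cpp
#include <vector>

// Applies a gain to every sample of every channel. Each iteration of the
// outer loop touches a different channel, so iterations are independent.
void apply_gain(std::vector<std::vector<float>>& channels, float gain) {
    const int n = static_cast<int>(channels.size());
    #pragma omp parallel for
    for (int c = 0; c < n; ++c) {
        for (float& sample : channels[c]) sample *= gain;
    }
}
```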

OpenMP constructs

Work-sharing constructs – create parallel loops and distribute sections of serial code to threads



Data sharing clauses – specifying variables as shared or private between threads, also map/reduce model



Synchronization clauses – creating critical sections, atomic operations & thread barriers



Scheduling clauses – type of task scheduling for parallel loops



Conditional parallelization



Initialization of multiple private variables in parallel sections



Detection of number of processors, timing functions

More on OpenMP…

The OpenMP language extensions are a language in themselves, one that is not trivial to understand and use

There is nothing in OpenMP that you can’t do with a native API, but sometimes OpenMP makes it

Easier & faster to implement

Less readable & harder to understand

OpenMP can run the same code on your CPU and on a GPGPU platform – supercomputing in your home (see also OpenCL…)

It’s best to use OpenMP for embarrassingly parallel problems; leave more complex issues to custom-designed models using portable threading tools like C++11 threads or Boost.Thread

Synchronization primitives

Mutex



Semaphore



Condition variable



Monitor (actually not a primitive)



Barrier



Read/Write Lock



Event (Windows)

Mutex

MUTual EXclusion object



Basic tool for creating critical sections



Only one thread at a time may acquire ownership of a mutex



Other threads trying to acquire ownership will wait (or try waiting) until first thread releases ownership



Operations:

acquire (lock)

release (unlock)

try acquire (try lock) – returns a boolean indicating whether ownership was actually acquired

Mutex

Mutex types

With regard to process boundaries

Process-shared – used for IPC, capable of blocking threads belonging to different processes (POSIX: pthread_mutex_init() with the PTHREAD_PROCESS_SHARED attribute; Windows: CreateMutex())

Intra-process – "cheap" mutex for use with threads within a single process (Windows: InitializeCriticalSection())

With regard to recursion

“plain” – locking it again from the thread that already owns it is an error (depending on the implementation, the call fails or the thread deadlocks)

recursive – allows the owning thread to lock multiple times (and requires unlocking the same number of times); Windows mutex objects and critical sections are of this type

“Scoped lock” idiom (C++)

Calling the lock()/unlock() API directly is unsafe and inconvenient in the presence of exceptions

It’s best to use an additional “scoped lock” class which acquires the lock in its constructor and releases it in its destructor, so even when an exception is thrown, the lock is released, preventing a deadlock
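In C++11 the standard library already provides such a class, `std::lock_guard`. The sketch below uses hypothetical names (`g_mutex`, `g_value`, `update_or_throw`) and shows that the mutex can be taken again after an exception has propagated out of the critical section.

```cpp
#include <mutex>
#include <stdexcept>

std::mutex g_mutex;
int g_value = 0;

// May throw while holding the lock; the guard still unlocks the mutex.
void update_or_throw(int v) {
    std::lock_guard<std::mutex> guard(g_mutex);  // acquired in constructor
    if (v < 0) throw std::invalid_argument("negative value");
    g_value = v;
}  // released in destructor, on normal return and on exception alike

// Triggers an exception inside the critical section, then locks again.
int value_after_exception() {
    try {
        update_or_throw(-1);
    } catch (const std::invalid_argument&) {
        // the guard has already released g_mutex at this point
    }
    update_or_throw(42);  // would block forever if the lock had leaked
    return g_value;
}
```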

Semaphore

A synchronization object which maintains its lock count (and may optionally have maximum lock count)



Increase lock count with signal operation (raise the semaphore)



Decrease lock count with wait operation; waiting on semaphore with zero lock count will block until semaphore is raised



There’s no notion of semaphore owner – any thread can signal or wait on semaphore



A mutex is a special case of a semaphore with a maximum lock count of 1 and the restriction that only the thread which acquired the lock (succeeded in wait) may signal it



Semaphores are not as easy to understand & use correctly as mutexes, so try to avoid them until you know what you’re doing ;)
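To make the wait/signal semantics concrete, here is a minimal counting semaphore built from a mutex and a condition variable, assuming C++11 (C++20 provides `std::counting_semaphore`). The `Semaphore` class and the `peak_concurrency` helper are illustrative names; the helper limits how many threads are inside a section at once.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore without a maximum count (illustrative).
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    // Raises the semaphore; any thread may call this.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    // Blocks while the count is zero, then decrements it.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Admits at most `limit` of `threads` workers into a section at a time
// and returns the highest number observed inside simultaneously.
int peak_concurrency(int threads, int limit) {
    Semaphore slots(limit);
    std::atomic<int> inside(0), peak(0);
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&] {
            slots.wait();
            int now = ++inside;
            int seen = peak.load();
            // Record a new maximum unless another thread stored a larger one.
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            --inside;
            slots.signal();
        });
    }
    for (std::thread& w : workers) w.join();
    return peak.load();
}
```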

Condition variable

Basic building block of a monitor



Threads perform wait operation on condition variable, until one of them is released by signal operation (or all of them are released with broadcast)



A condition variable should be used together with a mutex – the wait operation atomically releases the mutex while the thread is blocked, and a thread that succeeds in waiting automatically re-acquires the lock on the mutex

Condition variable

Condition variable usage pattern
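The original example for this slide is not included in the text. A common form of the pattern, sketched here with C++11 and hypothetical names, is a blocking queue: the consumer waits on the condition variable inside the critical section until the condition "queue is not empty" holds.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Illustrative blocking queue of ints.
class BlockingQueue {
public:
    void push(int v) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(v);
        }
        cv_.notify_one();  // signal: wake one waiting consumer
    }
    int pop() {
        std::unique_lock<std::mutex> lock(m_);
        // wait() releases the mutex while blocked and re-acquires it before
        // returning; the predicate also guards against spurious wakeups.
        cv_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// One producer pushes 1..n, one consumer pops n values and sums them.
int produce_and_consume(int n) {
    BlockingQueue queue;
    int sum = 0;  // written by the consumer only, read after join
    std::thread consumer([&] { for (int i = 0; i < n; ++i) sum += queue.pop(); });
    std::thread producer([&] { for (int i = 1; i <= n; ++i) queue.push(i); });
    producer.join();
    consumer.join();
    return sum;
}
```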

Monitor pattern

The usage of condition variable together with mutex from the previous example is the monitor pattern



It is a basic pattern involving waiting for some condition to occur within a critical section



The synchronization/wait construct built into the Java language (synchronized blocks with wait()/notify())



Monitor pattern is a safe & tested way of avoiding race conditions & deadlocks while waiting for some condition to occur in a multithreaded code

Barrier

A barrier (or rendezvous point) is a place in the code where threads in a group are blocked and cannot proceed until all of them have reached the barrier

A barrier enforces synchronization of threads – useful in high-performance computing & number crunching
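A minimal reusable barrier can be sketched with a mutex and a condition variable, assuming C++11 (C++20 provides `std::barrier`). The `Barrier` class and the two-phase helper below are illustrative: every thread writes its own slot, waits at the barrier, and only then reads all slots.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier for a fixed number of threads (illustrative).
class Barrier {
public:
    explicit Barrier(int parties)
        : parties_(parties), waiting_(0), generation_(0) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned my_generation = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;
            ++generation_;  // the last arrival opens the barrier
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int parties_;
    int waiting_;
    unsigned generation_;
};

// Returns true if every thread saw the writes of all other threads.
bool all_threads_saw_all_writes(int threads) {
    Barrier barrier(threads);
    std::vector<int> slots(threads, 0);
    std::vector<int> sums(threads, 0);
    std::vector<std::thread> workers;
    for (int id = 0; id < threads; ++id) {
        workers.emplace_back([&, id] {
            slots[id] = id + 1;                 // phase 1: write own slot
            barrier.arrive_and_wait();          // wait until all have written
            for (int v : slots) sums[id] += v;  // phase 2: read every slot
        });
    }
    for (std::thread& w : workers) w.join();
    const int expected = threads * (threads + 1) / 2;
    for (int s : sums)
        if (s != expected) return false;
    return true;
}
```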

Read/Write lock

Also called shared mutex

A special kind of mutex which allows many threads to read simultaneously but gives a writer exclusive access

May provide a performance gain in a scenario where write operations are rare but read operations are common

Natively supported on POSIX (pthread_rwlock_t) and, since Windows Vista, on Windows (slim reader/writer locks)

How it works

A writer blocks all other writers and readers



Writer will wait until all readers have finished



Special operation: upgrading rwlock from read mode to write mode



Lots of readers may lead to starvation of writers!
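A sketch of a read-mostly structure guarded by a read/write lock, assuming C++17 for `std::shared_mutex` (C++14 offers `std::shared_timed_mutex`). The `Cache` class is hypothetical. Note that `std::shared_mutex` has no upgrade operation; a reader must release its shared lock before taking the exclusive one.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly cache.
class Cache {
public:
    // Readers take a shared lock, so many lookups can run at once.
    bool lookup(const std::string& key, int& value) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        value = it->second;
        return true;
    }
    // Writers take an exclusive lock, blocking readers and other writers.
    void store(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(m_);
        data_[key] = value;
    }
private:
    mutable std::shared_mutex m_;
    std::map<std::string, int> data_;
};
```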

Windows Events

A special case of binary semaphore which, unlike a mutex, has no notion of ownership

Kinds of events:

Manual reset – the event remains signaled after a thread succeeds in waiting on it, until it is explicitly reset to the non-signaled state

Automatic reset – after a thread succeeds in waiting, the event is automatically and atomically reset to the non-signaled state
