Systems software design THREADS & SYNCHRONIZATION
outline
Threads
Properties
Creating & joining
Usage patterns
OpenMP library
Synchronization
Objects
Monitor pattern
Thread properties
“thread of execution” – running path of code instructions (smallest sequence of instructions that can be managed independently by a scheduler)
Each process has at least 1 thread (main thread, program loop)
Additional threads may be created on demand to perform some specific tasks (worker threads)
Libraries & system may create additional threads which are invisible to the user but make the application multithreaded
Threads vs. processes
Processes are independent, threads exist as subsets of a process
Processes have more state configuration than threads, threads share process state, memory and resources
Processes have separate address space, threads share address space
Processes interact through system-provided IPC mechanisms
Context switches between threads in the same process are faster than between processes
Threading – what for?
Responsiveness – perform long, blocking operations in the worker threads so that the application remains responsive to user input and doesn’t look frozen
Non-blocking I/O may achieve the same without multiple threads but is more error-prone, harder & less natural in use
Performance – on multicore systems, multiple threads produce the result faster by partitioning the work into parts executed in parallel on separate cores
Throughput – a multithreaded application can better utilize the system by performing work in some threads while others are blocked waiting for I/O to complete
Multithreading - dangers
Synchronization – multiple threads may modify the same data concurrently, leading to unexpected behavior
Race condition – software depends on the sequence or timing of threads to operate correctly. Without proper synchronization of threads, the timing may be nondeterministic (especially on SMP systems)
Deadlock – improper use of synchronization objects may lead to a state when thread A acquired resource a and waits for thread B to release resource b, while thread B acquired resource b and waits for thread A to release resource a
Stability – faulty thread crashes the entire process
Race conditions
[Figure: two panels, "Expected" and "Possible"]
These situations cause bugs that are very difficult to diagnose (Heisenbugs)
The solution is to use mutual exclusion (or other synchronization objects, e.g. a monitor) or atomic operations
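A minimal sketch of the mutual-exclusion fix, assuming C++11. The function name count_with_mutex and its parameters are hypothetical, chosen only for illustration. Several threads increment one shared counter; the lock makes each read-modify-write indivisible, so no increment is lost.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add increments_per_thread
// to one shared counter. The mutex serializes every increment.
long count_with_mutex(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    long counter = 0;          // shared state
    std::mutex counter_mutex;  // protects counter

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;  // without the lock this would be a data race
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;  // safe to read: all writers have been joined
}
```

With the lock in place, count_with_mutex(4, 10000) returns 40000 on every run. Removing the lock_guard line would make the result timing-dependent, which is the "Possible" outcome the slide warns about.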
Deadlock
You need to acquire resources in the same order in all threads, or use “safe” patterns, e.g. a monitor
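One way to avoid depending on a hand-maintained lock order is std::lock from C++11, which acquires several mutexes using a deadlock-avoidance algorithm. The sketch below assumes a hypothetical Account type and transfer function; two threads transfer in opposite directions, which would risk deadlock if each simply locked "its own" account first.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource guarded by its own mutex.
struct Account {
    std::mutex m;
    long balance;
    explicit Account(long b) : balance(b) {}
};

void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);
    // Acquire both mutexes together; std::lock never deadlocks,
    // regardless of the argument order used by other threads.
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs opposite-direction transfers concurrently and returns the total.
long run_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

run_transfers() completes and returns 2000, since money is only moved between the two accounts.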
Thread lifetime
Apart from the main thread, additional (worker) threads are created explicitly through a system call
A thread may be created in a suspended or running state; some systems also allow suspending running threads (which should be avoided, as it may lead to deadlock)
To create a thread we need a thread function that will constitute thread’s code path
After the thread function exits, thread becomes joinable
Thread may be joined at any time, but the join will wait until thread is joinable
Thread join completes thread existence and releases resources
On Windows use _beginthread()/_beginthreadex() APIs to create a thread (or CreateThread() function); WaitForSingleObject() or other wait functions to join it
On POSIX use pthread_create() & pthread_join() APIs (see man 3 pthread_create)
Even better: use C++11 std::thread class from header <thread>
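The create/join life cycle can be sketched with std::thread. The names square_into and run_worker are hypothetical; the thread function writes its result through a pointer, and the caller reads it only after join().

```cpp
#include <cassert>
#include <thread>

// Thread function: this is the code path the new thread executes.
void square_into(int input, int* output) { *output = input * input; }

// Hypothetical helper: start one worker, wait for it, return its result.
int run_worker(int input) {
    int result = 0;
    std::thread worker(square_into, input, &result);  // starts running at once
    assert(worker.joinable());
    worker.join();  // blocks until square_into returns, then releases the thread
    return result;  // safe: join() orders the worker's write before this read
}
```

run_worker(7) returns 49. Note that a std::thread object that is still joinable when it is destroyed terminates the program, so every thread must be joined (or detached).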
Thread pools
A general usage pattern where a number of worker threads are created and tasks which need to be performed are inserted into a queue
Worker thread obtains a task from the queue, processes it and stores some result
After task is completed thread returns to the pool and awaits other tasks (or receives a new one right away)
Tasks (and their execution time) may be identical or different
The throughput of the system is increased (the total time to complete all tasks is reduced)
The number of threads is related to number of CPU cores (why?)
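A deliberately small thread pool sketch, assuming C++11. The class ThreadPool, its submit() method and the helper sum_of_squares_with_pool are hypothetical names, and results are collected in an atomic rather than returned through futures to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned num_threads) {
        assert(num_threads > 0);
        for (unsigned i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    // Drains the remaining tasks, then joins all workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        task_available_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        task_available_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                task_available_.wait(
                    lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can fetch tasks
        }
    }
    std::mutex mutex_;
    std::condition_variable task_available_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical usage: sum the squares 1..n using four workers.
long sum_of_squares_with_pool(int n) {
    std::atomic<long> total{0};
    {
        ThreadPool pool(4);
        for (int i = 1; i <= n; ++i)
            pool.submit([&total, i] { total += static_cast<long>(i) * i; });
    }  // pool destructor waits for all submitted tasks
    return total.load();
}
```

sum_of_squares_with_pool(10) returns 385. The pool size is fixed at 4 here for simplicity; std::thread::hardware_concurrency() can serve as a hint for the core count, though it may return 0 when the value is unknown.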
Thread safety
A piece of code is thread-safe if it manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time (i.e. it is free of race conditions)
Solutions
Avoid shared state
Re-entrancy – writing code in such a way, that it doesn’t involve saving state to some global/shared variables. All non-local state is accessed through atomic operations
Thread-local storage – each thread has its own copy of variables
Synchronize access to shared state
Mutual exclusion – access to data is serialized using synchronization objects ensuring that only one thread reads/writes data at any time
Atomic operations – use special instructions which cannot be interrupted by other threads
Immutable state – state cannot be changed after it is created
Thread local storage
Some systems or programming languages allow creating variables whose value may be different in each thread – each thread receives its own copy of the variable
Most common example is errno variable (last error value) used by standard C library – if there was only one global errno, it would easily lead to race conditions
How to use
C++11 – thread_local keyword for use with global/static variables
MSVC - __declspec(thread) declarator for variables
GNU C - __thread declarator
POSIX – pthread_key_create()/pthread_setspecific()/pthread_key_delete()
Windows – TlsAlloc()/TlsSetValue()/TlsGetValue()/TlsFree()
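A short sketch of the C++11 thread_local keyword. The variable last_error is a hypothetical errno-like value, not the real errno; each thread that touches it gets its own independent copy.

```cpp
#include <cassert>
#include <thread>

// Hypothetical errno-like variable: one separate instance per thread.
thread_local int last_error = 0;

// Thread function: sets this thread's copy and reports what it sees.
void set_error_in_worker(int code, int* seen) {
    last_error = code;
    *seen = last_error;
}

// Returns true if every thread kept its own value.
bool thread_local_is_per_thread() {
    last_error = 1;  // the calling thread's copy
    int seen_a = 0, seen_b = 0;
    std::thread a(set_error_in_worker, 2, &seen_a);
    std::thread b(set_error_in_worker, 3, &seen_b);
    a.join();
    b.join();
    // The workers' writes did not affect the caller's copy.
    return last_error == 1 && seen_a == 2 && seen_b == 3;
}
```

thread_local_is_per_thread() returns true: no locking is needed because the threads never share the variable.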
Atomic operations
Processors have instructions that are uninterruptible, i.e. their execution momentarily pauses hardware interrupts (which are used to implement thread switches) – this guarantees that the instruction will complete with a deterministic result on a uniprocessor system
Some processors also have instructions which prevent other processors in the system from simultaneously altering the same memory location
Any primitive operation can be made atomic by enclosing it in a critical section – adding synchronization locks that prevent other threads from performing the operation simultaneously
In C++11 use template class std::atomic to create atomic primitive types
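A minimal sketch of std::atomic used as a lock-free shared counter. The function name count_with_atomic is hypothetical. fetch_add performs the read-modify-write as one indivisible operation, so no mutex is required.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: threads increment a shared atomic counter.
long count_with_atomic(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1);  // atomic read-modify-write
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

count_with_atomic(4, 10000) returns 40000. With a plain long in place of std::atomic<long> the same loop would be a data race.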
Thread safe code
Atomic through synchronization (mutual exclusion)
Thread-safe but not reentrant – static variables protected by mutual exclusion
Atomic operation
Singleton pattern and concurrent initialization problem
Singleton is a design pattern which restricts instantiation of a class to one object (AKA There can be only one) – which is often desired
This introduces global state into the program
Global state + multithreading = problems
What happens when the code initializing the singleton instance is executed simultaneously by multiple threads? (AKA concurrent initialization problem – which relates to any global variable)
Concurrent initialization problem
Naïve
Solution
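The code from the original slides is not available, so the following is only one common C++11 approach, not necessarily the one the slides showed. It assumes a hypothetical Config singleton and relies on the C++11 guarantee that a function-local static is initialized exactly once, even when several threads reach it simultaneously. The naïve variant is shown as a comment.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Naive (racy) variant, for comparison only:
//   static Config* instance() {
//       if (!ptr) ptr = new Config;  // two threads may both see ptr == nullptr
//       return ptr;
//   }

class Config {  // hypothetical singleton
public:
    static Config& instance() {
        static Config the_instance;  // C++11: thread-safe one-time initialization
        return the_instance;
    }
    static int constructions() { return construction_count().load(); }
    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;

private:
    Config() { ++construction_count(); }  // counts how often we are built
    static std::atomic<int>& construction_count() {
        static std::atomic<int> count{0};
        return count;
    }
};

// Hypothetical check: request the instance from several threads at once.
int construct_from_many_threads() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back([] { (void)Config::instance(); });
    for (auto& t : threads) t.join();
    return Config::constructions();
}
```

construct_from_many_threads() returns 1: the constructor runs once no matter how many threads race to the first call. std::call_once with a std::once_flag is an alternative when the object cannot be a local static.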
OpenMP library
Open Multi-Processing – cross-platform, standard API for creating multiprocessing (multithreaded) applications
Targeted at creating high-performance, data-crunching programs
Pros
Portability - no need to know platform-specific multithreading APIs (like pthreads or Windows threads)
Simple API compared to native ones
The same code may be run serial or parallel depending on configuration of OpenMP runtime, no need to redesign the application
Cons
Simple – not as sophisticated as native APIs
No explicit use of synchronization objects – hard to spot bugs
OpenMP – when to use?
When decomposing problem which consists of performing the same task on different partitions of data (data parallelism)
Example: apply filtering to multiple channels of audio buffer in parallel – use #pragma omp parallel for
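A sketch of the audio example, assuming the "filter" is a simple gain and the buffer is a vector of channels; apply_gain and both parameter names are hypothetical. Each loop iteration handles one channel, so iterations are independent and can be distributed across threads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical filter: scale every sample of every channel by gain.
void apply_gain(std::vector<std::vector<float>>& channels, float gain) {
    const int num_channels = static_cast<int>(channels.size());
    // Each channel is processed by whichever thread is given that iteration.
    #pragma omp parallel for
    for (int ch = 0; ch < num_channels; ++ch) {
        for (float& sample : channels[ch]) sample *= gain;
    }
}
```

When the compiler is run without OpenMP support (for GCC and Clang, without -fopenmp) the pragma is ignored and the loop runs serially with the same result, which illustrates the "serial or parallel depending on configuration" point.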
OpenMP constructs
Work-sharing constructs – create parallel loops and distribute sections of serial code to threads
Data sharing clauses – specifying variables as shared or private between threads, also map/reduce model
Synchronization clauses – creating critical sections, atomic operations & thread barriers
Scheduling clauses – type of task scheduling for parallel loops
Conditional parallelization
Initialization of multiple private variables in parallel sections
Detection of number of processors, timing functions
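A small sketch combining a work-sharing construct, a data-sharing (reduction) clause and a scheduling clause. The function name dot is hypothetical. The reduction clause gives each thread a private partial sum and combines the partial sums at the end of the loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: dot product of two equally sized vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    // parallel for : work-sharing construct
    // reduction    : private per-thread sums, added together at the end
    // schedule     : iterations are split into fixed chunks up front
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

dot({1, 2, 3}, {4, 5, 6}) returns 32. Without the reduction clause, all threads would update the shared sum concurrently, which is a race condition.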
More on OpenMP…
The OpenMP language extensions are a language in themselves, which is not trivial to understand and use
There is nothing in OpenMP that you can’t do with native API, but sometimes OpenMP makes it
Easier & faster to implement
Less readable & harder to understand
OpenMP allows running the same code on your CPU and GPGPU platform – supercomputing in your home (see also OpenCL…)
It’s best to use OpenMP with embarrassingly parallel problems; better leave more complex issues to custom-designed models using portable threading tools like C++11 or Boost.Thread
Synchronization primitives
Mutex
Semaphore
Condition variable
Monitor (actually not a primitive)
Barrier
Read/Write Lock
Event (Windows)
Mutex
MUTual EXclusion object
Basic tool for creating critical sections
Only one thread at a time may acquire ownership of a mutex
Other threads trying to acquire ownership will wait (or try waiting) until first thread releases ownership
Operations:
acquire (lock)
release (unlock)
try acquire (try lock) – returns boolean value whether ownership was actually acquired
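The three operations map directly onto std::mutex in C++11. In this sketch (the function name try_lock_while_held is hypothetical) one thread holds the mutex while another attempts a non-blocking acquire.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Returns what try_lock() reported in a second thread while the
// calling thread was holding the mutex.
bool try_lock_while_held() {
    std::mutex m;
    m.lock();  // acquire: the calling thread now owns the mutex

    bool acquired_by_other = true;
    std::thread other([&] {
        acquired_by_other = m.try_lock();   // returns at once, never blocks
        if (acquired_by_other) m.unlock();  // only unlock what we own
    });
    other.join();

    m.unlock();  // release
    return acquired_by_other;
}
```

try_lock_while_held() returns false, because ownership is exclusive. The attempt is made from a second thread on purpose: calling try_lock() on a std::mutex the calling thread already owns is undefined behaviour.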
Mutex types
With regard to process boundaries
Process-shared – used for IPC, capable of blocking threads belonging to different processes (POSIX: pthread_mutex_init() with the PTHREAD_PROCESS_SHARED attribute; Windows: CreateMutex())
Intra-process – “cheap” mutex for use with threads within a single process (Windows: InitializeCriticalSection())
With regard to recurrence
“plain” – locking it again from the thread that already holds it is an error (depending on the implementation, the call fails, deadlocks or has undefined behavior)
recursive – will allow locking multiple times by the same thread (and require unlocking the same number of times) – Windows mutexes and critical sections are of this type
“Scoped lock” idiom (C++)
Using standard lock()/unlock() API is unsafe/uncomfortable in presence of exceptions
It’s best to use additional “scoped lock” class which will acquire the lock in constructor and release it in destructor, so even when exception is thrown, the lock will be released preventing the deadlock
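A sketch of a hand-written scoped lock, to show the mechanism. ScopedLock and TrackedMutex are hypothetical names; TrackedMutex only adds a flag so the effect can be observed. In real code the standard std::lock_guard or std::unique_lock would normally be used instead.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper: a std::mutex plus a flag for demonstration.
struct TrackedMutex {
    std::mutex m;
    bool held = false;
    void lock() { m.lock(); held = true; }
    void unlock() { held = false; m.unlock(); }
};

// Acquires in the constructor, releases in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(TrackedMutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() { mutex_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    TrackedMutex& mutex_;
};

// Returns true if the mutex was released although an exception
// left the critical section early.
bool released_after_exception() {
    TrackedMutex tm;
    try {
        ScopedLock lock(tm);
        assert(tm.held);
        throw std::runtime_error("update failed");
        // no unlock() call is reachable here; the destructor handles it
    } catch (const std::runtime_error&) {
    }
    return !tm.held;
}
```

released_after_exception() returns true: stack unwinding runs the ScopedLock destructor before the catch block is entered.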
Semaphore
A synchronization object which maintains its lock count (and may optionally have maximum lock count)
Increase lock count with signal operation (raise the semaphore)
Decrease lock count with wait operation; waiting on semaphore with zero lock count will block until semaphore is raised
There’s no notion of semaphore owner – any thread can signal or wait on semaphore
Mutex is a special case of semaphore with a maximum lock count of 1 and the restriction that only the thread which acquired the lock (succeeded in wait) may signal it
Semaphores are not as easy to understand & use correctly as mutexes, so try to avoid them until you know what you’re doing ;)
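C++11 has no semaphore type (std::counting_semaphore arrived in C++20), so this sketch builds one from a mutex and a condition variable. The class Semaphore and the helper max_concurrent_holders are hypothetical. The helper measures how many threads were between wait() and signal() at the same time.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) { assert(initial >= 0); }
    // Raise the semaphore: increase the count and wake one waiter.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        nonzero_.notify_one();
    }
    // Block while the count is zero, then decrease it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        nonzero_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable nonzero_;
    int count_;
};

// Hypothetical check: highest number of threads that were ever
// inside the guarded region simultaneously.
int max_concurrent_holders(int num_threads, int permits) {
    Semaphore sem(permits);
    std::atomic<int> inside{0};
    std::atomic<int> max_inside{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            sem.wait();
            int now = ++inside;
            int prev = max_inside.load();
            while (prev < now && !max_inside.compare_exchange_weak(prev, now)) {
            }
            std::this_thread::yield();  // give other threads a chance to enter
            --inside;
            sem.signal();  // any thread may signal; there is no owner
        });
    }
    for (auto& th : threads) th.join();
    return max_inside.load();
}
```

max_concurrent_holders(8, 2) never exceeds 2, whatever the scheduling. With permits set to 1 the semaphore behaves like a mutex without ownership checks.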
Condition variable
Basic building block of a monitor
Threads perform wait operation on condition variable, until one of them is released by signal operation (or all of them are released with broadcast)
Condition variable should be used together with mutex – thread which succeeds in wait operation will automatically acquire lock on mutex
Condition variable usage pattern
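The original code for this slide is not available; the following is a sketch of the usual pattern with std::condition_variable, using hypothetical names. The waiting side holds the mutex, waits with a predicate, and owns the mutex again when wait() returns. The signalling side changes the shared state under the same mutex and then notifies.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a producer thread publishes a value,
// the calling thread waits until it is ready.
int wait_for_value() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the condition, protected by m
    int value = 0;       // the data, protected by m

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            value = 42;
            ready = true;
        }
        cv.notify_one();  // signal: release one waiting thread
    });

    int received = 0;
    {
        std::unique_lock<std::mutex> lock(m);
        // wait() unlocks m while blocked and re-locks it before returning.
        // The predicate is re-checked after every wakeup, which also
        // covers spurious wakeups and a notify that happened earlier.
        cv.wait(lock, [&] { return ready; });
        received = value;  // m is held here
    }
    producer.join();
    return received;
}
```

wait_for_value() returns 42 regardless of which thread runs first, because the decision to block depends on the ready flag rather than on catching the notification.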
Monitor pattern
The usage of condition variable together with mutex from the previous example is the monitor pattern
It is a basic pattern involving waiting for some condition to occur within a critical section
The only synchronization/wait construct built into the Java language itself (synchronized, wait(), notify())
Monitor pattern is a safe & tested way of avoiding race conditions & deadlocks while waiting for some condition to occur in a multithreaded code
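As an illustration, a monitor can be written as a class that keeps its mutex and condition variables private and exposes only operations that wait for their own precondition. BoundedQueue and produce_and_consume are hypothetical names; this is a sketch, not a production queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Monitor: all state is private and every public method runs
// inside the same critical section.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
    std::size_t capacity_;
};

// Hypothetical usage: one producer pushes 1..n, the caller consumes them.
long produce_and_consume(int n) {
    BoundedQueue<int> queue(4);
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) queue.push(i);
    });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += queue.pop();
    producer.join();
    return sum;
}
```

produce_and_consume(100) returns 5050. Callers never touch the mutex directly, which is what makes the pattern hard to misuse.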
Barrier
Barrier (or rendezvous point) is a place in code where threads in a group are blocked and cannot proceed until all of them have reached the barrier
Barrier enforces synchronization of threads – useful in high-performance computing & number crunching
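C++11 has no barrier type (std::barrier arrived in C++20), so this sketch builds a reusable one from a mutex and a condition variable. Barrier, arrive_and_wait and the checking helper are hypothetical names. A generation counter distinguishes one rendezvous from the next.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
public:
    explicit Barrier(int count) : threshold_(count) { assert(count > 0); }
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: open the barrier for this group.
            waiting_ = 0;
            ++generation_;
            all_arrived_.notify_all();
        } else {
            all_arrived_.wait(lock, [this, my_generation] {
                return generation_ != my_generation;
            });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable all_arrived_;
    int threshold_;
    int waiting_ = 0;
    unsigned generation_ = 0;
};

// Hypothetical check: no thread starts phase two before every
// thread has finished phase one.
bool phase_one_done_before_phase_two(int num_threads) {
    Barrier barrier(num_threads);
    std::atomic<int> phase_one_done{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            ++phase_one_done;           // phase one work
            barrier.arrive_and_wait();  // rendezvous point
            if (phase_one_done.load() != num_threads) ok = false;
        });
    }
    for (auto& th : threads) th.join();
    return ok.load();
}
```

phase_one_done_before_phase_two(4) returns true: every thread observes the full count once it has passed the barrier.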
Read/Write lock
Also called shared mutex
A special kind of mutex which allows many threads to read simultaneously but gives a writer exclusive access
May provide performance gain in a scenario where the write operation occurs rarely but read operations are common
Natively supported on POSIX (pthread_rwlock_t); on Windows only since Vista (slim reader/writer locks)
How it works
Writing blocks all other writers and readers
Writer will wait until all readers have finished
Special operation: upgrading rwlock from read mode to write mode
Lots of readers may lead to starvation of writers!
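A sketch using std::shared_mutex, which assumes C++17 (newer than the C++11 facilities mentioned elsewhere in these slides). The Settings class and the helper are hypothetical. Readers take a shared lock and may run concurrently; the writer takes an exclusive lock. Upgrading from read mode to write mode is not shown, as std::shared_mutex does not offer it.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

class Settings {  // hypothetical read-mostly data
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive (write)
        values_[key] = value;
    }
    int get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared (read)
        auto it = values_.find(key);
        return it == values_.end() ? -1 : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> values_;
};

// Hypothetical check: many readers and one occasional writer.
// Returns the number of reads that saw an unexpected value.
int read_while_writing() {
    Settings settings;
    settings.set("volume", 7);
    std::atomic<int> mismatches{0};

    std::vector<std::thread> threads;
    for (int r = 0; r < 4; ++r) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i)
                if (settings.get("volume") != 7) ++mismatches;
        });
    }
    threads.emplace_back([&] {
        for (int i = 0; i < 100; ++i) settings.set("volume", 7);
    });
    for (auto& th : threads) th.join();
    return mismatches.load();
}
```

read_while_writing() returns 0. Whether readers or writers are preferred when both are waiting is left to the implementation, so writer starvation remains possible in principle.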
Windows Events
A special case of binary semaphore, which unlike mutex doesn’t support notion of ownership
Kinds of events:
Manual reset – after thread succeeds in waiting on event, needs to be manually reset to non-signaled state (remains signaled until explicitly reset)
Automatic reset – after thread succeeds in waiting, event is automatically, atomically reset to non-signaled state
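The two reset modes can be illustrated with a portable emulation built from a mutex and a condition variable. This is not the Windows API (CreateEvent(), SetEvent(), ResetEvent()); the Event class below and all of its members are hypothetical and only mimic the semantics described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Event {  // hypothetical emulation of a Windows-style event
public:
    enum class Reset { Manual, Automatic };
    explicit Event(Reset mode) : mode_(mode) {}

    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        // Manual reset releases every waiter; automatic reset releases one.
        if (mode_ == Reset::Manual) changed_.notify_all();
        else changed_.notify_one();
    }
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = false;
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return signaled_; });
        // Automatic reset: clear the state while still holding the mutex,
        // so exactly one waiter consumes each set().
        if (mode_ == Reset::Automatic) signaled_ = false;
    }
    bool is_signaled() {
        std::lock_guard<std::mutex> lock(mutex_);
        return signaled_;
    }

private:
    Reset mode_;
    std::mutex mutex_;
    std::condition_variable changed_;
    bool signaled_ = false;
};

// Hypothetical check of the state left behind after a successful wait.
bool event_reset_semantics() {
    Event manual(Event::Reset::Manual);
    Event automatic(Event::Reset::Automatic);
    manual.set();
    manual.wait();     // returns at once; event stays signaled
    automatic.set();
    automatic.wait();  // returns at once; event is reset by the wait
    return manual.is_signaled() && !automatic.is_signaled();
}
```

event_reset_semantics() returns true: after a successful wait the manual-reset event is still signaled, while the automatic-reset event has returned to the non-signaled state.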