Systems software design THREADS & SYNCHRONIZATION

Author: Derek Hodge


Outline

- Threads
  - Properties
  - Creating & joining
  - Usage patterns
  - OpenMP library
- Synchronization
  - Objects
  - Monitor pattern

Thread properties

- “Thread of execution” – a running path of code instructions (the smallest sequence of instructions that can be managed independently by a scheduler)
- Each process has at least one thread (main thread, program loop)
- Additional threads may be created on demand to perform specific tasks (worker threads)
- Libraries and the system may create additional threads which are invisible to the user but make the application multithreaded

Threads vs. processes

- Processes are independent; threads exist as subsets of a process
- Processes carry more state than threads; threads share the process state, memory and resources
- Processes have separate address spaces; threads share one address space
- Processes interact through system-provided IPC mechanisms
- Context switches between threads of the same process are faster than between processes

Threading – what for?

- Responsiveness – perform long, blocking operations in worker threads so that the application remains responsive to user input and doesn’t look frozen
  - Non-blocking I/O may achieve the same without multiple threads but is more error-prone, harder and less natural to use
- Performance – on multicore systems, multiple threads reach the result faster by partitioning the work into parts executed in parallel on separate cores
- Throughput – a multithreaded application can better utilize the system by performing work in some threads while others are blocked waiting for I/O to complete

Multithreading - dangers

- Synchronization – multiple threads may modify the same data concurrently, leading to unexpected behavior
  - Race condition – the software depends on the sequence or timing of threads to operate correctly. Without proper synchronization of threads, the timing may be nondeterministic (esp. on SMP systems)
  - Deadlock – improper use of synchronization objects may lead to a state where thread A has acquired resource a and waits for thread B to release resource b, while thread B has acquired resource b and waits for thread A to release resource a
- Stability – a faulty thread crashes the entire process

Race conditions

[Figure panels: Expected, Possible]

- These situations cause bugs that are very difficult to diagnose (Heisenbugs)
- The solution is to use mutual exclusion (or other synchronization objects, e.g. a monitor) or atomic operations

Deadlock

- You need to acquire resources in the same order in all threads, or use “safe” patterns, e.g. a monitor
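The ordering rule can be illustrated with a small sketch. The `Account` type and the `transfer()` and `run_transfers()` functions are hypothetical names made up for this example; `std::lock` is the standard C++11 facility that locks several mutexes with a deadlock-avoidance algorithm, which is one way (not the only one) to avoid inconsistent ordering.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared resource guarded by its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

// Moves `amount` between two accounts. std::lock acquires both mutexes
// without deadlocking, so transfer(a, b) and transfer(b, a) may run
// concurrently even though they name the mutexes in opposite order.
void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) return;  // locking the same std::mutex twice is an error
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs opposite transfers in two threads and returns the total balance.
int run_transfers() {
    Account a, b;
    a.balance = 100;
    b.balance = 100;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If `transfer()` instead locked `from.m` and then `to.m` with two separate `lock()` calls, the two threads above could each hold one mutex and wait forever for the other.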

Thread lifetime

- Apart from the main thread, additional (worker) threads are created explicitly through a system call
- A thread may be created in the suspended or running state; some systems also allow suspending running threads (which should be avoided, as it may lead to deadlock)
- To create a thread we need a thread function that constitutes the thread’s code path
- After the thread function exits, the thread becomes joinable
- A thread may be joined at any time, but the join will wait until the thread is joinable
- Thread join completes the thread’s existence and releases its resources
- On Windows use the _beginthread()/_beginthreadex() APIs (or the CreateThread() function) to create a thread, and WaitForSingleObject() or other wait functions to join it
- On POSIX use the pthread_create() and pthread_join() APIs (see man 3 pthread_create)
- Even better: use the C++11 std::thread class from header <thread>
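A minimal sketch of creating and joining a thread with `std::thread`. The function names `sum_range()` and `sum_in_worker()` are illustrative only and do not come from the slides.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Thread function: the code path executed by the worker thread.
void sum_range(const std::vector<int>& data, long long& result) {
    long long s = 0;
    for (int v : data) s += v;
    result = s;
}

// Sums `data` in a worker thread and returns the result to the caller.
long long sum_in_worker(const std::vector<int>& data) {
    long long result = 0;
    // The thread starts running as soon as the std::thread object is built.
    // std::ref/std::cref are needed because arguments are copied by default.
    std::thread worker(sum_range, std::cref(data), std::ref(result));
    // join() blocks until the thread function has returned.
    worker.join();
    return result;  // safe to read: join() synchronizes with thread completion
}
```

A `std::thread` that is still joinable when it is destroyed terminates the program, so every thread has to be either joined or detached.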

Thread pools

- A general usage pattern in which a number of worker threads is created and the tasks to be performed are inserted into a queue
- A worker thread obtains a task from the queue, processes it and stores some result
- After the task is completed, the thread returns to the pool and awaits further tasks (or receives a new one right away)
- Tasks (and their execution times) may be identical or different
- The throughput of the system is increased (the total time to complete all tasks goes down)
- The number of threads is related to the number of CPU cores (why?)
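The pattern can be sketched in C++11 with a task queue guarded by a mutex and a condition variable. This is a simplified illustration, not a production pool: the `ThreadPool` class and the `sum_of_squares()` helper are hypothetical names, tasks return nothing, and exceptions thrown by tasks are not handled.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    // Lets the workers finish all queued tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Inserts a task into the queue and wakes one idle worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    // Worker loop: take a task, run it, return to waiting.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested, queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Squares 1..n on the pool and returns the sum of the results.
long long sum_of_squares(int n, unsigned threads) {
    std::vector<long long> out(n);
    {
        ThreadPool pool(threads);
        for (int i = 0; i < n; ++i)
            pool.submit([&out, i] { out[i] = 1LL * (i + 1) * (i + 1); });
    }  // the destructor waits for all tasks
    long long s = 0;
    for (long long v : out) s += v;
    return s;
}
```

A common starting point for the pool size is `std::thread::hardware_concurrency()`, which reports a hint for the number of hardware threads (it may return 0 when the value is unknown).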

Thread safety

- A piece of code is thread-safe if it manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time (a guarantee of no race conditions)
- Solutions
  - Avoid shared state
    - Re-entrancy – writing code in such a way that it doesn’t save state to global/shared variables. All non-local state is accessed through atomic operations
    - Thread-local storage – each thread has its own copy of variables
  - Synchronize access to shared state
    - Mutual exclusion – access to data is serialized using synchronization objects, ensuring that only one thread reads/writes the data at any time
    - Atomic operations – use special instructions which cannot be interrupted by other threads
    - Immutable state – state cannot be changed after it is created

Thread local storage

- Some systems or programming languages allow creating variables whose value may be different in each thread – each thread receives its own copy of the variable
- The most common example is the errno variable (last error value) used by the standard C library – if there were only one global errno, it would easily lead to race conditions
- How to use
  - C++11 – thread_local keyword for use with global/static variables
  - MSVC – __declspec(thread) declarator for variables
  - GNU C – __thread declarator
  - POSIX – pthread_key_create()/pthread_setspecific()/pthread_getspecific()/pthread_key_delete()
  - Windows – TlsAlloc()/TlsSetValue()/TlsGetValue()/TlsFree()
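A short sketch of the C++11 `thread_local` variant. The names `tls_counter`, `bump()` and `counters_are_independent()` are made up for this illustration.

```cpp
#include <functional>
#include <thread>

thread_local int tls_counter = 0;  // every thread gets its own copy

// Increments the calling thread's copy n times and reports its final value.
void bump(int n, int& out) {
    for (int i = 0; i < n; ++i) ++tls_counter;  // no locking needed
    out = tls_counter;
}

// Returns true if two threads counted independently of each other.
bool counters_are_independent() {
    const int before = tls_counter;  // the calling thread's own copy
    int a = 0, b = 0;
    std::thread t1(bump, 1000, std::ref(a));
    std::thread t2(bump, 2000, std::ref(b));
    t1.join();
    t2.join();
    // Each worker started from 0, and the caller's copy was never touched.
    return a == 1000 && b == 2000 && tls_counter == before;
}
```

With an ordinary global `int` instead of `thread_local`, the two loops would race on one variable and the final values would be unpredictable.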

Atomic operations

- Processors have instructions that are uninterruptible, i.e. a hardware interrupt (the mechanism used to implement thread switches) cannot take effect in the middle of them. This guarantees that the instruction completes with a deterministic result on a uniprocessor system
- Some processors also have instructions which prevent other processors in the system from simultaneously altering the same memory location
- Any primitive operation can be made atomic by enclosing it in a critical section – adding synchronization locks that prevent other threads from performing the operation simultaneously
- In C++11 use the template class std::atomic (header <atomic>) to create atomic primitive types

Thread-safe code

- Atomic through synchronization (mutual exclusion)
- Thread-safe but not reentrant – static variables protected by mutual exclusion
- Atomic operation
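The code that accompanied these captions is not present in the text, so the following is only a sketch of what such variants can look like in C++11. The functions `next_id_locked()`, `next_id_atomic()` and the `hammer()` helper are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Atomic through synchronization: the increment is a critical section.
// Thread-safe but not reentrant, because the state lives in static
// variables protected by mutual exclusion.
int next_id_locked() {
    static std::mutex m;
    static int id = 0;
    std::lock_guard<std::mutex> lock(m);
    return ++id;
}

// Atomic operation: one indivisible read-modify-write, no lock required.
int next_id_atomic() {
    static std::atomic<int> id{0};
    return id.fetch_add(1) + 1;
}

// Calls f() per_thread times from each of `threads` threads, then
// returns the value produced by one further call.
int hammer(int (*f)(), int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([=] { for (int i = 0; i < per_thread; ++i) f(); });
    for (auto& th : pool) th.join();
    return f();
}
```

In both variants no increment is lost: after 4 threads have each made 1000 calls on a fresh counter, the next call yields 4001. A plain `static int` incremented without protection gives no such guarantee.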

Singleton pattern and concurrent initialization problem

- Singleton is a design pattern which restricts instantiation of a class to one object (AKA “there can be only one”) – which is often desired
- This introduces global state into the program
- Global state + multithreading = problems
- What happens when the code initializing the singleton instance is executed simultaneously by multiple threads? (AKA the concurrent initialization problem, which applies to any global variable)

Concurrent initialization problem

[Code slides: Naïve, Solution]
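The original code for the naïve and fixed versions is missing from the text. The sketch below shows one possible pair in C++11, with a hypothetical `Config` class: a naïve lazy initialization, and two standard remedies (a function-local static, which C++11 initializes exactly once even under concurrency, and `std::call_once`). The slides may have shown a different solution.

```cpp
#include <mutex>
#include <thread>
#include <vector>

class Config {
public:
    // Naive: two threads may both see a null pointer and both construct
    // an instance. This is a data race on naive_; shown for contrast only.
    static Config* naive_instance() {
        if (!naive_) naive_ = new Config();
        return naive_;
    }
    // Solution A: a function-local static is initialized exactly once,
    // even if several threads reach the declaration at the same time.
    static Config& instance() {
        static Config cfg;
        return cfg;
    }
    // Solution B: std::call_once runs the initializer exactly once.
    static Config& instance_once() {
        std::call_once(flag_, [] { once_ = new Config(); });
        return *once_;
    }
private:
    Config() {}
    static Config* naive_;
    static Config* once_;
    static std::once_flag flag_;
};

Config* Config::naive_ = nullptr;
Config* Config::once_ = nullptr;
std::once_flag Config::flag_;

// Calls instance() from several threads; true if all got the same object.
bool all_threads_see_one_instance(int threads) {
    std::vector<Config*> seen(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&seen, t] { seen[t] = &Config::instance(); });
    for (auto& th : pool) th.join();
    for (Config* p : seen)
        if (p != seen[0]) return false;
    return true;
}
```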

OpenMP library

- Open Multi-Processing – a cross-platform, standard API for creating multiprocessing (multithreaded) applications
- Targeted at creating high-performance, data-crunching programs
- Pros
  - Portability – no need to know platform-specific multithreading APIs (like pthreads or Windows threads)
  - Simple API compared to native ones
  - The same code may run serially or in parallel depending on the configuration of the OpenMP runtime; no need to redesign the application
- Cons
  - Simple – not as sophisticated as native APIs
  - No explicit use of synchronization objects – bugs are hard to spot

OpenMP – when to use?

- When decomposing a problem which consists of performing the same task on different partitions of data (data parallelism)
  - Example: apply filtering to multiple channels of an audio buffer in parallel – use #pragma omp parallel for
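A sketch of the audio example, assuming the buffer is stored as one `std::vector<float>` per channel and using a simple gain as a stand-in for the filter. The `apply_gain()` function is a hypothetical name. Without OpenMP enabled in the compiler (e.g. `-fopenmp` for GCC), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Multiplies every sample of every channel by `gain` and returns the
// processed buffer. Channels are independent of each other, so the
// outer loop can be divided among threads.
std::vector<std::vector<float>> apply_gain(
        std::vector<std::vector<float>> channels, float gain) {
    const int n = static_cast<int>(channels.size());
    #pragma omp parallel for
    for (int c = 0; c < n; ++c) {
        // Each iteration touches only channels[c]: no shared writes.
        for (float& sample : channels[c]) sample *= gain;
    }
    return channels;
}
```

The loop body needs no locks because no two iterations write to the same data, which is what makes the problem a good fit for `parallel for`.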

OpenMP constructs

- Work-sharing constructs – create parallel loops and distribute sections of serial code to threads
- Data sharing clauses – specifying variables as shared or private between threads, also the map/reduce model
- Synchronization clauses – creating critical sections, atomic operations & thread barriers
- Scheduling clauses – type of task scheduling for parallel loops
- Conditional parallelization
- Initialization of multiple private variables in parallel sections
- Detection of the number of processors, timing functions

More on OpenMP…

- The OpenMP language extensions are a language in themselves, one which is not trivial to understand and use
- There is nothing in OpenMP that you can’t do with the native API, but sometimes OpenMP makes it
  - Easier & faster to implement
  - Less readable & harder to understand
- OpenMP allows running the same code on your CPU and on a GPGPU platform – supercomputing in your home (see also OpenCL…)
- It’s best to use OpenMP for embarrassingly parallel problems; leave more complex issues to custom-designed models using portable threading tools like C++11 or boost.threads

Synchronization primitives

- Mutex
- Semaphore
- Condition variable
- Monitor (actually not a primitive)
- Barrier
- Read/Write Lock
- Event (Windows)

Mutex

- MUTual EXclusion object
- Basic tool for creating critical sections
- Only one thread at a time may acquire ownership of a mutex
- Other threads trying to acquire ownership will wait (or try waiting) until the first thread releases ownership
- Operations:
  - acquire (lock)
  - release (unlock)
  - try acquire (try lock) – returns a boolean value telling whether ownership was actually acquired
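The three operations map directly onto `std::mutex` in C++11. This is a minimal sketch; `counter_mutex`, `shared_counter` and the functions are illustrative names only.

```cpp
#include <mutex>
#include <thread>

std::mutex counter_mutex;
int shared_counter = 0;

// acquire / release: lock() blocks until the mutex is free.
void add_blocking(int n) {
    for (int i = 0; i < n; ++i) {
        counter_mutex.lock();
        ++shared_counter;  // critical section
        counter_mutex.unlock();
    }
}

// try acquire: returns immediately; true means the caller now owns the
// mutex and is responsible for unlocking it.
bool add_if_free() {
    if (!counter_mutex.try_lock()) return false;
    ++shared_counter;
    counter_mutex.unlock();
    return true;
}

// Increments the counter from two threads and returns the final value.
int count_with_two_threads(int n) {
    shared_counter = 0;
    std::thread t1(add_blocking, n);
    std::thread t2(add_blocking, n);
    t1.join();
    t2.join();
    return shared_counter;
}
```

Note that `std::mutex::try_lock()` is allowed to fail spuriously, so a `false` result does not prove that another thread holds the mutex.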

Mutex types

- With regard to process boundaries
  - Process-shared – used for IPC, capable of blocking threads belonging to different processes (POSIX: pthread_mutex_init() with the PTHREAD_PROCESS_SHARED attribute; Windows: CreateMutex())
  - Intra-process – a “cheap” mutex for use with threads within a single process (Windows: InitializeCriticalSection())
- With regard to recursion
  - “plain” – locking it again from the thread that already owns it is an error (depending on the implementation: a deadlock, an error code, or undefined behavior)
  - recursive – allows the owner to lock it multiple times (and requires unlocking the same number of times); Windows mutex objects and critical sections are always recursive

“Scoped lock” idiom (C++)

- Using the standard lock()/unlock() API is unsafe and uncomfortable in the presence of exceptions
- It’s best to use an additional “scoped lock” class which acquires the lock in its constructor and releases it in its destructor, so even when an exception is thrown, the lock is released and deadlock is prevented
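In C++11 the standard library already ships such a class, `std::lock_guard`. The sketch below uses it to show that the lock is released when an exception propagates; `log_mutex`, `log_entries`, `append()` and `lock_released_after_throw()` are hypothetical names.

```cpp
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex log_mutex;
std::vector<int> log_entries;

// Appends a value; throws on negative input while holding the lock.
void append(int value) {
    std::lock_guard<std::mutex> guard(log_mutex);  // lock acquired here
    if (value < 0) throw std::invalid_argument("negative value");
    log_entries.push_back(value);
}  // guard's destructor unlocks on normal return and on exception

// Returns true if the mutex is still usable after append() threw.
bool lock_released_after_throw() {
    try {
        append(-1);
    } catch (const std::invalid_argument&) {
        // Stack unwinding has already destroyed the guard.
    }
    append(7);  // would block forever if the mutex were still held
    return log_entries.back() == 7;
}
```

With manual `lock()`/`unlock()` calls, the `throw` would skip the `unlock()` and leave the mutex locked for good.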

Semaphore

- A synchronization object which maintains a lock count (and may optionally have a maximum lock count)
- Increase the lock count with the signal operation (raise the semaphore)
- Decrease the lock count with the wait operation; waiting on a semaphore with zero lock count blocks until the semaphore is raised
- There’s no notion of a semaphore owner – any thread can signal or wait on a semaphore
- A mutex is a special case of a semaphore with a maximum lock count of 1 and the restriction that only the thread which acquired the lock (succeeded in wait) may signal it
- Semaphores are not as easy to understand & use correctly as mutexes, so try to avoid them until you know what you’re doing ;)
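C++11 has no semaphore type (one was added in C++20 as `std::counting_semaphore`), so the sketch below builds a counting semaphore from a mutex and a condition variable. The `Semaphore` class and the `max_concurrency()` helper are illustrative, not a standard API.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    // signal: raise the count and wake one waiting thread.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    // wait: block while the count is zero, then take one unit.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Sends `threads` workers through a section limited to `limit` at a time
// and returns the highest number of workers observed inside at once.
int max_concurrency(int threads, int limit) {
    Semaphore sem(limit);
    std::atomic<int> inside{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            sem.wait();
            const int now = ++inside;
            int seen = peak.load();
            // Record a new maximum unless another thread recorded a larger one.
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            --inside;
            sem.signal();  // any thread may signal: there is no owner
        });
    for (auto& th : pool) th.join();
    return peak.load();
}
```

With an initial count of 1 the same class behaves like a binary semaphore, but unlike a mutex it still lets a thread other than the waiter call `signal()`.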

Condition variable

- Basic building block of a monitor
- Threads perform a wait operation on the condition variable until one of them is released by a signal operation (or all of them are released by a broadcast)
- A condition variable should be used together with a mutex – the wait operation atomically releases the mutex while the thread is blocked, and a thread which succeeds in the wait automatically re-acquires the lock on the mutex

Condition variable usage pattern
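The original example for this pattern is not present in the text. A typical C++11 form, given here as a sketch under that caveat, is a producer/consumer pair in which the consumer waits inside a critical section for a condition on shared data. All names (`q`, `q_mutex`, `q_ready`, `done`, `producer()`, `consumer()`, `produce_and_consume()`) are hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

std::mutex q_mutex;
std::condition_variable q_ready;
std::queue<int> q;    // shared state, guarded by q_mutex
bool done = false;    // shared state, guarded by q_mutex

// Pushes 1..n into the queue, then announces completion.
void producer(int n) {
    for (int i = 1; i <= n; ++i) {
        {
            std::lock_guard<std::mutex> lock(q_mutex);
            q.push(i);
        }
        q_ready.notify_one();  // signal
    }
    {
        std::lock_guard<std::mutex> lock(q_mutex);
        done = true;
    }
    q_ready.notify_all();      // broadcast
}

// Sums everything the producer pushes; returns when the producer is done.
long long consumer() {
    long long sum = 0;
    std::unique_lock<std::mutex> lock(q_mutex);
    for (;;) {
        // wait() releases the mutex while blocked and re-acquires it
        // before returning. The predicate is re-checked after every
        // wake-up, which guards against spurious wake-ups.
        q_ready.wait(lock, [] { return !q.empty() || done; });
        while (!q.empty()) {
            sum += q.front();
            q.pop();
        }
        if (done) return sum;
    }
}

// Runs one producer and one consumer; returns the consumer's sum.
long long produce_and_consume(int n) {
    done = false;
    long long sum = 0;
    std::thread c([&] { sum = consumer(); });
    std::thread p(producer, n);
    p.join();
    c.join();
    return sum;
}
```

The condition is always tested while holding the mutex; checking `q.empty()` without the lock and then waiting would open a window in which a notification could be missed.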

Monitor pattern

- The usage of a condition variable together with a mutex, as in the previous example, is the monitor pattern
- It is a basic pattern for waiting, within a critical section, for some condition to occur
- The only synchronization/wait construct built into the Java language (synchronized together with wait()/notify())
- The monitor pattern is a safe & tested way of avoiding race conditions & deadlocks while waiting for some condition to occur in multithreaded code

Barrier

- A barrier (or rendezvous point) is a place in code where threads in a group are blocked and cannot proceed until all of them have reached the barrier
- A barrier enforces synchronization of threads – useful in high-performance computing & number crunching
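C++11 offers no barrier type (C++20 added `std::barrier`; POSIX has `pthread_barrier_wait()`), so this sketch builds a reusable one from a mutex and a condition variable. The `Barrier` class and the `all_see_phase_one()` helper are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
public:
    explicit Barrier(int parties)
        : parties_(parties), waiting_(0), generation_(0) {}
    // Blocks until `parties` threads have called arrive_and_wait().
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned gen = generation_;
        if (++waiting_ == parties_) {
            waiting_ = 0;
            ++generation_;  // last arrival opens the barrier for this round
            cv_.notify_all();
        } else {
            // The generation counter makes the barrier reusable and
            // protects against spurious wake-ups.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int parties_;
    int waiting_;
    unsigned generation_;
};

// Phase 1: every thread writes its own slot. Phase 2: every thread reads
// all slots. Returns true if every thread saw all phase-1 writes.
bool all_see_phase_one(int threads) {
    Barrier barrier(threads);
    std::vector<int> slots(threads, 0);
    std::vector<int> ok(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            slots[t] = 1;
            barrier.arrive_and_wait();  // nobody reads until all have written
            int sum = 0;
            for (int v : slots) sum += v;
            ok[t] = (sum == threads);
        });
    for (auto& th : pool) th.join();
    for (int v : ok)
        if (!v) return false;
    return true;
}
```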

Read/Write lock

- Also called a shared mutex
- A special kind of mutex which allows many threads to perform read operations simultaneously but gives writers exclusive access
- May provide a performance gain in scenarios where writes occur rarely but reads are common
- Natively supported on POSIX (pthread_rwlock_*); on Windows only since Vista (slim reader/writer locks)
- How it works
  - A writer blocks all other writers and readers
  - A writer waits until all current readers have finished
  - Special operation: upgrading the rwlock from read mode to write mode
  - Lots of readers may lead to starvation of writers!
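A sketch using `std::shared_mutex`, which requires C++17 (C++14 has the similar `std::shared_timed_mutex`; C++11 has neither). The `Settings` class and the `readers_agree()` helper are made-up names. Note that `std::shared_mutex` provides no upgrade operation from read mode to write mode.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

class Settings {
public:
    // Readers take a shared lock: many threads may hold it at once.
    std::string get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        auto it = values_.find(key);
        return it == values_.end() ? std::string() : it->second;
    }
    // Writers take an exclusive lock: blocks readers and other writers.
    void set(const std::string& key, const std::string& value) {
        std::unique_lock<std::shared_mutex> lock(m_);
        values_[key] = value;
    }
private:
    mutable std::shared_mutex m_;
    std::map<std::string, std::string> values_;
};

// One writer stores a value, then several readers fetch it concurrently.
// Returns true if every reader saw the written value.
bool readers_agree(int readers) {
    Settings s;
    std::thread writer([&] { s.set("mode", "fast"); });
    writer.join();
    std::vector<int> ok(readers, 0);
    std::vector<std::thread> pool;
    for (int r = 0; r < readers; ++r)
        pool.emplace_back([&, r] { ok[r] = (s.get("mode") == "fast"); });
    for (auto& th : pool) th.join();
    for (int v : ok)
        if (!v) return false;
    return true;
}
```

Whether readers or writers are preferred under contention is left to the implementation, so the writer starvation mentioned above remains possible.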

Windows Events

- A special case of a binary semaphore which, unlike a mutex, has no notion of ownership
- Kinds of events:
  - Manual reset – after a thread succeeds in waiting, the event remains signaled until it is explicitly reset to the non-signaled state
  - Automatic reset – after a thread succeeds in waiting, the event is automatically and atomically reset to the non-signaled state