Scheduling Many-Task Workloads on Supercomputers: Dealing with Trailing Tasks


Author: Coleen Freeman


Scheduling Many-Task Workloads on Supercomputers: Dealing with Trailing Tasks
Timothy Armstrong, Mike Wilde, Daniel Katz, Zhao Zhang, Ian Foster
Agnieszka Podsiadło, 23/02/12

Presentation

1

Abstract
● Many-task applications
● Reducing time to solution
● Efficient use of resources

Outline: Introduction, Fixed Node Count, Dynamic Allocation, Tail-chopping, Simulation, Experiment, Results, Discussion, Conclusion

Introduction
Many-task application
● Comprises many independent tasks coupled through explicit I/O dependencies (tasks are single-threaded or support parallelism within one node)
● Focuses on high performance

Introduction
Trailing task problem
● A tail of tasks continues to execute for some time
● Meanwhile, an increasing number of workers remain idle

Problem description
[Figure: Workers and Tasks]

Problem description
Constraints
● Workers are allocated in a way that fits the allocation policies
● Tasks are scheduled on an available worker
● Tasks are not preemptible
Optimization
● Minimize time to solution
● Maximize utilization u = (time spent on tasks) / (total allocated time)
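
The utilization metric can be computed directly from its definition. The Python sketch below is illustrative only: the `Allocation` record and the `utilization` function are hypothetical names, and it assumes utilization is measured in worker-seconds, with each allocation charged for every worker it holds over its whole duration.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """One block of workers held for a span of wall-clock time (hypothetical record)."""
    workers: int
    duration: float  # seconds the allocation was held


def utilization(task_times: list[float], allocations: list[Allocation]) -> float:
    """u = (time spent on tasks) / (total allocated time), both in worker-seconds."""
    busy = sum(task_times)
    allocated = sum(a.workers * a.duration for a in allocations)
    if allocated <= 0:
        raise ValueError("no allocated time")
    return busy / allocated


# 8 tasks of 10 s on 4 workers: two full rounds, no idle time.
print(utilization([10.0] * 8, [Allocation(workers=4, duration=20.0)]))  # 1.0
# 9 tasks: the 9th forces a third round in which 3 of the 4 workers sit idle.
print(utilization([10.0] * 9, [Allocation(workers=4, duration=30.0)]))  # 0.75
```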

Algorithms for fixed worker counts
● Fixed number of workers m
  ● Both goals are equivalent: total task time is fixed and the allocated time is m × (time to solution), so minimizing time to solution maximizes utilization
  ● NP-hard (closely related to bin packing)
● Simple approaches: queues
  ● Random order: time to solution ≤ (2 − 1/m) · OPT
  ● Sorted, longest task first: time to solution ≤ (4/3 − 1/(3m)) · OPT
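
Both queue-based approaches are greedy list scheduling: each task, taken in queue order, goes to the first worker that becomes free. A minimal Python sketch, assuming identical workers and task durations known in advance (needed only for the sorted variant); `list_schedule` and `lpt_schedule` are hypothetical names. The instance below is the classic worst case for an unsorted queue, where one long task arrives last, so list scheduling reaches exactly (2 − 1/m) · OPT while sorting longest-first finds the optimum.

```python
import heapq


def list_schedule(durations: list[float], m: int) -> float:
    """Greedy list scheduling on m identical workers; returns the makespan."""
    free_at = [0.0] * m  # min-heap of the times at which each worker becomes free
    for d in durations:
        t = heapq.heappop(free_at)      # the earliest-free worker takes the next task
        heapq.heappush(free_at, t + d)
    return max(free_at)                 # makespan = time to solution


def lpt_schedule(durations: list[float], m: int) -> float:
    """Sorted variant: longest processing time first."""
    return list_schedule(sorted(durations, reverse=True), m)


m = 4
tasks = [1.0] * (m * (m - 1)) + [float(m)]  # many unit tasks, one long task last
print(list_schedule(tasks, m))  # 7.0 (OPT is 4, and 7/4 = 2 - 1/m)
print(lpt_schedule(tasks, m))   # 4.0 (optimal on this instance)
```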

Algorithms for fixed worker counts
Factors causing a long tail:
● Variance in task duration
● Number of tasks not divisible by the number of workers
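
The second factor appears even when all tasks take the same time: with n tasks of duration d on m workers the run needs ⌈n/m⌉ rounds, and the last round leaves m·⌈n/m⌉ − n workers idle. A small Python sketch of this arithmetic (`equal_task_tail` is a hypothetical helper) shows that adding a single task to a workload that divides evenly adds a whole extra round.

```python
import math


def equal_task_tail(n: int, m: int, d: float) -> tuple[float, float]:
    """n tasks of identical duration d on m workers.

    Returns (time to solution, idle worker-time caused by the partial last round).
    """
    rounds = math.ceil(n / m)
    makespan = rounds * d
    idle = (rounds * m - n) * d  # workers with nothing to run in the final round
    return makespan, idle


print(equal_task_tail(1000, 100, 60.0))  # (600.0, 0.0)
print(equal_task_tail(1001, 100, 60.0))  # (660.0, 5940.0)
```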

Dynamic allocation
● Allocate fewer workers (works for sorted or short tasks)
● Tail-chopping: after the tail is chopped, a smaller allocation is used

Tail-chopping assumptions
● Only one partition of processors is used for the target at a given time
● No time limit: an allocation can be requested for any duration
● Starting and stopping an allocation takes a constant time
● Tasks cannot migrate: to move a task, it must be canceled and restarted

Tail-chopping heuristics
● How many workers to allocate? Set by a minimum task/worker ratio
● When to shrink the number of workers? When a maximum fraction of idle workers is reached
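
The two heuristics reduce to simple rules. A minimal Python sketch, assuming the ratio is counted against the tasks still remaining and that chopping triggers once the idle fraction reaches the threshold; both function names are hypothetical. The usage lines plug in the parameters of the experiment (15,000 tasks, task/worker = 5, chopping when 50% of workers are idle).

```python
import math


def workers_to_allocate(remaining_tasks: int, min_tasks_per_worker: float) -> int:
    """Largest worker count that keeps at least `min_tasks_per_worker`
    remaining tasks per worker (never fewer than one worker)."""
    return max(1, math.floor(remaining_tasks / min_tasks_per_worker))


def should_chop(idle_workers: int, total_workers: int, max_idle_fraction: float) -> bool:
    """Shrink the allocation once the fraction of idle workers reaches the threshold."""
    return idle_workers / total_workers >= max_idle_fraction


print(workers_to_allocate(15_000, 5))  # 3000
print(should_chop(1_499, 3_000, 0.5))  # False
print(should_chop(1_500, 3_000, 0.5))  # True
```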

Tail-chopping hypothesis
● Tail-chopping will not completely solve the utilization problem
● High utilization is hard to achieve if the minimum allocation size is large
● Tail-chopping is more beneficial for skewed distributions with a tail of tasks much longer than average
● Tail-chopping provides greater benefit for unsorted tasks; with sorted tasks, reallocating loses a lot of work already done
  ● No benefit when combined with sorting if max_length / mean_length > tasks / worker

Simulation
● All tasks single-threaded
● 12 different numbers of CPU cores
● Measured first, then used as simulation inputs:
  ● Time from the request until the manager reports all partitions ready
  ● Time between the request to terminate and the allocation finishing
● Control over 3 parameters:
  ● Scheduling order
  ● Task/worker ratio
  ● Fraction of idle workers
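
The three controlled parameters can be illustrated with a toy discrete-event model. This Python sketch is not the simulator used in the work; every name in it is hypothetical, and it rests on stated assumptions: identical workers, known task durations, each allocation sized by the task/worker ratio against the remaining tasks, canceled tasks restarting from scratch, only completed task time counting as useful, and fixed `start_cost` and `stop_cost` values standing in for the measured allocation start-up and shutdown times. Setting `max_idle_fraction=1.0` disables chopping.

```python
import heapq
import math
from collections import deque


def simulate_tail_chopping(durations, tasks_per_worker, max_idle_fraction,
                           start_cost=0.0, stop_cost=0.0, sort_tasks=False):
    """Toy tail-chopping model (hypothetical); returns (time_to_solution, utilization)."""
    order = sorted(durations, reverse=True) if sort_tasks else list(durations)
    pending = deque(order)
    clock = useful = allocated = 0.0
    while pending:
        # Allocation size from the minimum task/worker ratio.
        workers = max(1, math.floor(len(pending) / tasks_per_worker))
        clock += start_cost
        t0 = now = clock
        running = []  # min-heap of (finish_time, duration)
        while len(running) < workers and pending:
            d = pending.popleft()
            heapq.heappush(running, (now + d, d))
        while running:
            now, d = heapq.heappop(running)
            useful += d  # only completed tasks count as useful work
            if pending:
                nd = pending.popleft()
                heapq.heappush(running, (now + nd, nd))
                continue
            idle = workers - len(running)
            smaller = max(1, math.floor(len(running) / tasks_per_worker))
            if running and idle / workers >= max_idle_fraction and smaller < workers:
                # Chop: cancel the tail; canceled tasks rejoin the queue and restart.
                tail = [td for _, td in sorted(running)]
                pending = deque(sorted(tail, reverse=True) if sort_tasks else tail)
                break
        # Charge the allocation for start-up, run time and shutdown.
        allocated += workers * (start_cost + (now - t0) + stop_cost)
        clock = now + stop_cost
    return clock, useful / allocated


tasks = [10.0] + [1.0] * 9                       # one long task, nine short ones
print(simulate_tail_chopping(tasks, 2, 1.0))     # (10.0, 0.38) without chopping
print(simulate_tail_chopping(tasks, 2, 0.5))     # time 13.0, utilization 19/21 (about 0.905)
```

On this small skewed workload, chopping at 50% idle raises utilization while lengthening time to solution, the same direction of trade-off that the simulation results report.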

Simulation results
● Tail-chopping improved utilization for many parameter sets
● Time to solution increased (as expected)

[Figures: simulation results]

Simulation results
● The skewness of the task-duration distribution is crucial for assessing the tail-chopping method
● There is no universally better or worse fraction-idle parameter
  ● 0.8 when aiming for a quick solution
● No further benefit on sorted tasks

Experiment
● With and without tail-chopping
● 15,000 tasks
● Task/worker ratio = 5
● Chopping when 50% of workers are idle

Results
[Figure panels: without tail-chopping; with tail-chopping when 50% of workers are idle]

Discussion
Major problems:
● Time spent waiting for a new allocation
● Canceling tasks when reallocating
● Current heuristics are not sophisticated

Discussion
Time spent waiting for a new allocation
● Allocating smaller partitions alongside the current one
● Ability to downsize the current allocation

Discussion
Canceling tasks when reallocating
● Ability to migrate tasks

Discussion
Current heuristics are not sophisticated
● Heuristics that use more information

Conclusion
● Described the trailing task problem
● Tail-chopping is a promising way to address the problem
● Several directions for further research