estimating System Performance and Reliability


Author: Owen Dalton


Dealing with Non-functional Requirements: Measuring/Estimating System Performance and Reliability
Box Leangsuksun
Computer Science, Center for Entrepreneurship and Information Technology (cenit)
Louisiana Tech University

Introduction
• Non-functional requirements are as important as functional ones, if not more so
• Why?
• The world is impatient
• More cost-effective to address up front than to retrofit
• Efficiency
• Inconvenience
• Life-threatening consequences
• Loss of money and/or opportunities
• Etc.

Why? Goals
• Compare alternatives
• Determine impacts (per feature)
• System tuning
• Quantify relative reliability, availability, and performance
• Debugging
• Set expectations

How to measure or estimate
• Measurements
• Simulations
• Analytical modeling

Measurements
• Actual system construction
• Create a workload per the requirements
• Provides the best results
• Inherently difficult and inflexible
• Almost impossible to use for what-if analysis

Measurements (continued)
• Measure system or subsystem performance with tools
   - gprof
   - top, ps, etc.
   - Benchmark programs (e.g. Linpack, SPECmark, Winmark)
   - PAPI, perfctr, perfmon, PerfSuite
• What about reliability measurement? Logs, traces, outages.

Simulation
• A program that simulates the important characteristics of the targeted system
• Flexible and easy to modify
• Good for what-if analysis
• Difficult to model every small detail
• Popular: cost-effective and flexible
• Suffers from details

Analytical Modeling
• Mathematical description of the system
• Provides quick insight
   - Helps guide detailed simulation or measurement-based studies
• Results are much less believable or accurate
• Example
   - H = cache hit probability, Tm = memory access time, Tc = cache access time
   - T_avg = H × Tc + (1 − H) × Tm
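The cache formula above can be evaluated directly to see how sensitive the average access time is to the hit probability. The following is a minimal sketch; the function name and the timing values (2 ns cache, 100 ns memory) are illustrative assumptions, not figures from the slides.

```python
def average_access_time(hit_prob: float, t_cache: float, t_memory: float) -> float:
    """Return T_avg = H * Tc + (1 - H) * Tm."""
    if not 0.0 <= hit_prob <= 1.0:
        raise ValueError("hit_prob must be a probability in [0, 1]")
    return hit_prob * t_cache + (1.0 - hit_prob) * t_memory


# Hypothetical timings in nanoseconds, chosen only for illustration.
T_CACHE_NS = 2.0
T_MEMORY_NS = 100.0

for h in (0.80, 0.90, 0.95, 0.99):
    t_avg = average_access_time(h, T_CACHE_NS, T_MEMORY_NS)
    print(f"H = {h:.2f}  ->  T_avg = {t_avg:.2f} ns")
```

Even with these made-up numbers the model gives the kind of quick insight the slide describes: when memory is much slower than the cache, small changes in H dominate the result.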

Comparison (Lilja's book)

Factor          Analytical Modeling   Simulation   Measurement
Flexibility     High                  High         Low
Cost            Low                   Medium       High
Believability   Low                   Medium       High
Accuracy        Low                   Medium       High

Performance
• Computation
   - CPU
   - Memory
   - I/O, etc.
• Communication
   - Latency
   - Bandwidth
• Transaction
   - May involve more than the database

Some Criteria
• Throughput: number of completed requests per time unit
• Response time: time from when a request is submitted until the first response is produced (not until the output is complete)
• CPU utilization: keep the CPU as busy as possible
• Turnaround time: time to execute a particular request (finishing time − arrival time)
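The criteria above can be computed from a simple request log. This is a minimal sketch under stated assumptions: the `Request` record, its field names, the sample timestamps, and the observation window are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float         # time the request was submitted (s)
    first_response: float  # time the first response was produced (s)
    finish: float          # time the request completed (s)


def throughput(requests, window_seconds):
    """Completed requests per second over the observation window."""
    return len(requests) / window_seconds


def mean_response_time(requests):
    """Average of (first response time - arrival time)."""
    return sum(r.first_response - r.arrival for r in requests) / len(requests)


def mean_turnaround_time(requests):
    """Average of (finishing time - arrival time)."""
    return sum(r.finish - r.arrival for r in requests) / len(requests)


def cpu_utilization(busy_seconds, window_seconds):
    """Fraction of the window during which the CPU was busy."""
    return busy_seconds / window_seconds


# Hypothetical log covering a 10-second window.
SAMPLE = [
    Request(0.0, 0.5, 2.0),
    Request(1.0, 1.5, 4.0),
    Request(3.0, 4.0, 6.0),
    Request(5.0, 5.5, 9.0),
]
WINDOW_SECONDS = 10.0

print("throughput (req/s):", throughput(SAMPLE, WINDOW_SECONDS))
print("mean response time (s):", mean_response_time(SAMPLE))
print("mean turnaround time (s):", mean_turnaround_time(SAMPLE))
```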

Performance issue discovery phase
[Figure: project timeline across the phases Requirement, Architecture/design, Development/code, and Test, with a "Re-design, code, re-test" interval from 1/19/2004 to 3/19/2004]
• Telecom industry architecture reviews: 1/3 of the issues are related to performance

Performance Measures
• Modeling
• Simulation
• Measurement

Analytical Modeling
• Identify important parameters
• Come up with a model that closely represents a physical system
• Exercise: design a levee that will prevent flooding of the Mississippi River
• What would be the important factors?

Can you estimate the flow rate of the Mississippi River?

Analytical Modeling
• Example for memory
   - H = cache hit probability, Tm = memory access time, Tc = cache access time
   - T_avg = H × Tc + (1 − H) × Tm
• Example of operation/transaction modeling
   - Browsing order Tb + submitting order Ts
   - 90% vs. 10% (volume)
   - Weight 20% vs. 80% order
   - Order = 50 instructions + 10 mem
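One possible reading of the transaction example is a volume-weighted mix of two operations, where each operation's cost is built from instruction and memory-access counts. The sketch below follows that reading and should be taken as an assumption, not as the slide's exact model: "10 mem" is interpreted as 10 memory accesses, the browse operation's counts (20 instructions, 4 memory accesses), the 1 ns instruction time, and the cache parameters are hypothetical, and the 20%/80% weights are not used.

```python
def cache_avg(hit_prob, t_cache, t_memory):
    """T_avg = H * Tc + (1 - H) * Tm, as in the memory example."""
    return hit_prob * t_cache + (1.0 - hit_prob) * t_memory


def operation_time(n_instructions, n_mem_accesses, t_instruction, t_mem_avg):
    """Cost of one operation from its instruction and memory-access counts."""
    return n_instructions * t_instruction + n_mem_accesses * t_mem_avg


def mean_transaction_time(mix):
    """mix: list of (fraction_of_volume, time) pairs whose fractions sum to 1."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return sum(fraction * time for fraction, time in mix)


# Hypothetical hardware parameters (nanoseconds).
T_INSTRUCTION_NS = 1.0
T_MEM_NS = cache_avg(0.9, 2.0, 100.0)

# Order submission: 50 instructions + 10 memory accesses (from the slide).
t_submit = operation_time(50, 10, T_INSTRUCTION_NS, T_MEM_NS)
# Browsing: counts are made up for illustration.
t_browse = operation_time(20, 4, T_INSTRUCTION_NS, T_MEM_NS)

# 90% of the volume is browsing, 10% is submitting.
t_mean = mean_transaction_time([(0.9, t_browse), (0.1, t_submit)])
print(f"Tb = {t_browse:.1f} ns, Ts = {t_submit:.1f} ns, mean = {t_mean:.2f} ns")
```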

Performance Engineering
• Understand requirements and growth
• Should begin at the planning and architecture stage
• Resource needs and budget
• Use quantitative methods to gauge the goals (and eliminate root causes)
• Estimate
• Tracking and management
• Measurement
• Tuning

PE (continued)
• Poor performance creates a negative impression
• Retrofitting is costly
   - Re-architecting
   - Adding more hardware
• Highly tuned code -> costs more in maintenance

Key PE activities
[Figure: cycle of four activities (Track, Predict, Measure, Correct), annotated with requirement, architecture/analysis, and budget]

Key approach*
• Bound performance to an acceptable level (based on the requirements)
• Targets are quantitative requirements that define the acceptance criteria
• Budgets are the performance goals allocated across all of the architecture components that must all be met in order to meet the overall targets
• Estimates are design component goals derived from experience or previous measurement of existing components
* These definitions are excerpted from an AT&T performance engineering course and are used only for educational purposes.

• Estimate -> How well can the system perform?
• Budget -> How well must the system perform?

Performance Engineering Life Cycle
[Figure: life-cycle timeline with milestones M1 to M7. Labels: Requirement, Use-cases, Targets, Architecture, Design, Budget, Development, Estimate, Measure, Spreadsheet, Initial Performance Measurement, Model, Test]

How to start (Target)
• Seek the boundary (requirement)
• Start with a back-of-the-envelope calculation
   - Ballpark figures (e.g. number of transactions, normal and at peak)
   - Don't get hung up on precision early
   - E.g. how much water flows out?
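A back-of-the-envelope target of the kind described above might look like the following sketch. The daily volume and the peak-to-average factor are hypothetical inputs chosen for illustration; in practice they would come from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(transactions_per_day):
    """Average transactions per second over a full day."""
    return transactions_per_day / SECONDS_PER_DAY


def peak_rate(transactions_per_day, peak_to_average):
    """Ballpark peak rate, assuming peak load is a fixed multiple of the average."""
    return average_rate(transactions_per_day) * peak_to_average


# Hypothetical inputs: one million transactions per day, peak at 3x the average.
DAILY_VOLUME = 1_000_000
PEAK_FACTOR = 3.0

print(f"normal: {average_rate(DAILY_VOLUME):.1f} tx/s")
print(f"peak:   {peak_rate(DAILY_VOLUME, PEAK_FACTOR):.1f} tx/s")
```

The point is the order of magnitude, not the decimals: roughly ten transactions per second on average and a few tens at peak is enough to start budgeting.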

Budget
• Target or educated guess
• Iterative process
• Start from subsystems and then work down to modules
• Budgeted resource items for each process, module, and subsystem
   - CPU, memory, disk I/O, network bandwidth

Budget types
• Concurrency: percentage of resource allocation
• Sequential: wall-clock time
• Example of a budgeted response time for a transaction
• T_trans = T_cpu + (1 − C_mem) × T_disk + T_network
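The response-time budget above can be checked against a target with a few lines of code. This is a sketch only: the slides do not define C_mem, so it is assumed here to be the fraction of requests served from memory without touching the disk, and all timing values and the 100 ms target are hypothetical.

```python
def transaction_time(t_cpu, c_mem, t_disk, t_network):
    """T_trans = T_cpu + (1 - C_mem) * T_disk + T_network."""
    if not 0.0 <= c_mem <= 1.0:
        raise ValueError("c_mem must be a fraction in [0, 1]")
    return t_cpu + (1.0 - c_mem) * t_disk + t_network


def within_budget(estimate, target):
    """True when the estimated time does not exceed the target."""
    return estimate <= target


# Hypothetical component budgets in milliseconds.
T_CPU_MS = 20.0
C_MEM = 0.8        # assumed: fraction of requests that avoid the disk
T_DISK_MS = 50.0
T_NETWORK_MS = 30.0
TARGET_MS = 100.0  # assumed overall target

estimate_ms = transaction_time(T_CPU_MS, C_MEM, T_DISK_MS, T_NETWORK_MS)
print(f"estimate = {estimate_ms:.1f} ms, target = {TARGET_MS:.1f} ms,",
      "OK" if within_budget(estimate_ms, TARGET_MS) else "over budget")
```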

Dependability Estimation/Measurement
• Similar to the three techniques above
• Two measures
   - Availability (ratio of uptime to total time)
   - Reliability (MTTF)
• Analytical modeling
   - Non-state-space
   - State-space

Availability
• A measure that represents the ratio of uptime to total time
• High availability: the ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest
• High availability is most often achieved through fault tolerance

Availability Model
[Figure: timeline of a single server S1 alternating between "server up" and "server down & repair", compared with the HA-OSCAR dual-head model, in which service is provided by S1, S1&S2, or S2]

Availability (continued)
• Availability = uptime / total time
• MTTF = Mean Time To Failure
   - Average time to failure, when the system is not repairable
• MTBF = Mean Time Between Failures
   - Average time to failure, when the system is repairable
• MTTR = Mean Time To Repair
• Availability = MTTF / (MTTF + MTTR)
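The two availability expressions above can be tied to measured data such as an outage log. The sketch below assumes a hypothetical log of non-overlapping (down, up) timestamps in hours and uses simple estimators (total uptime and total downtime divided by the number of outages); with those estimators the two formulas agree exactly.

```python
def dependability_from_outages(window_hours, outages):
    """Estimate (MTTF, MTTR, availability) from an outage log.

    outages: list of (down_at, up_at) pairs in hours, non-overlapping and
    inside the observation window [0, window_hours].
    """
    if not outages:
        raise ValueError("need at least one outage to estimate MTTF and MTTR")
    downtime = sum(up_at - down_at for down_at, up_at in outages)
    uptime = window_hours - downtime
    mttf = uptime / len(outages)    # mean time to failure (estimate)
    mttr = downtime / len(outages)  # mean time to repair (estimate)
    availability = uptime / window_hours
    return mttf, mttr, availability


# Hypothetical log: three outages in a 1000-hour window.
WINDOW_HOURS = 1000.0
OUTAGES = [(100.0, 102.0), (400.0, 401.0), (900.0, 905.0)]

mttf, mttr, availability = dependability_from_outages(WINDOW_HOURS, OUTAGES)
print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.2f} h")
print(f"uptime / total      = {availability:.4f}")
print(f"MTTF / (MTTF+MTTR)  = {mttf / (mttf + mttr):.4f}")
```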

Why Dependability measures?
• Comparisons with cost and performance
• A proper focus for product-improvement efforts
• Consideration of safety and risk issues

Dependability Modeling
• Includes reliability modeling and availability modeling
• A designed system can be shown to meet its performance and dependability requirements
• Provides a good mechanism for examining the behavior of a system, from the design stage through implementation and final deployment

Dependability
• Two measures
   - Reliability (MTTF)
   - Availability (ratio of uptime to total time)

Reliability
• Definition: the reliability R(t) of a system at time t is the probability that system failure has not occurred in the interval [0, t). If X is a random variable that represents the time to occurrence of system failure, then R(t) = P(X > t).
• Unreliability = 1 − R(t)

Reliability
• Definition: the MTTF of a system is the expected time until the occurrence of the (first) system failure. If X is a random variable that represents the time to occurrence of system failure, then MTTF = E[X].
• Given the system reliability R(t), the MTTF can be computed as MTTF = ∫_0^∞ R(t) dt
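The integral form of the MTTF can be checked numerically for a simple case. The sketch below assumes an exponentially distributed time to failure, R(t) = exp(−λt), for which the MTTF is known to be 1/λ; the failure rate, the integration horizon, and the function names are illustrative assumptions.

```python
import math


def reliability_exponential(t, failure_rate):
    """R(t) = exp(-lambda * t) for an exponential time to failure."""
    return math.exp(-failure_rate * t)


def mttf_by_integration(reliability, horizon, steps=100_000):
    """Trapezoidal approximation of the integral of R(t) from 0 to horizon."""
    dt = horizon / steps
    total = 0.5 * (reliability(0.0) + reliability(horizon))
    for i in range(1, steps):
        total += reliability(i * dt)
    return total * dt


# Hypothetical failure rate: one failure per 1000 hours on average.
FAILURE_RATE = 0.001
# Integrate out to 20 expected lifetimes; the remaining tail is negligible.
HORIZON_HOURS = 20.0 / FAILURE_RATE

mttf_numeric = mttf_by_integration(
    lambda t: reliability_exponential(t, FAILURE_RATE), HORIZON_HOURS
)
print(f"numeric MTTF = {mttf_numeric:.3f} h, closed form 1/lambda = {1.0 / FAILURE_RATE:.3f} h")
```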


Degree of Availability

System Type              Unavailability (minutes/year)   Availability (%)   Availability Class
Unmanaged                50,000                          90                 1
Managed                  5,000                           99                 2
Well-managed             500                             99.9               3
Fault-tolerant           50                              99.99              4
High Availability        5                               99.999             5
Very High Availability   0.5                             99.9999            6
Ultra Availability       0.05                            99.99999           7
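The relationship between the columns of the table above can be reproduced with a short calculation. This sketch assumes a 365-day year and treats the availability class as the number of leading nines; the function names are illustrative. The table's unavailability figures are rounded to the order of magnitude (for example, 90% availability is exactly 52,560 minutes per year).

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent):
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


def availability_class(availability_percent):
    """Number of nines; valid for availability strictly below 100%."""
    return round(-math.log10(1.0 - availability_percent / 100.0))


for percent in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    print(f"{percent:>9}%  class {availability_class(percent)}  "
          f"{downtime_minutes_per_year(percent):>10.2f} min/year")
```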

Availability
• Definition: the availability A(t) of a system at time t is the probability that the system is functioning correctly at time t.
• Like the reliability measure, in some applications it is better to compute the system unavailability U(t) = 1 − A(t).
• Availability = MTTF / (MTTF + MTTR)
• A_steady = lim A(t) as t -> ∞
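The difference between A(t) and its steady-state limit can be illustrated with the simplest repairable model: a single component that alternates between up and down. The sketch below assumes exponentially distributed failure and repair times (a two-state Markov model) and a system that starts in the up state, in which case A(t) = μ/(λ+μ) + λ/(λ+μ) · exp(−(λ+μ)t) with λ = 1/MTTF and μ = 1/MTTR. The MTTF and MTTR values are hypothetical.

```python
import math


def instantaneous_availability(t, mttf, mttr):
    """A(t) for a two-state (up/down) Markov model that starts in the up state."""
    failure_rate = 1.0 / mttf  # lambda
    repair_rate = 1.0 / mttr   # mu
    total_rate = failure_rate + repair_rate
    steady = repair_rate / total_rate
    transient = (failure_rate / total_rate) * math.exp(-total_rate * t)
    return steady + transient


def steady_state_availability(mttf, mttr):
    """Limit of A(t) as t goes to infinity: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


# Hypothetical component: fails every 1000 h on average, takes 2 h to repair.
MTTF_HOURS = 1000.0
MTTR_HOURS = 2.0

for t in (0.0, 1.0, 5.0, 50.0):
    print(f"A({t:>4}) = {instantaneous_availability(t, MTTF_HOURS, MTTR_HOURS):.6f}")
print(f"A_steady = {steady_state_availability(MTTF_HOURS, MTTR_HOURS):.6f}")
```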

Modeling Techniques
• Non-state-space
   - Fault tree
   - Reliability block diagram
• State-space
   - Continuous-time Markov chain
   - Stochastic Petri net
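As an illustration of the non-state-space techniques listed above, a reliability block diagram combines component availabilities with two rules: blocks in series must all work, and blocks in parallel need at least one to work. The sketch below assumes independent components; the cluster layout and the component availabilities are hypothetical and are not taken from the HA-OSCAR model.

```python
import math


def series(availabilities):
    """All blocks must be up: product of the availabilities."""
    return math.prod(availabilities)


def parallel(availabilities):
    """At least one block must be up: 1 - product of the unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Hypothetical component availabilities.
HEAD_NODE = 0.99
SWITCH = 0.999

single_head = series([HEAD_NODE, SWITCH])
dual_head = series([parallel([HEAD_NODE, HEAD_NODE]), SWITCH])

print(f"single head node + switch: {single_head:.6f}")
print(f"dual head nodes + switch:  {dual_head:.6f}")
```

A fault tree expresses the same structure from the failure side: a series arrangement corresponds to an OR gate over component failures, and a parallel arrangement to an AND gate.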

Example of system
[Figure]

Fault Tree
[Figure]


HA-OSCAR SRN model
• Server sub-model
• Switches
• Compute nodes

Server sub-model
• P server up
• P server down
• Failover
• P server repair
• Failback
• S is up and ready
• S takes control
• S server down
• S repair

Switch sub-model
[Figure]

Compute node sub-model
[Figure]

Instantaneous Availability
Steady-state A = 99.993% (36 min of downtime per year) vs. Beowulf A = 99.65% (30 hr of downtime per year)

Stochastic Petri Net Package
• R&D from Duke University
• Very popular
• Petri-net-based dependability analysis

Exercises
• See the class handouts