Dealing with Non-functional Requirements: Measuring/Estimating System Performance and Reliability
Box Leangsuksun
Computer Science
Center for Entrepreneurship and Information Technology
Louisiana Tech University
cenit
Introduction
• Non-functional requirements are equally, if not more, important
• Why?
• The world is impatient
• More cost-effective to address upfront than to retrofit
• Efficiency
• Inconvenience
• Life-threatening consequences
• Loss of money and/or opportunities
• Etc.
Why? Goals
• Compare alternatives
• Determine impacts (per feature)
• System tuning
• Quantify relative reliability/availability/performance
• Debugging
• Set expectations
How to measure or estimate
• Measurements
• Simulations
• Analytical modeling
Measurements
• Construct the actual system
• Create a workload per the requirements
• Provides the best results
• Inherently difficult and inflexible
• Almost impossible to use for what-if analysis
Measurements (continued)
• Measure system or subsystem performance with tools
  • gprof
  • top/ps etc.
  • Benchmark programs (e.g. Linpack, SPECmark, WinMark)
  • PAPI, perfctr, perfmon, PerfSuite
• What about reliability measurement? Logs, traces, outages.
Simulation
• A program that simulates the important characteristics of the targeted system
• Flexible and easy to modify
• Good for what-if analysis
• Difficult to model every small detail
• Popular: cost-effective and flexible
• Suffers from details that are hard to model
Analytical Modeling
• Mathematical description of the system
• Provides quick insight
  • Helps guide detailed simulation or measurement-based studies
• Results are much less believable or accurate
• Example
  • H = cache hit probability, Tm = memory access time, Tc = cache access time
  • T_avg = H × Tc + (1 − H) × Tm
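
The cache example above can be tried out in a few lines of Python. This is only a sketch: the function name and the timing values (2 ns for the cache, 100 ns for memory) are assumptions chosen for illustration, not measurements.

```python
def average_access_time(hit_prob: float, t_cache: float, t_memory: float) -> float:
    """T_avg = H * Tc + (1 - H) * Tm, with all times in the same unit."""
    if not 0.0 <= hit_prob <= 1.0:
        raise ValueError("hit probability must be between 0 and 1")
    return hit_prob * t_cache + (1.0 - hit_prob) * t_memory


# Hypothetical timings in nanoseconds (assumed for illustration only).
T_CACHE_NS = 2.0
T_MEMORY_NS = 100.0

# What-if analysis: how does the hit probability change the average?
for h in (0.80, 0.90, 0.95, 0.99):
    t_avg = average_access_time(h, T_CACHE_NS, T_MEMORY_NS)
    print(f"H = {h:.2f} -> T_avg = {t_avg:.2f} ns")
```

Even this small model gives the quick insight the slide mentions: with these assumed numbers the average is dominated by the miss term until H gets very close to 1.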
Comparison (Lilja's book)
Factor          Analytical Modeling   Simulation   Measurement
Flexibility     High                  High         Low
Cost            Low                   Medium       High
Believability   Low                   Medium       High
Accuracy        Low                   Medium       High
Performance
• Computation
  • CPU
  • Memory
  • I/O etc.
• Communication
  • Latency
  • Bandwidth
• Transaction
  • May involve more than the DB
Some Criteria
• Throughput: number of completed requests per time unit
• Response time: time from when a request is submitted until the first response is produced (not the complete output)
• CPU utilization: keep the CPU as busy as possible
• Turnaround time: time to execute a particular request (finishing time − arrival time)
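
The four criteria above can be computed from a simple request log. The sketch below assumes a single CPU and a hypothetical log format (the `Request` fields and the sample values are made up for illustration); a real system would collect these timestamps from its own logs or traces.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    arrival: float         # time the request was submitted (s)
    first_response: float  # time the first response was produced (s)
    finish: float          # time the request completed (s)
    cpu_time: float        # CPU time consumed by the request (s)


def summarize(requests):
    """Compute the four criteria over the observation window."""
    start = min(r.arrival for r in requests)
    end = max(r.finish for r in requests)
    window = end - start
    return {
        "throughput": len(requests) / window,  # requests per second
        "response_time": mean(r.first_response - r.arrival for r in requests),
        "turnaround_time": mean(r.finish - r.arrival for r in requests),
        "cpu_utilization": sum(r.cpu_time for r in requests) / window,
    }


# Hypothetical log of four requests (values assumed for illustration).
log = [
    Request(0.0, 0.5, 2.0, 1.0),
    Request(1.0, 1.5, 4.0, 1.5),
    Request(2.0, 3.0, 5.0, 1.0),
    Request(3.0, 3.5, 8.0, 2.5),
]
for name, value in summarize(log).items():
    print(f"{name}: {value:.3f}")
```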
Performance issue discovery phase
[Figure: timeline of phases Requirement, Architecture/design, Development/code, Test, followed by Re-design, code, re-test (1/19/2004 - 3/19/2004)]
• Telecom industry architecture reviews: 1/3 of the issues are related to performance
Performance Measures
• Modeling
• Simulation
• Measurement
Analytical Modeling
• Identify important parameters
• Come up with a model that closely represents the physical system
• Exercise
• Design a levee that will prevent flooding of the Mississippi River
  • What would be the important factors?
Can you estimate the flow rate of the Mississippi River?
Analytical Modeling
• Example for memory
  • H = cache hit probability, Tm = memory access time, Tc = cache access time
  • T_avg = H × Tc + (1 − H) × Tm
• Example of operation/transaction modeling
  • Browsing order Tb + submitting order Ts
  • 90% vs. 10% (volume)
  • Weight 20% vs. 80% order
  • Order = 50 instructions + 10 mem
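
One possible reading of the transaction example is a volume-weighted average: 90% of operations are browsing (time Tb) and 10% are order submissions (time Ts). The sketch below follows that reading only; the function name and the values of Tb and Ts are hypothetical, since the slide gives no timings.

```python
def mix_average_time(times, shares):
    """Average time per operation for a workload mix.

    times  -- time per operation type
    shares -- fraction of the volume for each type (must sum to 1)
    """
    if len(times) != len(shares):
        raise ValueError("times and shares must have the same length")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(t * s for t, s in zip(times, shares))


# Hypothetical per-operation times in seconds (not from the slides).
T_BROWSE_S = 0.2
T_SUBMIT_S = 1.5

avg = mix_average_time([T_BROWSE_S, T_SUBMIT_S], [0.90, 0.10])
print(f"average time per operation: {avg:.3f} s")
```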
Performance Engineering
• Understand requirements and growth
• Should begin at the planning and architecture stage
• Resource needs and budget
• Use quantitative methods to gauge the goals (and eliminate root causes)
• Estimate
• Tracking and management
• Measurement
• Tuning
PE (continued)
• Poor performance reflects negatively on the system
• Retrofitting is costly
  • Re-architecting
  • Adding more hardware
• Highly tuned code costs more to maintain
Key PE activities
[Figure: cycle of activities Track, Predict, Measure, Correct, annotated with requirement, architecture/analysis, budget]
Key approach*
• Bound performance to an acceptable level (based on the requirements)
• Targets are quantitative requirements that define the acceptance criteria
• Budgets are the performance goals allocated across all of the architecture components that must all be met in order to meet the overall targets
• Estimates are design component goals derived from experience or previous measurement of existing components
* These definitions are excerpted from an AT&T performance engineering course and are used for educational purposes only.
• Estimate -> How well can the system perform?
• Budget -> How well must the system perform?
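
The relationship between target, budgets, and estimates can be checked mechanically. The sketch below is one way to do it under stated assumptions: the component names, the 500 ms target, and every number are hypothetical, and the check simply flags components whose estimate ("how well can it perform") exceeds its budget ("how well must it perform").

```python
def check_budgets(target, budgets, estimates):
    """Return the components whose estimate exceeds their budget.

    Raises ValueError if the budgets themselves do not fit the target.
    """
    if sum(budgets.values()) > target:
        raise ValueError("budgets add up to more than the target")
    return [name for name, budget in budgets.items() if estimates[name] > budget]


# Hypothetical response-time target and allocations, in milliseconds.
TARGET_MS = 500
budgets = {"web": 100, "app": 200, "database": 150, "network": 50}
estimates = {"web": 80, "app": 230, "database": 120, "network": 40}

over_budget = check_budgets(TARGET_MS, budgets, estimates)
print("over budget:", over_budget)
print("total estimate:", sum(estimates.values()), "ms of", TARGET_MS, "ms")
```

In this made-up case the total estimate still fits the target, but one component misses its own budget, which is the kind of early warning the budget is meant to give.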
Performance Engineering Life Cycle
[Figure: life-cycle diagram with the labels Requirement/Use-cases, Targets, Architecture, Design, Budget, Development, Estimate/measure, Spread Sheet, Initial Performance Measurement, Model, Test, and milestones M1 to M7]
How to start (Target)
• Seek the boundary (requirement)
• Start with a back-of-the-envelope calculation
  • Ballpark figures (e.g. number of transactions, normal and at peak)
  • Don't get hung up on precision early
  • E.g. how much water flows out?
Budget
• Derived from the target or an educated guess
• Iterative process
• Start from subsystems and then work down to modules
• Budgeted resource items for each process/module/subsystem
  • CPU, memory, disk I/O, network bandwidth
Budget types
• Concurrent: percentage of resource allocation
• Sequential: wall-clock time
• Example of a response-time budget for a transaction
• T_trans = T_cpu + (1 − C_mem) × T_disk + T_network
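
The response-time budget above is easy to evaluate for what-if questions. In this sketch C_mem is assumed to be the fraction of data accesses served from memory (so only the remaining fraction pays the disk time); the slide does not define it, and all numbers are hypothetical.

```python
def transaction_time(t_cpu, c_mem, t_disk, t_network):
    """T_trans = T_cpu + (1 - C_mem) * T_disk + T_network (same time unit throughout)."""
    if not 0.0 <= c_mem <= 1.0:
        raise ValueError("C_mem must be between 0 and 1")
    return t_cpu + (1.0 - c_mem) * t_disk + t_network


# Hypothetical budget items in milliseconds (assumed for illustration).
t_trans = transaction_time(t_cpu=20.0, c_mem=0.75, t_disk=40.0, t_network=10.0)
print(f"T_trans = {t_trans:.1f} ms")
```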
Dependability Estimation/Measurement
• Uses the same three techniques as performance (measurement, simulation, analytical modeling)
• Two measures
  • Availability (ratio of uptime to total time)
  • Reliability (MTTF)
• Analytical modeling
  • Non-state-space
  • State-space
Availability
• A measure that represents the ratio of uptime to total time
• High availability: the ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest
• High availability is most often achieved through fault tolerance
Availability Model
[Figure: timeline of a single server S1 alternating between "Server up" and "Server down & repair", compared with the HA-OSCAR dual-head model in which service passes through S1, S1&S2, and S2]
Availability (continued)
• Availability = uptime / total time
• MTTF = Mean Time To Failure
  • Average time to failure, when the system is not repairable
• MTBF = Mean Time Between Failures
  • Average time between failures, when the system is repairable
• MTTR = Mean Time To Repair
• Availability = MTTF / (MTTF + MTTR)
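
As a quick illustration of the last formula, the sketch below computes availability from MTTF and MTTR and converts it to expected downtime per year. The MTTF and MTTR values are hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR), both in the same time unit."""
    return mttf / (mttf + mttr)


def downtime_hours_per_year(avail):
    """Expected downtime per year for a given availability (0..1)."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical values: a failure every 999 hours on average, 1 hour to repair.
a = availability(mttf=999.0, mttr=1.0)
print(f"availability = {a:.4%}")
print(f"downtime     = {downtime_hours_per_year(a):.2f} hours/year")
```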
Why Dependability Measures?
• Comparisons with cost and performance
• A proper focus for product-improvement efforts
• Consideration of safety and risk issues
Dependability Modeling
• Includes reliability modeling and availability modeling
• A designed system can be shown to meet its performance and dependability requirements
• Provides a good mechanism for examining the behavior of a system, from the design stage through implementation and final deployment
Dependability
• Two measures
  • Reliability (MTTF)
  • Availability (ratio of uptime to total time)
Reliability
• Definition: The reliability R(t) of a system at time t is the probability that system failure has not occurred in the interval [0, t). If X is a random variable that represents the time to occurrence of system failure, then R(t) = P(X > t).
• Unreliability = 1 − R(t)
Reliability
• Definition: The MTTF of a system is the expected time until the occurrence of the (first) system failure. If X is a random variable that represents the time to occurrence of system failure, then MTTF = E[X].
• Given the system reliability R(t), the MTTF can be computed as MTTF = ∫ R(t) dt, integrated from t = 0 to ∞
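
The integral can be checked numerically for a simple case. The sketch below assumes an exponential failure law, R(t) = exp(−λt), for which the MTTF is known to be 1/λ; the failure rate, the integration cutoff, and the step count are all illustrative choices, and the trapezoid rule stands in for whatever integration method a real tool would use.

```python
import math


def reliability(t, failure_rate):
    """R(t) for an exponential time to failure (an assumption of this sketch)."""
    return math.exp(-failure_rate * t)


def mttf_by_integration(failure_rate, t_max, steps):
    """Approximate the integral of R(t) from 0 to t_max with the trapezoid rule."""
    dt = t_max / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(t_max, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


# Hypothetical failure rate: one failure per 1000 hours on average.
FAILURE_RATE = 0.001
mttf = mttf_by_integration(FAILURE_RATE, t_max=20.0 / FAILURE_RATE, steps=200_000)
print(f"numerical MTTF = {mttf:.3f} h, closed form 1/lambda = {1.0 / FAILURE_RATE:.3f} h")
```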
Degree of Availability
System Type              Unavailability (minutes/year)   Availability (in percent)   Availability Class
Unmanaged                50,000                          90                          1
Managed                  5,000                           99                          2
Well-managed             500                             99.9                        3
Fault-tolerant           50                              99.99                       4
High Availability        5                               99.999                      5
Very High Availability   0.5                             99.9999                     6
Ultra Availability       0.05                            99.99999                    7
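
The unavailability column follows directly from the availability percentage. The sketch below recomputes it assuming a 365-day year (525,600 minutes); the table's figures are these values rounded to the nearest order of magnitude. The function name is an illustrative choice.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def unavailability_minutes_per_year(availability_percent):
    """Expected downtime in minutes per year for a given availability in percent."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


# Availability percentages taken from the table above.
for percent in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    minutes = unavailability_minutes_per_year(percent)
    print(f"{percent:>9}% -> {minutes:12.4f} minutes/year")
```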
Availability
• Definition: The availability A(t) of a system at time t is the probability that the system is functioning correctly at time t.
• Like the reliability measure, in some applications it is better to compute the system unavailability U(t) = 1 − A(t).
• Availability = MTTF / (MTTF + MTTR)
• A_steady = lim A(t) as t -> ∞
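
To see A(t) approach its steady-state value, the sketch below uses the standard two-state (up/down) model. It assumes exponentially distributed failure and repair times with rates λ = 1/MTTF and μ = 1/MTTR and a system that starts in the up state, which gives A(t) = μ/(λ+μ) + λ/(λ+μ) · exp(−(λ+μ)t). The MTTF and MTTR values are hypothetical.

```python
import math


def instantaneous_availability(t, mttf, mttr):
    """A(t) for a two-state model with exponential failure and repair times.

    Assumes the system is up at t = 0.
    """
    lam = 1.0 / mttf  # failure rate
    mu = 1.0 / mttr   # repair rate
    steady = mu / (lam + mu)  # equals MTTF / (MTTF + MTTR)
    return steady + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)


# Hypothetical values in hours (assumed for illustration).
MTTF_H, MTTR_H = 1000.0, 10.0
for t in (0.0, 5.0, 20.0, 100.0):
    print(f"A({t:5.1f} h) = {instantaneous_availability(t, MTTF_H, MTTR_H):.6f}")
print(f"steady state = {MTTF_H / (MTTF_H + MTTR_H):.6f}")
```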
Modeling Techniques
• Non-state-space
  • Fault tree
  • Reliability block diagram
• State-space
  • Continuous-time Markov chain
  • Stochastic Petri net
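
As a small non-state-space example, a reliability block diagram with independent components combines in series (all blocks must work) and in parallel (at least one block must work). The sketch below assumes independent failures; the two-server-plus-switch layout and the component reliabilities are hypothetical.

```python
import math


def series(*reliabilities):
    """All components must work: product of the reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one component must work: 1 - product of the unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical system: two redundant servers (0.99 each) behind one switch (0.999).
system = series(parallel(0.99, 0.99), 0.999)
print(f"redundant servers: {parallel(0.99, 0.99):.6f}")
print(f"whole system:      {system:.7f}")
```

With these assumed numbers the redundant pair is more reliable than the single switch it depends on, so the switch becomes the limiting component.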
Example of system
Fault Tree
HA-OSCAR SRN model
• Server sub-model
• Switches
• Compute nodes
Server Sub Model
• P server up
• P server down
• Failover
• P server repair
• Failback
• S is up and ready
• S takes control
• S server down
• S repair
Switch sub model
Compute node sub model
Instantaneous Availability
• Steady-state availability: A = 99.993% (36 min of downtime per year) vs. Beowulf: A = 99.65% (30 hr of downtime per year)
Stochastic Petri Net Package
• Research and development from Duke University
• Very popular
• Petri-net-based dependability analysis
Exercises
• See the handouts in the class