Software Risk Management: Principles and Practices


BARRY W. BOEHM, Defense Advanced Research Projects Agency

Identifying and dealing with risks early in development lessens long-term costs and helps prevent software disasters. It is easy to begin managing risks in your environment.

Like many fields in their early stages, the software field has had its share of project disasters: the software equivalents of the Beauvais Cathedral, the HMS Titanic, and the “Galloping Gertie” Tacoma Narrows Bridge. The frequency of these software-project disasters is a serious concern: A recent survey of 600 firms indicated that 35 percent of them had at least one runaway software project.1

Most postmortems of these software-project disasters have indicated that their problems would have been avoided or strongly reduced if there had been an explicit early concern with identifying and resolving their high-risk elements. Frequently, these projects were swept along by a tide of optimistic enthusiasm during their early phases that caused them to miss some clear signals of high-risk issues that proved to be their downfall later.

0740-7459/91/0100/0032/$01.00 © IEEE

JANUARY 1991

Enthusiasm for new software capabilities is a good thing. But it must be tempered with a concern for early identification and resolution of a project’s high-risk elements so people can get these resolved early and then focus their enthusiasm and energy on the positive aspects of their product.

Current approaches to the software process make it too easy for projects to make high-risk commitments that they will later regret:

+ The sequential, document-driven waterfall process model tempts people to overpromise software capabilities in contractually binding requirements specifications before they understand their risk implications.
+ The code-driven, evolutionary development process model tempts people to say, “Here are some neat ideas I’d like to put into this system. I’ll code them up, and if they don’t fit other people’s ideas, we’ll just evolve things until they work.” This sort of approach usually works fine in some well-supported minidomains like spreadsheet applications but, in more complex application domains, it most often creates or neglects unsalvageable high-risk elements and leads the project down the path to disaster.

At TRW and elsewhere, I have had the good fortune to observe many project managers at work firsthand and to try to understand and apply the factors that distinguished the more successful project managers from the less successful ones. Some successfully used a waterfall approach, others successfully used an evolutionary development approach, and still others successfully orchestrated complex mixtures of these and other approaches involving prototyping, simulation, commercial software, executable specifications, tiger teams, design competitions, subcontracting, and various kinds of cost-benefit analyses.

One pattern that emerged very strongly was that the successful project managers were good risk managers. Although they generally didn’t use such terms as “risk identification,” “risk assessment,” “risk-management planning,” or “risk monitoring,” they were using a general concept of risk exposure (potential loss times the probability of loss) to guide their priorities and actions. And their projects tended to avoid pitfalls and produce good products.

The emerging discipline of software risk management is an attempt to formalize these risk-oriented correlates of success into a readily applicable set of principles and practices. Its objectives are to identify, address, and eliminate risk items before they become either threats to successful software operation or major sources of software rework.

BASIC CONCEPTS

Webster’s dictionary defines “risk” as “the possibility of loss or injury.” This definition can be translated into the fundamental concept of risk management: risk exposure, sometimes also called “risk impact” or “risk factor.” Risk exposure is defined by the relationship

RE = P(UO) * L(UO)

where RE is the risk exposure, P(UO) is the probability of an unsatisfactory outcome, and L(UO) is the loss to the parties affected if the outcome is unsatisfactory.

To relate this definition to software projects, we need a definition of “unsatisfactory outcome.” Given that projects involve several classes of participants (customer, developer, user, and maintainer), each with different but highly important satisfaction criteria, it is clear that “unsatisfactory outcome” is multidimensional:

+ For customers and developers, budget overruns and schedule slips are unsatisfactory.
+ For users, products with the wrong functionality, user-interface shortfalls, performance shortfalls, or reliability shortfalls are unsatisfactory.
+ For maintainers, poor-quality software is unsatisfactory.

These components of an unsatisfactory outcome provide a top-level checklist for identifying and assessing risk items.

A fundamental risk-analysis paradigm is the decision tree. Figure 1 illustrates a potentially risky situation involving the software controlling a satellite experiment. The software has been under development by the experiment team, which understands the experiment well but is inexperienced in and somewhat casual about software development. As a result, the satellite-platform manager has obtained an estimate that there is a probability P(UO) of 0.4 that the experimenters’ software will have a critical error: one that will wipe out the entire experiment and cause an associated loss L(UO) of the total $20 million investment in the experiment.

IEEE SOFTWARE

FIGURE 1. DECISION TREE FOR WHETHER TO PERFORM INDEPENDENT VALIDATION AND VERIFICATION TO ELIMINATE CRITICAL ERRORS IN A SATELLITE-EXPERIMENT PROGRAM. L(UO) IS THE LOSS ASSOCIATED WITH AN UNSATISFACTORY OUTCOME, P(UO) IS THE PROBABILITY OF THE UNSATISFACTORY OUTCOME, AND CE IS A CRITICAL ERROR.

FIGURE 2. SOFTWARE RISK-MANAGEMENT STEPS.

The satellite-platform manager identifies two major options for reducing the risk of losing the experiment:

+ Convincing and helping the experiment team to apply better development methods. This incurs no additional cost and, from previous experience, the manager estimates that this will reduce the error probability P(UO) to 0.1.
+ Hiring a contractor to independently verify and validate the software. This costs an additional $500,000; based on the results of similar IV&V efforts, the manager estimates that this will reduce the error probability P(UO) to 0.04.

The decision tree in Figure 1 then shows, for each of the two major decision options, the possible outcomes in terms of the critical error existing or being found and eliminated, their probabilities, the losses associated with each outcome, the risk exposure associated with each outcome, and the total risk exposure (or expected loss) associated with each decision option. In this case, the total risk exposure associated with the experiment-team option is only $2 million. For the IV&V option, the total risk exposure is only $1.3 million, so it represents the more attractive option.
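The expected-loss arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the `Outcome` class, the function name, and the branch layout are hypothetical, and the sketch assumes that the $500,000 IV&V fee is incurred on both branches of the IV&V option, which is one reading of Figure 1 that reproduces the $1.3 million figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One leaf of the decision tree (hypothetical helper type)."""
    probability: float  # P(UO) for this branch
    loss_musd: float    # L(UO) for this branch, in millions of dollars


def total_risk_exposure(outcomes):
    """Expected loss of one decision option: the sum of P(UO) * L(UO)."""
    return sum(o.probability * o.loss_musd for o in outcomes)


EXPERIMENT_LOSS = 20.0  # $20 million investment lost on a critical error
IVV_COST = 0.5          # $500,000 IV&V fee

# Option 1: the experiment team applies better development methods.
experiment_team = [
    Outcome(probability=0.10, loss_musd=EXPERIMENT_LOSS),
    Outcome(probability=0.90, loss_musd=0.0),
]

# Option 2: independent verification and validation (IV&V).
# Assumption: the IV&V fee is added to the loss on both branches.
ivv = [
    Outcome(probability=0.04, loss_musd=EXPERIMENT_LOSS + IVV_COST),
    Outcome(probability=0.96, loss_musd=IVV_COST),
]

re_team = total_risk_exposure(experiment_team)
re_ivv = total_risk_exposure(ivv)

print(f"Experiment-team option: ${re_team:.2f} million")
print(f"IV&V option:            ${re_ivv:.2f} million")
```

Under these assumptions the sketch gives about $2 million for the experiment-team option and about $1.3 million for the IV&V option, in line with the figures quoted above.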

Besides providing individual solutions for risk-management situations, the decision tree also provides a framework for analyzing the sensitivity of preferred solutions to the risk-exposure parameters. Thus, for example, the experiment-team option would be preferred if the loss due to a critical error were less than $13 million, if the experiment team could reduce its critical-error probability to less than 0.065, if the IV&V team cost more than $1.2 million, if the IV&V team could not reduce the probability of critical error to less than 0.075, or if there were various partial combinations of these possibilities.

This sort of sensitivity analysis helps deal with many situations in which probabilities and losses cannot be estimated well enough to perform a precise analysis. The risk-exposure framework also supports some even more approximate but still very useful approaches, like range estimation and scale-of-10 estimation.
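Three of these break-even points follow directly from the risk-exposure formula, and a short Python sketch makes the reasoning explicit. The variable names are hypothetical, and each threshold is computed by varying one parameter while holding the others at the values used in the satellite example.

```python
LOSS = 20.0     # L(UO) of a critical error, in millions of dollars
P_TEAM = 0.10   # critical-error probability with better team methods
P_IVV = 0.04    # critical-error probability after IV&V
IVV_COST = 0.5  # IV&V fee, in millions of dollars

# Total risk exposure of each option (the IV&V fee is paid on every branch).
re_team = P_TEAM * LOSS
re_ivv = P_IVV * LOSS + IVV_COST

# The experiment-team option wins if its error probability p satisfies
# p * LOSS < re_ivv.
team_probability_threshold = re_ivv / LOSS

# The experiment-team option wins if the IV&V fee c satisfies
# P_IVV * LOSS + c > re_team.
ivv_cost_threshold = re_team - P_IVV * LOSS

# The experiment-team option wins if the IV&V error probability p satisfies
# p * LOSS + IVV_COST > re_team.
ivv_probability_threshold = (re_team - IVV_COST) / LOSS

print(f"Team probability below {team_probability_threshold:.3f}")
print(f"IV&V cost above ${ivv_cost_threshold:.1f} million")
print(f"IV&V probability above {ivv_probability_threshold:.3f}")
```

The three values come out at roughly 0.065, $1.2 million, and 0.075, which agree with the thresholds listed in the text.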

RISK MANAGEMENT

As Figure 2 shows, the practice of risk management involves two primary steps, each with three subsidiary steps.

The first primary step, risk assessment, involves risk identification, risk analysis, and risk prioritization:

+ Risk identification produces lists of the project-specific risk items likely to compromise a project’s success. Typical risk-identification techniques include checklists, examination of decision drivers, comparison with experience (assumption analysis), and decomposition.
+ Risk analysis assesses the loss probability and loss magnitude for each identified risk item, and it assesses compound risks in risk-item interactions. Typical techniques include performance models, cost models, network analysis, statistical decision analysis, and quality-factor (like reliability, availability, and security) analysis.
+ Risk prioritization produces a ranked ordering of the risk items identified and analyzed. Typical techniques include risk-exposure analysis, risk-reduction leverage analysis (particularly involving cost-benefit analysis), and Delphi or group-consensus techniques.

The second primary step, risk control, involves risk-management planning, risk resolution, and risk monitoring:

+ Risk-management planning helps prepare you to address each risk item (for example, via information buying, risk avoidance, risk transfer, or risk reduction), including the coordination of the individual risk-item plans with each other and with the overall project plan. Typical techniques include checklists of risk-resolution techniques, cost-benefit analysis, and standard risk-management plan outlines, forms, and elements.
+ Risk resolution produces a situation in which the risk items are eliminated or otherwise resolved (for example, risk avoidance via relaxation of requirements). Typical techniques include prototypes, simulations, benchmarks, mission analyses, key-personnel agreements, design-to-cost approaches, and incremental development.
+ Risk monitoring involves tracking the project’s progress toward resolving its risk items and taking corrective action where appropriate. Typical techniques include milestone tracking and a top-10 risk-item list that is highlighted at each weekly, monthly, or milestone project review and followed up appropriately with reassessment of the risk item or corrective action.

In addition, risk management provides an improved way to address and organize the life cycle. Risk-driven approaches, like the spiral model of the software process, avoid many of the difficulties encountered with previous process models like the waterfall model and the evolutionary development model. Such risk-driven approaches also show how and where to incorporate new software technologies like rapid prototyping, fourth-generation languages, and commercial software products into the life cycle.

SIX STEPS

Figure 2 summarized the major steps and techniques involved in software risk management. This overview article covers four significant subsets of risk-management techniques: risk-identification checklists, risk prioritization, risk-management planning, and risk monitoring. Other techniques have been covered elsewhere.

Risk-identification checklists. Table 1 shows a top-level risk-identification checklist with the top 10 primary sources of risk on software projects, based on a survey of several experienced project managers. Managers and system engineers can use the checklist on projects to help identify and resolve the most serious risk items on the project. It also provides a corresponding set of risk-management techniques that have been most successful to date in avoiding or resolving the source of risk.

If you focus on item 2 of the top-10 list in Table 1 (unrealistic schedules and budgets), you can then move on to an example of a next-level checklist: the risk-probability table in Table 2 for assessing the probability that a project will overrun its budget. Table 2 is one of several such checklists in an excellent US Air Force handbook on software risk abatement.

Using the checklist, you can rate a project’s status for the individual attributes associated with its requirements, personnel, reusable software, tools, and support environment (in Table 2, the environment’s availability or the risk that the environment will not be available when needed). These ratings will support a probability-range estimation of whether the project has a relatively low (0.0 to 0.3), medium (0.4 to 0.6), or high (0.7 to 1.0) probability of overrunning its budget.

Most of the critical risk items in the checklist have to do with shortfalls in domain understanding and in properly scoping the job to be done, areas that are generally underemphasized in computer-science literature and education. Recent initiatives, like the Software Engineering Institute’s masters curriculum in software engineering, are providing better coverage in these areas. The SEI is also initiating a major new program in software risk management.

TABLE 1.
Risk item | Risk-management technique
1. Personnel shortfalls | Staffing with top talent, job matching, team building, key personnel agreements, cross training.
2. Unrealistic schedules and budgets | Detailed multisource cost and schedule estimation, design to cost, incremental development, software reuse, requirements scrubbing.
3. Developing the wrong functions and properties | Organization analysis, mission analysis, operations-concept formulation, user surveys and user participation, prototyping, early users’ manuals, off-nominal performance analysis, quality-factor analysis.
4. Developing the wrong user interface | Prototyping, scenarios, task analysis, user participation.
5. Gold-plating | Requirements scrubbing, prototyping, cost-benefit analysis, designing to cost.
6. Continuing stream of requirements changes | High change threshold, information hiding, incremental development (deferring changes to later increments).
7. Shortfalls in externally furnished components | Benchmarking, inspections, reference checking, compatibility analysis.
8. Shortfalls in externally performed tasks | Reference checking, preaward audits, award-fee contracts, competitive design or prototyping, team-building.
9. Real-time performance shortfalls | Simulation, benchmarking, modeling, prototyping, instrumentation, tuning.
10. Straining computer-science capabilities | Technical analysis, cost-benefit analysis, prototyping, reference checking.

TABLE 2.
Cost drivers | Improbable (0.0-0.3) | Probable (0.4-0.6) | Frequent (0.7-1.0)
Requirements
Size | Small, noncomplex, or easily decomposed | Medium to moderate complexity, decomposable | Large, highly complex, or not decomposable
Resource constraints | Little or no hardware-imposed constraints | Some hardware-imposed constraints | Significant hardware-imposed constraints
Application | Nonreal-time, little system interdependency | Embedded, some system interdependencies | Real-time, embedded, strong interdependency
Technology | Mature, existent, in-house experience | Existent, some in-house experience | New or new application, little experience
Requirements stability | Little or no change to established baseline | Some change in baseline expected | Rapidly changing, or no baseline
Personnel
Availability | In place, little turnover expected | Available, some turnover expected | Not available, high turnover expected
Mix | Good mix of software disciplines | Some disciplines inappropriately represented | Some disciplines not represented
Experience | High experience ratio | Average experience ratio | Low experience ratio
Management environment | Strong personnel management approach | Good personnel management approach | Weak personnel management approach
Reusable software
Availability | Compatible with need dates | Delivery dates in question | Incompatible with need dates
Modifications | Little or no change | Some change | Extensive changes
Language | Compatible with system and maintenance requirements | Partial compatibility with requirements | Incompatible with system or maintenance requirements
Rights | Compatible with maintenance and competition requirements | Partial compatibility with maintenance, some competition | Incompatible with maintenance concept, noncompetitive
Certification | Verified performance, application compatible | Some application-compatible test data available | Unverified, little test data available
Tools and environment
Facilities | Little or no modification | Some modifications, existent | Major modifications, nonexistent
Availability | In place, meets need dates | Some compatibility with need dates | Nonexistent, does not meet need dates
Rights | Compatible with maintenance and development plans | Partial compatibility with maintenance and development plans | Incompatible with maintenance and development plans
Configuration management | Fully controlled | Some controls | No controls
 | Sufficient financial resources | Some shortage of financial resources, possible overrun | Significant financial shortages, budget overrun likely
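One way to turn per-attribute ratings into a probability-range estimate can be sketched as follows. The article does not prescribe an aggregation rule, so averaging the band bounds is purely an assumption made for illustration, and the function name and the example ratings are hypothetical.

```python
# Probability bands for the three rating columns of Table 2.
BANDS = {
    "improbable": (0.0, 0.3),
    "probable": (0.4, 0.6),
    "frequent": (0.7, 1.0),
}


def probability_range(ratings):
    """Average the band bounds of the ratings (an assumed aggregation rule)."""
    lows = [BANDS[band][0] for band in ratings.values()]
    highs = [BANDS[band][1] for band in ratings.values()]
    return sum(lows) / len(lows), sum(highs) / len(highs)


# Hypothetical ratings for four cost drivers of an imaginary project.
example_ratings = {
    "Requirements: size": "probable",
    "Requirements: technology": "frequent",
    "Personnel: availability": "improbable",
    "Tools and environment: facilities": "probable",
}

low, high = probability_range(example_ratings)
print(f"Estimated overrun probability range: {low:.2f} to {high:.2f}")
```

A weighted average, or simply reporting the worst-rated driver, would be equally defensible choices; the point is only that the checklist yields a range rather than a single precise probability.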

Risk analysis and prioritization. After using all the various risk-identification checklists, plus the other risk-identification techniques in decision-driver analysis, assumption analysis, and decomposition, one very real risk is that the project will identify so many risk items that the project could spend years just investigating them. This is where risk prioritization and its associated risk-analysis activities become essential.

The most effective technique for risk prioritization involves the risk-exposure quantity described earlier. It lets you rank the risk items identified and determine which are most important to address.

One difficulty with the risk-exposure quantity, as with most other decision-analysis quantities, is the problem of making accurate input estimates of the probability and loss associated with an unsatisfactory outcome. Checklists like that in Table 2 provide some help in assessing the probability of occurrence of a given risk item, but it is clear from Table 2 that its probability ranges do not support precise probability estimation.

Full risk-analysis efforts involving prototyping, benchmarking, and simulation generally provide better probability and loss estimates, but they may be more expensive and time-consuming than the situation warrants. Other techniques, like betting analogies and group-consensus techniques, can improve risk-probability estimation, but for risk prioritization you can often take a simpler course: assessing the risk probabilities and losses on a relative scale of 0 to 10.

Table 3 and Figure 3 illustrate this risk-prioritization process by using some potential risk items from the satellite-experiment project as examples. Table 3 summarizes several unsatisfactory outcomes with their corresponding ratings for P(UO), L(UO), and their resulting risk-exposure estimates. Figure 3 plots each unsatisfactory outcome with respect to a set of constant risk-exposure contours.

Three key points emerge from Table 3 and Figure 3:

+ Projects often focus on factors having either a high P(UO) or a high L(UO), but these may not be the key factors with a high risk-exposure combination. One of the highest P(UO)s comes from item G (data-reduction errors), but the fact that these errors are recoverable and not mission-critical leads to a low loss factor and a resulting low RE of 8. Similarly, item I (insufficient memory) has a high potential loss, but its low probability leads to a low RE of 7. On the other hand, a relatively low-profile item like item H (user-interface shortfalls) becomes a relatively high-priority risk item because its combination of moderately high probability and loss factors yields an RE of 30.
+ The RE quantities also provide a basis for prioritizing verification and validation and related test activities by giving each error class a significance weight. Frequently, all errors are treated with equal weight, putting too much testing effort into finding relatively trivial errors.
+ There is often a good deal of uncertainty in estimating the probability or loss associated with an unsatisfactory outcome. (The assessments are frequently subjective and are often the product of surveying several domain experts.) The amount of uncertainty is itself a major source of risk, which needs to be reduced as early as possible. The primary example in Table 3 and Figure 3 is the uncertainty in item C about whether the fault-tolerance features are going to cause an unacceptable degradation in real-time performance. If P(UO) is rated at 4, this item has only a moderate RE of 28, but if P(UO) is 8, the RE has a top-priority rating of 56.

One of the best ways to reduce this source of risk is to buy information about the actual situation. For the issue of fault tolerance versus performance, a good way to buy information is to invest in a prototype, to better understand the performance effects of the various fault-tolerance features.

TABLE 3.
Unsatisfactory outcome | Probability of unsatisfactory outcome | Loss caused by unsatisfactory outcome | Risk exposure
A. Software error kills experiment | 3-5 | 10 | 30-50
B. Software error loses key data | 3-5 | 8 | 24-40
C. Fault-tolerant features cause unacceptable performance | 4-8 | 7 | 28-56
D. Monitoring software reports unsafe condition as safe | 5 | 9 | 45
E. Monitoring software reports safe condition as unsafe | 5 | 3 | 15
F. Hardware delay causes schedule overrun | 6 | 4 | 24
G. Data-reduction software errors cause extra work | 8 | 1 | 8
H. Poor user interface causes inefficient operation | 6 | 5 | 30
I. Processor memory insufficient | 1 | 7 | 7
J. Database-management software loses derived data | 2 | 2 | 4

FIGURE 3. RISK-EXPOSURE FACTORS AND CONTOURS FOR THE SATELLITE-EXPERIMENT SOFTWARE. RE IS THE RISK EXPOSURE, P(UO) THE PROBABILITY OF AN UNSATISFACTORY OUTCOME, AND L(UO) THE LOSS ASSOCIATED WITH THAT UNSATISFACTORY OUTCOME. THE GRAPH POINTS MAP THE ITEMS FROM TABLE 3 WHOSE RISK EXPOSURES ARE BEING ASSESSED.

Risk-management planning. Once you determine a project’s major risk items and their relative priorities, you need to establish a set of risk-control functions to bring the risk items under control. The first step in this process is to develop a set of risk-management plans that lay out the activities necessary to bring the risk items under control.

One aid in doing this is the top-10 checklist in Table 1 that identifies the most successful risk-management techniques for the most common risk items. As an example, item 9 (real-time performance shortfalls) in Table 1 covers the uncertainty in performance effect of the fault-tolerance features. The corresponding risk-management techniques include simulation, benchmarking, modeling, prototyping, instrumentation, and tuning. Assume, for example, that a prototype of representative safety features is the most cost-effective way to determine and reduce their effects on system performance.

The next step in risk-management planning is to develop risk-management plans for each risk item. Figure 4 shows the plan for prototyping the fault-tolerance features and determining their effects on performance. The plan is organized around a standard format for software plans, oriented around answering the standard questions of why, what, when, who, where, how, and how much. This plan organization lets the plans be concise (fitting on one page), action-oriented, easy to understand, and easy to monitor.

The final step in risk-management planning is to integrate the risk-management plans for each risk item with each other and with the overall project plan. Each of the other high-priority or uncertain risk items will have a risk-management plan; it may turn out, for example, that the fault-tolerance features prototyped for this risk item could also be useful as part of the strategy to reduce the uncertainty in items A and B (software errors killing the experiment and losing experiment-critical data). Also, for the overall project plan, the need for a 10-week prototype-development and -exercise period must be factored into the overall schedule, to keep the overall schedule realistic.

1. Objectives (the “why”)
+ Determine, reduce level of risk of the software fault-tolerance features causing unacceptable performance.
+ Create a description of and a development plan for a set of low-risk fault-tolerance features.
2. Deliverables and milestones (the “what” and “when”)
+ By Week 3.
1. Evaluation of fault-tolerance options
2. Assessment of reusable components
3. Draft workload characterization
4. Evaluation plan for prototype exercise
5. Description of prototype
+ By Week 7.
6. Operational prototype with key fault-tolerance features
7. Workload simulation
8. Instrumentation and data reduction capabilities
9. Draft description, plan for fault-tolerance features
+ By Week 10.
10. Evaluation and iteration of prototype
11. Revised description, plan for fault-tolerance features
3. Responsibilities (the “who” and “where”)
+ System engineer: G. Smith. Tasks 1, 3, 4, 9, 11. Support of tasks 5, 10.
+ Lead programmer: C. Lee. Tasks 5, 6, 7, 10. Support of tasks 1, 3.
+ Programmer: J. Wilson. Tasks 2, 8. Support of tasks 5, 6, 7, 10.
4. Approach (the “how”)
+ Design-to-schedule prototyping effort
+ Driven by hypotheses about fault-tolerance-performance effects
+ Use real-time operating system, add prototype fault-tolerance features
+ Evaluate performance with respect to representative workload
+ Refine prototype based on results observed
5. Resources (the “how much”)
$60K - full-time system engineer, lead programmer, programmer (10 weeks)*(3 staff)*($2K/staff-week)
$0 - three dedicated workstations (from project pool)
$0 - two target processors (from project pool)
$0 - one test coprocessor (from project pool)
$10K - contingencies
$70K - total

FIGURE 4. RISK-MANAGEMENT PLAN FOR FAULT-TOLERANCE PROTOTYPING.
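The relative-scale prioritization illustrated by Table 3 can be sketched in Python. The names below are hypothetical. The loss ratings for items E through J are taken as the risk exposure divided by P(UO), following the definition of RE, and ranking by the upper bound of each exposure range is an assumption chosen so that uncertain items such as item C surface early.

```python
# Ratings on the relative 0-to-10 scale: (P(UO) low, P(UO) high, L(UO)).
# Assumption: losses for items E through J are taken as RE / P(UO).
RATINGS = {
    "A": (3, 5, 10),
    "B": (3, 5, 8),
    "C": (4, 8, 7),
    "D": (5, 5, 9),
    "E": (5, 5, 3),
    "F": (6, 6, 4),
    "G": (8, 8, 1),
    "H": (6, 6, 5),
    "I": (1, 1, 7),
    "J": (2, 2, 2),
}


def exposure_range(p_low, p_high, loss):
    """Return the (low, high) risk exposure, RE = P(UO) * L(UO)."""
    return p_low * loss, p_high * loss


exposures = {item: exposure_range(*r) for item, r in RATINGS.items()}

# Assumption: rank by worst-case exposure, highest first.
ranked = sorted(exposures, key=lambda item: exposures[item][1], reverse=True)

for item in ranked:
    low, high = exposures[item]
    label = str(low) if low == high else f"{low}-{high}"
    print(f"{item}: RE {label}")
```

With this ordering, item C leads the list because of its worst-case exposure of 56, while items G and I fall near the bottom despite a high probability and a high loss respectively.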

Risk resolution and momtoring. Once you have established a good set of risk-management plans, the risk-resolution process consists of implementing whatever prototypes, simulations, benchmarks, surveys, or other risk-reduction techniques are called for in the plans. Risk monitoring ensures that this is a closed-loop process by tracking risk-reduction progress and applying whatever corrective action is necessary to keep the risk-resolution process on track. Risk management provides managers with a very effective technique for keeping on top of projects under their control: Pmjkt top-1 0 rirk-item walking. This technique concentrates management atten~

38

JANUARY 1991

the top 10 risk items. (The number could be seven or 12 without loss of intent.) The summary should include each risk item’s current top-10 r&g, its rank at the previous review, how often it has been on the top-10 list, and a sumnary of progress in resolving the risk item since the previous review. + Focusing the project-review meeting on dealing with any problems in resolving the risk items. Table 4 shows how a top-10 list could have worked for the satellite-experiment project, as of month 3 of the project. The project’s top risk item in month 3 is a critical staffing problem. Highlighting it in the monthly review meeting would stimulate a discussion by the project team and the boss of the staffing options: Make the unavailable key person available, reshuffle project personnel, or look for new people witlun or outside the organization. This should result in an assignment of action items to follow through on the options

tion on the hgh-risk, high-leverage, critical success factors rather than swamping management reviews with lots of low-priority detail. As a manager, I have found that h s type of risk-item-oriented review saves a lot of time, reduces management surprises, and gets you focused on the high-leverage issues where you can make a difference as a manager. Top-10 risk-item tracking involves the followingsteps: + R a h g the project’s most significant risk items. + Establishing a regular schedule for higher management reviews of the project’s progress. The review should be chaired by the equivalent of the project manager’s boss. For large projects (more than 20 people), the reviews should be held monthly. In the project itself, the project manager would review them more kequently. + Beginning each project-review meeting with a summary of progress on

Risk item

Monthlv ranking This Last

chosen, including possible actions by the project manager’s boss. The number 2 risk item in Table 4, target hardware deliverydelays,is also one for which the project manager’s boss may be able to expedite a solution -by cutting through corporate-procurement red tape, for example, or by escalatingvendor-delay issues with the vendor’s higher management. As Table 4 shows, some risk items are moving down in priority or going off the list, while others are escalating or coming onto the list. The ones moving down the list-like the design-verification and -Validation staffing, fault-tolerance prototyping, and user-interface prototyping - still need to be monitored but &equently do not need special management action. The ones moving up or onto the list - like the data-bus design changes and the testbed-interface definitions are generally the ones needing higher management attention to help get them

Risk-resolution progress

No. of months ~

Replacingsensor-controlsoftware developer

1

4

2

Top replacement canddate unavailable

Target hFdware delivery delays

2

5

2

Procurement procedural delays

Sensor data formats undefined

3

3

3

Action items to software, sensor teams; due next month

Staffing of design V&V team

4

2

3

Key reviewers committed; need fault-tolerance reviewer

Software fault-tolerance may compromise performance

5

1

3

Fault-toleranceprototype successful

Accommodate changes in data bus design

6

-

1

Meeting scheduled with data-bus designers

Test-bed interface definitions

7

8

3

Some delays in action items; review meeting scheduled

User interfaceuncertainties

8

6

3

User interface prototype successful

TBDs in experiment operational concept

-

7

3

TBDs resolved

Uncertaintiesin reusable monitoring software

-

9

3

Required design changes small, successfully made

IEEE S O F T W A R E

Figure 5. Framework for life-cycle risk management. [Diagram not reproduced. Recoverable stage labels: risk-management-driven evaluation criteria and activities; life-cycle plan and life-cycle risk-management plan; evaluation and source selection; development plan, development risk-management plan, risk-item evaluation information, and risk-management-tailored document plan; monitoring, updating, and implementing the development plan and development risk-management plan; acceptance and installation; operations and maintenance.]

resolved quickly. As this example shows, the top-10 risk-item list is a very effective way to focus higher management attention onto the project’s critical success factors. It also uses management’s time very efficiently, unlike typical monthly reviews, which spend most of their time on things the higher manager can’t do anything about. Also, if the higher manager surfaces an additional concern, it is easy to add it to the top-10 risk-item list to be highlighted in future reviews.
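The month-to-month comparison behind the top-10 list can be sketched in a few lines of Python. This is only an illustration, not a tool from the article: the names `RiskItem`, `trend`, and `needs_higher_attention` are hypothetical, and the sketch assumes that the three numeric columns of Table 4 are this month's rank, last month's rank, and the number of months on the list. The rule "moving up or onto the list" is taken from the discussion of Table 4, where it is described as a general tendency rather than a fixed criterion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RiskItem:
    name: str
    rank_now: Optional[int]    # None means the item went off the list this month
    rank_prev: Optional[int]   # None means the item was not on the list last month
    months_on_list: int


def trend(item: RiskItem) -> str:
    """Classify how an item moved between last month's and this month's ranking."""
    if item.rank_now is None:
        return "off list"
    if item.rank_prev is None:
        return "new"
    if item.rank_now < item.rank_prev:   # a smaller number is a higher priority
        return "escalating"
    if item.rank_now > item.rank_prev:
        return "receding"
    return "unchanged"


def needs_higher_attention(items: List[RiskItem]) -> List[RiskItem]:
    """Items moving up or onto the list, ordered by current rank."""
    flagged = [i for i in items if trend(i) in ("new", "escalating")]
    return sorted(flagged, key=lambda i: i.rank_now)


# Rankings transcribed from Table 4 (assumed column order: this, last, months).
TABLE_4 = [
    RiskItem("Replacing sensor-control software developer", 1, 4, 2),
    RiskItem("Target hardware delivery delays", 2, 5, 2),
    RiskItem("Sensor data formats undefined", 3, 3, 3),
    RiskItem("Staffing of design V&V team", 4, 2, 3),
    RiskItem("Software fault-tolerance may compromise performance", 5, 1, 3),
    RiskItem("Accommodate changes in data bus design", 6, None, 1),
    RiskItem("Test-bed interface definitions", 7, 8, 3),
    RiskItem("User interface uncertainties", 8, 6, 3),
    RiskItem("TBDs in experiment operational concept", None, 7, 3),
    RiskItem("Uncertainties in reusable monitoring software", None, 9, 3),
]

for item in needs_higher_attention(TABLE_4):
    print(item.rank_now, item.name, "-", trend(item))
```

Items classified as receding or unchanged stay on the list and still need monitoring; the filter only suggests where a review meeting might spend its time first.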

IMPLEMENTING RISK MANAGEMENT

Implementing risk management involves inserting the risk-management principles and practices into your existing life-cycle management practices. Full implementation of risk management involves the use of risk-driven software-process models like the spiral model, where risk considerations determine the overall sequence of life-cycle activities, the use of prototypes and other risk-resolution techniques, and the degree of detail of plans and specifications. However, the best implementation strategy is an incremental one, which lets an organization’s culture adjust gradually to risk-oriented management practices and risk-driven process models.

A good way to begin is to establish a top-10 risk-item tracking process. It is easy and inexpensive to implement, provides early improvements, and begins establishing a familiarity with the other risk-management principles and practices. Another good way to gain familiarity is via books like my recent tutorial on risk management [3], which contains the Air Force risk-abatement pamphlet [5] and other useful articles, and Robert Charette’s recent good book on risk management [4].

An effective next step is to identify an appropriate initial project in which to implement a top-level life-cycle risk-management plan. Once the organization has accumulated some risk-management experience on this initial project, successive steps can deepen the sophistication of the risk-management techniques and broaden their application to wider classes of projects.

Figure 5 provides a scheme for implementing a top-level life-cycle risk-management plan. It is presented in the context of a contractual software acquisition, but you can tailor it to the needs of an internal development organization as well.

You can organize the life-cycle risk-management plan as an elaboration of the “why, what, when, who, where, how, how much” framework of Figure 4. While this plan is primarily the customer’s responsibility, it is very useful to involve the developer community in its preparation as well. Such a plan addresses not only the development risks that have been the prime topic of this article but also operations and maintenance risks. These include such items as staffing and training of maintenance personnel, discontinuities in the switch from the old to the new system, undefined responsibilities for operations and maintenance facilities and functions, and insufficient budget for planned life-cycle improvements or for corrective, adaptive, and perfective maintenance.

Figure 5 also shows the importance of proposed developer risk-management plans in competitive source evaluation and selection. Emphasizing the realism and effectiveness of a bidder’s risk-management plan increases the probability that the customer will select a bidder that clearly understands the project’s critical success factors and that has established a development approach that satisfactorily addresses them. (If the developer is a noncompetitive internal organization, it is equally important for the internal customer to require and review a developer risk-management plan.)
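The “why, what, when, who, where, how, how much” framework mentioned above lends itself to a simple completeness check on a draft plan. The Python sketch below is an assumption-laden illustration rather than anything prescribed by the article: the function name `unanswered` and the draft entries are hypothetical, and the only thing taken from the text is the list of seven questions.

```python
from typing import Dict, List

# The seven questions of the framework, in the order the text gives them.
PLAN_QUESTIONS = ("why", "what", "when", "who", "where", "how", "how much")


def unanswered(plan: Dict[str, str]) -> List[str]:
    """Return the framework questions the draft plan leaves blank or omits."""
    return [q for q in PLAN_QUESTIONS if not plan.get(q, "").strip()]


# Hypothetical draft entries, for illustration only.
draft_plan = {
    "why": "Keep maintenance staffing and training risks visible",
    "what": "Training plan for maintenance personnel",
    "when": "Before the switch from the old system to the new one",
    "who": "",
    "how": "Monthly review of the top-10 risk-item list",
}

print(unanswered(draft_plan))
```

A check like this only shows which questions have no answer yet; judging whether an answer is realistic remains a matter for the customer and developer reviewing the plan.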

The most important thing for a project to do is to get focused on its critical success factors. For various reasons, including the influence of previous document-driven management guidelines, projects get focused on activities that are not critical for their success. These frequently include writing boilerplate documents, exploring intriguing but peripheral technical issues, playing politics, and trying to sell the “ultimate” system. In the process, critical success factors get neglected, the project fails, and nobody wins. The key contribution of software risk management is to create this focus on critical success factors and to provide the techniques that let the project deal with them. The risk-assessment and risk-control techniques presented here provide the foundation layer of capabilities needed to implement the risk-oriented approach. However, risk management is not a cookbook approach. To handle all the complex people-oriented and technology-driven success factors in projects, a great measure of human judgment is required. Good people, with good skills and good judgment, are what make projects work. Risk management can provide you with some of the skills, an emphasis on getting good people, and a good conceptual framework for sharpening your judgment. I hope you can find these useful on your next project.

JANUARY 1991


REFERENCES

1. J. Rothfeder, “It’s Late, Costly, and Incompetent - But Try Firing a Computer System,” Business Week, Nov. 7, 1988, pp. 164-165.
2. B.W. Boehm, “A Spiral Model of Software Development and Enhancement,” Computer, May 1988, pp. 61-72.
3. B.W. Boehm, Software Risk Management, CS Press, Los Alamitos, Calif., 1989.
4. R.N. Charette, Software Engineering Risk Analysis and Management, McGraw-Hill, New York, 1989.
5. “Software Risk Abatement,” AFSC/AFLC pamphlet 800-48, US Air Force Systems Command, Andrews AFB, Md., 1988.

Barry W. Boehm is director of the Defense Advanced Research Projects Agency’s Information Science and Technology Office, the US government’s largest computer/communications research organization. In his previous position as chief scientist for TRW’s Defense Systems Group, he was involved in applying risk-management principles to large projects, including the National Aeronautics and Space Administration’s space station, the Federal Aviation Administration’s Advanced Automation System, and the Defense Dept.’s Strategic Defense Initiative. Boehm received a BA in mathematics from Harvard University and an MA and PhD in mathematics from UCLA.

Address questions about this article to the author at DARPA ISTO, 1400 Wilson Blvd., Arlington,

22?09-?108.
