Volume 1, No. 3, May 2012 ISSN – 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/
A comparison of software cost estimation methods: A Survey Narendra Sharma1, Aman Bajpai2, Mr. Ratnesh Litoriya3 Department of computer science, Jaypee University of Engg. & Technology
Abstract- Cost estimation is an important part of the management of every type of project. Accurate cost estimation helps us complete a project within time and budget. To achieve this, we need knowledge of all available techniques and tools. This paper presents a comparison of various software cost estimation methods and of some cost estimation models that are widely used for software projects. It gives an introductory view of the available cost estimation techniques, both those that work with data mining and those that use other approaches to estimation. The main aim of this paper is to provide a comparative analysis of all available techniques and tools.
Keywords- software engineering, software cost estimation methods
I.
INTRODUCTION
This paper provides an introductory view of software cost estimation methods and models for new practitioners. Several methods and models are used for software cost estimation, but it is very difficult to decide which one is suitable for a given project. To solve this problem it is necessary to know the available software cost estimation methods and models. This paper tries to provide a view of all cost estimation techniques that can be used in different environments [12].
Cost estimation is one of the most difficult tasks in project management. Its goal is to accurately estimate the resources and schedules needed for software development projects. The software estimation process includes estimating the size of the software product to be produced, estimating the effort required, developing preliminary project schedules, and finally, estimating the overall cost of the project [1]. Research over the last few years has produced many software cost estimation methods, including algorithmic methods, estimating by analogy, the expert judgment method, the price-to-win method, the top-down method, and the bottom-up method. No one method is necessarily better or worse than the others; in fact, their strengths and weaknesses are often complementary. Understanding these strengths and weaknesses is very important when you want to estimate your projects [11].
To improve the accuracy of cost estimation, other fields are also being combined with software engineering. 2CEE is one such cost estimation model, developed by JPL for NASA. It combines data mining with software engineering, calibrating cost estimation models with machine learning algorithms. The main aim of this paper is to survey all these types of models and methods and to see which model generates the most accurate estimate.
II.
© 2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved
SOFTWARE COST ESTIMATION METHODS AND MODELS
A. 2CEE cost estimation model
2CEE is the first model that combines software engineering with data mining techniques. 2CEE (21st Century Effort Estimation) is both a data mining system for developing new software cost models and an effort estimation tool. It uses a variety of data mining and machine learning techniques, such as nearest neighbour, feature subset selection, bootstrapping, and local calibration, to propose the most accurate software cost model. Using data mining, it analyzes past project data and derives new patterns from that data. It is designed to explore the uncertainty in the model and in the estimate, to allow estimates early in the lifecycle by representing new projects as ranges of values, and to provide numerous calibration options. 2CEE has been implemented as a Windows-based tool that can be used both to generate an estimate and to let the model developer calibrate and develop models using various machine learning, data mining, and statistical techniques. By automating many tasks for the user, it improves the efficiency of the cost analyst. 2CEE uses leave-one-out cross validation as a measure of model performance [2, 3, 4].
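To make the idea of leave-one-out cross validation concrete, the following is a minimal Python sketch, not the 2CEE implementation. It assumes a one-nearest-neighbour estimator, a small set of made-up project records, and the mean magnitude of relative error as the score; none of these details are taken from the 2CEE references.

```python
import math

# Hypothetical past projects (illustrative numbers only):
# (size in KSLOC, team experience score) -> effort in person-months.
PROJECTS = [
    ((10.0, 3.0), 24.0),
    ((12.0, 3.0), 30.0),
    ((50.0, 2.0), 150.0),
    ((55.0, 2.0), 160.0),
    ((100.0, 4.0), 300.0),
]


def nearest_neighbour_effort(features, history):
    """Return the effort of the past project closest to `features`."""
    closest = min(history, key=lambda project: math.dist(features, project[0]))
    return closest[1]


def leave_one_out_mmre(projects):
    """Score the estimator by holding out each project in turn.

    Each project is predicted from all the others, and the relative
    errors are averaged (mean magnitude of relative error).
    """
    errors = []
    for i, (features, actual) in enumerate(projects):
        history = projects[:i] + projects[i + 1:]  # everything except project i
        predicted = nearest_neighbour_effort(features, history)
        errors.append(abs(predicted - actual) / actual)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(f"Leave-one-out MMRE: {leave_one_out_mmre(PROJECTS):.3f}")
```

A lower score means the candidate model predicts held-out projects better, which is how competing calibrations could be ranked against each other.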
B. Expert Judgment Method
Expert judgment is the ability to make predictions and to avoid problems in a given domain. It is one of the most widely used methods for software cost estimation, and most companies use it to estimate the cost of a product. How can you test the judgment of somebody on your team? You can monitor them over time, or you can accelerate the process by asking them about a topic they are passionate about. Expert judgment techniques involve consulting a software cost estimation expert, or a group of experts, who use their experience and understanding of the proposed project to arrive at an estimate of its cost. Generally speaking, a group consensus technique such as the Delphi technique is the best approach. The strengths and weaknesses of expert judgment are complementary to those of the algorithmic method. To give the experts a sufficiently broad communication bandwidth to exchange the volume of information needed to calibrate their estimates against those of the other experts, the wideband Delphi technique was introduced as an extension of the standard Delphi technique [1, 9, 12]. The estimating steps using this method are:
a. Coordinator presents each expert with a specification and an estimation form.
b. Experts fill out forms anonymously.
c. Coordinator calls a group meeting in which the experts discuss estimation issues with the coordinator and each other.
d. Coordinator prepares and distributes a summary of the estimation on an iteration form.
e. Experts fill out forms, again anonymously, and steps d and e are iterated for as many rounds as appropriate.
The wideband Delphi technique has subsequently been used in a number of studies and cost estimation activities. It has been highly successful in combining the free discussion advantages of the group meeting technique with the advantages of anonymous estimation of the standard Delphi technique [9].
The advantages of this method are:
• The experts can factor in differences between past project experience and the requirements of the proposed project.
• The experts can factor in project impacts caused by new technologies, architectures, applications and languages involved in the future project, and can also factor in exceptional personnel characteristics and interactions, etc.
The disadvantages include:
• This method cannot be quantified.
• It is hard to document the factors used by the experts or the expert group.
• Experts may be biased, optimistic, or pessimistic, even though these effects are reduced by the group consensus.
• The expert judgment method always complements other cost estimating methods, such as the algorithmic method [12].
C. Estimating by Analogy
Estimating by analogy is one of the most useful methods in the software cost estimation process. A number of cost estimation models have been developed based on analogy methods. Analogy-based software estimation rests on the principle that actual values achieved within the organization in an earlier, similar project are better indicators, and predict future project performance much better, than an estimate developed afresh from scratch. It also helps bring organizational experience to bear on new projects. Estimating by analogy means comparing the proposed project to previously completed similar projects for which the development information is known. Actual data from the completed projects are extrapolated to estimate the proposed project. This method can be used at either the system level or the component level [11]. Estimating by analogy is relatively straightforward. In some respects it is a systematic form of expert judgment, since experts often search for analogous situations to inform their opinion. The steps in estimating by analogy are:
1. Characterizing the proposed project.
2. Selecting the most similar completed projects whose characteristics have been stored in the historical data base.
3. Deriving the estimate for the proposed project from the most similar completed projects by analogy.
The main advantages of this method are:
1. The estimation is based on actual project characteristic data.
2. The estimator's past experience and knowledge, which are not easy to quantify, can be used.
3. The differences between the completed and the proposed project can be identified and their impacts estimated.
However, there are also some problems with this method:
a. Using this method, we have to determine how best to describe projects. The choice of variables must be restricted to information that is available at the point when the prediction is required. Possibilities include the type of application domain, the number of inputs, the number of distinct entities referenced, the number of screens and so forth.
b. Even once we have characterized the project, we have to determine the similarity and how much confidence we can place in the analogies. Too few analogies might lead to maverick projects being used; too many might dilute the effect of the closest analogies. Martin Shepperd et al. introduced the method of finding analogies by measuring Euclidean distance in n-dimensional space, where each dimension corresponds to a variable. Values are standardized so that each dimension contributes equal weight to the process of finding analogies. Generally speaking, two analogies are the most effective.
c. Finally, we have to derive an estimate for the new project by using known effort values from the analogous projects. Possibilities include means and weighted means, which give more influence to the closer analogies.
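The three steps above (characterize, find the closest completed projects, derive an estimate) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the published procedure: the min-max rescaling, the unweighted mean, the function names, and the project data are all hypothetical choices made for the example.

```python
import math


def standardize(rows):
    """Rescale every column to [0, 1] so each dimension carries equal weight."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    # A constant column would give a zero span; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(column) - min(column)) or 1.0 for column in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def estimate_by_analogy(target, history, k=2):
    """Mean effort of the k completed projects closest to `target`.

    `history` is a list of (features, effort) pairs; distance is Euclidean
    distance measured on the rescaled feature values.
    """
    feature_rows = [features for features, _ in history] + [target]
    scaled = standardize(feature_rows)
    scaled_target = scaled[-1]
    ranked = sorted(
        zip(scaled[:-1], (effort for _, effort in history)),
        key=lambda pair: math.dist(pair[0], scaled_target),
    )
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical history: (number of inputs, number of screens) -> effort in person-months.
history = [
    ((20, 5), 10.0),
    ((40, 10), 22.0),
    ((45, 12), 26.0),
    ((100, 30), 70.0),
]
estimate = estimate_by_analogy((42, 11), history)

if __name__ == "__main__":
    print(f"Estimated effort: {estimate:.1f} person-months")
```

The default of k=2 follows the remark that two analogies are generally the most effective; replacing the plain mean with a distance-weighted mean would give closer analogies more influence.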
The most crucial aspect for the success of analogy-based estimation is the selection of the right set of past projects. Estimating by analogy has been found to be superior to estimation via algorithmic models in at least some circumstances. It is a more intuitive method, so it is easier to understand the reasoning behind a particular prediction [12].
D. Top-Down and Bottom-Up Methods
Top-Down Estimating Method
The top-down method of estimation is based on the overall characteristics of the software project. The project is partitioned into lower-level components and life cycle phases beginning at the highest level. Top-down estimating is also called the Macro Model. Using this method, an overall cost estimate for the project is derived from the global properties of the software project, and then the project is partitioned into various low-level components. The leading method using this approach is the Putnam model. The method is most applicable to early cost estimation, when only global properties are known and no detailed information is available [1, 9].
The advantages of this method are:
• It focuses on system-level activities such as integration, documentation, project control, configuration management, etc., many of which may be ignored in other estimating methods, so it does not miss the cost of system-level functions.
• It requires minimal project detail, and it is usually faster and easier to implement.
The disadvantages are:
• It can be less accurate: it often does not identify difficult low-level technical problems that are likely to escalate costs, and it sometimes overlooks low-level components.
• It provides no detailed basis for justifying decisions or estimates.
Bottom-up Estimating Method
Bottom-up estimation is another important cost estimation method. It involves identifying and estimating each individual software component separately, then combining the results to arrive at an estimated cost for the complete project. It aims at constructing the estimate of a system from the knowledge accumulated about the small software components and their interactions. The leading method using this approach is COCOMO's detailed model [9].
The advantages:
• It permits the software group to handle an estimate in an almost traditional fashion and to handle estimate components for which the group has a feel.
• It is more stable because the estimation errors in the various components have a chance to balance out.
The disadvantages:
• It may overlook many of the system-level costs (integration, configuration management, quality assurance, etc.) associated with software development.
• It may be inaccurate because the necessary information may not be available in the early phases of the life cycle.
• It tends to be more time-consuming.
• It may not be feasible when either time or personnel are limited.
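As a small illustration of the bottom-up roll-up, the Python sketch below sums per-component estimates and then adds an explicit allowance for system-level work, which is the cost this method is said to overlook. The component names, the effort figures, and the idea of expressing the allowance as a fraction of the component total are assumptions made for the example, not part of any cited model.

```python
def bottom_up_estimate(component_efforts, system_level_fraction=0.0):
    """Sum per-component estimates and add an allowance for system-level work.

    `component_efforts` maps a component name to its estimated effort.
    `system_level_fraction` is a hypothetical allowance (for example 0.2 for 20%)
    covering integration, configuration management, quality assurance, etc.
    """
    if system_level_fraction < 0:
        raise ValueError("system_level_fraction must not be negative")
    component_total = sum(component_efforts.values())
    return component_total * (1.0 + system_level_fraction)


# Hypothetical component estimates in person-months.
components = {"user interface": 4.0, "database": 6.0, "reporting": 2.5}

if __name__ == "__main__":
    print("Components only:", bottom_up_estimate(components))
    print("With system-level allowance:", bottom_up_estimate(components, 0.2))
```

Keeping the allowance as a separate, visible parameter makes it harder to forget system-level costs when only component figures are collected.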
E. Algorithmic Method
The first cost estimation models were built using the algorithmic method, which shows the importance of algorithm-based models in the field of software cost estimation. The algorithmic method uses mathematical equations to perform software estimates. The equations are based on research and historical data and use inputs such as Source Lines of Code (SLOC), the number of functions to perform, and other cost drivers such as language, design methodology, skill levels, risk assessments, etc. Algorithmic methods have been widely studied and many models have been developed, such as the COCOMO models, the Putnam model, and function point based models [1, 6]. Because the formulas can be analyzed, the method also gives a better understanding of the overall estimating process.
Advantages:
1. It is able to generate repeatable estimations.
2. It is easy to modify input data and to refine and customize formulas.
3. It is efficient and able to support a family of estimations or a sensitivity analysis.
4. It is objectively calibrated to previous experience.
Disadvantages:
1. It is unable to deal with exceptional conditions, such as exceptional personnel, exceptional teamwork, and an exceptional match between skill levels and tasks.
2. Poor sizing inputs and inaccurate cost driver ratings will result in inaccurate estimation.
3. Some experience and factors cannot be easily quantified.
III.
COCOMO Models
One very widely used algorithmic software cost model is the Constructive Cost Model (COCOMO). The basic COCOMO model has a very simple form [5, 6]:
$$\text{MAN-MONTHS} = K_1 \times (\text{Thousands of Delivered Source Instructions})^{K_2}$$
where $K_1$ and $K_2$ are two parameters dependent on the application and development environment. Estimates from the basic COCOMO model can be made more precise by taking into account other factors concerning the required characteristics of the software to be developed, the qualification and experience of the development team, and the software development environment. Some of these factors are:
1. Complexity of the software
2. Required reliability
3. Size of data base
4. Required efficiency (memory and execution time)
5. Analyst and programmer capability
6. Experience of team in the application area
7. Experience of team with the programming language and computer
8. Use of tools and software engineering practices
Many of these factors affect the person-months required by an order of magnitude or more. COCOMO assumes that the system and software requirements have already been defined, and that these requirements are stable. This is often not the case [9]. COCOMO is a regression model based on the analysis of 63 selected projects. The primary input is KDSI (thousands of delivered source instructions). The problems are:
1. In the early phases of the system life cycle, the size is estimated with great uncertainty, so an accurate cost estimate cannot be arrived at.
2. The cost estimation equation is derived from the analysis of 63 selected projects. It usually has problems outside of that particular environment. For this reason, recalibration is necessary.
According to Kemerer's research, the average error for all versions of the model is 601%. The detailed and intermediate models seem not much better than the basic model. The first version of the COCOMO model was developed in 1981 and uses 15 cost drivers. It has been experiencing increasing difficulties in estimating the cost of software developed with new life cycle processes and capabilities, including rapid-development process models, reuse-driven approaches, object-oriented approaches and software process maturity initiatives. For these reasons, the newest version, COCOMO 2.0, was developed. The major new modelling capabilities of COCOMO 2.0 are a tailorable family of software size models, involving object points, function points and source lines of code; nonlinear models for software reuse and reengineering; an exponent-driver approach for modelling relative software diseconomies of scale; and several additions, deletions, and updates to previous COCOMO effort-multiplier cost drivers. This new model is also serving as a framework for an extensive current data collection and analysis effort to further refine and calibrate the model's estimation capabilities.
IV.
Putnam model
Another popular software cost model is the Putnam model. The form of this model is:
$$\text{Technical constant } C = \text{size} \times B^{-1/3} \times T^{-4/3}$$
$$\text{Total person-months } B = \frac{1}{T^{4}} \left( \frac{\text{size}}{C} \right)^{3}$$
where $T$ is the required development time in years, size is estimated in LOC, and $C$ is a parameter dependent on the development environment, determined on the basis of historical data from past projects.
Rating: C = 2,000 (poor), C = 8,000 (good), C = 12,000 (excellent).
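The basic COCOMO form and the Putnam relations can be evaluated directly, which also makes the Putnam model's dependence on development time easy to see. The Python sketch below is illustrative only. The coefficients 2.4 and 1.05 are the organic-mode values commonly quoted for Boehm's 1981 basic COCOMO and are not given in this paper, and the project sizes are made up. The effort units follow this paper's labelling of the two equations.

```python
def basic_cocomo_effort(kdsi, k1, k2):
    """Effort = K1 * KDSI ** K2, the basic COCOMO form."""
    return k1 * kdsi ** k2


def putnam_effort(size_loc, c, t_years):
    """B = (1 / T**4) * (size / C) ** 3, the Putnam effort relation."""
    return (size_loc / c) ** 3 / t_years ** 4


def putnam_technology_constant(size_loc, effort, t_years):
    """C = size * B ** (-1/3) * T ** (-4/3), the same relation solved for C."""
    return size_loc * effort ** (-1.0 / 3.0) * t_years ** (-4.0 / 3.0)


if __name__ == "__main__":
    # Assumed example: a 32 KDSI project with organic-mode coefficients.
    print(f"Basic COCOMO effort: {basic_cocomo_effort(32, 2.4, 1.05):.1f}")

    # Assumed example: 100,000 LOC in a "good" environment (C = 8,000).
    for t in (2.0, 1.5):
        print(f"Putnam effort at T = {t} years: {putnam_effort(100_000, 8_000, t):.1f}")
```

Because effort varies with the inverse fourth power of T, shortening the schedule from 2 years to 1.5 years multiplies the Putnam effort by (2 / 1.5)^4, roughly 3.2.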
The Putnam model is very sensitive to the development time: decreasing the development time can greatly increase the person-months needed for development [9, 12]. One significant problem with the Putnam model is that it is based on knowing, or being able to estimate accurately, the size (in lines of code) of the software to be developed. There is often great uncertainty in the software size, which may make the cost estimate inaccurate.
V.
AGILE COCOMO MODEL
Many cost estimation models have been developed on the basis of the COCOMO model, and the Agile COCOMO model is one of them. It is a COCOMO™ tool that is very easy to use and simple to learn. It incorporates the full COCOMO™ parametric model and uses analogy-based estimation to generate accurate results for a new project. Estimation by analogy is one of the most popular ways to estimate software cost and effort. While comparing similarities between new and old projects provides a great way to estimate, results can still be inaccurate if differences between the two projects are overlooked, especially when those differences are important. To build on the estimation by analogy approach while accounting for differences between projects, USC-CSE created Agile COCOMO-II, a cost estimation tool based on COCOMO-II. It can estimate a project in various ways, as shown in Figure 5: in terms of person-months, dollars, object points, function points, etc. Reference [13] discusses the motivation for the program, the program's structure, the research results, and the future direction of the tool.
VI.
Function Point Analysis Based Methods
The two algorithmic models above require the estimator to estimate the number of SLOC in order to obtain person-months, size and duration estimates. Function Point Analysis is another method of quantifying the size and complexity of a software system, in terms of the functions that the system delivers to the user. A number of proprietary models for cost estimation are based on it. A function point is a unit of measurement expressing the amount of business functionality an information system provides to a user. The cost (in dollars or hours) of a single unit is calculated from past projects [10]. The function point measurement method was developed by Allan Albrecht at IBM and published in 1979. He argued that function points offer several significant advantages over SLOC counts as a size measurement. There are two steps in counting function points:
A. Counting the user functions. The raw function counts are arrived at by considering a linear combination of five basic software components: external inputs, external outputs, external inquiries, logical internal files, and external interfaces, each at one of three complexity levels: simple, average or complex. The sum of these numbers, weighted according to the complexity level, is the number of function counts (FC).
B. Adjusting for environmental processing complexity. The final function point count is arrived at by multiplying FC by an adjustment factor that is determined by considering 14 aspects of processing complexity. This adjustment factor allows the FC to be modified by at most +35% or -35%.
The collection of function point data has two primary motivations. One is the desire of managers to monitor levels of productivity. The other is its use in the estimation of software development cost [9]. Some cost estimation methods are based on a function point type of measurement, such as ESTIMACS and SPQR/20. SPQR/20 is based on a modified function point method. Whereas traditional function point analysis is based on evaluating 14 factors, SPQR/20 separates complexity into three categories: complexity of algorithms, complexity of code, and complexity of data structures. ESTIMACS is a proprietary system designed to give a development cost estimate at the conception stage of a project, and it contains a module that estimates function points as a primary input for estimating cost. The advantages of function point analysis based models are:
1. Function points can be estimated from requirements specifications or design specifications, thus making it possible to estimate development cost in the early phases of development.
2. Function points are independent of the language, tools, or methodologies used for implementation.
3. Non-technical users have a better understanding of what function points are measuring, since function points are based on the system user's external view of the system.
The selection of Estimation methods
In this paper we studied different estimation models. Researchers continue to work on improving the accuracy of the software cost estimation process. From the above comparison, we know that no one method is necessarily better or worse than the others; in fact, their strengths and weaknesses are often complementary. Experience suggests that a combination of models with analogy or expert judgment estimation methods is useful for obtaining a reliable, accurate cost estimate for software development. If the major part of a project is similar to a past project, the expert judgment or analogy method is very useful, since it is fast and, under these circumstances, reliable and more accurate than the other methods; for large, lesser-known projects, it is better to use an algorithmic model. In this case, many researchers recommend estimation models that do not require SLOC as an input. Based on this study of all the models, the Agile COCOMO model estimates the cost of a project more accurately than the other available models because it can express the cost in various ways, such as in dollars, person-months, function points, or object points.
VII.
Use of Estimation Methods
On the basis of this study we can say that selecting the correct estimation method is a very difficult task. Even a minor mistake can result in a high financial loss and a longer project completion time. It is very common to apply some cost estimation method to estimate the cost of software development. What must be noted, however, is that it is very important to continually re-estimate cost and to compare targets against actual expenditure at each major milestone. This keeps the status of the project visible and helps to identify necessary corrections to budget and schedule as soon as they occur. At every estimation and re-estimation point, iteration is an important tool for improving estimation quality. The estimator can use several estimation techniques and check whether their estimates converge. The other advantages are as follows:
• Different estimation methods may use different data. This results in better coverage of the knowledge base for the estimation process. It can help to identify cost components that cannot be dealt with, or were overlooked, in one of the methods.
• Different viewpoints and biases can be taken into account and reconciled. A competitive contract bid, a high business priority to keep costs down, or a small market window with the resulting tight deadlines tends to produce optimistic estimates. A production schedule established by the developers is usually more on the pessimistic side, to avoid committing to a schedule and budget one cannot meet.
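One simple way to check whether estimates from several techniques converge is to compare each of them with their median. The Python sketch below is a hypothetical illustration: the 20% tolerance, the function name, and the sample figures are assumptions chosen for the example, not values recommended by the sources cited here.

```python
from statistics import median


def estimates_converge(estimates, tolerance=0.2):
    """Return True when every estimate lies within `tolerance` of the median.

    `estimates` maps a method name to its estimate (same unit for all methods);
    `tolerance` is a relative deviation, so 0.2 means within 20% of the median.
    """
    centre = median(estimates.values())
    return all(abs(value - centre) <= tolerance * centre for value in estimates.values())


# Hypothetical estimates in person-months from three methods.
close_estimates = {"analogy": 24.0, "algorithmic": 27.0, "expert judgment": 25.0}
far_estimates = {"analogy": 24.0, "algorithmic": 60.0, "expert judgment": 25.0}

if __name__ == "__main__":
    print("Close estimates converge:", estimates_converge(close_estimates))
    print("Far estimates converge:", estimates_converge(far_estimates))
```

When the check fails, the outlying method is a natural place to look for overlooked cost components or for an optimistic or pessimistic bias.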
Result & Conclusion
In the software cost estimation field, much work has been done in the last few years. Researchers have continually developed new models and methods for improving the cost estimation process. The main aim of this paper is to provide a detailed survey of, and introduction to, the available cost estimation models and techniques. It includes the cost estimation models required for any new research. One of the greatest challenges for a project leader is to successfully deliver on all aspects of a project, both according to the client's specifications and within the allotted budget. It is often the case that either one aspect or the other can be accomplished, but not necessarily both. When it comes to controlling costs, it is a critical first step to make appropriate estimations at the outset of a project. Being able to control project costs is largely a matter of adhering to established guidelines, oftentimes by learning from previous projects and reacting to current circumstances efficiently and effectively.
In recent years, researchers have combined software engineering with other fields, such as data mining, to improve the accuracy of the software cost estimation process. There are many software cost estimation methods available, including algorithmic methods, estimating by analogy, the expert judgment method, the top-down method, and the bottom-up method. It is very difficult to decide which method is better than all the others, because every method or model has its own significance. Their strengths and weaknesses are often complementary, and understanding them is very important when you want to estimate your projects. Which estimation methods should be used for a specific project depends on the nature of the project; the methods can be chosen according to their strengths and weaknesses. In my research work I use the Agile COCOMO model because it provides the estimate for a software project in various ways. We only need to set the values of the different cost drivers on the basis of past project data. It provides all the facilities provided by the other COCOMO models, and it estimates the cost of a new project more accurately than the other cost estimation models. The future work is to study new software cost estimation methods and models that can help us understand the software cost estimation process more easily.
REFERENCES
[1] "COCOMO II Model Definition Manual", version 1.4, University of Southern California.
[2] Karen T. Lum, Daniel R. Baker, and Jairus M. Hihn, "The Effects of Data Mining Techniques on Software Cost Estimation", 2009 IEEE.
[3] Zhihao Chen, Tim Menzies, Dan Port, "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", Center for Software Engineering, Univ. of Southern California.
[4] Jairus Hihn, Karen Lum, "2CEE, A Twenty First Century Effort Estimation Methodology", Lane Dept. CSEE, West Virginia University, ISPA / SCEA 2009 Joint International Conference.
[5] Oscar Marbán, Antonio de Amescua, Juan J. Cuadrado, Luis García, "Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects", Notes, vol. 30, no. 4, pp. 1-6, 2005.
[6] Oscar Marbán, Antonio de Amescua, Juan J. Cuadrado, Luis García, "A cost model to estimate the effort of data mining projects", Universidad Carlos III de Madrid (UC3M).
[7] Dr. Alassane Ndiaye and Dr. Dominik Heckmann, "Weka: Practical machine learning tools and techniques with Java implementations", AI Tools Seminar, University of Saarland, WS 06/07.
[8] S. Chandrasekaran, R. Lavanya and V. Kanchana, "Multi-Criteria Approach for Agile Software Cost Estimation Model".
[9] Capers Jones, "Estimating Software Cost", Tata McGraw-Hill Edition, 2007.
[10] Roberto Meli, Luca Santillo, "Function Point Estimation Methods: A Comparative Overview", Data Processing Organization, http://web.tin.it/dpo.
[11] Murali Chemuturi, "Analogy based Software Estimation", Chemuturi Consultants.
[12] Liming Wu, "The Comparison of the Software Cost Estimating Methods", University of Calgary.
[13] http://sunset.usc.edu/cse/pub/research/AgileCOCOMO/AgileCOCOMOII/Main.htm