Proc. of Int. Conf. on Emerging Trends in Engineering and Technology

Toward COCOMO Cost Estimation Model

Su-Hua Wang (1), Durgesh Samadhiya (2)
(1) Dept of Information Management, (2) Ph.D Program of Technology Management
(1,2) Chung Hua University, Taiwan
(1) [email protected], (2) [email protected]

Abstract— The software development process comprises many steps required to complete a software project. Estimation is crucial to software development and important for the whole development process. Cost estimation is a necessary part of any project: it must provide effort and schedule breakdowns among the primary software lifecycle activities such as specification, design, implementation and testing. Nowadays the Constructive Cost Model II (COCOMO II), which uses function points and lines of code to measure software size, is the model most commonly used for software cost estimation. However, it is difficult to estimate cost in the early stages of software development. Cost estimates are essential for software developers and their companies because they indicate the right cost of the project and the expected time of delivery, among many other benefits. This paper provides a sample COCOMO II cost estimate for a real project, explains the analysis approach, briefly describes the techniques and methodologies used, and summarizes the results.

Index Terms— Software development process, cost estimation, lines of code, Constructive Cost Model, effort estimation.

I. INTRODUCTION

Cost estimation is the approximate calculation of the resources required for an entire program or project: the money, the time, and the other resources it will consume. Software project development is the process of building software to meet customer requirements, and a project must pass through many steps before its functionality is complete. In the present computer era there is strong demand for software in companies, industry and society at large, and that demand increases every day. To remain competitive it is therefore necessary to produce high-quality software on time and within budget. Good planning and good project management have consequently attracted the attention of project managers. The absence of sound planning and project management before a project begins frequently produces problems such as longer completion time, higher production cost and inadequate performance [1]. Because of poor cost estimation, very few software projects are delivered on time with the intended quality: more than 65% of software projects are delivered late or over budget, and many are never finished [1]. If software development cost estimation (SDCE) is not done well, the project cannot be developed and maintained within its schedule. An accurate methodology is therefore necessary to predict such costs, and this problem makes software planning, cost estimation and effort estimation an important research area [1, 2, 3, 4, 5, 6].

To estimate effort, cost and schedule for software projects, Barry W. Boehm introduced a model called the Constructive Cost Model (COCOMO), which uses a basic regression formula with parameters derived from historical project data and from current and anticipated project characteristics [7]. Later, software development moved from mainframe and batch processing to desktop development, code reuse and the use of software components, and the original model was no longer well suited to such projects. Boehm therefore introduced an updated model, COCOMO II [8], which is better suited to estimating modern software development projects, supports both older and modern development processes, and provides an updated project database.

This paper is organized as follows. Section 2 presents the COCOMO model and describes how COCOMO II differs from the original COCOMO (COCOMO 81). Section 3 shows the model application. Sections 4 and 5 discuss the factors affecting cost estimation and the process for accurate cost estimation. Sections 6, 7 and 8 review alternative estimation methods, and the final section presents conclusions and future work.

II. COCOMO AND COCOMO II MODEL

COCOMO, the Constructive Cost Model, is an algorithmic software cost estimation model developed by Barry W. Boehm that estimates software development effort and cost as a function of program size. Comparison can also be used to produce an estimate, although it cannot by itself give an exact figure: effort is estimated by comparison with previously completed work. For example, if you and your team agree that the new project requires about three times the effort of a project you completed before (note that this is a relative measure), you can use Source Lines of Code (SLOC), Function Points (FP), COCOMO or other methods to analyze the existing, previously completed project and project three times those measures. COCOMO tools also allow for other parameters, such as the amount of code expected to be reused and the skill of the team. COCOMO can be calibrated to reflect your software development environment and thereby produce more accurate estimates.

The most fundamental calculation in the COCOMO model is the Effort Equation, which estimates the number of Person-Months required to develop a project. Most of the other COCOMO results, including the estimates for requirements and maintenance, are derived from this quantity. However, it should not be forgotten that this is an estimate, not exact mathematics. Because COCOMO is well defined and does not rely on proprietary estimation algorithms, it has the following advantages:
- COCOMO estimates are more objective and repeatable than estimates made by methods relying on proprietary models.
- COCOMO can be calibrated to reflect the software development environment and can produce more accurate estimates.

Because COCOMO was not sufficient for newer software development practices, Boehm later introduced the updated model COCOMO II. For the original model, Boehm proposed three levels:
- Basic COCOMO
- Intermediate COCOMO
- Detailed COCOMO

Basic COCOMO is adequate for rough order-of-magnitude estimates of software cost, but its accuracy is necessarily limited because it lacks factors to account for differences in hardware constraints, personnel quality and experience, use of modern tools and techniques, and other project attributes known to have a significant influence on cost.
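As a point of reference, the sketch below implements the Basic COCOMO effort and schedule equations for the three development modes described later in Section V (organic, semidetached, embedded), using the nominal coefficients commonly quoted for COCOMO 81. This is an illustrative sketch rather than the paper's own calculation; a model calibrated to a particular environment would use different coefficient values.

```python
# Minimal sketch of the Basic COCOMO effort and schedule equations.
# Coefficients are the nominal values commonly quoted for COCOMO 81;
# a model calibrated to a specific environment would use different values.

# (a, b) for Effort = a * KLOC^b  (person-months)
# (c, d) for Schedule = c * Effort^d  (months)
MODES = {
    "organic":      (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded":     (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kloc: float, mode: str = "organic") -> tuple[float, float]:
    """Return (effort in person-months, schedule in months) for a size in KLOC."""
    a, b, c, d = MODES[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule

if __name__ == "__main__":
    effort, schedule = basic_cocomo(32, "semidetached")
    print(f"Effort: {effort:.1f} PM, Schedule: {schedule:.1f} months")
```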
COCOMO II is the current version of the COCOMO model. It extends the original COCOMO (COCOMO 81) to different approaches to software development, for example incremental development. COCOMO II incorporates a range of sub-models that produce increasingly detailed software estimates [9]:
- Application composition model. Used in the earliest phases or spiral cycles, which generally involve prototyping using application composition capabilities.
- Early design model. Used in the next phases or spiral cycles, which involve exploring architectural alternatives or incremental development strategies.
- Post-architecture model. Once a project is ready to develop and sustain a fielded system, it should have a life-cycle architecture, which provides more accurate information on cost driver inputs and enables more accurate cost estimates. The post-architecture model is suitable for such phases.
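To illustrate how the post-architecture sub-model combines size, scale factors and effort multipliers, the sketch below implements the published form of the COCOMO II effort equation, PM = A * Size^E * product(EM_i) with E = B + 0.01 * sum(SF_j). The constants A and B are the nominal COCOMO II.2000 calibration values as quoted in the literature, and the scale factor and effort multiplier values in the usage example are hypothetical ratings chosen only for illustration.

```python
import math

# Sketch of the COCOMO II post-architecture effort equation:
#   PM = A * Size^E * product(EM_i),  E = B + 0.01 * sum(SF_j)
# A and B below are the nominal COCOMO II.2000 calibration constants;
# scale factors (SF) and effort multipliers (EM) are normally looked up in
# rating tables and are passed in here as already-converted numeric values.
A = 2.94
B = 0.91

def cocomo2_effort(size_ksloc: float, scale_factors: list[float],
                   effort_multipliers: list[float]) -> float:
    """Return estimated effort in person-months."""
    exponent = B + 0.01 * sum(scale_factors)
    eaf = math.prod(effort_multipliers)  # effort adjustment factor
    return A * size_ksloc ** exponent * eaf

if __name__ == "__main__":
    # Hypothetical ratings: five scale factors and a few effort multipliers,
    # already converted to numeric values (illustrative only).
    sf = [3.72, 3.04, 4.24, 3.29, 4.68]
    em = [1.10, 0.87, 1.00, 1.17]
    print(f"Estimated effort: {cocomo2_effort(100, sf, em):.1f} person-months")
```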

III. MODEL APPLICATION

The following figures illustrate how the COCOMO II model is used. With the help of the COCOMO II model, many kinds of analysis can be made:
- Making investment decisions and business-case analyses
- Setting project budgets and schedules
- Performing tradeoff analyses
- Cost risk management
- Development vs. reuse decisions
- Legacy software phase-out decisions
- Software reuse and product line decisions
- Process improvement decisions

Fig. 1. COCOMO Model

Fig. 2. Model Applicability

IV. SOFTWARE COST ESTIMATION AFFECTING FACTORS

The entire life of a software project includes two phases: production and maintenance. Software maintenance cost grows continually, and it has been shown that about 90% of a software project's lifetime cost is related to its maintenance phase. Extracting and considering the factors that affect maintenance cost, and reducing their influence, helps in estimating the cost and in lowering it by controlling those factors. Previous attempts to identify methods that accurately estimate development effort were not as successful as desired, mainly because the cost calculations were based on particular project attributes of publicly available datasets. Nevertheless, the proportion of evaluation methods employing historical data was around 55% of the total of 304 research papers investigated by [10] as of 2004.


Fig. 3. Cost Estimation Model

Fig. 3 shows the cost estimation model and the factors that can affect the cost estimation of a software project. Although many factors affect the cost estimate, there are two main ones: the efficiency of the implementation, and the cost of the studies done prior to software development. These two factors are very dynamic and vary from project to project, and many further factors must be considered during software cost estimation.

V. PROCESS FOR EXACT COST ESTIMATION

Accurate software cost estimation is a difficult task, and a sound process helps to minimize misestimation; delivering software on time is hard without a correct estimate. To avoid misestimation, estimation techniques can be classified into two major categories: parametric models and non-parametric models.

Parametric cost estimating is a method for estimating future outcomes based on analysis of past events and trends. "Parameters" (conditions) that appear to have driven what happened in the past are identified and connected to past experience through mathematical relationships. Parametric models are derived from statistical or numerical analysis of historical project data [7, 11]; to have any validity, they must be based on, or validated against, actual project data. A small illustrative sketch of fitting such a model to historical data is given at the end of this section. Non-parametric models include techniques based on artificial intelligence, such as analogy-based reasoning, artificial neural networks, genetic algorithms, regression trees, and rule-based induction [7, 12]. Most algorithmic techniques fall under parametric models, whereas techniques based on expert judgment and machine learning fall under non-parametric models.

Software cost estimation with Boehm's model uses one of three development modes: organic, semidetached and embedded. The organic mode describes a form, method or pattern in which the team is familiar with the software development environment; the project contains a minimum of innovative data-processing architectures or algorithms, requires little innovation and is relatively small, rarely greater than 50K delivered source instructions (KDSI) [14]. The semidetached mode represents an intermediate stage between the organic and embedded modes; the size of a semidetached-mode product generally extends up to 300 KDSI [14]. In the embedded mode the product must operate within (is embedded in) a strongly coupled complex of hardware, software, regulations and operational procedures; an embedded-mode project requires a great deal of innovation, for example a real-time system with timing constraints and customized hardware.

Algorithmic models have the advantage of offering an economically feasible approach to estimating software costs, but they also have shortcomings. First, the cost and effort estimates derived from different algorithmic models generally vary significantly from each other [15]; such variation causes problems for managers because it becomes difficult to decide how many resources will actually be needed. Second, the models are based on historical data and therefore cannot reflect current progress in programming languages, hardware and software engineering.
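The sketch referenced above shows one minimal way a parametric model can be derived from historical project data: fitting the coefficients a and b of an Effort = a * Size^b relationship by least squares in log-log space. The project data are invented purely for illustration; a real calibration would use an organization's own historical records.

```python
import numpy as np

# Minimal sketch of calibrating a parametric model Effort = a * Size^b
# from historical project data by linear regression in log-log space.
# The data below are invented for illustration only.
sizes_kloc = np.array([10, 25, 40, 70, 120, 200], dtype=float)
efforts_pm = np.array([28, 78, 130, 250, 460, 810], dtype=float)

# log(Effort) = log(a) + b * log(Size)  ->  ordinary least squares
b, log_a = np.polyfit(np.log(sizes_kloc), np.log(efforts_pm), deg=1)
a = np.exp(log_a)
print(f"Calibrated model: Effort = {a:.2f} * Size^{b:.2f}")

# Use the calibrated model to estimate a new 90 KLOC project.
new_size = 90.0
print(f"Estimated effort for {new_size:.0f} KLOC: {a * new_size ** b:.0f} person-months")
```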


VI. DECISION BASED METHODS

Prediction is not an end in itself but a means of optimizing current action against the prospect of an uncertain future. Decision-based methods rely on the manager's or estimator's recollection of past projects, which may or may not have been documented. Heemstra and Vigder conclude that 62% of estimators in organizations use this technique. Hughes has identified the strengths and weaknesses of expert judgment and acknowledged the usefulness of the method in industry [13]. Among the benefits of expert judgment is that the estimate is tailored to the specific organizational culture, so this technique can be more accurate than general algorithmic approaches. Expert judgment is an unstructured process, even though in many cases it has proven to give better accuracy than other techniques. The final estimate is subjective and based on the expert's intuition and reasoning; the logical aspect of the human decision-making process is very complex and largely relies on copying and mimicking previous experience.

VII. NEURAL NETWORK MODEL

There are many learning algorithms for neural networks, among which the back-propagation (back-prop) paradigm is the most popular for prediction and classification problems. Every back-prop neural network is organized in layers, each composed of neurons (also called processing elements) and their connections. There are basically three layers:
- Input layer: the first layer, containing neurons that represent the set of input variables.
- Output layer: contains neurons that represent the output variables.
- Hidden layer: handles nonlinear relationships between the input and output variables, helps extract higher-level features and facilitates generalization.

Numerical weights are associated with the connections between neurons, and examples from the training set are repeatedly fed through the network to adjust these weights. An activation level, specified by continuous or discrete values, is also associated with every neuron. The internal activation value arriving at a neuron in the hidden or output layers is the sum of each incoming activation level multiplied by its respective connection weight; in the input layer, the activation levels are determined by the input signals received from the environment. The internal activation value is then transformed by a transfer function and becomes an output, which in turn may become an input to one or more other neurons. A sigmoid transfer function is typically used to transform the input signals into output signals, particularly for classification problems:

F(I) = 1 / (1 + e^(-I))

where I represents the internal activation. In a back-propagation network, all the connection weights are assumed to share responsibility for any output error. The error is the difference between the network's estimated (predicted) output and the corresponding observed output value; error values are calculated at the output layer, propagated back to the previous layers, and used to adjust the connection weights.
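To make these mechanics concrete, the following is a minimal sketch of a single-hidden-layer back-propagation network with sigmoid activations, trained on a toy effort-estimation dataset. The data, layer sizes, learning rate and epoch count are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Minimal sketch of a single-hidden-layer back-propagation network with
# sigmoid activations. The tiny "project size / complexity -> effort"
# dataset, layer sizes and learning rate are invented for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # F(I) = 1 / (1 + e^-I)

rng = np.random.default_rng(0)

# Inputs: [normalized size, normalized complexity]; target: normalized effort.
X = np.array([[0.1, 0.2], [0.3, 0.1], [0.5, 0.6], [0.8, 0.4], [0.9, 0.9]])
y = np.array([[0.15], [0.25], [0.55], [0.70], [0.95]])

W1 = rng.normal(scale=0.5, size=(2, 4))       # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))       # hidden -> output weights
lr = 0.5

for epoch in range(5000):
    # Forward pass: weighted sums passed through the sigmoid transfer function.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Error at the output layer (observed minus predicted).
    error = y - output

    # Back-propagate: output delta, then hidden delta, then weight updates.
    delta_out = error * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    W2 += lr * hidden.T @ delta_out
    W1 += lr * X.T @ delta_hidden

print("Predicted efforts:", sigmoid(sigmoid(X @ W1) @ W2).ravel().round(2))
```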
The training process consists of repeatedly feeding input and output data from empirical observations, propagating the error values, and adjusting the connection weights until the errors fall below a user-specified tolerance level. Standard equations describe how the error values are computed and how the connection weights are updated.

VIII. FUZZY LOGIC, EVOLUTIONARY ALGORITHMS AND GENETIC PROGRAMMING

These techniques are believed to provide reasonably accurate approximations. With respect to accuracy, genetic programming has been found to be more accurate than other techniques, but its drawback is that it does not converge to a good solution as consistently as a neural network. More emphasis should therefore be placed on defining which measures, and which combinations of measures, are most suitable for a specific problem. In genetic programming, different datasets are generally used, and this yields diverse results, which have been classified as acceptable, moderately good, or poor. The datasets examined vary greatly in size, complexity and homogeneity, so it is not easy to obtain consistent, fine-grained results.


IX. CONCLUSIONS AND FUTURE WORK

Software cost estimation is affected by many factors, so calculating the exact cost of software is a complex activity. Software development metrics for a project include both qualitative and quantitative measures. Quantitative methods are necessary for comparing similar products and for control and management; quantitative data readily yield statistical results, can be processed automatically, and are easy to integrate with systems covering other indicators. It is important to define the goals of measurement and to prepare an appropriate measurement plan. Qualitative methods can help to optimize the structure of measurements; conversely, metrics can provide assessments or indicate problems to be examined with qualitative methods. After the causes of problems have been identified and removed, quantitative methods can be used again to track the project. However, all these measures vary greatly from project to project, and their values are often ambiguous, dissimilar and vague. There are no formal guidelines for determining the actual effort required to complete a project based on its specific characteristics and attributes. Overall, we conclude that the most important requirement is that the datasets of current and future projects used when evaluating estimation methods should be as representative as possible. The better documented and clearer the database, the more exact the cost estimate can be, which helps to minimize time delays and budget problems. All the methods discussed have their own advantages and disadvantages, and none of the factors that affect project cost and development should be ignored when estimating software development cost. During the investigation of the COCOMO II models, several points emerged that may guide our future research, such as factor selection, value rating and estimation, model predictability, model customization, and model application.

REFERENCES
[1] A. L. I. Oliveira, "Estimation of software project effort with support vector regression," Neurocomputing, vol. 69, no. 13–15, pp. 1749–1753, 2006.
[2] J. Bailey and V. Basili, "A meta model for software development resource expenditure," in Proceedings of the Fifth International Conference on Software Engineering, San Diego, California, USA, 1981.
[3] P. L. Braga, A. L. I. Oliveira, G. H. T. Ribeiro, and S. R. L. Meira, "Software effort estimation using machine learning techniques with robust confidence intervals," in IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2007.
[4] G. W. Flake and S. Lawrence, "Efficient SVM regression training with SMO," Machine Learning, vol. 46, no. 1–3, pp. 271–290, 2002.
[5] C. F. Kemerer, "An empirical validation of software cost estimation models," Communications of the ACM, vol. 30, no. 5, pp. 416–429, 1987.
[6] M. Shin and A. L. Goel, "Empirical data modeling in software engineering using radial basis functions," IEEE Transactions on Software Engineering, vol. 26, no. 6, pp. 567–576, 2000.
[7] B. W. Boehm, Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, 1981. ISBN 0-13-822122-7.
[8] B. W. Boehm, C. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. J. Reifer, and B. Steece, Software Cost Estimation with COCOMO II (with CD-ROM). Englewood Cliffs, NJ: Prentice-Hall, 2000. ISBN 0-13-026692-2.
[9] B. W. Boehm et al., Software Cost Estimation with COCOMO II. Prentice Hall, 2000.
[10] M. Jorgensen and M. Shepperd, "A systematic review of software development cost estimation studies," IEEE Transactions on Software Engineering, vol. 33, no. 1, January 2007.
[11] B. W. Boehm et al., "Cost models for future software life cycle processes: COCOMO 2.0," Annals of Software Engineering on Software Process and Product Measurement, Amsterdam, 1995.
[12] C. J. Burgess and M. Lefley, "Can genetic programming improve software effort estimation?" Information and Software Technology, vol. 43, pp. 863–873, 2001.
[13] R. T. Hughes, "Expert judgment as an estimating method," Information and Software Technology, pp. 67–75, 1996.
[14] The Free On-line Dictionary of Computing, © Denis Howe, 2010. http://foldoc.org.
[15] H. Saiedian, M. Zand, and D. Barney, "The strengths and limitations of the algorithmic approaches in estimating and managing software costs," IBS Computing Quarterly, vol. 4, no. 1, pp. 21–27, 1992.
