October 14, 2005 Attention: Delma Moore Contracting Officer, Code 210.3 NASA Software IV&V Facility 100 University Drive Fairmont, WV 26554-8818 Wesley Sweetser NASA Software IV&V Facility 100 University Drive Fairmont, WV 26554-8818 Subject: Return on Investment of IV&V Phase III Study Final Report Reference: Contract Number: GS-35F-4815G Delivery Order: S-43619-Y, GSFC Code Y and NASA Research Center IV&V CM Number: GSFCY & NRC IVV-05-137 Dear Ms. Moore and Mr. Sweetser: Titan Corporation is pleased to provide the Return on Investment of IV&V Phase III Study Final Report, DID 06, approved for delivery under Task Order Number 31 Modification 3, Return on Investment for IV&V, for Contract Number GS-35F-4815G, BPA Order Number S-43619-Y, and Delivery Order 01. Enclosed is an electronic version of the Final Report for your review. This email and its attachment constitute the electronic delivery of Product GSFCY & NRC IVV-05-137. Should you have any questions, please contact the undersigned. Approved, James Dabney Principal Investigator IV&V of GSFC Code Y and NASA Research Center Software Titan Corporation Phone: (281) 480-4101 Fax: (281) 480-6328 Enclosures: Return on Investment of IV&V Phase III Study Final Report Distribution: D. Moore/ NASA-Fairmont J. Dicks/Titan W. Sweetser/NASA-Fairmont T. Mascaro/Titan K. McGill/ NASA Fairmont K. Williams/Titan Contract Number: GSA-35-F-4815G S-43619 CM Number: GSFC & NRC IVV-05-137 1 Prepared for: NASA IV&V Facility Fairmont, WV 26554 DID Number: 06 INDEPENDENT VERIFICATION AND VALIDATION (IV&V) OF NASA PROGRAM SOFTWARE Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 100 University Drive Fairmont, WV 26554-8818 Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 i CM Number: GSFC & NRC IVV-05-137 Abstract This report documents results of the tasks associated with Phase III of the Independent Verification and Validation Return on Investment research. These tasks were to 1) develop initiating material for research to develop a full lifecycle prototype predictive ROI model, 2) produce prototype Bayesian belief network (BBN) sub-nets to model defect introduction and defect removal efficiency for IV&V and developers for the entire software lifecycle, 3) elicit PDFs for each node in the system of BBNs and 4) Calibrate the predictive model using existing case study data. The first task resulted in the development of a refined requirements phase BBN including updated pdf data. Tasks (2) and (3) were performed concurrently, resulting in complete lifecycle BBN diagrams and a software model. In Task (4) the prototype model was calibrated and produced predicted ROI results consistent with the case studies. The report concludes that the emphasis for the next phase of the ROI work should be collection of additional case studies and improved model calibration. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 ii CM Number: GSFC & NRC IVV-05-137 Table of Contents 1 Introduction................................................................................................................ 1 2 Predictive Model Overview .......................................................................................1 2.1 Bayesian Belief Network Overview ....................................................................... 
2 2.2 Function Point Ratios.............................................................................................. 3 2.3 Defect Leakage Model............................................................................................ 4 3 Complete Prototype BBN .......................................................................................... 4 3.1 BBN Subnets...........................................................................................................5 3.2 Monte Carlo Implementation.................................................................................. 6 4 Node Probability Density Functions.......................................................................... 7 4.1 PDF Representation ................................................................................................7 4.2 PDF Elicitation........................................................................................................9 5 Model Calibration ......................................................................................................9 5.1 BBN Input Data Collection................................................................................... 10 5.2 FPR Calibration ....................................................................................................10 5.2.1 Developer-Discovered FPR Calibration ...................................................11 5.2.2 IV&V-Discovered FPR Calibration.......................................................... 12 5.2.3 FPR Calibration Results............................................................................ 13 5.2.4 Leakage Model Calibration....................................................................... 15 5.3 ROI Computation..................................................................................................17 6 Summary ..................................................................................................................18 7 Conclusions and Recommendations ........................................................................ 19 8 References................................................................................................................20 Appendix A – BBN Diagrams .......................................................................................... 21 Appendix B – BBN Input Definitions .............................................................................. 37 B.1 Requirements Issue Subnet ........................................................................................ 37 B.2 Design Issue Subnet ................................................................................................... 43 B.3 Code Issue Subnet ...................................................................................................... 49 B.4 Test Issue Subnet........................................................................................................ 54 B.5 Integration Issue Subnet ............................................................................................. 60 Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 1 CM Number: GSFC & NRC IVV-05-137 1 Introduction The independent verification and validation (IV&V) return on investment (ROI) study is developing the means to compute ROI for past projects and to predict ROI for new projects using available project data. The ROI study consists of a sequence of phases. 
Phase I entailed a set of preliminary direct ROI case studies. In Phase IIA, the team investigated the feasibility of computing indirect ROI and identified suitable project characteristics for indirect ROI and a candidate set of components of a predictive ROI model. Phase IIB produced a full lifecycle cost escalation model, proposed the Bayesian belief network (BBN) as a means to estimate defect density, and determined the sensitivity of the ROI model to variations in escalation rates and defect location rates. This report documents the results of Phase III, which had four principal tasks:
1. Develop initiating material for research to develop a full lifecycle prototype predictive ROI model.
2. Produce prototype Bayesian belief network (BBN) sub-nets to model defect introduction and defect removal efficiency for IV&V and developers for the entire software lifecycle.
3. Elicit PDFs for each node in the system of BBNs.
4. Calibrate the predictive model using existing case study data.

2 Predictive Model Overview
The direct IV&V ROI model [DBO04] provides the means to compute ROI for completed IV&V projects. The model requires as inputs developer and IV&V costs (typically in equivalent person months (EPM)), software product size (measured in source lines of code (SLOC) or function points (FP) [FPUG00]), and measures of defect detection by the developer and IV&V for each type of issue (requirements, design, code, test, integration) and each development phase (typically in cost-to-fix EPM or issue size in FP). The direct ROI model is depicted graphically in Figure 1.
The complete set of inputs for the direct ROI model does not exist until the project is complete. Thus, the direct ROI model provides one measure of value added for completed projects, but it does not (except by reasoning by analogy) help in determining the potential value added of candidate IV&V projects. A predictive ROI model will permit assessment of the ROI of candidate projects and therefore assist managers in resource allocation. A predictive model can also serve as the basis for a model-based effectiveness metric that will permit progressive monitoring of ongoing IV&V projects.
A predictive ROI model based on the direct ROI methodology must provide the means to determine, early in the project, all inputs to the direct ROI model. Estimates of developer and IV&V cost should be available early in the lifecycle because these estimates are normal project management requirements. Other inputs, specifically developer and IV&V defect discovery data, cannot be known early in the project and must therefore be estimated using project characteristics that are known early in the lifecycle. The predictive ROI model uses the BBN technique [FKN01], [FKN01A] to estimate the inputs to the direct ROI model [DBO04] using information available early in the project lifecycle.
Figure 1: Direct ROI computation
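To fix ideas, the computation of Figure 1 can be summarized schematically as follows. This is a generic return-on-investment form (net development cost avoided divided by the cost of IV&V), given only for orientation; the exact expressions, including the per-phase cost-to-fix escalation and the conversion of issues to function points, are those defined in [DBO04] and are not reproduced here.

ROI = ( C_dev^{without IV&V} - C_dev^{with IV&V} - C_{IV&V} ) / C_{IV&V}

In this schematic form, the with-IV&V development cost and the IV&V cost are observed quantities, while the without-IV&V development cost must be estimated from the defect data, because defects found early with IV&V support would otherwise have been found, at escalated cost, in later phases.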
2.1 Bayesian Belief Network Overview
A BBN consists of a hierarchy of nodes representing stochastic causal relationships. Figure 2 depicts a single node with two inputs. The node output is a random value with a probability density function (pdf) which depends on the values of input parameters A and B. The mapping of parameter values to pdfs could be done using historical data, if sufficient data existed. In the absence of a sufficient quantity of data, the mapping can be estimated using expert opinion. For the ROI BBN, the expert opinion approach was selected. All nodes in the ROI BBN produce random variables in the range of 1 to 5, where 1 corresponds to the worst possible case and 5 corresponds to the best possible case.
Figure 2: BBN node
A complete BBN consists of a hierarchical set of nodes. Figure 3 shows a larger BBN fragment containing three nodes. Note that the inputs to node 3 are random variables and therefore node 3 must compute an expected probability density function. In practice, this is easily accomplished using the Monte Carlo method.
Figure 3: Hierarchical BBN nodes

2.2 Function Point Ratios
For each development phase (requirements, design, code, test, integration), three BBN subnets were developed. The first subnet estimates product quality (with respect to defect density) on a scale of 1 to 5. The second subnet estimates developer defect detection efficiency on a scale of 1 to 5. The third subnet estimates IV&V defect detection efficiency on a scale of 1 to 5.
In order to produce the inputs required by the direct ROI model, the BBN output must be converted into measures of defect size. Function points are a convenient measure early in the lifecycle because they can be estimated from system requirements and are independent of programming language. Therefore, the subnet values (product quality, developer defect discovery efficiency, IV&V defect discovery efficiency) are used to estimate function point ratios (FPRs), which, scaled by the estimated product function points, provide suitable direct ROI model inputs. A high-level flow diagram for the process (showing requirements phase defect function points only) is shown in Figure 4.
The overall process of developing the BBN model for one phase (requirements, design, code, test, integration) of the baseline IV&V process consists of the following steps:
1. Develop pictorially (based on elicitation from experts) the BBNs for defect introduction, defect detection by the developer, and defect detection by IV&V.
2. Elicit from the experts probability density functions for each causal dependency.
3. Implement the BBN in software using the Monte Carlo technique.
4. Calibrate the BBN output to case study data to predict discovered defect function points in-phase for the developer and IV&V.
Figure 4: Predictive model defect function point computation
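As a small worked illustration of the FPR scaling described above (the numbers are hypothetical, not case study data): if a subnet predicts a requirements-phase developer FPR of 0.5 for a product estimated at 1,000 function points, the predicted size of developer-discovered in-phase requirements defects is

FP_defect = FPR x FP_total = 0.5 x 1000 = 500 function points,

which, together with the corresponding IV&V value, is the kind of input the direct ROI model requires.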
2.3 Defect Leakage Model
The direct ROI model requires as inputs discovered-defect function points for the developer and IV&V for each issue type and phase found. The BBNs were developed to compute in-phase discovered defects only. Although it would be possible to develop a BBN for each issue type for each development phase, that would require an additional fifteen subnets, many of which would lack calibration data. Therefore, a leakage model was devised to estimate discovered defect function points out of phase (for example, requirements defect function points discovered in the design, code, test, and integration phases). Based on an extensive literature search, the Rayleigh leakage model [GJWE01], [KSH91] was selected. Calibration of the leakage model will be discussed later.

3 Complete Prototype BBN
The complete BBN framework consists of subnets for defect introduction and defect detection by the developer and IV&V for each development phase. As noted above, BBNs are used only for in-phase defect detection prediction. Defect discovery in subsequent phases is estimated using a calibrated Rayleigh leakage model. Table 1 lists the source of direct model inputs for each phase and issue type. Here, BBN indicates the source is the BBN, and LM indicates the source is the calibrated Rayleigh leakage model applied to the BBN FPRs.

Table 1: Function point ratio source for ROI computation
                 Phase issue found
Issue type       Requirements   Design   Code   Test   Int   Ops
Requirements     BBN            LM       LM     LM     LM    LM
Design                          BBN      LM     LM     LM    LM
Code                                     BBN    LM     LM    LM
Test                                            BBN    LM    LM
Integration                                            BBN   LM

3.1 BBN Subnets
Appendix A shows the complete set of BBNs. An example defect introduction subnet (requirements phase) is shown in Figure 5. The output of this subnet is requirements quality in the range of 1 to 5, where 1 is the worst possible quality and 5 is the best possible quality. An example defect detection BBN (IV&V, requirements phase) is shown in Figure 6. The output of the defect detection subnet is the ratio (FPR) of function points of defects discovered to total function points. The product of FPR and total function points is a suitable set of inputs for the direct ROI model.
Figure 5: Requirements quality BBN sub-net
Figure 6: IV&V requirements phase defect detection sub-net

3.2 Monte Carlo Implementation
The complete set of BBN sub-nets was implemented in MATLAB using a Monte Carlo technique. For the prototype software, each phase is implemented as a stand-alone MATLAB program. Inputs for subsequent phases from BBNs earlier in the lifecycle are manually entered in the prototype model. The number of Monte Carlo iterations is an input parameter; 100,000 iterations was found to produce stable results. The structure of each BBN subnet model is as follows:
• Load BBN input data (elicited project characteristics, see Section 5.1)
• Monte Carlo iteration loop
  o For each node in succession
    - Interpolate in the node pdf tables to generate a pdf corresponding to the node inputs. Details of the pdf representation are presented in the next section.
    - Using the MATLAB random number function and the pdf from the previous step, generate a node output
  o Store the iteration results
• Compute the expected value and standard deviation for each node output
• Generate plots and print results

4 Node Probability Density Functions
4.1 PDF Representation
Each node in the ROI BBN produces a random variable between 1 and 5 with a pdf that depends on the node inputs. In order to avoid excessive complexity, each node has either two or three inputs. More complex relationships are represented by cascading the nodes. The node pdf functions are based on a set of elicited pdfs for each node.
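As an illustration of the per-node sampling step in the Monte Carlo loop described in Section 3.2, the following MATLAB fragment draws one output sample for a hypothetical two-input node whose pdf is stored as a lookup table indexed by the input values. The table contents, grid sizes, and variable names are invented for illustration; this is a simplified sketch, not the prototype code.

% Hypothetical tabulated pdf for one two-input BBN node.
% pdfTable(i,j,:) is the pdf over the output grid for inputs
% gridA(i) and gridB(j); the values below are placeholders.
gridA = 1:5;                        % input A grid (1 = worst, 5 = best)
gridB = 1:5;                        % input B grid
xOut  = linspace(1, 5, 41);         % output support
pdfTable = rand(numel(gridA), numel(gridB), numel(xOut));   % placeholder data

a = 3.4;  b = 2.1;                  % current input samples (from parent nodes)

% Interpolate the tabulated pdf at the current inputs, one output point at a time
pdfAB = zeros(size(xOut));
for k = 1:numel(xOut)
    pdfAB(k) = interp2(gridB, gridA, pdfTable(:,:,k), b, a, 'linear');
end
pdfAB = max(pdfAB, 0);
pdfAB = pdfAB / trapz(xOut, pdfAB);           % normalize to unit area

% Inverse-transform sampling: build the CDF and invert it at a uniform draw
cdfAB = cumtrapz(xOut, pdfAB);
cdfAB = cdfAB / cdfAB(end);
[cdfU, idx] = unique(cdfAB);                  % interp1 requires unique abscissae
sample = interp1(cdfU, xOut(idx), rand());    % one node output in [1, 5]

In the prototype, a step of this kind is repeated for every node in succession on each Monte Carlo iteration, with sampled parent outputs supplying the inputs to downstream nodes.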
The pdfs are elicited from experts using a pdf editor graphical user interface (GUI). The pdf editor window for a typical node with two inputs is shown in Figure 7. The pdfs are elicited for boundary cases and internal cases. Each pdf is approximated using a trapezoidal distribution by dragging the circles that correspond to the four points in the pdf. Thus, a pdf corresponding to a particular node input vector is approximated by the locations of the four points that define the trapezoid. In order to implement the BBN in software, it is necessary to map the node inputs to the locations of the four points that characterize the node pdf. The location of each pdf point for a node with two inputs is a surface, as shown in Figure 8, and the location of each pdf point for a three-input node is a hypersurface.
Figure 7: pdf editor GUI
A number of techniques for extending the elicited pdfs to node functions were evaluated. Most of the techniques used curve fitting to approximate the functions for each node point. None of the curve fitting techniques produced consistently satisfactory results. Therefore, the node pdfs are represented as lookup tables that are produced by interpolating and extrapolating in the elicited data to generate a grid of points compatible with the built-in MATLAB two- and three-dimensional interpolating functions. The surface shown in Figure 8 was produced from the interpolating table for the second point of the node pdf function defined in Figure 7.
Figure 8: Interpolating surface

4.2 PDF Elicitation
Once the complete set of BBN subnets was defined, the pdf editor (Figure 7) was used to capture elicited pdfs for each node. The pdfs were validated using the plotting feature illustrated in Figure 8.

5 Model Calibration
After eliciting the pdf functions, it was necessary to calibrate the predictive model to the case study data. The calibration was performed in four steps. First, BBN input data was collected from project managers for each of the four direct ROI case studies. Second, FPRs were calculated for in-phase and leakage issues using the data collected for the direct ROI case studies. Third, FPR functions were developed for in-phase issues for each defect type (requirements, design, code, test, integration) for developer and IV&V issues. Last, the leakage model was calibrated to predict developer and IV&V defect detection in subsequent lifecycle phases. Using the calibrated model, ROI was predicted for each of the four case studies and compared to the case study results.

5.1 BBN Input Data Collection
The prototype predictive model was calibrated using the four direct ROI case studies previously reported [DBO03]. Inputs for each BBN sub-net for each of the four case studies were collected from IV&V project managers familiar with the case study projects. The inputs were collected using spreadsheets with embedded instructions. The spreadsheets included internal nodes to aid in validating the BBN topology and to ensure consistent data.
Inconsistencies between internal nodes were discussed with the project managers, and adjustments were made to achieve consistent input. The score for each node consists of an estimated value in the range 1-10, a lower tolerance, and an upper tolerance. The range 1-10 was chosen because preliminary experiments indicated that eliciting inputs in the range 1-5 provided insufficient discrimination among projects. The spreadsheet automatically generates MATLAB code that interprets the inputs as triangular probability density functions, as illustrated in Figure 9.
Figure 9: Node input pdf
The MATLAB code then rescales the pdfs to the 1-5 range used in the BBN. Appendix B contains the input descriptions contained in the spreadsheet files for each issue type.

5.2 FPR Calibration
The FPR calibration mapped the three BBN outputs (quality Q_f, developer defect discovery efficiency D_ffd, and IV&V defect removal efficiency D_ffi, where f represents the development phase and issue type) to FPR. The mapping functions consist of lookup tables for each phase, for the developer and for IV&V, that map quality and efficiency to FPR. The main steps in the calibration for each issue type were as follows:
• Run the BBN model for each case study using the elicited BBN inputs
• Convert the case study defect data (actual data) to FPR
• Plot the actual FPR as a function of quality and defect discovery efficiency to identify candidate approximating functions
• Fit approximating functions to the BBN output and case study FPR
• Generate FPR lookup tables suitable for MATLAB cubic spline interpolation and implementation in the MATLAB BBN models
• Re-run the BBN using the calibrated FPR functions and compare computed FPR with actual FPR

5.2.1 Developer-Discovered FPR Calibration
For developer-discovered issues, a plot of actual FPR versus requirements quality and developer defect removal efficiency showed that FPR varies directly with the distance from the origin in the (Q_R, D_RRd) plane. Therefore, an approximating function of the form

FPR_{RRd} = c_{RRd} \sqrt{Q_R^2 + D_{RRd}^2}

was used. The calibrated curve is shown in Figure 10. Here the circles are the case study data points and the solid trace is the approximating function. The corresponding interpolating surface is shown in Figure 11.
Figure 10: Requirements phase FPR_RRd calibration
Figure 11: FPR_RRd interpolating surface
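For illustration, the single coefficient in a fit of this form can be obtained in closed form by least squares, as in the following MATLAB sketch. The four (Q_R, D_RRd, FPR) triples are invented placeholders, not the case study values.

% Hypothetical calibration points: BBN outputs (Q, D) and actual FPR
% for four projects.  All values are placeholders for illustration only.
Q   = [3.1 3.6 2.2 3.4];           % requirements quality scores (1-5)
D   = [3.8 3.2 2.5 3.9];           % developer defect removal efficiency (1-5)
FPR = [0.55 0.48 0.24 0.62];       % actual developer-discovered FPR

r = sqrt(Q.^2 + D.^2);             % distance from the origin in the (Q, D) plane

% Least-squares estimate of the single coefficient c in FPR = c * r
c = (r * FPR') / (r * r');

FPRfit = c * r;                    % fitted values for comparison with the data
resid  = FPR - FPRfit;             % residuals

Under the assumed form, the lookup table behind the interpolating surface of Figure 11 would simply tabulate c*sqrt(Q.^2 + D.^2) on a grid of (Q, D) values.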
The best fit for IV&V effectiveness was a function of the form & cos( ) i bD IV V i Eff a e c D ff ff - = where a, b, c are coefficients determined using a nonlinear least squares technique. Due to the difficulty in fitting functions to the opportunity data, interpolating tables were produced graphically and used to generate interpolating surfaces for FPR. An example IV&V FPR interpolating surface is shown in Figure 12. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 13 CM Number: GSFC & NRC IVV-05-137 Figure 12: FPRRRi interpolating surface 5.2.3 FPR Calibration Results The results of the in-phase issue detection FPR calibration are shown in Tables 2 - 6. For requirements defects, the case study data was consistent with the Qf and Drrx results for all projects. Consequently, the predicted FPRs agree within one standard deviation with the actual FPRs. For subsequent life cycle phases, there are outliers that resulted from project anomalies. For example, project B produced no design documentation although many characteristics of the developer’s process were judged by the IV&V project manager to be relatively good. Project C was terminated before any code, test, or integration issues were reported, suppressing IV&V defect FPR even though a relatively good IV&V process was underway when the development project was terminated. For all of the case study projects, it is evident that in-phase issue reporting for the later lifecycle phases tended to be lower than earlier lifecycle phases. Of course, due to the relatively small cost escalation factors for the later lifecycle phases, this factor will have a relatively small impact on ROI. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 14 CM Number: GSFC & NRC IVV-05-137 Table 2: Requirements phase FPR calibration Case Developer FPR IV&V FPR Actual Predicted Std Dev Actual Predicted Std Dev A 0.551 0.709 0.166 0.096 0.135 0.098 B 0.483 0.629 0.176 0.079 0.093 0.070 C 0.239 0.371 0.135 0.812 0.603 0.286 D 0.623 0.628 0.161 0.137 0.289 0.169 Table 3: Design phase FPR calibration Case Developer FPR IV&V FPR Actual Predicted Std Dev Actual Predicted Std Dev A 0.429 0.541 0.114 0.429 0.457 0.593 B 0.000 0.504 0.135 0 0.210 0.290 C 0.037 0.321 0.104 2.207 2.105 1.343 D 0.451 0.433 0.114 0.712 1.510 1.305 Table 4: Code phase FPR calibration Case Developer FPR IV&V FPR Actual Predicted Std Dev Actual Predicted Std Dev A 3.700 3.328 2.027 1.821 0.821 0.388 B 1.512 2.090 1.502 0.485 1.026 0.490 C 0.037 1.252 0.869 0 1.062 0.717 D 0.107 2.814 1.535 0.040 1.163 0.555 Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 15 CM Number: GSFC & NRC IVV-05-137 Table 5: Test phase FPR calibration Case Developer FPR IV&V FPR Actual Predicted Std Dev Actual Predicted Std Dev A 3.323 1.069 0.390 1.640 0.679 0.236 B 1.292 1.547 0.415 0 0.347 0.180 C 0.075 0.669 0.305 0 0.589 0.245 D 0.002 0.622 0.290 0.086 0.597 0.230 Table 6: Integration phase FPR calibration Case Developer FPR IV&V FPR Actual Predicted Std Dev Actual Predicted Std Dev A 0 0.019 0.005 0 0.0004 0.0002 B 0 0.018 0.005 0 0.0002 0.0001 C 0.037 0.007 0.004 0 0.0003 0.0002 D 0.002 0.011 0.005 0.003 0.0005 0.0002 5.2.4 Leakage Model Calibration The leakage model was calibrated using only two of the four case studies. 
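A minimal sketch of such a nonlinear least-squares fit, using only base MATLAB (fminsearch) and the functional form given above, is shown below. The data pairs and the starting guess are hypothetical; the coefficients actually used were determined from the case study data.

% Hypothetical (efficiency score, effectiveness) pairs; placeholders only.
Deff = [1.5 2.4 3.1 4.2];          % IV&V defect removal efficiency scores
Eff  = [0.60 0.35 0.22 0.10];      % observed IV&V effectiveness values

model = @(p, D) p(1) .* exp(-p(2) .* D) .* cos(p(3) .* D);
sse   = @(p) sum((Eff - model(p, Deff)).^2);   % sum of squared residuals

p0   = [1, 0.5, 0.1];              % starting guess for [a, b, c]
pHat = fminsearch(sse, p0);        % nonlinear least-squares estimate

EffFit = model(pHat, Deff);        % fitted values for comparison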
5.2.3 FPR Calibration Results
The results of the in-phase issue detection FPR calibration are shown in Tables 2-6. For requirements defects, the case study data was consistent with the Q_f and D_RRx results for all projects. Consequently, the predicted FPRs agree within one standard deviation with the actual FPRs. For subsequent lifecycle phases, there are outliers that resulted from project anomalies. For example, Project B produced no design documentation, although many characteristics of the developer's process were judged by the IV&V project manager to be relatively good. Project C was terminated before any code, test, or integration issues were reported, suppressing IV&V defect FPR even though a relatively good IV&V process was underway when the development project was terminated. For all of the case study projects, it is evident that in-phase issue reporting for the later lifecycle phases tended to be lower than for earlier lifecycle phases. Because of the relatively small cost escalation factors for the later lifecycle phases, however, this factor has a relatively small impact on ROI.

Table 2: Requirements phase FPR calibration
        Developer FPR                        IV&V FPR
Case    Actual   Predicted   Std Dev         Actual   Predicted   Std Dev
A       0.551    0.709       0.166           0.096    0.135       0.098
B       0.483    0.629       0.176           0.079    0.093       0.070
C       0.239    0.371       0.135           0.812    0.603       0.286
D       0.623    0.628       0.161           0.137    0.289       0.169

Table 3: Design phase FPR calibration
        Developer FPR                        IV&V FPR
Case    Actual   Predicted   Std Dev         Actual   Predicted   Std Dev
A       0.429    0.541       0.114           0.429    0.457       0.593
B       0.000    0.504       0.135           0        0.210       0.290
C       0.037    0.321       0.104           2.207    2.105       1.343
D       0.451    0.433       0.114           0.712    1.510       1.305

Table 4: Code phase FPR calibration
        Developer FPR                        IV&V FPR
Case    Actual   Predicted   Std Dev         Actual   Predicted   Std Dev
A       3.700    3.328       2.027           1.821    0.821       0.388
B       1.512    2.090       1.502           0.485    1.026       0.490
C       0.037    1.252       0.869           0        1.062       0.717
D       0.107    2.814       1.535           0.040    1.163       0.555

Table 5: Test phase FPR calibration
        Developer FPR                        IV&V FPR
Case    Actual   Predicted   Std Dev         Actual   Predicted   Std Dev
A       3.323    1.069       0.390           1.640    0.679       0.236
B       1.292    1.547       0.415           0        0.347       0.180
C       0.075    0.669       0.305           0        0.589       0.245
D       0.002    0.622       0.290           0.086    0.597       0.230

Table 6: Integration phase FPR calibration
        Developer FPR                        IV&V FPR
Case    Actual   Predicted   Std Dev         Actual   Predicted   Std Dev
A       0        0.019       0.005           0        0.0004      0.0002
B       0        0.018       0.005           0        0.0002      0.0001
C       0.037    0.007       0.004           0        0.0003      0.0002
D       0.002    0.011       0.005           0.003    0.0005      0.0002

5.2.4 Leakage Model Calibration
The leakage model was calibrated using only two of the four case studies. The other two case studies reported no defect leakage to subsequent lifecycle phases. The lack of leakage data for those two projects resulted from project anomalies rather than from an absence of defect leakage: one project did not track post-phase issues, and for the other project the case study was based on a database snapshot that did not include IV&V leakage data and for which developer leakage data had been estimated by IV&V project managers because actual developer data was unavailable.
Using averages of the available leakage data, the Rayleigh model was calibrated for IV&V-discovered and developer-discovered defects by fitting the cumulative issues to the Rayleigh function. The results of the developer calibration are shown in Figure 13 and the results of the IV&V calibration are shown in Figure 14. In both cases, the cumulative FPR is normalized to the in-phase data, so the computed issues in each subsequent phase are the product of the leakage factor and the in-phase FPR. In the figures, the small circles represent the case study data and the solid lines represent the leakage model.
The Rayleigh leakage model has been shown to be effective for large numbers of projects. However, it was observed that leakage exhibits rather large variances. With only two case studies upon which to calibrate leakage, it was not possible to compute the standard deviations of the estimates. Therefore, for the prototype model, it was assumed that the leakage standard deviation is proportional to the standard deviation for in-phase issues of each issue type. That is, the standard deviation of predicted FPR for a particular issue type in subsequent phases is assumed to be the product of the in-phase standard deviation and the ratio of predicted leakage FPR to in-phase FPR.
Figure 13: Leakage model calibration for developer-discovered defects
Figure 14: Leakage model calibration for IV&V-discovered defects
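To make the calibration step concrete, the following MATLAB sketch fits a Rayleigh cumulative profile to normalized cumulative FPR data and converts the fit into per-phase leakage factors. The data values and the exact parameterization are illustrative assumptions; the model actually used follows [GJWE01] and [KSH91].

% Hypothetical cumulative discovered-defect FPR (normalized to the
% in-phase value) at the ends of successive lifecycle phases.
t    = 1:5;                            % phase index: req, design, code, test, int
cumF = [1.00 1.55 1.85 1.97 2.00];     % cumulative FPR / in-phase FPR (placeholder)

% Rayleigh cumulative profile: F(t) = K * (1 - exp(-t.^2 / (2*s^2)))
model = @(p, t) p(1) .* (1 - exp(-t.^2 ./ (2 * p(2)^2)));
sse   = @(p) sum((cumF - model(p, t)).^2);
pHat  = fminsearch(sse, [2, 2]);       % estimate [K, s]

% Per-phase leakage factors: increments of the fitted cumulative profile
cumFit  = model(pHat, 0:5);
leakage = diff(cumFit);

Each leakage factor is then multiplied by the corresponding in-phase FPR to obtain the out-of-phase defect function points used by the direct ROI model.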
Using the corrected IV&V input data, predicted ROI for both cases is closer to actual ROI. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 18 CM Number: GSFC & NRC IVV-05-137 Table 7: Predicted ROI results Project Actual Direct ROI Predicted Direct ROI ROI Std Deviation A 1.590 1.261 0.170 B 1.207 4.040 * 3.310 0.625 * 0.550 C 5.534 10.78 * 7.970 1.590 * 1.330 D 10.085 10.68 1.77 Among the four case studies, Case D had the most complete data for both IV&V and the developer. There were reported issues for both IV&V and the developer for all lifecycle phases and the project was completed successfully. Therefore, it is not a coincidence that the best agreement between actual and predicted direct ROI was exhibited by Case D. The actual ROI computation for Case B was the least certain. Developer issue distribution among phases was not available when the case study was performed, so a conservative leakage model (developer issues distributed equally among the phases) was assumed. Due to the lack of design documentation, the developer found no design defects and IV&V found design defects only in the code phase via source code analysis. Furthermore, IV&V was not complete when the case study was performed, so no IV&V issues found in the test or integration phase were included in the original case study. It is reasonable to expect that full-lifecycle IV&V and elimination of the conservative (from the IV&V ROI perspective) assumptions would increase Case B ROI significantly. 6 Summary The prototype predictive ROI model was developed as planned and node probability density functions (pdfs) were developed using the pdf editor. The predictive model for each BBN subnet was implemented in MATLAB. The model was calibrated using case study data to compute function point ratios (FPRs) for developer and IV&V-discovered in-phase issues. A Rayleigh defect leakage model was calibrated to the developer and IV&V case study data to predict out-of-phase FPRs. ROI was computed using a * IV&V inputs for phases for which there was no IV&V activity set to 1.0 (minimum value) Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 19 CM Number: GSFC & NRC IVV-05-137 MATLAB Monte Carlo implementation of the direct ROI algorithm and BBN-predicted ROI was computed for each of the four case studies. The in-phase FPR computations for requirements and design phases for both developer and IV&V issues are in excellent agreement with actual data except for two easily understood anomalies in the case study data. Agreement between predicted and actual values decreases progressively as the lifecycle proceeds to code, test, and integration phases. This decline in precision of the predictive model for later lifecycle phases is attributed to decreasing availability of case study data for the later lifecycle phases. The decline in precision is ameliorated by the decreasing importance, in the direct ROI sense, of the later lifecycle phases. The fidelity of post-phase (leakage) defect FPR is lower than in-phase fidelity due to the apparently higher variability of leakage behavior and limited amount of calibration data. As was shown in the Phase IIB sensitivity study [DBO04A], ROI is highly dependent on leakage rates because estimation of post-phase developer defect detection drives the probability distribution across lifecycle phases and therefore the expected value of cost-to-fix escalation. 
For example, the sensitivity study showed that for a hypothetical project with an IV&V ROI of 8.5, significantly reducing developer defect detection efficiency (or increasing leakage rate) can increase IV&V ROI to 25.3. Therefore, the variation in ROI exhibited by the predictive model is well within the envelope to be expected from the sensitivity study results. 7 Conclusions and Recommendations The predictive IV&V ROI model produces credible ROI estimates for the four case studies. The initial calibration predicts potential full-lifecycle ROI more accurately than truncated-lifecycle ROI. Therefore, it appears that the predictive model is particularly well-suited to prediction of achievable ROI for a specified set of project circumstances. Thus, the prototype predictive model appears to be particularly well-suited to use in a model-based effectiveness measurement framework. The prototype predictive model also suggests that although per-issue ROI is higher for early lifecycle activities, overall ROI is better for full-lifecycle IV&V. The prototype model provides the means to further explore this phenomenon via additional cases using hypothetical IV&V projects. Proposed future work includes additional case studies and development of a production ROI model. The results of the prototype model calibration suggest that initial emphasis should be placed on expanding the calibration database via additional case studies. The additional case studies will provide more insight into IV&V ROI, will improve the model calibration database, and will serve as the basis for automating ROI data collection for inprogress projects. Additional experience working with the prototype model in the process of doing the additional case studies will facilitate improved production model requirements. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 20 CM Number: GSFC & NRC IVV-05-137 8 References [DBO04] J. B. Dabney, G. Barber, and D. Ohi, “Estimating direct return on investment of independent verification and validation,” 8th IASTED International Conference on Software Engineering and Applications, Cambridge, MA, 2004. [FKN01] N. Fenton, P. Krause, and M. Neil, “Software measurement: Uncertainty and causal modeling,” 2001. [FKN01A] N. Fenton, P. Krause, M. Neil, “A probabilistic model for software defect prediction,” preprint, University of London, England, 2001. [DBO04A] J. B. Dabney, G. Barber, and D. Ohi, Return on Investment of Independent Verification and Validation Study Phase IIB Final Report, Titan Inc., Fairmont, WV, 2004. [FPUG00] Function Point Counting Practices Manual, Release 4.1.1, The International Function Point User’s Group, 2000. [GJWE01] J. W. E. Greene, “Purchasing Software Intensive Systems Using Quality Targets”, Quality Software Management Ltd., 2001 [KSH91] S. H Kan, “Modeling And Software Development Quality”, IBM Systems Journal, Vol 30, No. 3, 1991 [DBO03] J. B. Dabney, G. Barber, and D. Ohi, Computing Direct Return on Investment of Software Independent Verification and Validation, Titan Inc., Fairmont, WV, 2003. [BAB00] B. Boehm, C. Abts, A. W. Brown, S. Chulani, B. Clark, E. Horowitz, R. Madachy, D. Reifer, B. Steece, Software Cost Estimation with COCOMO II, Prentice Hall, Upper Saddle River, NJ, 2000. 
Appendix A – BBN Diagrams
This appendix contains the complete set of BBN diagrams for each phase.

Appendix B – BBN Input Definitions
B.1 Requirements Issue Subnet
Name Characteristic Description Requirements Defect Introduction User system expertise Experience level of system users with similar (or the same) systems or solution approach. Maturity level of users (or representatives such as system engineering with equivalent knowledge) in understanding technical aspects of the system to be implemented and the scenarios in which it will be used. User involvement Degree to which the system users are involved in the requirements definition process and the timeliness of that involvement.
Note that a high score here requires involvement or representation (by system engineering, for example, with equivalent knowledge) of all key system users, not just system operators. Heritage Relative novelty (to the developer or user) of the application/mission or the solution approach. For example, entry GN&C for a new vehicle where all algorithms are adapted from Shuttle would be high heritage (score of 10) (provided the new mission is very similar to the Shuttle mission), completely new algorithms or new application would be low heritage (score of 1). Quality of User Input Effectiveness and timeliness of user involvement in assuring that requirements meet end user needs. System documentation quality Qualitative estimate of the quality (completeness, correctness, and consistency) of system documentation from which software requirements may be derived. Problem complexity Qualitative estimate of overall system/problem complexity. Related to technical difficulty to define, required interfaces (developer and system), and unity of users. Not correlated with code complexity metrics. Simple (10) to complex (1) Requirements stability How stable are requirements? The more stable, the fewer changes and the lower the risk of requirements errors. Requirements Problem Space Overall susceptibility of the problem space to introduction of requirements defects. Represents the difficulty, based on the complexity of the problem being solved and the quality and stability of documentation describing the problem, in deriving a correct set of requirements. Dev staff experience level Average experience level of development staff, not specific to the problem domain, but overall experience in software development for the domain type (e.g., real-time embedded flight, financial, ground, manned , etc). Dev domain experience Average experience of development staff in the specific application domain (e.g., laser guidance system, space telescope, crew rescue, etc). Consider all individual domains within the system (e.g., space telescope will require GN&C, optics, propulsion, system management, telemetry, etc). Dev schedule pressure How much margin is in the development schedule? Assessment of flexibility in end date. A higher number indicates developers have plenty of time to complete their work, a lower number indicates developers are consistently Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 38 CM Number: GSFC & NRC IVV-05-137 rushed to deliver products. Dev budget margin How tight is the development budget? Assessment of flexibility in cost growth. External Constraint Pressure Overall influence level of external constraints related to schedule and budget. How strong is the pressure to proceed without fix in spite of schedule or budget problems? Process effectiveness actions How much emphasis does development management place on quality (as opposed to productivity)? For example, does development management ensure that review action items are tracked, and encourage extra analysis of suspected problems? Assessment of effectiveness of process problem reports (e.g., a formal mechanism to document, correct and publish discrepancies in following the process), process improvement actions, activity of a board to assess effectiveness of process, etc. An assessment of the 'aliveness' of the developer process and attention to making it work to produce better products. Dev quality organization How effective is the embedded quality organization? 
This measure includes consideration of size, breadth and depth of capability applied to this project, and level of authority granted to the quality organization applied to this project. Process Adherence How well is the development staff likely to adhere to the documented requirements development process? Should be based on knowledge of schedule and budget constraints, level of activity related to process effectiveness, and the quality of organization enforcing the process. Turnover Experienced or historical rate of change of staff involved with requirements development. A higher number indicates little turnover, a lower number indicates a lot of turnover. Staff level Assessment of whether the quantity and distribution across domains of staff is sufficient for the problem space. An adequate staff level should receive a very high score. The worst staff level with respect to work required ever seen by the evaluator would receive a score of 1. Resource Availability Measure of the degree to which the size of the development staff is sufficient and stable in the terms of longevity on the project. Staff Ability Overall ability level of the requirements development staff, in terms of size, development experience, domain experience and turnover with respect to the problem at hand. Process definition, product standards, quality criteria This is an overall assessment of the effectiveness of the development process related to development of requirements. The assessment should include methods for requirements elicitation, coordination, documentation, and validation. It should correlate fairly well with CMM level, but includes an assessment of what is really happening in addition to what is documented. Dev tools Degree to which the developer uses tools in developing and analyzing requirements. Tools here include requirements management tools (DOORs, for example), traceability tools, simulations, process support, etc. Process Rigor How rigorous can the process be expected to be? Is the process well founded (related to CMM level), is it supported by a good set of tools, does the development organization pay Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 39 CM Number: GSFC & NRC IVV-05-137 attention to and follow the process? Requirements quality Overall relative measure of requirements quality, and hence, probable relative defect density. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 40 CM Number: GSFC & NRC IVV-05-137 Developer Requirements Defect Removal Efficiency Cost/schedule pressure Similar to the cost and schedule pressure on requirements development. In some cases, the developer requirements inspection/checking process is more tightly constrained than requirements development. Process effectiveness actions Same as process effectiveness actions input for requirements development unless there is something unique about the group that reviews the products to find defects. Stakeholder involvement Degree to which users are involved in the requirements validation process. This is similar to but can be different from the user involvement for requirements development. Defect Removal Effort/Focus How much effort or focus does the developer apply to defect identification and removal during the requirements phase? 
Are users involved to make sure requirements are right, is there a rush to complete deliverables, is the defect removal process effective and adhered to? Simulation tool use Degree to which the development organization uses simulation to understand and validate requirements. This does not refer to simulations to verify requirements but rather to simulations of the conceptual operation of the system or particular subsystems to assure the right requirements are being specified. This could be a model of the expected environment and the system/subsystem reaction to it at an abstract rather than implementation level. Completeness of those simulations. Coverage of novel or complex areas. Reasoning tool use Degree of use of reasoning tools. By reasoning tools, we mean tools such as automation of formal methods, model checking, etc. Representation tool use Degree to which the development organization uses representational tools to automate and support requirements validation. The best example of representation tools is UML. Tools effectiveness Overall level and effectiveness of tool use in the requirements analysis process. Review quality Quality of the embedded review process. For example, do the developers do formal, detailed requirements reviews? Do they use entry and exit criteria, track and board issues, etc? Do they hold walkthroughs with broad scope support? Techniques Employed effectiveness Overall level and effectiveness of defect removal techniques used by the developer. Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 41 CM Number: GSFC & NRC IVV-05-137 IV&V Requirements Defect Removal Efficiency Data timing Relative measure of timeliness of data delivery by the developer to IV&V. Late data delivery places time restrictions on IV&V and can impede IV&V. Data completeness Measure of completeness of data submitted to IV&V. For example, if important information is withheld due to proprietary concerns, that will impede IV&V. Another example is that documents are delivered with sections incomplete or many TBDs or document does not have content expected for the point in lifecycle at which it was delivered. Data availability Likelihood that sufficient and timely input data will be available to IV&V when it is needed. IV&V environment Overall operating environment given to IV&V. Are artifacts on time and do they contain what is needed for efficient evaluation, are issues considered fairly and in a timely manner? Direct dev access How much access does IV&V have to the developers? If IV&V has to work through several bureaucratic levels to get information from the developers or discuss issues or risks, that will decrease IV&V effectiveness. Project acceptance How well does the project accept the IV&V participation? A high score indicates the project participants exhibit a belief that use of IV&V will lead to higher mission success probability. Project management has issued directives exhibiting the right intent in dealing with the IV&V participation. Dev cooperation How cooperative are developers, in general, in responding to IV&V requests and suggestions? Do all IV&V issues get immediate attention, or does the development organization tend to ignore or avoid dealing with IV&V concerns? Developers exhibit attention to IV&V concerns and timely/meaningful response. While project acceptance is having the right intent, cooperation is doing the right thing. 
Project/IV&V interface How efficient is the interface between the developer and IV&V considering access, cooperation and acceptance. IV&V experience level Average experience level of IV&V staff. This is a measure of how much experience the IV&V staff has in the area of IV&V and related activities. This does not consider domain experience level which is considered in another input. IV&V domain experience/expertise level Degree of experience of IV&V staff with the application domain. Consider all individual domains within the system (e.g., GN&C, power, C&DH, ECLSS, terrain mobility, thermal, etc). This item should evaluate the extent of applicable domain knowledge within the IV&V staff. IV&V Staffing level How appropriate is the size of the IV&V staff to the analysis tasks that need to be performed based on the CARA results? Too few (or too many) would lower the rating. An adequate staff level should receive a very high score. Descoping the tasks from the CARA results to match a low staff level would get a low score. Schedule pressure How much schedule pressure does IV&V face. This factor could be correlated with data timing, but not necessarily. A low score indicates there was heavy schedule pressure. A high score indicates there was little schedule pressure. Resource availability Availability of all human resources needed to perform IV&V. A Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 42 CM Number: GSFC & NRC IVV-05-137 sufficient set of personnel is available to perform the IV&V activities including consideration of the schedule pressure from the project (e.g., very short turnaround of document review expected), size of staff and personnel turnover. IV&V Staff ability Overall ability of IV&V staff to perform the IV&V tasks defined by the CARA analysis. Simulation tool use Degree to which the IV&V organization uses simulation to understand and validate requirements. This does not refer to simulations to verify requirements but rather to simulations of the conceptual operation of the system or particular subsystems to assure the right requirements are being specified. This could be a model of the expected environment and the system/subsystem reaction to it at an abstract rather than implementation level. Completeness of those simulations. Coverage of novel or complex areas. Reasoning tool use Degree of use of reasoning tools by the IV&V organization. By reasoning tools, we mean tools such as automation of formal methods, model checking, traceability, etc. Representation tool use Degree to which the IV&V organization uses representational tools to automate and support requirements validation. The best example of representation tools is UML. IV&V tool effectiveness An assessment of effectiveness of all tools employed to support requirements validation including simulations, formal method support, UML, requirements trace etc. Analyses employed An assessment of the effectiveness of the types of analyses planned such as scenario analyses, requirements reading, comparative analysis, formal method Developer review participation Assessment of effectiveness of plans for participating in developer milestone reviews, inspections, walkthroughs etc. Assessment would include timing in which IV&V enters the project (SRR, SDR SSR, etc) and its impact on effectiveness. 
IV&V techniques employed effectiveness Overall level and effectiveness of defect removal techniques used by IV&V Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 43 CM Number: GSFC & NRC IVV-05-137 B.2 Design Issue Subnet Name Characteristic Description Design Defect Introduction User system expertise Experience level of system users with similar (or the same) systems or solution approach. Maturity level of users (or representatives such as system engineering with equivalent knowledge) in understanding technical aspects of system to be implemented and the scenarios in which it will be used. User involvement Degree to which the system users are involved in the requirements definition process and the timeliness of that involvement. Note that a high score here requires involvement or representation (by system engineering, for example, provided they have equivalent knowledge) of all key system users, not just system operators. Heritage Relative novelty (to the developer or user) of the application/mission or the solution approach. For example, entry GN&C for a new vehicle where all algorithms are adapted from Shuttle would be high heritage (score of 10) (provided the new mission is very similar to the Shuttle mission), completely new algorithms or new application would be low heritage (score of 1). Quality of User Input Effectiveness and timeliness of user involvement in assuring that design meets end user needs. Problem complexity Qualitative estimate of overall system/problem complexity. Related to technical difficulty to define system, required interfaces (among the developer and system), and unity of users. Not correlated with code complexity metrics. Simple (10) to complex (1 Design stability How stable is the design? This is driven by the heritage of the approach, support from the users, requirements stability, and the proneness of the design group to make errors. The more stable, the fewer changes and the lower the risk of introduction of new errors. Design Problem Space Overall susceptibility of the problem space to introduction of design defects. Represents the difficulty, based on the complexity of the problem being solved and the quality and stability of documentation describing the problem, in deriving a correct design. Dev staff experience level Average experience level of development staff, not specific to the problem domain, but overall experience in software development for the domain type (e.g., real-time embedded flight, financial, ground, manned , etc). Includes staff experience with the language chosen for implementation and the operating system in use. Also includes experience with the processor to be used, and the hardware to be interfaced with Dev domain experience Average experience of development staff in the specific application domain (e.g., laser guidance system, space telescope, crew rescue, etc). Consider all individual domains within the system (e.g., space telescope will require GN&C, optics, propulsion, system management, telemetry, etc). Dev schedule pressure How much margin is in the development schedule? Assessment of flexibility in end date. A higher number indicates developers have plenty of time to complete their work, a lower number Return on Investment of Independent Verification and Validation Study Phase III Final Report October 14, 2005 DID Number: 06 44 CM Number: GSFC & NRC IVV-05-137 indicates developers are consistently rushed to deliver products. 
Dev budget margin: How tight is the development budget? Assessment of flexibility in cost growth.
External Constraint Pressure: Overall influence level of external constraints related to schedule and budget. How strong is the pressure to proceed without fix in spite of schedule or budget problems?
Process effectiveness actions: How much emphasis does development management place on quality (as opposed to productivity)? For example, does development management ensure that review action items are tracked, and encourage extra analysis of suspected problems? Assessment of effectiveness of process problem reports (e.g., a formal mechanism to document, correct, and publish discrepancies in following the process), process improvement actions, activity of a board to assess effectiveness of the process, etc. An assessment of the 'aliveness' of the developer process and attention to making it work to produce better products.
Dev quality organization: How effective is the embedded quality organization? This measure includes consideration of size, breadth, and depth of capability applied to this project, and the level of authority granted to the quality organization applied to this project.
Process Adherence: How well is the development staff likely to adhere to the documented design development process? Should be based on knowledge of schedule and budget constraints, level of activity related to process effectiveness, and the quality of the organization enforcing the process.
Turnover: Experienced or historical rate of change of staff involved with design development. A higher number indicates little turnover; a lower number indicates a lot of turnover.
Staff level: Assessment of whether the quantity and distribution across domains of staff is sufficient for the problem space. An adequate staff level should receive a very high score (not average). The worst staff level with respect to work required ever seen by the evaluator would receive a score of 1.
Resource Availability: Measure of the degree to which the size of the development staff is sufficient and stable in terms of longevity on the project.
Staff Ability: Overall ability level of the design development staff, in terms of size, development experience, domain experience, and turnover with respect to the problem at hand.
Process definition, product standards, quality criteria: This is an overall assessment of the effectiveness of the development process related to development of design. The assessment should include methods for design derivation, coordination, documentation, and validation. It should correlate fairly well with CMM level, but includes an assessment of what is really happening in addition to what is documented.
Dev tools: Degree to which the developer uses tools in developing and analyzing design. Tools here include formal method support tools (Stateflow, for example), traceability tools, simulations, process support, etc. The assessment should include evaluation of tool support to assure conformance of design to requirements (i.e., do tools support a seamless transition to design from requirements, or are the design support tools completely independent of requirements support tools).
Process Rigor: How rigorous can the process be expected to be? Is the process well founded (related to CMM level), is it supported by a good set of tools, and does the development organization pay attention to and follow the process?
Design quality: Overall relative measure of design quality, and hence, probable relative defect density.

Developer Design Defect Removal Efficiency

External Constraint Pressure: Similar to the cost and schedule pressure on design development. In some cases, the developer design inspection/checking process is more tightly constrained than design development.
Process effectiveness actions: Same as the process effectiveness actions input for design development unless there is something unique about the group that reviews the products to find defects.
Stakeholder involvement: Degree to which users are involved in the design validation process. This is similar to, but can be different from, the user involvement for design development.
Defect Removal Effort/Focus: How much effort or focus does the developer apply to defect identification and removal during the design phase? Are users involved to make sure the design is right, is there a rush to complete deliverables, and is the defect removal process effective and adhered to?
Simulation tool use: Degree to which the development organization uses simulation to understand and validate design. This does not refer to simulations to verify requirements but rather to simulations of the conceptual operation of the system or particular subsystems to assure the right design is being specified. This could be a model of the expected environment and the system/subsystem reaction to it at an abstract rather than implementation level. Consider the completeness of those simulations and their coverage of novel or complex areas.
Reasoning tool use: Degree of use of reasoning tools. By reasoning tools, we mean tools such as automation of formal methods, model checking, etc. Includes the extent of executability of the design and the ability of tools to point to defects in design characteristics such as sequencing, timing, homogeneity, etc.
Representation tool use: Degree to which the development organization uses representational tools to automate and support design validation. The best example of representation tools is UML.
Tools effectiveness: Overall level and effectiveness of tool use in the design analysis process for removal of defects.
Review quality: Quality of the embedded review process. For example, do the developers do formal, detailed design reviews? Do they use entry and exit criteria, track and board issues, etc.? Do they hold walkthroughs with broad scope support?
Techniques Employed effectiveness: Overall level and effectiveness of defect removal techniques used by the developer.

IV&V Design Defect Removal Efficiency

Data timing: Relative measure of timeliness of data delivery by the developer to IV&V. Late data delivery places time restrictions on IV&V and can impede IV&V.
Data completeness: Measure of completeness of data submitted to IV&V. For example, if important information is withheld due to proprietary concerns, that will impede IV&V. Also includes documents delivered with sections incomplete, many TBDs, or lacking the content expected for the point in the lifecycle at which they were delivered.
Data availability: Likelihood that sufficient and timely input data will be available to IV&V when it is needed.
IV&V environment: Overall operating environment given to IV&V. Are artifacts delivered on time and do they contain what is needed for efficient evaluation, and are issues considered fairly and in a timely manner?
Direct dev access: How much access does IV&V have to the developers? If IV&V has to work through several bureaucratic levels to get information from the developers or discuss issues or risks, that will decrease IV&V effectiveness.
Project acceptance: How well does the project accept the IV&V participation? A high score indicates the project participants exhibit a belief that use of IV&V will lead to higher mission success probability. Project management has issued directives exhibiting the right intent in dealing with the IV&V participation.
Dev cooperation: How cooperative are developers, in general, in responding to IV&V requests and suggestions? Do all IV&V issues get immediate attention, or does the development organization tend to ignore or avoid dealing with IV&V concerns? Developers exhibit attention to IV&V concerns and timely/meaningful response. While project acceptance is having the right intent, cooperation is doing the right thing.
Project/IV&V interface: How efficient is the interface between the developer and IV&V, considering access, cooperation, and acceptance?
IV&V experience level: Average experience level of IV&V staff. This is a measure of how much experience the IV&V staff has in the area of IV&V and related activities. This does not consider domain experience level, which is considered in another input. This considers experience in the implementation language, operating system in use, processing platform, and hardware interfaces.
IV&V domain experience/expertise level: Degree of IV&V staff experience with the application domain. Consider all individual domains within the system (e.g., GN&C, power, C&DH, ECLSS, terrain mobility, thermal, etc.). This item should evaluate the extent of applicable domain knowledge within the IV&V staff.
IV&V Staffing level: How appropriate is the IV&V staff size to the needed analysis tasks, based on the CARA results? Too few (or too many) would lower the rating. An adequate staff level should receive a very high score (not average). Descoping the tasks from the CARA results to match a low staff level would get a low score.
Schedule pressure: How much schedule pressure does IV&V face? This factor could be correlated with data timing, but not necessarily. A low score indicates there was heavy schedule pressure; a high score indicates there was little schedule pressure.
Resource availability: Availability of all human resources needed to perform IV&V. A sufficient set of personnel is available to perform the IV&V activities, including consideration of the schedule pressure from the project (e.g., very short turnaround of document review expected), size of staff, and personnel turnover.
IV&V Staff ability: Overall ability of IV&V staff to perform the IV&V tasks defined by the CARA analysis.
Simulation tool use: Degree to which the IV&V organization uses simulation to understand and validate design. This does not refer to simulations to verify requirements but rather to simulations of the conceptual operation of the system or particular subsystems to assure the right design is being specified. This could be a model of the expected environment and the system/subsystem reaction to it at an abstract rather than implementation level. Consider the completeness of those simulations and their coverage of novel or complex areas.
Reasoning tool use: Degree of use of reasoning tools by the IV&V organization. By reasoning tools, we mean tools such as automation of formal methods, model checking, traceability, etc.
Representation tool use: Degree to which the IV&V organization uses representational tools to automate and support design validation. The best example of representation tools is UML, or a combination of design language and graphics such as AADL. This represents the degree to which IV&V develops its own design representation for analysis and the degree to which IV&V has the tools to understand the developer's representation of the design. For the case in which design documentation is very poor and code navigation tools are used to support understanding the design, those tools may be scored here.
IV&V tool effectiveness: An assessment of effectiveness of all tools employed to support design validation, including simulations, formal method support, UML, requirements trace, etc.
Analyses employed: An assessment of the effectiveness of the types of analyses planned, such as scenario analyses, design reading, comparative analysis, and formal methods.
Developer review participation: Assessment of effectiveness of plans for participating in developer milestone reviews, inspections, walkthroughs, etc. Assessment would include the timing at which IV&V enters the project (SRR, SDR, SSR, etc.) and its impact on effectiveness.
IV&V techniques employed effectiveness: Overall level and effectiveness of defect removal techniques used by IV&V.

B.3 Code Issue Subnet

Code Defect Introduction

Design Quality: Quantitative estimate of design quality coming from the design defect introduction subnet.
Heritage: Relative novelty (to the developer or user) of the application/mission or the solution approach. For example, entry GN&C for a new vehicle where all algorithms are adapted from Shuttle would be high heritage (score of 10), provided the new mission is very similar to the Shuttle mission; completely new algorithms or a new application would be low heritage (score of 1).
Problem complexity: Qualitative estimate of overall system/problem complexity. Related to technical difficulty to define the system, required interfaces (among the developer and system), and unity of users. Not correlated with code complexity metrics. Simple (10) to complex (1).
Code stability: How stable is the code? This is driven by the heritage of the approach, design quality, and the proneness of the code group to make errors. The more stable the code, the fewer changes and the lower the risk of introduction of new errors.
Code Problem Space: Overall susceptibility of the problem space to introduction of code defects. Represents the difficulty, based on the complexity of the problem being solved, the algorithms chosen, and the quality and stability of documentation describing the problem, in deriving a correct implementation.
Dev staff experience level: Average experience level of development staff, not specific to the problem domain, but overall experience in software development for the domain type (e.g., real-time embedded flight, financial, ground, manned, etc.). Includes staff experience with the language chosen for implementation and the operating system in use. Also includes experience with the processor to be used and the hardware to be interfaced with.
Dev domain experience: Average experience of development staff in the specific application domain (e.g., laser guidance system, space telescope, crew rescue, etc.). Consider all individual domains within the system (e.g., a space telescope will require GN&C, optics, propulsion, system management, telemetry, etc.).
Dev schedule pressure: How much margin is in the development schedule? Assessment of flexibility in end date. A higher number indicates developers have plenty of time to complete their work; a lower number indicates developers are consistently rushed to deliver products.
Dev budget margin: How tight is the development budget? Assessment of flexibility in cost growth.
External Constraint Pressure: Overall influence level of external constraints related to schedule and budget. How strong is the pressure to proceed without fix in spite of schedule or budget problems?
Process effectiveness actions: How much emphasis does development management place on quality (as opposed to productivity)? For example, does development management ensure that review action items are tracked, and encourage extra analysis of suspected problems? Assessment of effectiveness of process problem reports (e.g., a formal mechanism to document, correct, and publish discrepancies in following the process), process improvement actions, activity of a board to assess effectiveness of the process, etc. An assessment of the 'aliveness' of the developer process and attention to making it work to produce better products.
Dev quality organization: How effective is the embedded quality organization? This measure includes consideration of size, breadth, and depth of capability applied to this project, and the level of authority granted to the quality organization applied to this project.
Process Adherence: How well is the development staff likely to adhere to the documented code development process? Should be based on knowledge of schedule and budget constraints, level of activity related to process effectiveness, and the quality of the organization enforcing the process.
Turnover: Experienced or historical rate of change of staff involved with code development. A higher number indicates little turnover; a lower number indicates a lot of turnover.
Staff level: Assessment of whether the quantity and distribution across domains of staff is sufficient for the problem space. An adequate staff level should receive a very high score (not average). The worst staff level with respect to work required ever seen by the evaluator would receive a score of 1.
Resource Availability: Measure of the degree to which the size of the development staff is sufficient and stable in terms of longevity on the project.
Staff Ability: Overall ability level of the code development staff, in terms of size, development experience, domain experience, and turnover with respect to the problem at hand.
Process definition, product standards, quality criteria: This is an overall assessment of the effectiveness of the development process related to development of code. The assessment should include methods for code development and debug, unit test, integration test, commenting, and coding standards. It should correlate fairly well with CMM level, but includes an assessment of what is really happening in addition to what is documented.
Dev tools: Degree to which the developer uses tools in developing and analyzing code. Tools here include formal method support tools, traceability tools, code navigation, process support, static and dynamic code analysis tools, etc. The assessment should include evaluation of tool support to assure conformance of code to design and requirements (i.e., do tools support a seamless transition to code from design, or are the code support tools completely independent of design and requirements support tools).
Process Rigor: How rigorous can the process be expected to be? Is the process well founded (related to CMM level), is it supported by a good set of tools, and does the development organization pay attention to and follow the process?
Code quality: Overall relative measure of code quality, and hence, probable relative defect density.

Developer Code Defect Removal Efficiency

External Constraint Pressure: Similar to the cost and schedule pressure on code development. In some cases, the developer code inspection/checking process is more tightly constrained than code development.
Process effectiveness actions: Same as the process effectiveness actions input for code development unless there is something unique about the group that reviews the products to find defects.
Defect Removal Effort/Focus: How much effort or focus does the developer apply to defect identification and removal during the code phase? Are users involved to make sure the code is right, is there a rush to complete deliverables, and is the defect removal process effective and adhered to?
Static checkers: Degree and effectiveness of use of automated defect locator tools such as Lint, Coverity, or Polyspace. These are tools that do a static analysis of source code and identify potentially erroneous constructs.
Reasoning tool use: Degree of use of reasoning tools. By reasoning tools, we mean tools such as automation of formal methods, model checking, etc. Examples include SPIN or Rational Rose, which use finite state machine representations in Promela or UML to provide an executable model. Includes the ability of tools to point to defects in code characteristics such as sequencing, timing, homogeneity, etc.
Analytical tool use: Degree to which the development organization uses analytical tools to understand and write/modify code. These include tools such as target system debuggers, code navigators (e.g., Understand), or flowcharters.
Tools effectiveness: Overall level and effectiveness of tool use in the code analysis process for removal of defects.
Review quality: Quality of the embedded review process. For example, do the developers do formal, detailed code reviews? Do they use entry and exit criteria, track and board issues, etc.? Do they hold walkthroughs with broad scope support?
Techniques Employed effectiveness: Overall level and effectiveness of defect removal techniques used by the developer.

IV&V Code Defect Removal Efficiency

Data timing: Relative measure of timeliness of data delivery by the developer to IV&V. Late data delivery places time restrictions on IV&V and can impede IV&V.
Data completeness: Measure of completeness of data submitted to IV&V. For example, if important information is withheld due to proprietary concerns, that will impede IV&V. Another example is source delivered with sections incomplete or many TBDs, or source that does not have the content expected for the point in the lifecycle at which it was delivered.
Data availability: Likelihood that sufficient and timely input data will be available to IV&V when it is needed.
IV&V environment: Overall operating environment given to IV&V. Are artifacts delivered on time and do they contain what is needed for efficient evaluation, and are issues considered fairly and in a timely manner?
Direct dev access: How much access does IV&V have to the developers? If IV&V has to work through several bureaucratic levels to get information from the developers or discuss issues or risks, that will decrease IV&V effectiveness.
Project acceptance: How well does the project accept the IV&V participation? A high score indicates the project participants exhibit a belief that use of IV&V will lead to higher mission success probability. Project management has issued directives exhibiting the right intent in dealing with the IV&V participation.
Dev cooperation: How cooperative are developers, in general, in responding to IV&V requests and suggestions? Do all IV&V issues get immediate attention, or does the development organization tend to ignore or avoid dealing with IV&V concerns? Developers exhibit attention to IV&V concerns and timely/meaningful response. While project acceptance is having the right intent, cooperation is doing the right thing.
Project/IV&V interface: How efficient is the interface between the developer and IV&V, considering access, cooperation, and acceptance?
IV&V experience level: Average experience level of IV&V staff. This is a measure of how much experience the IV&V staff has in the area of IV&V and related activities. This does not consider domain experience level, which is considered in another input. This considers experience in the implementation language, operating system in use, processing platform, and hardware interfaces.
IV&V domain experience/expertise level: Degree of experience of IV&V staff with the application domain. Consider all individual domains within the system (e.g., GN&C, power, C&DH, ECLSS, terrain mobility, thermal, etc.). This item should evaluate the extent of applicable domain knowledge within the IV&V staff.
IV&V Staffing level: How appropriate is the size of the IV&V staff to the analysis tasks that need to be performed based on the CARA results? Too few (or too many) would lower the rating. An adequate staff level should receive a very high score (not average). Descoping the tasks from the CARA results to match a low staff level would get a low score.
Schedule pressure: How much schedule pressure does IV&V face? This factor could be correlated with data timing, but not necessarily. A low score indicates there was heavy schedule pressure; a high score indicates there was little schedule pressure.
Resource availability: Availability of all human resources needed to perform IV&V. A sufficient set of personnel is available to perform the IV&V activities, including consideration of the schedule pressure from the project (e.g., very short turnaround of document review expected), size of staff, and personnel turnover.
IV&V Staff ability: Overall ability of IV&V staff to perform the IV&V tasks defined by the CARA analysis.
Static checkers: Degree and effectiveness of use of automated defect locator tools such as Lint, Coverity, or Polyspace. These are tools that do a static analysis of source code and identify potentially erroneous constructs.
Reasoning tool use: Degree of use of reasoning tools. By reasoning tools, we mean tools such as automation of formal methods, model checking, etc. Examples include SPIN or Rational Rose, which use finite state machine representations in Promela or UML to provide an executable model. Includes the ability of tools to point to defects in code characteristics such as sequencing, timing, homogeneity, etc.
Analytical tool use: Degree to which the IV&V organization uses analytical tools to understand characteristics, structure, and behavior of code. These include tools such as target system debuggers, code navigators (e.g., Understand), or flowcharters.
IV&V tool effectiveness: An assessment of effectiveness of all tools employed to support code validation, including analytical tools, formal method support, automated static analyzers, requirements trace, etc.
Analyses employed: An assessment of the effectiveness of the types of analyses planned, such as Lint analyses, code reading, formal methods, etc.
Developer review participation: Assessment of effectiveness of plans for participating in developer milestone reviews, inspections, walkthroughs, etc. Assessment would include the timing at which IV&V enters the project (SRR, SDR, SSR, etc.) and its impact on effectiveness.
IV&V techniques employed effectiveness: Overall level and effectiveness of defect removal techniques used by IV&V.

B.4 Test Issue Subnet

Test Defect Introduction

Test Plan Quality: Quantitative estimate of test plan quality. Consider whether the test plan contains clear definition of test cases, their objectives, complete description of test equipment and its capabilities, coverage of requirements, and a schedule for test.
Requirements Stability: Estimate of rate of change of the requirements. If the requirements are changing rapidly, it becomes difficult to define test cases to verify the requirements. The requirements stability comes from the requirements phase BBN.
Test Description Quality: Estimate of the quality of the test description. Does this document contain a good design for the test cases? Does it contain details as to how each test case will be implemented, what equipment it will use and how, what constraints are imposed, success criteria, inputs for each test case, models to be used and their required fidelity, data to be collected and how that will be accomplished (method for access of data), required control of the target computer, and analyses to be completed and how that will be accomplished?
Test Procedures Stability: How stable are the procedures? This is driven by the test plan quality, test description quality, requirements stability, and the proneness of the test group to make errors. The more stable the procedures, the fewer changes and the lower the risk of introduction of new errors.
Problem Complexity: Qualitative estimate of overall system/problem complexity. Related to technical difficulty to define, required interfaces (developer and system), and unity of users. Not correlated with code complexity metrics. Simple (10) to complex (1).
Test Problem Space: Overall susceptibility of the problem space to introduction of test defects. Represents the difficulty, based on the complexity of the problem being solved and the quality and stability of documentation describing the problem, in deriving a correct implementation (set of test procedures).
Dev staff experience level: Average experience level of development staff, not specific to the problem domain, but overall experience in software testing for the domain type (e.g., real-time embedded flight, financial, ground, manned, etc.). Includes staff experience with the language chosen for implementation and the operating system in use. Also includes experience with the processor to be used and the hardware to be interfaced with, as well as experience with the test equipment to be used and development of test cases for the type of software being tested.
Dev domain experience: Average experience of development staff in testing the specific application domain (e.g., laser guidance system, space telescope, crew rescue, etc.). Consider all individual domains within the system (e.g., a space telescope will require GN&C, optics, propulsion, system management, telemetry, etc.).
Dev schedule pressure: How much margin is in the testing schedule? Assessment of flexibility in end date. A higher number indicates developers have plenty of time to complete their work; a lower number indicates developers are consistently rushed to deliver products.
Dev budget margin: How tight is the development budget? Assessment of flexibility in cost growth.
External Constraint Pressure: Overall influence level of external constraints related to schedule and budget. How strong is the pressure to proceed without fix in spite of schedule or budget problems?
Process effectiveness actions: How much emphasis does development management place on quality (as opposed to productivity)? For example, does development management ensure that review action items are tracked and encourage extra analysis of suspected problems? Assessment of effectiveness of process problem reports (e.g., a formal mechanism to document, correct, and publish discrepancies in following the process), process improvement actions, activity of a board to assess effectiveness of the process, etc. An assessment of the 'aliveness' of the developer process and attention to making it work to produce better products.
Dev quality organization: How effective is the embedded quality organization? This measure includes consideration of size, breadth, and depth of capability applied to this project, and the level of authority granted to the quality organization applied to this project.
Process Adherence: How well is the development staff likely to adhere to the documented test development process? Should be based on knowledge of schedule and budget constraints, level of activity related to process effectiveness, and the quality of the organization enforcing the process.
Turnover: Experienced or historical rate of change of staff involved with test development. A higher number indicates little turnover; a lower number indicates a lot of turnover.
Staff level: Assessment of whether the quantity and distribution across domains of staff is sufficient for the problem space. An adequate staff level should receive a very high score (not average). The worst staff level with respect to work required ever seen by the evaluator would receive a score of 1.
Resource Availability: Measure of the degree to which the size of the development staff is sufficient and stable in terms of longevity on the project.
Staff Ability: Overall ability level of the test development staff, in terms of size, development experience, domain experience, and turnover with respect to the problem at hand.
Process definition, product standards, quality criteria: This is an overall assessment of the effectiveness of the development process related to development of test. The assessment should include methods for test procedure development and debug, effective use of the test equipment, effective methods for data acquisition, test execution, documentation, and methods for development of analyses for verification. It should correlate fairly well with CMM level, but includes an assessment of what is really happening in addition to what is documented.
Dev tools: Degree to which the developer uses tools in developing and analyzing testing. Tools here include requirements coverage support, traceability tools, code coverage support, test case generation, execution control, etc.
Process Rigor: How rigorous can the process be expected to be? Is the process well founded (related to CMM level), is it supported by a good set of tools, and does the development organization pay attention to and follow the process?
Test quality: Overall relative measure of test quality, and hence, probable relative defect density.

Developer Test Defect Removal Efficiency

External Constraint Pressure: Similar to the cost and schedule pressure on test procedure development. In some cases, the developer test procedure/checking process is more tightly constrained than test procedure development because of late development.
Process effectiveness actions: Same as the process effectiveness actions input for test procedure development unless there is something unique about the group that reviews the products to find defects.
Defect Removal Effort/Focus: How much effort or focus does the developer apply to defect identification and removal during the test phase? Are external organizations involved to make sure test procedures are right, is there a rush to complete deliverables, and is the defect removal process effective and adhered to?
Simulation Validity: Degree of verification of simulations used to test flight software requirements. The flight software is not verified unless the models used to verify it are shown to be correct.
Test Case Automation: Degree and effectiveness of use of automated test case generation and requirements coverage analysis tools.
Analytical tool use: Degree to which the development organization uses analytical tools to understand and validate design and implementation. These include tools such as target system control for breakpoints and data access, or code navigators. This also includes tools to support understanding of interfaces (in particular, sequence and timing of data exchange, interrupt timing and frequency, potential range and quantity of data, operational dynamics of interfacing hardware, and environmental requirements of interfacing hardware) to support the definition of effective test cases. Consider coverage of novel or complex areas.
Tools effectiveness: Overall level and effectiveness of tool use in the test procedure analysis process for removal of defects.
Review quality: Quality of the embedded review process. For example, do the developers do formal, detailed test procedure reviews? Do they use entry and exit criteria, track and board issues, etc.? Do they hold walkthroughs with broad scope support?
Techniques Employed effectiveness: Overall level and effectiveness of defect removal techniques used by the developer.

IV&V Test Defect Removal Efficiency

Data timing: Relative measure of timeliness of data delivery by the developer to IV&V. Late data delivery places time restrictions on IV&V and can impede IV&V.
Data completeness: Measure of completeness of data submitted to IV&V. For example, if important information is withheld due to proprietary concerns, that will impede IV&V. Another example is test procedures delivered with sections incomplete or many TBDs, or a test description that does not have the content expected for the point in the lifecycle at which it was delivered.
Data availability: Likelihood that sufficient and timely input data will be available to IV&V when it is needed.
IV&V environment: Overall operating environment given to IV&V. Are artifacts delivered on time and do they contain what is needed for efficient evaluation, and are issues considered fairly and in a timely manner?
Direct dev access: How much access does IV&V have to the developers? If IV&V has to work through several bureaucratic levels to get information from the developers or discuss issues or risks, that will decrease IV&V effectiveness.
Project acceptance: How well does the project accept the IV&V participation? A high score indicates the project participants exhibit a belief that use of IV&V will lead to higher mission success probability. Project management has issued directives exhibiting the right intent in dealing with the IV&V participation.
Dev cooperation: How cooperative are developers, in general, in responding to IV&V requests and suggestions? Do all IV&V issues get immediate attention, or does the development organization tend to ignore or avoid dealing with IV&V concerns? Developers exhibit attention to IV&V concerns and timely/meaningful response. While project acceptance is having the right intent, cooperation is doing the right thing.
Project/IV&V interface: How efficient is the interface between the developer and IV&V, considering access, cooperation, and acceptance?
IV&V experience level: Average experience level of IV&V staff. This is a measure of how much experience the IV&V staff has in the area of IV&V and testing activities. This does not consider domain experience level, which is considered in another input. This considers experience in the implementation language, operating system in use, processing platform, and hardware interfaces.
IV&V domain experience/expertise level: Degree of experience of IV&V staff with the application domain. Consider all individual domains within the system (e.g., GN&C, power, C&DH, ECLSS, terrain mobility, thermal, etc.). This item should evaluate the extent of applicable domain knowledge within the IV&V staff.
IV&V Staffing level: How appropriate is the size of the IV&V staff to the analysis tasks that need to be performed based on the CARA results? Too few (or too many) would lower the rating. An adequate staff level should receive a very high score (not average). Descoping the tasks from the CARA results to match a low staff level would get a low score.
Schedule pressure: How much schedule pressure does IV&V face? This factor could be correlated with data timing, but not necessarily. A low score indicates there was heavy schedule pressure; a high score indicates there was little schedule pressure.
Resource availability: Availability of all human resources needed to perform IV&V. A sufficient set of personnel is available to perform the IV&V activities, including consideration of the schedule pressure from the project (e.g., very short turnaround of document review expected), size of staff, and personnel turnover.
IV&V Staff ability: Overall ability of IV&V staff to perform the IV&V tasks defined by the CARA analysis.
Test Automation: Degree and effectiveness of use of coverage analysis tools.
Understanding of developer tool use: Degree to which the IV&V organization understands the tools to be used by the developer and is able to assess effectiveness of usage.
Tool Use: An assessment of effectiveness of all tools employed to support test procedure analysis, including automation tools and understanding of developer tools.
Analyses employed: An assessment of the effectiveness of the types of analyses planned, such as scenario analyses, simulation, or similarity, to determine completeness and correctness of test procedures.
Developer review participation: Assessment of effectiveness of plans for participating in developer milestone reviews, inspections, walkthroughs, etc. Assessment would include the timing at which IV&V enters the project (SRR, SDR, SSR, etc.) and its impact on effectiveness.
IV&V techniques employed effectiveness: Overall level and effectiveness of defect removal techniques used by IV&V.

B.5 Integration Issue Subnet

Integration Defect Introduction

System Requirements Stability: Quantitative estimate of the quality and stability of the system requirements being verified in the integration testing.
Stakeholder Involvement: Degree to which the system users are involved in the integration test process and the timeliness of that involvement. Note that a high score here requires involvement or representation (by users or system engineering, for example, with equivalent knowledge) of all key system users, not just system operators.
Integration Stability: Probability of change to the driving requirements and user desires affecting the development of the integration test procedures.
Integration Test Plan Quality: Quantitative estimate of integration test plan quality. This should include completeness of test case definition in terms of needed inputs, success criteria, needed equipment, needed data access, and requirements coverage.
Integration Test Procedures Stability: How stable are the procedures? This is driven by the integration test plan quality, integration test description quality, system and interface requirements stability, and the proneness of the test group to make errors. The more stable the procedures, the fewer changes and the lower the risk of introduction of new errors.
Problem Complexity: Qualitative estimate of overall system/problem complexity. Related to technical difficulty to define, required interfaces (developer and system), and unity of users. Not correlated with code complexity metrics. Simple (10) to complex (1).
Integration Test Equipment Attributes: Rating of capability and complexity of equipment and simulations to be used in performing system level verification. Capability is rated in terms of ability to perform all functions needed as a part of requirements verification. Complexity is rated in terms of ease and reliability of use of the capabilities.
Integration Test Problem Space: Overall susceptibility of the problem space to introduction of test defects. Represents the difficulty, based on the complexity of the problem being solved and the quality and stability of documentation describing the problem, in deriving a correct implementation (set of test procedures).
Dev staff experience level: Average experience level of development staff, not specific to the problem domain, but overall experience in software testing for the domain type (e.g., real-time embedded flight, financial, ground, manned, etc.). Includes staff experience with the language chosen for implementation and the operating system in use. Also includes experience with the processor to be used and the hardware to be interfaced with, as well as experience with the test equipment to be used and development of test cases for the type of software being tested.
Dev domain experience: Average experience of development staff in testing the specific application domain (e.g., laser guidance system, space telescope, crew rescue, etc.). Consider all individual domains within the system (e.g., a space telescope will require GN&C, optics, propulsion, system management, telemetry, etc.).
Dev schedule pressure: How much margin is in the integration testing schedule? Assessment of flexibility in end date. A higher number indicates developers have plenty of time to complete their work; a lower number indicates developers are consistently rushed to deliver products.
Dev budget margin: How tight is the development budget? Assessment of flexibility in cost growth.
External Constraint Pressure: Overall influence level of external constraints related to schedule and budget. How strong is the pressure to proceed without fix in spite of schedule or budget problems?
Process effectiveness actions: How much emphasis does development management place on quality (as opposed to productivity)? For example, does development management ensure that review action items are tracked and encourage extra analysis of suspected problems? Assessment of effectiveness of process problem reports (e.g., a formal mechanism to document, correct, and publish discrepancies in following the process), process improvement actions, activity of a board to assess effectiveness of the process, etc. An assessment of the 'aliveness' of the developer process and attention to making it work to produce better products.
Dev quality organization: How effective is the embedded quality organization? This measure includes consideration of size, breadth, and depth of capability applied to this project, and the level of authority granted to the quality organization applied to this project.
Process Adherence: How well is the development staff likely to adhere to the documented integration test development process? Should be based on knowledge of schedule and budget constraints, level of activity related to process effectiveness, and the quality of the organization enforcing the process.
Turnover: Experienced or historical rate of change of staff involved with integration test development. A higher number indicates little turnover; a lower number indicates a lot of turnover.
Staff level: Assessment of whether the quantity and distribution across domains of staff is sufficient for the problem space. An adequate staff level should receive a very high score (not average). The worst staff level with respect to work required ever seen by the evaluator would receive a score of 1.
Resource Availability: Measure of the degree to which the size of the integration test development staff is sufficient and stable in terms of longevity on the project.
Staff Ability: Overall ability level of the integration test development staff, in terms of size, development experience, domain experience, and turnover with respect to the problem at hand.
Process definition, product standards, quality criteria: This is an overall assessment of the effectiveness of the development process related to development of integration test. The assessment should include methods for integration test procedure development and debug, effective use of the integration test equipment, effective methods for data acquisition, test execution, documentation, and methods for development of analyses for verification. It should correlate fairly well with CMM level, but includes an assessment of what is really happening in addition to what is documented.
Dev tools: Degree to which the developer uses tools in developing and analyzing integration testing. Tools here include requirements coverage support, traceability tools, interface coverage support, test case generation, execution control, etc.
Process Rigor: How rigorous can the process be expected to be? Is the process well founded (related to CMM level), is it supported by a good set of tools, and does the development organization pay attention to and follow the process?
Integration Test quality: Overall relative measure of integration test quality, and hence, probable relative defect density.
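The inputs above are elicited as qualitative scores on a 1 (worst) to 10 (best) scale and rolled up through intermediate nodes (e.g., Staff Ability, Process Rigor, Integration Test Problem Space) into the Integration Test quality node. The short Python sketch below illustrates one way such a roll-up could be encoded for a quick sensitivity check; the node names follow the table, but the equal-weight averaging and the example scores are illustrative assumptions, not the elicited probability distributions used in the prototype BBN.

# Illustrative sketch only: encodes a few of the Integration Defect Introduction
# inputs defined above as 1-10 scores and rolls them up into intermediate nodes.
# The equal-weight averaging is an assumption, not the calibrated model.

def average(scores):
    """Equal-weight roll-up of child scores into a parent node (assumed)."""
    return sum(scores) / len(scores)

# Leaf inputs scored 1 (worst) to 10 (best), per the conventions in this appendix.
inputs = {
    "Integration Test Plan Quality": 7,
    "System Requirements Stability": 6,
    "Problem Complexity": 4,          # simple (10) to complex (1)
    "Dev staff experience level": 8,
    "Dev domain experience": 7,
    "Turnover": 9,                    # higher score = little turnover
    "Staff level": 8,
    "Process Adherence": 6,
    "Dev tools": 5,
}

# Intermediate nodes named in the table, each derived from a subset of inputs.
staff_ability = average([inputs["Dev staff experience level"],
                         inputs["Dev domain experience"],
                         inputs["Turnover"],
                         inputs["Staff level"]])
process_rigor = average([inputs["Process Adherence"], inputs["Dev tools"]])
problem_space = average([inputs["Integration Test Plan Quality"],
                         inputs["System Requirements Stability"],
                         inputs["Problem Complexity"]])

# "Integration Test quality" summary node: a higher score implies lower defect density.
integration_test_quality = average([staff_ability, process_rigor, problem_space])
print(f"Integration Test quality score: {integration_test_quality:.1f} / 10")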
Developer Integration Defect Removal Efficiency

External Constraint Pressure: Similar to the cost and schedule pressure on integration test procedure development. In some cases, the developer test procedure/checking process is more tightly constrained than test procedure development because of late development.
Process effectiveness actions: Same as the process effectiveness actions input for integration test procedure development unless there is something unique about the group that reviews the products to find defects.
Defect Removal Effort/Focus: How much effort or focus does the developer apply to defect identification and removal during the integration test phase? Are external organizations involved to make sure integration test procedures are right, is there a rush to complete deliverables, and is the defect removal process effective and adhered to?
Simulation Validity: Degree of verification of simulations used to test system requirements. The models used to verify requirements must, themselves, be verified.
Test Case Automation: Degree and effectiveness of use of automated test case generation and requirements coverage analysis tools.
Analytical tool use: Degree to which the development organization uses analytical tools to understand and verify design and implementation. These include tools such as target system control for breakpoints and data access, or code navigators. This also includes tools to support understanding of interfaces (in particular, sequence and timing of data exchange, interrupt timing and frequency, potential range and quantity of data, operational dynamics of interfacing hardware, and environmental requirements of interfacing hardware) to support the definition of effective test cases. Consider coverage of novel or complex areas.
Tools effectiveness: Overall level and effectiveness of tool use in the integration test procedure analysis process for removal of defects.
Review quality: Quality of the embedded review process. For example, do the developers do formal, detailed code reviews? Do they use entry and exit criteria, track and board issues, etc.? Do they hold walkthroughs with broad scope support?
Techniques Employed effectiveness: Overall level and effectiveness of defect removal techniques used by the developer.

IV&V Integration Defect Removal Efficiency

Data timing: Relative measure of timeliness of data delivery by the developer to IV&V. Late data delivery places time restrictions on IV&V and can impede IV&V.
Data completeness: Measure of completeness of data submitted to IV&V. For example, if important information is withheld due to proprietary concerns, that will impede IV&V. Another example is integration test procedures delivered with sections incomplete or many TBDs, or a test description that does not have the content expected for the point in the lifecycle at which it was delivered.
Data availability: Likelihood that sufficient and timely input data will be available to IV&V when it is needed.
IV&V environment: Overall operating environment given to IV&V. Are artifacts delivered on time and do they contain what is needed for efficient evaluation, and are issues considered fairly and in a timely manner?
Direct dev access: How much access does IV&V have to the developers? If IV&V has to work through several bureaucratic levels to get information from the developers or discuss issues or risks, that will decrease IV&V effectiveness.
Project acceptance: How well does the project accept the IV&V participation? A high score indicates the project participants exhibit a belief that use of IV&V will lead to higher mission success probability. Project management has issued directives exhibiting the right intent in dealing with the IV&V participation.
Dev cooperation: How cooperative are developers, in general, in responding to IV&V requests and suggestions? Do all IV&V issues get immediate attention, or does the development organization tend to ignore or avoid dealing with IV&V concerns? Developers exhibit attention to IV&V concerns and timely/meaningful response. While project acceptance is having the right intent, cooperation is doing the right thing.
Project/IV&V interface: How efficient is the interface between the developer and IV&V, considering access, cooperation, and acceptance?
IV&V experience level: Average experience level of IV&V staff. This is a measure of how much experience the IV&V staff has in the area of IV&V and integration testing activities. This does not consider domain experience level, which is considered in another input. This considers experience in the implementation language, operating system in use, processing platform, and hardware interfaces.
IV&V domain experience/expertise level: Degree of experience of IV&V staff with the application domain. Consider all individual domains within the system (e.g., GN&C, power, C&DH, ECLSS, terrain mobility, thermal, etc.). This item should evaluate the extent of applicable domain knowledge within the IV&V staff.
IV&V Staffing level: How appropriate is the size of the IV&V staff to the analysis tasks that need to be performed based on the CARA results? Too few (or too many) would lower the rating. An adequate staff level should receive a very high score (not average). Descoping the tasks from the CARA results to match a low staff level would get a low score.
Schedule pressure: How much schedule pressure does IV&V face? This factor could be correlated with data timing, but not necessarily. A low score indicates there was heavy schedule pressure; a high score indicates there was little schedule pressure.
Resource availability: Availability of all human resources needed to perform IV&V. A sufficient set of personnel is available to perform the IV&V activities, including consideration of the schedule pressure from the project (e.g., very short turnaround of document review expected), size of staff, and personnel turnover.
IV&V Staff ability: Overall ability of IV&V staff to perform the IV&V tasks defined by the CARA analysis.
Test Automation: Degree and effectiveness of use of coverage analysis tools.
Understanding of developer tool use: Degree to which the IV&V organization understands the tools to be used by the developer and is able to assess effectiveness of usage.
Tool Use: An assessment of effectiveness of all tools employed to support integration test procedure analysis, including automation tools and understanding of developer tools.
Analyses employed: An assessment of the effectiveness of the types of analyses planned, such as scenario analyses, simulation, or similarity, to determine completeness and correctness of test procedures.
Developer review participation: Assessment of effectiveness of plans for participating in developer milestone reviews, inspections, walkthroughs, etc. Assessment would include the timing at which IV&V enters the project (SRR, SDR, SSR, etc.) and its impact on effectiveness.
IV&V techniques employed effectiveness: Overall level and effectiveness of defect removal techniques used by IV&V.
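Taken together, the five subnets defined in this appendix supply, for each lifecycle phase, a defect introduction node and two defect removal efficiency nodes (developer and IV&V); defects not removed in one phase leak forward to the next, and the model is evaluated by Monte Carlo sampling. The Python sketch below illustrates that chaining under simple assumptions: the triangular distributions and the rate and efficiency values are placeholders for the elicited node PDFs and calibrated values, not the calibrated prototype model.

# Minimal Monte Carlo sketch (assumptions only, not the calibrated model):
# each lifecycle phase has a defect-introduction rate and two removal
# efficiencies (developer and IV&V) drawn from triangular distributions whose
# parameters would, in the real model, be driven by the subnet inputs defined
# in this appendix. Defects not removed in a phase leak to the next phase.

import random

PHASES = ["requirements", "design", "code", "test", "integration"]

def tri(low, mode, high):
    """Sample a triangular distribution given (low, mode, high)."""
    return random.triangular(low, high, mode)

# (low, mode, high) parameters -- purely illustrative placeholder values.
intro_rate = {p: (5, 10, 20) for p in PHASES}       # defects introduced in each phase
dev_eff = {p: (0.4, 0.6, 0.8) for p in PHASES}      # developer removal efficiency
ivv_eff = {p: (0.1, 0.25, 0.4) for p in PHASES}     # additional IV&V removal efficiency

def mean_leaked_defects(use_ivv, trials=10000):
    """Average number of defects leaking past integration, with or without IV&V."""
    total = 0.0
    for _ in range(trials):
        leaked = 0.0
        for p in PHASES:
            defects = leaked + tri(*intro_rate[p])
            removed = defects * tri(*dev_eff[p])
            if use_ivv:
                removed += (defects - removed) * tri(*ivv_eff[p])
            leaked = defects - removed
        total += leaked
    return total / trials

print("Mean leaked defects without IV&V:", round(mean_leaked_defects(False), 1))
print("Mean leaked defects with IV&V:   ", round(mean_leaked_defects(True), 1))

Comparing the two runs gives the kind of with/without IV&V defect-leakage difference that, after costing, would feed an ROI estimate; the actual report derives these quantities from the elicited PDFs and the function point ratio calibration described in the body of the report.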