\title{Improving IV\&V Techniques Through the Analysis of Project Anomalies: Bayes Networks (Preliminary Report)}

\begin{abstract}
The goal of this project is the creation of a process by which conclusions learned from one IV\&V project can be applied to another. In particular, we seek methods whereby an agent can say ``that's odd''; i.e., detect anomalies and propose repairs in active NASA projects. Given the current state of business knowledge and IV\&V project data recorded at the IV\&V facility, the methods proposed in the original plan (semantic web frame-based generalization and specialization over ontologies describing IV\&V business practices) are not supportable. Hence, this report describes an alternate direction. Instead of working ``top-down'' from descriptions of business knowledge that may never exist, or ``bottom-up'' from data that may never be available, this project now focuses on ``middle-out'' and will try to combine the available data/models into a semantic whole.

The currently available data on NASA business practices (that this author can access) are:
\bi
\item The SILAP model/databases from the IV\&V planning and scoping team;
\item The Bayes networks generated by James Dabney describing the IV\&V business practices of one of the IV\&V contractors (L3);
\item The PITS issue tracking data;
\item The LINKER database project that intends to join PITS to other data sources;
\item The balanced scorecard strategy maps from NASA Langley;
\item The COCOMO data sets from JPL. (Note: these data sets are being explored by another project. This project will not use its funding to directly explore COCOMO. However, we intend to lever conclusions from that project into this project.)
\ei

At SAS/06, a preliminary report described what had been learned from the SILAP data. This report presents preliminary results on our use of the Dabney belief networks. It also offers some background notes on the entire problem. Subsequent reports will expand these preliminary results into final conclusions. Those subsequent reports will focus on each of the other data sources above, one by one, as well as explore how to combine the above data sources into one anomaly detector.
\end{abstract}

\section{Introduction: From Data to Information}

Worms and humans both see data. Worms spend their lives digging in mud while humans ride rockets to outer space. Why? Humans soar out of the mud because they use {\em model-based reasoning} to transform data into information. Model-based reasoning lets us generalize from the past and form predictions about the future. Models let us look before we leap. For every bridge that is built, a hundred more are designed and rejected using a variety of models: mathematical models showing a bridge's statics and dynamics; finite element simulations; or the numerous legislative models that try to maximize for safety, aesthetics, etc. Models not only let us make decisions; they also let us audit those decisions. Returning to the bridge example, engineers use their models, and the calculations they make from those models, to justify their decisions.

This project is about getting NASA's software development ``out of the mud''. Specifically, we seek model-based tools that allow data collected from active NASA development projects to be converted into information such as ``this project requires urgent management action''. Experienced analysts at the NASA IV\&V facility already perform this task. Sadly, those decisions are rarely model-based.
Based on an eight-year association with that facility, I assert that IV\&V lacks the models that a critical external auditor could use to certify or criticize the decisions made at this facility. In our ideal scenario, the IV\&V team has access to models connecting what is {\em observable} within IV\&V projects to high-level business {\em objectives}. The model can be used to raise alerts when newly arrived data shows that the business objectives for a project are under threat.

\section{Components}

The original vision of this project was ``anomaly detection''. However, on reflection, that vision must be extended: this project is incomplete without the components listed below. If this project levers prior and current NASA research projects, then all of these components are achievable in the time frame of this project.

\be
\item {\em Data hooks}: Without data hooks, the anomaly detectors have no raw data to work from. This project will hook into the LINKER database, currently under development. The details of those hooks will be discussed in a report due December 31, 2006.
\item {\em Model}: Raw data, without some interpretation model, can't be interpreted (by definition). Leake advises that interpretation (which he calls {\em explanation}) is a situation-specific construct that must be tuned to (1)~the audience; and (2)~the goals of the audience~\cite{leake91}. Hence, prior to the construction of a model that can interpret data, some commitments must be made regarding who will read the model, and why. One way to model the audience and their business goals is the {\em balanced scorecard} (BSC) method of Kaplan and Norton~\cite{kaplan96}. In BSC, a model contains:
\bi
\item High-level business {\em objectives};
\item {\em Measures} that connect raw data to objectives;
\item {\em Target} values for the measures, as set by defined {\em initiatives}.
\ei
Without a {\em model}, the anomaly detectors have no way to interpret the consequence of an anomaly, nor will the data hooks have anything to hook into. (A small sketch of such a model appears at the end of this section.)
\item {\em Modeling tools}: It is unlikely that any model we create will appeal to all users. Hence, not only do we need a model, but we also need to give our users {\em access to modeling tools} that let them create alternate models or modify existing ones. If we do not do so, then we run the risk of reporting anomalies that are irrelevant to certain users.
\item {\em Calibration}: Before users accept the anomaly reports from our models, the models must be calibrated, lest the users dismiss our anomaly reports with ``that model is not relevant to our domain''.
\item {\em Anomaly detector}: Once the above is in place, we can operationalize an anomaly detector.
\item {\em Fault localization}: The first thing the users will ask is ``what is the cause of the anomaly?''; i.e., anomaly detectors need fault localization methods.
\item {\em Repair}: The second thing the users will ask is ``how do we fix the anomaly?''; i.e., anomaly detectors need repair modules.
\item {\em Explanation}: Finally, to be comprehensible, all the above must be explainable to the users.
\ee

The rest of this section discusses the practicality of our tool set.

\subsection{Data Hooks}

{\em Without data hooks, the anomaly detectors have no raw data to work from}: the whole approach is blind. The hooks we build must demonstrate their utility on multiple projects.

\subsection{Uncertainty}

A secondary goal of this project is the management of uncertainty: not all of the variables in our models will be known, so our inference procedures must support uncertain reasoning.

In summary, this project seeks to apply model-based methods to IV\&V; specifically, a project monitor that can recognize anomalies in NASA software developments, and do so early enough for repair actions to be proposed to the projects.
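As promised in the components list above, here is a minimal sketch of what a scorecard-style model might look like in code. This is an illustration only: the class names, the {\tt issue\_close\_rate} measure, the target of 0.8, and the data snapshot are all hypothetical inventions for this report, not actual LINKER or PITS fields. The only assumption is that measures map raw project data to numeric scores that are compared against targets.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Measure:
    """Connects raw project data to a business objective."""
    name: str
    extract: Callable[[Dict], float]   # maps raw project data to a score
    target: float                      # target value, set by an initiative

@dataclass
class Objective:
    """A high-level business objective plus its supporting measures."""
    name: str
    measures: List[Measure] = field(default_factory=list)

    def under_threat(self, raw: Dict) -> List[str]:
        """Names of the measures whose current values miss their targets."""
        return [m.name for m in self.measures if m.extract(raw) < m.target]

# Hypothetical example: one objective, one invented measure.
on_time = Objective("deliver on schedule", [
    Measure("issue_close_rate",
            extract=lambda raw: raw["closed"] / max(raw["opened"], 1),
            target=0.8)])

snapshot = {"closed": 30, "opened": 50}   # fake PITS-style data
print(on_time.under_threat(snapshot))     # -> ['issue_close_rate']
\end{verbatim}

One design choice is worth noting: each measure carries its own extraction function, so users could build the alternate models called for above by swapping measures in and out, without touching the rest of the machinery.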
\section{Elements of the Anomaly Detector}

Anomaly detection requires a {\em data source} and a {\em model}; here, the model is a Bayes network. The nodes $N$ of that network comprise:
\bi
\item $Obs$, the {\em base} variables: data observed from the projects;
\item $Derived$: inferences that can be made from the base variables;
\item $Obj \subseteq Derived$: the business goals; i.e., what you can report to the CEO (the {\em objectives} of Kaplan and Norton's BSC).
\ei
Not all of these variables will be known; uncertain reasoning is all there is. The edges of the network form the ``spider web'' along which inference propagates.

The {\em inference procedures} over that model must:
\bi
\item log prior data;
\item build expectations from that data;
\item detect anomalies (differences between the old and the current data);
\item offer repair actions;
\item explain the anomalies and the repairs.
\ei
The {\em background distributions} used to build expectations should poll all available sources, and {\em calibration} should check the model against prior solutions.

With Bayes nets, the plan is hence: data collection (with no intermediaries); runtime anomaly detection (``that's odd...''); diagnosis (what, exactly, is odd); and repair (what to do). Treatment learning offers the same tool for all three of detection, diagnosis, and repair.
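To close, here is a minimal sketch of the inference loop above: log prior data, build expectations, then flag new data that the old data makes unlikely. As a simplifying assumption, it replaces full Bayes-net inference with independent, Laplace-smoothed frequency counts per attribute, and its records and threshold are invented; it shows the shape of the ``that's odd'' test, not the Dabney networks.

\begin{verbatim}
import math
from collections import Counter, defaultdict
from typing import Dict

class AnomalyDetector:
    """Flags records whose attribute values were rare in prior data.

    A stand-in for Bayes-net inference: each attribute gets an
    independent background distribution learned from logged records.
    """
    def __init__(self, threshold: float = -6.0):
        self.threshold = threshold          # log-likelihood cutoff
        self.counts = defaultdict(Counter)  # attribute -> value counts
        self.totals = Counter()             # attribute -> records seen

    def log(self, record: Dict[str, str]) -> None:
        """Log prior data: grow the background distributions."""
        for attr, value in record.items():
            self.counts[attr][value] += 1
            self.totals[attr] += 1

    def loglike(self, record: Dict[str, str]) -> float:
        """Score a new record against expectations (smoothed counts)."""
        ll = 0.0
        for attr, value in record.items():
            seen, total = self.counts[attr][value], self.totals[attr]
            ll += math.log((seen + 1) / (total + len(self.counts[attr]) + 1))
        return ll

    def thats_odd(self, record: Dict[str, str]) -> bool:
        """Anomaly = new data that the old data makes very unlikely."""
        return self.loglike(record) < self.threshold

# Invented records, standing in for PITS/LINKER issue data.
detector = AnomalyDetector()
for _ in range(100):
    detector.log({"severity": "low", "phase": "test"})
print(detector.thats_odd({"severity": "low", "phase": "test"}))     # False
print(detector.thats_odd({"severity": "high", "phase": "launch"}))  # True
\end{verbatim}

Treatment learning would then take over where this sketch stops: given the ``odd'' records, it would search for the attribute settings that select for (diagnosis) or against (repair) the anomalous outcomes.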