\title{Improving IV\&V Techniques Through the Analysis of Project Anomalies: Bayes Networks (Preliminary Report)}

\begin{abstract}
The goal of this project is the creation of a process by which conclusions learned from one IV\&V project can be applied to another. In particular, we seek methods whereby an agent can say ``that's odd''; i.e., detect anomalies and propose repairs in active NASA projects. Given the current state of business knowledge and IV\&V project data recorded at the IV\&V facility, the methods proposed in the original plan (semantic web frame-based generalization and specialization over ontologies describing IV\&V business practices) are not supportable. Hence, this report describes an alternate direction. Instead of working ``top-down'' from descriptions of business knowledge that may never exist, or ``bottom-up'' from data that may never be available, this project now focuses on ``middle-out'' and will try to combine the available data/models into a semantic whole.

The currently available data on NASA business practices (that this author can access) are:
\bi
\item The SILAP model/databases from the IV\&V planning and scoping team;
\item The Bayes networks generated by James Dabney describing the IV\&V business practices of one of the IV\&V contractors (L3);
\item The PITS issue tracking data;
\item The LINKER database project that intends to join PITS to other data sources;
\item The balanced scorecard strategy maps from NASA Langley;
\item The COCOMO data sets from JPL. (Note: these data sets are being explored by another project. This project will not use its funding to directly explore COCOMO. However, we intend to lever conclusions from that project into this project.)
\ei

At SAS/06, a preliminary report described what had been learned from the SILAP data. This report presents preliminary results on our use of the Dabney belief networks. It also offers some background notes on the entire problem. Subsequent reports will expand these preliminary results into final conclusions. Those subsequent reports will focus on each of the other data sources above, one by one, as well as explore how to combine the above data sources into one anomaly detector.
\end{abstract}

\section{Introduction: From Data to Information}

Worms and humans both see data. Worms spend their lives digging in mud while humans ride rockets to outer space. Why? Humans soar out of the mud because they use {\em model-based reasoning} to transform data into information. Model-based reasoning lets us generalize from the past and form predictions about the future. Models let us look before we leap. For every bridge that is built, a hundred more are designed and rejected using a variety of models: mathematical models showing a bridge's statics and dynamics; finite element simulations; or the numerous legislative models that try to maximize for safety, aesthetics, etc. Models not only let us make decisions; they also let us audit those decisions. Returning to the bridge example, engineers use their models, and the calculations they make from those models, to justify their decisions.

This project is about getting NASA's software development ``out of the mud''. Specifically, we seek model-based tools that allow data collected from active NASA development projects to be converted into information such as ``this project requires urgent management action''. Experienced analysts at the NASA IV\&V facility already perform this task. Sadly, those decisions are rarely model-based.
Based on an eight-year association with that facility, I assert that IV\&V lacks the models that a critical external auditor could use to certify or criticize the decisions made at this facility. In our ideal scenario, the IV\&V team has access to models connecting what is {\em observable} within IV\&V projects to high-level business {\em objectives}. The model can be used to raise alerts when newly arrived data shows that the business objectives for a project are under threat.

\section{Components}

The original vision of this project was ``anomaly detection''. However, on reflection, that vision must be extended: this project is incomplete without the components listed below. If this project levers prior and current NASA research projects, then all of these components are achievable in the time frame of this project.

\be
\item {\em Data hooks}: Without data hooks, the anomaly detectors have no raw data to work from. This project will hook into the LINKER database, currently under development. The details of those hooks will be discussed in a report due December 31, 2006.
\item {\em Model}: Raw data, without some interpretation model, can't be interpreted (by definition). Leake advises that interpretation (which he calls {\em explanation}) is a situation-specific construct that must be tuned to (1)~the audience; and (2)~the goals of the audience~\cite{leake91}. Hence, prior to the construction of a model that can interpret data, some commitments must be made regarding who will read the model, and why. One way to model the audience and their business goals is the {\em balanced scorecard} (BSC) method of Kaplan and Norton~\cite{kaplan96}. In BSC, a model contains:
\bi
\item High-level business {\em objectives};
\item {\em Measures} that connect raw data to objectives;
\item {\em Target} values for the measures, as set by defined {\em initiatives}.
\ei
Without a {\em model}, the anomaly detectors have no way to interpret the consequence of an anomaly, nor will the data hooks have anything to hook into. (A small sketch of such a model appears at the end of this section.)
\item {\em Modeling tools}: It is unlikely that any model we create will appeal to all users. Hence, not only do we need a model, but we also need to give our users {\em access to modeling tools} that let them create alternate models or modify existing ones. If we do not do so, then we run the risk of reporting anomalies that are irrelevant to certain users.
\item {\em Calibration}: Before users accept the anomaly reports from our models, the models must be calibrated, lest the users dismiss our anomaly reports with ``that model is not relevant to our domain''.
\item {\em Anomaly detector}: Once the above is in place, we can operationalize an anomaly detector.
\item {\em Fault localization}: The first thing the users will ask is ``what is the cause of the anomaly?''; i.e., anomaly detectors need fault localization methods.
\item {\em Repair}: The second thing the users will ask is ``how do we fix the anomaly?''; i.e., anomaly detectors need repair modules.
\item {\em Explanation}: Finally, to be comprehensible, all the above must be explainable to the users.
\ee

The rest of this section discusses the practicality of our tool set.

\subsection{Data Hooks}

{\em Without data hooks, the anomaly detectors have no raw data to work from}: the whole approach is blind. The hooks we build must demonstrate their utility on multiple projects.

\subsection{Uncertainty}

A secondary goal of this project is the management of uncertainty: not all of the variables in our models will be known, so our inference procedures must support uncertain reasoning.

In summary, this project seeks to apply model-based methods to IV\&V; specifically, a project monitor that can recognize anomalies in NASA software developments, and do so early enough for repair actions to be proposed to the projects.
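As promised in the components list above, here is a minimal sketch of what a scorecard-style model might look like in code. This is an illustration only: the class names, the {\tt issue\_close\_rate} measure, the target of 0.8, and the data snapshot are all hypothetical inventions for this report, not actual LINKER or PITS fields. The only assumption is that measures map raw project data to numeric scores that are compared against targets.

\begin{verbatim}
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Measure:
    """Connects raw project data to a business objective."""
    name: str
    extract: Callable[[Dict], float]   # maps raw project data to a score
    target: float                      # target value, set by an initiative

@dataclass
class Objective:
    """A high-level business objective plus its supporting measures."""
    name: str
    measures: List[Measure] = field(default_factory=list)

    def under_threat(self, raw: Dict) -> List[str]:
        """Names of the measures whose current values miss their targets."""
        return [m.name for m in self.measures if m.extract(raw) < m.target]

# Hypothetical example: one objective, one invented measure.
on_time = Objective("deliver on schedule", [
    Measure("issue_close_rate",
            extract=lambda raw: raw["closed"] / max(raw["opened"], 1),
            target=0.8)])

snapshot = {"closed": 30, "opened": 50}   # fake PITS-style data
print(on_time.under_threat(snapshot))     # -> ['issue_close_rate']
\end{verbatim}

One design choice is worth noting: each measure carries its own extraction function, so users could build the alternate models called for above by swapping measures in and out, without touching the rest of the machinery.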
\section{Elements of the Anomaly Detector}

Anomaly detection requires a {\em data source} and a {\em model}; here, the model is a Bayes network. The nodes $N$ of that network comprise:
\bi
\item $Obs$, the {\em base} variables: data observed from the projects;
\item $Derived$: inferences that can be made from the base variables;
\item $Obj \subseteq Derived$: the business goals; i.e., what you can report to the CEO (the {\em objectives} of Kaplan and Norton's BSC).
\ei
Not all of these variables will be known; uncertain reasoning is all there is. The edges of the network form the ``spider web'' along which inference propagates.

The {\em inference procedures} over that model must:
\bi
\item log prior data;
\item build expectations from that data;
\item detect anomalies (differences between the old and the current data);
\item offer repair actions;
\item explain the anomalies and the repairs.
\ei
The {\em background distributions} used to build expectations should poll all available sources, and {\em calibration} should check the model against prior solutions.

With Bayes nets, the plan is hence: data collection (with no intermediaries); runtime anomaly detection (``that's odd...''); diagnosis (what, exactly, is odd); and repair (what to do). Treatment learning offers the same tool for all three of detection, diagnosis, and repair.
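To close, here is a minimal sketch of the inference loop above: log prior data, build expectations, then flag new data that the old data makes unlikely. As a simplifying assumption, it replaces full Bayes-net inference with independent, Laplace-smoothed frequency counts per attribute, and its records and threshold are invented; it shows the shape of the ``that's odd'' test, not the Dabney networks.

\begin{verbatim}
import math
from collections import Counter, defaultdict
from typing import Dict

class AnomalyDetector:
    """Flags records whose attribute values were rare in prior data.

    A stand-in for Bayes-net inference: each attribute gets an
    independent background distribution learned from logged records.
    """
    def __init__(self, threshold: float = -6.0):
        self.threshold = threshold          # log-likelihood cutoff
        self.counts = defaultdict(Counter)  # attribute -> value counts
        self.totals = Counter()             # attribute -> records seen

    def log(self, record: Dict[str, str]) -> None:
        """Log prior data: grow the background distributions."""
        for attr, value in record.items():
            self.counts[attr][value] += 1
            self.totals[attr] += 1

    def loglike(self, record: Dict[str, str]) -> float:
        """Score a new record against expectations (smoothed counts)."""
        ll = 0.0
        for attr, value in record.items():
            seen, total = self.counts[attr][value], self.totals[attr]
            ll += math.log((seen + 1) / (total + len(self.counts[attr]) + 1))
        return ll

    def thats_odd(self, record: Dict[str, str]) -> bool:
        """Anomaly = new data that the old data makes very unlikely."""
        return self.loglike(record) < self.threshold

# Invented records, standing in for PITS/LINKER issue data.
detector = AnomalyDetector()
for _ in range(100):
    detector.log({"severity": "low", "phase": "test"})
print(detector.thats_odd({"severity": "low", "phase": "test"}))     # False
print(detector.thats_odd({"severity": "high", "phase": "launch"}))  # True
\end{verbatim}

Treatment learning would then take over where this sketch stops: given the ``odd'' records, it would search for the attribute settings that select for (diagnosis) or against (repair) the anomalous outcomes.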