Finding the Right Data for Software Cost Modeling

Zhihao Chen, Tim Menzies, Dan Port, Barry Boehm
Center for Software Engineering, University of Southern California
Computer Science, Portland State University
Computer Science, University of Hawaii
zhihaoch@cse.usc.edu; tim@timmenzies.net; dport@hawaii.edu; boehm@cse.usc.edu

ABSTRACT

Strange to say, when building a software cost model, it is sometimes useful to ignore much of the available cost data.

Introduction

Good software cost models can significantly help the managers of software projects. With good models, project stakeholders can make informed decisions about how to manage resources; how to control and plan the project; and how to deliver the project on time and on budget. Off-the-shelf ``untuned'' models have been up to 600% inaccurate in their estimates. Hence, the wise manager uses a cost model built from local data.

But what data should we use to build a good cost model? Real-world data sets, such as those that come from software engineering projects, often contain noisy, irrelevant, or redundant variables. Before the data is collected, it is hard to know which parts of it will matter most. However, once a database is available, automatic tools can be used to prune the data back to the most important values.

Therefore, this paper proposes a change to current practice. Often, cost models are built using all available data. Here, we propose that after data collection and before model building, cost modelers should perform some data pruning experiments. As shown below, such pruning experiments are simple and fast to perform.

Our process starts with a table of historical data divided into columns and rows. Each column is a different variable describing some aspect of a software project. Each row shows data from a different software sub-system, so one project can contribute many rows. For example, such a table might record 17 variables and rows from three sub-systems drawn from two projects. The data in such a table can be pruned by removing columns or removing rows:

In row pruning (also known as stratification), rows from related projects are collected together and different cost models are learned from these different subsets.

In column pruning (also known as feature subset selection), the columns are sorted left-to-right according to their ``usefulness''; i.e. how well that column's variable predicts the target variable (in our case, software development effort). Column pruning then proceeds left to right across the sorted columns, each time removing some less-useful left-hand-side columns. At each step of the pruning, a cost model is learned from the remaining columns. A small sketch of this pruning loop is given below.

The benefits of row pruning have been reported previously. For example, Shepperd and Schofield report experiments with row pruning where estimator performance improved by up to 28% (measured in terms of the ``PRED'' measure described later in this article). However, and this is the point of this paper, we find that much larger improvements result from pruning both rows and columns. For example, the experiments presented here include one data set where estimator performance improved from 15% to 97%. Further, these large improvements were seen when most of the columns were pruned away. For example, in our experiments, column pruning removed 65% of all columns (on average). Surprisingly, when building a software cost model, it is usually useful to ignore over half of the available cost data.
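To make the column pruning loop concrete, here is a minimal sketch in Python using scikit-learn and a PRED(30)-style score. It is not the authors' toolkit (which, as described later, drives WEKA's WRAPPER and linear regression from UNIX scripts, works on log-transformed COCOMO data, and repeats every split many times); the names pred_at_n and prune_columns, and the random demonstration data, are our own.

    # Minimal sketch of the left-to-right column pruning loop described above.
    # Illustrative re-implementation only, not the authors' WEKA/UNIX toolkit.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    def pred_at_n(actual, estimated, n=30):
        """Fraction of estimates within n% of the actual values."""
        relative_error = np.abs(actual - estimated) / actual
        return np.mean(relative_error <= n / 100.0)

    def prune_columns(X, y, ranked_columns, seed=0):
        """Drop the least useful column at each step; keep the best model seen.

        `ranked_columns` lists column indices sorted from least to most useful
        (e.g. by how often a wrapper-style search selected each column).
        For brevity this regresses on raw effort; the study proper uses a
        log-transformed COCOMO model and converts back before scoring."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=seed)
        best_score, best_columns = -1.0, list(ranked_columns)
        kept = list(ranked_columns)
        while kept:
            model = LinearRegression().fit(X_train[:, kept], y_train)
            score = pred_at_n(y_test, model.predict(X_test[:, kept]))
            if score > best_score:
                best_score, best_columns = score, list(kept)
            kept = kept[1:]   # discard the current least-useful column
        return best_columns, best_score

    # Example usage on random data (purely illustrative):
    rng = np.random.default_rng(1)
    X = rng.random((60, 15))
    y = 2.0 + X @ rng.random(15)
    cols_by_usefulness = list(range(15))   # pretend ranking: column 0 least useful
    print(prune_columns(X, y, cols_by_usefulness))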
More importantly, row and column pruning leads to the largest improvements in estimator performance on the smallest training sets (less than 30 examples). This is a result of tremendous practical significance. Modern software practices change so rapidly that most organizations cannot access large databases of relevant project data. Our results suggest that this is not necessarily a problem, provided models are learned via row and column pruning.

The rest of this paper describes experiments with our tool for column pruning. This tool is a set of UNIX scripts that use the WRAPPER variable subtraction algorithm from the public-domain WEKA data mining toolkit. The tool is fully automated, runs on a standard LINUX installation, and is available from the authors.

Background

This paper applies column pruning methods to cost estimation. These pruning methods were developed by the data mining community. While such methods have been explored extensively, there is very little prior work on column pruning for software cost modeling. To the best of our knowledge, the only other work on column pruning for cost estimation was a limited experiment by Kirsopp and Shepperd. Like this study, they found that column pruning significantly improves effort estimation. However, their experimental base was much more restrictive than this study's (they ran only two data sets while we ran 15). Also, unlike our work, their experiment is not reproducible: the Kirsopp and Shepperd data sets are not public domain, while all our COCOMO-I data sets can be downloaded from the PROMISE repository.

An earlier draft of this paper appeared previously in a workshop publication. This paper extends that earlier draft in two ways. Firstly, it explores more data than before (double the number of COCOMO-I projects analyzed, plus two new COCOMO-II data sets). Secondly, it includes an expanded discussion of the business implications of column pruning.

Pruning: Why?

The case for pruning rows is quite simple. Software projects are not all the same. For example, real-time safety-critical systems are very different from batch financial processors. Given a database of different kinds of software, it is just good sense to divide the rows into different project types and learn a different cost model for each type. Then, in the future, managers can use different cost models depending on what type of software they are developing.

The case for pruning columns is slightly more complex. If a learned cost model uses all the variables in the database, then the only way to use that cost model on future sub-systems is to collect information on all those variables. In many business situations, the cost of reaching some goal is a function of how much data you have to collect or monitor along the way. But if the learned model uses only some of the variables, then using that model in the future means collecting less data. This would be useful in several scenarios. For example, when monitoring an out-sourced project at a remote site, it is useful to minimize the reporting requirements to just the variables that matter the most. Such a reporting structure reduces the overhead of managing a contract.

From a technical perspective, there are also good reasons to subtract variables:

Under-sampling: The number of possible influences on a project is quite large and, usually, the historical data sets on projects held by a particular company are quite small. Hence, a variable that is theoretically useful may be practically useless.
For example, in the na60 data set, nearly all of the NASA projects were rated as having high complexity. Therefore, this data set would not support conclusions about the interaction of, say, extra-high-complexity projects with other variables. A learner would be wise to subtract this variable (and a cost modeling analyst would be wise to suggest to their NASA clients that they refine the local definition of ``complexity'').

Reducing variance: Miller offers an extensive survey of column pruning for linear models and regression. That survey includes a very strong argument for column pruning: the variance of a linear model learned by minimizing least squares error decreases as the number of columns in the model decreases. That is, the fewer the columns, the more restrained are the model predictions.

Irrelevancy: Sometimes, modelers are incorrect in their beliefs about which variables affect some outcome. In this case, they might add irrelevant variables to a database. Without column pruning, a cost model learned from that database might contain these irrelevant variables. Anyone trying to use that model in the future would then be forced into excessive data collection.

Noise: Learning a cost estimation model is easier when the learner does not have to struggle with fitting the model to confusing noisy data (i.e. when the data contains spurious signals not associated with variations in projects). Noise can come from many sources, such as clerical errors or missing data. For example, organizations that only build word processors may have little data on software projects with high reliability requirements.

Correlated variables: If multiple variables are tightly correlated, then using all of them will diminish the likelihood that any one of them attains significance. A repeated result in data mining is that pruning away some of the correlated variables increases the effectiveness of the learned model (the reasons for this are subtle and vary according to which particular learner is being used).

Pruning: How?

The column pruning method used in this study is called the ``WRAPPER''. The WRAPPER selects a combination of columns and asks some learner to build a cost model using just those columns. The WRAPPER then grows the selected columns and checks if a better model comes from learning over the larger set of columns. The WRAPPER stops when there are no more columns to select, or when there has been no significant improvement in the learned model for the last five additions (in which case, those last five additions are deleted). Technically speaking, this is a forward-select search with a ``stale'' parameter set to 5.

The WRAPPER is thorough but, theoretically, it is quite slow since (in the worst case) it has to explore all subsets of the available columns. However, all the data sets in this study are quite small, and our experiments only required around 20 minutes per data set.

We use the WRAPPER since experiments by other researchers strongly suggest that it is superior to many other column pruning methods. For example, Hall and Holmes compare the WRAPPER to several other column pruning methods, including principal component analysis (PCA, a widely used technique). Column pruning methods can be grouped according to (a) whether or not they make special use of the target attribute in the data set (such as ``development cost''), and (b) whether or not they use the target learner as part of their pruning. PCA is unique since it does not make special use of the target attribute.
The WRAPPER is also unique, but for a different reason: unlike other pruning methods, it does use the target learner as part of its analysis. Hall and Holmes found that PCA was one of the worst performing methods (perhaps because it ignores the target attribute) while the WRAPPER was the best (since it can exploit its special knowledge of the target learner).

Cost Modeling with COCOMO

Before pruning data, we first need to understand how cost models might use that data. This study uses COCOMO for our cost modeling. COCOMO stands for Constructive Cost Model. COCOMO helps software developers reason about the cost and schedule implications of their software decisions, such as software investment decisions; setting project budgets and schedules; negotiating cost, schedule, and performance trade-offs; making software risk management decisions; and making software improvement decisions. One advantage of COCOMO (and this is why we use it) is that, unlike many other costing models such as SLIM or SEER, COCOMO is an open model with much published data.

There are two versions of COCOMO: COCOMO-I and COCOMO-II. In going from the 1981 COCOMO-I model to the 2000 COCOMO-II model, one parameter, ``Turnaround Time'', was dropped to reflect the almost-universal use of interactive software development. Another parameter, ``Modern Programming Practices'', was dropped in favor of a more general ``Process Maturity'' parameter. But several more parameters were added to reflect the subsequently-experienced influences of such factors as ``Development for Reuse'', ``Multisite Development'', ``Architecture and Risk Resolution'', and ``Team Cohesion''. The COCOMO-II book also provides capabilities and guidelines for an organization to add new parameters, reflecting their particular situations.

COCOMO measures effort in calendar months, where one month is 152 hours (and includes development and management hours). The core intuition behind COCOMO-based estimation is that as systems grow in size, the effort required to create them grows exponentially. More specifically, the COCOMO-I model is:

    effort = a * KSLOC^b * (EM_1 * EM_2 * ... * EM_15)

Here, EM_i is one of 15 effort multipliers such as cplx (complexity) or pcap (programmer capability). The COCOMO-I effort multipliers and their numeric values are tabulated in the COCOMO-I text. In COCOMO-II, the number of effort multipliers changed from 15 to 17. In the COCOMO-I model, a and b are domain-specific variables, and KSLOC (thousands of lines of non-commented source code) is estimated directly or computed from a function point analysis. In COCOMO-II, the exponent b was expanded to include scale factors:

    effort = a * KSLOC^(b + 0.01 * (SF_1 + ... + SF_5)) * (EM_1 * EM_2 * ... * EM_17)

where SF_j is one of five scale factors that exponentially influence effort. Examples of scale factors include pmat (process maturity) and resl (attempts to resolve project risks).

A standard method for assessing COCOMO performance is PRED(30). PRED(30) is calculated from the relative error, or RE, which is the relative size of the difference between the actual and estimated values:

    RE_i = (estimated_i - actual_i) / actual_i

The mean magnitude of the relative error, or MMRE, is the average percentage of the absolute values of the relative errors over an entire data set. PRED(N) reports the average percentage of estimates that were within N% of the actual values. If a data set has T rows, then:

    MMRE = (100/T) * sum_{i=1..T} |RE_i|
    PRED(N) = (100/T) * sum_{i=1..T} (1 if |RE_i| <= N/100, else 0)

For example, PRED(30)=50% means that half the estimates are within 30% of the actual values. A small computational sketch of these measures is given below.

Note that we report results in terms of PRED(N), not MMRE. This is a pragmatic decision: we have found PRED(N) easier to explain to business users than MMRE.
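For concreteness, here is a minimal Python sketch of the two measures just defined. The function names mmre and pred are our own, not part of any COCOMO toolkit, and the example numbers are invented.

    # Minimal sketch of the MMRE and PRED(N) measures defined above.
    import numpy as np

    def mmre(actual, estimated):
        """Mean magnitude of relative error, as a percentage."""
        actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
        return 100.0 * np.mean(np.abs((estimated - actual) / actual))

    def pred(actual, estimated, n=30):
        """Percentage of estimates within n% of the actual values."""
        actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
        re = np.abs((estimated - actual) / actual)
        return 100.0 * np.mean(re <= n / 100.0)

    # Invented example: two of the four estimates are within 30% of the
    # actuals, so PRED(30) = 50; MMRE = 60.
    actual    = [100.0, 200.0, 50.0, 400.0]
    estimated = [110.0, 150.0, 90.0, 900.0]
    print(mmre(actual, estimated), pred(actual, estimated, 30))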
Also, there are more PRED(N) results reported in the literature than MMRE results. This is perhaps due to the influence of the COCOMO researchers, who reported their 1999 landmark study using PRED(N). Further, we report PRED(30) results here since the major experiments of that 1999 study also used PRED(30). The results for PRED(25) are similar to those for PRED(30), except that the accuracies for PRED(30) are a little better than those for PRED(25).

In order to use linear least squares regression, which is the simplest and most widely used modeling method, it is common to transform the COCOMO model into a linear model by taking logarithms of the effort equation above:

    log(effort) = log(a) + b * log(KSLOC) + log(EM_1) + ... + log(EM_15)

Note that if this log-linear form is used, then before computing PRED(N) the estimated effort has to be converted back from its logarithm.

Case Study Data

This study uses data sets in both COCOMO-I and COCOMO-II format. The cii0 data set was used to build the COCOMO-II model. The cii4 data set is also in the COCOMO-II format and includes the 72 projects from cii0 developed after 1990, plus 47 new projects. The COCOMO-II data is not published since it was collected under conditions of confidentiality with the companies supplying the data; further research on this data must be conducted under the same conditions.

In contrast, several COCOMO-I data sets are available in the PROMISE repository:

coci comes from the COCOMO-I text and includes data from a variety of domains including engineering, science, financial applications, etc.

na60 comes from 20 years of NASA projects and is recorded using the COCOMO-I variables. The na60 data contains the following subsets: c01, c02, c03 store data from three different NASA geographical locations; p02, p03, p04 store data from three different NASA projects; t02, t03 store data from two types of task, such as ground data receiving and flight guidance. For reasons of confidentiality, the exact details of those centers, tasks, and projects cannot be disclosed. The other centers, projects, and tasks from na60 were not included in the cX, pX, or tX subsets for a variety of pragmatic reasons (e.g. suspicious repeated entries suggesting data entry errors, too few examples for generalization, etc.).

Of these data sets, coci describes projects from before 1982; cii4 contains data from the most recent projects; and the NASA data sets (na60, pX, cX, tX) describe projects newer than coci and older than cii4. Also, the COCOMO-I data sets have the 15 effort multiplier columns described above, while the COCOMO-II data sets (cii0 and cii4) have 24 columns. For experimental purposes, we group the above as follows: call combines all the center data (c01, c02, c03); tall combines all the task data (t02, t03); pall combines all the project data (p02, p03, p04).

Experimental Method

Having described COCOMO, the WRAPPER, and our data sets, we can now describe how an analyst can use them all together to find better cost models. Column pruning using the WRAPPER was discussed above. Recall that the columns are sorted left-to-right according to how well each column's variable predicts the target variable (in our case, software development effort). Column pruning then proceeds left to right across the sorted columns, each time removing some less-useful left-hand-side columns. At each step of the pruning, a cost model is learned from the remaining columns. Column pruning stops when removing more columns does not improve the best result seen so far.

To find the relative value of each column, we ran the WRAPPER ten times (each time using a randomly selected 90% of the rows). The value of a column was then set to the number of times the WRAPPER selected that column (a sketch of this ranking step is given below).
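The following is an illustrative Python/scikit-learn sketch of this ranking step under our own assumptions (a greedy forward search scored by cross-validated linear regression, with a stale counter of five, run ten times on 90% subsamples). It is not the authors' WEKA-based toolkit; the names wrapper_forward_select and rank_columns are hypothetical, and X and y are assumed to be NumPy arrays of project variables and effort.

    # Sketch of the column-ranking step: a wrapper-style forward selection is
    # run ten times on random 90% subsamples of the rows; a column's value is
    # the number of runs in which it was selected.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def wrapper_forward_select(X, y, stale=5):
        """Greedily add the column that most improves a linear model's
        cross-validated fit; stop after `stale` non-improving additions,
        which are then discarded (as described in the text)."""
        remaining, selected = list(range(X.shape[1])), []
        best_score, best_selected, since_improved = -np.inf, [], 0
        while remaining and since_improved < stale:
            # Try each remaining column and keep the best single addition.
            scores = [(np.mean(cross_val_score(LinearRegression(),
                                               X[:, selected + [c]], y, cv=3)), c)
                      for c in remaining]
            score, col = max(scores)
            selected.append(col)
            remaining.remove(col)
            if score > best_score:
                best_score, best_selected, since_improved = score, list(selected), 0
            else:
                since_improved += 1      # a "stale" (non-improving) addition
        return best_selected

    def rank_columns(X, y, repeats=10, sample=0.9, seed=0):
        """Count how often each column is selected across subsampled runs."""
        rng = np.random.default_rng(seed)
        counts = np.zeros(X.shape[1], int)
        for _ in range(repeats):
            rows = rng.choice(len(X), size=int(sample * len(X)), replace=False)
            for col in wrapper_forward_select(X[rows], y[rows]):
                counts[col] += 1
        return counts                    # higher count = more useful column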
Once the WRAPPER had ordered the columns, we randomized the order of the rows and started column pruning. To ensure statistical validity, the randomization (followed by column pruning) was repeated 30 times. For each repeat, at each stage of the pruning, the lowest-value column was removed. The rows in the remaining columns were then divided into training and test sets (each time using a randomly selected 67% of the rows for the training set). A cost model was learned (using linear regression) from the training set and then assessed, using PRED, on the test set. Once the 30 repeats were completed, a best model was selected by looking at the mean and standard deviation of model performance at each pruning step. The best model was the one that t-tests confirmed out-performed all the other stages of the column pruning. The mean value of that model over the 30 repeats was then reported.

The above process is fully automated using our own UNIX scripts, which control a Java data mining library called WEKA. WEKA comes complete with a linear regression learner and an implementation of the WRAPPER. The whole system is available from the authors.

Results

The results of our column pruning are shown in our column-pruning figure. The red and green lines of that figure show the number of columns in our data sets before and after pruning; the data sets are on the x-axis. The values are 22 for the COCOMO-II data sets (cii0 and cii4), while the values for the COCOMO-I data sets are all 15. The blue line shows the percentage of the columns removed by pruning. For example, very little was pruned from some data sets, while most of the columns were pruned from others. On average, over 65% of the columns were pruned. Sometimes, the pruning was quite heavy, with over 80% of the columns pruned away.

A concern with such large-scale pruning is that the resulting models would be somehow sub-standard. This proved not to be the case. The PRED(30) results associated with the pruned data sets are shown in our PRED(30) results figure. This figure shows mean values over 30 experiments where the learned model was tested on rows not seen during training. Note that pruning always improved PRED. The red lines of that figure show the mean PRED(30) seen in 30 trials using all the columns. These are the baseline results for learning cost models before applying any pruning method. The green lines show the best mean PRED(30) seen after automatic column pruning. The difference between the red and the green lines is the improvement produced by pruning. The data sets are sorted by pruning method into three plots. Within each plot, the data sets are sorted left-to-right in increasing order. The x-axis shows the names of the data sets used in these studies. The blue lines show the number of rows in each data set.

The plots of the PRED(30) results have three labels: ``pruning just columns'', ``pruning columns and some rows'', and ``pruning columns and many rows''. The data sets are sorted into these three groups according to their stratification. The left-hand-side results, labeled ``pruning just columns'', come from the largest data sets, those that combine project information from many sources. These data sets are not divided up into data from similar sources; hence, no row pruning was used on them. The right-hand-side results, labeled ``pruning columns and many rows'', come from eight data sets that have been heavily stratified into specific NASA centers (i.e. c01, c02, c03); or specific NASA projects (i.e. p02, p03, p04); or specific NASA software tasks (i.e. t02, t03).
The middle results, labeled ``pruning columns and some rows'', show data sets that have been somewhat stratified. The data sets in this group combine data from either all the NASA centers (i.e. call); or all the NASA projects (i.e. pall); or all the NASA tasks (i.e. tall). This middle group samples a point half-way between the unstratified data sets on the left and the heavily stratified data sets on the right.

There are three notable features of these results:

Pruning always improved estimation effectiveness. That is, in all our case studies, it was always useful to ignore a portion of the available data.

Column pruning by itself can offer some improvements to PRED. However, column pruning combined with row pruning can result in dramatic improvements in effort estimation.

With one exception, the general trend across the three graphs is clear: as data set size shrinks, the improvement increases. That is, pruning is most important when dealing with small data sets.

When Not to Prune

While column pruning is clearly useful, sometimes it cannot or should not be applied. Firstly, our variable subtraction methods require a historical database of projects. If there is no such database, then our column pruning techniques won't work.

Secondly, even if a historical database exists and our techniques suggest pruning some variable, it may still be important to ignore that advice. If a cost model ignores certain effects that business users believe are important, then those users may not trust that model. In that case, even if a variable has no noticeable impact on predictions, it should be left in the model. By leaving such variables in a model, we acknowledge that, in many domains, expert business users hold in their heads more knowledge than what may be available in historical databases. Suppose that there is some rarely occurring combination of factors which leads to a major productivity improvement. Even if there is little data on that situation, it still should be represented in the model. For example, even though some studies have shown that reduced-parameter function point counting rules are equally good in most situations, COCOMO-II supports the full International Function Point Users' Group (IFPUG) set of parameters due to their wide usage and acceptance in the IFPUG community.

Thirdly, another reason not to prune variables is that you might still need them. For example, the experiments shown above often subtract over half of the attributes in a COCOMO-I model while (usually) improving effort estimation. However, suppose that a business decision has to be made using some of the pruned variables. The reduced model has no information on those subtracted variables, so a business user would have to resort to other sources of information to make their decision.

Hence, we propose using column pruning with some care. If there is no historical data from which to learn specialized models, then managers should use the general background knowledge within COCOMO. The 1981 regression coefficients of COCOMO-I, or the updated coefficients of COCOMO-II, are the best general-purpose indicators we can currently offer for cost estimation. Managers can use that public knowledge to make software process decisions. For example, according to the coefficients on the COCOMO-II pmat variable, one can estimate the increase in cost between a CMM3 and a CMM4 project of a given size (a sketch of such a calculation is given below).
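As an illustration of the kind of calculation intended here, the sketch below computes the effort ratio implied by the pmat scale factor for a hypothetical 100 KSLOC project. The numeric values are the COCOMO II.2000 pmat calibration as we recall it, the mapping of CMM levels 3 and 4 to the High and Very High pmat ratings is an assumption, and the constant terms and effort multipliers cancel in the ratio; check the COCOMO-II book before relying on these numbers.

    # Illustrative sketch only: effort ratio implied by the COCOMO-II pmat
    # (process maturity) scale factor for a hypothetical 100 KSLOC project.
    # Assumed values (recalled, unverified): pmat High = 3.12, Very High = 1.56;
    # assumed mapping: CMM3 -> High, CMM4 -> Very High.
    ksloc = 100.0            # hypothetical project size
    pmat_cmm3 = 3.12         # assumed scale-factor value at CMM level 3
    pmat_cmm4 = 1.56         # assumed scale-factor value at CMM level 4

    # effort ~ a * KSLOC**(b + 0.01 * sum(scale factors)) * (effort multipliers)
    # Holding everything else fixed, a, b, and the multipliers cancel, leaving:
    ratio = ksloc ** (0.01 * (pmat_cmm3 - pmat_cmm4))
    print(f"CMM3 project costs about {100 * (ratio - 1):.1f}% more than CMM4")
    # -> roughly a 7% increase for this hypothetical project, under the
    #    assumptions above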
With this estimate in hand, a business user could then make their own assessment about the cost of increased software process maturity versus the benefits of that increase.

If historical data from the local site is available, then managers could tune the general COCOMO background knowledge by adjusting the coefficients within the COCOMO equations. COCOMO-I and COCOMO-II contain several local calibration variables that can quickly tune a model to local project data. Our experience has been that 10 to 20 projects are adequate to achieve such tunings. Local calibration is a simple tuning method that is supported by many tools; see, e.g., http://sunset.usc.edu/available_tools/index.html. Currently, our toolkit requires more effort (i.e. some UNIX scripting) than local calibration. However, our PRED(30) results suggest that the extra effort may well be worthwhile, particularly when building models from a handful of projects. Also, we have found that it is easier to extrapolate costs from old projects to new projects when using reduced variable sets. Nevertheless, column pruning is not appropriate when there are business reasons to use all available variables (e.g. the three reasons described above).

Conclusion

The specific goal of this paper was to encourage more column pruning in cost modeling, particularly when dealing with very small data sets. The improvements seen in our results seem quite impressive. Row and column pruning, in combination, are very useful for cost modeling, particularly when dealing with small data sets. Hence, we propose a change to current practice. Cost models should not be built using all available data. Rather, after data collection and before model building, cost modelers should perform some data pruning experiments.

The more general goal of our work is to encourage repeatable, refutable, and improvable experiments in software engineering. To that end, as much as possible, we use public domain tools and public domain data sets. Hence, this paper uses an open source cost model (COCOMO) and, as much as possible, publicly available data. All the COCOMO-I data sets used in this study can be downloaded from the PROMISE repository. We urge other researchers to produce more results based on open source models and data sets.

Acknowledgments

The advice of the anonymous reviewers helped to clarify an earlier draft of this paper. Helen Burgess offered invaluable editorial assistance.

Biographies

Zhihao Chen is a Research Assistant at the Center for Software Engineering and a PhD candidate in the Computer Science Department at the University of Southern California. His research interests lie in system and software engineering, and in model development and integration in general. In particular, he focuses on software cost estimation, product line investment models, process modeling, and risk management for software application development. He also investigates empirically-based software engineering: empirical methods and model integration. His research supports the generation of an empirically-based software development process covering high-level lifecycle models down to low-level techniques; provides validated guidelines for selecting techniques and models; and helps people better understand such issues as which variables affect cost, reliability, and schedule, and how to integrate existing data and models from participants and collaborators. Previously, he received his bachelor's and master's degrees in computer science from the South China University of Technology.
He previously worked for HP, CA, and EMC. He can be reached at zhihaoch@usc.edu.

Dr. Tim Menzies is an associate research professor at Portland State University in the United States, and has been working with NASA on software quality issues since 1998. He has a CS degree and a PhD from the University of New South Wales. His recent research concerns modeling and learning, with a particular focus on lightweight modeling methods. His doctoral research aimed at improving the validation of possibly inconsistent knowledge-based systems in the QMOD specification language. He has also worked as an object-oriented consultant in industry, has authored over 150 publications, and has served on numerous conference and workshop programs as well as serving as guest editor of journal special issues. He can be reached at tim@timmenzies.net.

Dr. Daniel Port is an Assistant Professor of IT Management at the University of Hawaii at Manoa. Prior to this, he was a Research Assistant Professor working with Barry Boehm at USC's Center for Software Engineering, where he now holds the title of Visiting Scholar. Dr. Port has been involved in software development process research, and in the development and assessment of innovative pedagogic techniques for software engineering education. His primary research activities lie in strategic and economics-based software engineering. He has applied the strategic method to COTS assessment and COTS process selection, IV&V, architecture flexibility, software dependability, and IT security risk management, with collaborators from NASA, JPL, and the Japan Aerospace Exploration Agency (JAXA). Dr. Port is the co-founder, with Dr. Rick Kazman, of the proposed new Center for Strategic Software Engineering at the University of Hawaii. He can be reached at dport@hawaii.edu.

Professor Barry Boehm received his B.A. degree from Harvard in 1957, and his M.S. and Ph.D. degrees from UCLA in 1961 and 1964, all in Mathematics. Between 1989 and 1992, he served within the U.S. Department of Defense (DoD) as Director of the DARPA Information Science and Technology Office, and as Director of the DDR&E Software and Computer Technology Office. He worked at TRW from 1973 to 1989, culminating as Chief Scientist of the Defense Systems Group, and at the Rand Corporation from 1959 to 1973, culminating as Head of the Information Sciences Department. He was a Programmer-Analyst at General Dynamics between 1955 and 1959. His current research interests include software process modeling, software requirements engineering, software architectures, software metrics and cost models, software engineering environments, and knowledge-based software engineering. His honors and awards include Guest Lecturer of the USSR Academy of Sciences (1970), the AIAA Information Systems Award (1979), the J.D. Warnier Prize for Excellence in Information Sciences (1984), the ISPA Freiman Award for Parametric Analysis (1988), the NSIA Grace Murray Hopper Award (1989), the Office of the Secretary of Defense Award for Excellence (1992), the ASQC Lifetime Achievement Award (1994), and the ACM Distinguished Research Award in Software Engineering (1997). He is an AIAA Fellow, an ACM Fellow, an IEEE Fellow, and a member of the National Academy of Engineering. He can be reached at boehm@sunset.usc.edu.