%\documentclass[times, 11pt]{article}\renewcommand{\baselinestretch}{1.5}\textheight 23.25cm\textwidth 18cm\oddsidemargin -0.68cm %\documentclass{sig-alternate} \documentclass[namedreferences]{../../tex/kluwer/kluwer} % Specifies the document style. %\newdisplay{guess}{Conjecture} %\documentclass[times, twocolumn,10pt]{article} \usepackage{times,alltt,url,graphicx,amssymb} \usepackage{../../tex/bar}%,../../tex/cite} %\usepackage{../../tex/lineno}\runninglinenumbers\modulolinenumbers[5] %\usepackage{bar} \newcommand{\bb}{\blacksquare\!\!} \newcommand{\fig}[1]{Figure~\ref{fig:#1}} \newcommand{\ARROW}{{\sffamily $\Rightarrow$}} \newcommand{\ATTRIBUTE}{{\sffamily $attribute$}} \newcommand{\BANDS}{{\sffamily $bands$}} \newcommand{\BASELINE}{{\sffamily $baseline$}} \newcommand{\BEST}{{\sffamily $best$}} \newcommand{\CANDIDATES}{{\sffamily $candidates$}} \newcommand{\CC}{{\sffamily $C$}} \newcommand{\DEEONE}{{\sffamily $D_1$}} \newcommand{\DEETWO}{{\sffamily $treated$}} \newcommand{\DEE}{{\sffamily $D$}} \newcommand{\DISTRIBUTION}{{\sffamily $result$}} \newcommand{\EN}{{\sffamily $N$}} \newcommand{\INPUT}{{\underline {\sffamily input}}} \newcommand{\ITEMS}{{\sffamily $items$}} \newcommand{\LHS}{{\sffamily $lhs$}} \newcommand{\NOE}{$\not= \emptyset$} \newcommand{\PROMISING}{{\sffamily $promising$}} \newcommand{\RANGES}{{\sffamily $ranges$}} \newcommand{\RHS}{{\sffamily $rhs$}} \newcommand{\RR}{{\sffamily $R$}} \newcommand{\SKEW}{{\sffamily $skew$}} \newcommand{\TEMP}{{\sffamily $temp$}} \newcommand{\TREATED}{{\sffamily $treated$}} \newcommand{\AND}{{\underline {\sffamily and}}} \newcommand{\DONE}{{{\sffamily \}}}} \newcommand{\DO}{{{\sffamily \{}}} \newcommand{\ELSE}{{\underline {\sffamily else}}} \newcommand{\FI}{{\underline {\sffamily fi}}} \newcommand{\FOR}{{\underline {\sffamily for}}} \newcommand{\GE}{{$\ge$}} \newcommand{\IF}{{\underline {\sffamily if}}} \newcommand{\IN}{{\underline {\sffamily in}}} \newcommand{\IS}{{$\leftarrow$}} \newcommand{\NIL}{ {\sffamily $\emptyset$}} \newcommand{\OUTPUT}{{\underline {\sffamily output}}} \newcommand{\PLUS}{{$\wedge$}} \newcommand{\REPEAT}{{\underline {\sffamily repeat}}} \newcommand{\RETURN}{{\underline {\sffamily return}}} \newcommand{\SOME}{{$\subseteq$}} \newcommand{\THEN}{{\underline {\sffamily then}}} \newcommand{\TO}{{\underline {\sffamily to}}} \newcommand{\UNTIL}{{\underline {\sffamily until}}} \newcommand{\WHEN}{{\underline {\sffamily when}}} \newcommand{\WHERE}{{\underline {\sffamily where}}} \newenvironment{smallitem}{ \setlength{\topsep}{0pt} \setlength{\partopsep}{0pt} \setlength{\parskip}{0pt} \begin{itemize} \setlength{\leftmargin}{.2in} \setlength{\parsep}{0pt} \setlength{\parskip}{0pt} \setlength{\itemsep}{0pt}}{\end{itemize}} %! some shorthands \newcommand{\bi}{\begin{itemize}} \newcommand{\ei}{\end{itemize}} \begin{document} \begin{article} \begin{opening} %\conferenceinfo{KDD}{'02 Edmonton, Alberta, Canada} %! general comment- you wrote the paper you want to write, not %! the paper that folks attracted to this particular workshop would %! want to read. most of my changes are just "throw lots of references %! 
to requirements engineering" to the front of the paper %!your title was too long and not linked into this particular venue \title{Just Enough Learning (of Association Rules):\newline The TAR2 ``Treatment'' Learner} %\titlenote{KDD %paper ID: 440}\titlenote{Submitted to the KDD-2002, The Eighth ACM SIGKDD %International Conference on Knowledge Discovery and Data %Mining, July 23 - 26, 2002, Edmonton, Alberta, Canada %\url{http://www.acm.org/sigkdd/kdd2002/}. Wp ref %01/tar2/tar2} } %! you did the experimental work and wrote draft one and wrote the %! fast tar2 that makes this all possible. you be first author! % % thanks! but I just can't write this beautiful paper, it is yours. %\numberofauthors{2} \newcommand{\affaddr}[1]{#1} %\newcommand{\email}[1]{{\small {\url #1}}} \author{ Tim \surname{Menzies}\thanks{ Lane Department of Computer Science, University of West Virginia, PO Box 6109 Morgantown, WV 26506-6109, USA, \email{tim@menzies.com}}, Ying \surname{Hu}\thanks{ Dept. Electical \& Computer Engineering, University of British Columbia, 2356 Main Mall; Vancouver, B.C. Canada V6T1Z4, \email{yingh@ece.ubc.ca}}} \runningauthor{Tim Menzies, Ying Hu} \runningtitle{Just Enough Learning} \date{Nov 22 2002} %\date{Aug 12,2001} % Use \section* instead of \section to suppress numbering for % the abstract, acknowledgements, and references. %!there is an abstract envrionment \begin{motto}[prose] "Don't tell me where I am, tell me where to go." \rightline{- a (very busy) user} \end{motto} \begin{abstract} An over-zealous machine learner can automatically generate large, intricate, theories which can be hard to understand. However, such intricate learning is not necessary in domains that lack complex relationships. A much simpler learner can suffice in domains with {\em narrow funnels}; i.e. where most domain variables are controlled by a very small subset. Such a learner is TAR2: a weighted-class minimal contrast-set association rule learner that utilizes confidence-based pruning, but not support-based pruning. TAR2 learns {\em treatments}; i.e. constraints that can change an agent's environment. Treatments take two forms. {\em Controller treatments} hold the {\em smallest number} of conjunctions that {\em most improve} the current state of the system. {\em Monitor treatments} hold the {\em smallest number} of conjunctions that best detect future faulty system behavior. Such treatments tell an agent what to do (apply the controller) and what to watch for (the monitor conditions) within the current environment. Because TAR2 generates very small theories, our experience has been that users prefer its tiny treatments. The success of such a simple learner suggests that many domains lack complex relationships. \end{abstract} \keywords{Association rules, treatment learning, contrast sets, %ML-lite. } \end{opening} %------------------------------- % Figure of KC-1 NASA project %------------------------------- \section{Introduction} %! in journalism, there is a problem of "burying the lead"- i.e. the key idea %! does not appear till late in the text. much of "editing" is hunting for the %! "lead" and just moving it to the front. Machine learners generate theories. People read theories. What kind of learners generate the kind of theories that people want to read? If the reader is a busy person, then they might not need, or be able to use, complex theories. Rather, such a busy person might instead just want to know the {\em least} they need to do to achieve the {\em most} benefits. 
It therefore follows that machine learning for busy people should not strive for (e.g.) elaborate theories or (e.g.) increasing the expressive power of the language of the theory. Rather, a better goal might be to find the smallest theory with the most impact.
\begin{figure*}
{\em \fig{tree}.A: A learnt decision tree. Classes (right-hand-side), top-to-bottom, are ``high'', ``medhigh'', ``medlow'', and ``low''. These classes indicate the median value of owner-occupied homes in \$1000's.}
\begin{center}
\includegraphics[height=5in,width=5in]{tree.eps}
\end{center}
{\em \fig{tree}.B: Treatments learnt in the same domain:}
{\scriptsize
\begin{center}
\begin{tabular}{cccc}
\multicolumn{1}{c}{}& & {\tt controllerH} & {\tt monitorH}\\
\multicolumn{1}{c}{}& baseline & (i.e. best action) & (i.e. disaster if...)\\\hline
Treatment & nothing & $6.7 \leq RM < 9.8$ & $0.6 \leq NOX < 1.9$ \\
& & $\wedge 12.6 \leq PTRATIO < 15.9 $ & $\wedge 17.16 \leq LSTAT < 39.0$ \\\hline
&&&\\
Results& ~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{21}{1} \bar{21}{2} \bar{29}{4} \bar{29}{8} \end{barenv} & ~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{0}{1} \bar{0}{2} \bar{3}{4} \bar{97}{8} \end{barenv} & ~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{98}{1} \bar{1}{2} \bar{1}{4} \bar{0}{8} \end{barenv}\\\hline
Sample size &506& 38 & 81
\end{tabular}
\end{center}}
\caption{Comparing decision trees learnt by C4.5 and treatments learnt by the TAR2 treatment learner. Theories were learnt from the 506 cases in the HOUSING example set from the UC Irvine repository. This dataset has the class distribution shown in the bottom table, left-hand-side. KEY: LSTAT= lower status of the population; NOX= nitric oxides concentration (parts per 10 million); PTRATIO= pupil-teacher ratio by town; RM= average number of rooms per dwelling; KEY:~\protect\legend1{low}; \protect\legend2{medlow}; \protect\legend4{medhigh}; \protect\legend8{high}. }\label{fig:tree}
\end{figure*}
For example, \fig{tree} contrasts two theories learnt from the same data set using the TAR2 system discussed here and the C4.5 decision tree learner~\cite{quinlan92}. C4.5 has learnt an elaborate description of different kinds of houses in Boston. In contrast, TAR2 has learnt a minimal description of the {\em differences} between house types. These differences are described in terms of {\em treatments}; i.e. constraints on controllable variables that change the class distribution. The {\em controller/monitor} treatments are TAR2's comments on what might most improve/degrade the current situation. TAR2's treatments for Boston housing are shown in the bottom table of \fig{tree}. In summary, if we focus on houses with (roughly) seven to nine rooms in suburbs with pupil/teacher ratios of 12.6 to 15.9, then the percentage of high quality houses we will see will change from 29\% to 97\%. Note that TAR2's theory is far smaller than C4.5's theory. C4.5's theory contains more details but TAR2's theory gives succinct advice on how to change the current situation. Our experience with business users is that they prefer TAR2's simpler theories. As one user put it, ``decision trees just tell you where you are, treatments tell you what to do''. TAR2's succinct theories will miss the complex interrelations that C4.5 might find. We have reasons to believe that many domains lack complex relationships.
Many domains exhibit a curious {\em narrow funnel effect}; i.e. a small number of critical variables control the remaining variables within a system (the metaphor being that all processing runs down the same narrow funnel~\cite{me99a}). The concept of narrow funnels has been reported in many domains under a variety of names including:
\bi
\item {\em Master-variables} in scheduling~\cite{crawford94};
\item {\em Prime-implicants} in model-based diagnosis~\cite{rymon94an}, machine learning~\cite{rymon93setree}, or fault-tree analysis~\cite{lutz99};
\item {\em Backbones} in satisfiability~\cite{parkes99lifted,singer00backbone};
\item The {\em dominance filtering} used in Pareto optimization of designs~\cite{joseph98};
\item {\em Minimal environments} in the ATMS~\cite{deKleer86};
\item The {\em base controversial assumptions} of HT4~\cite{me97a};
\item The small {\em feature subset selection} effect~\cite{kohavi97} and the related {\em 1R} effect~\cite{holte93}.\ei
Whatever the name, the core intuition in all these terms is the same: what happens in the total space of a system can be controlled by a small critical region. We have argued previously that narrow funnels are very common~\cite{me99q,me00b,me00d,me00t}. For systems with narrow funnels, the space of options reduces to just the ranges of the few variables within the narrow funnel. In such a reduced space, variable assignments outside the funnel are highly correlated to assignments within the funnel. Machine learning in such domains is very simple: an adequate theory need only comment on assignments to the variables that are highly correlated to funnel assignments.
This paper uses the TAR2 system to check the merits of assuming narrow funnels for the purposes of machine learning. TAR2's distinguishing feature is that it performs very well, yet it seems overly simplistic. The algorithm outputs only two rules: the best smallest controller and the best smallest monitor. If domains contain complex relationships, then these two small associations will be useless. The algorithm's runtimes are exponential on the size of the treatments. Hence, the algorithm makes the following ``small treatment assumption''; i.e. {\em adequate treatments can be built from small treatments}. If the small treatment assumption fails, then TAR2's exponential runtimes will make it impractically slow. Also, the algorithm relies on a {\em confidence1} measure that prunes the space of possible associations. The confidence1 measure we describe below has no special merit: it was merely the first one we could think of. Further, our initial implementation worked without algorithmic or memory management optimizations. Our only explanation for the surprising success of this simplistic implementation is that the small treatment assumption holds for the domains we studied.
The rest of this article discusses TAR2. After an introductory example and a discussion of related work, the TAR2 algorithm is presented. This is followed by examples and evaluations and an analysis of the general applicability of our approach.
\section{Related Work}
If domains lack complex relationships, then it is possible that adequate theories can be learnt from small subsets of the available attributes. Various researchers have reported that this is indeed the case. For example, Holte wrote a machine learner that was deliberately restricted to learning theories using a single attribute.
Surprisingly, Holte found that learners that use many attributes, such as C4.5, perform only moderately better than this 1R algorithm~\cite{holte93}. Nevertheless, TAR2 does not use the 1R technique since our results show that the best treatments may require more than one attribute.
In other work, Kohavi and John {\em wrapped} their learners in a pre-processor that used a heuristic search to grow attribute subsets from size 1. At each step in the growth, a learner was called to find the accuracy of the theory learned from the current subset. Subset growth was stopped when the addition of new attributes did not improve the accuracy. As shown in \fig{reduce}, spectacular reductions in the number of attributes can be achieved, with only a minimal loss of accuracy~\cite{kohavi97}.
\begin{figure}
\begin{center}
{\scriptsize
\begin{tabular}{r@{~}ccccccccccc}\\
&\multicolumn{11}{c}{number of attributes}\\\cline{2-12}
before:& 10&13&15&180&22&8&25&36&6&6&6\\
after:& 2&2&2&11&2&1&3&12&1&1&2\\
reduction:& 80\%& 84\%& 87\%&94\%&90\%& 87\%& 88\%&67\%& 83\%&83\%&67\%\\\hline
$\Delta$accuracy: & 0\%&6\%&5\%&4\%&2\%&1\%&0.5\%&0\%&-25\% &6\% &7\%
\end{tabular}}
\end{center}
\caption{Feature subset selection using wrappers, hill-climbing, and ID3 (i.e. C4.5 with pruning disabled). The $\Delta$accuracy figure is the difference in the accuracies of the theories found by ID3 using the $before$ and $after$ attributes. From \protect\cite{kohavi97}.}\label{fig:reduce}
\end{figure}
Nevertheless, TAR2 does not use this technique since relevant feature selection with wrappers can be prohibitively slow: {\em each step} of the heuristic search requires a call to the learning algorithm.
If TAR2 isn't 1R or wrappers, what is it? The rest of this section expands on the following definition. TAR2 is a {\em weighted-class minimal contrast-set} association rule learner that uses {\em confidence measures} but not {\em support-based pruning}. TAR2 learns treatments and the general form of a treatment is:
{\footnotesize
\begin{eqnarray}\nonumber R_1&if& Attr_1=range_1 \wedge Attr_2=range_2 \wedge ...\\\nonumber &then& good=more \wedge bad=less \\\nonumber R_2&if& Attr_1=range_1 \wedge Attr_2=range_2 \wedge ...\\\nonumber &then& good=less \wedge bad=more \end{eqnarray}}
\noindent where $R_1$ is the controller rule; $R_2$ is the monitor rule; {\em good} and {\em bad} are sets of classes that the agent likes and dislikes respectively; and {\em more} and {\em less} are the frequencies of these classes, compared against the current situation, which we call the {\em baseline}. The nature of these output rules distinguishes TAR2 from many other learning strategies.
{\bf Association rule learning:} Classifiers like C4.5 and CART learn rules with a single attribute pair on the right-hand side; e.g. {\em class=goodHouse}. Association rule learners like APRIORI~\cite{agrawal94} and TAR2 generate rules containing multiple attribute pairs on both the left-hand-side and the right-hand-side of the rules. That is, classifiers have a small number of pre-defined targets (the classes) while, for association rule learners, the target is less constrained. General association rule learners like APRIORI input a set $D$ of transactions of items $I$ and return associations between items of the form $LHS \Rightarrow RHS$ where $LHS \subset I$ and $RHS \subset I$ and $LHS \cap RHS = \emptyset$. A common restriction with classifiers is that they assume the entire example set can fit into RAM.
Learners like APRIORI are designed for data sets that need not reside in main memory. For example, Agrawal and Srikant report experiments with association rule learning using very large data sets with 10,000,000 examples and a size of 843MB~\cite{agrawal94}. However, just like Webb~\cite{webb00}, TAR2 makes the ``memory-is-cheap assumption''; i.e. TAR2 loads all its examples into RAM.
Specialized association rule learners like CBA~\cite{liu98} and TAR2 impose restrictions on the right-hand-side. For example, TAR2's right-hand-sides show a prediction of the {\em change} in the class distribution if the constraint on the left-hand-side were applied. The CBA learner finds {\em class association rules}; i.e. association rules where the conclusion is restricted to one class attribute. That is, CBA acts like a classifier, but can process larger datasets than (e.g.) C4.5. TAR2 restricts the right-hand-side attributes to just those containing criteria assessment.
{\bf Weighted-learning:} Standard classifier algorithms such as C4.5~\cite{quinlan92} or CART~\cite{breiman84} have no concept of class {\em weighting}. That is, these systems have no notion of a {\em good} or {\em bad} class. Such learners therefore can't filter their learnt theories to emphasize the location of the {\em good} classes or {\em bad} classes. Association rule learners such as MINWAL~\cite{cai98}, TARZAN~\cite{me00e} and TAR2 explore {\em weighted learning} in which some items are given a higher priority weighting than others. Such weights can focus the learning onto issues that are of particular interest to some audience. For example, TARZAN~\cite{me00e} swung through the ensemble of decision trees generated by C4.5~\cite{quinlan92} during 10-way cross-validation. TARZAN returned the smallest treatments, occurring in most of the ensemble, that {\em increased} the percentage of branches leading to some preferred highly weighted classes and {\em decreased} the percentage of branches leading to lower weighted classes. TAR2 was an experiment in applying TARZAN's tree pruning strategies directly to the C4.5 example sets. The resulting system is simpler, faster to execute, and does not require calling a learner such as C4.5 as a sub-routine.
{\bf Contrast sets:} Instead of finding rules that describe the current situation, association rule learners like STUCCO~\cite{bay99} find rules that differ meaningfully in their distribution across groups. For example, in STUCCO, an analyst could ask ``what are the differences between people with Ph.D. and bachelor degrees?''. TAR2's variant on the STUCCO strategy is to combine contrast sets with weighted classes and minimality. That is, TAR2 treatments can be viewed as the smallest possible contrast sets that distinguish situations with numerous highly-weighted classes from situations that contain more lowly-weighted classes.
{\bf Support-based pruning:} In the terminology of APRIORI, an association rule has {\em support} $s$ if $s\%$ of the transactions in $D$ contain $X \wedge Y$; i.e. $s=\frac{|X \wedge Y| }{|D|}$ (where $|X \wedge Y|$ denotes the number of examples containing both $X$ and $Y$). The {\em confidence} $c$ of an association rule is the percentage of transactions containing $X$ that also contain $Y$; i.e. $c=\frac{|X \wedge Y|}{|X|}$. Many association rule learners use {\em support-based pruning}; i.e. when searching for rules with high confidence, sets of items $I_i,\ldots,I_k$ are examined only if all their subsets are above some minimum support value.
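For concreteness, the following sketch (our own illustration, not code from any of the learners above; the tiny transaction set is hypothetical) computes the support and confidence of a single candidate rule over an in-memory example set:
{\footnotesize\begin{verbatim}
# Sketch: support and confidence of one association rule LHS => RHS.
# Each example (transaction) is modelled as a set of items.
def support_and_confidence(examples, lhs, rhs):
    n = len(examples)
    both = sum(1 for ex in examples if lhs <= ex and rhs <= ex)
    lhs_only = sum(1 for ex in examples if lhs <= ex)
    support = both / n                                  # |X and Y| / |D|
    confidence = both / lhs_only if lhs_only else 0.0   # |X and Y| / |X|
    return support, confidence

D = [{"milk", "bread", "eggs"}, {"milk", "bread"}, {"bread"}, {"milk", "eggs"}]
print(support_and_confidence(D, {"milk"}, {"bread"}))   # (0.5, 0.66...)
\end{verbatim}}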
Support-based pruning is impossible in weighted association rule learning since, with weighted items, it is not always true that subsets of {\em interesting} items (i.e. where the weights are high) are also interesting~\cite{cai98}. Another reason to reject support-based pruning is that it can force the learner to miss features that apply to a small, but interesting, subset of the examples~\cite{wang00}.
{\bf Confidence-based pruning:} Without support-based pruning, association rule learners rely on confidence-based pruning to reject all rules that fall below a minimal threshold of adequate confidence. TAR2 uses {\em confidence1} pruning.
\section{Confidence1 Pruning}
TAR2 targets the attribute ranges that ``nudge'' a system away from undesired behavior and towards desired behavior. TAR2's score for each range is the {\em confidence1} measure. This value is high if a range occurs frequently in desired situations and infrequently in undesired situations. That is, if we were to impose this range as a constraint, then it would tend to ``nudge'' the system into better behavior.
To find confidence1, we assume that we can access $\$class$; i.e. some numeric value assigned to $class$. The class with the highest value is the $best$ class. The $lesser$ classes are the set of all classes, less the $best$ class. Let $O[C]_{A.R}$ be the number of occurrences of some attribute range $A.R$ in some class $C$; i.e.
\[O[C]_{A.R} = |A.R \wedge class=C \wedge D|\]
To generate confidence1, we compare the relative frequencies of an attribute range in different classes. This comparison is weighted by the difference in the scores of the classes, and normalized by the total frequency count of the attribute range; i.e.
\[ \frac{\sum_{C\in lesser}\left((\$best-\$C)*(O[best]_{A.R}-O[C]_{A.R})\right)}{|A.R \wedge D|} \]
\begin{figure}[!t]
{\small {\em
\begin{center}
\begin{tabular}{rccccc}
\multicolumn{4}{c}{Items} & & Criteria \\\cline{1-4}\cline{6-6}
outlook & temp($^o$F) & humidity & windy? & & class \\\cline{1-4}\cline{6-6}
sunny& 85& 86& false& & none\\
sunny& 80& 90& true&& none\\
sunny& 72& 95& false&& none\\
rain& 65& 70& true& & none\\
rain& 71& 96& true& & none\\\cline{1-4}\cline{6-6}
rain& 70& 96& false&& some\\
rain& 68& 80& false& & some\\
rain& 75& 80& false& & some\\\cline{1-4}\cline{6-6}
sunny& 69& 70& false& &lots \\
sunny& 75& 70& true& &lots\\
overcast& 83& 88& false& & lots\\
overcast& 64& 65& true& & lots\\
overcast& 72& 90& true& & lots\\
overcast& 81& 75& false& &lots\\\cline{1-4}\cline{6-6}
\end{tabular}\end{center}}}
\caption{A log of some golf-playing behavior.}
\label{fig:golfdata}
\end{figure}
For example, from the golf playing example of \fig{golfdata}, let us assume that the classes have been scored as follows: ``lots''=8, ``some''=4, ``none''=2; i.e. ``lots'' is the $best$ class. The range {\em outlook=overcast} appears four, zero, and zero times when playing ``lots'', ``some'', and ``none'' golf (respectively). The confidence1 of {\em outlook=overcast} is therefore:
\[\frac{((8-2)*(4-0))+((8-4)*(4-0))}{4+0+0}=10 \]
\fig{dels} shows the range of confidence1 values seen in \fig{golfdata}. The confidence1 ranges shown in black are outstandingly high; i.e. these are the values that may generate the best control treatments. TAR2 forms its treatments by exploring subsets of the ranges with outstandingly high confidence1 values.
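To see the measure in code, the following sketch (our illustration rather than the TAR2 source) recomputes the above confidence1 value for {\em outlook=overcast} using the class scores assumed in the text:
{\footnotesize\begin{verbatim}
# Sketch: the confidence1 measure for one attribute range.
# scores: the (hypothetical) class weights used in the text.
scores = {"lots": 8, "some": 4, "none": 2}

def confidence1(occurrences, scores):
    """occurrences: class -> number of times the range occurs in that class."""
    best = max(scores, key=scores.get)
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    num = sum((scores[best] - scores[c]) * (occurrences[best] - occurrences[c])
              for c in scores if c != best)
    return num / total

# outlook=overcast occurs 4, 0 and 0 times under "lots", "some" and "none":
print(confidence1({"lots": 4, "some": 0, "none": 0}, scores))   # 10.0
\end{verbatim}}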
\begin{figure}
\begin{center}
{\footnotesize
\begin{tabular}{c}
\begin{barenv}
\setstyle{\sffamily} \setwidth{8} \setnumberpos{none}
\setstretch{8}\sethspace{0.5}
\setyaxis{0}{3}{1}
\setxaxis{-5}{10}{1}
\bar{0}{8} % -5
\bar{1}{1} %-4
\bar{0}{8} % -3
\bar{0}{8} % -2
\bar{1}{1} % -1
\bar{3}{1} % 0
\bar{3}{1} % 1
\bar{2}{1} % 2
\bar{0}{8} %3
\bar{0}{8} %4
\bar{1}{8} %5
\bar{1}{8} %6
\bar{0}{8} %7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{1}{8} %10
\end{barenv}
\end{tabular}
}
\end{center}
\caption{Frequency of confidence1 values generated from \fig{golfdata}. Assumes that numeric ranges have been divided into 3 \BANDS. Outstandingly high confidence1 values are shown in black. The y-axis is the number of ranges that have a particular confidence1 value.}
\label{fig:dels}
\end{figure}
\begin{figure}
\begin{center}\begin{minipage}{1.0\linewidth}{\scriptsize
\begin{tabular}{l@{~}l@{~}p{2.5in}}
\INPUT:& \DEE &The examples.\\
& \ITEMS & Attributes seen in the examples.\\
& \BEST &The best combination of criteria.\\
& \EN & Desired size of LHS.\\
& \PROMISING & Threshold for a useful attribute range.\\
& \SKEW & Threshold for acceptable number of \BEST\ entries in \TREATED.\\
& \BANDS &Number of divisions within continuous ranges.\\~\\
\OUTPUT: &\LHS& A conjunction of attribute ranges\\
&\RHS& A change in the class distributions\\
\end{tabular}
\begin{alltt}{\scriptsize
01. \DEEONE \IS discretize(\DEE, \BANDS)
02. \TEMP \IS \BASELINE \IS frequency(\DEEONE)
03. \FOR \ATTRIBUTE \IN \ITEMS \DO
04.   \FOR \RR \IN \ATTRIBUTE.\RANGES \DO
05.     \IF confidence1(\ATTRIBUTE.\RR) \GE \PROMISING
06.     \THEN \CANDIDATES \IS \CANDIDATES + \ATTRIBUTE.\RR\DONE\DONE
07. \FOR \CC \SOME \CANDIDATES \WHERE |\CC| = \EN \DO
08.   \DEETWO \IS \CC \PLUS \DEEONE
09.   \DISTRIBUTION \IS frequency(\DEETWO)
10.   \IF \DISTRIBUTION>\TEMP \AND |\BEST\PLUS\DEEONE|/|\BEST\PLUS\DEETWO|>\SKEW
11.   \THEN \DO\LHS \IS \CC
12.        \RHS \IS compare(\BASELINE,\DISTRIBUTION)
13.        \TEMP \IS \DISTRIBUTION\DONE\DONE
14. \IF (\LHS \NOE \AND \RHS \NOE) \THEN \RETURN (\LHS, \RHS)
15. \ELSE \RETURN "no treatment"}
\end{alltt}}\end{minipage}\end{center}
\caption{The TAR2 algorithm.}\label{fig:tar2}
\end{figure}
\section{Inside TAR2}
TAR2 generates controller and monitor treatments. Monitors are generated in the same manner as controllers. However, before the monitor is generated, the scoring function for the criteria is reversed so that TAR2 seeks attribute ranges that {\em nudge} a system into worse behavior. The rest of this section discusses how to generate controllers.
The TAR2 algorithm is shown in \fig{tar2}. The {\tt frequency} function counts the frequency of examples falling into the different classes. Using this function, a \BASELINE\ class distribution is collected from \DEE\ (this is used later to contrast different treatments) and copied to a \TEMP\ variable (this is used to store the best distribution seen so far). The {\tt compare} function compares two frequencies to generate reports like (e.g.) 43\% less ``lots'' and 5\% less ``some'' and 167\% more ``none''.
The {\tt discretize} function divides the numeric ranges seen in the examples into \BANDS\ groups. TAR2 was originally designed using a very simple discretization policy; i.e. TAR2 sorts the known values and divides them into \BANDS\ groups of (roughly) the same cardinality. It was anticipated that this policy would be too simplistic and would have to be improved. However, our empirical results (see below) were so encouraging that we were never motivated to do so.
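As an illustration of that discretization policy (a sketch only; TAR2's own implementation may break ties between equal values differently), the following splits the humidity column of \fig{golfdata} into bands of roughly equal cardinality:
{\footnotesize\begin{verbatim}
# Sketch: equal-frequency discretization into a fixed number of bands.
def discretize(values, bands):
    """Sort the known values and cut them into `bands` groups of (roughly)
    the same cardinality; returns the band index assigned to each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    size = len(values) / bands
    for rank, i in enumerate(order):
        result[i] = min(int(rank / size), bands - 1)
    return result

humidity = [86, 90, 95, 70, 96, 96, 80, 80, 70, 70, 88, 65, 90, 75]
print(discretize(humidity, 3))   # three bands of sizes 5, 5 and 4
\end{verbatim}}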
%XXX define ">" in the following
%XXX missing values
% XX best classs
% skew prune
Once a treatment is found, it is applied to the example set to create a \TREATED\ example set; i.e. all the examples that don't contradict the proposed treatment (see line 8). A ``good'' treatment includes most of the examples that have the \BEST\ criteria (e.g. in the golf example of \fig{golfdata}, \BEST= playing ``lots'' of golf). The \SKEW\ parameter is used at line 10 to reject ``bad'' treatments; i.e. those that don't contain enough of the \BEST\ criteria. For example, at \SKEW=5, at least 20\% of the \BEST\ criteria must appear in the treatment.
TAR2 explores subsets of the \RANGES\ found in a set of examples \DEE\ (see line 7). Subset exploration is constrained to just the \RANGES\ with an outstandingly large confidence1 score (see line 5). Even with this restriction, there are still an exponential number of such subsets. Hence, to be practical, TAR2 must seek the {\em minimal} possible number of control actions and monitors. Accordingly, the user of TAR2 constrains its learning to rule conditions of size \EN, where $N$ is small (see line 7). Often, effective treatments can be found using $N\le 4$, which suggests that narrow funnels existed in the datasets used for our case studies.
%XXX key point. the nudge distributions say that most
% attributes don't matter
%XXX refer to the monitor1 and monitor2 rules. note that
% bad <> not good
\begin{figure}
\begin{center}
\begin{minipage}{0.8\linewidth}
{\footnotesize
\begin{verbatim}
controllerG
  if   outlook=overcast
  then (230% more "lots" and no "some" and no "none").

monitorG
  if   90 <= humidity < 97
  then (43% less "lots" and 5% less "some" and 167% more "none").
\end{verbatim}}
\end{minipage}
\end{center}
\caption{Control and monitor rules found from \protect\fig{golfdata}. To control {\em outlook}, unscrupulous owners of golf courses could (e.g.) bribe radio announcers to lie about the weather report.}\label{fig:cr}
\end{figure}
\begin{figure}
{\small\begin{center}
\begin{tabular}{cp{0.6in}cp{0.7in}cp{1.1in}}
&baseline: &&controllerG: && monitorG:\\\hline
&(from \fig{golfdata}) && $outlook$= $overcast$ && $humid =[90..97)$\\
&~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{36}{1} \bar{21}{4} \bar{43}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{4} \bar{100}{8} \end{barenv} &&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{60}{1} \bar{20}{4} \bar{20}{8} \end{barenv}
\end{tabular}
\end{center} }
\caption{Percentage of classes seen in different situations. The left-hand-side histogram is a report of the class frequencies seen in \protect\fig{golfdata}. The middle and right-hand-side histograms were generated by applying the treatments of \protect\fig{cr}. KEY:~\protect\legend1{none}; \protect\legend4{some}; \protect\legend8{lots}.}
\label{fig:golft}
\end{figure}
~\\
\section{Examples and Experiments}
\subsection{Examples}
The output of TAR2 describes constraints that, if applied to the dataset, may reject certain examples. For example, the {\tt controllerG} treatment of \fig{cr} contains the constraint $outlook=overcast$.
If we reject all items in the golf dataset that contradict this constraint, then our golfers now play ``lots'', ``some'', and ``none'' golf in 100\%, 0\%, and 0\% (respectively) of the constrained dataset (as shown in the middle histogram of \fig{golft}).
The monitor rule {\tt monitorG} of \fig{cr} was generated in a similar manner, but with the scoring system reversed; i.e. ``lots''=2, ``some''=4, ``none''=8. In this case, ``none'' is the ``$best$'' class and TAR2 will find a treatment that selects for playing less golf; i.e. $90 \le humidity < 97$. After applying this constraint, the class distribution changes to the right-hand-side histogram of \fig{golft}.
%\begin{center}
%{\footnotesize
%\begin{tabular}{cccccccccc}
% &\multicolumn{4}{c}{$I$ = Items}&&\multicolumn{4}{c}{$C$ = Criteria}\\
%$D$ & $I_1$ & $I_2$ & ... & $I_i$ & & $C_1$ & $C_2$ & ... &
%$C_c$\\\cline{2-5}\cline {7-10}
%1 & $V_x$ & ? & ... & $V_w$ & & good & med & .. & ? \\
%2 & $V_y$ & $V_z$ & ... & ? & & bad & ? & ... & good \\
%\end{tabular}
%}
%\end{center}
\subsection{Experiments}\label{sec:ex}
This section discusses experiments with TAR2 where the learner was assessed via two methods:
\begin{description}
\item[Xvals:] Standard N-way cross-validation studies.
\item[Simulations:] Simulations showing how well TAR2's treatments can control or monitor some model.
\end{description}
\subsubsection{Xval Studies}
Figure~\ref{fig:iris} to Figure~\ref{fig:carr} show TAR2 executing over some samples from the UC Irvine repository ({\footnotesize \url{http://www.ics.uci.edu/~mlearn/}}). These figures display the effects of the treatment closest to the average improvement seen in a 10-way cross-validation study. Each figure shows the class distributions as percentages; the domain classes are shown in a legend. In the legend, the heuristic worth assigned to each class is, top-to-bottom, worst-to-best.
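Each of these histograms reports a class distribution before and after a learnt treatment is applied to the examples. A minimal sketch of that application and comparison step (our own illustration, restricted to discrete attributes; TAR2's {\tt compare} reports relative changes, whereas this sketch reports simple percentage-point differences) is:
{\footnotesize\begin{verbatim}
# Sketch: apply a treatment (a conjunction of attribute ranges) to an
# example set and contrast the class distributions before and after.
def frequency(examples):
    """Percentage of each class in the example set."""
    counts = {}
    for ex in examples:
        counts[ex["class"]] = counts.get(ex["class"], 0) + 1
    return {c: 100.0 * k / len(examples) for c, k in counts.items()}

def apply_treatment(examples, treatment):
    """Keep only the examples that do not contradict the treatment."""
    return [ex for ex in examples
            if all(ex[attr] == value for attr, value in treatment.items())]

def compare(baseline, treated):
    """Change (in percentage points) of each class after treatment."""
    return {c: treated.get(c, 0.0) - baseline.get(c, 0.0) for c in baseline}

golf = [  # a small hypothetical slice of the golf log
    {"outlook": "sunny",    "class": "none"},
    {"outlook": "rain",     "class": "some"},
    {"outlook": "overcast", "class": "lots"},
    {"outlook": "overcast", "class": "lots"},
]
baseline = frequency(golf)
treated = frequency(apply_treatment(golf, {"outlook": "overcast"}))
print(compare(baseline, treated))   # lots rises; none and some vanish
\end{verbatim}}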
%XXX ying: check algorithm %XXX ying: that sentence \begin{figure} {\small\begin{center} \begin{tabular}{ cp{0.8in} cp{1.0in} p{1.5in} } &baseline &&controller &Legend\\\hline & &&$petal length=$& \\ & &&$[3.7..4.8)$& \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{33}{1} \bar{33}{4} \bar{33}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{4} \bar{100}{8} \end{barenv} & \protect\legend1{setosa}\newline \protect\legend4{virginica}\newline \protect\legend8{v.color}\newline \end{tabular} \end{center} } \caption{Iris} \label{fig:iris} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&controller &Legend\\\hline & &&$displacement$= $[68..101)$ & \\ & &&$horsepower$= $[46..78)$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{27}{1} \bar{20}{2} \bar{27}{4} \bar{26}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{30}{4} \bar{70}{8} \end{barenv} & \protect\legend1{low}\newline \protect\legend2{medlow}\newline \protect\legend4{medhigh}\newline \protect\legend8{high}\newline \end{tabular} \end{center} } \caption{Autompg} \label{fig:autompg} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} & baseline &&controller&Legend\\\hline & &&$hieght$= $[34..86)$& \\ & &&$mean\_tr$= $[3.9..9.5)$& \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{90}{1} \bar{6}{2} \bar{1}{4} \bar{2}{7} \bar{2}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{0}{4} \bar{0}{7} \bar{100}{8} \end{barenv} & \protect\legend1{text}\newline \protect\legend2{horz.line}\newline \protect\legend4{graphic}\newline \protect\legend7{vert.line}\newline \protect\legend8{picture}\newline \end{tabular} \end{center} } \caption{Page-block} \label{fig:pageblock} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&controller &Legend\\\hline & &&$buying=low$ & \\ & &&$safety=high$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{70}{1} \bar{22}{2} \bar{4}{4} \bar{4}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{31}{1} \bar{19}{2} \bar{13}{4} \bar{38}{8} \end{barenv} & \protect\legend1{unacc}\newline \protect\legend2{acc}\newline \protect\legend4{good}\newline \protect\legend8{vgood}\newline \end{tabular} \end{center} } \caption{Car} \label{fig:car} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&monitor &Legend\\\hline & &&$person=2$ & \\ & &&$safety=low$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{70}{1} \bar{22}{2} \bar{4}{4} \bar{4}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} 
\sethspace{0.5} \setnumberpos{down}
\bar{100}{1} \bar{0}{2} \bar{0}{4} \bar{0}{8}
\end{barenv} &
\protect\legend1{unacc}\newline
\protect\legend2{acc}\newline
\protect\legend4{good}\newline
\protect\legend8{vgood}\newline
\end{tabular}
\end{center} }
\caption{Car: reversing the class scoring.}
\label{fig:carr}
\end{figure}
In \fig{iris}, TAR2 was told that the worth of each type of flower was (in increasing order) {\em setosa}, {\em virginica}, then {\em v.color}. TAR2 then learnt that $3.7\le petal length < 4.8$ would select for the flower with the highest worth (i.e. {\em v.color}). Similarly, in \fig{autompg}, TAR2 learnt a selector that favored high quality cars. By restricting engine size to $68\le displacement<101$ and $46\le horsePower<78$, the ratio of high quality cars increased from 26\% to 70\%. Further, the low and medium-low cars have disappeared. In \fig{pageblock}, TAR2 learnt a specialized feature extractor for finding pictures mixed in with text, horizontal lines, vertical lines, and graphics. According to TAR2, a height between 34 and 86, and a mean number of white-black transitions between 3.9 and 9.5, will locate pictures, and nothing else.
In the car domain of \fig{car}, most of the classes are non-best. The average best controller seen in the 10-way cross-validation for the car domain was {\em buying=low} and {\em safety=high}. While this controller increases the frequency of very good cars from 4\% to 38\%, this controller still leaves us with 31\% unacceptable cars. While this controller is weak, the monitor obtained by reversing the class scoring is very strong. \fig{carr} shows that monitor: if we select two-person cars with low safety, then 100\% of the cars are unacceptable. That is, when the best class occurs rarely in the dataset, TAR2 may be better at finding methods to {\em degrade} a system, rather than {\em improve} it.
\subsubsection{Simulation Studies}
Another way to assess TAR2 is to test how well it can control some model. To perform such an assessment, we (i)~generated data sets from some model; (ii)~applied TAR2 to find treatments from those data sets; (iii)~imposed those treatments as constraints on the model; (iv)~ran the model a second time; and (v)~compared the outputs of the second run to the predictions made by TAR2.
In our first two simulation studies, a {\em baseline} class distribution was used by TAR2 to generate a best controller and a prediction of how this best controller would change the class distribution. We call the predicted distribution the {\em treated} distribution. The {\em actual} distribution was the class distribution seen after the best controller was imposed on the model and the model executed again. In \fig{run3} and \fig{run2}, the treated distribution matches the actual distribution almost exactly; i.e. TAR2 accurately predicted the effects of the controller treatment.
\fig{run3} was generated from a model of software project risk. This risk model was implemented as part of the COCOMO project. The goal of the COCOMO project is to build an open-source software cost estimation model~\cite{cocomoII}. Internally, the model contains a matrix of parameters that should be tuned to a particular software organization. Using COCOMO-II, the Madachy risk model can assess the risk of a software cost over-run~\cite{madachy97}. For machine learning purposes, the goal of using the Madachy model is to find a change to a description of a software project that reduces the likelihood of that project being high risk~\cite{me00e,me01f}.
In the experiment shown in \fig{run3}, the model was executed 30,000 times using randomly selected inputs. When the treatments learnt by TAR2 were imposed on the model inputs, and the model was executed again, all the high risk projects were removed, the percentage of medium risk projects was significantly reduced, and the percentage of low risk projects was tripled.
\begin{figure}
{\footnotesize\begin{center}
\begin{tabular}{cp{0.8in}cp{0.8in}cp{0.8in}}
&baseline &&treated && actual\\\hline
&~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{0}{1} \bar{20}{2} \bar{60}{4} \bar{20}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{40}{4} \bar{60}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{40}{4} \bar{60}{8} \end{barenv}
\end{tabular}
\end{center} }
\caption{COCOMO key:~\protect\legend1{very high risk}; \protect\legend2{high risk}; \protect\legend4{medium risk}; \protect\legend8{low risk}.}
\label{fig:run3}
\end{figure}
\begin{figure}[!b]
\begin{center}
{\small
\begin{tabular}{rl}
baseline: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{15}{8} % 1
\bar{45}{8} % 2
\bar{20}{8} % 3
\bar{5}{8} % 4
\bar{5}{8} % 5
\bar{3}{8} % 6
\bar{2}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}\\~\\~\\ %10
treated: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{0}{8} % 1
\bar{0}{8} % 2
\bar{0}{8} % 3
\bar{5}{8} % 4
\bar{32}{8} % 5
\bar{45}{8} % 6
\bar{15}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}\\~\\~\\
actual: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{0}{8} % 1
\bar{0}{8} % 2
\bar{0}{8} % 3
\bar{5}{8} % 4
\bar{32}{8} % 5
\bar{45}{8} % 6
\bar{15}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}
\end{tabular}}
\end{center}
\caption[Round1]{Circuit. The x-axis denotes the number of bulbs glowing in the circuit.}\label{fig:run2}
\end{figure}
\fig{run2} shows TAR2 controlling a qualitative description of an electrical circuit. A qualitative description of a circuit of 47 wires connecting 9 light bulbs and 16 other components was coded in Prolog. The model was expressed as a set of constraints; e.g. the voltage drop across a chain of components in series is the {\tt sum} of the voltage drops across each component. The goal of the circuit was to illuminate a space using the 9 light bulbs. The circuit is qualitative and qualitative mathematics is nondeterministic; e.g. the sum of a negative and a positive value is unknown. The problem with the circuit was out-of-control nondeterminism. On backtracking, this circuit generated 35,228 different solutions to the constraints. In many of these solutions, the circuit was unacceptably dark: only two bulbs glowing, on average (see the top histogram of \fig{run2}). The goal of the machine learning was hence to find a minimal set of changes to the circuit to increase the illumination~\cite{me01g}.
\fig{run2} shows the distribution of the frequency with which bulbs glowed in a qualitative circuit description. The behavior of qualitative circuits is notoriously hard to predict~\cite{clancy97} but TAR2 found two actions on the circuit that trebled the average number of bulbs that glowed (see the {\em treated} and {\em actual} plots of \fig{run2}).
\begin{figure}
\begin{center}
\includegraphics[angle=270,width=3in]{jpl.eps}
\end{center}
\caption{Results from the satellite domain. The dots below the line show the initial output of the model: note the very large spread in the costs and benefits. The dots above the line show the final outputs of the model after 5 iterations of TAR2 learning.}\label{fig:jpl}
\end{figure}
\fig{jpl} shows a third simulation study with TAR2. Analysts at the NASA Jet Propulsion Laboratory debate satellite design by building a semantic network connecting design decisions to satellite requirements~\cite{fea00}. Each edge is annotated with the numeric cost and benefits of taking some action. Some of these nodes represent base decisions within the project (e.g. selection of a particular type of power supply). Each set of decisions has an associated cost. The net can be executed by selecting actions and seeing what benefits result. One such network included 99 possible actions; i.e. $2^{99}\approx 10^{30}$ combinations of actions.
Note the black line, top-left, of \fig{jpl}. All the dots below this line were generated via 10,000 random selections of the decisions, and the collection of their associated costs and benefits. All the dots above this line represent high benefit, low cost projects found by TAR2~\cite{fea02a}. In this application, TAR2 was used as a knowledge acquisition tool. After each run of TAR2, the proposed best controller was debated with the analysts. Each run, and its associated debate, resulted in a new set of constraints for the semantic net. The new constraints were then imposed on the model before the next run. After five runs, TAR2 found 30 decisions (out of 99) that crucially affected the cost/benefit of the satellite. Note that this means TAR2 also found 99-30=69 decisions that could be safely ignored.
For comparison purposes, a genetic algorithm (GA) was also applied to the \fig{jpl} domain~\cite{fea02a}. The GA also found decisions that generated high benefit, low cost projects. However, each such GA solution commented on every possible decision and there was no apparent way to ascertain which of these were the most critical decisions. The TAR2 solution was deemed superior to the GA solution by the domain experts, since the TAR2 solution required just 30 actions rather than the 99 demanded by the GA.
Note that the \fig{jpl} case study is not a counterexample to our thesis that most domains have narrow funnels. That study adopted the incremental approach for reasons of convenience. JPL's semantic net simulator was too slow to generate enough examples in one run. Hence, an incremental generate-and-constrain approach was taken.
%XXX scale up unknown .
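The simulation-based assessment used in these studies can be summarized by the following sketch, where {\tt run\_model} and {\tt learn\_treatment} are hypothetical placeholders for the simulators above and for a call to TAR2:
{\footnotesize\begin{verbatim}
from collections import Counter

# Sketch of the five-step assessment described in the Experiments section.
# run_model() and learn_treatment() are placeholders, not real APIs.
def assess_with_model(run_model, learn_treatment, runs=30000):
    # (i) generate examples by running the model on random inputs
    baseline = [run_model(constraints=None) for _ in range(runs)]
    # (ii) ask the learner for a treatment plus its predicted distribution
    treatment, predicted = learn_treatment(baseline)
    # (iii, iv) impose the treatment on the inputs and run the model again
    rerun = [run_model(constraints=treatment) for _ in range(runs)]
    # (v) compare the prediction against what the constrained model did
    actual = Counter(example["class"] for example in rerun)
    return predicted, actual
\end{verbatim}}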
%XXX ok if the larger data sets have a small left tail and that is future work
% all the examples discussed in the
%XXX add in the COCOMO and the QR results
\section{Generality}
\begin{figure}[!t]
\begin{center}{\scriptsize
\begin{tabular}{r@{:}lrccccc}
\multicolumn{1}{c}{} & & \multicolumn{4}{c}{Attributes} & treatment&runtime\\\cline{3-6}
source&domain&\# examples&\#continuous&\#discrete&\#classes& size& (secs)\\\hline
UCI&iris & 150 & 4 & 0 & 3 &1 & $<$1\\\hline
UCI&wine & 178 & 13 & 0 & 3 & 2 & $<$1\\\hline
UCI&car & 1,728 & 0 & 6 & 4 & 2 & $<$1\\\hline
UCI&autompg & 398 & 6 & 1 & 4 & 2 & 1\\\hline
UCI&housing & 506 & 13 & 0 & 4 & 2 & 1\\\hline
UCI&page blocks & 5,473 & 10 & 0 & 5 & 2 & 2\\\hline
here&circuit &35,228 & 0 & 18 & 10 & 4 & 4\\\hline
here&cocomo &30,000 & 0 & 23 & 4 & 1 & 2\\\hline
here&satellite &30,000 & 0 & 99 & 9 & 5 &86\\\hline
other &reachness &25,000 & 4 & 9 & 4 & 2 &3\\\cline{3-8}
& &250,000 & 4 & 9 & 4 & 1 &23\\\hline
\end{tabular}}\end{center}
\caption{Runtimes for TAR2 on different domains (on a 333MHz Windows machine with 200MB of RAM). ``UCI'' denotes data sets from the machine learning repository at UC Irvine. ``Here'' denotes data sets taken from this article. The text discusses experiments with 10,000 examples from the satellite domain; this table shows a larger case study of 30,000 examples. ``Other'' denotes a data set taken from~\protect\cite{me01h}. }\label{fig:runtimes}
\end{figure}
This section is an algorithmic assessment of TAR2; i.e. a comment on TAR2's ability to scale to larger domains. \fig{runtimes} reports TAR2 runtimes on data sets of different sizes. \fig{ying} shows three studies where the size of the treatments ($N$, from line 7 in \fig{tar2}) was held constant, and the size of the dataset was increased. \fig{deltachanges} shows one study where the size of the dataset was held constant and the size of the treatments was increased. Note that:
\begin{figure}[!t]
\begin{center}
\includegraphics{ying.eps}
\end{center}
\caption[Round1]{Runtimes for increasing dataset size at fixed treatment sizes. Datasets generated from the COCOMO model.}\label{fig:ying}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics{deltachanges.eps}
\end{center}
\caption[Round1]{Runtimes for increasing treatment size $N$, keeping the data set size constant (3MB). Dataset generated from the COCOMO model.}\label{fig:deltachanges}
\end{figure}
\begin{enumerate}
\item TAR2 can handle small to medium-sized datasets. For example, the algorithm learnt effective treatments in 23 seconds from a dataset containing 250,000 examples: see the {\em reachness} domain in \fig{runtimes}.
\item TAR2 has the potential to scale to large datasets. Assuming constant treatment size, TAR2's runtimes are linear on dataset size: see~\fig{ying}.
\item However, the algorithm is exponential on treatment size: see the marked increase in the runtimes between $N=2$ and $N=3$ in \fig{ying} and the log-linear plot of \fig{deltachanges}.
\end{enumerate}
The exponential impact of increasing treatment size is not necessarily a reason to reject TAR2. Firstly, if very large treatments are required, then an incremental treatment learning approach, such as that used in the satellite case study of \fig{jpl}, may suffice. Secondly, if most domains don't need large treatments, then this exponential impact will not be seen in practice. Elsewhere, we have made an average case mathematical analysis of the ratio of the odds of a domain having narrow funnels to the odds of it having wider funnels.
Under a wide variety of assumptions, the same effect holds: the odds of narrower funnels are millions of times more likely that wider funnels~\cite{me00t}. Such a statistical analysis represents an average case result and may not apply to a particular domain. What would be useful would be some kind of assessment tool that checks if this average case statistical result applies to a particular domain. \begin{figure} \begin{center} {\small \begin{tabular}{c} \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.84}\sethspace{0.5} \setyaxis{0}{64}{15} \setxaxis{1}{8}{1} \bar{5}{1} %-4 \bar{5}{1} % -3 \bar{5}{1} % -2 \bar{5}{1} % -1 \bar{8}{1} % 0 \bar{16}{1} % 1 \bar{32}{1} \bar{64}{1} \end{barenv} \end{tabular} } \end{center} \caption{A hypothetical confidence1 frequency distribution with a large left tail that is inconsistent with narrow funnels. Note: yet to be observed in any example set.} \label{fig:badnews} \end{figure} \begin{figure*} \begin{center} {\tiny \begin{tabular}{p{1.4in}p{1.1in}p{1.1in}p{1.1in}} {\em \fig{nudges}.i: housing}\newline {\tiny \begin{barenv} \setstretch{5.4} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{10}{2} \sethspace{0.2} \setxaxis{-10}{22}{4} \bar{8}{1} \bar{8}{1} \bar{8}{1} \bar{10}{1} \bar{9}{1} \bar{3}{1} \bar{1}{1} \bar{2}{1} \bar{2}{1} \end{barenv}} & {\em \fig{nudges}.ii: page-blocks}\newline {\tiny \begin{barenv} \setstretch{2.7} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \sethspace{0.2}\setyaxis{0}{20}{4} \setxaxis{-29}{32}{8} \bar{20}{1} \bar{7}{1} \bar{0}{1} \bar{1}{1} \bar{0}{1} \bar{0}{1} \bar{1}{1} \bar{1}{1} \end{barenv}}& {\em \fig{nudges}.iii: wine}\newline {\tiny \begin{barenv} \setstretch{2.7} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{20}{5} \sethspace{0.2}\setxaxis{-4}{11}{2} \bar{13}{1} \bar{17}{1} \bar{15}{1} \bar{7}{1} \bar{5}{1} \bar{3}{1} \bar{2}{1} \bar{3}{1} \end{barenv}} & {\em \fig{nudges}.iv: car}\newline {\tiny \begin{barenv} \setstretch{5.4} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{10}{3} \sethspace{0.2} \setxaxis{-13}{-7}{1} \bar{5}{1} \bar{2}{1} \bar{3}{1} \bar{3}{1} \bar{6}{1} \bar{1}{1} \bar{1}{1} \end{barenv}} \\~\\ {\em \fig{nudges}.v: satellite}\newline {\tiny \begin{barenv} \setstretch{0.45}\sethspace{0.5} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{120}{30} \setxaxis{-350}{0}{50} \bar{1}{1} \bar{11}{1} \bar{46}{1} \bar{118}{1} \bar{16}{1} \bar{0}{1} \bar{4}{1} \bar{8}{1} \end{barenv}} & {\em \fig{nudges}.vi: QR circuit}\newline{\tiny \begin{barenv} \setstyle{\sffamily}\sethspace{0.2} \setwidth{8} \setnumberpos{none} \setstretch{0.154} \setyaxis{0}{350}{100} \setxaxis{4}{9}{1} \bar{144}{1} \bar{226}{1} \bar{280}{1} \bar{104}{1} \bar{17}{1} \bar{10}{1} \end{barenv}} & {\em \fig{nudges}.vii: cmm2}\newline {\tiny \begin{barenv} \setstretch{0.25}\sethspace{0.2} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{220}{50} \setxaxis{1}{14}{2} \bar{20}{1} \bar{147}{1} \bar{213}{1} \bar{212}{1} \bar{10}{1} \bar{20}{1} \bar{10}{1} \end{barenv}} & {\em \fig{nudges}.viii: cocomo2}\newline {\tiny \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.72} \setnumberpos{none} \sethspace{0.2} \setyaxis{0}{75}{25} \setxaxis{-13}{9}{4} \bar{1}{1} \bar{3}{1} \bar{4}{1} \bar{68}{1} \bar{49}{1} \bar{9}{1} \end{barenv}} \end{tabular}} \end{center} \caption{Confidence1 distributions seen in eight domains. Y-axis is the number of times a particular confidence1 was seen. 
The top row comes from datasets taken from the UC Irvine repository; the bottom row was generated from other domains discussed in this article.}\label{fig:nudges}
\end{figure*}
The confidence1 distribution can be used to test for narrow funnels. Domains that contain such funnels would exhibit the following property: a small number of variables within the funnel exert a disproportionately large influence on the overall behavior of the system. A test for such variables is to check for {\em small right tails} in the confidence1 distributions. \fig{dels} has such a {\em small right tail}; i.e. the bulk of the distribution lies away from the maximum value. Distributions with a {\em large right tail} such as \fig{badnews} are not consistent with narrow funnels.
\fig{nudges} shows the confidence1 distributions seen in eight example sets: four from the UC Irvine repository and some of the other domains described above. Note that in all cases, the distribution has a small right tail; i.e. a small number of variables exert a disproportionately large influence on the overall behavior of the system. In all, we have applied TAR2 to 20 domains: the ones discussed in this paper and others not shown for space reasons. In none of those domains have we observed a large right tail.
\section{Conclusion}
The minimal theories of TAR2 will be inadequate if domains contain complex relationships. Domains with narrow funnels are not complex: the key controllers for the whole space are merely the few variables in the funnel. The TAR2 association rule learner is both a test and an application of funnel theory. TAR2 offers two tests for narrow funnels. Firstly, a confidence1 distribution with a small right tail is consistent with a domain containing narrow funnels. Secondly, if a domain contains narrow funnels, then TAR2 should be able to generate adequate controllers and monitors for that domain. All the domains we have seen to date have these two features.
The open issue is how many other domains lack complex relationships. Based on around 20 case studies with TAR2 (some of which were reported above), Holte's prior work with 1R, and the wrapper studies of Kohavi and John, we have some empirical reasons to believe that many domains are not complex. Also, we have theoretical reasons for believing that narrow funnels are common~\cite{me99q,me00b,me00d,me00t}; i.e. domains with complex relationships are rare and TAR2 will often suffice.
The success of a simple algorithm such as TAR2 suggests that it can be fruitful to {\em first} try lightweight methods {\em before} exploring heavyweight methods. We hence advocate using TAR2 as a preprocessor to other, more elaborate schemes. TAR2 is available for download from \url{http://www.ece.ubc.ca/twiki/bin/view/Softeng/TreatmentLearner}.
%XXX note: runtimes worse than exponential as |lhs| increases
% and less than exponential as sample size increases
% and assuming small rules, then we're in!
%not done: effect of unknowns, effects of increasing #attribtus, #classes,
%#examples. and some control for solution quality. kidna like the
% APRIORI paper
\acknowledgements
This research was conducted at the University of British Columbia and West Virginia University, partially under NASA contract NCC2-0979. The work was partially sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA IV\&V Facility.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the United States Government.
% The endnotes section will be placed here.
\theendnotes
\bibliographystyle{../../tex/kluwer/klunamed}
%! remind me to send you my refs.bib file
%! you'll have to change this line to wherever your store your refs
%! to get the references in, you have run latex once, BibTeX once, then latex twice
%! (to get the reference numbers right). so stupid...
\bibliography{../../refs}
\end{article}
\end{document}