%\documentclass[times, 11pt]{article}\renewcommand{\baselinestretch}{1.5}\textheight 23.25cm\textwidth 18cm\oddsidemargin -0.68cm %\documentclass{sig-alternate} \documentclass[namedreferences]{../../tex/kluwer/kluwer} % Specifies the document style. %\newdisplay{guess}{Conjecture} %\documentclass[times, twocolumn,10pt]{article} \usepackage{times,alltt,url,graphicx,amssymb} \usepackage{../../tex/bar}%,../../tex/cite} %\usepackage{../../tex/lineno}\runninglinenumbers\modulolinenumbers[5] %\usepackage{bar} \newcommand{\bb}{\blacksquare\!\!} \newcommand{\fig}[1]{Figure~\ref{fig:#1}} \newcommand{\ARROW}{{\sffamily $\Rightarrow$}} \newcommand{\ATTRIBUTE}{{\sffamily $attribute$}} \newcommand{\BANDS}{{\sffamily $bands$}} \newcommand{\BASELINE}{{\sffamily $baseline$}} \newcommand{\BEST}{{\sffamily $best$}} \newcommand{\CANDIDATES}{{\sffamily $candidates$}} \newcommand{\CC}{{\sffamily $C$}} \newcommand{\DEEONE}{{\sffamily $D_1$}} \newcommand{\DEETWO}{{\sffamily $treated$}} \newcommand{\DEE}{{\sffamily $D$}} \newcommand{\DISTRIBUTION}{{\sffamily $result$}} \newcommand{\EN}{{\sffamily $N$}} \newcommand{\INPUT}{{\underline {\sffamily input}}} \newcommand{\ITEMS}{{\sffamily $items$}} \newcommand{\LHS}{{\sffamily $lhs$}} \newcommand{\NOE}{$\not= \emptyset$} \newcommand{\PROMISING}{{\sffamily $promising$}} \newcommand{\RANGES}{{\sffamily $ranges$}} \newcommand{\RHS}{{\sffamily $rhs$}} \newcommand{\RR}{{\sffamily $R$}} \newcommand{\SKEW}{{\sffamily $skew$}} \newcommand{\TEMP}{{\sffamily $temp$}} \newcommand{\TREATED}{{\sffamily $treated$}} \newcommand{\AND}{{\underline {\sffamily and}}} \newcommand{\DONE}{{{\sffamily \}}}} \newcommand{\DO}{{{\sffamily \{}}} \newcommand{\ELSE}{{\underline {\sffamily else}}} \newcommand{\FI}{{\underline {\sffamily fi}}} \newcommand{\FOR}{{\underline {\sffamily for}}} \newcommand{\GE}{{$\ge$}} \newcommand{\IF}{{\underline {\sffamily if}}} \newcommand{\IN}{{\underline {\sffamily in}}} \newcommand{\IS}{{$\leftarrow$}} \newcommand{\NIL}{ {\sffamily $\emptyset$}} \newcommand{\OUTPUT}{{\underline {\sffamily output}}} \newcommand{\PLUS}{{$\wedge$}} \newcommand{\REPEAT}{{\underline {\sffamily repeat}}} \newcommand{\RETURN}{{\underline {\sffamily return}}} \newcommand{\SOME}{{$\subseteq$}} \newcommand{\THEN}{{\underline {\sffamily then}}} \newcommand{\TO}{{\underline {\sffamily to}}} \newcommand{\UNTIL}{{\underline {\sffamily until}}} \newcommand{\WHEN}{{\underline {\sffamily when}}} \newcommand{\WHERE}{{\underline {\sffamily where}}} \newenvironment{smallitem}{ \setlength{\topsep}{0pt} \setlength{\partopsep}{0pt} \setlength{\parskip}{0pt} \begin{itemize} \setlength{\leftmargin}{.2in} \setlength{\parsep}{0pt} \setlength{\parskip}{0pt} \setlength{\itemsep}{0pt}}{\end{itemize}} %! some shorthands \newcommand{\bi}{\begin{itemize}} \newcommand{\ei}{\end{itemize}} \begin{document} \begin{article} \begin{opening} %\conferenceinfo{KDD}{'02 Edmonton, Alberta, Canada} %! general comment- you wrote the paper you want to write, not %! the paper that folks attracted to this particular workshop would %! want to read. most of my changes are just "throw lots of references %! 
to requirements engineering" to the front of the paper %!your title was too long and not linked into this particular venue \title{Just Enough Learning (of Association Rules):\newline The TAR2 ``Treatment'' Learner} %\titlenote{KDD %paper ID: 440}\titlenote{Submitted to the KDD-2002, The Eighth ACM SIGKDD %International Conference on Knowledge Discovery and Data %Mining, July 23 - 26, 2002, Edmonton, Alberta, Canada %\url{http://www.acm.org/sigkdd/kdd2002/}. Wp ref %01/tar2/tar2} } %! you did the experimental work and wrote draft one and wrote the %! fast tar2 that makes this all possible. you be first author! % % thanks! but I just can't write this beautiful paper, it is yours. %\numberofauthors{2} \newcommand{\affaddr}[1]{#1} %\newcommand{\email}[1]{{\small {\url #1}}} \author{ Tim \surname{Menzies}\thanks{ Lane Department of Computer Science, University of West Virginia, PO Box 6109 Morgantown, WV 26506-6109, USA, \email{tim@menzies.com}}, Ying \surname{Hu}\thanks{ Dept. Electical \& Computer Engineering, University of British Columbia, 2356 Main Mall; Vancouver, B.C. Canada V6T1Z4, \email{yingh@ece.ubc.ca}}} \runningauthor{Tim Menzies, Ying Hu} \runningtitle{Just Enough Learning} \date{Nov 22 2002} %\date{Aug 12,2001} % Use \section* instead of \section to suppress numbering for % the abstract, acknowledgements, and references. %!there is an abstract envrionment \begin{motto}[prose] "Don't tell me where I am, tell me where to go." \rightline{- a (very busy) user} \end{motto} \begin{abstract} An over-zealous machine learner can automatically generate large, intricate, theories which can be hard to understand. However, such intricate learning is not necessary in domains that lack complex relationships. A much simpler learner can suffice in domains with {\em narrow funnels}; i.e. where most domain variables are controlled by a very small subset. Such a learner is TAR2: a weighted-class minimal contrast-set association rule learner that utilizes confidence-based pruning, but not support-based pruning. TAR2 learns {\em treatments}; i.e. constraints that can change an agent's environment. Treatments take two forms. {\em Controller treatments} hold the {\em smallest number} of conjunctions that {\em most improve} the current state of the system. {\em Monitor treatments} hold the {\em smallest number} of conjunctions that best detect future faulty system behavior. Such treatments tell an agent what to do (apply the controller) and what to watch for (the monitor conditions) within the current environment. Because TAR2 generates very small theories, our experience has been that users prefer its tiny treatments. The success of such a simple learner suggests that many domains lack complex relationships. \end{abstract} \keywords{Association rules, treatment learning, contrast sets, %ML-lite. } \end{opening} %------------------------------- % Figure of KC-1 NASA project %------------------------------- \section{Introduction} %! in journalism, there is a problem of "burying the lead"- i.e. the key idea %! does not appear till late in the text. much of "editing" is hunting for the %! "lead" and just moving it to the front. Machine learners generate theories. People read theories. What kind of learners generate the kind of theories that people want to read? If the reader is a busy person, then they might not need, or be able to use, complex theories. Rather, such a busy person might instead just want to know the {\em least} they need to do to achieve the {\em most} benefits. 
It therefore follows that machine learning for busy people should not strive for (e.g.) elaborate theories or (e.g.) increasing the expressive power of the language of the theory. Rather, a better goal might be to find the smallest theory with the most impact.
\begin{figure*}
{\em \fig{tree}.A: A learnt decision tree. Classes (right-hand-side), top-to-bottom, are ``high'', ``medhigh'', ``medlow'', and ``low''. These classes indicate the median value of owner-occupied homes in \$1000's.}
\begin{center}
\includegraphics[height=5in,width=5in]{tree.eps}
\end{center}
{\em \fig{tree}.B: Treatments learnt in the same domain:}
{\scriptsize
\begin{center}
\begin{tabular}{cccc}
\multicolumn{1}{c}{}& & {\tt controllerH} & {\tt monitorH}\\
\multicolumn{1}{c}{}& baseline & (i.e. best action) & (i.e. disaster if...)\\\hline
Treatment & nothing & $6.7 \leq RM < 9.8$ & $0.6 \leq NOX < 1.9$ \\
& & $\wedge 12.6 \leq PTRATIO < 15.9 $ & $\wedge 17.16 \leq LSTAT < 39.0$ \\\hline
&&&\\
Results& ~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{21}{1} \bar{21}{2} \bar{29}{4} \bar{29}{8} \end{barenv} & ~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{0}{1} \bar{0}{2} \bar{3}{4} \bar{97}{8} \end{barenv} & ~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4}\sethspace{0.5} \setyaxis{0}{100}{25} \bar{98}{1} \bar{1}{2} \bar{1}{4} \bar{0}{8} \end{barenv}\\\hline
Sample size &506& 38 & 81
\end{tabular}
\end{center}}
\caption{Comparing decision trees learnt by C4.5 and treatments learnt by the TAR2 treatment learner. Theories were learnt from the 506 cases in the HOUSING example set from the UC Irvine repository. This dataset has the class distribution shown in the bottom table, left-hand-side. KEY: LSTAT= lower status of the population; NOX= nitric oxides concentration (parts per 10 million); PTRATIO= pupil-teacher ratio by town; RM= average number of rooms per dwelling; KEY:~\protect\legend1{low}; \protect\legend2{medlow}; \protect\legend4{medhigh}; \protect\legend8{high}. }\label{fig:tree}
\end{figure*}
For example, \fig{tree} contrasts two theories learnt from the same data set using the TAR2 system discussed here and the C4.5 decision tree learner~\cite{quinlan92}. C4.5 has learnt an elaborate description of different kinds of houses in Boston. In contrast, TAR2 has learnt a minimal description of the {\em differences} between house types. These differences are described in terms of {\em treatments}; i.e. constraints on controllable variables that change the class distribution. The {\em controller/monitor} treatments are TAR2's comments on what might most improve/degrade the current situation. TAR2's treatments for Boston housing are shown in the bottom table of \fig{tree}. In summary, if we focus on houses with (roughly) seven to nine rooms in suburbs with pupil/teacher ratios of 12.6 to 15.9, then the percentage of high quality houses we will see will change from 29\% to 97\%. Note that TAR2's theory is far smaller than C4.5's theory. C4.5's theory contains more details but TAR2's theory gives succinct advice on how to change the current situation. Our experience with business users is that they prefer TAR2's simpler theories. As one user put it, ``decision trees just tell you where you are, treatments tell you what to do''. TAR2's succinct theories will miss the complex interrelations that C4.5 might find. We have reasons to believe that many domains lack complex relationships.
Many domains exhibit a curious {\em narrow funnel effect}; i.e. a small number of critical variables control the remaining variables within a system (the metaphor being that all processing runs down the same narrow funnel~\cite{me99a}). The concept of narrow funnels has been reported in many domains under a variety of names including:
\bi
\item {\em Master-variables} in scheduling~\cite{crawford94};
\item {\em Prime-implicants} in model-based diagnosis~\cite{rymon94an}, machine learning~\cite{rymon93setree}, or fault-tree analysis~\cite{lutz99};
\item {\em Backbones} in satisfiability~\cite{parkes99lifted,singer00backbone};
\item The {\em dominance filtering} used in Pareto optimization of designs~\cite{joseph98};
\item {\em Minimal environments} in the ATMS~\cite{deKleer86};
\item The {\em base controversial assumptions} of HT4~\cite{me97a};
\item The small {\em feature subset selection} effect~\cite{kohavi97} and the related {\em 1R} effect~\cite{holte93}.\ei
Whatever the name, the core intuition in all these terms is the same: what happens in the total space of a system can be controlled by a small critical region. We have argued previously that narrow funnels are very common~\cite{me99q,me00b,me00d,me00t}. For systems with narrow funnels, the space of options reduces to just the ranges of the few variables within the narrow funnel. In such a reduced space, variable assignments outside the funnel are highly correlated to assignments within the funnel. Machine learning in such domains is very simple: an adequate theory need only comment on assignments to the variables that are highly correlated to funnel assignments.
This paper uses the TAR2 system to check the merits of assuming narrow funnels for the purposes of machine learning. TAR2's distinguishing feature is that it performs very well, yet it seems overly simplistic. The algorithm outputs only two rules: the best smallest controller and the best smallest monitor. If domains contain complex relationships, then these two small associations will be useless. The algorithm's runtimes are exponential on the size of the treatments. Hence, the algorithm makes the following ``small treatment assumption''; i.e. {\em adequate treatments can be built from small treatments}. If the small treatment assumption fails, then TAR2's exponential runtimes will make it impractically slow. Also, the algorithm relies on a {\em confidence1} measure that prunes the space of possible associations. The confidence1 measure we describe below has no special merit: it was merely the first one we could think of. Further, our initial implementation worked without algorithmic or memory management optimizations. Our only explanation for the surprising success of this simplistic implementation is that the small treatment assumption holds for the domains we studied.
The rest of this article discusses TAR2. After an introductory example and a discussion of related work, the TAR2 algorithm is presented. This is followed by examples and evaluations and an analysis of the general applicability of our approach.
\section{Related Work}
If domains lack complex relationships, then it is possible that adequate theories can be learnt from small subsets of the available attributes. Various researchers have reported that this is indeed the case. For example, Holte wrote a machine learner that was deliberately restricted to learning theories using a single attribute.
Surprisingly, Holte found that learners that use many attributes, such as C4.5, perform only moderately better than this 1R algorithm~\cite{holte93}. Nevertheless, TAR2 does not use the 1R technique since our results show that the best treatments may require more than one attribute.
In other work, Kohavi and John {\em wrapped} their learners in a pre-processor that used a heuristic search to grow attribute subsets from size 1. At each step in the growth, a learner was called to find the accuracy of the theory learned from the current subset. Subset growth was stopped when the addition of new attributes did not improve the accuracy. As shown in \fig{reduce}, spectacular reductions in the number of attributes can be achieved, with only a minimal loss of accuracy~\cite{kohavi97}.
\begin{figure}
\begin{center}
{\scriptsize
\begin{tabular}{r@{~}ccccccccccc}\\
&\multicolumn{11}{c}{number of attributes}\\\cline{2-12}
before:& 10&13&15&180&22&8&25&36&6&6&6\\
after:& 2&2&2&11&2&1&3&12&1&1&2\\
reduction:& 80\%& 84\%& 87\%&94\%&90\%& 87\%& 88\%&67\%& 83\%&83\%&67\%\\\hline
$\Delta$accuracy: & 0\%&6\%&5\%&4\%&2\%&1\%&0.5\%&0\%&-25\% &6\% &7\%
\end{tabular}}
\end{center}
\caption{Feature subset selection using wrappers, hill-climbing, and ID3 (i.e. C4.5 with pruning disabled). The $\Delta$accuracy figure is the difference in the accuracies of the theories found by ID3 using the $before$ and $after$ attributes. From \protect\cite{kohavi97}.}\label{fig:reduce}
\end{figure}
Nevertheless, TAR2 does not use this technique since relevant feature selection with wrappers can be prohibitively slow: {\em each step} of the heuristic search requires a call to the learning algorithm.
If TAR2 isn't 1R or wrappers, what is it? The rest of this section expands on the following definition. TAR2 is a {\em weighted-class minimal contrast-set} association rule learner that uses {\em confidence measures} but not {\em support-based pruning}. TAR2 learns treatments and the general form of a treatment is:
{\footnotesize
\begin{eqnarray}\nonumber R_1&if& Attr_1=range_1 \wedge Attr_2=range_2 \wedge ...\\\nonumber &then& good=more \wedge bad=less \\\nonumber R_2&if& Attr_1=range_1 \wedge Attr_2=range_2 \wedge ...\\\nonumber &then& good=less \wedge bad=more \end{eqnarray}}
\noindent where $R_1$ is the controller rule; $R_2$ is the monitor rule; {\em good} and {\em bad} are sets of classes that the agent likes and dislikes respectively; and {\em more} and {\em less} are the frequencies of these classes, compared against the current situation, which we call the {\em baseline}. The nature of these output rules distinguishes TAR2 from many other learning strategies.
{\bf Association rule learning:} Classifiers like C4.5 and CART learn rules with a single attribute pair on the right-hand side; e.g. {\em class=goodHouse}. Association rule learners like APRIORI~\cite{agrawal94} and TAR2 generate rules containing multiple attribute pairs on both the left-hand-side and the right-hand-side of the rules. That is, classifiers have a small number of pre-defined targets (the classes) while, for association rule learners, the target is less constrained. General association rule learners like APRIORI input a set $D$ of transactions of items $I$ and return associations between items of the form $LHS \Rightarrow RHS$ where $LHS \subset I$ and $RHS \subset I$ and $LHS \cap RHS = \emptyset$. A common restriction with classifiers is that they assume the entire example set can fit into RAM.
Learners like APRIORI are designed for data sets that need not reside in main memory. For example, Agrawal and Srikant report experiments with association rule learning using very large data sets with 10,000,000 examples and a size of 843MB~\cite{agrawal94}. However, just like Webb~\cite{webb00}, TAR2 makes the ``memory-is-cheap assumption''; i.e. TAR2 loads all its examples into RAM.
Specialized association rule learners like CBA~\cite{liu98} and TAR2 impose restrictions on the right-hand-side. For example, TAR2's right-hand-sides show a prediction of the {\em change} in the class distribution if the constraint on the left-hand-side were applied. The CBA learner finds {\em class association rules}; i.e. association rules where the conclusion is restricted to one class attribute. That is, CBA acts like a classifier, but can process larger datasets than (e.g.) C4.5. TAR2 restricts the right-hand-side attributes to just those containing criteria assessment.
{\bf Weighted-learning:} Standard classifier algorithms such as C4.5~\cite{quinlan92} or CART~\cite{breiman84} have no concept of class {\em weighting}. That is, these systems have no notion of a {\em good} or {\em bad} class. Such learners therefore can't filter their learnt theories to emphasize the location of the {\em good} classes or {\em bad} classes. Association rule learners such as MINWAL~\cite{cai98}, TARZAN~\cite{me00e} and TAR2 explore {\em weighted learning} in which some items are given a higher priority weighting than others. Such weights can focus the learning onto issues that are of particular interest to some audience. For example, TARZAN~\cite{me00e} swung through the ensemble of decision trees generated by C4.5~\cite{quinlan92} during 10-way cross-validation. TARZAN returned the smallest treatments, occurring in most of the ensemble, that {\em increased} the percentage of branches leading to some preferred highly weighted classes and {\em decreased} the percentage of branches leading to lower weighted classes. TAR2 was an experiment in applying TARZAN's tree pruning strategies directly to the C4.5 example sets. The resulting system is simpler, faster to execute, and does not require calling a learner such as C4.5 as a sub-routine.
{\bf Contrast sets:} Instead of finding rules that describe the current situation, association rule learners like STUCCO~\cite{bay99} find rules that differ meaningfully in their distribution across groups. For example, in STUCCO, an analyst could ask ``what are the differences between people with Ph.D. and bachelor degrees?''. TAR2's variant on the STUCCO strategy is to combine contrast sets with weighted classes and minimality. That is, TAR2 treatments can be viewed as the smallest possible contrast sets that distinguish situations with numerous highly-weighted classes from situations that contain more lowly-weighted classes.
{\bf Support-based pruning:} In the terminology of APRIORI, an association rule has {\em support} $s$ if $s\%$ of the transactions in $D$ contain $X \wedge Y$; i.e. $s=\frac{|X \wedge Y| }{|D|}$ (where $|X \wedge Y|$ denotes the number of examples containing both $X$ and $Y$). The {\em confidence} $c$ of an association rule is the percentage of transactions containing $X$ that also contain $Y$; i.e. $c=\frac{|X \wedge Y|}{|X|}$. Many association rule learners use {\em support-based pruning}; i.e. when searching for rules with high confidence, sets of items $I_i,\ldots,I_k$ are examined only if all their subsets are above some minimum support value.
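For concreteness, the following sketch (our own illustration, not code from any of the learners above; the tiny transaction set is hypothetical) computes the support and confidence of a single candidate rule over an in-memory example set:
{\footnotesize\begin{verbatim}
# Sketch: support and confidence of one association rule LHS => RHS.
# Each example (transaction) is modelled as a set of items.
def support_and_confidence(examples, lhs, rhs):
    n = len(examples)
    both = sum(1 for ex in examples if lhs <= ex and rhs <= ex)
    lhs_only = sum(1 for ex in examples if lhs <= ex)
    support = both / n                                  # |X and Y| / |D|
    confidence = both / lhs_only if lhs_only else 0.0   # |X and Y| / |X|
    return support, confidence

D = [{"milk", "bread", "eggs"}, {"milk", "bread"}, {"bread"}, {"milk", "eggs"}]
print(support_and_confidence(D, {"milk"}, {"bread"}))   # (0.5, 0.66...)
\end{verbatim}}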
Support-based pruning is impossible in weighted association rule learning since, with weighted items, it is not always true that subsets of {\em interesting} items (i.e. where the weights are high) are also interesting~\cite{cai98}. Another reason to reject support-based pruning is that it can force the learner to miss features that apply to a small, but interesting, subset of the examples~\cite{wang00}.
{\bf Confidence-based pruning:} Without support-based pruning, association rule learners rely on confidence-based pruning to reject all rules that fall below a minimal threshold of adequate confidence. TAR2 uses {\em confidence1} pruning.
\section{Confidence1 Pruning}
TAR2 targets the attribute ranges that ``nudge'' a system away from undesired behavior and towards desired behavior. TAR2's score for each range is the {\em confidence1} measure. This value is high if a range occurs frequently in desired situations and infrequently in undesired situations. That is, if we were to impose this range as a constraint, then it would tend to ``nudge'' the system into better behavior.
To find confidence1, we assume that we can access $\$class$; i.e. some numeric value assigned to $class$. The class with the highest value is the $best$ class. The $lesser$ classes are the set of all classes, less the $best$ class. Let $O[C]_{A.R}$ be the number of occurrences of some attribute range $A.R$ in some class $C$; i.e.
\[O[C]_{A.R} = |A.R \wedge class=C \wedge D|\]
To generate confidence1, we compare the relative frequencies of an attribute range in different classes. This comparison is weighted by the difference in the scores of the classes, and normalized by the total frequency count of the attribute range; i.e.
\[ \frac{\sum_{C\in lesser}\left((\$best-\$C)*(O[best]_{A.R}-O[C]_{A.R})\right)}{|A.R \wedge D|} \]
\begin{figure}[!t]
{\small {\em
\begin{center}
\begin{tabular}{rccccc}
\multicolumn{4}{c}{Items} & & Criteria \\\cline{1-4}\cline{6-6}
outlook & temp($^o$F) & humidity & windy? & & class \\\cline{1-4}\cline{6-6}
sunny& 85& 86& false& & none\\
sunny& 80& 90& true&& none\\
sunny& 72& 95& false&& none\\
rain& 65& 70& true& & none\\
rain& 71& 96& true& & none\\\cline{1-4}\cline{6-6}
rain& 70& 96& false&& some\\
rain& 68& 80& false& & some\\
rain& 75& 80& false& & some\\\cline{1-4}\cline{6-6}
sunny& 69& 70& false& &lots \\
sunny& 75& 70& true& &lots\\
overcast& 83& 88& false& & lots\\
overcast& 64& 65& true& & lots\\
overcast& 72& 90& true& & lots\\
overcast& 81& 75& false& &lots\\\cline{1-4}\cline{6-6}
\end{tabular}\end{center}}}
\caption{A log of some golf-playing behavior.}
\label{fig:golfdata}
\end{figure}
For example, from the golf playing example of \fig{golfdata}, let us assume that the classes have been scored as follows: ``lots''=8, ``some''=4, ``none''=2; i.e. ``lots'' is the $best$ class. The range {\em outlook=overcast} appears four, zero, and zero times when playing ``lots'', ``some'', and ``none'' golf (respectively). The confidence1 of {\em outlook=overcast} is therefore:
\[\frac{((8-2)*(4-0))+((8-4)*(4-0))}{4+0+0}=10 \]
\fig{dels} shows the range of confidence1 values seen in \fig{golfdata}. The confidence1 ranges shown in black are outstandingly high; i.e. these are the values that may generate the best control treatments. TAR2 forms its treatments by exploring subsets of the ranges with outstandingly high confidence1 values.
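To see the measure in code, the following sketch (our illustration rather than the TAR2 source) recomputes the above confidence1 value for {\em outlook=overcast} using the class scores assumed in the text:
{\footnotesize\begin{verbatim}
# Sketch: the confidence1 measure for one attribute range.
# scores: the (hypothetical) class weights used in the text.
scores = {"lots": 8, "some": 4, "none": 2}

def confidence1(occurrences, scores):
    """occurrences: class -> number of times the range occurs in that class."""
    best = max(scores, key=scores.get)
    total = sum(occurrences.values())
    if total == 0:
        return 0.0
    num = sum((scores[best] - scores[c]) * (occurrences[best] - occurrences[c])
              for c in scores if c != best)
    return num / total

# outlook=overcast occurs 4, 0 and 0 times under "lots", "some" and "none":
print(confidence1({"lots": 4, "some": 0, "none": 0}, scores))   # 10.0
\end{verbatim}}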
\begin{figure}
\begin{center}
{\footnotesize
\begin{tabular}{c}
\begin{barenv}
\setstyle{\sffamily} \setwidth{8} \setnumberpos{none}
\setstretch{8}\sethspace{0.5}
\setyaxis{0}{3}{1}
\setxaxis{-5}{10}{1}
\bar{0}{8} % -5
\bar{1}{1} %-4
\bar{0}{8} % -3
\bar{0}{8} % -2
\bar{1}{1} % -1
\bar{3}{1} % 0
\bar{3}{1} % 1
\bar{2}{1} % 2
\bar{0}{8} %3
\bar{0}{8} %4
\bar{1}{8} %5
\bar{1}{8} %6
\bar{0}{8} %7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{1}{8} %10
\end{barenv}
\end{tabular}
}
\end{center}
\caption{Frequency of confidence1 values generated from \fig{golfdata}. Assumes that numeric ranges have been divided into 3 \BANDS. Outstandingly high confidence1 values are shown in black. The y-axis is the number of ranges that have a particular confidence1 value.}
\label{fig:dels}
\end{figure}
\begin{figure}
\begin{center}\begin{minipage}{1.0\linewidth}{\scriptsize
\begin{tabular}{l@{~}l@{~}p{2.5in}}
\INPUT:& \DEE &The examples.\\
& \ITEMS & Attributes seen in the examples.\\
& \BEST &The best combination of criteria.\\
& \EN & Desired size of LHS.\\
& \PROMISING & Threshold for a useful attribute range.\\
& \SKEW & Threshold for acceptable number of \BEST\ entries in \TREATED.\\
& \BANDS &Number of divisions within continuous ranges.\\~\\
\OUTPUT: &\LHS& A conjunction of attribute ranges\\
&\RHS& A change in the class distributions\\
\end{tabular}
\begin{alltt}{\scriptsize
01. \DEEONE \IS discretize(\DEE, \BANDS)
02. \TEMP \IS \BASELINE \IS frequency(\DEEONE)
03. \FOR \ATTRIBUTE \IN \ITEMS \DO
04.   \FOR \RR \IN \ATTRIBUTE.\RANGES \DO
05.     \IF confidence1(\ATTRIBUTE.\RR) \GE \PROMISING
06.     \THEN \CANDIDATES \IS \CANDIDATES + \ATTRIBUTE.\RR\DONE\DONE
07. \FOR \CC \SOME \CANDIDATES \WHERE |\CC| = \EN \DO
08.   \DEETWO \IS \CC \PLUS \DEEONE
09.   \DISTRIBUTION \IS frequency(\DEETWO)
10.   \IF \DISTRIBUTION>\TEMP \AND |\BEST\PLUS\DEEONE|/|\BEST\PLUS\DEETWO|>\SKEW
11.   \THEN \DO\LHS \IS \CC
12.        \RHS \IS compare(\BASELINE,\DISTRIBUTION)
13.        \TEMP \IS \DISTRIBUTION\DONE\DONE
14. \IF (\LHS \NOE \AND \RHS \NOE) \THEN \RETURN (\LHS, \RHS)
15. \ELSE \RETURN "no treatment"}
\end{alltt}}\end{minipage}\end{center}
\caption{The TAR2 algorithm.}\label{fig:tar2}
\end{figure}
\section{Inside TAR2}
TAR2 generates controller and monitor treatments. Monitors are generated in the same manner as controllers. However, before the monitor is generated, the scoring function for the criteria is reversed so that TAR2 seeks attribute ranges that {\em nudge} a system into worse behavior. The rest of this section discusses how to generate controllers.
The TAR2 algorithm is shown in \fig{tar2}. The {\tt frequency} function counts the frequency of examples falling into the different classes. Using this function, a \BASELINE\ class distribution is collected from \DEE\ (this is used later to contrast different treatments) and copied to a \TEMP\ variable (this is used to store the best distribution seen so far). The {\tt compare} function compares two frequencies to generate reports like (e.g.) 43\% less ``lots'' and 5\% less ``some'' and 167\% more ``none''.
The {\tt discretize} function divides the numeric ranges seen in the examples into \BANDS\ groups. TAR2 was originally designed using a very simple discretization policy; i.e. TAR2 sorts the known values and divides them into \BANDS\ groups of (roughly) the same cardinality. It was anticipated that this policy would be too simplistic and would have to be improved. However, our empirical results (see below) were so encouraging that we were never motivated to do so.
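As an illustration of that discretization policy (a sketch only; TAR2's own implementation may break ties between equal values differently), the following splits the humidity column of \fig{golfdata} into bands of roughly equal cardinality:
{\footnotesize\begin{verbatim}
# Sketch: equal-frequency discretization into a fixed number of bands.
def discretize(values, bands):
    """Sort the known values and cut them into `bands` groups of (roughly)
    the same cardinality; returns the band index assigned to each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    size = len(values) / bands
    for rank, i in enumerate(order):
        result[i] = min(int(rank / size), bands - 1)
    return result

humidity = [86, 90, 95, 70, 96, 96, 80, 80, 70, 70, 88, 65, 90, 75]
print(discretize(humidity, 3))   # three bands of sizes 5, 5 and 4
\end{verbatim}}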
%XXX define ">" in the following
%XXX missing values
% XX best classs
% skew prune
Once a treatment is found, it is applied to the example set to create a \TREATED\ example set; i.e. all the examples that don't contradict the proposed treatment (see line 8). A ``good'' treatment includes most of the examples that have the \BEST\ criteria (e.g. in the golf example of \fig{golfdata}, \BEST= playing ``lots'' of golf). The \SKEW\ parameter is used at line 10 to reject ``bad'' treatments; i.e. those that don't contain enough of the \BEST\ criteria. For example, at \SKEW=5, at least 20\% of the \BEST\ criteria must appear in the treatment.
TAR2 explores subsets of the \RANGES\ found in a set of examples \DEE\ (see line 7). Subset exploration is constrained to just the \RANGES\ with an outstandingly large confidence1 score (see line 5). Even with this restriction, there are still an exponential number of such subsets. Hence, to be practical, TAR2 must seek the {\em minimal} possible number of control actions and monitors. Accordingly, the user of TAR2 constrains its learning to rule conditions of size \EN, where $N$ is small (see line 7). Often, effective treatments can be found using $N\le 4$, which suggests that narrow funnels existed in the datasets used for our case studies.
%XXX key point. the nudge distributions say that most
% attributes don't matter
%XXX refer to the monitor1 and monitor2 rules. note that
% bad <> not good
\begin{figure}
\begin{center}
\begin{minipage}{0.8\linewidth}
{\footnotesize
\begin{verbatim}
controllerG
  if   outlook=overcast
  then (230% more "lots" and no "some" and no "none").

monitorG
  if   90 <= humidity < 97
  then (43% less "lots" and 5% less "some" and 167% more "none").
\end{verbatim}}
\end{minipage}
\end{center}
\caption{Control and monitor rules found from \protect\fig{golfdata}. To control {\em outlook}, unscrupulous owners of golf courses could (e.g.) bribe radio announcers to lie about the weather report.}\label{fig:cr}
\end{figure}
\begin{figure}
{\small\begin{center}
\begin{tabular}{cp{0.6in}cp{0.7in}cp{1.1in}}
&baseline: &&controllerG: && monitorG:\\\hline
&(from \fig{golfdata}) && $outlook$= $overcast$ && $humid =[90..97)$\\
&~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{36}{1} \bar{21}{4} \bar{43}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{4} \bar{100}{8} \end{barenv} &&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{60}{1} \bar{20}{4} \bar{20}{8} \end{barenv}
\end{tabular}
\end{center} }
\caption{Percentage of classes seen in different situations. The left-hand-side histogram is a report of the class frequencies seen in \protect\fig{golfdata}. The middle and right-hand-side histograms were generated by applying the treatments of \protect\fig{cr}. KEY:~\protect\legend1{none}; \protect\legend4{some}; \protect\legend8{lots}.}
\label{fig:golft}
\end{figure}
~\\
\section{Examples and Experiments}
\subsection{Examples}
The output of TAR2 describes constraints that, if applied to the dataset, may reject certain examples. For example, the {\tt controllerG} treatment of \fig{cr} contains the constraint $outlook=overcast$.
If we reject all items in the golf dataset that contradict this constraint, then our golfers now play ``lots'', ``some'', and ``none'' golf in 100\%, 0\%, and 0\% (respectively) of the constrained dataset (as shown in the middle histogram of \fig{golft}).
The monitor rule {\tt monitorG} of \fig{cr} was generated in a similar manner, but with the scoring system reversed; i.e. ``lots''=2, ``some''=4, ``none''=8. In this case, ``none'' is the ``$best$'' class and TAR2 will find a treatment that selects for playing less golf; i.e. $90 \le humidity < 97$. After applying this constraint, the class distribution changes to the right-hand-side histogram of \fig{golft}.
%\begin{center}
%{\footnotesize
%\begin{tabular}{cccccccccc}
% &\multicolumn{4}{c}{$I$ = Items}&&\multicolumn{4}{c}{$C$ = Criteria}\\
%$D$ & $I_1$ & $I_2$ & ... & $I_i$ & & $C_1$ & $C_2$ & ... &
%$C_c$\\\cline{2-5}\cline {7-10}
%1 & $V_x$ & ? & ... & $V_w$ & & good & med & .. & ? \\
%2 & $V_y$ & $V_z$ & ... & ? & & bad & ? & ... & good \\
%\end{tabular}
%}
%\end{center}
\subsection{Experiments}\label{sec:ex}
This section discusses experiments with TAR2 where the learner was assessed via two methods:
\begin{description}
\item[Xvals:] Standard N-way cross-validation studies.
\item[Simulations:] Simulations showing how well TAR2's treatments can control or monitor some model.
\end{description}
\subsubsection{Xval Studies}
Figure~\ref{fig:iris} to Figure~\ref{fig:carr} show TAR2 executing over some samples from the UC Irvine repository ({\footnotesize \url{http://www.ics.uci.edu/~mlearn/}}). These figures display the effects of the treatment closest to the average improvement seen in a 10-way cross-validation study. Each figure shows the class distributions as percentages; the domain classes are shown in a legend. In the legend, the heuristic worth assigned to each class is, top-to-bottom, worst-to-best.
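Each of these histograms reports a class distribution before and after a learnt treatment is applied to the examples. A minimal sketch of that application and comparison step (our own illustration, restricted to discrete attributes; TAR2's {\tt compare} reports relative changes, whereas this sketch reports simple percentage-point differences) is:
{\footnotesize\begin{verbatim}
# Sketch: apply a treatment (a conjunction of attribute ranges) to an
# example set and contrast the class distributions before and after.
def frequency(examples):
    """Percentage of each class in the example set."""
    counts = {}
    for ex in examples:
        counts[ex["class"]] = counts.get(ex["class"], 0) + 1
    return {c: 100.0 * k / len(examples) for c, k in counts.items()}

def apply_treatment(examples, treatment):
    """Keep only the examples that do not contradict the treatment."""
    return [ex for ex in examples
            if all(ex[attr] == value for attr, value in treatment.items())]

def compare(baseline, treated):
    """Change (in percentage points) of each class after treatment."""
    return {c: treated.get(c, 0.0) - baseline.get(c, 0.0) for c in baseline}

golf = [  # a small hypothetical slice of the golf log
    {"outlook": "sunny",    "class": "none"},
    {"outlook": "rain",     "class": "some"},
    {"outlook": "overcast", "class": "lots"},
    {"outlook": "overcast", "class": "lots"},
]
baseline = frequency(golf)
treated = frequency(apply_treatment(golf, {"outlook": "overcast"}))
print(compare(baseline, treated))   # lots rises; none and some vanish
\end{verbatim}}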
%XXX ying: check algorithm %XXX ying: that sentence \begin{figure} {\small\begin{center} \begin{tabular}{ cp{0.8in} cp{1.0in} p{1.5in} } &baseline &&controller &Legend\\\hline & &&$petal length=$& \\ & &&$[3.7..4.8)$& \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{33}{1} \bar{33}{4} \bar{33}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{4} \bar{100}{8} \end{barenv} & \protect\legend1{setosa}\newline \protect\legend4{virginica}\newline \protect\legend8{v.color}\newline \end{tabular} \end{center} } \caption{Iris} \label{fig:iris} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&controller &Legend\\\hline & &&$displacement$= $[68..101)$ & \\ & &&$horsepower$= $[46..78)$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{27}{1} \bar{20}{2} \bar{27}{4} \bar{26}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{30}{4} \bar{70}{8} \end{barenv} & \protect\legend1{low}\newline \protect\legend2{medlow}\newline \protect\legend4{medhigh}\newline \protect\legend8{high}\newline \end{tabular} \end{center} } \caption{Autompg} \label{fig:autompg} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} & baseline &&controller&Legend\\\hline & &&$hieght$= $[34..86)$& \\ & &&$mean\_tr$= $[3.9..9.5)$& \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{90}{1} \bar{6}{2} \bar{1}{4} \bar{2}{7} \bar{2}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{0}{4} \bar{0}{7} \bar{100}{8} \end{barenv} & \protect\legend1{text}\newline \protect\legend2{horz.line}\newline \protect\legend4{graphic}\newline \protect\legend7{vert.line}\newline \protect\legend8{picture}\newline \end{tabular} \end{center} } \caption{Page-block} \label{fig:pageblock} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&controller &Legend\\\hline & &&$buying=low$ & \\ & &&$safety=high$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{70}{1} \bar{22}{2} \bar{4}{4} \bar{4}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{31}{1} \bar{19}{2} \bar{13}{4} \bar{38}{8} \end{barenv} & \protect\legend1{unacc}\newline \protect\legend2{acc}\newline \protect\legend4{good}\newline \protect\legend8{vgood}\newline \end{tabular} \end{center} } \caption{Car} \label{fig:car} \end{figure} \begin{figure} {\small\begin{center} \begin{tabular}{cp{0.8in}cp{1.0in}p{1.5in}} &baseline &&monitor &Legend\\\hline & &&$person=2$ & \\ & &&$safety=low$ & \\ &~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{70}{1} \bar{22}{2} \bar{4}{4} \bar{4}{8} \end{barenv}&& ~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} 
\sethspace{0.5} \setnumberpos{down}
\bar{100}{1} \bar{0}{2} \bar{0}{4} \bar{0}{8}
\end{barenv} &
\protect\legend1{unacc}\newline
\protect\legend2{acc}\newline
\protect\legend4{good}\newline
\protect\legend8{vgood}\newline
\end{tabular}
\end{center} }
\caption{Car: reversing the class scoring.}
\label{fig:carr}
\end{figure}
In \fig{iris}, TAR2 was told that the worth of each type of flower was (in increasing order) {\em setosa}, {\em virginica}, then {\em v.color}. TAR2 then learnt that $3.7\le petal length < 4.8$ would select for the flower with the highest worth (i.e. {\em v.color}). Similarly, in \fig{autompg}, TAR2 learnt a selector that favored high quality cars. By restricting engine size to $68\le displacement<101$ and $46\le horsePower<78$, the ratio of high quality cars increased from 26\% to 70\%. Further, the low and medium-low cars have disappeared. In \fig{pageblock}, TAR2 learnt a specialized feature extractor for finding pictures mixed in with text, horizontal lines, vertical lines, and graphics. According to TAR2, a height between 34 and 86, and a mean number of white-black transitions between 3.9 and 9.5, will locate pictures, and nothing else.
In the car domain of \fig{car}, most of the classes are non-best. The average best controller seen in the 10-way cross-validation for the car domain was {\em buying=low} and {\em safety=high}. While this controller increases the frequency of very good cars from 4\% to 38\%, this controller still leaves us with 31\% unacceptable cars. While this controller is weak, the monitor obtained by reversing the class scoring is very strong. \fig{carr} shows that monitor: if we select two-person cars with low safety, then 100\% of the cars are unacceptable. That is, when the best class occurs rarely in the dataset, TAR2 may be better at finding methods to {\em degrade} a system, rather than {\em improve} it.
\subsubsection{Simulation Studies}
Another way to assess TAR2 is to test how well it can control some model. To perform such an assessment, we (i)~generated data sets from some model; (ii)~applied TAR2 to find treatments from those data sets; (iii)~imposed those treatments as constraints on the model; (iv)~ran the model a second time; and (v)~compared the outputs of the second run to the predictions made by TAR2.
In our first two simulation studies, a {\em baseline} class distribution was used by TAR2 to generate a best controller and a prediction of how this best controller would change the class distribution. We call the predicted distribution the {\em treated} distribution. The {\em actual} distribution was the class distribution seen after the best controller was imposed on the model and the model executed again. In \fig{run3} and \fig{run2}, the treated distribution matches the actual distribution almost exactly; i.e. TAR2 accurately predicted the effects of the controller treatment.
\fig{run3} was generated from a model of software project risk. This risk model was implemented as part of the COCOMO project. The goal of the COCOMO project is to build an open-source software cost estimation model~\cite{cocomoII}. Internally, the model contains a matrix of parameters that should be tuned to a particular software organization. Using COCOMO-II, the Madachy risk model can assess the risk of a software cost over-run~\cite{madachy97}. For machine learning purposes, the goal of using the Madachy model is to find a change to a description of a software project that reduces the likelihood of that project being high risk~\cite{me00e,me01f}.
In the experiment shown in \fig{run3}, the model was executed 30,000 times using randomly selected inputs. When the treatments learnt by TAR2 were imposed on the model inputs, and the model was executed again, all the high risk projects were removed, the percentage of medium risk projects was significantly reduced, and the percentage of low risk projects was tripled.
\begin{figure}
{\footnotesize\begin{center}
\begin{tabular}{cp{0.8in}cp{0.8in}cp{0.8in}}
&baseline &&treated && actual\\\hline
&~~~~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.4} \setnumberpos{down} \bar{0}{1} \bar{20}{2} \bar{60}{4} \bar{20}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{40}{4} \bar{60}{8} \end{barenv}&&
~~\begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.4} \setyaxis{0}{100}{25} \sethspace{0.5} \setnumberpos{down} \bar{0}{1} \bar{0}{2} \bar{40}{4} \bar{60}{8} \end{barenv}
\end{tabular}
\end{center} }
\caption{COCOMO key:~\protect\legend1{very high risk}; \protect\legend2{high risk}; \protect\legend4{medium risk}; \protect\legend8{low risk}.}
\label{fig:run3}
\end{figure}
\begin{figure}[!b]
\begin{center}
{\small
\begin{tabular}{rl}
baseline: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{15}{8} % 1
\bar{45}{8} % 2
\bar{20}{8} % 3
\bar{5}{8} % 4
\bar{5}{8} % 5
\bar{3}{8} % 6
\bar{2}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}\\~\\~\\ %10
treated: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{0}{8} % 1
\bar{0}{8} % 2
\bar{0}{8} % 3
\bar{5}{8} % 4
\bar{32}{8} % 5
\bar{45}{8} % 6
\bar{15}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}\\~\\~\\
actual: & \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.5}\sethspace{0.5} \setyaxis{0}{45}{20} \setxaxis{0}{9}{1}
\bar{0}{8} % 0
\bar{0}{8} % 1
\bar{0}{8} % 2
\bar{0}{8} % 3
\bar{5}{8} % 4
\bar{32}{8} % 5
\bar{45}{8} % 6
\bar{15}{8} % 7
\bar{0}{8} %8
\bar{0}{8} %9
\bar{0}{8} \end{barenv}
\end{tabular}}
\end{center}
\caption[Round1]{Circuit. The x-axis denotes the number of bulbs glowing in the circuit.}\label{fig:run2}
\end{figure}
\fig{run2} shows TAR2 controlling a qualitative description of an electrical circuit. A qualitative description of a circuit of 47 wires connecting 9 light bulbs and 16 other components was coded in Prolog. The model was expressed as a set of constraints; e.g. the voltage drop across a chain of components in series is the {\tt sum} of the voltage drops across each component. The goal of the circuit was to illuminate a space using the 9 light bulbs. The circuit is qualitative and qualitative mathematics is nondeterministic; e.g. the sum of a negative and a positive value is unknown. The problem with the circuit was out-of-control nondeterminism. On backtracking, this circuit generated 35,228 different solutions to the constraints. In many of these solutions, the circuit was unacceptably dark: only two bulbs glowing, on average (see the top histogram of \fig{run2}). The goal of the machine learning was hence to find a minimal set of changes to the circuit to increase the illumination~\cite{me01g}.
\fig{run2} shows the distribution of the frequency with which bulbs glowed in a qualitative circuit description. The behavior of qualitative circuits is notoriously hard to predict~\cite{clancy97} but TAR2 found two actions on the circuit that trebled the average number of bulbs that glowed (see the {\em treated} and {\em actual} plots of \fig{run2}).
\begin{figure}
\begin{center}
\includegraphics[angle=270,width=3in]{jpl.eps}
\end{center}
\caption{Results from the satellite domain. The dots below the line show the initial output of the model: note the very large spread in the costs and benefits. The dots above the line show the final outputs of the model after 5 iterations of TAR2 learning.}\label{fig:jpl}
\end{figure}
\fig{jpl} shows a third simulation study with TAR2. Analysts at the NASA Jet Propulsion Laboratory debate satellite design by building a semantic network connecting design decisions to satellite requirements~\cite{fea00}. Each edge is annotated with the numeric cost and benefits of taking some action. Some of these nodes represent base decisions within the project (e.g. selection of a particular type of power supply). Each set of decisions has an associated cost. The net can be executed by selecting actions and seeing what benefits result. One such network included 99 possible actions; i.e. $2^{99}\approx 10^{30}$ combinations of actions.
Note the black line, top-left, of \fig{jpl}. All the dots below this line were generated via 10,000 random selections of the decisions, and the collection of their associated costs and benefits. All the dots above this line represent high benefit, low cost projects found by TAR2~\cite{fea02a}. In this application, TAR2 was used as a knowledge acquisition tool. After each run of TAR2, the proposed best controller was debated with the analysts. Each run, and its associated debate, resulted in a new set of constraints for the semantic net. The new constraints were then imposed on the model before the next run. After five runs, TAR2 found 30 decisions (out of 99) that crucially affected the cost/benefit of the satellite. Note that this means TAR2 also found 99-30=69 decisions that could be safely ignored.
For comparison purposes, a genetic algorithm (GA) was also applied to the \fig{jpl} domain~\cite{fea02a}. The GA also found decisions that generated high benefit, low cost projects. However, each such GA solution commented on every possible decision and there was no apparent way to ascertain which of these were the most critical decisions. The TAR2 solution was deemed superior to the GA solution by the domain experts, since the TAR2 solution required just 30 actions rather than the 99 demanded by the GA.
Note that the \fig{jpl} case study is not a counterexample to our thesis that most domains have narrow funnels. That study adopted the incremental approach for reasons of convenience. JPL's semantic net simulator was too slow to generate enough examples in one run. Hence, an incremental generate-and-constrain approach was taken.
%XXX scale up unknown .
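The simulation-based assessment used in these studies can be summarized by the following sketch, where {\tt run\_model} and {\tt learn\_treatment} are hypothetical placeholders for the simulators above and for a call to TAR2:
{\footnotesize\begin{verbatim}
from collections import Counter

# Sketch of the five-step assessment described in the Experiments section.
# run_model() and learn_treatment() are placeholders, not real APIs.
def assess_with_model(run_model, learn_treatment, runs=30000):
    # (i) generate examples by running the model on random inputs
    baseline = [run_model(constraints=None) for _ in range(runs)]
    # (ii) ask the learner for a treatment plus its predicted distribution
    treatment, predicted = learn_treatment(baseline)
    # (iii, iv) impose the treatment on the inputs and run the model again
    rerun = [run_model(constraints=treatment) for _ in range(runs)]
    # (v) compare the prediction against what the constrained model did
    actual = Counter(example["class"] for example in rerun)
    return predicted, actual
\end{verbatim}}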
%XXX ok if the larger data sets have a small left tail and that is future work
% all the examples discussed in the
%XXX add in the COCOMO and the QR results
\section{Generality}
\begin{figure}[!t]
\begin{center}{\scriptsize
\begin{tabular}{r@{:}lrccccc}
\multicolumn{1}{c}{} & & \multicolumn{4}{c}{Attributes} & treatment&runtime\\\cline{3-6}
source&domain&\# examples&\#continuous&\#discrete&\#classes& size& (secs)\\\hline
UCI&iris & 150 & 4 & 0 & 3 &1 & $<$1\\\hline
UCI&wine & 178 & 13 & 0 & 3 & 2 & $<$1\\\hline
UCI&car & 1,728 & 0 & 6 & 4 & 2 & $<$1\\\hline
UCI&autompg & 398 & 6 & 1 & 4 & 2 & 1\\\hline
UCI&housing & 506 & 13 & 0 & 4 & 2 & 1\\\hline
UCI&page blocks & 5,473 & 10 & 0 & 5 & 2 & 2\\\hline
here&circuit &35,228 & 0 & 18 & 10 & 4 & 4\\\hline
here&cocomo &30,000 & 0 & 23 & 4 & 1 & 2\\\hline
here&satellite &30,000 & 0 & 99 & 9 & 5 &86\\\hline
other &reachness &25,000 & 4 & 9 & 4 & 2 &3\\\cline{3-8}
& &250,000 & 4 & 9 & 4 & 1 &23\\\hline
\end{tabular}}\end{center}
\caption{Runtimes for TAR2 on different domains (on a 333MHz Windows machine with 200MB of RAM). ``UCI'' denotes data sets from the machine learning repository at UC Irvine. ``Here'' denotes data sets taken from this article. The text discusses experiments with 10,000 examples from the satellite domain; this table shows a larger case study of 30,000 examples. ``Other'' denotes a data set taken from~\protect\cite{me01h}. }\label{fig:runtimes}
\end{figure}
This section is an algorithmic assessment of TAR2; i.e. a comment on TAR2's ability to scale to larger domains. \fig{runtimes} reports TAR2 runtimes on data sets of different sizes. \fig{ying} shows three studies where the size of the treatments ($N$, from line 7 in \fig{tar2}) was held constant, and the size of the dataset was increased. \fig{deltachanges} shows one study where the size of the dataset was held constant and the size of the treatments was increased. Note that:
\begin{figure}[!t]
\begin{center}
\includegraphics{ying.eps}
\end{center}
\caption[Round1]{Runtimes for increasing dataset size at fixed treatment sizes. Datasets generated from the COCOMO model.}\label{fig:ying}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics{deltachanges.eps}
\end{center}
\caption[Round1]{Runtimes for increasing treatment size $N$, keeping the data set size constant (3MB). Dataset generated from the COCOMO model.}\label{fig:deltachanges}
\end{figure}
\begin{enumerate}
\item TAR2 can handle small to medium-sized datasets. For example, the algorithm learnt effective treatments in 23 seconds from a dataset containing 250,000 examples: see the {\em reachness} domain in \fig{runtimes}.
\item TAR2 has the potential to scale to large datasets. Assuming constant treatment size, TAR2's runtimes are linear on dataset size: see~\fig{ying}.
\item However, the algorithm is exponential on treatment size: see the marked increase in the runtimes between $N=2$ and $N=3$ in \fig{ying} and the log-linear plot of \fig{deltachanges}.
\end{enumerate}
The exponential impact of increasing treatment size is not necessarily a reason to reject TAR2. Firstly, if very large treatments are required, then an incremental treatment learning approach, such as that used in the satellite case study of \fig{jpl}, may suffice. Secondly, if most domains don't need large treatments, then this exponential impact will not be seen in practice. Elsewhere, we have made an average case mathematical analysis of the ratio of the odds of a domain having narrow funnels to the odds of it having wider funnels.
Under a wide variety of assumptions, the same effect holds: the odds of narrower funnels are millions of times more likely that wider funnels~\cite{me00t}. Such a statistical analysis represents an average case result and may not apply to a particular domain. What would be useful would be some kind of assessment tool that checks if this average case statistical result applies to a particular domain. \begin{figure} \begin{center} {\small \begin{tabular}{c} \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setstretch{0.84}\sethspace{0.5} \setyaxis{0}{64}{15} \setxaxis{1}{8}{1} \bar{5}{1} %-4 \bar{5}{1} % -3 \bar{5}{1} % -2 \bar{5}{1} % -1 \bar{8}{1} % 0 \bar{16}{1} % 1 \bar{32}{1} \bar{64}{1} \end{barenv} \end{tabular} } \end{center} \caption{A hypothetical confidence1 frequency distribution with a large left tail that is inconsistent with narrow funnels. Note: yet to be observed in any example set.} \label{fig:badnews} \end{figure} \begin{figure*} \begin{center} {\tiny \begin{tabular}{p{1.4in}p{1.1in}p{1.1in}p{1.1in}} {\em \fig{nudges}.i: housing}\newline {\tiny \begin{barenv} \setstretch{5.4} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{10}{2} \sethspace{0.2} \setxaxis{-10}{22}{4} \bar{8}{1} \bar{8}{1} \bar{8}{1} \bar{10}{1} \bar{9}{1} \bar{3}{1} \bar{1}{1} \bar{2}{1} \bar{2}{1} \end{barenv}} & {\em \fig{nudges}.ii: page-blocks}\newline {\tiny \begin{barenv} \setstretch{2.7} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \sethspace{0.2}\setyaxis{0}{20}{4} \setxaxis{-29}{32}{8} \bar{20}{1} \bar{7}{1} \bar{0}{1} \bar{1}{1} \bar{0}{1} \bar{0}{1} \bar{1}{1} \bar{1}{1} \end{barenv}}& {\em \fig{nudges}.iii: wine}\newline {\tiny \begin{barenv} \setstretch{2.7} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{20}{5} \sethspace{0.2}\setxaxis{-4}{11}{2} \bar{13}{1} \bar{17}{1} \bar{15}{1} \bar{7}{1} \bar{5}{1} \bar{3}{1} \bar{2}{1} \bar{3}{1} \end{barenv}} & {\em \fig{nudges}.iv: car}\newline {\tiny \begin{barenv} \setstretch{5.4} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{10}{3} \sethspace{0.2} \setxaxis{-13}{-7}{1} \bar{5}{1} \bar{2}{1} \bar{3}{1} \bar{3}{1} \bar{6}{1} \bar{1}{1} \bar{1}{1} \end{barenv}} \\~\\ {\em \fig{nudges}.v: satellite}\newline {\tiny \begin{barenv} \setstretch{0.45}\sethspace{0.5} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{120}{30} \setxaxis{-350}{0}{50} \bar{1}{1} \bar{11}{1} \bar{46}{1} \bar{118}{1} \bar{16}{1} \bar{0}{1} \bar{4}{1} \bar{8}{1} \end{barenv}} & {\em \fig{nudges}.vi: QR circuit}\newline{\tiny \begin{barenv} \setstyle{\sffamily}\sethspace{0.2} \setwidth{8} \setnumberpos{none} \setstretch{0.154} \setyaxis{0}{350}{100} \setxaxis{4}{9}{1} \bar{144}{1} \bar{226}{1} \bar{280}{1} \bar{104}{1} \bar{17}{1} \bar{10}{1} \end{barenv}} & {\em \fig{nudges}.vii: cmm2}\newline {\tiny \begin{barenv} \setstretch{0.25}\sethspace{0.2} \setstyle{\sffamily} \setwidth{8} \setnumberpos{none} \setyaxis{0}{220}{50} \setxaxis{1}{14}{2} \bar{20}{1} \bar{147}{1} \bar{213}{1} \bar{212}{1} \bar{10}{1} \bar{20}{1} \bar{10}{1} \end{barenv}} & {\em \fig{nudges}.viii: cocomo2}\newline {\tiny \begin{barenv} \setstyle{\sffamily} \setwidth{8} \setstretch{0.72} \setnumberpos{none} \sethspace{0.2} \setyaxis{0}{75}{25} \setxaxis{-13}{9}{4} \bar{1}{1} \bar{3}{1} \bar{4}{1} \bar{68}{1} \bar{49}{1} \bar{9}{1} \end{barenv}} \end{tabular}} \end{center} \caption{Confidence1 distributions seen in eight domains. Y-axis is the number of times a particular confidence1 was seen. 
The top row comes from datasets taken from the UC Irvine repository; the bottom row was generated from other domains discussed in this article.}\label{fig:nudges}
\end{figure*}
The confidence1 distribution can be used to test for narrow funnels. Domains that contain such funnels would exhibit the following property: a small number of variables within the funnel exert a disproportionately large influence on the overall behavior of the system. A test for such variables is to check for {\em small right tails} in the confidence1 distributions. \fig{dels} has such a {\em small right tail}; i.e. the bulk of the distribution lies away from the maximum value. Distributions with a {\em large right tail} such as \fig{badnews} are not consistent with narrow funnels.
\fig{nudges} shows the confidence1 distributions seen in eight example sets: four from the UC Irvine repository and some of the other domains described above. Note that in all cases, the distribution has a small right tail; i.e. a small number of variables exert a disproportionately large influence on the overall behavior of the system. In all, we have applied TAR2 to 20 domains: the ones discussed in this paper and others not shown for space reasons. In none of those domains have we observed a large right tail.
\section{Conclusion}
The minimal theories of TAR2 will be inadequate if domains contain complex relationships. Domains with narrow funnels are not complex: the key controllers for the whole space are merely the few variables in the funnel. The TAR2 association rule learner is both a test and an application of funnel theory. TAR2 offers two tests for narrow funnels. Firstly, a confidence1 distribution with a small right tail is consistent with a domain containing narrow funnels. Secondly, if a domain contains narrow funnels, then TAR2 should be able to generate adequate controllers and monitors for that domain. All the domains we have seen to date have these two features.
The open issue is how many other domains lack complex relationships. Based on around 20 case studies with TAR2 (some of which were reported above), Holte's prior work with 1R, and the wrapper studies of Kohavi and John, we have some empirical reasons to believe that many domains are not complex. Also, we have theoretical reasons for believing that narrow funnels are common~\cite{me99q,me00b,me00d,me00t}; i.e. domains with complex relationships are rare and TAR2 will often suffice.
The success of a simple algorithm such as TAR2 suggests that it can be fruitful to {\em first} try lightweight methods {\em before} exploring heavyweight methods. We hence advocate using TAR2 as a preprocessor to other, more elaborate schemes. TAR2 is available for download from \url{http://www.ece.ubc.ca/twiki/bin/view/Softeng/TreatmentLearner}.
%XXX note: runtimes worse than exponential as |lhs| increases
% and less than exponential as sample size increases
% and assuming small rules, then we're in!
%not done: effect of unknowns, effects of increasing #attribtus, #classes,
%#examples. and some control for solution quality. kidna like the
% APRIORI paper
\acknowledgements
This research was conducted at the University of British Columbia and West Virginia University, partially under NASA contract NCC2-0979. The work was partially sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA IV\&V Facility.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the United States Government.
% The endnotes section will be placed here.
\theendnotes
\bibliographystyle{../../tex/kluwer/klunamed}
%! remind me to send you my refs.bib file
%! you'll have to change this line to wherever your store your refs
%! to get the references in, you have run latex once, BibTeX once, then latex twice
%! (to get the reference numbers right). so stupid...
\bibliography{../../refs}
\end{article}
\end{document}