5d4
< \usepackage{amssymb}
53,57c52,54
< Our goal is intelligent adaptive controllers of running
< programs that are monitored by
< temporal logic invariants. We want to find bugs, or ways
< to dodge them. Hence, we want
< data miners to watch
---
> Our goal is intelligent adaptive runtime monitoring of procedural
> programs augmented with temporal logic invariants. We want
> data miners to monitor
59c56
< constraints to input ranges which increase/decrease the number of logic
---
> constraints to input ranges which increase the number of logic
148,149c145
< experiments on 20 UCI data sets~\cite{Blake+Merz:1998}: {\em
< \{A:heart-c,
---
> experiments on 20 UCI data sets~\cite{Blake+Merz:1998}. A:heart-c,
153,160c149,150
< T:housing\}}. Data sets A..J have discrete classes and are scored
< via the {\em accuracy} of the learned theory; i.e.
< \% successful classifications.
< Data sets K..T
< have continuous classes and are scored by the {\em PRED(30)} of the learned
< theory; i.e. what \% of the estimated values are within 30\% of the actual
< value.
< Data sets are sorted according to how many
---
> T:housing. Data sets A..J have discrete classes while data sets K..T
> have continuous classes. Data sets are sorted according to how many
277c267
< number of $era$s of size $E$; i.e. $W=nE$ (default: {\em E=150} instances).
---
> number of $era$s of size $E$; i.e. $W=nE$ (default: {\em E=100} instances).
291c281
< \frac{S_j - (\frac{\sum_i S_x}{\sum_i E}*E)}{
---
> \frac{\frac{S_j}{E} - \frac{\sum_i S_x}{\sum_i E}}{
377c367
< numeric attribute using a single gaussian.
---
> numeric attribute using a single gaussian.
387,391c377,379
< John and Langley comment that their method
< must access all the individual numeric values to build their
< kernel estimator and this is impractical for large data sets.
< Many discretization methods violate the {\em one scan}
< requirement of a data miner: i.e. execute using only
---
> Many kernel estimation and discretization methods violate the {\em one scan}
> requirement of a data miner;
> i.e. learning needs only
406c394,396
<
---
> Similarly, John and Langley's kernel estimation method
> can't build its distribution
> until {\em after} seeing all the data.
410d399
< \section{Handling Numeric Attributes with SPADE}
411a401
> \section{Handling Numeric Attributes with SPADE}
421,435c411,413
< SPADE only scans the input data once and,
< at any time during the processing
< of $X$ instances, SPADE's bins are available.
< Further, if it ever adjusts bins (e.g. when merging bins with very small tallies),
< the information used for that merging comes
< from the bins themselves, and not some second scan of the instances.
< Hence, it can be used for the incremental processing of very
< large data sets.
<
< N\aive Bayes classifiers usually make some gaussian assumption about the underlying
< numeric distributions.
< SPADE makes no such assumptions.
< It is similar to {\em 10-bins} but the MIN and MAX change incrementally.
< The first value $N$ creates one bin and sets \{MIN=N,MAX=N\}.
< If a subsequent new value arrives inside the current
---
> SPADE is like {\em 10-bins} but the MIN and MAX change incrementally. The first
> value $N$ creates one bin and sets \{MIN=N,MAX=N\}. If a subsequent
> new value arrives inside the current
467,489d444
< Preventing the creation of very few bins with big tallies is essential
< for a practical
< incremental discretizer.
< Hence, SPADE checks for merges only occasionally (at the end of each era),
< thus allowing for the generation of multiple bins before they are merged.
<
< SPADE runs as a pre-processor to the N\aive Bayes {\tt update}. Newly arrived
< numerics get placed into bins and it is this bin number that is passed
< as the $value$ to {\tt update} or \fig{class}. Also, when SPADE
< merges bins, this causes a similar merging in frequency table entries
< (the $F$ variable of \fig{class}).
<
< The opposite of merging would be to {\em split} bins with unusually
< large tallies. SPADE has no split operator since we did not know how
< to best divide up a bin {\em without} keeping per-bin kernel
< estimation data (which would be memory-expensive).
< Our early experiments suggested that adding
< $SubBins=5$ new bins between old ranges and newly arrived out-of-range
< values was enough to adequately divide the range. Our subsequent
< experiments (see below) were so encouraging that we are not motivated
< to add a split operator.
<
<
511a467,475
> SPADE only scans the input data once and,
> at any time during the processing
> of $X$ instances, SPADE's bins are available.
> Further, if it ever adjusts bins (e.g. when merging),
> then the information used for that merging comes
> from the bins themselves, and not some second scan of the instances.
> Hence, it can be used for the incremental processing of very
> large data sets.
>
512a477,535
> \begin{figure}[!b]
> {\scriptsize
> \begin{center}
> \includegraphics[width=1.4in]{data/kdd.pdf}
> \includegraphics[width=1.4in]{data/kddbands.pdf}
> \end{center}}
> \caption{SAWTOOTH and the KDD'99 data}\label{fig:kdd99}
> \end{figure}
>
> \begin{figure*}
> \begin{center}
> \begin{minipage}{.6\linewidth}
> \begin{center}
> {\scriptsize
> \begin{tabular}{r@{~}r|@{~}l|l@{}c@{~}l@{~}c@{~}l@{}c@{}lc@{}r|}\cline{3-9}
> \multicolumn{2}{c}{}& \multicolumn{1}{|c|}{incremental} &\multicolumn{6}{c|}{batch}\\\cline{3-9}
> data set & \#instances & SAWTOOTH & NB & & nbk&&j48\\\hline
> \input{data/10way.dat}
> \end{tabular}}
> \end{center}
> \end{minipage}\hfill
> \begin{minipage}{.35\linewidth}
> \begin{center}
> \includegraphics[width=2in]{data/10way.pdf}
> ~\\
> ~\\~\\~\\
> {\scriptsize
> \begin{tabular}{|c|c|c|c|c|c|c|} \hline
> learner & win - loss & win & loss & ties \\ \hline
> J48 & 10 & 25 & 15 & 20 \\
> nbk & 8 & 17 & 9 & 34\\
> SAWTOOTH & -6 & 12 & 18 & 30\\
> NB & -12 & 9 & 21 & 30 \\
> \hline
> \end{tabular}
> %~\newline~\newline
> }\end{center}
> \end{minipage}
> \end{center}
> \caption{$mean \pm standard\; deviations$
> seen in 10*10-way cross validation
> experiments on UCI Irvine data sets.
> ``NB'' and ``nbk'' denote N\aive Bayes
> classifiers that use gaussians
> to model continuous attributes. ``NB'' uses a single
> gaussian while ``nbk'' uses a sum of gaussians in the method
> recommended by John and Langley~\cite{john95}.
> The plot top-right sorts the differences in the accuracies
> found by SAWTOOTH and all the other learners.
> Some of those differences aren't statistically significant:
> the ``+'' or ``-'' in the left-hand-side table denote
> mean differences that are significantly different
> from SAWTOOTH
> at the $\alpha=0.05$ level.
> The significant differences between
> all the learners are shown in
> the win-loss statistics of the bottom-right table.
> }\label{fig:uci}
> \end{figure*}
514c537
< \fig{spade} compares results from SPADE and John and Langley's kernel estimation method
---
> \fig{spade} compares results from SPADE and John and Langley's kernel estimation method
536,537d558
<
<
549,552c570,573
< were still within 3\% of kernel estimation.
< In our view, the advantages
< of SPADE (incremental, one scan processing, distribution independent) compensate
< for its occasionally performing worse
---
> were still within 3\% of kernel estimation.
> In our view,
> the advantages
> of SPADE (incremental, one scan processing) compensate for its occasionally performing worse
558a580
>
574a597
>
577,587c600,618
< In order to stress test our system, we ran it on the 5,300,000
< instances used in the 1999 KDD
< cup\footnote{\url{http://www.ai.univie.ac.at/~bernhard/kddcup99.html}}.
< KDD'99 dealt with network intrusion detection and was divided
< into a training set of about five million instances and a {\em test
< set} of 311,029 instances. The data comprised 6 discrete attributes,
< 34 continuous attributes, and 38 classes which fell into four main
< categories: {\em normal} (no attack); {\em probe} (surveillance and
< other probing); {\em DOS} (denial-of-service); {\em U2R} (unauthorized
< access to local super-user privileges); and {\em R2L} (unauthorized
< access from a remote machine).
---
> In order to stress test our system,
> we ran it on the 5,300,000 instances
> used in the 1999 KDD cup\footnote{\url{http://www.ai.univie.ac.at/~bernhard/kddcup99.html}}.
> The KDD data dealt with
> network intrusion
> detection and was
> divided into
> a training set
> of about five million instances and a {\em
> test set} of 311,029 instances.
> The data comprised
> 6 discrete attributes, 34 continuous attributes, and
> 38 classes which
> fell into four main
> categories: {\em normal} (no attack);
> {\em probe} (surveillance and other probing);
> {\em DOS} (denial-of-service);
> {\em U2R} (unauthorized access to local super-user privileges);
> and {\em R2L} (unauthorized access from a remote machine).
596,697c627,651
< \fig{kdd99} shows all the sorted {\em mean M*C scores}
< from the KDD'99 entrants.
< Also shown in that figure is SAWTOOTH's mean \mbox{$M*C$} result.
< SAWTOOTH's results
< were close to the winning score of entrant \#1; very similar to
< entrants 10,11,12,13,14,15,16; and much better than entrants
< 18,19,20,21,22,23,24. These results are encouraging since SAWTOOTH is
< a much simpler tool than many of the other
< entries. For example, the winning entrant
< took several runs to divide the data into smaller
< subsets and build an ensemble of 50x10 C5 decision trees using an
< intricate cost-sensitive bagged boosting technique. This took more
< than a day to terminate on a dual-processor 2x300MHz Ultra-Sparc2
< machine with 512MB of RAM using the commercially available
< implementation of C5, written in ``C''. In contrast, our toolkit
< written in interpreted scripting languages (gawk/bash),
< processed
< all 5,300,000 instances in one scan of the data using less than 3.5
< Megabytes of memory. This took 11.5 hours on a 2GHz Pentium 4, with 500MB of
< RAM, running Windows/Cygwin
< and we conjecture that this
< runtime could be greatly reduced by porting our toolkit to ``C''.
<
< Another encouraging result is the {\em \# bins with tally=X} plot of
< \fig{kdd99}. One concern with SPADE is that several of its internal
< parameters are linked to the number of processed instances; e.g. {\em
< MaxBins} is the square root of the number of instances. The 5,300,000
< instances of KDD'99 could therefore generate an impractically large number
< of bins for each numeric attribute. This worst-case scenario
< would occur if each consecutive group of $SubBins$ number of numeric
< values has different values from the previously seen groups {\em and}
< they are sorted in ascending or descending order. If this
< unlikely combination of events does {\em not} occur then the resulting
< bins would have tallies less than
< $MinInst$, encouraging it to merge with the next bin.
< In all our experiments,
< we have never seen this worst-case behavior.
< In KDD'99, for example,
< SPADE only ever generated 2 bins for 20 of the 40 attributes. Also,
< for only two of the attributes, did SPADE generate more than 50
< bins. Further, SPADE never generated more than 100 bins for any attribute.
< \begin{figure}[!t]
< {\scriptsize
< \begin{center}
< \includegraphics[width=1.4in]{data/kdd.pdf}~~~~
< \includegraphics[width=1.4in]{data/kddbands.pdf}
< \end{center}}
< \caption{SAWTOOTH and the KDD'99 data}\label{fig:kdd99}
< \end{figure}
< \newcommand{\upp}{$\blacktriangle$}
< \newcommand{\downn}{$\blacktriangledown$}
<
< \begin{figure*}
< \begin{center}
< \begin{minipage}{.6\linewidth}
< \begin{center}
< {\scriptsize
< \begin{tabular}{l@{~}r|@{~}l|l@{}c@{~}l@{~}c@{~}l@{}c@{}lc@{}r|}\cline{3-9}
< \multicolumn{2}{c}{}& \multicolumn{1}{|c|}{incremental} &\multicolumn{6}{c|}{batch}\\\cline{3-9}
< data set & \#instances & SAWTOOTH & NB & & nbk&&j48\\\hline
< \input{data/10way.dat}
< \end{tabular}}
< \end{center}
< \end{minipage}\hfill
< \begin{minipage}{.35\linewidth}
< \begin{center}
< \includegraphics[width=2in]{data/10way.pdf}
< ~\\
< ~\\~\\~\\
< {\scriptsize
< \begin{tabular}{|c|c|c|c|c|c|c|} \hline
< learner & win - loss & win & loss & ties \\ \hline
< J48 & 10 & 25 & 15 & 20 \\
< nbk & 8 & 17 & 9 & 34\\
< SAWTOOTH & -6 & 12 & 18 & 30\\
< NB & -12 & 9 & 21 & 30 \\
< \hline
< \end{tabular}
< %~\newline~\newline
< }\end{center}
< \end{minipage}
< \end{center}
< \caption{$mean \pm standard\; deviations$
< seen in 10*10-way cross validation
< experiments on UCI Irvine data sets.
< ``NB'' and ``nbk'' denote N\aive Bayes
< classifiers that use gaussians
< to model continuous attributes. ``NB'' uses a single
< gaussian while ``nbk'' uses a sum of gaussians in the method
< recommended by John and Langley~\cite{john95}.
< The plot top-right sorts the differences in the accuracies
< found by SAWTOOTH and all the other learners.
< Some of those differences aren't statistically significant:
< the {\upp} or {\downn} symbols in the left-hand-side table denote
< mean differences that are significantly more or less (respectively)
< than SAWTOOTH
< at the $\alpha=0.05$ level.
< The significant differences between
< all the learners are shown in
< the win-loss statistics of the bottom-right table.
< }\label{fig:uci}
< \end{figure*}
---
> \fig{kdd99} shows the {\em mean M*C scores}
> for SAWTOOTH and the KDD'99 entrants.
> SAWTOOTH's mean $M*C$
> results were
> close to the winning
> score of entrant \#1;
> very similar to entrants 10,11,12,13,14,15,16; and
> better than
> entrants 18,19,20,21,22,23,24.
> These results are encouraging since SAWTOOTH is a much simpler tool than the winning
> entry (which classified the security incidents using
> an ensemble of decision trees built from a 50*10 cross-val).
>
> Another encouraging result is the {\em \# bins with tally=X}
> plot of \fig{kdd99}. One concern with
> SPADE is that several of its internal
> parameters are linked to the number of processed
> instances; e.g. {\em MaxBins} is the square root of the number of instances.
> The 5,300,000 instances of KDD'99 could therefore, in the worst case,
> generate thousands of bins for each numeric attribute. In all our experiments,
> we have never seen this worst-case behavior. In KDD'99, for example,
> SPADE only ever generated
> 2 bins for 20 of the 40 attributes. Also, for only two
> of the attributes, did SPADE generate more than 50 bins. Lastly,
> SPADE never generated more than 100 bins.
707,717d660
< Attempts to test our system using other KDD cup data were not successful, for a variety
< of reasons\footnote{The KDD'04 evaluation portal was off-line
< during the period when SAWTOOTH was being developed.
< The KDD'03 problem required feature
< extraction from free text, something that is
< beyond the scope of this research.
< The data for KDD'02 is no longer on-line.
< The KDD'01 had data with 130,000 attributes
< and we don't yet know how to extend our technique to
< such a large attribute space.
< We had trouble following the KDD'00 documentation.}.
735c678
< usually performs worse than learning from the total
---
> usually performs worse than learning from the total
905,906c848
< %\bibliography{../../refs}
< \bibliography{refs}
---
> \bibliography{../../refs}
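
The hunks above describe SPADE's binning policy only in fragments: one bin for the first value, $SubBins=5$ new bins when a value lands outside the current MIN..MAX, merges deferred to era ends, and a bin budget tied to the square root of the instances seen so far. The following is a minimal Python sketch assembled from those fragments. The class name, the merge order, the even splitting of the new range, and the $MinInst=5$ default are our assumptions for illustration; this is not the paper's gawk/bash implementation.

import math
import random

class Spade:
    """Sketch of an incremental, one-scan discretizer in the style of SPADE."""
    def __init__(self, sub_bins=5, min_inst=5):   # MinInst=5 is an assumed default
        self.sub_bins = sub_bins
        self.min_inst = min_inst
        self.bins = []     # sorted list of [lo, hi, tally]
        self.seen = 0      # instances processed so far

    def place(self, x):
        """Bin a new value, extending MIN/MAX with SubBins new bins if needed."""
        self.seen += 1
        if not self.bins:                      # first value: one bin, MIN=MAX=x
            self.bins.append([x, x, 1])
            return 0
        if x < self.bins[0][0]:                # below MIN: prepend SubBins bins
            self._grow(x, self.bins[0][0], prepend=True)
        elif x > self.bins[-1][1]:             # above MAX: append SubBins bins
            self._grow(self.bins[-1][1], x, prepend=False)
        for i, (lo, hi, _) in enumerate(self.bins):
            if lo <= x <= hi:
                self.bins[i][2] += 1
                return i
        return len(self.bins) - 1              # fall-through guard

    def _grow(self, a, b, prepend):
        """Cover the new range [a,b] with SubBins equal-width empty bins."""
        width = (b - a) / self.sub_bins
        new = [[a + k * width, a + (k + 1) * width, 0] for k in range(self.sub_bins)]
        self.bins = new + self.bins if prepend else self.bins + new

    def end_of_era(self):
        """Merge low-tally bins, but only at era boundaries (never mid-era)."""
        max_bins = max(1, int(math.sqrt(self.seen)))
        merged = True
        while merged and len(self.bins) > 1:
            merged = False
            for i in range(len(self.bins) - 1):
                if self.bins[i][2] < self.min_inst or len(self.bins) > max_bins:
                    lo, _, t1 = self.bins[i]          # fold bin i+1 into bin i
                    _, hi, t2 = self.bins[i + 1]
                    self.bins[i:i + 2] = [[lo, hi, t1 + t2]]
                    merged = True
                    break

# usage: feed values one era at a time; merges only happen between eras
spade = Spade()
for era in range(3):
    for _ in range(100):                       # E=100 instances per era
        spade.place(random.gauss(0, 1))
    spade.end_of_era()
print(len(spade.bins), "bins after 300 instances")

The exact merge policy would matter more in a deployment than it does here; the point of the sketch is only that both growing and merging read nothing but the bins themselves, so the data really is scanned once.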
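
The same hunks also say that the bin number, not the raw numeric, is what reaches the N\aive Bayes {\tt update}, and that a bin merge must trigger a matching merge of frequency-table entries (the $F$ variable of \fig{class}). Below is a small sketch of that coupling, again using our own table layout and function names rather than the paper's routines, and assuming bins are identified by index.

from collections import defaultdict

# F[klass][attr][value]: frequency counts used by the Naive Bayes classifier
F = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

def update(klass, attr, value):
    """Record one (class, attribute, value) observation."""
    F[klass][attr][value] += 1

def merge_bins(attr, i, j):
    """When SPADE merges bin j into bin i, fold the matching counts as well."""
    for klass in F:
        counts = F[klass][attr]
        counts[i] += counts.pop(j, 0)

# Numeric attributes are discretized first, so the 'value' passed to update()
# is a bin index, e.g.:  update("normal", "duration", spade.place(x)).
# A full implementation would also renumber the bins to the right of a merge.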