\section[Case Study]{Case Study: The KDD Cup 99 Classification Contest}

The task for the KDD Cup 1999 contest was to learn a classification theory for network intrusion detection: distinguishing attack connections from normal connections in a computer network. Two data sets were provided. The training set is a collection of about five million connection records with 24 attack types (classes) and 40 attributes (of which 34 are continuous). The test set contains 311029 instances drawn from a different probability distribution than the training data. It has 38 classes (14 in addition to those in the training set) and the same 40 attributes as the training set.

It is also important to note that the test instances are \textbf{not} all independent: only 77291 of the 311029 instances in the test set are distinct. This matters because it is a clear violation of the independence assumption underlying Bayesian classifiers.

Classes in both sets are divided into normal traffic plus four main attack categories:
\be
\item normal: no attack.
\item probe: surveillance and other probing.
\item DOS: denial of service.
\item U2R: unauthorized access to local super-user (root) privileges.
\item R2L: unauthorized access from a remote machine.
\ee

The class distributions of the datasets are:

\begin{tabular}{|l||r|r|}
\hline
\textbf{Class} & \textbf{Train} & \textbf{Test} \\
\hline
0 (normal) & 19.69\% & 19.48\% \\
1 (probe)  &  0.83\% &  1.34\% \\
2 (DOS)    & 79.24\% & 73.90\% \\
3 (U2R)    &  0.01\% &  0.07\% \\
4 (R2L)    &  0.23\% &  5.20\% \\
\hline
\end{tabular}

The submitted classification results were scored according to the following cost matrix (rows are actual classes, columns are predicted classes):

\begin{tabular}{|l|l|l|l|l|l|}
\hline
       & normal & probe & DOS & U2R & R2L \\ \hline
normal & 0 & 1 & 2 & 2 & 2 \\ \hline
probe  & 1 & 0 & 2 & 2 & 2 \\ \hline
DOS    & 2 & 1 & 0 & 2 & 2 \\ \hline
U2R    & 3 & 2 & 2 & 0 & 2 \\ \hline
R2L    & 4 & 2 & 2 & 2 & 0 \\ \hline
\end{tabular}

\subsection{Results}

There were 24 submissions to this contest. The winning group used an ensemble of $50\times10$ C5 decision trees combined via a cost-sensitive bagged boosting technique\footnote{More details are given at \url{http://www.ai.univie.ac.at/~bernhard/kddcup99.html}}. This group obtained the following results:

Winning confusion matrix:
\begin{verbatim}
            predicted
             0      1       2     3     4  %correct
actual --------------------------------------------
   0 |   60262    243      78     4     6    99.5%
   1 |     511   3471     184     0     0    83.3%
   2 |    5299   1328  223226     0     0    97.1%
   3 |     168     20       0    30    10    13.2%
   4 |   14527    294       0     8  1360     8.4%
     |
%correct 74.6%  64.8%   99.9% 71.4% 98.8%

Average classification cost: 0.2331
Correctly classified: 288349
Accuracy: 92.708%
\end{verbatim}
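Since the cost matrix and the published confusion matrix fully determine the score, the reported figures can be recomputed directly. The following Python sketch is our own illustration, not code from any contest entry; the \texttt{COST}, \texttt{WINNER}, \texttt{average\_cost}, and \texttt{accuracy} names are ours:

\begin{verbatim}
# Recompute the winning entry's score from the published matrices.
# Rows are actual classes, columns are predicted classes, both in
# the order: 0=normal, 1=probe, 2=DOS, 3=U2R, 4=R2L.

COST = [
    [0, 1, 2, 2, 2],  # actual: normal
    [1, 0, 2, 2, 2],  # actual: probe
    [2, 1, 0, 2, 2],  # actual: DOS
    [3, 2, 2, 0, 2],  # actual: U2R
    [4, 2, 2, 2, 0],  # actual: R2L
]

WINNER = [
    [60262,   243,     78,  4,    6],
    [  511,  3471,    184,  0,    0],
    [ 5299,  1328, 223226,  0,    0],
    [  168,    20,      0, 30,   10],
    [14527,   294,      0,  8, 1360],
]

def average_cost(confusion, cost=COST):
    # Total penalty over all cells, divided by the number of
    # test instances.
    n = sum(sum(row) for row in confusion)
    penalty = sum(confusion[i][j] * cost[i][j]
                  for i in range(5) for j in range(5))
    return penalty / n

def accuracy(confusion):
    # Fraction of test instances on the main diagonal.
    n = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(5)) / n

print(average_cost(WINNER))  # -> 0.23309...  (reported: 0.2331)
print(accuracy(WINNER))      # -> 0.92708...  (reported: 92.708%)
\end{verbatim}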
All the participating groups obtained the average classification costs shown in \fig{otherResults}.

\begin{figure}[!htp]
\begin{tabular}{|lc|lc|lc|lc|}
\hline
1 & 0.2331 &  7 & 0.2474 & 13 & 0.2552 & 19 & 0.3344 \\
2 & 0.2356 &  8 & 0.2479 & 14 & 0.2575 & 20 & 0.3767 \\
3 & 0.2367 &  9 & 0.2523 & 15 & 0.2588 & 21 & 0.3854 \\
4 & 0.2411 & 10 & 0.2530 & 16 & 0.2644 & 22 & 0.3899 \\
5 & 0.2414 & 11 & 0.2531 & 17 & 0.2684 & 23 & 0.5053 \\
6 & 0.2443 & 12 & 0.2545 & 18 & 0.2952 & 24 & 0.9414 \\
\hline
\end{tabular}
\caption[KDD Cup 99]{Average classification cost results from the 24 submitted entries to the KDD Cup 1999, sorted in ascending order.}
\label{fig:otherResults}
\end{figure}

This figure is sorted in ascending order, ranging from 0.2331 to 0.9414. \dq{The first significant difference between entries with adjacent ranks is between the 17th and 18th best entries. The difference is very large: $0.2952 - 0.2684 = 0.0268$, which is about nine standard errors. One could conclude that the best 17 entries all performed well, while the worst 7 entries were definitely inferior}\cite{KddResults}.

Motivated by the large scale of the datasets offered in this task, we ran IBC and obtained these results:

IBC confusion matrix:
\begin{verbatim}
            predicted
             0      1       2     3     4  %correct
actual --------------------------------------------
   0 |   60408    128      55     2     0    99.7%
   1 |     890   3251      25     0     0    78.0%
   2 |    6711   2642  220500     0     0    95.9%
   3 |     129     94       0     5     0     2.2%
   4 |   16058    130       0     0     1     0.0%
     |
%correct 71.7%  52.1%  100.0% 71.4% 100.0%

Average classification cost: 0.264647
Correctly classified: 284165
Accuracy: 91.3628%
\end{verbatim}

IBC's average classification cost falls between the 16th and 17th places of the submitted entries, which puts IBC within the group of entries that performed well. The main problem we encounter with our approach is a high false-negative rate on the least frequent classes: we classify almost no instances as belonging to them. We believe this problem is due to over-fitting of the theory to the very frequent classes, and we therefore expect SAWTOOTH to do better on these classes.
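As a cross-check, applying the \texttt{average\_cost} and \texttt{accuracy} helpers from the earlier sketch to the IBC confusion matrix reproduces the figures reported above:

\begin{verbatim}
# IBC's published confusion matrix, scored with the same helpers
# as the winning entry (continuation of the earlier sketch).
IBC = [
    [60408,  128,     55, 2, 0],
    [  890, 3251,     25, 0, 0],
    [ 6711, 2642, 220500, 0, 0],
    [  129,   94,      0, 5, 0],
    [16058,  130,      0, 0, 1],
]

print(average_cost(IBC))  # -> 0.264647...  (reported: 0.264647)
print(accuracy(IBC))      # -> 0.913628...  (reported: 91.3628%)
\end{verbatim}

The recomputed cost of 0.2646 confirms the placement between the 16th (0.2644) and 17th (0.2684) entries.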