\begin{figure}[!b] \small
%\begin{center}
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{pd}&2nd quartile\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}&median,\\\cline{3-5}
Rank&Treatment& 25\%& 50\% & 75\%&3rd quartile\\\hline
1&WC lwl(100) & 20& 76& 97& \boxplot{0.0}{20.0}{77.5}{97.2}{2.8} \\
1&WC lwl(50) & 25& 75& 95& \boxplot{0.0}{25.0}{75.0}{95.2}{4.8} \\
2&CC lwl(100) & 20& 73& 93& \boxplot{0.0}{25.0}{73.1}{93.5}{6.5} \\
3&WC lwl(25) & 33& 73& 93& \boxplot{0.0}{33.3}{72.7}{93.2}{6.8} \\
3&WC lwl(10) & 33& 73& 92& \boxplot{0.0}{33.3}{72.7}{92.5}{7.5} \\
4&CC lwl(50) & 25& 70& 91& \boxplot{0.0}{25.0}{70.2}{90.9}{9.1} \\
5&WC nb & 57& 68& 83& \boxplot{0.0}{56.5}{68.3}{83.3}{16.7}\\
6&CC lwl(10) & 25& 67& 89& \boxplot{0.0}{25.0}{66.7}{89.1}{10.1} \\
7&CC lwl(25) & 25& 67& 88& \boxplot{0.0}{25.0}{66.7}{88.9}{11.1} \\
8&CC nb& 47& 65& 88& \boxplot{0.0}{47.2}{64.9}{87.5}{12.5} \\\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
%\end{center}
\caption{Probability of Detection (PD) results, sorted by median values.}\label{fig:an2PD}
\end{figure}

\begin{figure}[!b] \small
%\begin{center}
\begin{tabular}{l@{}|l@{~}|r@{~}|r@{~}|r@{~}|c@{}c}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{pf}&2nd quartile\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}&median,\\\cline{3-5}
Rank&Treatment& 25\%& 50\% & 75\%&3rd quartile\\\hline
1&WC lwl(100)& 3& 22& 80&\boxplot{0.0}{2.7}{22.2}{80.0}{20.0}\\
1&WC lwl(50)& 5& 22& 75&\boxplot{0.0}{4.7}{22.2}{75.0}{25.0}\\
2&CC lwl(100)& 7& 26& 80&\boxplot{0.0}{6.5}{26.2}{80.0}{20.0}\\
3&WC lwl(25)& 7& 25& 67&\boxplot{0.0}{6.7}{25.0}{66.7}{32.7}\\
4&WC lwl(10)& 7& 27& 67&\boxplot{0.0}{7.3}{27.3}{66.7}{32.7}\\
5&CC lwl(50)& 9& 29& 75&\boxplot{0.0}{9.1}{28.7}{75.0}{25.0}\\
6&WC nb& 18& 32& 43&\boxplot{0.0}{16.7}{31.6}{43.2}{56.8}\\
7&CC lwl(10)& 11& 33& 74&\boxplot{0.0}{10.8}{33.3}{74.3}{25.7}\\
8&CC lwl(25)& 11& 33& 75&\boxplot{0.0}{10.6}{33.0}{75.0}{25.0}\\
9&CC nb& 11& 35& 53&\boxplot{0.0}{11.1}{34.8}{52.6}{47.4}\\\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
%\end{center}
\caption{Probability of False Alarm (PF) results, sorted by median values.}\label{fig:an2PF}
\end{figure}

\subsection{Results}

Our results, shown in \fig{an2PD} and \fig{an2PF}, are divided into reports of the probability of detection ($pd$) and the probability of false alarms ($pf$). When a method uses $k$ nearest neighbors, the $k$ value is shown in brackets; for example, the first line of \fig{an2PD} reports WC data when $lwl$ used 100 nearest neighbors. The results are sorted, top to bottom, from best to worst (higher $pd$ is better; lower $pf$ is better). Quartile plots are shown on the right-hand side of each row. The black dot in those plots marks the median value, and the two ``arms'' on either side of the median show the second and third quartiles, respectively. The three vertical bars on each quartile chart mark the (0\%, 50\%, 100\%) points. Column one of each row shows the result of a Mann-Whitney test (95\% confidence): learners are ranked by how many times they lose against every other learner (so the top-ranked learner loses the least). In \fig{an2PD} and \fig{an2PF}, row $i$ has a different rank from row $i+1$ if Mann-Whitney reports a statistically significant difference between them.
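To make this ranking procedure concrete, the following is a minimal sketch (in Python, using SciPy's \texttt{mannwhitneyu}) of one way to count statistically significant losses and convert them into ranks. The per-learner $pd$ samples are hypothetical placeholders, and this code is not the exact harness used to generate \fig{an2PD} and \fig{an2PF}.
\begin{verbatim}
# Sketch: rank learners by the number of statistically significant
# losses, in the spirit of the Mann-Whitney ranking described above.
# The pd samples below are hypothetical placeholders, not study data.
from scipy.stats import mannwhitneyu

pd_samples = {
    "WC lwl(100)": [76, 80, 97, 20, 75, 90],
    "WC nb":       [68, 57, 83, 60, 70, 65],
    "CC nb":       [65, 47, 88, 50, 55, 60],
}

losses = {name: 0 for name in pd_samples}
for a in pd_samples:
    for b in pd_samples:
        if a == b:
            continue
        # one-sided test: is learner a's pd distribution lower than b's?
        _, p = mannwhitneyu(pd_samples[a], pd_samples[b],
                            alternative="less")
        if p < 0.05:            # significant at the 95% level
            losses[a] += 1

# fewest losses = best rank; equal loss counts share a rank
rank, prev = 0, None
for name, n in sorted(losses.items(), key=lambda kv: kv[1]):
    if n != prev:
        rank, prev = rank + 1, n
    print(rank, name, "losses:", n)
\end{verbatim}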
Several aspects of these results are noteworthy:
\bi
\item Measured in terms of $pd$ and $pf$, the learners are ranked the same. That is, methods that produce higher $pd$ values also produce lower $pf$ values. This means that, in this experiment, we can make clear recommendations about the relative value of the different learners.
\item When performing relevancy filtering on CC data, extreme locality is not recommended. Observe how \mbox{$k\in\{10,25\}$} are ranked second and third worst in terms of both $pd$ and $pf$.
\item When using imported CC data, some degree of relevancy filtering is essential. The last line of our results shows the worst $pd,pf$ values; on that line we see that Naive Bayes, using all the imported data, performs worst.
\item The locally weighted scheme used by $lwl$ improved the WC results as well as the CC results. This implies that a certain level of noise still exists within local data sets and that it is useful to remove extraneous factors from the local data.
\ei
These results confirm the Turhan et al. results:
\bi
\item In a result consistent with {\bf Turhan\#1}, the best results were seen using WC data (see the top two rows of each table of results).
\item In a result consistent with {\bf Turhan\#2}, after relevancy filtering with $k=100$, the CC results are nearly as good as the WC results: a loss of only 3\% in the median $pd$ and of 4\% in the median $pf$.
\ei
Our conclusions from this study are the same as {\bf Turhan\#3}: while local data is the preferred option, it is feasible to use imported data provided it is selected by a relevancy filter. If a company has a large collection of local development data, it should use that data to develop defect predictors. If no such local repository exists, then a cross-company data collection filtered by a locally weighted classifier will yield usable results. Repositories such as PROMISE can be used to obtain that CC data.
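For readers who wish to try such a relevancy filter on their own data, the sketch below illustrates the core idea: for every local instance, keep only the $k$ nearest cross-company rows before training. It is a simplified stand-in written in Python (plain Euclidean $k$-NN plus Gaussian Naive Bayes from scikit-learn, with invented variable names such as \texttt{cc\_X}); it is not the WEKA $lwl$ configuration used in our experiments.
\begin{verbatim}
# Sketch of a k-NN relevancy filter for cross-company (CC) data.
# Euclidean distance + Gaussian Naive Bayes stand in for the locally
# weighted (lwl) scheme used in the experiments; cc_X, cc_y and
# local_X are hypothetical arrays of static code metrics and labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def relevancy_filter(cc_X, cc_y, local_X, k=10):
    """Keep the union of the k nearest CC rows to each local instance."""
    keep = set()
    for row in local_X:
        dists = np.linalg.norm(cc_X - row, axis=1)  # distance to all CC rows
        keep.update(np.argsort(dists)[:k])          # indices of the k nearest
    idx = sorted(keep)
    return cc_X[idx], cc_y[idx]

# Usage: train only on the filtered CC subset, then predict locally.
# filtered_X, filtered_y = relevancy_filter(cc_X, cc_y, local_X, k=10)
# model = GaussianNB().fit(filtered_X, filtered_y)
# predictions = model.predict(local_X)
\end{verbatim}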