\begin{figure}[!b] \small
%\begin{center}
\begin{tabular}{l@{~}|l@{~}|r@{~}r@{~}@{~}r@{~}|c}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{pd}&2nd quartile\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}&median,\\\cline{3-5}
Rank&Treatment& 25\%& 50\% & 75\%&3rd quartile\\\hline
1&WC lwl(100) & 20& 76& 97& \boxplot{0.0}{20.0}{77.5}{97.2}{2.8} \\
1&WC lwl(50) & 25& 75& 95& \boxplot{0.0}{25.0}{75.0}{95.2}{4.8} \\
2&CC lwl(100) & 20& 73& 93& \boxplot{0.0}{25.0}{73.1}{93.5}{6.5} \\
3&WC lwl(25) & 33& 73& 93& \boxplot{0.0}{33.3}{72.7}{93.2}{6.8} \\
3&WC lwl(10) & 33& 73& 92& \boxplot{0.0}{33.3}{72.7}{92.5}{7.5} \\
4&CC lwl(50) & 25& 70& 91& \boxplot{0.0}{25.0}{70.2}{90.9}{9.1} \\
5&WC nb & 57& 68& 83& \boxplot{0.0}{56.5}{68.3}{83.3}{16.7}\\
6&CC lwl(10) & 25& 67& 89& \boxplot{0.0}{25.0}{66.7}{89.1}{10.1} \\
7&CC lwl(25) & 25& 67& 88& \boxplot{0.0}{25.0}{66.7}{88.9}{11.1} \\
8&CC nb& 47& 65& 88& \boxplot{0.0}{47.2}{64.9}{87.5}{12.5} \\\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
%\end{center}
\caption{Probability of Detection (PD) results, sorted by median values.}\label{fig:an2PD}
\end{figure}

\begin{figure}[!b] \small
%\begin{center}
\begin{tabular}{l@{}|l@{~}|r@{~}|r@{~}|r@{~}|c@{}c}
%\multicolumn{2}{c}{~}&\multicolumn{5}{c}{quartiles}\\\cline{3-5}
%&min& & median & &max\\\cline{3-5}
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{pf}&2nd quartile\\
\multicolumn{2}{c}{~}&\multicolumn{3}{c}{percentiles}&median,\\\cline{3-5}
Rank&Treatment& 25\%& 50\% & 75\%&3rd quartile\\\hline
1&WC lwl(100)& 3& 22& 80&\boxplot{0.0}{2.7}{22.2}{80.0}{20.0}\\
1&WC lwl(50)& 5& 22& 75&\boxplot{0.0}{4.7}{22.2}{75.0}{25.0}\\
2&CC lwl(100)& 7& 26& 80&\boxplot{0.0}{6.5}{26.2}{80.0}{20.0}\\
3&WC lwl(25)& 7& 25& 67&\boxplot{0.0}{6.7}{25.0}{66.7}{32.7}\\
4&WC lwl(10)& 7& 27& 67&\boxplot{0.0}{7.3}{27.3}{66.7}{32.7}\\
5&CC lwl(50)& 9& 29& 75&\boxplot{0.0}{9.1}{28.7}{75.0}{25.0}\\
6&WC nb& 18& 32& 43&\boxplot{0.0}{16.7}{31.6}{43.2}{56.8}\\
7&CC lwl(10)& 11& 33& 74&\boxplot{0.0}{10.8}{33.3}{74.3}{25.7}\\
8&CC lwl(25)& 11& 33& 75&\boxplot{0.0}{10.6}{33.0}{75.0}{25.0}\\
9&CC nb& 11& 35& 53&\boxplot{0.0}{11.1}{34.8}{52.6}{47.4}\\\hline
\multicolumn{5}{c}{~}&~~~~~0~~~~~~~~50~~~~100
\end{tabular}
%\end{center}
\caption{Probability of False Alarm (PF) results, sorted by median values.}\label{fig:an2PF}
\end{figure}

\subsection{Results}

Our results, shown in \fig{an2PD} and \fig{an2PF}, are divided into reports of the probability of detection ($pd$) and the probability of false alarms ($pf$). When a method uses $k$ nearest neighbors, the $k$ value is shown in brackets; for example, the first line of \fig{an2PD} reports WC data when $lwl$ used 100 nearest neighbors. The results are sorted, top to bottom, from best to worst (higher $pd$ is better; lower $pf$ is better). Quartile plots are shown on the right-hand side of each row. The black dot in those plots marks the median value, and the two ``arms'' on either side of the median show the second and third quartiles, respectively. The three vertical bars on each quartile chart mark the (0\%, 50\%, 100\%) points. Column one of each row shows the result of a Mann-Whitney test (95\% confidence): learners are ranked by how many times they lose against every other learner (so the top-ranked learner loses the least). In \fig{an2PD} and \fig{an2PF}, row $i$ has a different rank from row $i+1$ if Mann-Whitney reports a statistically significant difference between them.
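To make this ranking procedure concrete, the following is a minimal sketch (in Python, using SciPy's \texttt{mannwhitneyu}) of one way to count statistically significant losses and convert them into ranks. The per-learner $pd$ samples are hypothetical placeholders, and this code is not the exact harness used to generate \fig{an2PD} and \fig{an2PF}.
\begin{verbatim}
# Sketch: rank learners by the number of statistically significant
# losses, in the spirit of the Mann-Whitney ranking described above.
# The pd samples below are hypothetical placeholders, not study data.
from scipy.stats import mannwhitneyu

pd_samples = {
    "WC lwl(100)": [76, 80, 97, 20, 75, 90],
    "WC nb":       [68, 57, 83, 60, 70, 65],
    "CC nb":       [65, 47, 88, 50, 55, 60],
}

losses = {name: 0 for name in pd_samples}
for a in pd_samples:
    for b in pd_samples:
        if a == b:
            continue
        # one-sided test: is learner a's pd distribution lower than b's?
        _, p = mannwhitneyu(pd_samples[a], pd_samples[b],
                            alternative="less")
        if p < 0.05:            # significant at the 95% level
            losses[a] += 1

# fewest losses = best rank; equal loss counts share a rank
rank, prev = 0, None
for name, n in sorted(losses.items(), key=lambda kv: kv[1]):
    if n != prev:
        rank, prev = rank + 1, n
    print(rank, name, "losses:", n)
\end{verbatim}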
Several aspects of these results are noteworthy:
\bi
\item Measured in terms of $pd$ and $pf$, the learners are ranked the same. That is, methods that produce higher $pd$ values also produce lower $pf$ values. This means that, in this experiment, we can make clear recommendations about the relative value of the different learners.
\item When performing relevancy filtering on CC data, extreme locality is not recommended. Observe how \mbox{$k\in\{10,25\}$} are ranked second and third worst in terms of both $pd$ and $pf$.
\item When using imported CC data, some degree of relevancy filtering is essential. The last line of our results shows the worst $pd,pf$ values; on that line we see that Naive Bayes, using all the imported data, performs worst.
\item The locally weighted scheme used by $lwl$ improved the WC results as well as the CC results. This implies that a certain level of noise still exists within local data sets and that it is useful to remove extraneous factors from the local data.
\ei
These results confirm the Turhan et al. results:
\bi
\item In a result consistent with {\bf Turhan\#1}, the best results were seen using WC data (see the top two rows of each table of results).
\item In a result consistent with {\bf Turhan\#2}, after relevancy filtering with $k=100$, the CC results are nearly as good as the WC results: a loss of only 3\% in the median $pd$ and of 4\% in the median $pf$.
\ei
Our conclusions from this study are the same as {\bf Turhan\#3}: while local data is the preferred option, it is feasible to use imported data provided it is selected by a relevancy filter. If a company has a large collection of local development data, it should use that data to develop defect predictors. If no such local repository exists, then a cross-company data collection filtered by a locally weighted classifier will yield usable results. Repositories such as PROMISE can be used to obtain that CC data.
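For readers who wish to try such a relevancy filter on their own data, the sketch below illustrates the core idea: for every local instance, keep only the $k$ nearest cross-company rows before training. It is a simplified stand-in written in Python (plain Euclidean $k$-NN plus Gaussian Naive Bayes from scikit-learn, with invented variable names such as \texttt{cc\_X}); it is not the WEKA $lwl$ configuration used in our experiments.
\begin{verbatim}
# Sketch of a k-NN relevancy filter for cross-company (CC) data.
# Euclidean distance + Gaussian Naive Bayes stand in for the locally
# weighted (lwl) scheme used in the experiments; cc_X, cc_y and
# local_X are hypothetical arrays of static code metrics and labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def relevancy_filter(cc_X, cc_y, local_X, k=10):
    """Keep the union of the k nearest CC rows to each local instance."""
    keep = set()
    for row in local_X:
        dists = np.linalg.norm(cc_X - row, axis=1)  # distance to all CC rows
        keep.update(np.argsort(dists)[:k])          # indices of the k nearest
    idx = sorted(keep)
    return cc_X[idx], cc_y[idx]

# Usage: train only on the filtered CC subset, then predict locally.
# filtered_X, filtered_y = relevancy_filter(cc_X, cc_y, local_X, k=10)
# model = GaussianNB().fit(filtered_X, filtered_y)
# predictions = model.predict(local_X)
\end{verbatim}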