264,266c264,270
<   system options.  In order to optimize the system, we used data
<   mining techniques to analyze which genes were the most useful.
<   We also report on the results of this analysis and optimization.
---
>   system options.  We show that we can achieve the
>   same coverage as previous studies did, and that
>   we can achieve high coverage on real-world software units.
>   Finally, we describe how we used data
>   mining techniques to analyze which genes were the most useful,
>   and used this information to optimize the system so that it
>   ran more quickly with no loss of coverage.
376c380
< The system Nighthawk described in this paper significantly
---
> The Nighthawk system described in this paper significantly
389a394,405
> Designing a GA means making decisions about what features are
> worthy of modeling and mutating.  For example, much of the effort
> on this project was a laborious trial-and-error process of
> trying different chromosomes.  To simplify that process, we
> describe experiments here with automatic feature subset
> selection (FSS). First, we tried a very large and elaborate set of
> chromosomes. Next, we filtered that set using automatic feature
> subset selection. The filtered set ran over 40\% more quickly, with
> no loss of coverage.  We therefore propose that automatic
> feature subset selection should be a routine part of the design
> of any large GA system.
> 
402,406c418,424
<   real-world units (the Java 1.5.0 Collection and Map classes)
<   to determine the effects of different option settings on the
<   basic algorithm.
< \item We describe how we optimized Nighthawk by systematically
<   analyzing which genes have the greatest effect on the fitness
---
>   real-world units (the Java 1.5.0 Collection and
>   Map classes) to determine the effects of different option
>   settings on the basic algorithm.  We show that Nighthawk can
>   achieve high coverage automatically on these units.
> \item We describe how we optimized Nighthawk by using FSS to
>   systematically analyze which genes have the greatest effect on
>   the fitness
408c426,429
<   little effect.
---
>   little effect.  We show that the optimized system achieved
>   the same high coverage in significantly less time, demonstrating
>   the utility of augmenting GAs with automatic feature subset
>   selection.
418c439,440
< of the empirical work in the paper.
---
> of the empirical work in the paper.  The procedure and results
> of our optimization are in Section 8.
686,687c708,709
< techniques.  For instance, Tonella \cite{tonella-issta04} uses
< a fitness function that specifically takes account of such
---
> techniques.  For instance, Michael et al.\ \cite{michael-etal-ga-tcg}
> use fitness functions that specifically take account of such
722c744
< detection capability.  The GA can of course be re-run to generate
---
> detection capability.  The GA can be re-run to generate
873,874c895,896
< and {\it remove} were needed to create data structures via which
< code in some of the other methods was accessible.
---
> and {\it remove} were needed to create data structures through
> which code in some of the other methods was accessible.
955c977
< methods in $M$ plus the reinitializers of the types of $I_M$.
---
> methods in $M$ plus the reinitializers of the types in $I_M$.
1157,1158c1179
<                  whether the argument will be drawn from the value
<                  pools of that type} \\
---
>                  whether the argument will be of that type\vspace{1mm}} \\
1264c1285
< values are removed, again due to value reuse.  A {\tt remove}
---
> keys will be removed, again due to value reuse.  A {\tt remove}
1296c1317
< a manner identical to the exploratory study (Section
---
> a manner identical to that of the exploratory study (Section
1519,1520c1540,1541
< For BHeap and FibHeap, Nighthawk runs faster than JPF, but for
< the other two units it runs slower than both JPF and Randoop.
---
> For BHeap and FibHeap, Nighthawk runs more quickly than JPF, but for
> the other two units it runs more slowly than both JPF and Randoop.
1875c1896
< results in less time.
---
> quality of results in less time.
1915c1936
< where, 83\% (on average) of the measures in a domain could be
---
> where 83\% (on average) of the measures in a domain could be
1937c1958
< nearest neighbors for each class that is different from the
---
> nearest neighbors for each class that is different from that of the
1965c1986
< For each of the 16 collection and map classes from {\tt java.util},
---
> For each of the 16 Collection and Map classes from {\tt java.util},
1967c1988,1989
< yielded 800 observations of gene value and score.
---
> yielded 800 observations, each consisting of a gene value vector
> and the chromosome score.
1974c1996
< into three regions. 
---
> into three regions:
2001,2004c2023,2026
< The following table shows the number of features that were {\it Selected} in
< our 19 examples, using different values for $\alpha$. Note that as
< $\alpha$ increases, we selected fewer and fewer features.
< \begin{tabular}{rl}
---
> \begin{figure}
> 
> \begin{center}
> \begin{tabular}{|c|c|}
2006,2009d2027
< 0.5 & 439\\
< 0.6& 217 \\
< 0.7 & 112 \\
< 0.8 & 62 \\
2011c2029,2050
< \end{tabular}
---
> \hline
> 0.8 & 62 \\
> \hline
> 0.7 & 112 \\
> \hline
> 0.6 & 217 \\
> \hline
> 0.5 & 439\\
> \hline
> \end{tabular} \vspace{2mm} \\
> \end{center}
> 
> \caption{
>   Numbers of selected features for values of $\alpha$.
> }
> \label{selected-features-fig}
> \end{figure}
> 
> Figure~\ref{selected-features-fig} shows the number of features
> that were {\it Selected} in
> our 19 examples, using different values for $\alpha$. Note that as
> $\alpha$ increases, fewer and fewer features are selected.
2045a2085
>   Merit analysis of Nighthawk gene types.
2130,2131c2170,2171
< in a system that ran substantially faster but achieved the same
< high coverage of the SUT.
---
> in a system that ran substantially more quickly but achieved the
> same high coverage of the SUT.
2160,2161c2200,2207
< Nighthawk is able to achieve high coverage of complex Java units.
< The code is available by writing to the first author.
---
> Nighthawk is able to achieve the same coverage as earlier
> studies, and high coverage of complex, real-world Java units,
> while retaining the most desirable feature of randomized
> testing: the ability to generate many new high-coverage test
> cases quickly.
> We have also shown that we were able to simplify the design
> of the GA system and improve its runtime using automatic
> feature subset selection.
2169c2213,2215
< efficiency.
---
> efficiency.  We also wish to integrate a feature subset
> selection learner into the GA level of the Nighthawk algorithm
> for dynamic optimization of the GA.