TSESI-2008-09-0279.R1, "Controlling Randomized Unit Testing With Genetic Algorithms"
Manuscript Type: Special Issue on Search-Based Optimization

Dear Dr. Andrews,

We now have reviews of your above-referenced submission to IEEE Transactions on Software Engineering. Copies of the review comments are enclosed. Unfortunately, based on these reviews, Associate Editor Prof. Harman and Dr. Mansouri (Guest Editors of Search Based Optimization for SE) are not able to recommend this submission for publication.

You may resubmit your paper, but it will be treated as a NEW submission and given a new log number. If you choose to resubmit your paper, please refer to this original log number (TSESI-2008-09-0279.R1), and we will include your previous manuscript's history in its files and forward the necessary information to the Editor-in-Chief and Associate Editor. The manuscript will then undergo a new review process.

Dr. (Guest Editors of Search Based Optimization for SE) has the following comments for you:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-
Editor Comments to the Author:

--- This additional comment comes from Mark Harman:

Thank you for your revised paper. One of the referees has remaining serious concerns about the degree of overlap between this paper and previously published work. The referee is concerned that the new material, though not overlapping with the ASE paper, now partly overlaps with the PROMISE paper. This concern is also raised by a second referee in comments to the AE.

The TSE review process strongly deprecates two phases of major revisions, so the expected outcome at this stage of the process, given these referee comments, is either "reject" or "revise and resubmit as new". I am recommending an outcome of "revise and resubmit as new" because the referees indicate that there is clearly merit in this work and the problem centres on the degree of new material. This could clearly be addressed by a revision (though it would equally clearly require material that is fresh and present only in the TSE paper, with neither actual nor perceived overlap with other conference papers).

The principle here is that a TSE paper can be extended from a conference paper with the addition of new material. However, it cannot be extended from one conference paper by the addition of new material from another conference paper. I most definitely do not say that this is what has happened in this case, since the TSE paper does indeed have material that appears only in the TSE version. However, I am afraid the fact remains that two of the three referees are concerned about the degree of overlap, and this concern can only be addressed by a major revision and a re-review by the same referees.

If you resubmit the paper, I will be assigned as the AE to handle it. You should provide a response to the referees, and I will seek to ensure that there is some overlap between the original referees and those assigned to review the new submission so that this concern can be addressed in their review. If subsequently accepted, the paper would be put into the review process for consideration for a regular issue of TSE.

Thank you once again for your revised version of the paper. Whatever you decide to do with the paper, I hope that you find the referees' comments helpful.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-

We hope that you will find the comments from the reviewers to be useful in your future work. If you have any questions, feel free to contact the editor or the EIC.
We appreciate that you chose to submit your work to IEEE Transactions on Software Engineering. If you have suggestions about the journal or the manuscript handling, feel free to send a note to the EIC or Associate EIC.

Thank you,
Debby Mosher
on behalf of Jeff Kramer, EIC
IEEE Transactions on Software Engineering
tse@computer.org

=======================================
REVIEWS

Reviewer: 1

Recommendation: Author Should Prepare A Major Revision For A Second Review

Comments:
My initial reservations regarding this paper concerned its similarity to the authors' previous ASE publication (I had a number of other more minor concerns, but the authors have adequately addressed these). The distinction between this paper and the ASE paper has been enhanced by including more information on the Feature Subset Selection study. The authors' initial work in this area has also been reported in their PROMISE'09 paper, and so there is an element of similarity between these two publications, particularly in terms of the background. The results are new, since this paper employs a different (and more effective) reduction method from that of the PROMISE paper. However, this also raises a number of concerns about the study, since it suggests that this work may not be particularly mature.

On a minor point, Figure 13 is not easily readable (except to indicate that the results for the varying numbers of gene types are quite similar). The results in Figure 14 are particularly interesting, since it would appear that the approach is class-dependent: the trends appear to differ more across the classes than across the numbers of gene types. This is something that could be explored and tested. It is also mentioned that the additional coverage is statistically significant, but there is no mention of how this was tested.

The bigger concern is the range of classes (subject units) used, which is a general concern with this paper: important though they are, they may not be diverse enough to demonstrate the general applicability of the approach. This relates to the earlier observation that the results in Figure 14 appear to be class-dependent. To be more convincing, this study would benefit from being run on a wider range of diverse classes.

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
Please rate the manuscript. Explain your rating under the Public Comments section below.: Good

Reviewer: 2

Recommendation: Author Should Prepare A Minor Revision

Comments:
GENERAL

The changes made by the authors have significantly improved the readability and coherence of the manuscript. The removal of the exploratory work section and the revision of the section on FSS are both effective. There remain a few (relatively) minor technical and typographical issues, listed below. An additional concern regards the FSS work published at PROMISE.
While it is true that the FSS approach described here is a new formulation, the broad method is similar. Thus, given that the main case study is essentially that previously published at ASE, and the FSS refinement is similar to that published at PROMISE, the proportion of unpublished material is arguably lower than in the initial version of the paper.

REMARKS ON RESPONSE TO REVIEWER 2

The comments I made on the previous version have largely been addressed, either by changes to the paper or by clarification in the response to reviewers. I remark here on a couple of the points.

* Comment regarding use of Design-of-Experiments or OR methods: I accept the authors' response. To clarify: my comment was whether a GA was the best 'search' technique to be used here, given that others exist. For example, DoE approaches might find near-optimal 'chromosome' values relatively efficiently, possibly with the additional benefit of identifying unimportant gene types - in a similar way to FSS - through simple 'screening' experiments.

* Comment regarding 'spikes' in Fig 1: The authors haven't completely addressed this point. The 'spikes' are simply an artefact of how the graph is drawn. For example, the probability of covering (z-value) at point (x=9, y=9) is 1. The z-value at (x=10, y=10) is 1. There are no valid values along the line from (9,9) to (10,10), but if a line is drawn from (9,9,1) to (10,10,1) it would form part of a diagonal ridge rather than a set of spikes. It is the thin ridge, not the spikes, that is the important point.

ADDITIONAL COMMENTS

* Abstract - 2nd paragraph: While relevant, the description here is perhaps a little too detailed for an abstract. The terminology of 'gene type' isn't clear at this point, and the terminology of 'mutator' does not appear to be explained in the paper at all. I'd suggest referring simply to the use of feature subset selection to reduce the size and content of the representation and the resultant improvement in performance with only a small decrease in quality.

* Section III C - 3rd para: "Nighthawk gets the random testing level to generate and run ...". For readability, I suggest rewording as "The random testing level of Nighthawk generates and runs ...".

* Section V - 4th para: Unfortunately, the Shapiro-Wilk test has been applied inappropriately. In order to use the paired t-test, it is necessary to demonstrate that the distribution for each combination of option and source file (i.e. each cell in Fig 9) is a normal distribution, i.e. across the 10 test cases taken for each. Applying the SW test to the entire column is not meaningful: the values in the column will essentially depend instead on how 'difficult' each source file is, and this is a consequence of the choice of source files, not of the distribution within each cell. Therefore, I suggest removing discussion of the SW test and of the t-test and simply retaining the Wilcoxon test results: this does not change the conclusion. (See the sketch after these comments.)

* Section V - 4th para: Arguably, there was no need to apply the Bonferroni correction. If the hypothesis is that at least one pair of columns is different, then it is necessary (since one such difference could easily occur by chance over the multiple comparisons). However, if the hypothesis is about specific pairs, e.g. that (PN, EN) are no different, then no correction is necessary. Since the Bonferroni correction makes the test more conservative, I suggest leaving it as is.
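[A minimal sketch of the analysis described in the two Section V comments above, for illustration only. It assumes hypothetical coverage values; the PN/EN labels and the number of comparisons k are placeholders, not data from the paper. It shows Shapiro-Wilk applied per cell, the Wilcoxon signed-rank test as the distribution-free paired comparison, and an optional Bonferroni-adjusted threshold.]

    import numpy as np
    from scipy import stats

    # Hypothetical coverage values for ONE (option, source file) cell of Fig 9,
    # i.e. the 10 test-case generations under two configurations (the labels
    # stand in for, e.g., the PN and EN columns).
    pn_cell = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80, 0.84, 0.79])
    en_cell = np.array([0.80, 0.76, 0.82, 0.78, 0.81, 0.77, 0.79, 0.78, 0.83, 0.76])

    # Shapiro-Wilk per cell (not per column): a paired t-test is only justified
    # if each cell's 10 values look normally distributed.
    for label, cell in (("PN", pn_cell), ("EN", en_cell)):
        w, p = stats.shapiro(cell)
        print(f"{label}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

    # Distribution-free paired comparison (no normality assumption needed).
    w_stat, p_wilcoxon = stats.wilcoxon(pn_cell, en_cell)

    # Bonferroni-adjusted threshold: only needed when testing "at least one of
    # the k pairwise comparisons differs"; for one pre-specified pair, use alpha.
    alpha = 0.05
    k = 3  # hypothetical number of pairwise comparisons
    print(f"Wilcoxon p = {p_wilcoxon:.3f}; "
          f"thresholds: alpha = {alpha}, alpha/k = {alpha / k:.4f}")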
* Section V - Although 10 test cases were generated in order to assess coverage, these 10 test cases appear to have been generated from the results of one run of the GA in each case. It would have helped to control variance due to the GA by comparing the results of multiple runs of the GA. (Although I don't suggest any changes need be made at this point, since it doesn't affect the validity of the conclusions.)

* Section VI - D: How were the default constant values chosen for gene types that were not optimised by the GA?

* Section VI - D, Fig 13: Although these results are summarised in Fig 14, Figure 13 itself is too small to be readable: it is impossible to distinguish the lines corresponding to different numbers of gene types eliminated. The same is true of Fig 15, but to a lesser extent.

* Section VI - D: It appears that only one 'run' was performed for each combination of source code and number of eliminated gene types. As mentioned above, averaging results across multiple runs of the GA would have reduced the variance (and therefore may have avoided lines 'above' the thick line in Fig 13).

* Section VI - D: Why is area under the curve chosen as the metric? Would not coverage of the best chromosome (as used in the earlier case study) have been a better response metric?

* Section VI - D (1): The alpha-value does not "result" in a particular p-value; the p-value is independent of the alpha value. The alpha value is the critical value for determining whether the p-value demonstrates a significant result. (Also, is alpha=0.5 the correct value - should it be 0.05?)

* Section VIII - final para: "depreciated in safety critical applciations" - is "depreciated" the correct word here? Perhaps just "but possibly not for safety critical applications"; also a typo in "applciations".

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Readable - but requires some effort to understand
Please rate the manuscript. Explain your rating under the Public Comments section below.: Excellent

Reviewer: 3

Recommendation: Accept With No Changes

Comments:
I am happy with the authors' responses to my original concerns and the overall modifications made to the document.

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Appears to be - but didn't check completely
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
Please rate the manuscript. Explain your rating under the Public Comments section below.: Good