TSESI-2008-09-0279.R1, "Controlling Randomized Unit Testing With Genetic Algorithms"
Manuscript Type: Special Issue on Search-Based Optimization

Dear Dr. Andrews,

We now have reviews of your above-referenced submission to IEEE Transactions on Software Engineering. Copies of the review comments are enclosed. Unfortunately, based on these reviews, Associate Editor Prof. Harman and Dr. Mansouri (Guest Editors of Search Based Optimization for SE) are not able to recommend this submission for publication.

You may resubmit your paper, but it will be treated as a NEW submission and given a new log number. If you choose to resubmit your paper, please refer to this original log number (TSESI-2008-09-0279.R1), and we will include your previous manuscript's history in its files and forward the necessary information to the Editor-in-Chief and Associate Editor. The manuscript will then undergo a new review process.

Dr. (Guest Editors of Search Based Optimization for SE) has the following comments for you:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-
Editor Comments to the Author:

--- This additional comment comes from Mark Harman:

Thank you for your revised paper. One of the referees has remaining serious concerns about the degree of overlap between this paper and previously published work. The referee is concerned that the new material, though not overlapping with the ASE paper, now partly overlaps with the PROMISE paper. This concern is also raised by a second referee in comments to the AE.

The TSE review process strongly deprecates two phases of major revisions, so the expected outcome at this stage of the process, given these referee comments, is either "reject" or "revise and resubmit as new". I am recommending an outcome of "revise and resubmit as new" because the referees indicate that there is clearly merit in this work and the problem centres on the degree of new material. This could clearly be addressed by a revision (though it would equally clearly require material that is fresh and present only in the TSE paper, with neither actual nor perceived overlap with other conference papers).

The principle here is that a TSE paper can be extended from a conference paper with the addition of new material. However, it cannot be extended from one conference paper by the addition of new material from another conference paper. I most definitely do not say that this is what has happened in this case, since the TSE paper does indeed have material that appears only in the TSE version. However, I am afraid the fact remains that two of the three referees are concerned about the degree of overlap, and this concern can only be addressed by a major revision and a re-review by the same referees.

If you resubmit the paper, I will be assigned as the AE to handle it. You should provide a response to the referees, and I will seek to ensure that there is some overlap between the original referees and those assigned to review the new submission so that this concern can be addressed in their review. If subsequently accepted, the paper would be put into the review process for consideration for a regular issue of TSE.

Thank you once again for your revised version of the paper. Whatever you decide to do with the paper, I hope that you find the referees' comments helpful.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-

We hope that you will find the comments from the reviewers to be useful in your future work. If you have any questions, feel free to contact the editor or the EIC.
We appreciate that you chose to submit your work to IEEE Transactions on Software Engineering. If you have suggestions about the journal or the manuscript handling, feel free to send a note to the EIC or Associate EIC.

Thank you,
Debby Mosher
on behalf of Jeff Kramer, EIC
IEEE Transactions on Software Engineering
tse@computer.org

=======================================
REVIEWS

Reviewer: 1

Recommendation: Author Should Prepare A Major Revision For A Second Review

Comments:
My initial reservations regarding this paper concerned its similarity to the authors' previous ASE publication (I had a number of other more minor concerns, but the authors have adequately addressed these). The distinction between this paper and the ASE paper has been enhanced by including more information on the Feature Subset Selection study. The authors' initial work in this area has also been reported in their PROMISE'09 paper, and so there is an element of similarity between these two publications, particularly in terms of the background. The results are new, since this paper employs a different (and more effective) reduction method from that of the PROMISE paper. However, this also raises a number of concerns about the study, since it suggests that this work may not be particularly mature.

On a minor point, Figure 13 is not easily readable (except to indicate that the results for the varying numbers of gene types are quite similar). The results in Figure 14 are particularly interesting, since it would appear that the approach is class-dependent: the trends appear to differ more across the classes than across the numbers of gene types. This is something that could be explored and tested. It is also mentioned that the additional coverage is statistically significant, but there is no mention of how this was tested.

The bigger concern is the range of classes (subject units) used, which is a general concern with this paper: important though they are, they may not be diverse enough to demonstrate the general applicability of the approach. This relates to the earlier observation that the results in Figure 14 appear to be class-dependent. To be more convincing, this study would benefit from being run on a wider range of diverse classes.

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
Please rate the manuscript. Explain your rating under the Public Comments section below.: Good

Reviewer: 2

Recommendation: Author Should Prepare A Minor Revision

Comments:
GENERAL

The changes made by the authors have significantly improved the readability and coherence of the manuscript. The removal of the exploratory work section and the revision of the section on FSS are both effective. There remain a few (relatively) minor technical and typographical issues, listed below. An additional concern regards the FSS work published at PROMISE.
While it is true that the FSS approach described here is a new formulation, the broad method is similar. Thus, given that the main case study is essentially that previously published at ASE, and the FSS refinement is similar to that published at PROMISE, the proportion of unpublished material is arguably lower than in the initial version of the paper.

REMARKS ON RESPONSE TO REVIEWER 2

The comments I made on the previous version have largely been addressed, either by changes to the paper or by clarification in the response to reviewers. I remark here on a couple of the points.

* Comment regarding use of Design-of-Experiments or OR methods: I accept the authors' response. To clarify: my comment was whether a GA was the best 'search' technique to be used here, given that others exist. For example, DoE approaches might find near-optimal 'chromosome' values relatively efficiently, possibly with the additional benefit of identifying unimportant gene types - in a similar way to FSS - through simple 'screening' experiments.

* Comment regarding 'spikes' in Fig 1: The authors haven't completely addressed this point. The 'spikes' are simply an artefact of how the graph is drawn. For example, the probability of covering (z-value) at point (x=9, y=9) is 1. The z-value at (x=10, y=10) is 1. There are no valid values along the line from (9,9) to (10,10), but if a line is drawn from (9,9,1) to (10,10,1) it would form part of a diagonal ridge rather than a set of spikes. It is the thin ridge, not the spikes, that is the important point.

ADDITIONAL COMMENTS

* Abstract - 2nd paragraph: While relevant, the description here is perhaps a little too detailed for an abstract. The terminology of 'gene type' isn't clear at this point, and the terminology of 'mutator' does not appear to be explained in the paper at all. I'd suggest referring simply to the use of feature subset selection to reduce the size and content of the representation and the resultant improvement in performance with only a small decrease in quality.

* Section III C - 3rd para: "Nighthawk gets the random testing level to generate and run ...". For readability, I suggest rewording as "The random testing level of Nighthawk generates and runs ...".

* Section V - 4th para: Unfortunately, the Shapiro-Wilk test has been applied inappropriately. In order to use the paired t-test, it is necessary to demonstrate that the distribution for each combination of option and source file (i.e. each cell in Fig 9) is a normal distribution, i.e. across the 10 test cases taken for each. Applying the SW test to the entire column is not meaningful: the values in the column will essentially depend instead on how 'difficult' each source file is, and this is a consequence of the choice of source files, not of the distribution within each cell. Therefore, I suggest removing discussion of the SW test and of the t-test and simply retaining the Wilcoxon test results: this does not change the conclusion. (See the sketch after these comments.)

* Section V - 4th para: Arguably, there was no need to apply the Bonferroni correction. If the hypothesis is that at least one pair of columns is different, then it is necessary (since one such difference could easily occur by chance over the multiple comparisons). However, if the hypothesis is about specific pairs, e.g. that (PN, EN) are no different, then no correction is necessary. Since the Bonferroni correction makes the test more conservative, I suggest leaving it as is.
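[A minimal sketch of the analysis described in the two Section V comments above, for illustration only. It assumes hypothetical coverage values; the PN/EN labels and the number of comparisons k are placeholders, not data from the paper. It shows Shapiro-Wilk applied per cell, the Wilcoxon signed-rank test as the distribution-free paired comparison, and an optional Bonferroni-adjusted threshold.]

    import numpy as np
    from scipy import stats

    # Hypothetical coverage values for ONE (option, source file) cell of Fig 9,
    # i.e. the 10 test-case generations under two configurations (the labels
    # stand in for, e.g., the PN and EN columns).
    pn_cell = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80, 0.84, 0.79])
    en_cell = np.array([0.80, 0.76, 0.82, 0.78, 0.81, 0.77, 0.79, 0.78, 0.83, 0.76])

    # Shapiro-Wilk per cell (not per column): a paired t-test is only justified
    # if each cell's 10 values look normally distributed.
    for label, cell in (("PN", pn_cell), ("EN", en_cell)):
        w, p = stats.shapiro(cell)
        print(f"{label}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

    # Distribution-free paired comparison (no normality assumption needed).
    w_stat, p_wilcoxon = stats.wilcoxon(pn_cell, en_cell)

    # Bonferroni-adjusted threshold: only needed when testing "at least one of
    # the k pairwise comparisons differs"; for one pre-specified pair, use alpha.
    alpha = 0.05
    k = 3  # hypothetical number of pairwise comparisons
    print(f"Wilcoxon p = {p_wilcoxon:.3f}; "
          f"thresholds: alpha = {alpha}, alpha/k = {alpha / k:.4f}")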
* Section V - Although 10 test cases were generated in order to assess coverage, these 10 test cases appear to have been generated from the results of one run of the GA in each case. It would have helped to control variance due to the GA by comparing the results of multiple runs of the GA. (Although I don't suggest any changes need be made at this point, since it doesn't affect the validity of the conclusions.)

* Section VI - D: How were the default constant values chosen for gene types that were not optimised by the GA?

* Section VI - D, Fig 13: Although these results are summarised in Fig 14, Figure 13 itself is too small to be readable: it is impossible to distinguish the lines corresponding to different numbers of gene types eliminated. The same is true of Fig 15, but to a lesser extent.

* Section VI - D: It appears that only one 'run' was performed for each combination of source code and number of eliminated gene types. As mentioned above, averaging results across multiple runs of the GA would have reduced the variance (and therefore may have avoided lines 'above' the thick line in Fig 13).

* Section VI - D: Why is area under the curve chosen as the metric? Would not coverage of the best chromosome (as used in the earlier case study) have been a better response metric?

* Section VI - D (1): The alpha-value does not "result" in a particular p-value; the p-value is independent of the alpha value. The alpha value is the critical value for determining whether the p-value demonstrates a significant result. (Also, is alpha=0.5 the correct value - should it be 0.05?)

* Section VIII - final para: "depreciated in safety critical applciations" - is "depreciated" the correct word here? Perhaps just "but possibly not for safety critical applications"; also a typo in "applciations".

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Readable - but requires some effort to understand
Please rate the manuscript. Explain your rating under the Public Comments section below.: Excellent

Reviewer: 3

Recommendation: Accept With No Changes

Comments:
I am happy with the authors' responses to my original concerns and the overall modifications made to the document.

How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
Is the manuscript technically sound? Please explain under the Public Comments section below.: Appears to be - but didn't check completely
1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
Please rate the manuscript. Explain your rating under the Public Comments section below.: Good