Gmail 	* Tim Menzies <menzies.tim@gmail.com> *

------------------------------------------------------------------------
*lost: tse reviews*
------------------------------------------------------------------------
* David Owen <owen@messiah.edu> * 	* Sat, Jun 27, 2009 at 12:34 PM *
To: tim@menzies.us
Tim,

Hope you're feeling better.  Here are the reviews:

RE: TSESI-2008-10-0347, "The Benefits of Combining Diverse Model
Translation and Verification Strategies"
Manuscript Type: Special Issue on Software Dependability

Dear Prof. Cukic,

We have completed the review process of the above referenced paper for
the IEEE Transactions on Software Engineering. Enclosed are your reviews.

The Associate Editors for the Special Issue on Software Dependability,
Dr. Goseva and Dr. Kanoun, have recommended to the Editor-in-Chief that
your paper undergo a major revision. If you choose to revise your paper,
please prepare a separate document describing how each of the reviewers'
comments is addressed in your revision, and submit it within three months.

To revise your manuscript, log into
https://mc.manuscriptcentral.com/tse-cs and enter your Author Center,
where you will find your manuscript title listed under "Manuscripts with
Decisions." Under "Actions," click on "Create a Revision." Your
manuscript number has been appended to denote a revision.

Once the revised manuscript is prepared, you can upload it and submit it
through your Author Center.

When submitting your revised manuscript, you will be able to respond to
the comments made by the reviewer(s) in the space provided. You can use
this space to document any changes you make to the original manuscript.
In order to expedite the processing of the revised manuscript, please be
as specific as possible in your response to the reviewer(s)’ questions
and comments. You may also upload your responses as separate files for
review along with your revision. If you choose to do this, please choose
“Summary of Changes” as the file designation.

IMPORTANT: Your original files are available to you when you upload your
revised manuscript. Please delete any redundant files before completing
the submission.

When the submission process is complete, you will receive an automated
confirmation email immediately. If you did not receive that email, your
submission is not yet complete.

The journal’s publication coordinator will contact you should we have
any concerns or questions regarding your revision. Otherwise, your
revision will be forwarded to the assigned associate editor with a
request to begin the second round of reviews.

Our page limitation and formatting guidelines for TSE can be found on

http://www.computer.org/portal/site/transactions/menuitem.eda2ca84d8d67764cfe79d108bcd45f3/index.jsp?&pName=transactions_level1&path=transactions/tse/mc&file=author.xml&xsl=article.xsl&#manuscript

Please do not hesitate to contact us should you have any questions
about our process or experience technical difficulties. You can
reach me at tse@computer.org.

Thank you for your contribution to TSE, and we look forward to receiving
your revised manuscript.

Thank you,

Mrs. Joyce Arnold
TSE Assistant
IEEE  Transactions on Software Engineering
IEEE Computer Society
10662 Los Vaqueros Circle
Los Alamitos, CA  90720
USA
Voice: 714.821.8380
Fax: 714.821.9975
tse@computer.org

********************
Editor Comments

Dear Authors:

We have received four substantive reviews for your paper. The reviewers
describe many valid concerns with the technical content that need to be
remedied. They also make some very useful suggestions for improvement of
the paper and clarification of its technical contribution. Therefore, we
are going to recommend that this paper undergo a major revision. Please
revise the paper carefully, responding to the reviewers' recommendations
and detailed comments.

To keep the Special Issue on schedule, please submit the revised version
of the paper by July 8, 2009, at the latest.

********************

Reviewer Comments

Reviewer: 1

Recommendation: Author Should Prepare A Minor Revision

Comments:
The paper advocates the use of multiple formal verification tools to
adequately verify a formal model against requirements. The authors
consider a formal specification written in SCR (a Personnel Access
Control System), generate fault-seeded versions of the specification,
and use five verification tools (Salsa, Cadence SMV, NuSMV, SPIN, Lurch)
to verify properties and detect
errors on these specifications.  From the results of these experiments
and the performances of the tools, they define a strategy to efficiently
combine the tools.
The idea of combining different verification tools is a very interesting
one and has not yet been explored much. The paper gives an informative
example of what a verification strategy for an SCR specification could
be; however, it remains only an example from which it is difficult to
infer a generic approach, and more work still needs to be done for that
purpose. The paper is clear and easy to read, provides
sufficient information on the experiments and includes interesting
discussions. Related work is sufficient and appropriate.
The paper could be improved on the following points.
* Your motivating examples did not really convince me and are even
sometimes counterproductive. In the first one, the problem for me is
really a translation problem, and it is indeed an important issue if you
want to advocate the use of multiple tools working on different input
languages. There is some work to be done on the semantics of the
different languages used, the similarities, the differences, and what can
be handled by which tool. You discuss it a bit in section 4, but I think
it is a central issue in your approach. In the second example, the
problem comes from the fact that the SCR specification was not correct.
And the third example is not really convincing either. The general idea I
got from this section is that because there can be mistakes at every
level (spec, translation, tool), it is better to use several different
tools to have a better chance of detecting and diagnosing the error. It
is not false, but it is an argument more often used for informal
verification techniques, and it is much less convincing than the results
you give on your experiment. So maybe this section should be improved.
* You propose a verification strategy based on the results you obtained
on a single specification. Do you think the results would be the same
for another SCR specification (with different characteristics) or
different kinds of properties? In other words, does the strategy depend
only on the capabilities and complementarity of the tools? It would
be interesting to discuss this point.
*  I agree with the general conclusion that it is interesting to combine
different verification tools, but I also think that using even one
formal verification tool can be complex for an industrial user, because
formal tools in general lack support for everything that is not purely
the verification itself (understanding of counter examples, management
of verification configurations and so on). A verification framework
including several tools and verification strategies will be even more
difficult to use for a non specialist. I think there is really a lot of
work to be done in this area.
* Details:
p6 : patrial order instead of partial order
p8 : one word missing (we showing)

How relevant is this manuscript to the readers of this periodical?
Please explain under the Public Comments section below.: Very Relevant

Is the manuscript technically sound? Please explain under the Public
Comments section below.: Yes

1. Are the title, abstract, and keywords appropriate? Please explain
under the Public Comments section below.: Yes

2. Does the manuscript contain sufficient and appropriate references?
Please explain under the Public Comments section below.: References are
sufficient and appropriate

3. Please rate the organization and  readability of this manuscript.
Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public
Comments section below.: Good

********************
Reviewer: 2

Recommendation: Author Should Prepare A Minor Revision

Comments:
Reader Interest - This paper describes the authors' experiences applying
several verification tools to an SCR requirements model. This topic is
relevant to the readers of this periodical. The paper makes several
interesting points, but also makes some claims I feel are quite
misleading.

1) The authors state that the translation from SCR to SMV provided with
the SCR toolset does not support the translation of assertions involving
NEXT states. For this reason, only 9 of the 16 assertions in the
original SCR specification are checked by (Nu)SMV.  This may be true of the
translator provided to the authors, but there is no reason why a
translator cannot be written from SCR to SMV that supports such
assertions. Both SCR and SMV are two-state models. At worst, this could
be accomplished by adding a few additional state variables, as was done
for SPIN (though it is only necessary to add state variables
for those used in assertions involving NEXT, not for every variable in
the specification).
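
To make the point concrete, here is a sketch of that extra-variable
encoding (the variable names are illustrative only, not taken from the
paper's PACS model): an SCR two-state assertion can either be stated
directly with a temporal next operator, or reduced to a one-state
invariant by shadowing only the variables referenced under NEXT:

```
-- Illustrative sketch only; not the paper's model or translator output.
-- SCR assertion: mode = locked AND entry_event => NEXT(mode) = unlocked
MODULE main
VAR
  mode        : {locked, unlocked};
  entry_event : boolean;
  prev_mode   : {locked, unlocked}; -- shadow copies, added only because an
  prev_event  : boolean;            -- assertion refers to NEXT
ASSIGN
  init(prev_mode)  := mode;         -- no "previous" state before the first step
  init(prev_event) := FALSE;
  next(prev_mode)  := mode;         -- prev_* lag the real variables by one step
  next(prev_event) := entry_event;
  -- ... transition logic for mode and entry_event elided ...
-- the two-state assertion becomes a one-state invariant over the shadows:
INVARSPEC ((prev_mode = locked & prev_event) -> mode = unlocked)
-- equivalently, (Nu)SMV can check it directly as a temporal property:
-- LTLSPEC G ((mode = locked & entry_event) -> X (mode = unlocked))
```

Only the variables actually mentioned under NEXT need shadows; the rest of
the specification is untouched, which is why this encoding need not blow
up the state space the way the provided SPIN translation does.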

This would not be a problem except that the authors repeatedly imply
that this is a shortcoming of (Nu)SMV, not the translator. For example,
on page 22 they refer to "this limitation to the effectiveness of SMV",
on page 24 they state that "in our experimental framework only SPIN can
be used to fully verify all 16 of the assertions", on page 26 they state
that "Cadence SMV and NuSMV could not check 6 of the 15 properties", on
page 28 they state  that "the only tool capable of fully verifying all
types of assertions present in an SCR specification is SPIN", on page 34
they state that "SPIN ... is the only tool capable of fully verifying
SCR specifications".  Sometimes their statements are accompanied by
qualifiers attributing the limitations to their translator, but just as
often they are not.  The overall effect is to convey the impression that
SMV and NuSMV cannot verify properties involving more than a single state.

Our experiences have been quite different - with an effective
translator, (Nu)SMV should have been able to check all 16 properties, and
for a synchronous example as small as the one presented by the authors,
in roughly the same time as that required for the single-state assertions.

In similar fashion, the translator provided to the authors from SCR to
SPIN makes two copies of every variable in the model to support
assertions using NEXT.  This is not necessary - only the variables
referenced using NEXT need to be duplicated. This greatly slows down
SPIN, and the authors frequently comment on how slow SPIN is compared to
the other verification tools. Again, it is not clear if this is due to
the translator or to SPIN.

At the very least, the authors need to carefully rewrite these
statements to make it perfectly clear that the limitations of the
translators are not being attributed to (Nu)SMV and SPIN.

2) The authors state that (to the best of their knowledge), mutant
specifications have not been used in the realm of formal verification.
 This is incorrect. Hanna Chockler wrote her dissertation on this topic
"Coverage Metrics for Temporal Logic Model Checking" and published a
paper "Coverage Metrics for Temporal Logic Model Checking", Formal
Methods in System Design, Volume 28, #3, 2006.

Mats Heimdahl and Mike Whalen also used mutations and model checkers
to check the effectiveness of varying test coverage criteria given
different, functionally equivalent models in an ICSE 2008 paper:  "The
Effect of Program and Model Structure on MC/DC Test Adequacy Coverage".

3) I am suspicious of the authors' explanation of why SPEC and INVARSPEC
are treated differently in NuSMV. The SPEC keyword in NuSMV has been
deprecated for quite a while now in favor of CTLSPEC. It is unclear from
the paper whether the authors were using NuSMV as a bounded model checker
or as an implicit-state (BDD-based) model checker. That information would
help me to understand why SPEC was being improperly handled by NuSMV.

4) In general, the authors' conclusion that several back-end tools
should be used to find translation errors is not a realistic one. Real
users are not going to have the luxury of verifying a specification
several times. Having different tools catch different errors is only
effective if different translators are used for each verification tool
and the same error is not made in each translator. Even in this paper,
they used part of the
same translation chain for SPIN and Lurch. The real lesson here isn't
for the users to use multiple verification tools - it's for the tool
developers to get their translators right and to make them generate
efficient input code for the verification tools.

5) I can't figure out how to relate the numbers in the first paragraph
of page 23 (lines 4-10) to Figure 8.

6) The authors claim on page 31, lines 29-33, that automatically
generated models tend to be less efficient and use more memory than
carefully handwritten models. This has not been our experience. We have
found that well-written translators do an excellent job of producing
efficient models for model checking, usually producing more efficient
models than we could have written by hand.

In any case, the use of hand written models is a non-starter.  Models
describing real systems are becoming so large that re-specifying them in
the language of the model checker by hand will be prohibitively
expensive. Even if a hand-written model were created, nobody would trust
it to really describe the system being developed.

Presentation - Keywords

I'm not sure that the keywords of "fault tolerance" and "software
performance" are all that helpful here. Fault tolerance is usually
associated with techniques to detect or mitigate faults in hardware -
here the authors are using diverse tools to detect errors some of the
tools miss. "Fault detection" might be more appropriate. Software
performance is usually associated with the techniques to improve the
performance of software - here the authors are comparing the performance
of several verification tools. "Tool Performance" might be more appropriate.

Presentation - References

The authors need to cite the work of Hanna Chockler, Mats Heimdahl, and
Mike Whalen mentioned above.

The authors do not cite the work sponsored by the NASA Langley Research
Center on model checking
(http://shemesh.larc.nasa.gov/fm/fm-flightdeck.html) conducted by both
Honeywell and Rockwell Collins. The work by Rockwell Collins and the
University of Minnesota on translating RSML, Simulink, and SCADE models
to NuSMV and Prover are particularly relevant to their research.

Presentation - Readability

The English in the paper is excellent and it is generally very well
written. It does tend to get rather verbose, particularly in the
conclusions.  It would be easier to read if it were made shorter and
tighter, but that's a fairly significant rewrite.

Evaluation

This paper is one of the more difficult papers I've had to evaluate. It
is well written and is on a highly relevant topic. However, the quality
of the translators used (which the authors may not have had a choice
about) greatly weakens the contribution of the paper. If good translators
had been available, this paper could have presented some useful
comparisons of SMV, NuSMV, SPIN, and Salsa. However, the results are so
dominated by the quality of the translators that it is hard to conclude
anything useful about the individual verification tools.

The authors' conclusion that diverse verification strategies are
necessary is one that I agree with, but for different problem domains,
not to compensate for tool deficiencies. The real lesson here is the
importance of designing tools (particularly translators) that are
correct and generate efficient specifications for the verification
tools. If real users have to go to the time and expense of using
multiple verification tools to compensate for translator errors and
shortcomings, they're simply going to revert back to testing.

The authors do make a valid point that it may be more efficient to use
fast tools first to weed out the errors those tools can find, but
only if this speeds up the use of subsequent tools to find other classes
of errors. If not, you might as well just use the slower but more
complete tool in the first place. Overall, the paper is acceptable, but
not of the level I would expect in an IEEE TSE paper.

If the paper is accepted, the authors should attempt to rewrite all
sections that make statements about (Nu)SMV and SPIN to make perfectly
clear that many of the results are due to the translators used, not the
verification tools themselves.

I would also recommend shortening the conclusions sections.





How relevant is this manuscript to the readers of this periodical?
Please explain under the Public Comments section below.: Very Relevant

Is the manuscript technically sound? Please explain under the Public
Comments section below.: Partially

1. Are the title, abstract, and keywords appropriate? Please explain
under the Public Comments section below.: No

2. Does the manuscript contain sufficient and appropriate references?
Please explain under the Public Comments section below.: Important
references are missing; more references are needed

3. Please rate the organization and  readability of this manuscript.
Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public
Comments section below.: Fair

********************
Reviewer: 3

Recommendation: Author Should Prepare A Minor Revision

Comments:
Summary:

The paper looks at the issue of using various verification and testing
tools on the same example to determine how such tools can achieve the
best synergy. Specifically the authors use a SCR model of an access
controller and analyze it with an invariant checker, two related
symbolic model checkers, an explicit state model checker and lastly a
random testing tool. The ultimate conclusion of the work is that the
combination of tools works better than any one individual tool. The
authors present a methodology for using the tools in sequence to allow
the optimal results in terms of time; memory was not really an issue in
any of the experiments presented.

Verdict:

I support acceptance with some revisions outlined below in the
Evaluation section.

For and Against

+ Well written, easy to read and technically sound.
+ Interesting results for others in the field

- Scope of the results is limited
- Title does not reflect this limited scope
- Need more details on the tool performance

Evaluation:

This is a nicely written paper that is easy to read and follow. The main
drawback, as I see it, is that the results have very limited scope: they
only pertain to analyzing SCR models using the translation tools
available for Salsa, (Nu)SMV, SPIN, and Lurch. If one wants to be even
more specific, they really pertain to only one example as well. The
authors must change the title to reflect the true scope of this work:
adding “…for SCR” to the end of the title would be the minimal change
required.

What is really done here is to compare the different translation tools
to the back-end analyzers. In most previous studies that compared tools,
the focus is on the analyzers, but here the focus is almost exclusively
on the translators.  The reason why the tools are not actually being
analyzed is that unlike in other studies the example is “small” enough
that all the tools can analyze it; although SPIN could not do that with
its vanilla settings and had to be helped along a bit. What I would like
to see added to the paper is a more in-depth discussion of the
differences in behavior of the tools.

For example, the SMVs could not find errors that SPIN could. What was
the reason for this? Was it because the SPIN errors were spurious due to
not considering NATURE constraints, was it because the properties
required NEXT and thus were not suitable for SMV, or was there a
fundamental issue with SMV (for example, maybe it needed some fine-tuning,
as was done for SPIN, to find more errors)? Why did Lurch miss some
errors that SPIN found? Exactly what made these hard to find for the
random search? This is a question many people are interested in! How many
of the SPIN errors were spurious due to not looking at NATURE
constraints, etc., and what made them harder to find?

Some more specific issues (I will give page and line numbers in the
P5L34(-39) format)

1.  P4L17 Why is it that one would want random search to confirm the
results of model checking? Seems impossible when there are no errors and
unnecessary when there is an error.
2.  P8L38 disconfirmed should be discarded
3.  P8L36-42 There is a whole body of research devoted to vacuity
checking that addresses the shortcoming of not having any feedback when
a model checker claims a formula holds. Please make reference to this
work; it is mostly for symbolic model checking though.
4.  P8L47 “we showing that a violation reported” typo should be “..show..”
5.  P16L13-31 For Salsa, what are generic disjointness and coverage
properties? I think the former of the two is mentioned somewhere but not
the latter. Please state it here too so that this section becomes more
self-contained. Also, give a reason and an example of why only
two states are given as a counter-example. Later on, it is alluded to
that it is hard to determine whether a failure is spurious or not; why?
6.  P17L10 Typo, extra “)” at the end of the paragraph.
7.  P17L17 Artefact should be artifact
8.  P17L19-20 Are you sure such an experiment has not been performed in
verification? I am pretty sure lots of mutation-related experiments have
been done in verification, but to be honest I can only think of a Paul
Black paper at ASE around 2000 that used mutation, but not exactly in
this fashion.
9.  P18L30 Typo, user’ name should be user’s name
10. P20L22-33 This whole paragraph should be rewritten to make it
clearer. I think I know what you are trying to say, but it is not overly
obvious. It is also not obvious that getting the same coverage after
mutation is even relevant; isn’t that a function of the test suite more
than the mutated program?
11. P20L38 Typo, In SCR Toolset should be In the SCR toolset
12. P21L20-41 What are the assertions and what are the generic
properties used, and where do they come from? Also, why is it necessary
to subdivide into these categories, since it seems this plays little or
no role in the rest of the paper? It is stated that 16 assertions are
checked in the rest of the paper, so no generic properties?
13. P22L3-21 In Fig 8, the one error on the right that Lurch never
found: was that an actual error, and why did Lurch not find it? The same
really goes for the one found only once: why was that, and what made it
so hard to find?
14. P26L48 This is the only time the number of assertions is referred to
as 15 and not 16?
15. P30L15-19 How many of the errors found by SPIN/Lurch were due to not
having the NATURE constraints, i.e. are false positives?
16. P34L24-26 What were these errors that were so hard to find for SPIN,
and why were they hard to find?


How relevant is this manuscript to the readers of this periodical?
Please explain under the Public Comments section below.: Relevant

Is the manuscript technically sound? Please explain under the Public
Comments section below.: Yes

1. Are the title, abstract, and keywords appropriate? Please explain
under the Public Comments section below.: No

2. Does the manuscript contain sufficient and appropriate references?
Please explain under the Public Comments section below.: References are
sufficient and appropriate

3. Please rate the organization and  readability of this manuscript.
Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public
Comments section below.: Good

********************
Reviewer: 4

Recommendation: Reject

Comments:
The authors present work where they attempt to verify a small model
expressed in the SCR modeling notation using several diverse
verification tools. From their experiences, they attempt to provide an
approach where the diverse tools are used together to complement each
other, an attempt to take advantage of the strengths while eliminating
weaknesses. They also provide some performance data. Unfortunately, the
experiences in this paper do not support the conclusions, and the
approach proposed really has nothing to do with the verification
techniques used.

First, the study performed exposed no problems with the verification
tools. It exposed several faults with either the translation or the
assumptions made about the environment or semantics of the language being
translated. Thus, the title and general direction of the paper are quite
misleading. What they are arguing for in this work is really n-version
programming in the translation portion of the verification tool chain.
They could have gotten exactly the same results with several translators
targeting the same verification tool. The paper does not discuss this
aspect of the problem, and there is no reference to the n-version
programming studies by John Knight and Nancy Leveson back in the 80s.

Second, the performance data is misleading at best. The translation of a
model to the input language of a model checker is hugely important for
the performance of the analysis. As an example, in our own work we have
seen an order-of-magnitude performance difference between two different
translations from one language to the same model checker. Thus,
performing a performance evaluation of the model checkers based on
arbitrary translations makes little sense. In addition, the
verification can be highly sensitive to how the properties are
structured, as well as to the settings on the model checker: for example,
performing a backward versus a forward search in a symbolic model
checker, or reusing a previous variable ordering for the BDDs (either by
reading it from a file or by verifying many properties in the same run).
In addition, the analysis is performed on only one example, providing no
statistical significance. In sum, the performance results are not telling
us much.

These two weaknesses do, in my opinion, make the paper unpalatable for TSE.

In addition, there are quite a few editorial and clarification issues
that should be resolved.

Page 2: You mention that something is to be reduced “as much as
possible”; that is an ambitious goal… I think the goal is typically to
reduce something to an acceptable level.

Middle of page 3: You mention that complementary performance implies
that a problem will be easy for at least one tool. I see no evidence of
this anywhere. Scalability and undecidability are the bane of
verification, and, I assert, a problem too large for one tool is very
likely to be too large for all the other tools also. There are problems
that do not scale in one tool but scale well in another; however, I do
not think that is the norm. This paper did nothing to change that view.

The related work section is highly unfocused and is difficult to follow.
I suggest you reorganize around the various verification technologies
(explicit state, symbolic, SMT based, etc.).  You also have some
inappropriate statements in this section that you do not back up. For
example, you claim SMV is the best implementation of a symbolic checker,
I am pretty sure there are several groups with a different opinion
(Cadence being one of them).


How relevant is this manuscript to the readers of this periodical?
Please explain under the Public Comments section below.: Relevant

Is the manuscript technically sound? Please explain under the Public
Comments section below.: No

1. Are the title, abstract, and keywords appropriate? Please explain
under the Public Comments section below.: No

2. Does the manuscript contain sufficient and appropriate references?
Please explain under the Public Comments section below.: References are
sufficient and appropriate

3. Please rate the organization and  readability of this manuscript.
Please explain under the Public Comments section below.: Readable - but
requires some effort to understand

Please rate the manuscript. Explain your rating under the Public
Comments section below.: Poor




>>> Tim Menzies <tim@menzies.us> 06/26/09 9:55 PM >>>
dear david,
can you send along the reviews on the tse article? my email has EAtimm?

------------------------------------------------------------------------