Subject: IEEE Software, SWSI-0089-0605 major revision required
From: software@computer.org
Date: Thu, 14 Jul 2005 14:42:12 -0400 (EDT)
To: timm@cs.pdx.edu

IEEE Software, SWSI-0089-0605
Manuscript type: Special issue on Best Papers from the 2005 PROMISE Workshop
"Software Cost Models: When Less is More"

Dear Dr. Menzies,

The review process for the above-referenced manuscript is complete. After carefully examining the manuscript and reviews, the editor in chief has decided that the manuscript needs major revisions before it can be considered for a second review. We encourage you to revise your manuscript and resubmit it.

If you are planning to send a revision, please do so by August 8. We realize the time is short for a major revision, but because of the nature of special issues we must have your revision by then in order to allow time for the second round of reviews and still meet the production deadline. Please maintain our 5,400-word limit as you make your revisions.

The reviewer comments are attached below. If necessary, please refer to the instructions on how to upload the revision included below. If you have any questions regarding our policies or procedures, please refer to the IEEE Software Author Center at http://www.computer.org/mc/software/author.htm.

We look forward to receiving your revised manuscript.

Regards,
Hilda Hosillos
Magazine Assistant
IEEE Software
software@computer.org

========================================
Author Instructions

You should upload your revision and summary of changes to http://cs-ieee.manuscriptcentral.com by doing the following:

1) Log on to your Author Center.
2) Scroll down to the section called 'Manuscripts to be Revised'.
3) Click on the View Comments/Respond button of the paper with your original log number (SWSI-0089-0605). The '.R1' appended to the log number indicates that it is a revision.
4) Scroll down to the very bottom of the screen to the free-text boxes called 'Response to Editor:' and 'Responses to Reviewer:'. Paste in your responses to the editor and reviewers here. If you prefer to upload a file containing your responses, please indicate this in the free-text box.
5) Click on the 'Save Response' button.
6) Click on the title link to upload the new revision. Click on OK to continue.
7) This will take you to the File Upload screen (Screen 10 of 12). You may click on the 'Previous' button if you need to make changes to the title, names of contributing authors, abstract, etc.
8) After you have uploaded the file(s) on Screen 11, proceed to Screen 12.
9) Click on the 'Submit Your Manuscript' button.

IMPORTANT - If you do not receive a system-generated message confirming the successful upload of your revision, then the upload was not complete. Contact the magazine assistant (software@computer.org) immediately for assistance.

========================================
Editor comments

Two reviewers recommend a major revision, largely because the paper is not clearly written. The topic is very relevant, but the message is cryptic. The paper needs a major revision to ensure that IEEE Software readers can benefit from the article.

========================================
Reviews

Reviewer 1

Section I. Overview

A. Reader Interest
1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Confidential Comments section.
( ) Very Relevant (X) Relevant ( ) Interesting - but not very relevant ( ) Irrelevant

B. Content
1. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical.

The biggest problem with this paper is that the data set sample sizes are much too small. As a result, this skews the results presented in Figure 1. A review of the t03 data set illustrates the problem. Assuming 66% of the data is allocated to training and 33% to testing, then 7 tuples are used to build the model and 3 are used to test the model. Figure 1 states that 28% and 62% of the results (I assume this refers only to the test data) are within pred(30) for the "before" and "after" respectively. For 3 test tuples, the 28% maps to roughly 1 out of 3 and the 62% maps to 2 out of 3. Although it is a gain of 221% (relative), it only corresponds to a gain of 1 project (absolute). Furthermore, the "after" column uses results from the best reduction level. A more realistic approach would show "before" against a specific level (e.g. FS02).

It is interesting to note that if c01..c03, p02..p03, and t02..t03 are consolidated using a weighted average, the results from Figure 1 look as follows:

  # projects   "After/Before" column
  24           341%   (t02..t03)
  44           500%   (p02..p03)
  56           221%   (c01..c03)
  60           120%
  63           116%
  119          106%
  161          101%

Excluding the p0X results, the impact of feature reduction diminishes as the sample size grows. This reviewer recommends that this paper be accepted provided that c01..c03, p02..p03, and t02..t03 are consolidated and results are presented in both relative and absolute terms. If "before" is 10% and "after" is 15%, then the relative improvement is 50% ((15% - 10%)/10%) and the absolute improvement is 5% (15% - 10%). (A worked sketch of this calculation appears after this review.) I believe this will address many of the concerns mentioned below.

2. Is the manuscript technically sound? Please explain your answer in the Confidential Comments section.
( ) Yes ( ) Appears to be - but didn't check completely (X) Partially ( ) No

3. What do you see as this manuscript's contribution to the literature in this field?

One of the biggest challenges in Empirical Software Engineering is building models with little or no data. This paper demonstrates that good models can be built using less data.

4. What do you see as the strongest aspect of this manuscript?

1) Data mining in implicitly/explicitly data-starved Software Engineering domains is a challenge. The paper presents an interesting technique for addressing data starvation through feature reduction. It would be amusing if a future version of this paper ends up discarding COCOMO X in favor of some SLOC count.
2) The paper is easy to read. Ideas flow nicely.
3) The bottom paragraph on page 2 (how this paper extends previous work) is a very nice touch. It sets a nice context for this paper.

5. What do you see as the weakest aspect of this manuscript?

1) The paper argues when less is "NOT" more on page 3. If the paper is assuming a "data mining" applied to "Software Engineering" context, then reason 1 does not make much sense. If no data is available, then there is no form of algorithmic and/or machine-learning modeling.
2) Figure 1 shows impressive results, especially with data sets 5 through 12. However, the sample sizes in data sets 5 through 10 are quite small. What about merging c01..c03, p02..p04, and t02..t03? This would yield c0X = 56, p0X = 48, and t0X = 24.
3) Figure 1 divides results into 2 subtables. What was the reasoning for this partition? Is it based on sample size (larger samples in the top subtable)? Or is it based on public versus private data?
4) Also, the aggregate results in Figure 1 (the row labeled "mean") use a simple average. Since there is a relatively broad range of sample sizes, this reviewer recommends using a weighted average.
5) On page 2 the authors claim, "If experience can tell us when to add variables, it should also be able to tell us when to subtract variables." It is assumed that "experience" refers to human-based experience. If so, then is it really necessary to build statistical/ML-based models?
6) Figure 2 is a bit confusing. It does not seem to add much value to the paper. This reviewer suggests removing the figure. It seems that the authors are proposing how to incorporate variable reduction into the data mining process.
7) In section 3 ("Why Subtract Variables"), the authors provide business reasons why it is important to subtract variables. In the second bullet, the authors provide an example of assessing competitive bids. There are two concerns regarding this example: A) It is an obvious example in which Time and Money will be the most prominent driving forces. B) This data mining type of problem is a lot easier than the case study presented in the paper. The reason for this claim is that the second bullet is an assessment type of problem as opposed to a prediction type of problem.
8) In section 3 ("Why Subtract Variables"), the authors argue for subtracting variables because of "Irrelevancy." Essentially, this is a Type I / Type II issue (only throw away the irrelevant, but keep the significant). Jumping ahead to Figure 5, there is a blur between relevant and irrelevant. That is, the paper argues the irrelevancy issue but does not deliver consistent results in Figure 5.
9) In section 3 ("Why Subtract Variables"), the authors argue for subtracting variables because of "Under-sampling." A) For the paper "in preparation," the authors might want to consider the question, "How many features are enough?" That is, how does sample size drive feature set size? B) The argument made on page 6, first paragraph, assumes an equal distribution of the variables, which is not normally the case. Thus, the 88% and 3.5% results are "worst case scenarios." C) Figure 3 suggests a technique which combines feature reduction (the theme of this paper) with instance reduction (reducing cplx in Figure 3 from 5 instances to 3 instances).
10) On page 8, equation 1 claims that EMi has 15 effort multipliers (which is true for COCOMO I). Since some of your data is COCOMO II data, you might want to mention that the latter version has 17 effort multipliers.
11) Equation 3 raises an interesting question about feature reduction. Since many of the terms contain "Size" (in this case loc), won't it be less likely that "Size" would be removed?
12) The paper argues for using the Wrapper technique for producing better effort estimation predictions. However, if I am a project manager, how far do I reduce? Figure 5 does not shed any light on this, since "best results" are achieved anywhere from FS01 (t02) all the way through FS07 (p04). The paper does not plot the lines (in Figure 5) all the way to FS07 for all the lines. Thus, as a project manager, I do not know the consequences of extending the reduction out to FS07. Relating this observation back to reason number 2 on page 3 (the "less is NOT more" section), a user will be unable to trust this approach since he/she will be unable to tell when to stop reducing.
13) It is noted that the plots (bottom graph of Figure 5) are not monotonic. This inconsistency raises questions about how far to reduce.
Perhaps the authors may wish to include a confidence factor. That is, a reduction to FS01 will improve effort prediction X percent of the time.
14) The paper talks about performing 30 hold-out experiments. What was the distribution of training to test samples? Page 11 of the paper implies a 2/3, 1/3 ratio. Is this correct? If so, then for data sets 5 through 10 there are 10 or fewer samples in the training set and 5 or fewer samples in the test set. This argues for the consolidation of data sets.
15) Also, if an experiment is run 30 times per data set per feature reduction, then why not run a t-test on data set X for feature levels N and N+1? It would then be possible to claim that level N+1 produces statistically superior results to level N (for data set X). (A sketch of such a test follows this review.)
16) In section 5, Related Work, the authors refer to Kirsopp & Shepperd as "K&S" more than once. This is rather informal and probably not suitable for a journal article.
17) The authors converted the answers using natural logs. Were the answers converted back prior to measuring with pred(30)? If not, all the results are greatly distorted. (A sketch illustrating the back-conversion also follows this review.)

C. Presentation
1. Are the title, abstract, and keywords appropriate? Please elaborate in the Confidential Comments section.
(X) Yes ( ) No
2. Does the manuscript contain title, abstract, and/or keywords?
(X) Yes ( ) No
3. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Confidential Comments section.
(X) References are sufficient and appropriate ( ) Important references are missing; more references are needed ( ) Number of references are excessive
4. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Confidential Comments section.
( ) Yes (X) Could be improved ( ) No
5. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please elaborate in the Confidential Comments section.
(X) Satisfactory ( ) Could be improved ( ) Poor
6. Is the manuscript focused? Please elaborate in the Confidential Comments section.
(X) Satisfactory ( ) Could be improved ( ) Poor
7. Is the length of the manuscript appropriate for the topic? Please elaborate in the Confidential Comments section.
(X) Satisfactory ( ) Could be improved ( ) Poor
8. Please rate and comment on the readability of this manuscript in the Confidential Comments section.
(X) Easy to read ( ) Readable - but requires some effort to understand ( ) Difficult to read and understand ( ) Unreadable

Section II. Summary and Recommendation

A. Evaluation
Please rate the manuscript. Explain your choice in the Confidential Comments section.
( ) Award Quality ( ) Excellent (X) Good ( ) Fair ( ) Poor

B. Recommendation
Please make your recommendation and explain your decision in the Detailed Comments section.
( ) Accept with no changes (X) Accept if certain minor revisions are made ( ) Author should prepare a major revision ( ) Reject

Section III. Detailed Comments

A. Public Comments (these will be made available to the author)
Title: "Software Cost Models: When Less is More"
Authors: Chen, Menzies, Port, and Boehm
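To make the arithmetic in Reviewer 1's summary above (Section I.B.1) concrete, the following minimal Python sketch shows relative versus absolute improvement, the "After/Before" ratio used in the consolidation table, and a size-weighted mean for merged data sets. Apart from the reviewer's own examples (10% -> 15%, and t03's 28% -> 62% pred(30) on 3 test tuples), every number and name below is a placeholder, not a value taken from the paper.

# Sketch of Reviewer 1's recommendation: report gains in both relative and
# absolute terms, and aggregate pred(30) scores with a weighted (not simple) mean.

def relative_gain(before, after):
    """Relative improvement, e.g. (15 - 10) / 10 = 50%."""
    return 100.0 * (after - before) / before

def absolute_gain(before, after):
    """Absolute improvement in pred(30) percentage points, e.g. 15 - 10 = 5."""
    return after - before

def after_before_ratio(before, after):
    """The "After/Before" ratio used in the table above, e.g. 62 / 28 ~= 221%."""
    return 100.0 * after / before

def weighted_mean(scores, sizes):
    """Aggregate per-data-set pred(30) scores, weighting each by its project count."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)

print(relative_gain(10, 15), absolute_gain(10, 15))       # 50.0 and 5
print(after_before_ratio(28, 62), absolute_gain(28, 62))  # ~221 and 34 points (about 1 of 3 test tuples)
print(weighted_mean([62.0, 50.0, 40.0], [24, 44, 56]))    # hypothetical consolidated score

The same weighted_mean call is what the reviewer's point 4 above asks for in place of the simple average in Figure 1's "mean" row.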
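Points 13-15 of the weaknesses listed above (the confidence factor, the 30 hold-out runs with a 2/3-1/3 split, and the t-test between adjacent feature-reduction levels) can also be sketched. The code below is illustrative only: the scoring functions for each feature-set level are stand-ins rather than the authors' experimental setup, and it assumes SciPy for the paired t-test.

# Hypothetical sketch: repeat a 2/3-1/3 hold-out split 30 times, score pred(30)
# at each feature-reduction level on the same splits, then compare adjacent
# levels with a paired t-test and a "confidence factor" (fraction of runs won).

import random
from scipy import stats

def holdout_split(projects, train_frac=2/3, rng=random):
    rows = projects[:]
    rng.shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]                      # training set, test set

def run_experiments(projects, scorers, runs=30):
    """scorers maps a level name (e.g. 'FS01') to a function
    score(train, test) -> pred(30).  Each run uses one split scored at every
    level, so the resulting score lists are paired across levels."""
    results = {name: [] for name in scorers}
    for _ in range(runs):
        train, test = holdout_split(projects)
        for name, score in scorers.items():
            results[name].append(score(train, test))
    return results

def compare_levels(scores_n, scores_n_plus_1):
    t, p = stats.ttest_rel(scores_n_plus_1, scores_n)  # paired t-test over the 30 runs
    wins = sum(b > a for a, b in zip(scores_n, scores_n_plus_1))
    return t, p, 100.0 * wins / len(scores_n)          # e.g. "FS01 wins 80% of the time"

A small p-value would support the claim in point 15 that level N+1 is statistically superior to level N on that data set; the win percentage is one way to state the confidence factor suggested in point 13.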
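Point 17 of the weaknesses above is easy to check numerically. The sketch below uses invented effort values; it only illustrates why predictions made in natural-log space must be converted back with exp() before computing pred(30), since "within 30%" in log space is a far looser test.

# Hypothetical illustration of the log back-conversion issue in point 17.
import math

def pred30(actuals, estimates):
    """Percentage of estimates within 30% of the actual effort."""
    hits = sum(abs(e - a) <= 0.3 * a for a, e in zip(actuals, estimates))
    return 100.0 * hits / len(actuals)

actual_effort = [100.0, 400.0, 1200.0]                        # person-months (invented)
log_estimates = [math.log(x) for x in (170.0, 430.0, 700.0)]  # model output in ln space (invented)

distorted = pred30([math.log(a) for a in actual_effort], log_estimates)  # scored in log space
honest    = pred30(actual_effort, [math.exp(e) for e in log_estimates])  # converted back first

print(distorted, honest)   # 100.0 versus 33.3... on these made-up numbers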
Reviewer 2

Section I. Overview

A. Reader Interest
1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Confidential Comments section.
(X) Very Relevant ( ) Relevant ( ) Interesting - but not very relevant ( ) Irrelevant

B. Content
1. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical.

An example of using a software tool to eliminate variables in order to improve the predictive capabilities of the data set.

2. Is the manuscript technically sound? Please explain your answer in the Confidential Comments section.
( ) Yes (X) Appears to be - but didn't check completely ( ) Partially ( ) No

3. What do you see as this manuscript's contribution to the literature in this field?

Rather than continually collecting more data, it says you can sometimes stop and eliminate unnecessary data.

4. What do you see as the strongest aspect of this manuscript?

The use of an algorithm to evaluate the effectiveness of data collection.

5. What do you see as the weakest aspect of this manuscript?

Section 4, on the case study. It is too dense, and too much is missing. I have no idea how to interpret Figure 5, which is the main result. I am not sure what the variable listed in Figure 4 means. In order to fit within the IEEE Software length guidelines, too much of the explanation was removed from the paper. The supporting paper provided with the manuscript is not relevant: if I need that paper to understand the current paper, then there is no need for the current paper. Section 4 needs to be rewritten and made more understandable as to what is going on.

C. Presentation
1. Are the title, abstract, and keywords appropriate? Please elaborate in the Confidential Comments section.
(X) Yes ( ) No
2. Does the manuscript contain title, abstract, and/or keywords?
(X) Yes ( ) No
3. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Confidential Comments section.
(X) References are sufficient and appropriate ( ) Important references are missing; more references are needed ( ) Number of references are excessive
4. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Confidential Comments section.
(X) Yes ( ) Could be improved ( ) No
5. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please elaborate in the Confidential Comments section.
(X) Satisfactory ( ) Could be improved ( ) Poor
6. Is the manuscript focused? Please elaborate in the Confidential Comments section.
(X) Satisfactory ( ) Could be improved ( ) Poor
7. Is the length of the manuscript appropriate for the topic? Please elaborate in the Confidential Comments section.
( ) Satisfactory (X) Could be improved ( ) Poor
8. Please rate and comment on the readability of this manuscript in the Confidential Comments section.
( ) Easy to read ( ) Readable - but requires some effort to understand (X) Difficult to read and understand ( ) Unreadable

Section II. Summary and Recommendation

A. Evaluation
Please rate the manuscript. Explain your choice in the Confidential Comments section.
( ) Award Quality ( ) Excellent (X) Good ( ) Fair ( ) Poor

B. Recommendation
Please make your recommendation and explain your decision in the Detailed Comments section.
( ) Accept with no changes ( ) Accept if certain minor revisions are made (X) Author should prepare a major revision ( ) Reject

Section III. Detailed Comments

A. Public Comments (these will be made available to the author)

Section 4 - the central theme of the paper - is unreadable. The topic is important, but this version is not readable to the general IEEE Software reader.

Reviewer 3

Section I. Overview

A. Reader Interest
1. How relevant is this manuscript to the readers of this periodical? Please explain your rating in the Confidential Comments section.
( ) Very Relevant ( ) Relevant (X) Interesting - but not very relevant ( ) Irrelevant

B. Content
1. Please summarize what you view as the key point(s) of the manuscript and the importance of the content to the readers of this periodical.

When using the general(ized) cost estimation model COCOMO, the number of its parameters could be reduced to a smaller set of significant parameters by learning from historical data for sets of projects.

2. Is the manuscript technically sound? Please explain your answer in the Confidential Comments section.
( ) Yes (X) Appears to be - but didn't check completely ( ) Partially ( ) No

3. What do you see as this manuscript's contribution to the literature in this field?

It suggests that for COCOMO to make accurate estimates, a reduced number of parameters might be better than using the entire set, given that there is historical data for machine learners to determine which parameters can be eliminated.

4. What do you see as the strongest aspect of this manuscript?

The idea that, under given circumstances, using fewer parameters for COCOMO not only leads to the same results but can also improve the accuracy of its estimates.

5. What do you see as the weakest aspect of this manuscript?

It does not target the audience of the "Software" magazine.

C. Presentation
1. Are the title, abstract, and keywords appropriate? Please elaborate in the Confidential Comments section.
( ) Yes (X) No
2. Does the manuscript contain title, abstract, and/or keywords?
(X) Yes ( ) No
3. Does the manuscript contain sufficient and appropriate references? Please elaborate in the Confidential Comments section.
(X) References are sufficient and appropriate ( ) Important references are missing; more references are needed ( ) Number of references are excessive
4. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer in the Confidential Comments section.
(X) Yes ( ) Could be improved ( ) No
5. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please elaborate in the Confidential Comments section.
( ) Satisfactory (X) Could be improved ( ) Poor
6. Is the manuscript focused? Please elaborate in the Confidential Comments section.
( ) Satisfactory (X) Could be improved ( ) Poor
7. Is the length of the manuscript appropriate for the topic? Please elaborate in the Confidential Comments section.
( ) Satisfactory (X) Could be improved ( ) Poor
8. Please rate and comment on the readability of this manuscript in the Confidential Comments section.
( ) Easy to read ( ) Readable - but requires some effort to understand (X) Difficult to read and understand ( ) Unreadable

Section II. Summary and Recommendation

A. Evaluation
Please rate the manuscript. Explain your choice in the Confidential Comments section.
( ) Award Quality ( ) Excellent ( ) Good (X) Fair ( ) Poor

B. Recommendation
Please make your recommendation and explain your decision in the Detailed Comments section.
( ) Accept with no changes ( ) Accept if certain minor revisions are made (X) Author should prepare a major revision ( ) Reject

Section III. Detailed Comments

A. Public Comments (these will be made available to the author)

Section I.C.1: The title implies that more than one cost estimation model was used in this experiment. The paper refers only to COCOMO. Reconcile this issue.

The paper has potential, but right now it reads as if it was written for data mining specialists. It should be rewritten for the readers of the "Software" magazine. That means that fewer details should be given about the machine learners and more about why this work is important for a user of the COCOMO model. How would a software project manager use COCOMO any differently due to your proposed approach, and what does that buy him/her? What does he/she need to do or have, e.g., what type of historical data about projects? Do the projects have to be related to the one at hand? How closely related? Try to present this work from the point of view of the project manager - more of a black-box description than a white-box one: how one would use the tool, rather than the internals of the tool itself (a sketch of such a black-box view appears after these comments).

In addition, to improve readability:
- correct typing errors
- place figures right after their first reference in the text
- distinguish between figures and tables
- if tables and figures help understanding, then use them, but explain them better
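As one way of meeting the request above for a black-box, project-manager view, the revision could include something like the following sketch. The COCOMO-81-style form effort = a * KLOC^b * product(EM_i) is standard, but the calibration constants, the particular drivers that survive feature reduction, and the example ratings are invented here for illustration; they are not taken from the paper.

# Hypothetical "black box" usage: the manager supplies only the cost-driver
# ratings that survived feature reduction on the organization's historical
# data and treats every discarded driver as nominal (multiplier 1.0).
from math import prod

A, B = 2.8, 1.05                          # placeholder calibration constants

def estimate_effort(kloc, multipliers):
    """Effort in person-months from size and the retained effort multipliers."""
    return A * kloc ** B * prod(multipliers.values())

retained = {"cplx": 1.15, "acap": 0.86, "time": 1.11}   # invented surviving drivers and ratings
print(round(estimate_effort(50, retained), 1))          # estimate for a hypothetical 50 KLOC project

Presented this way, the reader sees what historical data would be needed (past projects with effort, size, and driver ratings), what the tool produces (a short list of retained drivers plus calibration constants), and how little the day-to-day estimation step changes.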