At the end of this draft, we offered a reply to Reviewer 2's comment that ``the main limitation of the research to me is the use of just two data sets (coming from 1981 and 1993). Even though there are subsets created, this still does not change anything on this limitation. This is especially true for the type of conclusions drawn to demonstrate conclusion stability. I can not see the reason why further data sets are (not) used to validate the stated proposition.'' Politeness prevents us from making a more honest reply to Reviewer 2. The empirical basis of our paper is as strong as that of several prior prominent TSE papers:

1) Chulani and Boehm's 1999 paper on Bayes tuning in COCOMO used 141 records. We have 156.

2) Shepperd's 2001 TSE paper is based on 6 data sets. We have 19. Yes, there is some overlap in our 19 data sets, but as observed in Figures 3 and 4 of this paper, that overlap is not large.

3) The core empirical results of Shepperd's 2002 and 2005 TSE papers come from artificial data sets that were generated using distributions pulled from one data set per publication.

4) Our own prior TSE paper (October 2006) on this topic was accepted after a considered examination of two rounds of TSE reviews. None of those reviewers felt that we were over-generalizing our conclusions from the small data sets used in that paper.

The industrial reality is that this kind of data is as rare as hen's teeth. In an ideal world, we would have more than 156 records divided into 19 subsets. But Boehm has been trying for a decade to extend that set, with no success. Given this data poverty, it is important that we can demonstrate stable conclusions with the available data. Hence, this paper.