The included datasets:

Two sets from WHICH2, both log10'ed and then equal-width discretized
into 2 bins (a discretization sketch appears after the references):
  one with bins:   1-defective-bins.txt, 1-flawless-bins.txt
  one with ranges: 2-defective-ranges.txt, 2-flawless-ranges.txt

Then 5 different single-attribute rules tested over every set, all
predicting for class = defective:
  four are high-value bin: def-H-CBO.txt, def-H-DIT.txt,
                           def-H-LOC.txt, def-H-WMC.txt
  one is lower bin:        def-L-NOC.txt

Then the same 5 rules predicting for class = flawless, to see how
they work for the class opposite to the one they were meant to
predict:
  four are high-value bin: fl-H-CBO.txt, fl-H-DIT.txt,
                           fl-H-LOC.txt, fl-H-WMC.txt
  one is lower bin:        fl-L-NOC.txt

Personally, I don't think that the majority of the rules generated or
proposed in the literature work very well over the PROMISE data sets.
In nearly every case, a high PD comes with an equally high PF. Some of
the WHICH2 rules test slightly better than the literature rules, while
others do not; there does not seem to be a consistently better set of
rules.

The rules pulled from the literature test just as poorly when the
class they predict for is reversed, since reversing the class simply
swaps PD and PF (see the pd/pf sketch after the references).

It is worth noting that several of the so-called "interesting"
attributes did show up in our WHICH2 rules fairly often, such as NOC,
RFC, and CA. Others, such as CBO, DIT, and LCOM, appear less
frequently but are present.

I think most of the problem is the relative rarity of the defective
class in most datasets. While it is possible to find rules which
predict the defective instances very well, there are inevitably just
as many flawless instances which fit the same rules.

WHICH2 scoring:

    (* 2 (scores-pd s) (- 1 (scores-pf s))
       (/ (+ (scores-pd s) (- 1 (scores-pf s)) (randf 0.0000001))))

i.e. 2 * pd * (1 - pf) / (pd + (1 - pf) + eps): the unary (/ ...)
takes the reciprocal of the sum, and the tiny randf term keeps the
denominator from ever being zero. A runnable sketch appears after the
references.

References:

V. Basili, L. Briand, and W. Melo. A validation of object-oriented
design metrics as quality indicators. IEEE Transactions on Software
Engineering, 22(10):751–761, 1996.
~ high WMC, DIT, and CBO, plus low NOC -> defects

L. Briand et al. Investigating quality factors in object-oriented
designs: an industrial case study. 1999.
~ finds that coupling and cohesion measures (CBO, LCOM, IC, CBM, CA)
  can predict software defects

K. El Emam, W. Melo, and J. Machado. The prediction of faulty classes
using object-oriented design metrics. Journal of Systems and Software,
56(1):63–75, Feb. 2001.
~ finds that the DIT metric is directly associated with increased
  fault-proneness

H. Sahraoui, R. Godin, and T. Miceli. Can metrics help to bridge the
gap between the improvement of OO design quality and its automation?
In Proceedings of the International Conference on Software
Maintenance, pages 154–162, 2000.
~ find that coupling and inheritance metrics predict defects, e.g.
  CBO values greater than 14

M. Thapaliyal and G. Verma. Software defects and object-oriented
metrics: an empirical analysis. International Journal of Computer
Applications, 9(5):41–44, 2010.
~ high WMC -> defects

J. Xu, D. Ho, and L. F. Capretz. An empirical validation of
object-oriented design metrics for fault prediction. Journal of
Computer Science, 4(7):571–577, July 2008.
~ conclude that among the OO metrics, WMC, CBO, and RFC are the most
  significant
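Discretization sketch (referenced above). A minimal Common Lisp
sketch of the log10 + equal-width two-bin step; this is my own
illustration, not the original WHICH2 preprocessing code, and it
assumes all raw metric values are positive:

    ;; Log10 each value, split the log range at its midpoint, and
    ;; return one bin index per value (1 = lower bin, 2 = higher bin).
    (defun discretize-2bins (values)
      (let* ((logs (mapcar (lambda (v) (log v 10)) values))
             (lo   (reduce #'min logs))
             (hi   (reduce #'max logs))
             (mid  (/ (+ lo hi) 2)))
        (mapcar (lambda (x) (if (<= x mid) 1 2)) logs)))

    ;; Example: (discretize-2bins '(1 10 100 1000)) => (1 1 2 2)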
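pd/pf sketch (referenced above). My own illustration, not project
code, of pd and pf computed from the four confusion-matrix counts; it
shows why re-aiming a rule at the opposite class simply swaps the two
measures:

    (defun pd (tp fn) (/ tp (+ tp fn)))  ; hits / all target-class instances
    (defun pf (fp tn) (/ fp (+ fp tn)))  ; false alarms / all other-class instances

    ;; Say a rule fires on 40 of 50 defective and 400 of 500 flawless
    ;; modules.
    ;; Predicting defective: (pd 40 10)   => 4/5, (pf 400 100) => 4/5
    ;; Predicting flawless:  (pd 400 100) => 4/5, (pf 40 10)   => 4/5
    ;; The instances a rule catches become its false alarms once the
    ;; target class is reversed, so pd' = old pf and pf' = old pd --
    ;; and here a high pd comes bundled with an equally high pf.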
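Scoring sketch (referenced above). A runnable version of the WHICH2
scoring expression; the scores struct and the randf helper are my
guesses at the originals, standing in for whatever the real code
defines:

    ;; Stand-in for the real scores object: just pd and pf slots.
    (defstruct scores pd pf)

    ;; Stand-in for randf: a tiny random value in [0, n) that breaks
    ;; ties and keeps the denominator nonzero.
    (defun randf (n) (random n))

    ;; 2 * pd * (1 - pf) / (pd + (1 - pf) + eps): rewards rules that
    ;; hit defects (high pd) without false alarms (low pf).
    (defun which2-score (s)
      (* 2 (scores-pd s) (- 1 (scores-pf s))
         (/ (+ (scores-pd s) (- 1 (scores-pf s)) (randf 0.0000001)))))

    ;; Example: (which2-score (make-scores :pd 0.8 :pf 0.2)) => ~0.8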