COCOMIN was designed with two goals in mind. The first was to provide a simple and fast wrapper for COCOMO feature selection that does not sacrifice performance. The second was to serve as a 'strawman' approach to feature selection. The COCOMIN algorithm itself has already been described; the purpose of the experiment reported here is to determine which variation of COCOMIN to use. Note that this feature subset selection was applied only to the cost drivers; the KLOC size measure was always used.

A wrapper uses the target learner as its evaluation criterion, in this case local calibration. However, there are multiple ways to measure a learner's performance. We tried the following evaluation measures in this experiment: MMRE, PRED(30), sd(MRE), and correlation.

Another important quality of a wrapper is its search method. If there are n features to consider, then there are 2^n potential feature subsets, which is generally considered too many to check exhaustively, so heuristics are used to prune the space of possibilities. Since our goal was speed, we decided to use a greedy search technique that makes O(n) feature-subset evaluations. Typically there are two ways to initiate such a search: forward selection and backward elimination. In a forward-selection search you start with an empty feature set and grow it one feature at a time; if a feature improves performance it is kept, otherwise it is discarded. In a backward-elimination search you start with all of the features and try removing them one at a time. In our experiment we tried forward selection, backward elimination, and a method that tries both and returns the best-performing feature set.

The order in which the features are tested may also be very important: a candidate feature may be less likely to add value if the feature set already contains features that are highly correlated with it. We tried ordering the features by: native ordering, correlation (highest first and lowest first), standard deviation (highest first and lowest first), and entropy (highest first and lowest first).

* Show the graph of COCOMIN versus local calibration and write about it.
* Describe the experiment, including the use of n-fold cross-validation to reduce variance in the experimental method; by removing the random element it also makes the results easier to reproduce.
* Add a link to the code.
* Explain the graphs; mention that the x-axis covers the coc81 dataset, the nasa93 dataset, and the stratifications of those datasets.
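
As a concrete illustration of the four evaluation measures, the following Python sketch computes them from paired lists of actual and predicted efforts. This is only a sketch for exposition, not the code used in the experiment; the function names and the 30% level for PRED are assumptions based on the standard definitions of these measures.

    import statistics

    def mres(actuals, predicteds):
        # Magnitude of relative error for each project: |actual - predicted| / actual
        return [abs(a - p) / a for a, p in zip(actuals, predicteds)]

    def mmre(actuals, predicteds):
        # Mean magnitude of relative error (lower is better).
        return statistics.mean(mres(actuals, predicteds))

    def pred(actuals, predicteds, level=0.30):
        # PRED(30): fraction of projects whose MRE is at most 30% (higher is better).
        errs = mres(actuals, predicteds)
        return sum(1 for e in errs if e <= level) / len(errs)

    def sd_mre(actuals, predicteds):
        # Standard deviation of the MREs (lower is better).
        return statistics.stdev(mres(actuals, predicteds))

    def correlation(actuals, predicteds):
        # Pearson correlation between actual and predicted effort (higher is better).
        n = len(actuals)
        ma, mp = statistics.mean(actuals), statistics.mean(predicteds)
        cov = sum((a - ma) * (p - mp) for a, p in zip(actuals, predicteds)) / (n - 1)
        return cov / (statistics.stdev(actuals) * statistics.stdev(predicteds))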
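
The two greedy search directions, and the variant that tries both, can be sketched as follows. Here evaluate and ordering are placeholder parameters: evaluate scores a set of cost drivers (higher is better, so a measure such as MMRE would be negated) and ordering is one of the feature orderings described above. This is an illustrative sketch of the general technique, not the COCOMIN implementation itself.

    def forward_select(features, evaluate, ordering):
        # Start with an empty set and grow it one feature at a time,
        # keeping a feature only if it improves the score: O(n) evaluations.
        # `selected` holds cost drivers only; KLOC is assumed to always be
        # part of the model inside evaluate().
        selected = []
        best = evaluate(selected)
        for f in ordering(features):
            score = evaluate(selected + [f])
            if score > best:
                selected.append(f)
                best = score
        return selected, best

    def backward_eliminate(features, evaluate, ordering):
        # Start with all features and try removing them one at a time,
        # dropping a feature whenever the score does not get worse.
        selected = list(ordering(features))
        best = evaluate(selected)
        for f in list(selected):
            trial = [x for x in selected if x != f]
            score = evaluate(trial)
            if score >= best:
                selected = trial
                best = score
        return selected, best

    def best_of_both(features, evaluate, ordering):
        # Run both searches and return the better-scoring feature set.
        return max(forward_select(features, evaluate, ordering),
                   backward_eliminate(features, evaluate, ordering),
                   key=lambda pair: pair[1])

With this interface, the "try both" variant is simply the better-scoring of the two searches, which preserves the O(n) bound up to a constant factor of two.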
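
Finally, the feature orderings could be realized along the lines below. The dataset layout (a mapping from feature name to its column of values) and the histogram-based entropy estimate are assumptions made only for this sketch; COCOMIN's discrete cost-driver ratings could instead be counted directly.

    import math
    import statistics

    def order_features(features, score, highest_first=True):
        # Sort feature names by a per-feature score, e.g. |correlation with effort|,
        # standard deviation, or entropy; "native" ordering is just the list as given.
        return sorted(features, key=score, reverse=highest_first)

    def sd_score(dataset):
        # Example per-feature score: standard deviation of that feature's column,
        # where `dataset` maps a feature name to its list of values (an assumption).
        return lambda f: statistics.stdev(dataset[f])

    def entropy(values, bins=10):
        # Histogram-based Shannon entropy of one feature's column.
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        n = len(values)
        return -sum(c / n * math.log2(c / n) for c in counts if c)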