% Keep existing writing about cocomin and add this.
Since COCOMIN is a WRAPPER, an approach that has been shown to perform well in feature selection because it evaluates feature subsets with the target learner (REF TO HALL&HOLMES), we expect COCOMIN to be both effective and fast. We had several ideas for improving the effectiveness of our strawman WRAPPER without making it overly complicated, so we ran an experiment to discover which methods are useful to a baseline feature selector for COCOMO models.

EXPERIMENT

The experiment was designed to illustrate the differences between many variations of the COCOMIN wrapper. The data used was the publicly available COCOMO-I and NASA datasets, including their subsets. Each variation was assessed with leave-one-out cross validation, the special case of n-fold cross validation in which n equals the number of records (REF WITTEN & FRANK HERE): each record is used in turn as the test instance while all the other records are used for training. This method was chosen because 1) COCOMIN was fast enough to afford this exhaustive validation, and 2) no random sampling is involved, so the results are easily reproducible.

One major quality of a wrapper is its search method. If there are n features to consider, there are 2^n potential feature subsets, which is generally too many to check, so heuristics are used to prune the space of possibilities. Since our goal was speed, we chose a greedy search technique that makes O(n) feature subset evaluations; other search methods were not explored because we wanted COCOMIN to make only a linear number of feature set evaluations. Typically, the feature space is searched greedily via either forward-select or backward-elimination (REF TO WITTEN & FRANK, KOHAVI). A forward-select search starts with an empty feature set and grows it one feature at a time: if a feature improves performance it is kept, otherwise it is discarded. A backward-elimination search starts with all of the features and tries removing them one at a time. In our experiment we tried forward-select, backward-elimination, and a method that runs both and returns the better-performing feature set.

We must also consider the criterion the WRAPPER uses to evaluate its target learner. We tried the following evaluation measures in this experiment: MMRE (the mean magnitude of relative error), Pred30 (the fraction of estimates whose MRE is at most 30%), SD(MRE) (the standard deviation of the MREs), and correlation.

It is possible that the order in which the features are tested matters: a candidate feature may be less likely to be valuable if highly correlated features are already in the feature set. We tried ordering the features by: native ordering, correlation (high to low and low to high), standard deviation (high to low and low to high), and entropy (high to low and low to high).

If we can order the features so that the most valuable ones come first, then we might not need to try them all: if the search keeps testing features without finding useful ones, we can short-circuit out of it. We tried this idea with a value called the Horizon. For example, with a Horizon of 2 in a forward-select search, we accept two consecutive features being tested but not kept; on the third such feature the search stops. If a useful feature is found, this countdown is reset. We tried Horizons of 0, 1, 2, 4, 8, and 16. Since our datasets have only 15 effort multipliers under consideration, a Horizon of 16 means we always try them all, and because the countdown is reset whenever a useful feature is found, Horizons of 4 and 8 behave very similarly to 16.
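To make the search concrete, the following is a minimal sketch of one such wrapper variation: greedy backward-elimination, scored by MMRE under leave-one-out cross validation, with the Horizon short-circuit. The helper names (mmre, backward_eliminate), the learner(train, features) calling convention, and the representation of each record as a dictionary with an "actual" effort field are illustrative assumptions, not the actual COCOMIN implementation.

\begin{verbatim}
# Sketch of one COCOMIN-style variation: greedy backward-elimination,
# scored by MMRE under leave-one-out cross validation, with a Horizon
# short-circuit.  Helper names and data layout are assumptions.

def mmre(learner, records, features):
    """Mean magnitude of relative error under leave-one-out validation."""
    errors = []
    for i, test in enumerate(records):
        train = records[:i] + records[i + 1:]   # hold one record out
        model = learner(train, features)        # fit the target learner on the rest
        predicted = model(test)
        errors.append(abs(test["actual"] - predicted) / test["actual"])
    return sum(errors) / len(errors)

def backward_eliminate(learner, records, features, horizon=None):
    """Greedy backward-elimination making O(n) feature-subset evaluations."""
    selected = list(features)                   # start with all features
    best = mmre(learner, records, selected)
    rejected_in_a_row = 0
    for f in list(features):                    # features in their native order
        candidate = [x for x in selected if x != f]
        if not candidate:                       # never evaluate an empty feature set
            continue
        score = mmre(learner, records, candidate)
        if score < best:                        # removing f improved MMRE: keep the removal
            selected, best = candidate, score
            rejected_in_a_row = 0               # a useful step resets the countdown
        else:
            rejected_in_a_row += 1
            if horizon is not None and rejected_in_a_row > horizon:
                break                           # Horizon exceeded: short-circuit the search
    return selected, best
\end{verbatim}

A forward-select variant starts from the empty feature set and adds features under the same accept/reject rule; the Horizon countdown works the same way in either direction.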
RESULTS

We define a COCOMIN variation as a combination of its search method (forward-select, backward-elimination, or both), its evaluation measure (MMRE, Pred30, SD(MRE), or correlation), its feature ordering (native, correlation, standard deviation, or entropy), and its Horizon (0, 1, 2, 4, 8, or 16). We chose as our baseline configuration for COCOMIN both search methods, MMRE for evaluation, native feature ordering, and a Horizon of 16. Each of the following graphs of the experiment's output uses this baseline with exactly one dimension varied, such as the evaluation criterion. In each graph the X axis is the 19 datasets used in the experiment and the Y axis is one of three evaluation measures: MMRE, Pred30, or SD(MRE).

SEARCH - MMRE
Forward-select does much worse on some datasets and only slightly better on others. Both follows backward-elimination very closely.

SEARCH - PRED30
There is no clear trend; all three search methods perform roughly equivalently.

SEARCH - SD(MRE)
All methods are equivalent, except that forward-select has a bad spike on one dataset. Both follows backward-elimination very closely.

EVALUATION - MMRE
All methods except evaluation by standard deviation performed similarly well.

EVALUATION - PRED30
All methods perform well on some of the data. Note that evaluation by standard deviation did very well on some of the data, and that evaluation by Pred30 did not optimize the Pred30 results.

EVALUATION - SD(MRE)
All methods performed similarly well.

ORDERING - MMRE
All methods performed similarly well.

ORDERING - PRED30
All methods performed similarly well.

ORDERING - SD(MRE)
All methods performed similarly well.

HORIZON - MMRE
All methods performed similarly well.

HORIZON - PRED30
All methods performed similarly well.

HORIZON - SD(MRE)
All methods performed similarly well.

CONCLUSION

Analysis of the search methods showed that running both forward-select and backward-elimination and choosing the better-performing feature set usually picked backward-elimination, and forward-select performed worse on some of the datasets; therefore, we will use backward-elimination in future implementations of COCOMIN. Investigation of the evaluation methods showed that MMRE performed as well as the other measures, and because it is the simplest to calculate we will use evaluation by MMRE in COCOMIN. Analysis of feature ordering showed that the native ordering of the data was as good as the other orderings, so we will not reorder features in COCOMIN. It is interesting that the low Horizon values performed as well as they did; however, since we have decided to use the native ordering of the features in our search, we will not use a Horizon to stop the search early. Without a ranked ordering of the features, each feature should get a chance to be considered for removal during the backward-elimination search. This experiment is another example of simple methods doing as well as more complicated ones.
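To tie the conclusion back to the sketch shown earlier, the selected settings (backward-elimination, MMRE evaluation, native feature ordering, no Horizon) can be captured in a small convenience wrapper; the name cocomin and its signature are illustrative assumptions, not the actual implementation.

\begin{verbatim}
# Hypothetical convenience wrapper for the configuration chosen by this
# experiment: backward-elimination, MMRE evaluation, native feature
# ordering, and no Horizon (every feature is considered for removal).
def cocomin(learner, records, features):
    """Run the COCOMIN search with the settings selected above."""
    return backward_eliminate(learner, records, features, horizon=None)
\end{verbatim}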