OVERVIEW
========

This is CLAM, a prototype-based nearest neighbor learner (with
feature selection). Rather than learn from all features and
instances, CLAM prunes the dull feature ranges, then creates a small
number of prototypes (the most informative instances) using only the
remaining ranges.

In the following, anything in "quotes" is an option that can be set
on the command line.

INPUTS
======

CLAM accepts inputs in csv format. The first line must start with a
# symbol and lists the column names. Names of numeric columns must
start with an upper-case letter. The last column is the class column.

For example input files, see etc/data/*.csv.

USAGE
=====

Everything in CLAM is done via calls to the Makefile, which generate
new files from an input .csv file. The new files are one of:

.sym
----

The .csv file with its numeric columns replaced by a small number of
"BINS". If the class column is numeric, then it is replaced with 1 or
0, where 1 is the union of all the "BEST" class bins and 0 is the
rest.

For example, the file stem.sym is generated from stem.csv via:

    make stem.sym

.best
-----

This file lists the best ranges for each class. The file stem.best is
generated from stem.csv via:

    make stem.best

BUILT-IN EXAMPLES
=================

CLAM was built via TDD (test-driven development); i.e. by identifying
the next smallest, most useful change to the code, then implementing
that delta, then writing a small test to demonstrate that delta.

At each step, an "egN" command was added to the system. Sometimes,
the output from "egN" was cached to etc/want/N.

As a result, this distribution comes with numerous example commands
already built in, and a regression suite to check that everything is
still working.

To run one example, use

    make eg1

To run all the examples (which may be slow), use

    make egs

To update the cached output