Project 3b
Previously you built a device that accepts inputs and generates an output score (the area under the curve of the optimal frontier).
input = complete(treatment)
score = oracle(model(input))
Now it is time to search that model to find "sweet spots" that maximize the area under the curve of the final frontier.
Use any six search engines. Two of them must be KEYS and simulated annealing; the remainder are up to you (and one or more search engines can be a mix and match of others).
What to do
Live Demo
At our demo time, I will update your repository, load Emacs, load SLIME, load iccle, then run
(make 'whatYouTellMeToMake)
(demo-search1)
(demo-search2)
(demo-search3)
(demo-search4)
(demo-search5)
(demo-search6)
When I run each one, I expect line 1 of the output to be (e.g.) "I am simulated annealing". There should then be some (succinct) trace
output that includes information on the best solution found so far. Finally, the best solution found by "demo-searchI"
should be printed.
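For concreteness, here is a hedged sketch of the shape one of those demo functions might take; the helper run-my-search is an illustrative placeholder for whatever search the demo wraps, not part of the assignment:

(defun demo-search1 ()
  (format t "I am simulated annealing~%")
  (let ((best (run-my-search)))   ; the search prints its own succinct trace as it runs
    (format t "final best solution found: ~a~%" best)
    best))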
Documentation
For each search engine, add a text document to your repository that includes:
- Pseudo-code and some explanation of the search algorithm. The explanation should include reference(s) to
the original authors of the algorithm.
For examples of the appropriate level of detail, see
http://menzies.us/pdf/08realtime.pdf.
- Write a short piece of ASCII text commenting on what "policy commitments" are made by the search algorithm.
General Search Principles
All these searchers will take the following form.
There is a bag of choices (lambda, sigma, criticality, personnel, etc)
and each choice has a range of discrete values.
A search engine evaluates the value of adding some new choice=range to a treatment: some subset of choices*ranges.
Note that:
- Initially, the treatment is the empty set.
- There is no assumption that all choices appear in the treatments.
To test the value of a treatment on a model, N times, the treatment is completed to form a model input. That is,
input = complete(treatment)
An input contains one range for every choice. There are three cases:
- Case 1: If a choice is not in the treatment, then we complete that choice by picking any range of that choice at random.
- Case 2: If the treatment contains multiple ranges for a choice, then we select one of those ranges at random.
- Case 3: If the treatment contains only one range for that choice, then we just use that range.
(Note that, in the above, when we say "select at random", it is possible to bias that selection using data learned
during the search; e.g. mixing and matching the KEYS algorithm with, say, simulated annealing.)
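For example, here is a minimal sketch of "complete". It assumes one possible representation (not mandated anywhere above): a treatment is an association list mapping a choice to a list of ranges, and ranges-of is a hypothetical accessor returning every legal range of a choice.

(defun complete (treatment all-choices)
  "Return an input: one range for every choice."
  (loop for choice in all-choices collect
        (let ((picked (cdr (assoc choice treatment))))
          (cons choice
                (cond ((null picked) (any (ranges-of choice)))  ; case 1: choice not in treatment
                      ((rest picked) (any picked))              ; case 2: several ranges; pick one
                      (t (first picked)))))))                   ; case 3: exactly one range

(defun any (lst)
  "Pick one item of a list at random (the place to add any learned bias)."
  (nth (random (length lst)) lst))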
Policy Commitments
- To walk, or jump? When you add in ranges to a treatment, do you do so one at a time? Or do you
load in multiple additions? One at a time is safest but are there ever occasions when you can truck in a whole
bunch of stuff? Note that KEYS2 is just KEYS that starts out walking, but is allowed to jump "i" steps ahead
at each era "i".
- When to jump back? A greedy search can get stuck in local maxima. Does your search need to reset and start again?
- To conjunct or disjunct, that is the question. When growing treatments, adding a new choice is a conjunction:
the old choices AND this new one. But if you add more than one range for the same choice, that is a disjunction;
i.e. old choices OR choice=someOtherRange. Conjuncts drive you into a smaller part of the historical log and add
constraints to future runs.
- Stochastic or Deterministic: Stochastic can be faster and can explore more space. Deterministic can
be more thorough and not miss important solutions. Which is best?
- Sampling policy: To score a treatment, N must be large enough to sample the distributions associated with the random choices of completion. Pre-experimentally,
we have no guidance on that, so try N=100,200,400,800, then sort the outputs and collect statistics on the 50th percentile (a.k.a. the median) and the (75-50)th percentile range
(a.k.a. the spread). Try to keep N small (so runtimes are faster) but consider increasing N to reduce the spread. (A small scoring sketch appears after this list.)
- What to remember?
When searching, it is possible to arrive at the same state twice. In which case, it can be useful to
store a cache of prior completions, and the scores they achieved.
Poorly managed, this cache can consume all memory. So...
- Consider NOT using such a cache. Many methods don't use one, since the overhead of memory management
defeats any advantage this approach offers.
- Don't store the state, store a hash of the state (using LISP's "sxhash" function). Yes, this can
lead to collisions and (hence, some) wrong conclusions, but life is a risk.
- One simple strategy (sketched after this list) is
to keep the cache to a limited size and, if it is full, delete an entry at random to make room for the new one.
- If it is a commonly found state then it will be added in multiple times and the random deletion won't (usually)
kill it.
- And if it is a rare state, then who cares if we delete it from the cache?
- Early termination criteria:
As the size of the treatment grows, the median score generally plateaus; i.e. further extensions do not improve the score any further.
Therefore, your treatment growth generally needs an early termination criterion (e.g. stop when the median score has not improved by 5%).
- Back select:
After treatment growth stops, the treatment may be over-specialized. E.g. suppose the search added A=1, B=2, C=3 to the treatment (in that order), but A is highly
correlated with C, so either A or C could now be dropped. To achieve this, after treatment growth stops, you need a back-select that tries throwing
things away and scores the reduced set using the above scoring mechanism. Back-select continues until
the reduced treatment starts to score significantly worse than the non-reduced one (for a definition of "significantly
worse", see any stats textbook and the definition of "t-tests").
The simplest back-select is just a greedy search that tries deleting one thing at a time
(and stops if the next removal does not improve the median score over previous removals); a sketch of this appears after this list. More complex back-selects collect statistics while growing the treatment
to offer hints on what might be thrown away first, second, etc., during the back-select.
- Nothing trite:
Some solutions found by the search will be trite; e.g. write non-critical systems using large teams of alpha programmers. You'll
need to implement some method of push back that dodges trite solutions. The simplest way is to run the search
for specialized subsets of the input space. E.g. run it ONLY for top-half criticality.
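To make the sampling policy above concrete, here is a minimal scoring sketch. It assumes the "complete" sketch above and the model/oracle pseudo-code from the top of this page; the percentile arithmetic is deliberately crude.

(defun score (treatment all-choices &optional (n 100))
  "Run the model N times on completions of TREATMENT.
Returns two values: the median score and the spread (75th minus 50th percentile)."
  (let* ((scores (sort (loop repeat n collect
                             (oracle (model (complete treatment all-choices))))
                       #'<))
         (median (nth (floor (* n 1/2)) scores))
         (q75    (nth (floor (* n 3/4)) scores)))
    (values median (- q75 median))))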
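On "what to remember": here is a minimal sketch of a size-limited cache keyed on sxhash of the input, with random eviction when full. The size limit, and the linear scan used to pick a victim, are just illustrative choices.

(defparameter *cache* (make-hash-table))
(defparameter *cache-max* 10000)

(defun cache-put (input result)
  (when (>= (hash-table-count *cache*) *cache-max*)
    ;; full: delete one entry at random to make room
    (let ((keys (loop for k being the hash-keys of *cache* collect k)))
      (remhash (nth (random (length keys)) keys) *cache*)))
  (setf (gethash (sxhash input) *cache*) result))

(defun cache-get (input)
  "Return a previously stored result, or NIL (beware hash collisions)."
  (gethash (sxhash input) *cache*))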
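And here is a minimal sketch of the greedy back-select, reusing the "score" sketch above. For brevity it stops on any drop in the median rather than on a t-test, so treat it as a starting point only.

(defun back-select (treatment all-choices)
  "Greedily drop items while the median score does not get worse."
  (let ((best (score treatment all-choices)))
    (loop
      (let ((improved nil))
        (dolist (item treatment)
          (let* ((smaller (remove item treatment))
                 (median  (score smaller all-choices)))
            (when (>= median best)                 ; no worse without this item
              (setf treatment smaller best median improved t)
              (return))))                          ; rescan the reduced treatment
        (unless improved (return treatment))))))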
Notes on Specific Search Principles
Note that simulated annealing (and, if you use it, ISAMP),
will probably be your baseline result that everything else will beat. But you never know.
Mandatory
You must implement the following.
- Simulated Annealing
When generating a neighborhood solution
for simulated annealing, use nudging.
Simulated annealing offers a range to every choice and so may severely over-constrain the solution
(so back-select will be useful++ here). A sketch appears after this list.
- Keys
Keys is described in
XXX
When ranking ranges for KEYS, use the support*probability measure described on page 6, column 1 of
http://menzies.us/pdf/07casease.pdf (one hedged reading of that measure is sketched after this list).
Keys may require no back-select (since it works it out on the way down).
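A minimal simulated annealing sketch, reusing the complete/any/score sketches above. Here "nudging" is read as re-assigning one randomly chosen choice per step; that reading, and all the parameter values, are assumptions rather than requirements.

(defun nudge (treatment all-choices)
  "Neighbor of TREATMENT: one randomly chosen choice gets one random range."
  (let ((choice (any all-choices)))
    (cons (list choice (any (ranges-of choice)))   ; ranges-of: hypothetical accessor
          (remove choice treatment :key #'first))))

(defun sa-search (all-choices &key (kmax 1000) (t0 1.0))
  (let* ((current nil)
         (s-current (score current all-choices))
         (best current)
         (s-best s-current))
    (dotimes (k kmax best)
      (let* ((temp  (* t0 (- 1 (/ k kmax))))
             (new   (nudge current all-choices))
             (s-new (score new all-choices)))
        (when (or (> s-new s-current)              ; always accept improvements
                  (< (random 1.0)                  ; sometimes accept worse moves
                     (exp (max -50.0 (/ (- s-new s-current) (max temp 1e-6))))))
          (setf current new s-current s-new))
        (when (> s-current s-best)
          (setf best current s-best s-current)
          (format t "best so far: ~a = ~a~%" best s-best))))))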
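For the KEYS ranking, the exact measure lives in the cited paper; one common reading of support*probability (and it is only an assumed reading here) is the BORE-style score below, where b and r are a range's frequencies in the "best" and "rest" completions.

(defun rank-range (b r)
  "support*probability, read as: support = b, probability = b/(b+r)."
  (if (zerop (+ b r))
      0
      (/ (* b b) (+ b r))))

;; e.g. a range seen in 40% of the best inputs and 10% of the rest:
;; (rank-range 0.4 0.1)  => roughly 0.32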
Others
You can implement any or none of the following.
Please don't let your imagination be restricted to just this list.
- ISAMP:
Very simple to code: if a new solution stalls and never gets beyond X% of the previous "best" found, then
hit the reset and start again.
- DFID: Very simple to code and, since it does very little work at each step, it can search more steps
and out-perform the other methods.
- Beam:
A beam of width 10 seems to conquer most problems. But you never can tell till you try.
- A-STAR
Recall that A-star is a beam/best-first search that ranks new ranges by their contribution to g+h, where "g" is the distance
from the start to here and "h" is the distance to the goal. For "g", use the number of constraints added into the treatment:
- If the input has 7 choices and each choice has 7 ranges, that is 7*7 degrees of freedom.
- Adding ranges to a choice reduces the degrees of freedom for that choice. So a disjunction of
four ranges in choice3 rules out three ranges, so choice3's "g" is 3. But if we add all 7 ranges, then choice3's "g" is
zero, since we have ruled out nothing at all. So "g" can be computed by summing the number of ruled-out
ranges from each choice, then dividing by the total number of ranges (a sketch of this calculation appears after this list).
As for "h", you need some distance measure to the maximum possible value. You won't have the final frontier during
the search, so the best you can do is score each output's "h" relative to the top value seen so far.
- Limited Discrepancy Search
Here's an interesting variant of ISAMP that does not jump back right to the beginning:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.2426. Note that you will want to try their LDS as
well as their LDS-BBS.
- Tabu Search
One common meta-search method is "tabu" search, where you keep some hash of the space you've searched before (a sketch of such a memory appears after this list):
- When you reach what looks like a best solution, sample some distance around the current solution to find the
least-worst solution in the local neighborhood.
- Start searching again from this point, avoiding places
you've been to before (in the tabu set).
- Alternatively, start searching again from this point, favoring places
you've been to before (use the tabu set as an "attractor" set). Note that if the tabu set is persistent across
multiple searches, then
this method could become a little working memory of past useful solutions that (might) optimize the search for
future solutions.
- If the memory requirements of your tabu memory explode, use the tricks described above under "What to remember?".
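A minimal sketch of the "g" calculation described under A-STAR above, using the same treatment representation as the earlier sketches (ranges-of is again a hypothetical accessor for all legal ranges of a choice):

(defun g-score (treatment all-choices)
  "Fraction of all ranges ruled out by TREATMENT."
  (let ((ruled-out 0)
        (total     0))
    (dolist (choice all-choices (/ ruled-out total))
      (let ((kept (cdr (assoc choice treatment)))
            (all  (length (ranges-of choice))))
        (incf total all)
        (when kept                        ; a choice not in the treatment rules out nothing
          (incf ruled-out (- all (length kept))))))))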
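And a minimal sketch of a tabu memory for the "avoiding places you've been before" variant, keyed on sxhash as suggested under "What to remember?" (the wrapping search loop shown in the comment is only illustrative):

(defparameter *tabu* (make-hash-table))

(defun tabu-p (input)
  (gethash (sxhash input) *tabu*))

(defun remember (input)
  (setf (gethash (sxhash input) *tabu*) t))

;; e.g., inside any of the searches above, skip completions seen before:
;;   (let ((input (complete treatment all-choices)))
;;     (unless (tabu-p input)
;;       (remember input)
;;       (oracle (model input))))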
Mix and Match
Feel free to mix and match the above. E.g. use Keys as a sub-routine of simulated annealing so that "random" neighborhood
generation can be constrained to promising areas. For example, Mr. El-waras's "STAR" system is a combination of
simulated annealing and KEYS (but note that subsequent work found that KEYS could do just as well on that problem space,
without any initial simulated annealing).