Project 2b

For 500-level students only.

Note: Apologies for getting this out late. I'll make it due Friday of Week 7. If you moan loudly enough in Week 7, I'll make it due Monday of Week 8 without any late penalties.

Theory

Consider a model with N inputs (e.g. details about a case study, some environmental factors, or some control distribution for a random number generator). Assume each input comes from some space of possibilities range(N_i).

Some inputs are numeric, in which case their range min..max is divided into "b" bins.
Some are discrete. E.g. we can model booleans as range(N_i)=(t,nil).

Notice that we can handle both discrete and numeric values the same way: as a set of bins, with only one difference, which we call "ordinalp".

Numeric bins are "ordinalp=t" and some pairs of bins are closer than others (e.g. 10 is close to 9, 1000 is far away from 9).
Discrete bins are "ordinalp=nil".
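A minimal sketch of treating both kinds of input as bins, with "ordinalp" deciding whether the distance between two bins means anything. The plist layout and the names `numeric-input`, `discrete-input`, and `bin-distance` are assumptions for illustration, not part of the project spec:

```lisp
;; Hypothetical representation (an assumption, not prescribed by the project):
;; an input is a plist (:name N :ordinalp T/NIL :bins #(...)), where each bin
;; is (label . count). An ordinal bin's label is its (lo . hi) range; a
;; discrete bin's label is the value itself.
(defun numeric-input (name lo hi b)
  "Split LO..HI into B equal-width ordinal bins, all with count 0."
  (let ((width (/ (- hi lo) b)))
    (list :name name
          :ordinalp t
          :bins (coerce (loop for i below b
                              collect (cons (cons (+ lo (* i width))
                                                  (+ lo (* (1+ i) width)))
                                            0))
                        'vector))))

(defun discrete-input (name values)
  "One unordered bin per possible value, all with count 0."
  (list :name name
        :ordinalp nil
        :bins (map 'vector (lambda (v) (cons v 0)) values)))

(defun bin-distance (input i j)
  "Ordinal bins are nearer when their indices are nearer; discrete bins are
either the same (0) or different (1)."
  (cond ((getf input :ordinalp) (abs (- i j)))
        ((= i j) 0)
        (t 1)))
```

For example, `(numeric-input 'age 20 60 4)` gives four bins labelled (20 . 30) through (50 . 60), while `(discrete-input 'smoker '(t nil))` gives two bins with no notion of order.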

Note that we can initialize the probability distribution of N_i just by drawing its bins using "~":

(defconstant ~ 0)
(defconstant ~~ 1)
(defconstant ~~~ 2)
(defconstant ~~~~ 3)
(defconstant ~~~~~ 4)
(defconstant ~~~~~~ 5)
(defconstant ~~~~~~~ 6)
(defconstant ~~~~~~~~ 7)
(defconstant ~~~~~~~~~ 8)
(defconstant ~~~~~~~~~~ 9)
(defconstant ~~~~~~~~~~~ 10)
(defconstant ~~~~~~~~~~~~ 11)
(defconstant ~~~~~~~~~~~~~ 12)
(defconstant ~~~~~~~~~~~~~~ 13)
(defconstant ~~~~~~~~~~~~~~~ 14)
(defconstant ~~~~~~~~~~~~~~~~ 15)
(defconstant ~~~~~~~~~~~~~~~~~ 16)
(defconstant ~~~~~~~~~~~~~~~~~~ 17)
(defconstant ~~~~~~~~~~~~~~~~~~~ 18)
(defconstant ~~~~~~~~~~~~~~~~~~~~ 19)

(define age 
      20 ~
         ~~~
         ~~~~~~
         ~~~~~~~~~~
         ~~~~~~~~~~
         ~~~~~~
         ~~~
      60 ~
      
)

(This will need a little support code, and writing it is up to you; see Task4.)
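A minimal sketch of that support code, assuming (these are guesses, not requirements): the first number in the form is the minimum, the second is the maximum, every tilde symbol is one equal-width bin in order, and a bin's weight is one less than its number of tildes (the same pattern as the table: ~ = 0, ~~ = 1). Because `define` quotes its arguments, the tildes are never evaluated, so the weight is read straight off the symbol's name. `*dists*` is the global named in Task4; `store-dist` and `tilde-weight` are hypothetical helpers:

```lisp
;; Hypothetical sketch of the "~" notation. Assumes SPEC holds exactly two
;; numbers (min, then max) and that every other element is a tilde symbol.
(defvar *dists* (make-hash-table) "Every defined input, keyed by name.")

(defun tilde-weight (sym)
  "~ -> 0, ~~ -> 1, ~~~ -> 2, ...: one less than the number of tildes."
  (1- (length (symbol-name sym))))

(defun store-dist (name spec)
  "Turn SPEC into equal-width ordinal bins of (( lo . hi) . weight) and store them."
  (let* ((numbers (remove-if-not #'numberp spec))
         (weights (mapcar #'tilde-weight (remove-if-not #'symbolp spec)))
         (lo (first numbers))
         (hi (second numbers))
         (width (/ (- hi lo) (length weights))))
    (setf (gethash name *dists*)
          (list :name name
                :ordinalp t
                :bins (coerce (loop for w in weights
                                    for i from 0
                                    collect (cons (cons (+ lo (* i width))
                                                        (+ lo (* (1+ i) width)))
                                                  w))
                              'vector)))))

(defmacro define (name &rest spec)
  "(define age 20 ~ ~~~ ... 60 ~) stores AGE in *DISTS*; nothing is evaluated."
  `(store-dist ',name ',spec))
```

Under these assumptions, the `age` example above becomes eight 5-year bins from 20 to 60 with weights 0, 2, 5, 9, 9, 5, 2, 0.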

Anyway, note that the model input is now a vector "V" with one slot for each bin of each input.

At any time, some oracle may have demanded that we use only a subset U ⊆ V of this vector.

Completion

The model requires one input for each N_i variable. We call the process of finding these inputs the "completion" of "U".

If an input N_i is not in "U", we pick its value at random by sampling the distribution "V - U".
If an input N_i is in "U", we pick its value by sampling the distribution "V".
If more than one value for N_i appears in "U", then we are free to pick any one.
If the range is ordinal, then each bin is defined by its own "min..max" range, in which case we select a number at random within that range.
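A minimal sketch of completion, under assumptions that are mine rather than the spec's: each input is a plist (:name :ordinalp :bins) whose bins are (label . count) pairs, an ordinal label is a (lo . hi) range, and "U" is an alist from an input's name to the bin indices the oracle allows. When "U" allows several bins, this picks among them by weight, which is one of the "any one" choices the rules permit. `complete`, `weighted-pick`, and `bin-value` are hypothetical names:

```lisp
;; Hypothetical completion sketch (representation assumed, see lead-in).
(defun weighted-pick (bins indices)
  "Pick one of INDICES with probability proportional to its bin's count;
fall back to a uniform pick if every count is zero."
  (let ((total (reduce #'+ indices :key (lambda (i) (cdr (aref bins i))))))
    (if (zerop total)
        (nth (random (length indices)) indices)
        (let ((goal (random total)))            ; goal in [0, total)
          (dolist (i indices)
            (decf goal (cdr (aref bins i)))
            (when (minusp goal) (return i)))))))

(defun bin-value (bin ordinalp)
  "A discrete bin yields its label; an ordinal bin yields a random number in [lo, hi)."
  (let ((label (car bin)))
    (cond ((not ordinalp) label)
          ((< (car label) (cdr label))
           (+ (car label) (random (float (- (cdr label) (car label))))))
          (t (car label)))))

(defun complete (inputs u)
  "One value per input: from the bins U allows if the input is in U,
otherwise from all of its bins."
  (mapcar (lambda (input)
            (let* ((bins (getf input :bins))
                   (allowed (cdr (assoc (getf input :name) u)))
                   (indices (or allowed
                                (loop for j below (length bins) collect j)))
                   (i (weighted-pick bins indices)))
              (bin-value (aref bins i) (getf input :ordinalp))))
          inputs))
```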

Using the Bins

Conceptually, there are multiple copies of the bins:

"PD": The raw bins counting how often new observations fall into a bin. Defines the probability distribution for the data:
```
bin1 : 10
bin2 : 200
bin3  : 10
bin4  : 2
```
"CD": The raw cumulative frequencies. Defines the cumulative probability distribution for the data:
```
bin1 : 10
bin2 : 210
bin3 : 220
bin4 : 222
```
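The CD is just a running sum over the PD. A minimal sketch, assuming the PD is held as a plain list of bin counts in bin order (`cumulative` and `cd->probabilities` are hypothetical names):

```lisp
;; Hypothetical helpers: the PD is assumed to be a list of bin counts in bin order.
(defun cumulative (counts)
  "Running totals of COUNTS, i.e. the CD for that PD."
  (let ((sum 0))
    (mapcar (lambda (c) (incf sum c)) counts)))

(defun cd->probabilities (cd)
  "Divide every running total by the grand total (the last entry), so the CD ends at 1."
  (let ((total (car (last cd))))
    (mapcar (lambda (x) (/ x total)) cd)))

;; (cumulative '(10 200 10 2))                       => (10 210 220 222)
;; (cd->probabilities (cumulative '(10 200 10 2)))  => (5/111 35/37 110/111 1)
```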

"Sorted PD": The sorted raw bins

```
bin2 : 200
bin1 : 10
bin3 : 10
bin4 : 2
```

"Sorted CD": The raw sorted cumulative bins:

```
bin2 : 200
bin1 : 210
bin3 : 220
bin4 : 222
```
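Both sorted views can be built from the PD in one pass each. A minimal sketch, assuming the PD is an alist of (bin . count); `sorted-pd` and `sorted-cd` are hypothetical names. A stable sort keeps tied bins (bin1 and bin3 above) in their original order:

```lisp
;; Hypothetical sketch: PD is assumed to be an alist of (bin . count).
(defun sorted-pd (pd)
  "A copy of PD sorted by count, largest first; ties keep their original order."
  (stable-sort (copy-list pd) #'> :key #'cdr))

(defun sorted-cd (pd)
  "Running totals over the sorted PD, as (bin . running-total) pairs."
  (let ((sum 0))
    (mapcar (lambda (pair) (cons (car pair) (incf sum (cdr pair))))
            (sorted-pd pd))))

;; (sorted-cd '((bin1 . 10) (bin2 . 200) (bin3 . 10) (bin4 . 2)))
;; returns ((bin2 . 200) (bin1 . 210) (bin3 . 220) (bin4 . 222))
```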

Note that as soon as we enter a new observation into "PD", the other distributions become "stale", and we can't reuse them until we re-sort and recalculate them.
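One way to honor the "stale" rule is a dirty flag: every update marks the derived views stale, and they are rebuilt lazily the next time someone asks for them. A minimal sketch, assuming equal-width bins stored as an array, so the bin index comes from the (x - min)/((max - min)/b) formula in the Updates notes. `hist`, `new-hist`, `bin-index`, `add-observation`, and `hist-sorted-cd` are hypothetical names:

```lisp
;; Hypothetical sketch: PD is a vector of counts; the sorted CD is cached
;; and rebuilt only when an update has made it stale.
(defstruct hist
  lo hi
  pd                 ; vector of counts, one per bin
  (cache nil)        ; sorted CD: list of (bin-index . running-total)
  (stale t))         ; T once PD has changed since CACHE was built

(defun new-hist (lo hi b)
  (make-hist :lo lo :hi hi :pd (make-array b :initial-element 0)))

(defun bin-index (h x)
  "floor((x - min) / ((max - min) / b)), clamped so that x = max lands in the last bin."
  (let ((b (length (hist-pd h))))
    (min (1- b)
         (max 0 (floor (- x (hist-lo h))
                       (/ (- (hist-hi h) (hist-lo h)) b))))))

(defun add-observation (h x)
  "Increment X's bin and mark the derived distributions stale."
  (incf (aref (hist-pd h) (bin-index h x)))
  (setf (hist-stale h) t)
  h)

(defun hist-sorted-cd (h)
  "Return the sorted CD, re-sorting only if PD changed since the last call."
  (when (hist-stale h)
    (let* ((pd (hist-pd h))
           (order (stable-sort (loop for i below (length pd) collect i)
                               #'> :key (lambda (i) (aref pd i))))
           (sum 0))
      (setf (hist-cache h)
            (mapcar (lambda (i) (cons i (incf sum (aref pd i)))) order)
            (hist-stale h) nil)))
  (hist-cache h))
```

Repeated calls to `hist-sorted-cd` with no updates in between return the cached list without sorting again.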

These bins are used for different purposes:

Updates:
- "Raw" (i.e. "PD") is used for fast data updates. By doing a binary chop over the sorted keys of "PD", you can quickly find the bin to update. Or, if "PD" is stored as an array, you can compute the bin index directly as floor((x - min)/((max - min)/b)) and increment straight there.
Nudging:
- "CD" is used for "nudging". If you like the current value of "X" but want to perturb it a little, a lot, or a heck of a lot, pick a random point on the "CD" within plus or minus 10%, 20%, or 35% (respectively) of the current position on the "CD".
Sampling:
- "Sorted PD" is used to generate "Sorted CD".
- "Sorted CD" is used to select random values according to "PD":
  - Pick a goal at random between 0 and the total (the last "Sorted CD" entry). Set a counter to that total. Walk down "Sorted CD" from the largest bin to the smallest. At every step, decrement the counter by the delta between consecutive "Sorted CD" entries (i.e. that bin's count). As soon as counter ≤ goal, return a number from that bin.
  Note: the sampling selection process will occur in the innermost core of the simulator, so it is worth profiling the code and optimizing it. For notes on profiling, see "watch" in debug.lisp. For example, never re-sort unless you absolutely have to.
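The Sampling walk and the Nudging rule can be sketched as follows. Assumptions (mine, not the spec's): a sorted CD is a list of (bin . running-total) pairs with the largest bin first, a CD is a list of running totals in bin order, and "plus or minus 10/20/35%" is read as a fraction of the CD's total. `sample-bin` and `nudge-bin` are hypothetical names, and both assume the total count is positive:

```lisp
;; Hypothetical sketch. Assumed representations (not fixed by the project):
;;  - a sorted CD is a list of (bin . running-total), largest bin first;
;;  - a CD is a list of running totals in bin order, indexed from 0.
(defun sample-bin (sorted-cd)
  "Walk SORTED-CD as described under Sampling. Assumes the total is positive."
  (let* ((total (cdr (car (last sorted-cd))))
         (goal (random total))          ; goal in [0, total)
         (counter total)
         (prev 0))
    (dolist (entry sorted-cd)
      ;; the delta between consecutive running totals is this bin's count
      (decf counter (- (cdr entry) prev))
      (setf prev (cdr entry))
      (when (<= counter goal)
        (return (car entry))))))

(defun nudge-bin (cd current amount)
  "Return a bin index whose CD position lies within 10%, 20% or 35% of the
total (AMOUNT = :little, :lot or :heck) of bin CURRENT's position."
  (let* ((total (car (last cd)))
         (radius (* total (ecase amount (:little 0.10) (:lot 0.20) (:heck 0.35))))
         (here (nth current cd))
         (lo (max 0 (- here radius)))
         (hi (min total (+ here radius)))
         (goal (+ lo (random (float (- hi lo))))))
    ;; inverse CDF: the first bin whose running total exceeds the goal
    (or (position-if (lambda (c) (> c goal)) cd)
        (1- (length cd)))))

;; (sample-bin '((bin2 . 200) (bin1 . 210) (bin3 . 220) (bin4 . 222)))
;;   returns BIN2 about 90% of the time
;; (nudge-bin '(10 210 220 222) 1 :little)   returns 1, 2 or 3
```

Because the largest bins come first in "Sorted CD", most calls to `sample-bin` return after one or two steps, which is the point of sorting.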

Tasks

Task1: Implement POM1
As described in the paper. See if you can generate the three figures for low, medium, and high dynamism.
Task2: Monkey's Banana
Here's a little throw-away AI task, just to make sure you don't feel AI-starved.
Using DFS, BFS, and DFID, adapt the tree search algorithm I gave you in class to the monkey/banana problem shown in class (and yes, you have to use the (op ...) syntax).
If you answer this question properly, then your search engine should be very general, and the only place we see domain details is in (op).
Important: the 400-level students are also solving this problem, but their solution is due 3 weeks after yours. Please ensure that they do not see your code.
Testing Task2: I will run demo-task2 and expect to see a trace of the monkey getting the banana. Then I will edit the "-op" list, run it again, and expect different behavior.
Task3: Implement POM2
Same deliverables as above, but now generate 32 graphs: one for each combination of {low, high} across the five factors {dynamism, size, culture, criticality, personnel} (2^5 = 32).
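The 32 comes from two levels (low, high) for each of five factors: 2^5 = 32. A small hypothetical helper, `settings`, that enumerates those combinations as one plist per graph:

```lisp
;; Hypothetical helper: every assignment of :LOW or :HIGH to each factor.
(defun settings (factors)
  "Return 2^n plists such as (DYNAMISM :LOW SIZE :HIGH ...)."
  (if (null factors)
      (list nil)
      (loop for rest in (settings (cdr factors))
            append (list (list* (car factors) :low rest)
                         (list* (car factors) :high rest)))))

;; (length (settings '(dynamism size culture criticality personnel)))  => 32
```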
Task4: Implement "define"
Implement the define function shown above. Make it store the generated dists in a global *dists*.
Testing Task4: I will run various defines and then inspect *dists*.
Task5: Updates
Implement the update functionality described above. Hint: see any.lisp.
Testing Task5: I will run your demo-task5 (which you write), and I expect to see before-and-after printouts of values pulled from *dists*.
Task6: Nudging
Implement the nudging functionality described above.
Testing Task6: I will run your demo-task6 (which you write), and I expect to see values pulled from *dists* printed before and after nudging.
Task7: Sampling
Implement the sampling functionality described above.
Testing Task7: I will run your demo-task7 (which you write), and I expect to see values printed from a sample of *dists* vars that conform to the distributions created by your define calls.
Task8: Completion
Implement the completion functionality described above.
Testing Task8: I will run your demo-task8 (which you write), and I expect to see a printed vector containing some undecided values. Then, after completion, there should be another vector in which every undecided slot has been filled in with an atom.

What's Next?

That's all for this project. But if you want to get started on the next one:

Augment your representation of a project such that some distributions are fixed (can't be controlled).
Implement a simulated annealing search for the project options that most increase value/cost. Use nudging to alter some percentage of the non-fixed project options in the current solution.
As above, but use a beam search. Assume that no ranges have been selected; run random simulations to assign scores to randomly selected inputs (and by random, we mean complete an empty vector), then rank them by their distance to the max score.