cs572: project 4b

Same structure as before (for each task, write output files to doc/proj4
in your svn repositories).

==============
Due dates

Seminars:        week of April 22
Code walk-thru:  week of April 29
Final report:    Tuesday, May 6th  <== hard deadline, no extensions

==============
Deliverables

1) All your code, working, so I can run it in the usual way (cd to the
   main directory, load "make.lisp", type (make)).
2) A 10-page (max) report concluding which is the best search engine for
   finding the fewest constraints that most reduce
   effort/defects/threats/months.
3) A code walk-through with me.
4) A seminar to the class in the last week of term.
5) Slides from the seminar (in doc/proj4).

==========================
About the slides

- Slides should be in PDF format.
- 30 minutes max, so 30 slides max (but consider using far fewer slides).
- Slide one: everyone's name and email.
- Main font no smaller than 15 point.
- Include in doc/proj4.
- Slides should include the statistical results of section (4), below.

=========================
About the report (syntactic details)

- The report must be submitted in PDF format.
- The report must be gray scale (NO COLOR! NONE! NONE!).
- The report must be called report_lname1_lname2_....pdf, where "lname"
  is the last name of everyone in the group.
- The report must comply with the style instructions at
  http://www.aaai.org/Publications/Author/author.php
  (see the MS Word and LaTeX macros listed at the end of the page).
- The files used to generate the report (figures, LaTeX source, MS Word)
  must be committed to your svn doc/proj4 directory.
- The report must include author names and emails on page 1, some
  attention-grabbing title, an introduction/motivation section, a related
  work section, results, discussion, future work, conclusion, and
  references. All listed references must be used in the paper; all
  references used in the paper must be in the reference section.
- The abstract should be of the form: "A" is a problem because of "B".
  Previously, "C" said it is best to use "D" to solve "A". But when we
  tried "D" we saw "E", which made us hypothesize "F". We checked "F"
  using "G" and we saw "H". Therefore, in future, we recommend that "I"
  is used for "A".

=========================
About the report (semantic details)

The report must include the following material. How you do that, and in
what order, is up to you; but add footnotes saying "this part addresses
1a, 2c, etc."

1a) Succinctly describe all your search engines, with pseudo-code and
    known theoretical results about their properties (e.g. maximum
    memory required, completeness, maximum run time).

1b) Read and summarize section three (only) of
    http://menzies.us/pdf/08icsp.pdf. Your summary should explain:
    1b1) two policies: strategic and tactical (see fig 2);
    1b2) four case studies (see fig 3).

1c) Go forth to statistics textbooks, wikipedia, etc., and answer these
    questions:
    1c1) What is the difference between a paired and an unpaired test?
    1c2) What is the difference between parametric tests like t-tests
         and non-parametric tests like Wilcoxon and Mann-Whitney?
         (For help with 1c1 and 1c2, see:
          - http://faculty.vassar.edu/lowry/webtext.html
          - Janez Demsar, "Statistical Comparisons of Classifiers over
            Multiple Data Sets", Journal of Machine Learning Research
            (2006). Note that this paper is overly complex and does not
            talk to Mann-Whitney.
          - For notes on Mann-Whitney, see
            http://faculty.vassar.edu/lowry/ch11a.html.)
    1c3) Decide if you should use paired t-tests, unpaired tests,
         parametric or non-parametric tests. Justify your decision.
    1c4) NOVA comes with paired t-tests and unpaired Mann-Whitney. If
         you decided on other ones, please implement them in Lisp.

2) Generate baselines.

2a) Run your search engines on FIVE different case studies and THREE
    policies. Regarding the FIVE case studies: four case studies come
    from figure 3 of http://menzies.us/pdf/08icsp.pdf; the fifth case
    study is "ANY", which lets the features range over their entire
    range (no constraints).
    Regarding the THREE policies: two policies come from fig 2 of
    http://menzies.us/pdf/08icsp.pdf; the third is "ALL", which lets the
    search engine use any attributes at all.

2b) Repeat (2a) 20 times and generate gnuplots of the median/spread of
    energy/threats/months/effort/defects.
    - Do you recall median and spread?
      - Median is the point below which 50% of the population falls.
      - Upper is the point below which 75% of the population falls.
      - Spread is upper minus median, and is a non-parametric measure
        of variance.
    - To include pretty gnuplots:
        set terminal postscript eps 15
        set output "something.eps"
      After gnuplotting, convert something.eps to PDF (e.g. with
      epstopdf) so it can be included.
    - If LaTeXing, use "pdflatex", not "latex", to include PDF figures.

3) Run your experiments.

For each search engine, for each case study, run 20 times and collect
medians and spreads for the following measures:
- initial threats/months/effort/defects;
- minimum threats/months/effort/defects;
- number of ranges used to reach the minimum; e.g. setting acap=[2,3]
  and pmat=[1,2,3] is five ranges;
- number of features used to reach the minimum; e.g. setting acap=[2,3]
  and pmat=[1,2,3] is two features;
- runtime to reach the minimum;
- runtime for your rig (which may not be the runtime to the minimum, if
  you do many runs, then look back over them all to find the minimum).

Show these somehow using gnuplot. The how is up to you, but have a look
at figure 8 of http://menzies.us/pdf/07casease.pdf.

4) Perform a statistical analysis on the results (Mann-Whitney, t-tests,
whatever you think best). Report quartiles and win/loss/ties. Sort
results by losses. For a sample of that report, see
http://unbox.org/wisp/var/Zach/var/whichOut/quartileWinLoss.out.

Note that "quartiles" are defined by five numbers (and these are the
numbers in the table shown top left of
http://unbox.org/wisp/var/Zach/var/whichOut/quartileWinLoss.out):
- min
- quartile1
- quartile2 (a.k.a. the median)
- quartile3 (and spread is quartile3 minus quartile2)
- max

5) Make a recommendation.
- Please balance the threats/effort/defects/months reduction against
  the complexity of the solution (number of features, number of ranges).
  Note that figure 8 of http://menzies.us/pdf/07casease.pdf offers some
  nice trade-off information about how the solution complexity affects
  the value of the output.
- Please don't rely just on the stats tests: two things can be
  statistically different while only (say) 1% apart in absolute terms.
  In that case I'd say that they tied, and I'd pick the simpler
  solution.
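Appendix-style sketch: the five-number quartile summary and "spread"
used in (2b) and (4) are simple arithmetic. Your rig is Lisp; the
Python sketch below (function names are our own) only illustrates the
bookkeeping, using one common nearest-rank percentile convention.

```python
# Five-number summary (min, q1, q2/median, q3, max) plus the
# non-parametric "spread" (q3 - q2) from sections (2b) and (4).
# Nearest-rank convention; other percentile conventions exist.

def percentile(sorted_xs, p):
    """Value below which a fraction p of the sorted data falls."""
    n = len(sorted_xs)
    i = min(n - 1, int(p * n))
    return sorted_xs[i]

def five_number_summary(xs):
    s = sorted(xs)
    q1 = percentile(s, 0.25)
    q2 = percentile(s, 0.50)   # median
    q3 = percentile(s, 0.75)   # "upper"
    return {"min": s[0], "q1": q1, "q2": q2, "q3": q3,
            "max": s[-1], "spread": q3 - q2}
```

For example, over the 20 repeated runs of one treatment you would
collect one score per run and report five_number_summary of that list.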
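The win/loss/tie counting of section (4) can also be sketched. Again,
the graded code belongs in Lisp (see 1c4); the Python sketch below,
with hypothetical helper names, illustrates one way to pair up
treatments using an unpaired Mann-Whitney U test under the large-sample
normal approximation (no tie correction), calling two medians "tied"
when the test finds no significant difference.

```python
# Win/loss/tie tally via unpaired Mann-Whitney U (normal
# approximation, two-sided, roughly the 95% level). Assumes
# lower scores are better, as with effort/defects/threats/months.
import math
from statistics import median

def mann_whitney_u(xs, ys):
    """U statistic for sample xs versus ys (midranks for ties)."""
    combined = sorted(xs + ys)
    def midrank(v):
        first = combined.index(v) + 1          # 1-based first rank
        last = first + combined.count(v) - 1   # 1-based last rank
        return (first + last) / 2.0
    rank_sum_x = sum(midrank(v) for v in xs)
    n1 = len(xs)
    return rank_sum_x - n1 * (n1 + 1) / 2.0

def differ(xs, ys, z_crit=1.96):
    """True if the samples differ at roughly the 95% level."""
    n1, n2 = len(xs), len(ys)
    u = mann_whitney_u(xs, ys)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return abs((u - mu) / sigma) > z_crit

def win_loss_tie(results):
    """results: dict engine -> list of (lower-is-better) scores.
    Returns dict engine -> [wins, losses, ties] over all pairings."""
    tally = {e: [0, 0, 0] for e in results}
    engines = list(results)
    for i, a in enumerate(engines):
        for b in engines[i + 1:]:
            if not differ(results[a], results[b]):
                tally[a][2] += 1; tally[b][2] += 1
            elif median(results[a]) < median(results[b]):
                tally[a][0] += 1; tally[b][1] += 1
            else:
                tally[b][0] += 1; tally[a][1] += 1
    return tally
```

Sorting the resulting tally by losses, then reporting it next to each
treatment's quartiles, reproduces the shape of the sample report at
quartileWinLoss.out.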