cs572: project 4b

Same structure as before (for each task, write output files to doc/proj4
in your svn repositories).

==============
Due dates

Seminars:        week of April 22
Code walk-thru:  week of April 29
Final report:    Tuesday, May 6th  <== hard deadline, no extensions

==============
Deliverables

1) All your code, working, so I can run it in the usual way (cd to the
   main directory, load "make.lisp", type (make)).
2) A 10-page (max) report concluding which is the best search engine for
   finding the fewest constraints that most reduce
   effort/defects/threats/months.
3) A code walk-through with me.
4) A seminar to the class in the last week of term.
5) Slides from the seminar (in doc/proj4).

==========================
About the slides

- Slides should be in PDF format.
- 30 minutes max, so 30 slides max (but consider using far fewer slides).
- Slide one: everyone's name and email.
- Main font no smaller than 15 point.
- Include in doc/proj4.
- Slides should include the statistical results of section (4), below.

=========================
About the report (syntactic details)

- The report must be submitted in PDF format.
- The report must be gray scale (NO COLOR! NONE! NONE!).
- The report must be called report_lname1_lname2_....pdf, where "lname"
  is the last name of everyone in the group.
- The report must comply with the style instructions at
  http://www.aaai.org/Publications/Author/author.php
  (see the MS Word and LaTeX macros listed at the end of the page).
- The files used to generate the report (figures, LaTeX source, MS Word)
  must be committed to your svn doc/proj4 directory.
- The report must include author names and emails on page 1, some
  attention-grabbing title, an introduction/motivation section, a related
  work section, results, discussion, future work, conclusion, and
  references. All listed references must be used in the paper; all
  references used in the paper must be in the reference section.
- The abstract should be of the form: "A" is a problem because of "B".
  Previously, "C" said it is best to use "D" to solve "A". But when we
  tried "D" we saw "E", which made us hypothesize "F". We checked "F"
  using "G" and we saw "H". Therefore, in future, we recommend that "I"
  is used for "A".

=========================
About the report (semantic details)

The report must include the following material. How you do that, and in
what order, is up to you; but add footnotes saying "this part addresses
1a, 2c, etc."

1a) Succinctly describe all your search engines, with pseudo-code and
    known theoretical results about their properties (e.g. maximum
    memory required, completeness, maximum run time).

1b) Read and summarize section three (only) of
    http://menzies.us/pdf/08icsp.pdf. Your summary should explain:
    1b1) two policies: strategic and tactical (see fig 2);
    1b2) four case studies (see fig 3).

1c) Go forth to statistics textbooks, wikipedia, etc., and answer these
    questions:
    1c1) What is the difference between a paired and an unpaired test?
    1c2) What is the difference between parametric tests like t-tests
         and non-parametric tests like Wilcoxon and Mann-Whitney?
         (For help with 1c1 and 1c2, see:
          - http://faculty.vassar.edu/lowry/webtext.html
          - Janez Demsar, "Statistical Comparisons of Classifiers over
            Multiple Data Sets", Journal of Machine Learning Research
            (2006). Note that this paper is overly complex and does not
            talk to Mann-Whitney.
          - For notes on Mann-Whitney, see
            http://faculty.vassar.edu/lowry/ch11a.html.)
    1c3) Decide if you should use paired t-tests, unpaired tests,
         parametric or non-parametric tests. Justify your decision.
    1c4) NOVA comes with paired t-tests and unpaired Mann-Whitney. If
         you decided on other ones, please implement them in Lisp.

2) Generate baselines.

2a) Run your search engines on FIVE different case studies and THREE
    policies. Regarding the FIVE case studies: four case studies come
    from figure 3 of http://menzies.us/pdf/08icsp.pdf; the fifth case
    study is "ANY", which lets the features range over their entire
    range (no constraints).
    Regarding the THREE policies: two policies come from fig 2 of
    http://menzies.us/pdf/08icsp.pdf; the third is "ALL", which lets the
    search engine use any attributes at all.

2b) Repeat (2a) 20 times and generate gnuplots of the median/spread of
    energy/threats/months/effort/defects.
    - Do you recall median and spread?
      - Median is the point below which 50% of the population falls.
      - Upper is the point below which 75% of the population falls.
      - Spread is upper minus median, and is a non-parametric measure
        of variance.
    - To include pretty gnuplots:
        set terminal postscript eps 15
        set output "something.eps"
      After gnuplotting, convert something.eps to PDF (e.g. with
      epstopdf) so it can be included.
    - If LaTeXing, use "pdflatex", not "latex", to include PDF figures.

3) Run your experiments.

For each search engine, for each case study, run 20 times and collect
medians and spreads for the following measures:
- initial threats/months/effort/defects;
- minimum threats/months/effort/defects;
- number of ranges used to reach the minimum; e.g. setting acap=[2,3]
  and pmat=[1,2,3] is five ranges;
- number of features used to reach the minimum; e.g. setting acap=[2,3]
  and pmat=[1,2,3] is two features;
- runtime to reach the minimum;
- runtime for your rig (which may not be the runtime to the minimum, if
  you do many runs, then look back over them all to find the minimum).

Show these somehow using gnuplot. The how is up to you, but have a look
at figure 8 of http://menzies.us/pdf/07casease.pdf.

4) Perform a statistical analysis on the results (Mann-Whitney, t-tests,
whatever you think best). Report quartiles and win/loss/ties. Sort
results by losses. For a sample of that report, see
http://unbox.org/wisp/var/Zach/var/whichOut/quartileWinLoss.out.

Note that "quartiles" are defined by five numbers (and these are the
numbers in the table shown top left of
http://unbox.org/wisp/var/Zach/var/whichOut/quartileWinLoss.out):
- min
- quartile1
- quartile2 (a.k.a. the median)
- quartile3 (and spread is quartile3 minus quartile2)
- max

5) Make a recommendation.
- Please balance the threats/effort/defects/months reduction against
  the complexity of the solution (number of features, number of ranges).
  Note that figure 8 of http://menzies.us/pdf/07casease.pdf offers some
  nice trade-off information about how the solution complexity affects
  the value of the output.
- Please don't rely just on the stats tests: two things can be
  statistically different while only (say) 1% apart in absolute terms.
  In that case I'd say that they tied, and I'd pick the simpler
  solution.
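Appendix-style sketch: the five-number quartile summary and "spread"
used in (2b) and (4) are simple arithmetic. Your rig is Lisp; the
Python sketch below (function names are our own) only illustrates the
bookkeeping, using one common nearest-rank percentile convention.

```python
# Five-number summary (min, q1, q2/median, q3, max) plus the
# non-parametric "spread" (q3 - q2) from sections (2b) and (4).
# Nearest-rank convention; other percentile conventions exist.

def percentile(sorted_xs, p):
    """Value below which a fraction p of the sorted data falls."""
    n = len(sorted_xs)
    i = min(n - 1, int(p * n))
    return sorted_xs[i]

def five_number_summary(xs):
    s = sorted(xs)
    q1 = percentile(s, 0.25)
    q2 = percentile(s, 0.50)   # median
    q3 = percentile(s, 0.75)   # "upper"
    return {"min": s[0], "q1": q1, "q2": q2, "q3": q3,
            "max": s[-1], "spread": q3 - q2}
```

For example, over the 20 repeated runs of one treatment you would
collect one score per run and report five_number_summary of that list.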
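The win/loss/tie counting of section (4) can also be sketched. Again,
the graded code belongs in Lisp (see 1c4); the Python sketch below,
with hypothetical helper names, illustrates one way to pair up
treatments using an unpaired Mann-Whitney U test under the large-sample
normal approximation (no tie correction), calling two medians "tied"
when the test finds no significant difference.

```python
# Win/loss/tie tally via unpaired Mann-Whitney U (normal
# approximation, two-sided, roughly the 95% level). Assumes
# lower scores are better, as with effort/defects/threats/months.
import math
from statistics import median

def mann_whitney_u(xs, ys):
    """U statistic for sample xs versus ys (midranks for ties)."""
    combined = sorted(xs + ys)
    def midrank(v):
        first = combined.index(v) + 1          # 1-based first rank
        last = first + combined.count(v) - 1   # 1-based last rank
        return (first + last) / 2.0
    rank_sum_x = sum(midrank(v) for v in xs)
    n1 = len(xs)
    return rank_sum_x - n1 * (n1 + 1) / 2.0

def differ(xs, ys, z_crit=1.96):
    """True if the samples differ at roughly the 95% level."""
    n1, n2 = len(xs), len(ys)
    u = mann_whitney_u(xs, ys)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return abs((u - mu) / sigma) > z_crit

def win_loss_tie(results):
    """results: dict engine -> list of (lower-is-better) scores.
    Returns dict engine -> [wins, losses, ties] over all pairings."""
    tally = {e: [0, 0, 0] for e in results}
    engines = list(results)
    for i, a in enumerate(engines):
        for b in engines[i + 1:]:
            if not differ(results[a], results[b]):
                tally[a][2] += 1; tally[b][2] += 1
            elif median(results[a]) < median(results[b]):
                tally[a][0] += 1; tally[b][1] += 1
            else:
                tally[b][0] += 1; tally[a][1] += 1
    return tally
```

Sorting the resulting tally by losses, then reporting it next to each
treatment's quartiles, reproduces the shape of the sample report at
quartileWinLoss.out.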