001: """ 002: <em>(Note to students: before you get scared about all this, note that I've coded 003: this up for you- see the demos in 004: [statsd.py](http://unbox.org/open/trunk/472/14/spring/var/code/statsd.html). 005: But you should not just use things 006: you do not understand. So read and enjoy.)</em> 007: 008: Comparing Different Optimizers 009: ============================= 010: 011: 012: 013: For the most part, we are concerned with very 014: high-level issues that strike to the heart of the 015: human condition: 016: 017: - What does it mean to find controlling principles in the world? 018: - How can we find those principles better, faster, cheaper? 019: 020: But sometimes we have to leave those lofty heights 021: to discuss more pragmatic issues. Specifically, how 022: to present the results of an optimizer and, 023: sometimes, how to compare and rank the results from 024: different optimizers. 025: 026: Note that there is no best way, and often the way we 027: present results depends on our goals, the data we 028: are procesing, and the audience we are trying to 029: reach. So the statistical methods discussed below 030: are more like first-pass approximations to something 031: you may have to change extensively, depending on the 032: task at hand. 033: 034: In any case, in order to have at least 035: one report that that you quickly generate, then.... 036: 037: Theory 038: ------ 039: 040: The test that one optimizer is better than another can be recast 041: as four checks on the _distribution_ of performance scores. 042: 043: 1. Visualize the data, somehow. 044: 2. Check if the distributions are _significantly different_; 045: 3. Check if the central tendency of one distribution is _better_ 046: than the other; e.g. compare their mean values. 047: 4. Check the different between the central tendencies is not some _small effect_. 048: 049: The first step is very important. Stats should 050: always be used as sanity checks on intuitions gained 051: by other means. So look at the data before making, 052: possibly bogus, inferences from it. For example, 053: here are some charts showing the effects on a 054: population as we apply more and more of some 055: treatment. Note that the mean of the populations 056: remains unchanged, yet we might still endorse the 057: treatment since it reduces the uncertainty 058: associated with each population. 059: 060: 061: <center> 062: <img width=300 063: src="http://unbox.org/open/trunk/472/14/spring/doc/img/index_customers_clip_image002.jpg"> 064: </center> 065: 066: 067: One possible bogus inference would be to apply the 068: third test without the second since if the second 069: _significance_ test fails, then the third _better_ test could give misleading results. 070: For example, returning 071: to the above distributions, note the large overlap 072: in the top two curves in those plots. When 073: distributions exhibit a very large overlap, it is 074: very hard to determine if one is really different to 075: the other. So large variances can mean that even if 076: the means are _better_, we cannot really say that 077: the values in one distribution are usually better 078: than the other. 079: 080: 081: 082: Pragmatics 083: ---------- 084: 085: There are several pragmatic issues associated with these tests. 
Pragmatics
----------

There are several pragmatic issues associated with these tests.

### Experimental Design

In a very famous quote, [Ernest Rutherford](http://en.wikipedia.org/wiki/Ernest_Rutherford) once said

+ _"If your experiment needs statistics, you ought to have done a better experiment."_

In this view, it is a mistake to perform arcane statistical tests on large sets
of results- far better to design your experiments better.

In this subject,
we have seen one such paper that
[explored multiple options in optimization](http://unbox.org/doc/pso/Off-The-Shelf_PSO.pdf)
but did not explore all combinations. Rather, its authors walked through the space
of options, comparing a few options at a time.

While
[that's a good paper](http://unbox.org/doc/pso/Off-The-Shelf_PSO.pdf), it does not explore
_interaction effects_, where some options, in combination with others, have unintended
side-effects. So all hail [Ernest Rutherford](http://en.wikipedia.org/wiki/Ernest_Rutherford),
but sometimes you just gotta look at many options.

So, on with the show.

### Effect size

Much recent commentary has
remarked that two sets of numbers can be
_significantly different_, but the size of that
difference is so small as to be very boring. For
example, the blue and red lines in the following are
significantly different:

<center>
<img width=300
src="http://unbox.org/open/trunk/472/14/spring/doc/img/distributed.png">
</center>

But when you look at the _size_ of
the difference, it is so small that you have
to say it's kinda boring. The lesson here is that
even if two distributions are _significantly different_,
and even if their means are _better_, then if the _effect size_
is very small we should not report a difference in the populations.

In the following, we will use the _a12_ test for effect size.

### Handling Distributions of Many Shapes

Data may be _shaped_ in funny ways. Consider the following
distributions:

<center>
<img width=400
src="http://unbox.org/open/trunk/472/14/spring/doc/img/dists.gif">
</center>

Note that these are not all smooth symmetrical bell-shaped curves
that rise to a single maximum value. Why is this important? Well, several
of the widely used tests of _statistical significance_ assume such shapes.
One method that does not is the _bootstrap sampling_ method discussed below- but
that requires hundreds to thousands of resamples of the data (which can be
slow for very large distributions). So, pragmatically, if we want to avoid
assuming that the data fits some particular shape, then we need to somehow
restrict the number of times we apply slower methods like bootstrapping.

Note that one reason we use the _a12_ test for effect size is that
this particular test makes no assumptions about the shape of the data being explored.

Note also that bootstrapping can be very, very slow indeed. A constant plea
in the bootstrapping literature is "tell us how to make it run faster".
In the following, we will minimize the calls to bootstrapping as follows:

+ Only bootstrap if the effect size is not small.

That is, instead of checking for effect size _after_ checking for _significant difference_,
we check _before_. This is important since it avoids unnecessary (and very slow)
calls to bootstrapping.
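For example, that check order might be coded as follows (a sketch only: it
anticipates the _a12small_ and _bootstrap_ functions defined later, and it
mirrors the _theSame_ test used inside _scottknott_ below; note that it also
reverses the order used in the preview sketch under _Theory_):

    def _different(xs, ys):
      "Cheap effect-size filter first; the slow bootstrap only if that passes."
      if a12small(xs, ys):      # small effect? then do not bother bootstrapping
        return False
      return bootstrap(xs, ys)  # only now pay for the resampling test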
### The Confidence Problem

A second pragmatic issue is the _confidence_
issue. Suppose we have four optimizers, each of
which is controlled by four parameters
(e.g. mutation rate etc), and you are testing these
on 10 models (Fonseca, ZDT, etc). If you break the
parameters into (say) three big chunks (e.g. low,
medium, high), in effect you are comparing results
on 10*4*3<sup>4</sup>=3,240 _treatments_; i.e. over
5,000,000 pairwise comparisons. Now it is sensible to
explore treatments within each model separately, which means you are
"only" comparing pairs drawn from 324 treatments, which is
still over 50,000 comparisons.

Why is this a problem? Well, tests for
_significant difference_ report the probability
that members of population1 do not overlap
population2. In practice, most populations overlap a
little, so standard practice is to restrict the
overlap test to some small number; e.g. report that
two populations are significantly different if 95%
of the members are not found in the overlap.

Now can you see the problem? If these tests are 95% confident
and we run over 50,000 such comparisons, our overall results
have confidence 0.95<sup>50,000</sup>, a number vanishingly
close to zero (i.e. not confident at all).

An alternative to the above is to sort all the
results from different treatments on their mean value,
then compare the optimizer at position _i_ to the one at position
_i+1_.
Note that this procedure requires _N-1_ comparisons
for _N_ treatments.
So now, for 324 treatments, our statistical tests are
confident to 0.95<sup>323</sup>, or about 0.000006 percent- which
is better than before but still very, very poor.

Yet another alternative is to sort the data, then
conduct a binary chop on the results, and only run
the significance tests at each chop. For our 324
treatments, this will generate results with
a confidence of 0.95<sup>log2(324)</sup>=65%.

This can be improved further by only chopping the
data at points where the mean values of the
treatments in each chop are most different. This is
the _Scott-Knott_ procedure discussed below. If
some population has mean _mu_ and some chop divides
the _n_ members of that population into groups of size _n1_ and
_n2_ with means _mu1_ and _mu2_, then the
expected value of the squared difference in the means before and
after the chop is:

    delta = n1/n*(mu - mu1)**2 + n2/n*(mu - mu2)**2

Scott-Knott picks the chop that maximizes _delta_.
In the code shown below, Scott-Knott also rejects chops
that:

- Generate tiny groups in the data; e.g. fewer than 4 individuals in any chop;
- Are different by only some _small effect_, implemented using, say, the _a12_ test discussed below.

As mentioned above, this second step (running _a12_ before checking for significant differences)
is a way to reduce the time spent bootstrapping.

More importantly, the fewer chops we explore, the fewer times we run
the bootstrap and the fewer times we run into the confidence problem.
Just as a "what-if", suppose our Scott-Knott means we only
have to chop the data 20 times (which is actually more than usual).
If so, then our confidence in the conclusions would be 0.95<sup>log2(20)</sup>, which is
80 percent- which is not bad for a study looking at 324 options.
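As a quick sanity check on the arithmetic above (assuming the tests are
independent and each is run at 95% confidence), the three schemes work out as
follows:

    from math import log

    print 0.95 ** 323          # one test per adjacent pair: roughly 6e-8
    print 0.95 ** log(324, 2)  # binary chop over 324 treatments: roughly 0.65
    print 0.95 ** log(20, 2)   # Scott-Knott needing only 20 chops: roughly 0.80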
Enough theory; on with the code.

## How to...

### Visualization

As said above, stats should
always be used as sanity checks on intuitions gained
by other means.

Suppose we had two optimizers which, over 10 repeated
runs, generated the following performance scores:

"""
def _tile2():
  def show(lst):
    return xtile(lst,lo=0, hi=1,width=25,
                 show= lambda s:" %3.2f" % s)
  print "one", show([0.21, 0.29, 0.28, 0.32, 0.32,
                     0.28, 0.29, 0.41, 0.42, 0.48])
  print "two", show([0.71, 0.92, 0.80, 0.79, 0.78,
                     0.9,  0.71, 0.82, 0.79, 0.98])
"""

When faced with new data, always chant the following mantra:

+ _First_ visualize it to get some intuitions;
+ _Then_ apply some statistics to double-check those intuitions.

That is, it is _strongly recommended_ that, prior to
doing any statistical work, an analyst generates a
visualization of the data. Percentile charts are a
simple way to display very large populations in very
little space. For example, here are the above results,
displayed on a range from 0.00 to 1.00:

    one         * --|              , 0.28, 0.29, 0.32, 0.41, 0.48
    two             |     -- * --  , 0.71, 0.79, 0.80, 0.90, 0.98

In this percentile chart, little dashes mark the (10,30)th and (70,90)th
percentile ranges, left and right of the
median value, which is shown with a _"*"_ (optimizer _one_'s left-hand
dashes cover so narrow a range that they actually
disappear in this display). The vertical bar _"|"_
shows half-way between the display's min and max (in
this case, that would be (0.00+1.00)/2 = 0.50).

From the above, we could write a little report that
shows the mean and rank performance of the two
optimizers:

    one :mu 0.32 :rank 1
    two :mu 0.80 :rank 2

From this report, if the goal was to maximize some
factor, then it is clear we would recommend optimizer _two_
over optimizer _one_.

#### Xtile

The advantage of percentile charts is that we can
show a lot of data in very little space. For
example, here are 2,000 numbers shown as a _quintile_
chart on two lines.

+ Quintiles divide the data at the 10th, 30th,
  50th, 70th, and 90th percentiles.
+ Dashes (_"-"_) mark the (10,30)th and
  (70,90)th percentile ranges;
+ White space marks the (30,50)th and
  (50,70)th percentile ranges.

Consider two distributions, of 1000 samples each:
one shows the square root of _rand()_ and the other
shows the square of _rand()_.
"""

def _tile() :
  import random
  r = random.random
  def show(lst):
    return xtile(lst,lo=0, hi=1,width=25,
                 show= lambda s:" %3.2f" % s)
  print "one", show([r()**0.5 for x in range(1000)])
  print "two", show([r()**2   for x in range(1000)])

"""

In the following quintile charts, we show these distributions:

+ The range is 0 to 1.
+ Line _one_ shows the square root of 1000 random numbers;
+ Line _two_ shows the square of 1000 random numbers.

Note the brevity of the display:

    one          -----|    *   --- , 0.32, 0.55, 0.70, 0.84, 0.95
    two  --    *      |---------   , 0.01, 0.10, 0.27, 0.51, 0.85

As before, the median value is shown with a _"*"_, and
the point half-way between min and max (in this
case, 0.5) is shown as a vertical bar _"|"_.

For details on how to draw percentile charts, see _xtile_
in [do.py](http://unbox.org/open/trunk/472/14/spring/var/code/do.html).
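The real _xtile_ lives in do.py and is not reproduced in these notes. For
readers who want something self-contained, the following stand-in (not the
real xtile, just a rough imitation of its interface) is enough to draw charts
of roughly the above shape:

    def _xtileSketch(lst, lo=0, hi=1, width=25, show=lambda z: " %3.2f" % z):
      "Rough stand-in for do.py's xtile: sort, grab five tiles, draw them."
      lst   = sorted(lst)
      tiles = [lst[int(len(lst)*p)] for p in [0.1, 0.3, 0.5, 0.7, 0.9]]
      def place(x):  # map a value onto one column of the display
        return max(0, min(width - 1, int(width*(x - lo)/float(hi - lo))))
      out = [" "]*width
      for i in range(place(tiles[0]), place(tiles[1])): out[i] = "-"  # 10th..30th
      for i in range(place(tiles[3]), place(tiles[4])): out[i] = "-"  # 70th..90th
      out[width // 2]      = "|"   # half-way between lo and hi
      out[place(tiles[2])] = "*"   # the median
      return "".join(out) + " ," + ",".join([show(t) for t in tiles])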
### The Scott-Knott Procedure

Suppose we have four optimizers which have been run
four times each; now there is more wriggle in their
results:

    one    0.34 0.49 0.51 0.8
    two    0.6  0.9  0.8  0.9
    three  0.7  0.9  0.8  0.6
    four   0.2  0.3  0.35 0.4

To rank these, we'd first sort them by their mean
and maybe add a little bar chart to the report:

    four  :mu 0.3125 ******
    one   :mu 0.535  ***********
    three :mu 0.75   ***************
    two   :mu 0.8    ****************

Intuitively, the optimizers seem to fall into three groups:

+ Highest scores: two and three;
+ Lowest scores: four;
+ Somewhere in-between: one.

The problem, though, is that it is a little hard to
check if those are the right groups, since some of
these optimizers generate numbers very close to each
other.

Just as an aside, part of the problem is
insufficient experimentation. Four runs barely
exercises the optimizers, so it does not give any of
them a chance to show their true worth. I recommend
10 to 20 repeats- but then another problem
arises; i.e. too many numbers to read and
understand.

If we explore the above using a _Scott-Knott_
procedure, then we would:

+ Sort the optimizers by their mean (as above);
+ Recursively _cut_ the list in two, stopping when
  one half of the cut is similar
  to the other half.
+ <em>Scott AJ and Knott M (1974) Cluster analysis method for
  grouping means in the analysis of variance. Biometrics 30:
  507-512.</em>

There are many ways to find the _cut_. Following the
recommendations of Mittas and Angelis, we proceed as
follows.

+ <em>Nikolaos Mittas, Lefteris Angelis: Ranking and Clustering Software Cost Estimation
  Models through a Multiple Comparisons Algorithm. IEEE Trans. Software Eng. 39(4): 537-551 (2013)</em>

Mittas and Angelis find the cut by collecting:

+ The mean `mu` of all the data;
+ The means `mu1,mu2` of all the data below and above the cut;
+ Then returning the cut that most divides the data.

More specifically, according to Mittas and
Angelis, the best cut is the one that _maximizes
the difference in the mean_ before and after the
cut.
"""

def minMu(parts,all,big,same):
  cut,left,right = None,None,None
  before, mu     = 0, all.mu
  for i,l,r in leftRight(parts):
    if big(l.n) and big(r.n):
      if not same(l,r):
        n = all.n * 1.0
        x = l.n/n*(mu - l.mu)**2 + \
            r.n/n*(mu - r.mu)**2
        if x > before:
          before,cut,left,right = x,i,l,r
  return cut,left,right
"""

In the above, _l.n_ and _r.n_ are the number of
measurements left and right of the cut (and _n_ =
_l.n + r.n_). So the above function gives most
weight to cuts that most change the mean of
the largest number of measurements.
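Note that _minMu_ leans on a helper, _leftRight_, that is not shown in these
notes (it lives in stats.py, alongside the _Num_ class sketched further down).
Its contract is easy to guess at. The quadratic rewrite below is only meant to
illustrate that contract, not the real implementation:

    def _leftRightSketch(parts):
      "For each possible cut, yield its index plus summaries of both sides."
      for i in range(1, len(parts)):
        left  = reduce(lambda x,y: x+y, parts[:i])  # Nums know how to add
        right = reduce(lambda x,y: x+y, parts[i:])
        yield i, left, right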
The _minMu_ function is a low-level procedure. A higher-level
tool calls _minMu_ to find one cut, then recurses on each side of that cut.

"""

def rdiv(data,  # a list of class Nums
         all,   # all the data combined into one num
         div,   # function: find the best split
         big,   # function: rejects small splits
         same): # function: rejects similar splits
  def recurse(parts,all,rank=0):
    cut,left,right = div(parts,all,big,same)
    if cut:
      # if cut, rank "right" higher than "left"
      rank = recurse(parts[:cut],left,rank) + 1
      rank = recurse(parts[cut:],right,rank)
    else:
      # if no cut, then all get same rank
      for part in parts:
        part.rank = rank
    return rank
  recurse(sorted(data),all)
  return data
"""
Finally, the above is called by _scottknott_, which
recursively splits the data, maximizing the expected value
of the delta of the mean before and after the
splits (and rejecting splits where either side has fewer than 4 items).

"""
def scottknott(data,small=3,b=250, conf=0.05):
  def theSame(one, two):
    if a12small(two, one): return True
    return not bootstrap(one, two, b=b, conf=conf)
  all  = reduce(lambda x,y:x+y,data)
  same = lambda l,r: theSame(l.saw(), r.saw())
  big  = lambda n: n > small
  return rdiv(data,all,minMu,big,same)

"""

(The _a12small_ and _bootstrap_ functions will be
explained below.)

When this is run on the following data from
"_x1,x2,x3,x4,x5_", we see sensible groupings:

"""
def rdivDemo(data):
  data = map(lambda lst: Num(lst[0],lst[1:],keep=512),
             data)
  for x in sorted(scottknott(data),key=lambda y:y.rank):
    print x.rank, x.name, gs([x.mu, x.s])

def rdiv2():
  rdivDemo([
    ["x1",0.34, 0.49, 0.51, 0.6],
    ["x2",0.6,  0.7,  0.8,  0.9],
    ["x3",0.15, 0.25, 0.4,  0.35],
    ["x4",0.6,  0.7,  0.8,  0.9],
    ["x5",0.1,  0.2,  0.3,  0.4]
    ])
"""

Note that `rdiv` performs a binary chop on the list
of optimizers. This is important since it means we
can rank _N_ optimizers using about _log2(N)_
comparisons- which matters, for the reasons discussed
under the confidence problem above.

For full details on the Scott-Knott procedure, see
[stats.py](http://unbox.org/open/trunk/472/14/spring/var/code/stats.html).
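Note that the code above also assumes a _Num_ class (and a small
pretty-printing helper, _gs_) from stats.py that is not reproduced in these
notes. The stand-in below shows roughly the interface that _rdiv_,
_scottknott_ and _rdivDemo_ rely on; the real class also does reservoir
sampling, keeping at most `keep` numbers. Rename it to `Num` (and replace
`gs` with plain printing) if you want to try `rdiv2` without stats.py:

    class NumSketch:
      "Stand-in for stats.py's Num: just enough interface for the code above."
      def __init__(i, name="", inits=[], keep=512):
        i.name, i.keep, i.rank = name, keep, 0
        i.all, i.n, i.sum, i.mu, i.s = [], 0, 0, 0, 0
        for x in inits: i.put(x)
      def put(i,x):
        i.all.append(x)               # the real Num keeps at most 'keep' of these
        i.n += 1; i.sum += x
        i.mu = float(i.sum)/i.n
        i.s  = (sum((y - i.mu)**2 for y in i.all)/max(1, i.n - 1))**0.5
      def saw(i):       return i.all  # the samples handed to a12small/bootstrap
      def __add__(i,j): return NumSketch(i.name, i.all + j.all, i.keep)
      def __cmp__(i,j): return cmp(i.mu, j.mu)  # so sorted() orders Nums by mean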
### The A12 test

I prefer a test for _small effect_ that does not
_sweat the small stuff_; i.e. one that ignores small
differences between items in the samples. My
preferred test for _small effect_ has:

+ a simple intuition;
+ no assumption that the data is (say) Gaussian;
+ and a solid lineage in the literature.

Such a test is Vargha and Delaney's _A12 statistic_.
The statistic was proposed in Vargha and Delaney's
2000 paper and has been endorsed in many places, including
Arcuri and Briand's ICSE 2011 paper.

+ <em>A. Vargha and H. D. Delaney. A critique and
  improvement of the CL common language effect size
  statistics of McGraw and Wong. Journal of
  Educational and Behavioral Statistics,
  25(2):101-132, 2000.</em>
+ <em>Andrea Arcuri, Lionel C. Briand: A practical guide
  for using statistical tests to assess randomized
  algorithms in software engineering. ICSE 2011:
  1-10.</em>

After I describe it to you, you will wonder why
anyone would ever want to use anything else. Given
a performance measure seen in _m_ measures of _X_
and _n_ measures of _Y_, the A12 statistic measures
the probability that running algorithm _X_ yields
higher values than running another algorithm _Y_.
Specifically, it counts how often we see larger
numbers in _X_ than in _Y_ (and if the same numbers
are found in both, we add a half mark):

    a12 = #(X.i > Y.j)/(n*m) + 0.5*#(X.i == Y.j)/(n*m)

According to Vargha and Delaney, the difference between two populations is:

+ _large_ if `a12` is over 71%;
+ _medium_ if `a12` is over 64%;
+ _small_ if `a12` is 56%, or less.

The code is very simple- just remember that _lst1_ and
_lst2_ are sorted (into descending order) before doing the comparisons:

"""
def a12small(lst1, lst2): return a12(lst1,lst2) <= 0.56

def a12(lst1,lst2, gt= lambda x,y: x > y):
  "how often is x in lst1 more than y in lst2?"
  def loop(t,t1,t2):
    while t1.i < t1.n and t2.i < t2.n:
      h1 = t1.l[t1.i]
      h2 = t2.l[t2.i]
      if gt(h1,h2):
        t1.i += 1; t1.gt += t2.n - t2.i
      elif h1 == h2:
        t2.i += 1; t1.eq += 1; t2.eq += 1
      else:
        t2,t1 = t1,t2
    return t.gt*1.0, t.eq*1.0
  #--------------------------
  lst1 = sorted(lst1, reverse=True) # the loop above walks from largest to smallest
  lst2 = sorted(lst2, reverse=True)
  n1 = len(lst1)
  n2 = len(lst2)
  t1 = Thing(l=lst1,i=0,eq=0,gt=0,n=n1)
  t2 = Thing(l=lst2,i=0,eq=0,gt=0,n=n2)
  gt,eq = loop(t1, t1, t2)
  return gt/(n1*n2) + eq/2/(n1*n2)

"""
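Two notes on the code above. First, _Thing_ is assumed to be a tiny record
helper defined elsewhere in this code base; something like the first sketch
below would do. Second, since the fast pointer-walking version is easy to get
wrong, it is worth keeping a brute-force cross-check around; the direct
translation of the formula takes only a few lines (neither snippet is from
the original file):

    class _ThingSketch:
      "Guess at the record helper used above: just a bag of named fields."
      def __init__(i, **fields): i.__dict__.update(fields)

    def _a12Slow(lst1, lst2):
      "Direct O(n*m) reading of the a12 formula; handy for checking a12()."
      gt = eq = 0
      for x in lst1:
        for y in lst2:
          if   x > y : gt += 1
          elif x == y: eq += 1
      return (gt + 0.5*eq) / float(len(lst1)*len(lst2))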
### Bootstrap Tests

(_For more details on this section, see p220 to 223 of Efron and Tibshirani's book "An Introduction to the Bootstrap"._)

Formally, _a12_ is actually a _post hoc effect size
test_, which is applied _after_ some hypothesis test
for statistically significant differences between two
populations.

Such statistical tests check if there is enough of a
difference between two lists of numbers to falsify
some hypothesis (e.g. that the values in list1 are less than
those in list2). Usual practice is to make some
parametric assumption (e.g. that the numbers come
from a Gaussian distribution).

Another method, called _bootstrapping_, makes
no parametric assumption. The way it works is to
define some _testStatistic_ and apply it to:

1. The original two lists;
2. A "bootstrap" sample;
   i.e. two artificial lists created by
   sampling with replacement from the original lists.

A bootstrap test runs the _testStatistic_:

+ Once on the original pair of lists...
+ Then (say) 1000 times on 1000 bootstrap samples.

Then we return how often, in the bootstrap samples,
the _testStatistic_ exceeds the value seen with
the original lists (see the last line of the following code):

"""
def bootstrap(y0,z0,conf=0.05,b=1000):
  import random
  class total():
    def __init__(i,some=[]):
      i.sum = i.n = i.mu = 0 ; i.all=[]
      for one in some: i.put(one)
    def put(i,x):
      i.all.append(x)
      i.sum += x; i.n += 1; i.mu = float(i.sum)/i.n
    def __add__(i1,i2): return total(i1.all + i2.all)
  def testStatistic(y,z):
    tmp1 = tmp2 = 0
    for y1 in y.all: tmp1 += (y1 - y.mu)**2
    for z1 in z.all: tmp2 += (z1 - z.mu)**2
    s1    = float(tmp1)/(y.n - 1)
    s2    = float(tmp2)/(z.n - 1)
    delta = z.mu - y.mu
    if s1+s2:
      delta = delta/((s1/y.n + s2/z.n)**0.5)
    return delta
  def one(lst): return lst[ int(any(len(lst))) ]
  def any(n)  : return random.uniform(0,n)
  y, z   = total(y0), total(z0)
  x      = y + z
  tobs   = testStatistic(y,z)
  yhat   = [y1 - y.mu + x.mu for y1 in y.all]
  zhat   = [z1 - z.mu + x.mu for z1 in z.all]
  bigger = 0.0
  for i in range(b):
    if testStatistic(total([one(yhat) for _ in yhat]),
                     total([one(zhat) for _ in zhat])) > tobs:
      bigger += 1
  return bigger / b < conf
"""

In the above, the bootstrap samples are generated by the calls to _one_.
That data is collected and summarized with the _total_ class.
Also, the _testStatistic_ function
checks if two means are different, tempered
by the size of the two lists and the variance of the data.

One issue with the above is the number of
bootstrap samples. There are some statistical results that
say 200 to 300 bootstrap samples are enough. Note
that the more bootstrap samples, the slower the
statistical test (which is why folks often use the
faster parametric methods- even if the parametric
methods make the wrong assumptions about the data).

For this reason, this code runs _bootstrap_
within a Scott-Knott procedure that calls it only on "interesting" chops
(i.e. those that are not too small, that divide the data by more than a small
effect, and that maximize the expected value of the delta of the mean).

## Demos

All the above is coded up.
See [statsd.py](http://unbox.org/open/trunk/472/14/spring/var/code/statsd.html).
Share and enjoy.

"""
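# An extra, hypothetical demo (not from statsd.py): a minimal end-to-end check
# that exercises a12() and bootstrap() directly, reusing the two samples from
# the visualization section above.
def _compare():
  xs = [0.21, 0.29, 0.28, 0.32, 0.32, 0.28, 0.29, 0.41, 0.42, 0.48]
  ys = [0.71, 0.92, 0.80, 0.79, 0.78, 0.9,  0.71, 0.82, 0.79, 0.98]
  print "a12(ys,xs)              :", a12(ys, xs)
  print "significantly different?:", bootstrap(xs, ys)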
This file is part of Timm's charming Python tricks.
© 2014, Tim Menzies:
tim.menzies@gmail.com,
http://menzies.us.
Timm's charming Python tricks are free software: you can redistribute it and/or modify it under the terms of the GNU Lesser Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Timm's charming Python tricks are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with Timm's charming Python tricks. If not, see http://www.gnu.org/licenses.