/timm's /charming /python /tricks

do.py

Download do.py.
Read more on How to be Charming (in Python).


001: import sys
002: sys.dont_write_bytecode=True # no .pyc files
003: from the import *
004: from about import *
005: from lib import *
006: 
007: 
008: def _do():
009:   """A every iteraction, some performance score 
010:    is logged back with the 'Do' loop, using 'seen'.
011:    Note the use of 'controlled' to print the eras."""
012:   for x,d in do(range(0,50),
013:               era=10,
014:               haltOn='fx',
015:               epsilon=0.05):
016:     y = 1.0/(1 + e**(-0.1*x))
017:     d.seen(x,fx=y)
018:   done(d,0,1,value=lambda s: '%4.2f' % s)
019: 
020: def do(iterator=range(0,100),
021:        epsilon=0.01,
022:        haltOn= None,
023:        era   = 100,
024:        also  = None,      
025:        cohen = The.math.cohen,
026:        cmp   = lambda old, new: new - old):
027:   """syntactic sugar for calling the Do class.
028:      Without this helper function, you need to do
029: 
030:        for x,d in Do(....).loop():
031:            ...
032: 
033:      With this function, you can do:
034: 
035:        for x,d in do(....):
036:           ... 
037:   """
038:   for tick,doings in Do(iterator=iterator,
039:        epsilon= epsilon, haltOn=haltOn, era= era,
040:        also=also, cohen=cohen,  cmp=cmp).loop():
041:     yield tick,doings
042: 
043: class Do:
044:   """'Do' adds to features to a standard
045:    iterator. 'Do' lets you run an optimizer many
046:    times, while incrementally storing performance
047:    results.  Using those results, 'Do' also lets
048:    you determine in the optimizer should quit early.
049: 
050:   For a simple example of its usage, see
051:   '_do()', above. Note that during each
052:   iteration, we log some performance value using the
053:   'seen' method.
054: 
055:   For a more typical example of usage, see search.py
056:   and the 'sa' function. From that example,here's 50
057:   runs of SA from k=0 to 1000 where all the results
058:   are grouped together in 'era's of size 30.  On the
059:   left is era number followed by [sampleSize] of
060:   items in that era. Then there is a quantile chart
061:   showing the 10,30,50,70,90-th percentile of the
062:   performance scores in that era.
063: 
064:   This first run stops if the mean results in any
065:   era are within an epsilon on 20% of max;
066:   i.e. epsilon=0.2. 'Best' is the 'eb' scores and
067:   'every' are the 'e' scores.
068:  
069:   FIRST RUN:
070: 
071:   -----| best |----------------------------------------- ----------------
072: 
073:          0 [  128]             |  --  * -   ,0.62, 0.71, 0.79, 0.85, 0.91
074:         30 [  128]             |      -- *  ,0.79, 0.85, 0.88, 0.91, 0.94
075:         60 [  120]             |     - *-   ,0.75, 0.79, 0.82, 0.86, 0.89
076:         90 [   30]             |       *    ,0.80, 0.80, 0.80, 0.80, 0.80
077: 
078:   ------| every |-------------------------------------------------------- 
079: 
080:          0 [  128]             |---   *--   ,0.53, 0.67, 0.77, 0.82, 0.90
081:         30 [  128]             |     -- *-  ,0.74, 0.81, 0.87, 0.90, 0.94
082:         60 [  120]             |     - *-   ,0.74, 0.78, 0.82, 0.86, 0.89
083:         90 [   30]             |-------*    ,0.53, 0.80, 0.80, 0.80, 0.80
084: 
085:    This second run stops if we reach an epsilon of
086:    within 2.5% of max; i.e. epsilon=0.025. Notice we
087:    run for more eras than in the FIRST RUN.
088: 
089:    SECOND RUN:
090: 
091:    ------| best |--------------------------------------------------------
092: 
093:          0 [  128]             | ---- *--   ,0.60, 0.76, 0.79, 0.84, 0.91
094:         30 [  128]             |      -- *  ,0.77, 0.84, 0.89, 0.91, 0.95
095:         60 [  128]             |         *- ,0.84, 0.87, 0.92, 0.94, 0.97
096:         90 [  128]             |        - * ,0.87, 0.91, 0.92, 0.94, 0.97
097:        120 [  128]             |        --* ,0.86, 0.93, 0.94, 0.94, 0.97
098:        150 [   30]             |           *,0.97, 0.97, 0.97, 0.97, 0.97
099: 
100:    ------| every |-------------------------------------------------------- 
101: 
102:          0 [  128]             | --- * --   ,0.60, 0.70, 0.76, 0.81, 0.88
103:         30 [  128]             |    --- *-- ,0.69, 0.83, 0.86, 0.91, 0.96
104:         60 [  128]             |   ----- *- ,0.67, 0.85, 0.89, 0.93, 0.96
105:         90 [  128]             |      -- *- ,0.79, 0.87, 0.91, 0.94, 0.97
106:        120 [  128]             |   ---  * - ,0.66, 0.80, 0.88, 0.94, 0.97
107:        150 [   30]             |--*         ,0.51, 0.62, 0.63, 0.83, 0.83
108: 
109:    Another thing we can 'Do' is stop if the
110:    improvement in the new era is very small compared
111:    to the last one. Here's the last run
112:    (epsilon=0.025) but in this run, we say "very
113:    small" is less than one standard deviation of the
114:    all the scores seen in the run. This is the Cohen
115:    test for effect size so we denote it cohen=1.
116: 
117:    THIRD RUN:
118: 
119:    -----| best |---------------------------------------------------------
120: 
121:          0 [  128]             |   --  *-   ,0.64, 0.75, 0.81, 0.86, 0.91
122:         30 [  128]             |       - *- ,0.80, 0.84, 0.89, 0.91, 0.97
123:         60 [  128]             |        -*- ,0.84, 0.89, 0.92, 0.94, 0.98
124:         90 [  120]             |         *- ,0.84, 0.88, 0.91, 0.91, 0.98
125: 
126:    ------| every |-------------------------------------------------------
127: 
128:          0 [  128]             | ---- * -   ,0.59, 0.72, 0.80, 0.87, 0.91
129:         30 [  128]             |     -- *-- ,0.73, 0.83, 0.87, 0.91, 0.97
130:         60 [  128]             |  -----  *- ,0.62, 0.83, 0.89, 0.92, 0.98
131:         90 [  120]             |       - *- ,0.81, 0.88, 0.89, 0.91, 0.98
132: 
133:    Finally, our last run shows a run with typical
134:    settings for this `Do'-ing;
135:    i.e. epsilon=0.01 (1% of max) and cohen=0.2
136:    (i.e. "very small" is less than 20% of one
137:    standard deviation). Note that its the same as
138:    the SECOND RUN shown above since, in this model,
139:    achieving 1% of max is not possible.
140: 
141:    FOURTH RUN:
142: 
143:    ------| best |--------------------------------------------------------
144: 
145:          0 [  128]             | ---- *--   ,0.60, 0.76, 0.79, 0.84, 0.91
146:         30 [  128]             |      -- *  ,0.77, 0.84, 0.89, 0.91, 0.95
147:         60 [  128]             |         *- ,0.84, 0.87, 0.92, 0.94, 0.97
148:         90 [  128]             |        - * ,0.87, 0.91, 0.92, 0.94, 0.97
149:        120 [  128]             |        --* ,0.86, 0.93, 0.94, 0.94, 0.97
150:        150 [   30]             |           *,0.97, 0.97, 0.97, 0.97, 0.97
151: 
152:   ------| every |-------------------------------------------------------- 
153: 
154:          0 [  128]             | --- * --   ,0.60, 0.70, 0.76, 0.81, 0.88
155:         30 [  128]             |    --- *-- ,0.69, 0.83, 0.86, 0.91, 0.96
156:         60 [  128]             |   ----- *- ,0.67, 0.85, 0.89, 0.93, 0.96
157:         90 [  128]             |      -- *- ,0.79, 0.87, 0.91, 0.94, 0.97
158:        120 [  128]             |   ---  * - ,0.66, 0.80, 0.88, 0.94, 0.97
159:        150 [   30]             |--*         ,0.51, 0.62, 0.63, 0.83, 0.83
160:   """
161:   def __init__(i,
162:                iterator=range(0,100),
163:                epsilon= 0.005,
164:                 haltOn=None,
165:                era= 100,
166:                also=None,      
167:                cohen=The.math.cohen,
168:                cmp=lambda old, new: new - old):
169:     i.iterator=iterator
170:     i.era = era
171:     i.epsilon = epsilon
172:     i.all = Nums()
173:     i.now = Nums()
174:     i.also = also
175:     i.cohen = cohen
176:     i.haltOn = haltOn
177:     i.cmp = cmp
178:   def thisEra(i,tick):
179:     """Internally, 'Do' instances keep time
180:      rounded off to the nearest era size."""
181:     return int(tick/i.era)*i.era
182:   def small(i,where) : 
183:     "Small is, say, 30% of std.dev seen so far"
184:     return i.all[where].s*i.cohen
185:   def close2Best(i,where,when) :
186:     "mean within epsilon of best"
187:     return  i.now[(where,when)].mu > (1 - i.epsilon)
188:   def improving(i,where,when):
189:     """The difference in means between now and
190:      last is not small."""
191:     before = when - i.era
192:     new    = i.now[(where,when)]
193:     old    = i.now[(where,before)]
194:     return i.cmp(old.mu, new.mu) > i.small(where)
195:   def earlyStop(i,when):
196:     "Early stop if close to best or not improving."
197:     where = i.haltOn
198:     if where:
199:       if i.close2Best(where,when) or \
200:             not i.improving(where,when): 
201:         return True
202:     return False
203:   def seen(i,tick,**d):
204:     "Syntactic sugar for adding new items"
205:     when = i.thisEra(tick)
206:     for where,what in d.items():
207:       i.seen1(when,what,where)
208:   def seen1(i,when,what,where):
209:     "Primitive data add. Also adds to parent."
210:     if i.also: 
211:       i.also.seen1(when,what,where)
212:     i.all[where].seen(what)
213:     i.now[(where, when)].seen(what)
214:   def loop(i):
215:     "Every time we change eras, check for early stop."
216:     before = 0
217:     for tick in i.iterator:
218:       now = i.thisEra(tick)
219:       if before and now != before:
220:         if i.earlyStop(before):
221:           break
222:       before = now
223:       yield tick,i
224: 
225: def done(doings,lo,hi,
226:            key  =lambda z:'%10s' %  z,
227:            value=lambda z: '%2d' %  z
228:            ):
229:     d = doings.now
230:     wheres= {}
231:     whens = {}
232:     for where,when in d.keys():
233:       wheres[where] = 1
234:       whens[ when ] = 1
235:     wheres = sorted(wheres.keys())
236:     whens  = sorted(whens.keys())
237:     for where in wheres:
238:       print '\n------|',str(where),'|'+'-'*75,'\n'
239:       for when in whens:
240:         if (where,when) in d:
241:           all = d[(where,when)].cached()
242:           s   = "?"
243:           if len(all) > 5:
244:             s=   xtile(all,show = value,
245:                      lo=lo,hi=hi,width=25)
246:           print '%10s' % key(when),\
247:               '[%5s]' % len(all), s
248:         
249: """
250: ### xtile: Pretty Print  Distributions
251: 
252: The function _xtile_ takes a list of (possibly) unsorted numbers and
253: presents them as a horizontal xtile chart (in ascii format). The default
254: is a contracted _quintile_ that shows the 10,30,50,70,90 breaks in the
255: data (but this can be changed- see the optional flags of the function).
256: 
257: """
258: def xtile(lst,lo=0,hi=100,width=50,
259:              chops=[0.1 ,0.3,0.5,0.7,0.9],
260:              marks=["-" ," "," ","-"," "],
261:              bar="|",star="*",show= lambda s:" %3.0f" % s):
262:   def pos(p)   : return ordered[int(len(lst)*p)]
263:   def place(x) : 
264:     tmp= int(width*float((x - lo))/(hi - lo))
265:     if tmp == width: tmp += -1
266:     return tmp
267:   def pretty(lst) : 
268:     return ', '.join([show(x) for x in lst])
269:   ordered = sorted(lst)
270:   lo      = min(lo,ordered[0])
271:   hi      = max(hi,ordered[-1])
272:   what    = [pos(p)   for p in chops]
273:   where   = [place(n) for n in  what]
274:   out     = [" "] * width
275:   for one,two in pairs(where):
276:     for i in range(one,two): 
277:       out[i] = marks[0]
278:     marks = marks[1:]
279:   out[int(width/2)]    = bar
280:   loc = place(pos(0.5)) 
281:   #print loc, len(out)
282:   out[loc] = star 
283:   return ''.join(out) +  "," +  pretty(what)
284: 
285: def _tile() :
286:   import random
287:   r = random.random
288:   def show(lst):
289:     return xtile(lst,lo=0, hi=1,width=25,
290:                  show= lambda s:" %3.2f" % s)
291:   print "one", show([r()**0.5 for x in range(1000)])
292:   print "two", show([r()**2   for x in range(1000)])w
293: 
294: def _tile2():
295:   def show(lst):
296:      return xtile(lst,lo=0, hi=1,width=25,
297:                   show= lambda s:" %3.2f" % s)
298:   print "one", show([0.21, 0.29, 0.28, 0.32, 0.32, 
299:                      0.28, 0.29, 0.41, 0.42, 0.48])
300:   print "two", show([0.71, 0.92, 0.80, 0.79, 0.78, 
301:                      0.9,  0.71, 0.82, 0.79, 0.98])
302:   
303: if __name__ == '__main__' : 
304:   eval(cmd('_do()'))
305: 

This file is part of Timm's charming Python tricks.
© 2014, Tim Menzies: tim.menzies@gmail.com, http://menzies.us.

Timm's charming Python tricks are free software: you can redistribute it and/or modify it under the terms of the GNU Lesser Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Timm's charming Python tricks are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU Lesser Public License along with Foobar. If not, see http://www.gnu.org/licenses.