Leake: explanation is an inference procedure, different for each task
information is smaller than data; to reduce data, we must apply some bias
    bias makes us blind; bias lets us see (the future)
timm: to convert data to info, we need two classes of tools:
    model-based
    data-based

Model-based
    BSC (balanced scorecard) perspective: perspectives, each with
        objectives (highest-level concepts)
        measures (some subset of which are the observables)
        targets (measure <op> threshold)
        initiatives (what are we doing about this)
    Tufte: informative diagrams
        space shuttle example
        skew the example with correlated columns

Data-based
    dimensionality reduction, e.g.
        remove correlated columns
        remove non-informative attributes
            TF-IDF: reject all but the top k most interesting words
                interesting if frequent OR rare
                F[i,j] = frequency of item i in thing j
                interesting = F[i,j] * log((number of things) / (number of things with item i))
            entropy
        remove irrelevant rows
    add synthetic attributes, e.g.
        values for missing attributes, e.g. combine other attributes
        decrease granularity on numerics, e.g. replace nums with quartiles

Validation
    best: on new data
    worst: on old data (can't predict error on new data)
    middle: cross-validation
        jack-knifing: leave one out
        bootstrap: 66% random samples
            note, trick: boosting - focus on prior mis-classified examples
        10-way: usual
        3-way: for small data sets

The distinction between model-based and data-based is soft: background models inform data processing.
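The TF-IDF weighting above can be sketched as follows, taking "things" to be documents and "items" to be words; the function name and the sample documents are illustrative, not from the notes.

```python
import math
from collections import Counter

def tfidf(docs):
    """Score each (word, doc) pair as F[i,j] * log(N / docs-containing-word-i)."""
    n = len(docs)
    # df[w] = number of documents containing word w
    df = Counter(w for doc in docs for w in set(doc))
    scores = {}
    for j, doc in enumerate(docs):
        freq = Counter(doc)  # F[i,j]
        for word, f in freq.items():
            scores[(word, j)] = f * math.log(n / df[word])
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "the", "cat"]]
scores = tfidf(docs)
# a word in every document ("the") scores 0; rarer words score higher
```

Note how the log term implements the "frequent OR rare" intuition: a word present in every document gets log(N/N) = 0 no matter how often it repeats, so only words that discriminate between documents survive the top-k cut.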
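The "replace nums with quartiles" step above can be sketched as follows; the cut-point choice (simple rank positions rather than interpolated percentiles) and the Q1..Q4 labels are assumptions for illustration.

```python
def quartile_bins(values):
    """Decrease granularity: map each numeric value to its quartile label."""
    ranked = sorted(values)
    n = len(ranked)
    # cut points at the 25th, 50th, and 75th rank positions
    cuts = [ranked[n // 4], ranked[n // 2], ranked[3 * n // 4]]

    def bin_of(v):
        for q, cut in enumerate(cuts, start=1):
            if v <= cut:
                return "Q%d" % q
        return "Q4"

    return [bin_of(v) for v in values]

labels = quartile_bins([1, 2, 3, 4, 5, 6, 7, 8])
```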
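The resampling schemes in the validation notes can be sketched as below. This shows only how the train/test splits are formed; the learner itself is out of scope, and the function names are illustrative. On the bootstrap, the expected fraction of distinct examples landing in the training sample is 1 - 1/e, about 63%, which is presumably what the "66%" in the notes approximates.

```python
import random

def leave_one_out(data):
    """Jack-knifing: yield (train, test) pairs, holding out one example at a time."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:], [data[i]]

def bootstrap(data, rng=random):
    """Draw n examples with replacement for training; unseen examples form the test set."""
    train = [rng.choice(data) for _ in data]
    test = [x for x in data if x not in train]
    return train, test

splits = list(leave_one_out([1, 2, 3]))
train, test = bootstrap(list(range(10)), random.Random(0))
```

10-way (and, for small data sets, 3-way) cross-validation sits between these: partition the data into k folds and rotate each fold through the test role.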