Learning

Bayes Classifiers

A Bayes classifier is a simple statistically-based learning scheme.

Advantages:

- Fast to train and to classify: one pass over the data collects all the frequency counts needed.
- Not sensitive to irrelevant features.
- Handles missing values by simply omitting them from the counts.

Assumptions:

- All attributes are equally important.
- Attributes are statistically independent, given the class value. Independence is rarely true in practice, yet the scheme still tends to perform well.

It has some drawbacks: it can offer conclusions, but it is poor at explaining how those conclusions were reached. We will return to that below.

Example

weather.symbolic.arff

outlook  temperature  humidity   windy   play
-------  -----------  --------   -----   ----
rainy    cool        normal    TRUE    no
rainy    mild        high      TRUE    no
sunny    hot         high      FALSE   no
sunny    hot         high      TRUE    no
sunny    mild        high      FALSE   no
overcast cool        normal    TRUE    yes
overcast hot         high      FALSE   yes
overcast hot         normal    FALSE   yes
overcast mild        high      TRUE    yes
rainy    cool        normal    FALSE   yes
rainy    mild        high      FALSE   yes
rainy    mild        normal    FALSE   yes
sunny    cool        normal    FALSE   yes
sunny    mild        normal    TRUE    yes

This data can be summarized as follows:


           Outlook            Temperature           Humidity
====================   =================   =================
          Yes    No            Yes   No            Yes    No
Sunny       2     3     Hot     2     2    High      3     4
Overcast    4     0     Mild    4     2    Normal    6     1
Rainy       3     2     Cool    3     1
          -----------         ---------            ----------
Sunny     2/9   3/5     Hot   2/9   2/5    High    3/9   4/5
Overcast  4/9   0/5     Mild  4/9   2/5    Normal  6/9   1/5
Rainy     3/9   2/5     Cool  3/9   1/5

            Windy        Play
=================    ========
      Yes     No     Yes   No
False 6      2       9     5
True  3      3
      ----------   ----------
False  6/9    2/5   9/14  5/14
True   3/9    3/5
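The count tables above can be built mechanically. Here is a minimal Python sketch (illustrative code, not Weka's implementation), assuming the data is held as a list of rows:

```python
from collections import Counter

# weather.symbolic rows: outlook, temperature, humidity, windy, play
DATA = [line.split() for line in """\
rainy cool normal TRUE no
rainy mild high TRUE no
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no
overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes
overcast mild high TRUE yes
rainy cool normal FALSE yes
rainy mild high FALSE yes
rainy mild normal FALSE yes
sunny cool normal FALSE yes
sunny mild normal TRUE yes""".splitlines()]

def counts(data, col):
    """Tally (attribute value, class) pairs for one column."""
    return Counter((row[col], row[-1]) for row in data)

outlook = counts(DATA, 0)
print(outlook[("sunny", "yes")], outlook[("sunny", "no")])        # 2 3
print(outlook[("overcast", "yes")], outlook[("overcast", "no")])  # 4 0
```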

So, what happens on a new day:

Outlook       Temp.         Humidity    Windy         Play
Sunny         Cool          High        True          ?

First find the likelihood of the two classes by multiplying the relevant fractions from the tables above:

Likelihood of yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 ≈ 0.0053
Likelihood of no  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 ≈ 0.0206

Normalizing so the two numbers sum to one:

P(yes) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%
P(no)  = 0.0206 / (0.0053 + 0.0206) ≈ 79.5%

So, we aren't playing golf today.

Bayes' rule

More generally, the above is just an application of Bayes' Theorem. For a hypothesis H (here, a class) and evidence E (here, the attribute values):

Pr[H | E] = Pr[E | H] × Pr[H] / Pr[E]

The "naive" assumption is that the pieces of evidence E1, E2, ..., En are independent, given the class, so:

Pr[H | E] = Pr[E1 | H] × Pr[E2 | H] × ... × Pr[En | H] × Pr[H] / Pr[E]

Return the classification with the highest probability.
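The whole procedure fits in a few lines of Python (an illustrative sketch, not Weka's implementation):

```python
from collections import Counter

# weather.symbolic rows: outlook, temperature, humidity, windy, play
DATA = [line.split() for line in """\
rainy cool normal TRUE no
rainy mild high TRUE no
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no
overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes
overcast mild high TRUE yes
rainy cool normal FALSE yes
rainy mild high FALSE yes
rainy mild normal FALSE yes
sunny cool normal FALSE yes
sunny mild normal TRUE yes""".splitlines()]

def naive_bayes(data, instance):
    """Score each class H with Pr[E1|H] * ... * Pr[En|H] * Pr[H]."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for h, count in class_counts.items():
        score = count / len(data)                      # prior Pr[H]
        for col, value in enumerate(instance):
            matches = sum(1 for row in data
                          if row[-1] == h and row[col] == value)
            score *= matches / count                   # likelihood Pr[Ei|H]
        scores[h] = score
    return scores

scores = naive_bayes(DATA, ["sunny", "cool", "high", "TRUE"])
total = sum(scores.values())
print({h: round(s / total, 3) for h, s in scores.items()})
# -> {'no': 0.795, 'yes': 0.205}
```

Note that Pr[E] is never computed: it is the same for every class, so dividing each score by the total achieves the same normalization.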

Decision Trees

Bayes classifiers perform well, but they do not explain their performance. If you ask "what is going on? how does it make its decisions?", there is no answer other than browsing the complicated frequency-count tables.

Q: So, how do we learn a decision tree whose leaves are classifications and whose internal nodes are tests on attributes?

curb-weight <= 2660 :
|   curb-weight <= 2290 :
|   |   curb-weight <= 2090 :
|   |   |   length <= 161 : price=6220
|   |   |   length >  161 : price=7150
|   |   curb-weight >  2090 : price=8010
|   curb-weight >  2290 :
|   |   length <= 176 : price=9680
|   |   length >  176 :
|   |   |   normalized-losses <= 157 : price=10200
|   |   |   normalized-losses >  157 : price=15800
curb-weight >  2660 :
|   width <= 68.9 : price=16100
|   width >  68.9 : price=25500

Preliminaries: Entropy

Entropy measures, in bits, how "mixed up" a collection of symbols is:

entropy(p1, p2, ..., pn) = -p1*log2(p1) - p2*log2(p2) - ... - pn*log2(pn)

where p1, p2, ..., pn are the proportions of each symbol. A pure collection (one symbol only) has entropy 0; a 50/50 mix of two symbols has entropy 1.
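The entropy calculation translates directly into Python (an illustrative sketch; here `entropy` takes raw symbol counts rather than proportions):

```python
import math

def entropy(counts):
    """Entropy, in bits, of a list of symbol counts, e.g. [9, 5]."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total)
               for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94 -> the un-split play column
print(entropy([4, 0]))             # 0.0  -> a pure split costs no bits
print(entropy([7, 7]))             # 1.0  -> a 50/50 split costs one bit
```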

Iterative Dichotomization (ID3)

Back to tree learning...

Example

Which feature split generates symbols that are less mixed up?

Which is the best attribute to split on?

Compare the number of bits required to encode the splits with the number of bits required to encode the un-split data.

e.g. Outlook = sunny: the yes/no counts are [2,3], so
info([2,3]) = entropy(2/5, 3/5) ≈ 0.971 bits

Outlook = overcast: the counts are [4,0], so
info([4,0]) = entropy(4/4, 0/4) = 0 bits

Outlook = rainy: the counts are [3,2], so
info([3,2]) = entropy(3/5, 2/5) ≈ 0.971 bits

Expected info for Outlook = weighted sum of the above
= (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 ≈ 0.693 bits

Computing the information gain:

gain(Outlook) = info([9,5]) - expected info for Outlook
              ≈ 0.940 - 0.693 = 0.247 bits
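The gain computation for Outlook can be sketched in Python (illustrative code, not Weka's):

```python
import math

def entropy(counts):
    """Entropy, in bits, of a list of symbol counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total)
               for c in counts if c > 0)

def expected_info(branches):
    """Weighted entropy of per-branch [yes, no] counts after a split."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * entropy(b) for b in branches)

before = entropy([9, 5])                          # whole data set
after = expected_info([[2, 3], [4, 0], [3, 2]])   # split on Outlook
print(round(before - after, 3))                   # gain(Outlook) = 0.247
```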

Split recursively: pick the attribute with the highest gain, partition the data on its values, then apply the same procedure to each branch, stopping when all the instances in a branch have the same class (or no attributes remain).
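The recursion can be sketched as a toy ID3-style learner (illustrative Python, not Weka's J48; the nested-dict tree representation is a choice made here for brevity):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """rows: list of {attribute: value} dicts.
    Returns a class symbol, or a nested tree {attribute: {value: subtree}}."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class

    def expected_info(attr):                          # bits left after splitting
        info = 0.0
        for value in set(row[attr] for row in rows):
            branch = [l for row, l in zip(rows, labels) if row[attr] == value]
            info += len(branch) / len(rows) * entropy(branch)
        return info

    best = min(attributes, key=expected_info)         # min expected info = max gain
    tree = {best: {}}
    rest = [a for a in attributes if a != best]
    for value in set(row[best] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3([r for r, _ in sub], [l for _, l in sub], rest)
    return tree

COLS = ["outlook", "temperature", "humidity", "windy"]
rows, labels = [], []
for line in """\
rainy cool normal TRUE no
rainy mild high TRUE no
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no
overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes
overcast mild high TRUE yes
rainy cool normal FALSE yes
rainy mild high FALSE yes
rainy mild normal FALSE yes
sunny cool normal FALSE yes
sunny mild normal TRUE yes""".splitlines():
    *values, play = line.split()
    rows.append(dict(zip(COLS, values)))
    labels.append(play)

tree = id3(rows, labels, COLS)
# The root test comes out as outlook, e.g. tree["outlook"]["overcast"] == "yes"
```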