Review questions w12
These questions all relate to the following data:
outlook, temp, humid, windy, play
------- ---- ----- ----- ----
rainy, cool, normal, TRUE, no
rainy, mild, high, TRUE, no
sunny, hot, high, TRUE, no
sunny, hot, high, FALSE, no
sunny, mild, high, FALSE, no
overcast, cool, normal, TRUE, yes
overcast, mild, high, TRUE, yes
sunny, mild, normal, TRUE, yes
overcast, hot, high, FALSE, yes
overcast, hot, normal, FALSE, yes
rainy, cool, normal, FALSE, yes
rainy, mild, high, FALSE, yes
rainy, mild, normal, FALSE, yes
sunny, cool, normal, FALSE, yes
- Bayes classifiers:
- Define Bayes' theorem, explaining all terms.
- Write down the symbol frequency counts for the above data.
- If outlook=rainy, temp=hot, humid=high, and windy=FALSE, apply Bayes' theorem to determine if we will play golf tomorrow. Show all working.
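One way to check hand working for the Bayes questions is a short script. The sketch below assumes a naive Bayes reading of the question (attributes treated as independent given the class) and applies no smoothing for zero counts. The names `freq`, `scores`, and `posterior` are illustrative, not part of the question.

```python
from collections import Counter

COLUMNS = ["outlook", "temp", "humid", "windy", "play"]
DATA = """\
rainy, cool, normal, TRUE, no
rainy, mild, high, TRUE, no
sunny, hot, high, TRUE, no
sunny, hot, high, FALSE, no
sunny, mild, high, FALSE, no
overcast, cool, normal, TRUE, yes
overcast, mild, high, TRUE, yes
sunny, mild, normal, TRUE, yes
overcast, hot, high, FALSE, yes
overcast, hot, normal, FALSE, yes
rainy, cool, normal, FALSE, yes
rainy, mild, high, FALSE, yes
rainy, mild, normal, FALSE, yes
sunny, cool, normal, FALSE, yes
"""
rows = [dict(zip(COLUMNS, line.split(", "))) for line in DATA.strip().splitlines()]

# Frequency counts: rows per class, and rows per (attribute, value, class).
class_counts = Counter(r["play"] for r in rows)
freq = Counter((a, r[a], r["play"]) for r in rows for a in COLUMNS[:-1])

def scores(query):
    """Unnormalized P(class) * product of P(value | class), assuming independence."""
    result = {}
    for cls, n in class_counts.items():
        p = n / len(rows)                      # prior P(class)
        for attr, value in query.items():
            p *= freq[(attr, value, cls)] / n  # likelihood P(value | class)
        result[cls] = p
    return result

query = {"outlook": "rainy", "temp": "hot", "humid": "high", "windy": "FALSE"}
raw = scores(query)
total = sum(raw.values())                      # the evidence term, P(query)
posterior = {cls: p / total for cls, p in raw.items()}
prediction = max(posterior, key=posterior.get)
print(class_counts, raw, posterior, prediction)
```

Under these assumptions the unnormalized score for play=no (about 0.018) exceeds the score for play=yes (about 0.011), so the script predicts no. The hand-worked answer should still show each count and each fraction.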
- Decision tree learning:
- Entropy is a measure of "mixed-up-ness" for discrete classes. Write down the entropy formula. Compute the entropy of 6 apples, 0 pears, and 2 oranges.
- Define iterative dichotomization using the terms "measure, split, recurse, condense" as used in C4.5.
- Here is some data describing the health of the residents of the farming town of Little Swansey, Indiana. If you had to split these data into two ranges, where would you do it and why?
ageRange  #healthy  #sick
--------  --------  -----
0-10      1000      10
10-20     1000      50
20-30     800       40
30-40     510       45
40-50     490       40
50-60     100       100
60-70     35        50
70-80     10        40
80-90     2         30
90-100    1         20
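One possible criterion for choosing the split is to try every boundary between adjacent age ranges and keep the one that leaves the two sides least mixed, measured by weighted entropy over healthy/sick. The sketch below assumes that criterion (others, such as comparing sick rates, are equally defensible) and identifies each range by its upper bound. `BINS`, `split_score`, and `best_boundary` are illustrative names.

```python
from math import log2

# (upper bound of age range, #healthy, #sick), copied from the table above.
BINS = [
    (10, 1000, 10), (20, 1000, 50), (30, 800, 40), (40, 510, 45), (50, 490, 40),
    (60, 100, 100), (70, 35, 50), (80, 10, 40), (90, 2, 30), (100, 1, 20),
]
TOTAL = sum(h + s for _, h, s in BINS)

def entropy(counts):
    """Shannon entropy in bits of a list of class counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def split_score(k):
    """Weighted entropy after splitting into BINS[:k] (younger) and BINS[k:] (older)."""
    score = 0.0
    for part in (BINS[:k], BINS[k:]):
        healthy = sum(b[1] for b in part)
        sick = sum(b[2] for b in part)
        score += (healthy + sick) / TOTAL * entropy([healthy, sick])
    return score

# Candidate splits fall between adjacent rows, so k runs from 1 to len(BINS) - 1.
best_k = min(range(1, len(BINS)), key=split_score)
best_boundary = BINS[best_k - 1][0]
print("split at age", best_boundary, "weighted entropy", round(split_score(best_k), 3))
```

On this table the criterion picks the boundary at age 50, where the population shifts from mostly healthy to mostly sick. A written answer should still explain why that boundary separates the two groups.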
- Using entropy, find the best top-level split for playing golf (using the data at the top of the page). Show all working.
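The entropy questions above can be checked with a small script. This sketch assumes Shannon entropy in bits, with classes of count zero contributing nothing, and scores each attribute by information gain (entropy of play before the split minus the weighted entropy after it). The names `entropy`, `info_gain`, and `gains` are illustrative.

```python
from collections import Counter
from math import log2

COLUMNS = ["outlook", "temp", "humid", "windy", "play"]
DATA = """\
rainy, cool, normal, TRUE, no
rainy, mild, high, TRUE, no
sunny, hot, high, TRUE, no
sunny, hot, high, FALSE, no
sunny, mild, high, FALSE, no
overcast, cool, normal, TRUE, yes
overcast, mild, high, TRUE, yes
sunny, mild, normal, TRUE, yes
overcast, hot, high, FALSE, yes
overcast, hot, normal, FALSE, yes
rainy, cool, normal, FALSE, yes
rainy, mild, high, FALSE, yes
rainy, mild, normal, FALSE, yes
sunny, cool, normal, FALSE, yes
"""
rows = [dict(zip(COLUMNS, line.split(", "))) for line in DATA.strip().splitlines()]

def entropy(counts):
    """Shannon entropy in bits of a list of class counts; zero counts contribute 0."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def class_entropy(subset):
    """Entropy of the play column within a subset of rows."""
    return entropy(list(Counter(r["play"] for r in subset).values()))

def info_gain(attribute):
    """Entropy of play before splitting on attribute, minus weighted entropy after."""
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * class_entropy(subset)
    return class_entropy(rows) - after

fruit_entropy = entropy([6, 0, 2])   # 6 apples, 0 pears, 2 oranges
gains = {a: info_gain(a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)
print("fruit entropy:", round(fruit_entropy, 3))
print("gains:", {a: round(g, 3) for a, g in gains.items()}, "best:", best)
```

Under these assumptions the attribute with the largest gain on the golf data is outlook, largely because every overcast row has play=yes. The script only confirms the arithmetic; the question still asks for the per-attribute working.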