Exercise 5.1¶
Here is a confusion matrix for a classifier of joke funniness.
Considering funny to be the positive outcome, calculate the (a) recall, (b) precision, (c) specificity, (d) accuracy, and (e) $F_1$ score of the classifier.
# Solution
TP = 648
FP = 45
TN = 1004
FN = 78
# following the table in Definition 5.3
print("accuracy: ", (TP+TN)/(TP+FP+TN+FN))
recall = TP/(TP+FN)
print("recall: ", recall)
print("specificity:", TN/(TN+FP))
precision = TP/(TP+FP)
print("precision: ", TP/(TP+FP))
# Definition 5.4
F1 = 1/(0.5*(1/precision + 1/recall))
print("F1 score: ", F1)
accuracy: 0.9307042253521127 recall: 0.8925619834710744 specificity: 0.9571020019065777 precision: 0.935064935064935 F1 score: 0.9133192389006343
Exercise 5.2¶
Here is a confusion matrix for a classifier of ice cream flavours.
(a) Calculate the recall rate for chocolate.
(b) Find the precision for vanilla.
(c) Find the accuracy for strawberry.
Solution: In the multiclass case we apply the one-vs-rest paradigm to reduce to binary. For example, the number of true positives for vanilla is $\mathrm{TP}=75$, while there are $\mathrm{FP}=8+11=19$ false positives. Therefore,
$$ \mathrm{precision\ for\ vanilla} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \dfrac{75}{94} \approx 0.798. $$
Solution: Applying Definition 5.5, this set has $K=3$ labels and
$$ p_1 = 1/6, \quad p_2 = 2/6, \quad p_3 = 3/6. $$
The Gini impurity of $S$ is
$$ H(S) = \sum_{k=1}^K p_k (1-p_k) = \frac{1}{6}\cdot\frac{5}{6} + \frac{2}{6}\cdot\frac{4}{6} + \frac{3}{6}\cdot\frac{3}{6} \approx 0.6111. $$
Exercise 5.4¶
Explain why for the binary case $K=2$, Definition 5.5 implies that $H(S)$ is maximized when the two classes are equally represented in $S$.
Solution: In the binary case, one class appears with probability $p$ and the other with probability $1-p$. Hence,
$$ H = p (1-p) + (1-p) p = 2p - 2p^2. $$
This is a quadratic function that attains its maximum at $p=1/2$ (you can check the derivative to verify).
Exercise 5.5¶
Given $x_i=i$ for $i=0,\ldots,5$, with labels $$ y_0=y_4=y_5=A, \quad y_1=y_2=y_3=B, $$ write Python code to find an optimal partition threshold using Gini impurity.
# Solution
# This is similar to Example 5.13, but here solve the problem with some code.
# The smallest Q value of 1.5 is achieved for theta = 3,
# resulting in the partition [A, B, B, B] | [A, A]
data = { (0,'A'), (1,'B'), (2,'B'), (3,'B'), (4,'A'), (5,'A') }
def gini(S):
""""Gini impurity of a list."""
g = 0
unique_el = set(S)
for el in unique_el:
p = S.count(el)/len(S)
g += p*(1-p)
return g
for theta in range(-1,6):
SL = [ y for (x,y) in data if x<=theta ]
SR = [ y for (x,y) in data if x>theta ]
Q = len(SL)*gini(SL) + len(SR)*gini(SR)
print("{:2} {:32} {:32} {:3.1f}".format(theta, str(SL), str(SR), Q))
-1 [] ['B', 'A', 'A', 'B', 'A', 'B'] 3.0 0 ['A'] ['B', 'A', 'A', 'B', 'B'] 2.4 1 ['A', 'B'] ['B', 'A', 'A', 'B'] 3.0 2 ['B', 'A', 'B'] ['A', 'A', 'B'] 2.7 3 ['B', 'B', 'A', 'B'] ['A', 'A'] 1.5 4 ['B', 'A', 'B', 'A', 'B'] ['A'] 2.4 5 ['B', 'A', 'A', 'B', 'A', 'B'] [] 3.0
Exercise 5.6¶
For the decision tree drawn in Example 5.10, make predictions for each of the following queries, showing the path taken through the tree for each case:
(a) $(x_1, x_2) = (4, 5),\quad$ (b) $(x_1, x_2) = (-3, 1),\quad$ (c) $(x_1, x_2) = (10, -1)$.
Solution:
(a) Will go to leaf A and hence predict "T" by majority vote.
(b) Will go to leaf B and hence predict "H".
(c) Will go to leaf C and hence predict "H" by majority vote.
Exercise 5.7¶
Using 1-norm, 2-norm, and $\infty$-norm, find the distance between the given vectors:
(a) $\mathbf u=[2,3,0], \ \mathbf v=[-2,2,1]$
(b) $\mathbf u=[0,1,0,1,0], \ \mathbf v=[1,1,1,1,1]$
Solution: The distance metric is defined via $\mathrm{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$. The rest is straightforward calculation using the definitions of the respective norms.
Exercise 5.8¶
(a) Prove that for any $\mathbf u \in \mathbb{R}^d$, $\|\mathbf u\|_\infty \le \| \mathbf u\|_2$.
(b) Prove that for any $\mathbf u \in \mathbb{R}^d$, $\|\mathbf u\|_2 \le \sqrt{d}\, \|\mathbf u\|_\infty$.
Solution:
(a) Let $u_j$ be an entry of $\mathbf{u}$ such that $|u_j| = \|\mathbf{u}\|_\infty$. Then
$$ \|\mathbf{u}\|_2^2 = \sum_{i=1}^d | u_i |^2 \geq | u_j |^2 = \|\mathbf{u}\|_\infty^2. $$
(b) We have $|u_i| \leq \|\mathbf{u}\|_\infty$ for all $i=1,\ldots,d$. Hence,
$$ \| \mathbf{u}\|_2^2 = \sum_{i=1}^d |u_i |^2 \leq \sum_{i=1}^d \|\mathbf{u}\|_\infty^2 = d \|\mathbf{u}\|_\infty^2. $$
Exercise 5.9¶
Carefully sketch the set of all points in $\mathbb{R}^2$ whose 1-norm distance from the origin equals 1. This is a Manhattan unit circle. (Hint: You can consider each quadrant of the plane separately.)
# Solution
import numpy as np
import matplotlib.pyplot as plt
# corner points for the 1-norm unit circle
x = np.array([1, 0, -1, 0, 1])
y = np.array([0, 1, 0, -1, 0])
plt.figure(figsize=(6, 6))
plt.plot(x, y)
plt.gca().set_aspect('equal', adjustable='box')
plt.xlabel('x'); plt.ylabel('y')
plt.title('1-norm unit circle');
Exercise 5.10¶
Suppose you are training to fit the ground-truth one-dimensional classifier
$$ y = f(x) = \begin{cases} +1, & \text{if } |x| \leq 2, \\ -1, & \text{otherwise}. \end{cases} $$
Here is a table of training data:
| $x_i$ | $-5$ | $-4$ | $-3$ | $-2$ | $-1$ | $0$ | $1$ | $2$ | $3$ | $4$ | $5$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $y_i$ | $-1$ | $-1$ | $+1$ | $-1$ | $+1$ | $+1$ | $-1$ | $+1$ | $-1$ | $-1$ | $-1$ |
(a) Here is a table of testing data:
| $t_i$ | $-4.75$ | $-3.75$ | $-2.75$ | $-1.75$ | $-0.75$ | $0.25$ | $1.25$ | $2.25$ | $3.25$ | $4.25$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $f(t_i)$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ | $\,$ |
Fill in the second row.
(b) Add a row to your table from part (a) showing the predictions of a kNN classifier with $k=1$ trained on the given training data.
(c) Add another row for a kNN classifier with $k=3$. Then add another row for $k=9$.
(d) Find the testing precision and recall for the rows with $k=1,3,9$, considering $+1$ to be the positive outcome.
Solution:
We provide the solution for $k=1$:
| $t_i$ | $-4.75$ | $-3.75$ | $-2.75$ | $-1.75$ | $-0.75$ | $0.25$ | $1.25$ | $2.25$ | $3.25$ | $4.25$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $f(t_i)$ | $-1$ | $-1$ | $-1$ | $+1$ | $+1$ | $+1$ | $+1$ | $-1$ | $-1$ | $-1$ |
| kNN (k=1) | $-1$ | $-1$ | $+1$ | $-1$ | $+1$ | $+1$ | $-1$ | $+1$ | $-1$ | $-1$ |
With this, we have $\mathrm{TP} = 2$, $\mathrm{FP} = 2$, $\mathrm{TN} = 4$, $\mathrm{FN} = 2$ for the test data. Therefore,
$$ \text{precision} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} = \frac{2}{4}, \quad \text{recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} = \frac{2}{4}. $$
Exercise 5.11¶
Here are blue/orange labels on an integer lattice.
Let $\hat{f}(x_1,x_2)$ be the kNN probabilistic classifier with $k=4$, Euclidean metric, and mean averaging that returns the probability of a blue label. In each case below, a function $g(t)$ is defined from values of $\hat{f}$ along a vertical or horizontal line. Carefully sketch a plot of $g(t)$ for $-2\le t \le 2$.
(a) $g(t) = \hat{f}(1.2,t)$
(b) $g(t) = \hat{f}(t,-0.75)$
(c) $g(t) = \hat{f}(t,1.6)$
(d) $g(t) = \hat{f}(-0.25,t)$
Exercise 5.12¶
Suppose you have written a kNN classifier with $k=5$ to predict whether dad jokes are funny. Here are the votes for 6 test jokes:
| Joke | Funny | Not funny | Actual |
|---|---|---|---|
| Is the refrigerator running? Better go catch it! | 3 | 2 | not funny |
| Why did the scarecrow win an award? Because he was outstanding in his field! | 1 | 4 | funny |
| What do you call a fake noodle? An impasta! | 5 | 0 | funny |
| Why couldn’t the bicycle stand up by itself? It was two-tired! | 4 | 1 | funny |
| What’s blue and not heavy? Light blue. | 2 | 3 | not funny |
| What do you call a pile of cats? A meowtain! | 0 | 5 | not funny |
Carefully sketch the ROC curve for this classifier, considering the positive outcome to be "funny."