course |
course_year |
question_number |
tags |
title |
year |
Mathematics of Machine Learning |
II |
96 |
II |
2020 |
Mathematics of Machine Learning |
|
Paper 1, Section II, J |
2020 |
(a) Let $Z_{1}, \ldots, Z_{n}$ be i.i.d. random elements taking values in a set $\mathcal{Z}$ and let $\mathcal{F}$ be a class of functions $f: \mathcal{Z} \rightarrow \mathbb{R}$. Define the Rademacher complexity $\mathcal{R}_{n}(\mathcal{F})$. Write down an inequality relating the Rademacher complexity and
$$\mathbb{E}\left(\sup {f \in \mathcal{F}} \frac{1}{n} \sum{i=1}^{n}\left(f\left(Z_{i}\right)-\mathbb{E} f\left(Z_{i}\right)\right)\right)$$
State the bounded differences inequality.
(b) Now given i.i.d. input-output pairs $\left(X_{1}, Y_{1}\right), \ldots,\left(X_{n}, Y_{n}\right) \in \mathcal{X} \times{-1,1}$ consider performing empirical risk minimisation with misclassification loss over the class $\mathcal{H}$ of classifiers $h: \mathcal{X} \rightarrow{-1,1}$. Denote by $\hat{h}$ the empirical risk minimiser [which you may assume exists]. For any classifier $h$, let $R(h)$ be its misclassification risk and suppose this is minimised over $\mathcal{H}$ by $h^{*} \in \mathcal{H}$. Prove that with probability at least $1-\delta$,
$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \mathcal{R}_{n}(\mathcal{F})+\sqrt{\frac{2 \log (2 / \delta)}{n}}$$
for $\delta \in(0,1]$, where $\mathcal{F}$ is a class of functions $f: \mathcal{X} \times{-1,1} \rightarrow{0,1}$ related to $\mathcal{H}$ that you should specify.
(c) Let $Z_{i}=\left(X_{i}, Y_{i}\right)$ for $i=1, \ldots, n$. Define the empirical Rademacher complexity $\hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)$. Show that with probability at least $1-\delta$,
$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)+2 \sqrt{\frac{2 \log (3 / \delta)}{n}}$$