---
course: Mathematics of Machine Learning
course_year: II
question_number: 96
tags:
  - II
  - 2020
  - Mathematics of Machine Learning
title: Paper 1, Section II, J
year: 2020
---

(a) Let $Z_{1}, \ldots, Z_{n}$ be i.i.d. random elements taking values in a set $\mathcal{Z}$ and let $\mathcal{F}$ be a class of functions $f: \mathcal{Z} \rightarrow \mathbb{R}$. Define the Rademacher complexity $\mathcal{R}_{n}(\mathcal{F})$. Write down an inequality relating the Rademacher complexity and

$$\mathbb{E}\left(\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n}\left(f\left(Z_{i}\right)-\mathbb{E} f\left(Z_{i}\right)\right)\right)$$

State the bounded differences inequality.
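
For reference, one standard phrasing of the statements asked for here is the following. The Rademacher complexity of $\mathcal{F}$ is

$$\mathcal{R}_{n}(\mathcal{F})=\mathbb{E}\left(\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_{i} f\left(Z_{i}\right)\right),$$

where $\varepsilon_{1}, \ldots, \varepsilon_{n}$ are i.i.d. Rademacher random variables (uniform on $\{-1,1\}$) independent of $Z_{1}, \ldots, Z_{n}$; the symmetrisation inequality bounds the expectation displayed above by $2 \mathcal{R}_{n}(\mathcal{F})$; and the bounded differences (McDiarmid) inequality states that if $g: \mathcal{Z}^{n} \rightarrow \mathbb{R}$ changes by at most $c_{i}$ when its $i$th argument is changed with the others held fixed, then for all $t>0$,

$$\mathbb{P}\left(g\left(Z_{1: n}\right)-\mathbb{E}\, g\left(Z_{1: n}\right) \geqslant t\right) \leqslant \exp\left(-\frac{2 t^{2}}{\sum_{i=1}^{n} c_{i}^{2}}\right).$$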

(b) Now given i.i.d. input-output pairs $\left(X_{1}, Y_{1}\right), \ldots,\left(X_{n}, Y_{n}\right) \in \mathcal{X} \times\{-1,1\}$, consider performing empirical risk minimisation with misclassification loss over the class $\mathcal{H}$ of classifiers $h: \mathcal{X} \rightarrow\{-1,1\}$. Denote by $\hat{h}$ the empirical risk minimiser [which you may assume exists]. For any classifier $h$, let $R(h)$ be its misclassification risk and suppose this is minimised over $\mathcal{H}$ by $h^{*} \in \mathcal{H}$. Prove that with probability at least $1-\delta$,

$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \mathcal{R}_{n}(\mathcal{F})+\sqrt{\frac{2 \log (2 / \delta)}{n}}$$

for $\delta \in(0,1]$, where $\mathcal{F}$ is a class of functions $f: \mathcal{X} \times\{-1,1\} \rightarrow\{0,1\}$ related to $\mathcal{H}$ that you should specify.
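
A standard choice here is the misclassification-loss class

$$\mathcal{F}=\left\{(x, y) \mapsto \mathbf{1}_{\{h(x) \neq y\}}: h \in \mathcal{H}\right\},$$

so that $R(h)$ is the mean of the corresponding $f \in \mathcal{F}$ and the empirical risk is its empirical mean. One standard route to the bound writes $R(\hat{h})-R\left(h^{*}\right) \leqslant \sup_{f \in \mathcal{F}}\left(\mathbb{E} f\left(Z_{1}\right)-\frac{1}{n} \sum_{i=1}^{n} f\left(Z_{i}\right)\right)+\left(\hat{R}\left(h^{*}\right)-R\left(h^{*}\right)\right)$, where $\hat{R}$ denotes empirical risk, then controls each term with probability $1-\delta / 2$ via the bounded differences inequality, using symmetrisation to bound the expectation of the supremum by $2 \mathcal{R}_{n}(\mathcal{F})$.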

(c) Let $Z_{i}=\left(X_{i}, Y_{i}\right)$ for $i=1, \ldots, n$. Define the empirical Rademacher complexity $\hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)$. Show that with probability at least $1-\delta$,

$$R(\hat{h})-R\left(h^{*}\right) \leqslant 2 \hat{\mathcal{R}}\left(\mathcal{F}\left(Z_{1: n}\right)\right)+2 \sqrt{\frac{2 \log (3 / \delta)}{n}}$$
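
As an illustration of the quantity in part (c) (not part of the question itself), the empirical Rademacher complexity of a finite class can be estimated by averaging over random sign vectors, since it is an expectation over the Rademacher variables conditional on the data. The sketch below assumes the class is given as a matrix of evaluated function values; the function name and the example data are hypothetical.

```python
import numpy as np

def empirical_rademacher(F_vals, n_mc=10_000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity
    of a finite class, conditional on the data Z_{1:n}.

    F_vals : (m, n) array; row j holds (f_j(Z_1), ..., f_j(Z_n)).
    n_mc   : number of Rademacher sign vectors to average over.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, n = F_vals.shape
    # i.i.d. signs, uniform on {-1, +1}, one row per Monte Carlo draw
    eps = rng.choice([-1.0, 1.0], size=(n_mc, n))
    # (1/n) * <eps, f(Z_{1:n})> for every draw and every f in the class,
    # then the supremum over the class, averaged over the draws
    sups = (eps @ F_vals.T / n).max(axis=1)
    return sups.mean()

# Hypothetical example: 0/1 losses of two classifiers on n = 5 points,
# as in the loss class of part (b).
F_vals = np.array([[0, 1, 0, 0, 1],
                   [1, 0, 0, 1, 0]], dtype=float)
print(empirical_rademacher(F_vals))
```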