---
title: Active Regression
description: Adaptive sampling strategies for linear regression where observation costs are significant
keywords: [active regression, adaptive sampling, observation costs, sequential sampling, active learning, optimal design, sample complexity]
numbering:
  equation:
    enumerator: 7.%s
    continue: true
  proof:theorem:
    enumerator: 7.%s
    continue: true
  proof:algorithm:
    enumerator: 7.%s
    continue: true
  proof:definition:
    enumerator: 7.%s
    continue: true
  proof:proposition:
    enumerator: 7.%s
    continue: true
---

The active regression problem is a variant of the standard [linear regression problem].

:::{prf:definition} Active Linear Regression
Given $\vec{A}\in\R^{n\times d}$ (full rank) and $\vec{b}\in\R^n$, solve
\begin{equation*}
\min_{\vec{x}} \| \vec{b} - \vec{A}\vec{x} \|^2
\end{equation*}
using as few entry evaluations of $\vec{b}$ as possible.
:::
Since we measure cost by the number of entries of $\vec{b}$ that are observed, most of the algorithms presented in the [chapter](../04-Regression/regression) on linear regression, which require reading the entire vector $\vec{b}$, are off the table.

This section outlines basic sampling-based approaches to the active regression problem that aim to use a small number of entry evaluations.
<!-- 

### Basis independence

For convenience, we will work in an orthonormal basis for $\range(\vec{A})$. 
In particular, let $\vec{U}$ be a matrix with orthonormal columns such that $\vec{U}\vec{C} = \vec{A}$, for some (invertible) matrix $\vec{C}$ and consider the regression problem
\begin{equation*}
\min_{\vec{z}} \| \vec{b} - \vec{U}\vec{z} \|^2
\end{equation*}
The solution is $\vec{U}^\T\vec{b}$.


This change of variables does not impact the residual error nor does it impact sampling $\vec{b}$.
As noted in {prf:ref}`thm-residual-to-error`, residual bounds can be converted to error bounds. -->


# Leverage score sampling

Recall that $\Call{leverage-dist}(\vec{A})$ is the distribution that samples an index from $\{1, \ldots, n\}$ with probability proportional to the [leverage scores](def:leverage-score) $(\ell_1, \ldots, \ell_n)$ of $\vec{A}$.
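
As a rough illustration, the following NumPy sketch computes leverage scores and draws iid indices from the resulting distribution. It assumes the standard characterization of the leverage scores as the squared row norms of an orthonormal basis for $\range(\vec{A})$, obtained here from a thin QR factorization. The function names `leverage_scores` and `sample_leverage_dist` are hypothetical and chosen only for this example.

```python
import numpy as np


def leverage_scores(A):
    """Leverage scores of A, assuming A has full column rank."""
    # Thin QR: A = Q R, where Q (n x d) has orthonormal columns spanning range(A).
    Q, _ = np.linalg.qr(A)
    # The i-th leverage score is the squared Euclidean norm of the i-th row of Q.
    return np.sum(Q**2, axis=1)


def sample_leverage_dist(A, k, rng):
    """Draw k iid indices from {0, ..., n-1} with probability proportional to the leverage scores."""
    ell = leverage_scores(A)
    p = ell / ell.sum()  # normalize; the scores sum to d when A has full column rank
    return rng.choice(A.shape[0], size=k, replace=True, p=p)


# Small usage example (0-based indices, unlike the 1-based indexing in the text).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
indices = sample_leverage_dist(A, k=10, rng=rng)
```

Note that computing the leverage scores only requires access to $\vec{A}$, so this step does not use any entry evaluations of $\vec{b}$.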


:::{prf:algorithm} Regression by Leverage-score sampling
:label: alg-leverage-regression
**Input:** $\vec{A}$, $\vec{b}$, number of samples $k$ 

1. Sample iid indices $s_1, \ldots, s_k\sim \Call{leverage-dist}(\vec{A})$
1. Form 
\begin{equation*}
\widehat{\vec{A}} := 
\begin{bmatrix}
- & \ell_{s_1}^{-1/2}  \vec{a}_{s_1}^\T & -\\
&\vdots \\
- & \ell_{s_k}^{-1/2}  \vec{a}_{s_k}^\T & -\\
\end{bmatrix}
,\quad
\widehat{\vec{b}} := 
\begin{bmatrix}
\ell_{s_1}^{-1/2}  b_{s_1} \\ \vdots \\ \ell_{s_k}^{-1/2} b_{s_k}
\end{bmatrix}
\end{equation*} 
1. Obtain the solution $\widehat{\vec{x}}$ to the least squares problem $\min_{\vec{x}} \|\widehat{\vec{b}} - \widehat{\vec{A}}\vec{x}\|$.


**Output:** $\widehat{\vec{x}}$
::: 
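
The steps of the algorithm above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the leverage scores are computed exactly from a thin QR factorization of $\vec{A}$, and $\vec{b}$ is accessed only through a hypothetical callable `b_entry(i)` that returns $b_i$, so that entry evaluations are explicit. The function name `leverage_score_regression` is also an assumption made for this example.

```python
import numpy as np


def leverage_score_regression(A, b_entry, k, rng):
    """Approximate least squares via leverage-score sampling.

    A       : (n, d) array with full column rank
    b_entry : callable returning the i-th entry of b (hypothetical access model)
    k       : number of iid samples
    """
    n, d = A.shape

    # Leverage scores: squared row norms of an orthonormal basis for range(A).
    Q, _ = np.linalg.qr(A)
    ell = np.sum(Q**2, axis=1)

    # Step 1: sample k iid indices with probability proportional to the leverage scores.
    s = rng.choice(n, size=k, replace=True, p=ell / ell.sum())

    # Step 2: form the sampled and rescaled system; row j is scaled by ell_{s_j}^{-1/2}.
    w = ell[s] ** -0.5
    A_hat = w[:, None] * A[s]
    b_hat = w * np.array([b_entry(i) for i in s])  # the only place b is read

    # Step 3: solve the small k x d least squares problem.
    x_hat, *_ = np.linalg.lstsq(A_hat, b_hat, rcond=None)
    return x_hat, s


# Small usage example with a noisy right-hand side.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
b = A @ np.ones(5) + 0.1 * rng.standard_normal(2000)
x_hat, s = leverage_score_regression(A, lambda i: b[i], k=200, rng=rng)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
ratio = np.linalg.norm(b - A @ x_hat) ** 2 / np.linalg.norm(b - A @ x_star) ** 2
print(f"residual ratio: {ratio:.3f}")
```

In this sketch `b_entry` is called once per sample, so at most $k$ entries of $\vec{b}$ are observed; caching repeated indices would reduce this further.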


# Analysis

Note that {prf:ref}`alg-leverage-regression` is nothing more than {prf:ref}`alg-sketch-and-solve` (sketch-and-solve) using the [Leverage-score Sketch](def:leverage-score-sketch).
{prf:ref}`prf-leverage-SE` guarantees that the leverage-score sketch is a subspace embedding for $\vec{A}$.
However, we cannot immediately apply the techniques used in the analysis of [](../04-Regression/sketch-and-solve), because these require that the sketch is a subspace embedding for $[\vec{A},\vec{b}]$. 
The standard approach to the analysis is instead to make use of an [](./approximate-matrix-multiplication.md) guarantee.


:::{prf:theorem}
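Let $\varepsilon, \delta \in (0,1)$, let $\vec{x}^*$ be the solution to $\min_{\vec{x}} \| \vec{b} - \vec{A}\vec{x} \|^2$, and let $\widehat{\vec{x}}$ be the output of {prf:ref}`alg-leverage-regression` run with $k$ samples.
Then for some 
\begin{equation*}
k = O\left( d \log\left( \frac{d}{\delta}\right) + \frac{d}{\delta\varepsilon} \right),
\end{equation*}
except with probability at most $\delta$, it holds that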
\begin{equation*}
\| \vec{b} - \vec{A}\widehat{\vec{x}} \|^2 \leq (1+\varepsilon) \|\vec{b} - \vec{A}\vec{x}^* \|^2.
\end{equation*}
:::


:::{prf:proof}
:class: dropdown
:enumerated: false

See Raphael's [wiki](https://randnla.github.io/leverage-score-regression/).
:::
