

# CoLA Library Exercise

This exercise will help you get familiar with CoLA.

## Installation
You need a Python 3.10+ environment with either JAX or PyTorch installed (or both). You can install CoLA using pip as follows:

In [None]:
%pip install git+https://github.com/wilson-labs/cola.git
    
%load_ext autoreload
%autoreload 2


Alternatively, you can open the documentation in Colab and start working from there: [Quick Start](https://colab.research.google.com/github/wilson-labs/cola/blob/master/docs/notebooks/colabs/Quick_Start.ipynb)

We strongly recommend that you read through the [documentation](https://cola.readthedocs.io/en/latest/index.html) to understand the library better.

## Basic Exercises
We'll start with some basic exercises to get you warmed up for later. For each of the following, you can use either JAX or PyTorch. We recommend that you try both and see whether you spot any difference in the behavior of CoLA.

As explained in [Linear Operators: What and Why?](https://cola.readthedocs.io/en/latest/notebooks/LinOpIntro.html), `LinearOperators` are an efficient and scalable way to represent matrices. To illustrate how to use the `LinearOperators` available in `CoLA`, let's create a `Dense` `LinearOperator` using PyTorch.

In [7]:
import cola
import torch

dtype = torch.float32
A = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]], dtype=dtype)
A_op = cola.ops.Dense(A)
print(A_op)
print(type(A_op))
print(A_op.dtype)
print(A_op.shape)

Dense
<class 'cola.ops.operators.Dense'>
torch.float32
torch.Size([3, 3])


As you can see from the printed output, `A_op` is no longer a `torch.Tensor`!

The `LinearOperator` class is quite simple: it needs only three things, (1) a `dtype`, (2) a `shape` and (3) a `matmul` function. In the dense case above, all three are taken from the matrix $A$.

In [8]:
ones = torch.ones(size=(A.shape[0],), dtype=dtype)
print(A_op @ ones)
print(A @ ones)

tensor([ 6., 15., 24.])
tensor([ 6., 15., 24.])


In both cases we get the same results and, notably, we can use the same syntax that we are accustomed to!
You can find all the available operators [here](https://cola.readthedocs.io/en/latest/package/cola.ops.html).
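
To make the three requirements concrete, here is a minimal NumPy sketch of a matrix-free operator. The class `MatrixFreeOperator` is hypothetical and written only for illustration; it is not CoLA's implementation, and it only handles products with a single vector.

```python
import numpy as np


class MatrixFreeOperator:
    """Hypothetical minimal operator: stores a dtype, a shape and a matmul rule."""

    def __init__(self, dtype, shape, matmat):
        self.dtype = dtype
        self.shape = shape
        self._matmat = matmat

    def __matmul__(self, x):
        return self._matmat(x)


# A diagonal matrix never has to be stored as an n x n array:
# multiplying it by a vector is an elementwise product with the diagonal.
d = np.array([2.0, 3.0, 4.0])
D_free = MatrixFreeOperator(d.dtype, (3, 3), lambda x: d * x)

v = np.ones(3)
print(D_free @ v)        # matrix-free product
print(np.diag(d) @ v)    # dense reference
```

The operator stores three numbers instead of nine, and the saving grows with the matrix size.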

Now, create a diagonal `LinearOperator` using the `cola.ops.Diagonal` class, with diagonal $d=(-1, 2, 3)^T$, and a tridiagonal `LinearOperator` whose main diagonal $\beta$ is full of ones and whose upper and lower bands are given by $\alpha=(-1/2, 1/4)^T$. _Hint:_ Should $\beta$ have the same shape as $\alpha$?

In [None]:
diagonal = torch.tensor(_, dtype=dtype)
D_op = cola.ops.Diagonal(_)
alpha = torch.tensor(_, dtype=dtype)
beta = _
T_op = cola.ops.Tridiagonal(alpha, beta, _)

Create a dense version of each of these `LinearOperators` using the `to_dense()` method and print the results.

In [None]:
D = _ 
T = _

Then, perform some basic operations like addition, subtraction, and multiplication on the previous operators. Verify that the computations are correct using the dense API. One set of binary operations is suggested below, but please try more!

In [None]:
print(_)
print(A + D - T)

### Solution

In [1]:
import cola
import torch

dtype = torch.float32
A = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]], dtype=dtype)
ones = torch.ones(size=(A.shape[0],), dtype=dtype)
A_op = cola.ops.Dense(A)
D_op = cola.ops.Diagonal(torch.tensor([-1., 2., 3.], dtype=dtype))
alpha = torch.tensor([[-0.5, 0.25]], dtype=dtype).T
beta = torch.tensor([[1., 1., 1.]], dtype=dtype).T
T_op = cola.ops.Tridiagonal(alpha, beta, -alpha)


print(A_op.dtype)
print(A_op.shape)

D = D_op.to_dense()
T = T_op.to_dense()
print(T_op.to_dense())

print(A_op @ ones)
print(A @ ones)

print(A_op + D_op - T_op)
print((A_op + D_op - T_op).to_dense())
print(A + D - T)

torch.float32
torch.Size([3, 3])
tensor([[ 1.0000,  0.5000,  0.0000],
        [-0.5000,  1.0000, -0.2500],
        [ 0.0000,  0.2500,  1.0000]])
tensor([ 6., 15., 24.])
tensor([ 6., 15., 24.])
Dense+diag(tensor([-1.,  2.,  3.]))+-1*Tridiagonal
tensor([[-1.0000,  1.5000,  3.0000],
        [ 4.5000,  6.0000,  6.2500],
        [ 7.0000,  7.7500, 11.0000]])
tensor([[-1.0000,  1.5000,  3.0000],
        [ 4.5000,  6.0000,  6.2500],
        [ 7.0000,  7.7500, 11.0000]])
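
The printed expression `Dense+diag(...)+-1*Tridiagonal` suggests that the sum is kept as a combination of its parts rather than as one dense matrix. The NumPy sketch below illustrates that idea under that assumption: a product with $A + D - T$ is computed term by term, and the tridiagonal part uses only its three bands. The function names are hypothetical and the numbers are random, not the ones from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))           # dense part
d = rng.standard_normal(n)                # diagonal of D
lower = rng.standard_normal(n - 1)        # sub-diagonal of T
diag = rng.standard_normal(n)             # main diagonal of T
upper = rng.standard_normal(n - 1)        # super-diagonal of T


def tridiag_matvec(lower, diag, upper, v):
    """Multiply a tridiagonal matrix by v using only its three bands."""
    out = diag * v
    out[1:] += lower * v[:-1]   # row i picks up lower[i-1] * v[i-1]
    out[:-1] += upper * v[1:]   # row i picks up upper[i] * v[i+1]
    return out


def lazy_sum_matvec(v):
    """(A + D - T) @ v computed term by term, never forming the sum."""
    return A @ v + d * v - tridiag_matvec(lower, diag, upper, v)


# Dense reference for checking.
T = np.diag(diag) + np.diag(lower, k=-1) + np.diag(upper, k=1)
v = rng.standard_normal(n)
dense_result = (A + np.diag(d) - T) @ v
print(np.allclose(lazy_sum_matvec(v), dense_result))
```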


Arguably the most popular linear algebra operation is solving a linear system, a so-called "solve". You are now going to solve a linear system generated from a random matrix using CoLA. What function should you use from the [high level interface](https://cola.readthedocs.io/en/latest/package/cola.linalg.html)? Here is some code to create a random problem.

In [None]:
from jax import numpy as jnp
from jax.random import PRNGKey, normal, split

N = _
key = PRNGKey(seed=21)
A = normal(key, shape=(N, N))
key = split(key, num=1)[0]  # split returns an array of keys; take the single new key
rhs = normal(key, shape=(N,))
rhs /= jnp.linalg.norm(rhs)

Now, create a dense `LinearOperator` and find the solution. Compare it to the solution found using `JAX`'s own solver implementation.

In [None]:
soln = _
soln_jax = jnp.linalg.solve(A, rhs)
abs_diff = jnp.linalg.norm(_)
print(f"{abs_diff:1.2e}")

What floating point precision was used in the previous case? Is this relevant? Why or why not?

_[Answer Here]_

### Solution

In [2]:
from jax import numpy as jnp
from jax.random import PRNGKey
from jax.random import normal
from jax.random import split

N = 100
key = PRNGKey(seed=21)
A = normal(key, shape=(N, N))
A_op = cola.ops.Dense(A)
key = split(key, num=1)[0]  # split returns an array of keys; take the single new key
rhs = normal(key, shape=(N,))
rhs /= jnp.linalg.norm(rhs)

soln = cola.inverse(A_op) @ rhs

soln_jax = jnp.linalg.solve(A, rhs)
abs_diff = jnp.linalg.norm(soln - soln_jax)
print(f"{abs_diff:1.2e}")



6.31e-06
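
As a hedged illustration of the precision question, the NumPy sketch below solves one random system in single and in double precision and compares both solutions against a known one. It does not use CoLA or JAX, and the seed and sizes are arbitrary choices, so the printed numbers will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(21)
N = 100
A64 = rng.standard_normal((N, N))
x_true = rng.standard_normal(N)
b64 = A64 @ x_true

# Same problem, rounded to single precision before solving.
A32, b32 = A64.astype(np.float32), b64.astype(np.float32)

x64 = np.linalg.solve(A64, b64)
x32 = np.linalg.solve(A32, b32)

err64 = np.linalg.norm(x64 - x_true)
err32 = np.linalg.norm(x32.astype(np.float64) - x_true)
print(f"float64 error: {err64:1.2e}")
print(f"float32 error: {err32:1.2e}")
print(f"machine eps:   {np.finfo(np.float32).eps:1.2e} (float32) "
      f"vs {np.finfo(np.float64).eps:1.2e} (float64)")
```

The gap between the two errors tracks the gap between the two machine epsilons, amplified by the conditioning of the matrix.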


In the previous exercise, CoLA dispatched a general linear solver because it had no extra information about the `LinearOperator` that it could exploit. Let's see how to add information about the operator, such as being PSD, self-adjoint, or symmetric (if real).
In CoLA we do this using annotation operators like `cola.ops.PSD`, `cola.ops.SelfAdjoint` and `cola.ops.Symmetric`. Let's work with a PSD matrix now.
Construct a `LinearOperator` $S = A A^T + \mu I$. _Hint_: make $A$ dense, use `cola.ops.I_like` (see [docs](https://cola.readthedocs.io/en/latest/package/cola.ops.html#cola.ops.I_like)), and then write the expression as you would with ordinary matrices, forgetting that you are using CoLA. Don't forget to annotate your operator!

In [None]:
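import jax
jax.config.update("jax_enable_x64", True)

N = 1_000
key = PRNGKey(seed=21)
dtype = jnp.float64
A = cola.ops.Dense(normal(key, shape=(N, N), dtype=dtype))
mu = 1.e-1  # mu > 0 makes S strictly positive definite and better conditioned
# Regenerate the right-hand side so that its size and dtype match the new N.
key = split(key, num=1)[0]
rhs = normal(key, shape=(N,), dtype=dtype)
rhs /= jnp.linalg.norm(rhs)
S = _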

Having a PSD `LinearOperator` opens up the possibility for CoLA to dispatch our favorite algorithm: conjugate gradients (CG). CG has a couple of hyperparameters, such as the tolerance (set it to $10^{-10}$) and the maximum number of iterations (set it to 10,000). _Hint_: Pass `method`, `tol` and `max_iters` to the function that you used in the previous exercise. To see what is going on under the hood, take a look at the [source code](https://github.com/wilson-labs/cola/blob/main/cola/linalg/inverse.py#L67).

In [None]:
soln = _ @ rhs
soln_jax = jnp.linalg.solve(S.to_dense(), rhs)  # S is a LinearOperator, so densify it for JAX
abs_diff = jnp.linalg.norm(soln - soln_jax)
print(f"{abs_diff:1.2e}")

Why is the difference much smaller than in the previous exercise?

_[Answer Here]_

### Solution

In [3]:
import jax
jax.config.update("jax_enable_x64", True)

N = 100
key = PRNGKey(seed=21)
dtype = jnp.float64
A = normal(key, shape=(N, N))
mu = 1.e-1  # mu > 0 makes S strictly positive definite and better conditioned
S = A @ A.T + mu * jnp.eye(A.shape[0])
# S_op = cola.ops.Dense(S)  # compare to option below
S_op = cola.ops.PSD(cola.ops.Dense(S))
key = split(key, num=1)[0]  # split returns an array of keys; take the single new key
rhs = normal(key, shape=(N,))
rhs /= jnp.linalg.norm(rhs)

soln = cola.inverse(S_op, method="cg", tol=1e-10, max_iters=10_000) @ rhs

soln_jax = jnp.linalg.solve(S, rhs)
abs_diff = jnp.linalg.norm(soln - soln_jax)
print(f"{abs_diff:1.2e}")

8.62e-13
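
To show what the `tol` and `max_iters` hyperparameters control, here is a textbook conjugate gradients loop in NumPy. It is a sketch under stated assumptions, not CoLA's implementation: the function name `conjugate_gradients`, the relative-residual stopping rule and the test problem are choices made for this example.

```python
import numpy as np


def conjugate_gradients(matvec, b, tol=1e-10, max_iters=10_000):
    """Solve S x = b for symmetric positive definite S, given only v -> S v."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    iters = 0
    # Stop once the relative residual drops below tol or the budget runs out.
    while iters < max_iters and np.sqrt(rs_old) > tol * b_norm:
        Sp = matvec(p)
        step = rs_old / (p @ Sp)
        x = x + step * p
        r = r - step * Sp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
        iters += 1
    return x, iters


rng = np.random.default_rng(21)
N, mu = 100, 1e-1
A = rng.standard_normal((N, N))
rhs = rng.standard_normal(N)
rhs /= np.linalg.norm(rhs)

# S = A A^T + mu I is applied as two matrix-vector products plus a scaling,
# so the solver never needs the N x N matrix S itself.
S_matvec = lambda v: A @ (A.T @ v) + mu * v

soln, iters = conjugate_gradients(S_matvec, rhs)
reference = np.linalg.solve(A @ A.T + mu * np.eye(N), rhs)
print(iters, f"{np.linalg.norm(soln - reference):1.2e}")
```

CG only ever touches the operator through `matvec`, which is why a PSD annotation plus a fast matrix-vector product is all an iterative solver needs.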


To conclude the basic set of exercises, let's focus on another fundamental linear algebra operation: eigendecomposition. For this case, use the $T$ matrix from
[Linear Operators: What and Why?](https://github.com/wilson-labs/cola/blob/main/docs/notebooks/LinOpIntro.ipynb). Get the eigendecomposition of $T$ using double precision and compare it with PyTorch or JAX's implementation. _Hint_: Check CoLA's [API](https://cola.readthedocs.io/en/latest/package/cola.linalg.html) and maybe use an annotation?

In [None]:
N = 100
dtype = _
abs_diff = _
print(f"{abs_diff:1.2e}")

Are different algorithms being used? Is there a runtime benefit from dispatching a different algorithm?

_[Answer Here]_

### Solution

In [4]:
N = 100
dtype = torch.float64
alpha = torch.ones(size=(N - 1, 1), dtype=dtype)
beta = -2 * torch.ones(size=(N, 1), dtype=dtype)
T_op = cola.ops.Tridiagonal(alpha, beta, alpha)
eigvals, eigvecs = cola.eig(T_op)
eigvals_torch, eigvecs_torch = torch.linalg.eig(T_op.to_dense())
abs_diff = torch.linalg.norm(eigvals - eigvals_torch)
print(f"{abs_diff:1.2e}")

0.00e+00
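
The tridiagonal matrix in this solution has $-2$ on the diagonal and $1$ on both off-diagonals, so its eigenvalues are known in closed form: $\lambda_k = -2 + 2\cos\left(\frac{k\pi}{N+1}\right)$ for $k = 1, \dots, N$. The NumPy sketch below uses this as an independent check and calls a symmetric eigensolver, which is the kind of specialized algorithm an annotation can unlock. It does not use CoLA.

```python
import numpy as np

N = 100
# Same structure as T_op above: -2 on the diagonal, 1 on both off-diagonals.
T = -2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)

# eigvalsh exploits symmetry and returns real eigenvalues in ascending order.
numeric = np.linalg.eigvalsh(T)

# Closed form for this Toeplitz tridiagonal matrix.
k = np.arange(1, N + 1)
analytic = np.sort(-2.0 + 2.0 * np.cos(k * np.pi / (N + 1)))

print(f"{np.abs(numeric - analytic).max():1.2e}")
```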



## Large Scale Machine Learning with CoLA

Using JAX or PyTorch, pick any three of the following five exercises:

### 1. GP

Implement Gaussian process (GP) inference with a radial basis function (RBF) kernel using `inverse()` from scratch on a dataset with at least 10k observations. You are not allowed to use GPyTorch. The formula for the GP posterior is:

$$f_* | X, y, X_* \sim \mathcal{N}(\mu_*, \Sigma_*)$$

where:

$$\mu_* = K(X_*, X)[K(X, X) + \sigma^2_n I]^{-1}y$$

$$\Sigma_* = K(X_*, X_*) - K(X_*, X)[K(X, X) + \sigma^2_n I]^{-1}K(X, X_*)$$

Here, $K$ is the RBF kernel, $X$ are the training inputs, $y$ are the training targets, $X_*$ are the test inputs, and $\sigma^2_n$ is the noise variance.


In [None]:
!wget -O bike.mat "https://www.andpotap.com/static/bike.mat"

In [5]:
from jax import numpy as jnp
import os
import numpy as np
from math import floor
from scipy.io import loadmat
import cola


def load_uci_data(data_dir, dataset, train_p=0.75, test_p=0.15):
    file_path = os.path.join(data_dir, dataset + '.mat')
    data = np.array(loadmat(file_path)['data'])
    X = data[:, :-1]
    y = data[:, -1]

    X = X - X.min(0)[None]
    X = 2.0 * (X / X.max(0)[None]) - 1.0
    y -= y.mean()
    y /= y.std()

    train_n = int(floor(train_p * X.shape[0]))
    valid_n = int(floor((1. - train_p - test_p) * X.shape[0]))

    split = split_dataset(X, y, train_n, valid_n)
    train_x, train_y, valid_x, valid_y, test_x, test_y = split

    return train_x, train_y, test_x, test_y, valid_x, valid_y


def split_dataset(x, y, train_n, valid_n):
    train_x = x[:train_n, :]
    train_y = y[:train_n]

    valid_x = x[train_n:train_n + valid_n, :]
    valid_y = y[train_n:train_n + valid_n]

    test_x = x[train_n + valid_n:, :]
    test_y = y[train_n + valid_n:]
    return train_x, train_y, valid_x, valid_y, test_x, test_y


# load_uci_data returns (train, test, valid), so the test split comes second.
train_x, train_y, test_x, test_y, *_ = load_uci_data(data_dir="./", dataset="bike")

dtype = jnp.float32
train_x, train_y = jnp.array(train_x[:100], dtype=dtype), jnp.array(train_y[:100], dtype=dtype)
test_x, test_y = jnp.array(test_x[:10], dtype=dtype), jnp.array(test_y[:10], dtype=dtype)
# train_x, train_y = jnp.array(train_x, dtype=dtype), jnp.array(train_y, dtype=dtype)
# test_x, test_y = jnp.array(test_x, dtype=dtype), jnp.array(test_y, dtype=dtype)


def compute_rbf_cov(xi, xj):
    xi, xj = jnp.expand_dims(xi, -2), jnp.expand_dims(xj, -3)
    res = jnp.exp(-jnp.array(0.5, dtype=xi.dtype) * jnp.sum((xi - xj) ** 2., axis=-1))
    return res


ls = jnp.array(100., dtype=dtype)
noise = jnp.array(1., dtype=dtype)
oscale = jnp.array(1., dtype=dtype)
K_train_train = cola.ops.Dense(oscale * compute_rbf_cov(train_x / ls, train_x / ls))
K_test_train = cola.ops.Dense(oscale * compute_rbf_cov(test_x / ls, train_x / ls))
K_test_test = cola.ops.Dense(oscale * compute_rbf_cov(test_x / ls, test_x / ls))
K = cola.ops.PSD(K_train_train + noise * cola.ops.I_like(K_train_train))
mu = K_test_train @ cola.inverse(K) @ train_y
Sigma = K_test_test.to_dense() - K_test_train @ (cola.inverse(K)) @ K_test_train.T
# Sigma = K_test_test - K_test_train @ (cola.inverse(K)) @ K_test_train.T
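
For checking a CoLA-based implementation on a small problem, a dense NumPy reference for the two posterior formulas can be useful. The sketch below is such a reference under stated assumptions: the helper names `rbf_kernel` and `gp_posterior`, the synthetic sine data and the hyperparameter values are all made up for illustration, and the Cholesky factorization it relies on would not scale to the 10k observations the exercise asks for.

```python
import numpy as np


def rbf_kernel(xa, xb, lengthscale=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_dists = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(train_x, train_y, test_x, noise_var=0.1, lengthscale=1.0):
    """Posterior mean and covariance from the two formulas, via Cholesky."""
    K = rbf_kernel(train_x, train_x, lengthscale)
    K_star = rbf_kernel(test_x, train_x, lengthscale)      # K(X_*, X)
    K_ss = rbf_kernel(test_x, test_x, lengthscale)         # K(X_*, X_*)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(train_x)))
    # Solves against the Cholesky factors replace the explicit inverse.
    # (A dedicated triangular solver would exploit the structure of L.)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))
    V = np.linalg.solve(L, K_star.T)
    return K_star @ alpha, K_ss - V.T @ V


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
X_star = np.linspace(-3, 3, 5)[:, None]

mean, cov = gp_posterior(X, y, X_star, noise_var=0.01)
print(mean)
print(np.diag(cov))
```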


### 2. Hessian Spectrum
Compute the eigenspectrum of the Hessian of a pretrained neural network. You can download the weights of image classifiers pretrained on CIFAR10. Use `cola.eig` or `cola.algorithms.lanczos` together with the spectral KDE smoothing method from [this paper](https://arxiv.org/pdf/1901.10159.pdf) to get a smoothed spectrum estimate.


In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.nn.utils import stateless

# Load CIFAR10 dataset
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=50, shuffle=False, num_workers=2)
model = torch.hub.load("chenyaofo/pytorch-cifar-models", "cifar10_resnet20", pretrained=True)
model = model.to(device='cuda' if torch.cuda.is_available() else 'cpu')

criterion = nn.CrossEntropyLoss()


def flatten_params(params):
    shapes = [p.shape for p in params]
    flat_params = torch.cat([p.flatten() for p in params])
    return flat_params, shapes


def unflatten_params(flat_params, shapes):
    params = []
    i = 0
    for shape in shapes:
        size = shape.numel()  # number of entries in a parameter of this shape
        param = flat_params[i:i + size]
        param = param.view(shape)
        params.append(param)
        i += size
    return params


flat_p, shape = flatten_params(list(model.parameters()))
flat_p = flat_p.detach().requires_grad_(True)


def stateless_model(fparams, x):
    params = unflatten_params(fparams, shape)
    names = list(n for n, _ in model.named_parameters())
    nps = {n: p for n, p in zip(names, params)}
    return stateless.functional_call(model, nps, x)


def loss_fn(params, max_batches=12):
    device = params.device
    total_loss, num_batches = 0.0, 0
    for images, labels in trainloader:
        outputs = stateless_model(params, images.to(device))
        total_loss += criterion(outputs, labels.to(device))
        num_batches += 1
        if num_batches >= max_batches:
            break  # For now we will only use a subset of the data
    # Average over the batches actually used, not over the full loader.
    return total_loss / num_batches


g = torch.autograd.grad([loss_fn(flat_p)], flat_p)[0]



In [None]:
grads = torch.autograd.grad([loss_fn(flat_p)], flat_p, create_graph=True)
v = torch.ones_like(flat_p)  # same device and dtype as the parameters


def matmat(vec):
    if len(vec.shape) == 1:
        vec = vec.reshape(-1, 1)
    hessian_vector_product = []
    for idx in range(vec.shape[1]):
        inner_grad = torch.autograd.grad(grads, flat_p, vec[:, idx], retain_graph=True)[0]
        hessian_vector_product.append(inner_grad)

    hessian_mvm = torch.stack(hessian_vector_product).T
    return hessian_mvm


# Use a new name here: `shape` already holds the parameter shapes used by stateless_model.
hess_shape = (flat_p.shape[0], flat_p.shape[0])
Hess = cola.ops.LinearOperator(shape=hess_shape, dtype=v.dtype, matmat=matmat)
# Hess = cola.ops.Symmetric(Hess)
print(Hess @ v)
out = cola.algorithms.lanczos(Hess, flat_p.detach(), max_iters=10, tol=1e-7)
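
The smoothing step can be sketched independently of the network. The NumPy code below runs a plain Lanczos iteration on a small random symmetric matrix that stands in for the Hessian, then places a Gaussian at each Ritz value, weighted by the squared first component of the corresponding eigenvector of the tridiagonal matrix. This is one reading of the quadrature-based density estimate in the linked paper, not its reference code; the paper averages over several random start vectors, while this sketch uses one. The function names, matrix size, number of steps and `sigma` are illustrative assumptions.

```python
import numpy as np


def lanczos(matvec, v0, num_steps):
    """Lanczos with full reorthogonalization; returns the tridiagonal T."""
    n = v0.shape[0]
    Q = np.zeros((n, num_steps))
    alphas, betas = np.zeros(num_steps), np.zeros(num_steps - 1)
    q = v0 / np.linalg.norm(v0)
    for j in range(num_steps):
        Q[:, j] = q
        w = matvec(q)
        alphas[j] = q @ w
        # Remove the components along all Lanczos vectors built so far.
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < num_steps - 1:
            betas[j] = np.linalg.norm(w)
            q = w / betas[j]
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)


def smoothed_density(T, grid, sigma):
    """Gaussian mixture centred at the Ritz values, weighted by the squared
    first components of the eigenvectors of T."""
    ritz_vals, ritz_vecs = np.linalg.eigh(T)
    weights = ritz_vecs[0, :] ** 2
    diffs = grid[:, None] - ritz_vals[None, :]
    gaussians = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return gaussians @ weights


rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
H = (B + B.T) / np.sqrt(2 * n)       # symmetric stand-in for a Hessian
T = lanczos(lambda v: H @ v, rng.standard_normal(n), num_steps=30)

grid = np.linspace(-3, 3, 601)
density = smoothed_density(T, grid, sigma=0.1)
print(f"integral over the grid: {np.sum(density) * (grid[1] - grid[0]):.3f}")
```

Only matrix-vector products with `H` are needed, so the same two functions could be driven by a Hessian-vector product.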


### 3. Linear Regression
Implement linear regression with a heteroscedastic noise model where $\Phi$ is the design matrix, $\beta$ are the parameters and $\sigma_i$ is the measurement noise. The model is:

$$y = \Phi \beta + \epsilon, \quad \epsilon \sim \mathcal{N}(0, D)$$

where $D$ is a diagonal matrix with $\sigma_i^2$ on the diagonal. Add a Gaussian prior (regularization) if necessary.
    


Hint: $\hat{\beta}_{MLE} = (\Phi^T D^{-1} \Phi)^{-1} \Phi^T D^{-1} y$
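
The hint can be checked on synthetic data before moving to a real dataset. The NumPy sketch below draws heteroscedastic observations, evaluates the formula directly, and compares it with ordinary least squares on rows rescaled by $1/\sigma_i$, which is an equivalent formulation. The problem sizes, `beta_true` and the noise model are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
Phi = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1 * (1.0 + np.linalg.norm(Phi, axis=1))     # per-observation std
y = Phi @ beta_true + sigma * rng.standard_normal(n)

# D is diagonal, so applying D^{-1} is an elementwise division by sigma_i^2.
inv_var = 1.0 / sigma**2
lhs = Phi.T @ (inv_var[:, None] * Phi)      # Phi^T D^{-1} Phi
rhs = Phi.T @ (inv_var * y)                 # Phi^T D^{-1} y
beta_mle = np.linalg.solve(lhs, rhs)

# Equivalent formulation: ordinary least squares on rows rescaled by 1/sigma_i.
beta_lstsq, *_ = np.linalg.lstsq(Phi / sigma[:, None], y / sigma, rcond=None)

print(beta_mle)
print(np.allclose(beta_mle, beta_lstsq))
```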

In [6]:
import torch

train_x, train_y, *_, test_x, test_y = load_uci_data(data_dir="./", dataset="bike")

dtype = torch.float32
# train_x = torch.tensor(train_x[:100], dtype=dtype)
# train_y = torch.tensor(train_y[:100], dtype=dtype)
# test_x, test_y = torch.tensor(test_x[:10], dtype=dtype), torch.tensor(test_y[:10], dtype=dtype)
train_x, train_y = torch.tensor(train_x, dtype=dtype), torch.tensor(train_y, dtype=dtype)
test_x, test_y = torch.tensor(test_x, dtype=dtype), torch.tensor(test_y, dtype=dtype)

sigma = 0.1
sigma_x = sigma * (1. + torch.norm(train_x, dim=1))
# sigma_x = sigma * torch.norm(train_x, dim=1)
# sigma_x = torch.ones(size=(train_x.shape[0],))
D = cola.ops.Diagonal(sigma_x)
Phi = cola.ops.Dense(train_x)
mu = 0.05
Phi_inv = Phi.T @ cola.inverse(D) @ Phi
Phi_inv += mu * cola.ops.I_like(Phi_inv)
Phi_inv = cola.inverse(cola.ops.PSD(Phi_inv))
beta_mle = Phi_inv @ Phi.T @ cola.inverse(D) @ train_y
print(beta_mle)

test_residual_norm = torch.norm(test_y - test_x @ beta_mle)  # 2-norm of the residuals, not an RMSE
print(f"Test residual norm: {test_residual_norm:1.3e}")

tensor([ 0.0178,  0.0054, -0.0097,  0.0379,  0.0159,  0.0046,  0.6153, -0.4644,
         0.0087, -0.1270, -0.2287,  0.9366, -0.5653, -0.1517, -0.2609, -0.3012,
         1.4519])
Test residual norm: 2.409e+01


### 4. Implement PageRank to find the most influential pages of Wikipedia
From the transition matrix built on the [Linked WikiText-2 dataset](https://rloganiv.github.io/linked-wikitext-2/#/), compute the eigenvector associated with the largest eigenvalue using `cola.eigmax`. From this eigenvector, rank the values to determine which web pages are most influential.

The PageRank algorithm computes the stationary distribution of a Markov chain. Given a transition matrix $P$, the PageRank vector $r$ is the eigenvector corresponding to the largest eigenvalue (which should be 1 for a stochastic matrix).

The transition matrix $P$ is defined as:

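$$P = (1-\alpha)W + \frac{\alpha}{N} \mathbf{1}\mathbf{1}^T$$

where $W$ is the adjacency matrix normalized by the degree, $N$ is the number of pages, $\alpha$ is the teleportation probability (usually set to 0.15, which corresponds to a damping factor of 0.85), and $\mathbf{1}$ is a vector of ones. The factor $1/N$ keeps every row of $P$ summing to one, so that $P$ is stochastic.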

The adjacency matrix $A_{ij}$ is 1 if there is a link from page $i$ to page $j$ (not the other way around). The degree-normalized adjacency matrix $W$ is obtained by dividing each row of $A$ by its sum.

The PageRank vector $r$ can be found by solving the eigenproblem:

$$P^T r = r$$

The entries of $r$ give the PageRank scores of the pages. The pages can then be ranked by these scores to find the most influential ones.
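
On a graph small enough to inspect by hand, the eigenproblem $P^T r = r$ can be solved with plain power iteration. The NumPy sketch below does this for a made-up four-page graph in which every page has at least one outgoing link. It spreads the teleportation term uniformly as $\alpha / N$ so that the rows of $P$ sum to one. It illustrates the fixed point only and does not use `cola.eigmax`.

```python
import numpy as np

# Toy graph: adjacency[i, j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)
n = adjacency.shape[0]
alpha = 0.15

W = adjacency / adjacency.sum(axis=1, keepdims=True)   # row-normalize
P = (1.0 - alpha) * W + alpha / n * np.ones((n, n))    # rows of P sum to 1

# Power iteration on P^T, starting from the uniform distribution.
r = np.full(n, 1.0 / n)
for _ in range(200):
    r_next = P.T @ r
    if np.linalg.norm(r_next - r, 1) < 1e-12:
        break
    r = r_next

print(np.round(r, 4))
print(np.argsort(-r))     # page indices from most to least influential
```

Page 2 receives links from three of the four pages, so it ends up with the largest score.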


Below is some starter code to create an adjacency matrix. The pages are identified by Wikidata QIDs. After finding the most popular QIDs, if they are not in the `page_to_title` dict, you can look them up using the Wikidata API with the `get_titles_from_wikidata` function.

Suggestion: use the `cola.ops.Sparse` matrix for the adjacency matrix.

In [None]:
%pip --quiet install git+https://github.com/wilson-labs/cola.git
%pip install requests
# io and zipfile are part of the Python standard library and need no installation.

In [7]:
import cola
import torch
import json
import numpy as np
import requests
from collections import defaultdict
import zipfile
import io

# Request the zipped data
link = "https://rloganiv.github.io/linked-wikitext-2/static/media/linked-wikitext-2.142e2e52.zip"
r = requests.get(link)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

# Initialize a dictionary to hold the adjacency list
adjacency_list = defaultdict(set)

# Initialize a dictionary to map page ids to indices
page_to_index, index_to_page, page_to_title, next_index = {}, {}, {}, 0

# Extract the JSONL file
files = ['valid.jsonl', 'train.jsonl', 'test.jsonl']
for file in files:
    with z.open(file) as f:
        text = f.read().decode()
        for line in text.splitlines():
            record = json.loads(line)  # one JSON document per line
            current_page_id = record['annotations'][0]['id']

            page_to_title[current_page_id] = record['title']
            # If the current page id is not in the dictionary, add it
            if current_page_id not in page_to_index:
                page_to_index[current_page_id] = next_index
                index_to_page[next_index] = current_page_id
                next_index += 1

            current_page_index = page_to_index[current_page_id]
            for annotation in record['annotations']:
                # If the annotation is a link to another page, add it to the adjacency list
                if (annotation['source'] == 'WIKI') and (annotation['id'] != current_page_id):
                    linked_page_id = annotation['id']

                    # If the linked page id is not in the dictionary, add it
                    if linked_page_id not in page_to_index:
                        page_to_index[linked_page_id] = next_index
                        index_to_page[next_index] = linked_page_id
                        next_index += 1

                    linked_page_index = page_to_index[linked_page_id]
                    adjacency_list[current_page_index].add(linked_page_index)

num_pages = len(page_to_index)
adjacency_matrix = np.zeros((num_pages, num_pages), dtype=int)
for page_index, linked_page_indices in adjacency_list.items():
    for linked_page_index in linked_page_indices:
        adjacency_matrix[page_index, linked_page_index] = 1
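
A dense array needs `num_pages ** 2` entries even though most of them are zero. As a sketch of the sparse alternative suggested above, the NumPy code below stores one `(row, col)` pair per link and applies $A^T$ to a vector directly from those pairs. The toy adjacency list and the function name are hypothetical, and the sketch deliberately avoids `cola.ops.Sparse`, whose exact constructor arguments should be taken from the CoLA documentation.

```python
import numpy as np

# Hypothetical toy adjacency list in the same format as `adjacency_list`:
# page index -> set of page indices it links to.
toy_adjacency_list = {0: {1, 2}, 1: {2}, 2: {0}, 3: {2}}
num_pages = 4

# Coordinate (COO) storage: one (row, col) pair per link.
rows = np.array([i for i, targets in toy_adjacency_list.items() for _ in targets])
cols = np.array([j for targets in toy_adjacency_list.values() for j in targets])


def adjacency_T_matvec(v):
    """Compute A^T v by scattering v[i] to every page j that page i links to."""
    out = np.zeros(num_pages)
    np.add.at(out, cols, v[rows])
    return out


# Dense reference, affordable only because the toy graph is tiny.
dense = np.zeros((num_pages, num_pages))
dense[rows, cols] = 1.0
v = np.arange(1.0, num_pages + 1)
print(adjacency_T_matvec(v), dense.T @ v)
```

Storage and work both scale with the number of links rather than with the square of the number of pages.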

In [8]:
dtype = torch.float32
adjacency_matrix = adjacency_matrix[:100, :100]
adjacency_matrix = torch.tensor(adjacency_matrix, dtype=dtype)
ones = cola.ops.Dense(torch.ones(size=(adjacency_matrix.shape[0], 1)))
norm = adjacency_matrix.sum(0).unsqueeze(1)
norm = torch.where(norm == 0., torch.tensor(1.), norm)
W = cola.ops.Dense(adjacency_matrix / norm)
alpha = 0.15
P = (1. - alpha) * W + alpha * ones @ ones.T
eigvec_max, eig_max = cola.eigmax(P)
_, indices = torch.sort(eigvec_max, descending=True)
print(indices[:10])

tensor([47,  0, 96, 73, 75, 72, 71, 70, 69, 68])


In [None]:
def get_titles_from_wikidata(qids):
    qids_string = '|'.join(qids)
    url = 'https://www.wikidata.org/w/api.php'
    params = {
        'action': 'wbgetentities',
        'ids': qids_string,
        'format': 'json',
        'props': 'labels',
        'languages': 'en'
    }
    response = requests.get(url, params=params)
    data = response.json()
    titles = {}
    for qid, entity in data['entities'].items():
        if 'en' in entity['labels']:
            titles[qid] = entity['labels']['en']['value']
    return titles
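
Web APIs commonly cap how many identifiers one request may carry, so it can help to look titles up in batches. The helper below is a hypothetical sketch: the name `chunked` and the batch size of 50 are assumptions and should be checked against the Wikidata API documentation. The QIDs are placeholders generated for the example.

```python
def chunked(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder QIDs, used only to show the batching.
qids = [f"Q{i}" for i in range(1, 121)]
batches = list(chunked(qids, size=50))
print([len(batch) for batch in batches])   # [50, 50, 20]

# Each batch could then be passed to get_titles_from_wikidata, e.g.
# titles = {}
# for batch in batches:
#     titles.update(get_titles_from_wikidata(batch))
```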


### 5. Make a pull request to CoLA.
For example: an improvement to the documentation, a new commonly used linear operator (e.g., Fisher information matrix, banded matrix, FFT matrix), or a bug fix. If your code for one of the above exercises is particularly clean, consider adding markdown text explaining the steps so that we can add it to the CoLA documentation under examples.
