# Introduction

In this assignment you will continue working with the data from the seminar [Articles Sharing and Reading from CI&T Deskdrop](https://www.kaggle.com/gspmoreira/articles-sharing-reading-from-cit-deskdrop). If you don't have a Kaggle account, you can download the dataset [here](https://drive.google.com/file/d/1rLSr49zx6RPZIn7PV_LQr9KnnpPhrr0K/view?usp=sharing).

# Data loading and preprocessing

In [407]:
import math

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

Load the data and preprocess it as in the seminar.

In [408]:
articles_df = pd.read_csv("articles/shared_articles.csv")
articles_df = articles_df[articles_df["eventType"] == "CONTENT SHARED"]
articles_df.head(2)

Unnamed: 0,timestamp,eventType,contentId,authorPersonId,authorSessionId,authorUserAgent,authorRegion,authorCountry,contentType,url,title,text,lang
1,1459193988,CONTENT SHARED,-4110354420726924665,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dea...,"Ethereum, a Virtual Currency, Enables Transact...",All of this work is still very early. The firs...,en
2,1459194146,CONTENT SHARED,-7292285110016212249,4340306774493623681,8940341205206233829,,,,HTML,http://cointelegraph.com/news/bitcoin-future-w...,Bitcoin Future: When GBPcoin of Branson Wins O...,The alarm clock wakes me at 8:00 with stream o...,en


In [409]:
interactions_df = pd.read_csv("articles/users_interactions.csv")
interactions_df.head(2)

Unnamed: 0,timestamp,eventType,contentId,personId,sessionId,userAgent,userRegion,userCountry
0,1465413032,VIEW,-3499919498720038879,-8845298781299428018,1264196770339959068,,,
1,1465412560,VIEW,8890720798209849691,-1032019229384696495,3621737643587579081,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2...,NY,US


In [410]:
interactions_df.personId = interactions_df.personId.astype(str)
interactions_df.contentId = interactions_df.contentId.astype(str)
articles_df.contentId = articles_df.contentId.astype(str)

In [411]:
# dictionary defining the strength of each interaction type
event_type_strength = {
    "VIEW": 1.0,
    "LIKE": 2.0,
    "BOOKMARK": 2.5,
    "FOLLOW": 3.0,
    "COMMENT CREATED": 4.0,
}

interactions_df["eventStrength"] = interactions_df.eventType.apply(
    lambda x: event_type_strength[x]
)

Keep only the users who interacted with at least five articles.

In [412]:
users_interactions_count_df = (
    interactions_df.groupby(["personId", "contentId"])
    .first()
    .reset_index()
    .groupby("personId")
    .size()
)
print("# users:", len(users_interactions_count_df))

users_with_enough_interactions_df = users_interactions_count_df[
    users_interactions_count_df >= 5
].reset_index()[["personId"]]
print("# users with at least 5 interactions:", len(users_with_enough_interactions_df))

# users: 1895
# users with at least 5 interactions: 1140


Keep only the interactions that belong to the filtered users.

In [413]:
# isin replaces np.in1d, which is deprecated in NumPy 2.0
interactions_from_selected_users_df = interactions_df.loc[
    interactions_df.personId.isin(users_with_enough_interactions_df.personId)
]

In [414]:
print(f"# interactions before: {interactions_df.shape}")
print(f"# interactions after: {interactions_from_selected_users_df.shape}")

# interactions before: (72312, 9)
# interactions after: (69868, 9)


Aggregate all of a user's interactions with each article and smooth the total by taking its logarithm, log2(1 + x).

In [415]:
def smooth_user_preference(x):
    return math.log(1 + x, 2)


interactions_full_df = (
    interactions_from_selected_users_df.groupby(["personId", "contentId"])
    .eventStrength.sum()
    .apply(smooth_user_preference)
    .reset_index()
    .set_index(["personId", "contentId"])
)
interactions_full_df["last_timestamp"] = interactions_from_selected_users_df.groupby(
    ["personId", "contentId"]
)["timestamp"].last()

interactions_full_df = interactions_full_df.reset_index()
interactions_full_df.head(5)

Unnamed: 0,personId,contentId,eventStrength,last_timestamp
0,-1007001694607905623,-5065077552540450930,1.0,1470395911
1,-1007001694607905623,-6623581327558800021,1.0,1487240080
2,-1007001694607905623,-793729620925729327,1.0,1472834892
3,-1007001694607905623,1469580151036142903,1.0,1487240062
4,-1007001694607905623,7270966256391553686,1.584963,1485994324
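Below is a worked example of the aggregation and log smoothing on a made-up event log. The ids `u1`, `a1`, ... are hypothetical. It is written in plain Python so that the arithmetic stays visible.

The sums 3.0, 2.0 and 4.0 become $\log_2 4 = 2$, $\log_2 3 \approx 1.585$ and $\log_2 5 \approx 2.322$. These are the same values that appear as `eventStrength` in the tables of this notebook.

```python
import math
from collections import defaultdict

# Same event weights as event_type_strength above
event_type_strength = {
    "VIEW": 1.0,
    "LIKE": 2.0,
    "BOOKMARK": 2.5,
    "FOLLOW": 3.0,
    "COMMENT CREATED": 4.0,
}


def smooth_user_preference(x):
    return math.log(1 + x, 2)


# Toy event log: (personId, contentId, eventType); all ids are made up
events = [
    ("u1", "a1", "VIEW"),
    ("u1", "a1", "LIKE"),
    ("u1", "a2", "VIEW"),
    ("u1", "a2", "VIEW"),
    ("u2", "a1", "COMMENT CREATED"),
]

# Sum the strengths per (user, article) pair, then apply log2(1 + x)
totals = defaultdict(float)
for person, content, event in events:
    totals[(person, content)] += event_type_strength[event]

smoothed = {pair: smooth_user_preference(s) for pair, s in totals.items()}
for pair, value in sorted(smoothed.items()):
    print(pair, round(value, 6))
```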


Split the data into training and test sets by time.

In [416]:
from sklearn.model_selection import train_test_split

split_ts = 1475519530
interactions_train_df = interactions_full_df.loc[
    interactions_full_df.last_timestamp < split_ts
].copy()
interactions_test_df = interactions_full_df.loc[
    interactions_full_df.last_timestamp >= split_ts
].copy()

print(f"# interactions on Train set: {len(interactions_train_df)}")
print(f"# interactions on Test set: {len(interactions_test_df)}")

interactions_train_df

# interactions on Train set: 29329
# interactions on Test set: 9777


Unnamed: 0,personId,contentId,eventStrength,last_timestamp
0,-1007001694607905623,-5065077552540450930,1.0,1470395911
2,-1007001694607905623,-793729620925729327,1.0,1472834892
6,-1032019229384696495,-1006791494035379303,1.0,1469129122
7,-1032019229384696495,-1039912738963181810,1.0,1459376415
8,-1032019229384696495,-1081723567492738167,2.0,1464054093
...,...,...,...,...
39099,997469202936578234,9112765177685685246,2.0,1472479493
39100,998688566268269815,-1255189867397298842,1.0,1474567164
39101,998688566268269815,-401664538366009049,1.0,1474567449
39103,998688566268269815,6881796783400625893,1.0,1474567675


To make quality computation convenient, store the data in a format where each row corresponds to a user and the columns hold the true labels and the predictions as lists.

In [417]:
interactions = (
    interactions_train_df.groupby("personId")["contentId"]
    .agg(lambda x: list(x))
    .reset_index()
    .rename(columns={"contentId": "true_train"})
    .set_index("personId")
)

interactions["true_test"] = interactions_test_df.groupby("personId")["contentId"].agg(
    lambda x: list(x)
)

# fill missing values (users with no test interactions) with empty lists
interactions["true_test"] = interactions["true_test"].apply(
    lambda x: x if isinstance(x, list) else []
)

interactions.head(1)

Unnamed: 0_level_0,true_train,true_test
personId,Unnamed: 1_level_1,Unnamed: 2_level_1
-1007001694607905623,"[-5065077552540450930, -793729620925729327]","[-6623581327558800021, 1469580151036142903, 72..."


# The LightFM library

For recommendations you will use the [LightFM](https://making.lyst.com/lightfm/docs/home.html) library, which implements popular algorithms. As in the seminar, recommendation quality is measured with the *precision@10* metric.

In [418]:
#!pip install lightfm

In [419]:
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

## Task 1 (1.5 points)

Models in LightFM work with sparse matrices. Create sparse matrices `data_train` and `data_test` of size (number of users) × (number of articles). The entry at a user's row and an article's column should hold the strength of their interaction if there was one, and zero otherwise.

In [420]:
# Reuse the time split from above (train: last_timestamp < split_ts, test: >= split_ts),
# so the matrices agree exactly with interactions_train_df and interactions_test_df
df_before_ts = interactions_train_df
df_after_ts = interactions_test_df

# Sorted ids fix the row (user) and column (article) order shared by both matrices
unique_person_ids = sorted(interactions_full_df['personId'].unique())
unique_content_ids = sorted(interactions_full_df['contentId'].unique())

# Float zeros, so assigning eventStrength values does not mix int and float columns
data_train = pd.DataFrame(index=unique_person_ids, columns=unique_content_ids, data=0.0)
data_test = pd.DataFrame(index=unique_person_ids, columns=unique_content_ids, data=0.0)

for row in df_before_ts.itertuples():
    data_train.at[row.personId, row.contentId] = row.eventStrength

for row in df_after_ts.itertuples():
    data_test.at[row.personId, row.contentId] = row.eventStrength


In [421]:
data_train

Unnamed: 0,-1006791494035379303,-1021685224930603833,-1022885988494278200,-1024046541613287684,-1033806831489252007,-1038011342017850,-1039912738963181810,-1046621686880462790,-1051830303851697653,-1055630159212837930,...,9222265156747237864,943818026930898372,957332268361319692,962287586799267519,966067567430037498,967143806332397325,972258375127367383,980458131533897249,98528655405030624,991271693336573226
-1007001694607905623,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0
-1032019229384696495,1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0,0.0,...,0,2.321928,0.0,0,0.0,0,0,0.0,0.0,0
-108842214936804958,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,2.0,0,0,0.0,0.0,0
-1119397949556155765,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0
-1130272294246983140,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,1.000000,0.0,0,0.0,0,0,0.0,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
953707509720613429,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0
983095443598229476,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0
989049974880576288,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0
997469202936578234,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,...,0,0.000000,0.0,0,0.0,0,0,0.0,0.0,0


In [422]:
data_train = csr_matrix(data_train.values)
data_train

<1140x2984 sparse matrix of type '<class 'numpy.float64'>'
	with 29329 stored elements in Compressed Sparse Row format>

In [423]:
data_test = csr_matrix(data_test.values)
data_test

<1140x2984 sparse matrix of type '<class 'numpy.float64'>'
	with 9777 stored elements in Compressed Sparse Row format>

In [424]:
assert data_train.shape == data_test.shape

In [425]:
assert data_train.shape == (1140, 2984)
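The dense `DataFrame` above stores all 1140 × 2984 values, although almost all of them are zero. The sketch below builds the same kind of matrix directly in sparse form from index arrays, using `csr_matrix((data, (row, col)), shape=...)`.

A few assumptions:

- `build_interaction_matrix` is a hypothetical helper.
- The ids in the demo are made up.
- Values are stored as float32, the dtype LightFM's `fit` documentation describes for interaction matrices.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_interaction_matrix(person_ids, content_ids, strengths, user_index, item_index):
    """Build a users x items CSR matrix from parallel sequences of interactions.

    user_index / item_index map each id to its row / column, so the train and
    test matrices built with the same maps share one layout.
    """
    rows = np.array([user_index[p] for p in person_ids], dtype=np.int64)
    cols = np.array([item_index[c] for c in content_ids], dtype=np.int64)
    data = np.asarray(strengths, dtype=np.float32)
    # Duplicate (row, col) pairs are summed; after the groupby above each pair occurs once
    return csr_matrix((data, (rows, cols)), shape=(len(user_index), len(item_index)))


# Toy example with made-up ids
users = sorted(["u1", "u2", "u3"])
items = sorted(["a1", "a2"])
user_index = {u: i for i, u in enumerate(users)}
item_index = {c: j for j, c in enumerate(items)}

m = build_interaction_matrix(["u1", "u3"], ["a2", "a1"], [1.0, 2.0], user_index, item_index)
print(m.toarray())
```

In the notebook the index maps would come from `unique_person_ids` and `unique_content_ids`. The columns of `interactions_train_df` (or `interactions_test_df`) would be passed as the three sequences.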


## Task 2 (0.5 points)

Train a LightFM model with `loss="warp"` and compute *precision@10* on the test set.

In [426]:
# LightFM and precision_at_k are already imported above; fetch_movielens was unused


model = LightFM(loss="warp", random_state=42)
model.fit(data_train, epochs=50, num_threads=2)


precision = precision_at_k(model, data_test, k=10).mean()
precision

0.0041751526

> ¯\_(ツ)_/¯

## Task 3 (2 points)

When calling `fit`, LightFM lets you pass a feature description of the items through `item_features`. Let's use this. We will build the features from the article text as [TF-IDF](https://ru.wikipedia.org/wiki/TF-IDF) (you can use `TfidfVectorizer` from scikit-learn). Create a matrix `feat` of size (number of articles) × (feature dimension), train LightFM with `loss="warp"`, and compute precision@10 on the test set.
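To make the TF-IDF features concrete, here is a plain-Python sketch of the weighting that scikit-learn's `TfidfVectorizer` applies with its default settings:

- raw term counts;
- smoothed idf $\ln\frac{1+n}{1+\mathrm{df}(t)} + 1$;
- L2 normalization of each row.

The sketch lowercases and splits on whitespace instead of using the vectorizer's default token pattern, so token boundaries can differ. `tfidf_rows` is a hypothetical name and the documents are made up.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """TF-IDF as TfidfVectorizer computes it by default (smooth_idf=True,
    sublinear_tf=False, norm='l2'), but with a simpler whitespace tokenizer."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        # L2 normalization; an empty document keeps an empty row
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows, idf


rows, idf = tfidf_rows(["bitcoin future", "bitcoin ethereum", "bitcoin react native"])
print(idf["bitcoin"], rows[0])
```

A term that occurs in every document (`bitcoin` here) gets the minimal idf of 1.0. Rarer terms get larger weights.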

In [427]:
interactions_full_df.nunique()

personId           1140
contentId          2984
eventStrength        70
last_timestamp    38900
dtype: int64

In [448]:
feat = articles_df[['title', 'text']]
feat.index = articles_df['contentId']
feat = feat.reindex(unique_content_ids).fillna('')
feat['text'] = feat.apply(lambda x: x['title'] + ' ' + x['text'], axis=1)
feat.drop(columns=['title'], inplace=True)
feat


Unnamed: 0_level_0,text
contentId,Unnamed: 1_level_1
-1006791494035379303,Google unleashes DeepMind on energy-hungry dat...
-1021685224930603833,Indústria 4.0: desafios e oportunidades *Igor ...
-1022885988494278200,12 JavaScript Hacks In this post I will share ...
-1024046541613287684,Australian Bitcoin Entrepreneur Launches Robo-...
-1033806831489252007,React Native v0.32.0-rc.0 released v0.32.0-rc....
...,...
967143806332397325,Baidu abre laboratório de realidade aumentada ...
972258375127367383,Better Exposed Filters The Better Exposed Fil...
980458131533897249,Elasticsearch: CSV exporter for Kibana Discove...
98528655405030624,Quer reclamar? Desenvolvedores vencem hackatho...


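First articles (columns) in `data_test`: -1006791494035379303, -1021685224930603833, -1022885988494278200. These match the first rows of `feat`, so the articles appear in the same order in both.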

In [429]:
data_train

<1140x2984 sparse matrix of type '<class 'numpy.float64'>'
	with 29329 stored elements in Compressed Sparse Row format>

In [430]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

tfidf_vectorizer = TfidfVectorizer()

tfidf_vectorizer.fit(feat.text)

In [434]:
item_features = tfidf_vectorizer.transform(feat.text)

model = LightFM(loss='warp', random_state=42)
model.fit(data_train, item_features=item_features, epochs=20, num_threads=2)

precision = precision_at_k(model, data_test, item_features=item_features, k=10).mean()
precision

0.0045824847

> improved
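When `item_features` is passed, LightFM represents each item only as the sum of the embeddings of its features. With the bare TF-IDF matrix above, two articles with identical TF-IDF rows therefore get the same representation. LightFM's own `Dataset` helper adds a per-item indicator feature by default (`item_identity_features=True`).

The sketch below does the same by hand with `scipy.sparse`, prepending an identity block to the TF-IDF matrix. `with_identity_features` is a hypothetical helper, and the toy matrix stands in for `item_features`. Whether this improves precision@10 on this data has not been checked here.

```python
import numpy as np
import scipy.sparse as sp


def with_identity_features(tfidf_matrix):
    """Prepend an identity block so every item also gets its own indicator feature.

    Row i of the result is [e_i | tfidf row i], shape (n_items, n_items + n_terms).
    """
    n_items = tfidf_matrix.shape[0]
    identity = sp.identity(n_items, dtype=np.float32, format="csr")
    return sp.hstack([identity, tfidf_matrix.astype(np.float32)], format="csr")


# Toy stand-in for the TF-IDF matrix: 3 items x 4 terms
tfidf = sp.csr_matrix(np.array([
    [0.0, 0.6, 0.8, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.8],
]))
features = with_identity_features(tfidf)
print(features.shape)
```

In the notebook this would be `item_features = with_identity_features(tfidf_vectorizer.transform(feat.text))`. The result should be passed unchanged to both `model.fit` and `precision_at_k`.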

## Task 4 (1.5 points)

In Task 3 we used the raw article text. In this task, first preprocess the text (convert it to lowercase, remove stop words, reduce words to their normal form, etc.), then train the model and evaluate its quality on the test data.

In [436]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import spacy  # just because I can

In [437]:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /home/choosen-
[nltk_data]     one/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/choosen-
[nltk_data]     one/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /home/choosen-
[nltk_data]     one/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [439]:
nlp = spacy.load('en_core_web_sm')

In [364]:
stopwords.fileids()

['arabic',
 'azerbaijani',
 'basque',
 'bengali',
 'catalan',
 'chinese',
 'danish',
 'dutch',
 'english',
 'finnish',
 'french',
 'german',
 'greek',
 'hebrew',
 'hinglish',
 'hungarian',
 'indonesian',
 'italian',
 'kazakh',
 'nepali',
 'norwegian',
 'portuguese',
 'romanian',
 'russian',
 'slovene',
 'spanish',
 'swedish',
 'tajik',
 'turkish']

In [440]:
articles_df['lang'].unique()

array(['en', 'pt', 'es', 'la', 'ja'], dtype=object)

In [471]:
stop_words = set(stopwords.words('english') + stopwords.words('portuguese') + stopwords.words('spanish'))
lemmatizer = WordNetLemmatizer()

In [472]:
def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    tokens = word_tokenize(text)
    tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in stop_words]
    return ' '.join(tokens)

In [473]:
feat = feat.copy()
feat['text'] = feat['text'].apply(preprocess_text)

In [474]:
feat

Unnamed: 0_level_0,text
contentId,Unnamed: 1_level_1
-1006791494035379303,google unleashes deepmind energyhungry datacen...
-1021685224930603833,indústria 40 desafios oportunidades igor schie...
-1022885988494278200,12 javascript hack post share 12 extremely use...
-1024046541613287684,australian bitcoin entrepreneur launch roboadv...
-1033806831489252007,react native v0320rc0 released v0320rc0 github...
...,...
967143806332397325,baidu abre laboratório realidade aumentada sta...
972258375127367383,better exposed filter better exposed filter mo...
980458131533897249,elasticsearch csv exporter kibana discover use...
98528655405030624,quer reclamar desenvolvedores vencem hackathon...
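In the output above, `energy-hungry` became `energyhungry` and `v0.32.0-rc.0` became `v0320rc0`. This happens because `preprocess_text` deletes punctuation instead of replacing it.

The sketch below shows the alternative of substituting a space. `normalize_punctuation` is a hypothetical name. The trade-off is that version strings and numbers such as `4.0` are then split into fragments, so neither choice is free.

```python
import re


def normalize_punctuation(text):
    """Lowercase and replace punctuation with spaces instead of deleting it."""
    text = text.lower()
    # A space keeps "energy-hungry" as two tokens; deleting the hyphen gives "energyhungry"
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the runs of whitespace that the substitution can create
    return " ".join(text.split())


print(normalize_punctuation("Google unleashes DeepMind on energy-hungry data centers"))
```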


In [475]:
tfidf_vectorizer = TfidfVectorizer()

item_features = tfidf_vectorizer.fit_transform(feat.text)

model = LightFM(loss='warp', random_state=42)
model.fit(data_train, item_features=item_features, epochs=100, num_threads=8)

precision = precision_at_k(model, data_test, item_features=item_features, k=10).mean()

In [470]:
precision

0.0047861505

## Task 5 (1.5 points)

Tune the hyperparameters of the LightFM model (`no_components`, etc.) to improve its quality.

In [477]:
from sklearn.model_selection import GridSearchCV

In [478]:
param_grid = {
    'no_components': (1, 30),
    'learning_rate': (0.01, 0.15),    
    'loss': ['warp', 'bpr']
}

def precision_scorer(estimator, X, k=10):
    return precision_at_k(estimator, X, k=k).mean()

# Create the GridSearchCV object
grid_search = GridSearchCV(model, param_grid, cv=5, scoring=precision_scorer)

# Search for the best parameters
grid_search.fit(data_train, item_features=item_features, epochs=40)

# Print the best parameters
print("Best parameters:", grid_search.best_params_)

# Evaluate the model with the best parameters
precision = precision_at_k(grid_search.best_estimator_, data_test, item_features=item_features, k=10).mean()
print("Precision@10 with best parameters:", precision)

Traceback (most recent call last):
  File "/home/choosen-one/.local/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 980, in _score
    scores = scorer(estimator, X_test, **score_params)
  File "/tmp/ipykernel_23773/3508549640.py", line 8, in precision_scorer
    return precision_at_k(estimator, X, k=k).mean()
  File "/home/choosen-one/.local/lib/python3.10/site-packages/lightfm/evaluation.py", line 71, in precision_at_k
    ranks = model.predict_rank(
  File "/home/choosen-one/.local/lib/python3.10/site-packages/lightfm/lightfm.py", line 954, in predict_rank
    raise ValueError("Incorrect number of features in item_features")
ValueError: Incorrect number of features in item_features



Best parameters: {'learning_rate': 0.01, 'loss': 'warp', 'no_components': 1}
Precision@10 with best parameters: 0.002851324


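> Grid search produced nonsense. The scorer calls `precision_at_k` without `item_features`, so every fold fails with the `ValueError` above. All scores are therefore NaN, and `best_params_` is simply the first combination of the grid. Beyond that, `cv=5` splits `data_train` by rows, so users in a validation fold would be scored with embeddings learned for other users. Also, tuples such as `(1, 30)` in `param_grid` are two discrete values, not a range. Let's use Bayesian optimization instead.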

In [487]:
from bayes_opt import BayesianOptimization

def lightfm_precision(epochs, learning_rate, no_components):
    # bayes_opt samples floats, so cast the integer hyperparameters
    epochs = int(epochs)
    no_components = int(no_components)
    model = LightFM(loss='warp', learning_rate=learning_rate, no_components=no_components)
    model.fit(data_train, item_features=item_features, epochs=epochs)
    # NB: the objective is computed on data_test itself, so the optimum is tuned on the test set
    precision = precision_at_k(model, data_test, item_features=item_features, k=10).mean()
    return precision

parameters = {
    'no_components': (10, 80),
    'epochs': (5, 50),
    'learning_rate': (0.01, 0.1)
}

optimizer = BayesianOptimization(
    f=lightfm_precision,
    pbounds=parameters,
    verbose=5,
    random_state=42,
)

optimizer.maximize(init_points=5, n_iter=15)

print(optimizer.max)

|   iter    |  target   |  epochs   | learni... | no_com... |
-------------------------------------------------------------
| 1         | 0.001935  | 21.85     | 0.09556   | 63.92     |
| 2         | 0.004277  | 31.94     | 0.02404   | 29.36     |
| 3         | 0.002138  | 7.614     | 0.08796   | 56.07     |
| 4         | 0.004888  | 36.86     | 0.01185   | 78.19     |
| 5         | 0.003564  | 42.46     | 0.02911   | 30.91     |
| 6         | 0.001426  | 47.96     | 0.07356   | 79.63     |
| 7         | 0.0008147 | 36.77     | 0.09216   | 78.41     |
| 8         | 0.003259  | 19.64     | 0.05312   | 28.57     |
| 9         | 0.002749  | 47.4
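`GridSearchCV` splits `data_train` by rows (users), and its scorer has to receive `item_features`, so an explicit loop over the grid is simpler here. Below is a minimal sketch of an exhaustive grid search with `itertools.product`. `grid_search` and `toy_evaluate` are hypothetical names.

In the notebook, `evaluate` would fit `LightFM(**params)` on `data_train` and return `precision_at_k(..., item_features=item_features, k=10).mean()` on a held-out matrix. Ideally that matrix is not `data_test`, so the final test score stays unbiased.

```python
import itertools


def grid_search(evaluate, param_grid):
    """Exhaustive search: call evaluate(**params) for every combination in the grid.

    evaluate is a user-supplied callable returning a score to maximize.
    Returns the best score, the best parameters and all (score, params) results.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((evaluate(**params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_score, best_params, results


# Toy objective standing in for "fit LightFM, return precision@10"
def toy_evaluate(no_components, learning_rate):
    return -(no_components - 30) ** 2 - 1000 * (learning_rate - 0.05) ** 2


best_score, best_params, results = grid_search(
    toy_evaluate,
    {"no_components": [10, 30, 50], "learning_rate": [0.01, 0.05, 0.1]},
)
print(best_params)
```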

## Task 6 (1 point)

Implement functions that compute the following metrics:
* precision@k
* recall@k
* NDCG@k
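Below is a minimal sketch of the three metrics for a single user. It assumes binary relevance: an article from `true_test` counts as relevant whatever its `eventStrength`. It also assumes a ranked list of recommended ids.

The names carry a hypothetical `_user` suffix so they do not shadow `precision_at_k` imported from `lightfm.evaluation`. The denominators are:

- precision divides by $k$, as LightFM's `precision_at_k` does;
- recall divides by the number of relevant items;
- NDCG@k divides the discounted gain by that of an ideal ranking.

```python
import numpy as np


def precision_at_k_user(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant (denominator is k)."""
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def recall_at_k_user(recommended, relevant, k=10):
    """Fraction of the relevant items that appear among the top-k recommendations."""
    if not relevant:
        return 0.0
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)


def ndcg_at_k_user(recommended, relevant, k=10):
    """Binary-relevance NDCG@k: gain 1 for a relevant item, discount 1 / log2(rank + 1)."""
    top_k = list(recommended)[:k]
    # enumerate gives 0-based positions, so position p is discounted by log2(p + 2)
    dcg = sum(1.0 / np.log2(p + 2) for p, item in enumerate(top_k) if item in relevant)
    # Ideal ranking: all relevant items first, at most k of them
    idcg = sum(1.0 / np.log2(p + 2) for p in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]
relevant = {"b", "d", "e"}
print(precision_at_k_user(recommended, relevant, k=3),
      recall_at_k_user(recommended, relevant, k=3),
      ndcg_at_k_user(recommended, relevant, k=3))
```

Averaging these over the rows of `interactions` gives per-model values. Pass `set(row.true_test)` as `relevant`, and skip users whose `true_test` is empty, as LightFM does by default.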



In [265]:
# Your code here

## Task 7 (1 point)

Compute the values of the implemented metrics for $k=10$ for the best model obtained in the previous steps.

Find the existing implementations of these metrics in the lightfm and sklearn libraries. Compare the values you obtained with the results of the built-in library metrics.
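For the comparison in this task, two libraries already provide the metrics:

- `lightfm.evaluation` provides `precision_at_k` and `recall_at_k`. Both accept `train_interactions=` to exclude already-seen items from the ranking, and by default they report only users who have test interactions.
- scikit-learn provides `sklearn.metrics.ndcg_score`. It uses the relevance value itself as the gain and a $\log_2$ discount, so for binary relevance it should agree with a hand-written NDCG@k.

Below is a small sketch on one made-up user with no tied scores. `ndcg_score` averages over ties, which would make the numbers differ.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One user, 6 candidate items: binary relevance and model scores (made-up numbers)
y_true = np.array([[0, 1, 0, 1, 0, 0]])
y_score = np.array([[0.9, 0.8, 0.7, 0.3, 0.2, 0.1]])
k = 3

# Hand-written binary NDCG@k: rank items by score, discount position r by log2(r + 1)
order = np.argsort(-y_score[0])[:k]
discounts = 1.0 / np.log2(np.arange(2, k + 2))
dcg = float((y_true[0][order] * discounts).sum())
ideal = np.sort(y_true[0])[::-1][:k]
idcg = float((ideal * discounts).sum())
manual = dcg / idcg

library = ndcg_score(y_true, y_score, k=k)
print(manual, library)
```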

In [266]:
# Your code here

## Task 8 (1 point)

Implement the ALS algorithm and apply it to this notebook's problem.

**ALS**

So the task is to build a latent factor model for collaborative filtering:

$$ \sum_{u,i} (r_{ui} - \langle p_u, q_i \rangle)^2 \to \min_{P,Q}$$

The sum runs over all pairs $(u, i)$ for which the rating $r_{ui}$ is known, and only over them. Here $p_u$ and $q_i$ are the latent representations of user $u$ and item $i$, respectively. The matrices $P$ and $Q$ are formed by writing the vectors $p_u$ and $q_i$ as their columns.

The ALS (Alternating Least Squares) approach solves this problem by alternately fixing the matrices $P$ and $Q$. Once one of them is fixed, the problem for the other has a closed-form solution.

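$$\nabla_{p_u} \bigg[ \sum_{u,i} (r_{ui} - \langle p_u, q_i \rangle)^2 \bigg] = -\sum_{i} 2(r_{ui} - \langle p_u, q_i \rangle)q_i = 0$$

Only the terms with this $u$ depend on $p_u$, so the sum on the right runs over the items $i$ for which $r_{ui}$ is known. Using $a^Tb\,c = c\,b^Ta$ (valid because $a^Tb = b^Ta$ is a scalar), we have $\langle p_u, q_i \rangle q_i = q_i q_i^T p_u$, and we get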
$$\sum_{i} r_{ui}q_i - \sum_i q_i q_i^T p_u = 0.$$

Finally, each column of $P$ can be found as
$$p_u = \bigg( \sum_i q_i q_i^T\bigg)^{-1}\sum_ir_{ui}q_i \;\; \forall u,$$

and similarly for the columns of $Q$:
$$q_i = \bigg( \sum_u p_u p_u^T\bigg)^{-1}\sum_ur_{ui}p_u \;\; \forall i.$$

Thus the optimization problem can be solved by alternately fixing one of the matrices $P$ or $Q$ and optimizing over the other.
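Below is a minimal numpy sketch of the alternating updates above. It assumes the ratings fit in a dense array together with a boolean mask of known entries. `als` and `squared_error` are hypothetical names.

The sketch adds a small ridge term `reg * I` to each linear system. That term is not part of the formulas above; it keeps the matrices invertible when a user or item has few known ratings. With `reg=0`, each step is exactly $p_u = (\sum_i q_i q_i^T)^{-1}\sum_i r_{ui} q_i$ and its counterpart for $q_i$.

```python
import numpy as np


def als(R, mask, n_factors=2, n_iters=20, reg=1e-3, seed=0):
    """ALS for the sum over known (u, i) of (r_ui - <p_u, q_i>)^2.

    R    : (n_users, n_items) array of ratings
    mask : boolean array of the same shape, True where r_ui is known
    Returns P (n_factors x n_users) and Q (n_factors x n_items); p_u, q_i are columns.
    reg * I is added to each system (not in the formulas above) to keep it invertible.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_factors, n_users))
    Q = rng.normal(scale=0.1, size=(n_factors, n_items))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Q fixed: p_u = (sum_i q_i q_i^T)^{-1} sum_i r_ui q_i over the items known for u
        for u in range(n_users):
            Q_u = Q[:, mask[u]]
            P[:, u] = np.linalg.solve(Q_u @ Q_u.T + ridge, Q_u @ R[u, mask[u]])
        # P fixed: q_i = (sum_u p_u p_u^T)^{-1} sum_u r_ui p_u over the users known for i
        for i in range(n_items):
            P_i = P[:, mask[:, i]]
            Q[:, i] = np.linalg.solve(P_i @ P_i.T + ridge, P_i @ R[mask[:, i], i])
    return P, Q


def squared_error(R, mask, P, Q):
    """The objective above: squared residuals over the known entries only."""
    residual = (R - P.T @ Q)[mask]
    return float((residual ** 2).sum())


# Toy check: a random rank-2 matrix with three entries treated as unknown
rng = np.random.default_rng(1)
R = rng.normal(size=(2, 6)).T @ rng.normal(size=(2, 5))
mask = np.ones(R.shape, dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 4] = False
P, Q = als(R, mask, n_factors=2, n_iters=50)
print(squared_error(R, mask, P, Q))
```

On this notebook's data, one could call it with `R = data_train.toarray()` and `mask = R > 0`. That treats only observed interactions as known, which is the explicit-feedback setting of the formulas above. Hu et al. (2008) instead weight all entries, including the zeros, by a confidence derived from $r_{ui}$.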

**Original paper with the ALS problem statement for explicit feedback:**

* Bell, R.M. and Koren, Y., 2007, October. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In Seventh IEEE international conference on data mining (ICDM 2007) (pp. 43-52). IEEE.

**Original paper on ALS for implicit data, which became better known:**

* Hu, Y., Koren, Y. and Volinsky, C., 2008, December. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE international conference on data mining (pp. 263-272). IEEE.


In [267]:
# Your code here