# Intro
Recommender systems are everywhere, helping you find everything from books and hotels to restaurants and romantic dates.

Different recommender systems suit different scenarios; which one to use depends on your needs and the data you have.


## Data 
In our case, the goal is to build a recommender system from implicit feedback: the clickstream of an ecommerce website that specializes in cosmetics.

Each row in the file represents an event. Every event links one user to one product, so together the events form a many-to-many relation between products and users.


<table>
<thead>
<tr>
<th>Property</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>event_time</strong></td>
<td>Time when the event happened (in UTC).</td>
</tr>
<tr>
<td><strong>event_type</strong></td>
<td>Type of event: <code>view</code>, <code>cart</code>, <code>remove_from_cart</code>, or <code>purchase</code>.</td>
</tr>
<tr>
<td><strong>product_id</strong></td>
<td>ID of a product</td>
</tr>
<tr>
<td><strong>category_id</strong></td>
<td>Product's category ID</td>
</tr>
<tr>
<td><strong>category_code</strong></td>
<td>Product's category taxonomy (code name), when one could be assigned. Usually present for meaningful categories and omitted for various kinds of accessories.</td>
</tr>
<tr>
<td><strong>brand</strong></td>
<td>Brand name as a lowercase string. May be missing.</td>
</tr>
<tr>
<td><strong>price</strong></td>
<td>Price of the product as a float. Always present.</td>
</tr>
<tr>
<td><strong>user_id</strong></td>
<td>Permanent user ID.</td>
</tr>
<tr>
<td><strong>user_session</strong></td>
<td>Temporary session ID of the user. It stays the same throughout a session and changes every time the user comes back to the online store after a long pause.</td>
</tr>
</tbody>
</table>

### The event types are:

<ul>
<li><code>view</code> - a user viewed a product</li>
<li><code>cart</code> - a user added a product to shopping cart</li>
<li><code>remove_from_cart</code> - a user removed a product from shopping cart</li>
<li><code>purchase</code> - a user purchased a product</li>
</ul>
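
To make the event structure concrete, here is a minimal sketch on a hand-made toy frame. The rows and IDs are invented for illustration and are not taken from the dataset. It shows that one user can interact with many products and one product can be touched by many users, which is the many-to-many relation described above.

```python
import pandas as pd

# Toy events using column names from the table above; all values are made up.
events = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3],
    "product_id": [10, 11, 10, 12, 10],
    "event_type": ["view", "cart", "view", "purchase", "remove_from_cart"],
})

# Number of distinct products each user interacted with.
products_per_user = events.groupby("user_id")["product_id"].nunique()

# Number of distinct users who interacted with each product.
users_per_product = events.groupby("product_id")["user_id"].nunique()

print(products_per_user.to_dict())
print(users_per_product.to_dict())
```

Product 10 is shared by all three toy users, while users 1 and 2 each touch two different products.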


In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## Importing relevant libraries

In [2]:
import dask
import dask.dataframe as dd
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
import scipy.sparse as sparse
import implicit
sns.set()
style.use('ggplot')
%matplotlib inline 
dask.config.set(scheduler='processes')
plt.rcParams["figure.figsize"] = (12, 8)

## Reading Data 

In [3]:
%%time
ddf = dd.read_csv("/kaggle/input/ecommerce-events-history-in-cosmetics-shop/2020-Jan.csv", dtype={'category_code': 'object'})
ddf.head()

## Checking for null values

In [4]:
ddf.isnull().sum().compute()

# Data Analysis


## Checking the data distributions

### Event types

In [5]:
%%time
ET = ddf.event_type.value_counts().compute()
print(ET)
ET.plot(kind='barh',title='Event types distribution')

### Category codes

In [6]:
%%time
CC = ddf.category_code.value_counts().compute()
print(CC.head(10))
CC.head(10).plot(kind='barh',title='Category code distribution')

### Brands

In [7]:
%%time

B = ddf.brand.value_counts().compute()
print(B.head(10))
B.head(10).plot(kind='barh',title='Brand distribution')

In [8]:
%%time
PET = ddf[ddf['event_type'] == 'purchase'].brand.value_counts().compute()
print(PET.head(10))
PET.head(10).plot(kind='barh',title='Purchased brands distribution')

### Day of the week
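
One way to look at the day-of-week distribution is sketched below on a toy pandas frame. It assumes `event_time` is stored as a string such as `2020-01-01 00:05:00 UTC`; that string format and the sample rows are assumptions for illustration, not something the source confirms.

```python
import pandas as pd

# Toy timestamps; the "YYYY-MM-DD HH:MM:SS UTC" string format is an assumption.
events = pd.DataFrame({
    "event_time": [
        "2020-01-01 00:05:00 UTC",  # Wednesday
        "2020-01-01 13:30:00 UTC",  # Wednesday
        "2020-01-04 09:00:00 UTC",  # Saturday
        "2020-01-06 18:45:00 UTC",  # Monday
    ],
    "event_type": ["view", "cart", "view", "purchase"],
})

# Strip the trailing " UTC" marker, parse the rest, and mark the result as UTC.
timestamps = pd.to_datetime(
    events["event_time"].str.replace(" UTC", "", regex=False),
    format="%Y-%m-%d %H:%M:%S",
).dt.tz_localize("UTC")

events["day_of_week"] = timestamps.dt.day_name()

# Count events per weekday, keeping calendar order and showing empty days as 0.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
day_counts = events["day_of_week"].value_counts().reindex(weekday_order, fill_value=0)
print(day_counts)
```

Calling `day_counts.plot(kind='barh')` on the result would give a chart in the same style as the other distributions.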

# Data preprocessing

In [11]:
%%time
event_type_strength = {
    'view': 1,
    'cart': 2,
    'remove_from_cart':-1,
    'purchase': 3
}

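# Translate each event type into a numeric interaction strength.
# meta tells dask the name and dtype of the new column without computing it.
ddf['event_strength'] = ddf['event_type'].map(
    lambda x: event_type_strength[x],
    meta=pd.Series([], dtype=int, name='event_strength'),
)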
ddf.head(10)
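
To see what these weights produce once events are summed per user and product, here is a minimal sketch on a toy pandas frame (rows invented for illustration). It uses plain pandas `map` with the dictionary for brevity, which is an assumption of this sketch and not the dask call from the cell above.

```python
import pandas as pd

event_type_strength = {"view": 1, "cart": 2, "remove_from_cart": -1, "purchase": 3}

# Toy events: user 1 views, carts, and buys product 10;
# user 2 carts product 10, removes it again, and views product 11.
events = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 2],
    "product_id": [10, 10, 10, 10, 10, 11],
    "event_type": ["view", "cart", "purchase", "cart", "remove_from_cart", "view"],
})

# Attach the numeric strength of each event.
events["event_strength"] = events["event_type"].map(event_type_strength)

# Sum the strengths into one implicit score per (user, product) pair.
scores = events.groupby(["user_id", "product_id"], as_index=False)["event_strength"].sum()
print(scores)
```

The full purchase path scores 1 + 2 + 3 = 6, while adding to the cart and removing again nets out to 2 - 1 = 1, the same score as a single view.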


In [25]:
%%time
ET = ddf.event_strength.value_counts().compute()
print(ET)
ET.plot(kind='barh')

# SECTION IN PROGRESS

In [13]:
ddf

In [14]:
%%time
ddf = ddf.drop_duplicates()
grouped = ddf.groupby(['user_id', 'product_id']).event_strength.sum(split_out=14).reset_index()

In [15]:
grouped

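In [16]:
# Materialize the aggregated interactions as a pandas DataFrame;
# scipy cannot build a matrix from lazy dask collections.
grouped = grouped.compute()

In [17]:
# Map raw IDs to contiguous 0-based indices so the matrix is no larger than needed.
user_idx = grouped['user_id'].astype('category').cat.codes
product_idx = grouped['product_id'].astype('category').cat.codes

# Rows are products and columns are users, as in the original layout.
# Check which orientation the installed implicit version expects in fit():
# releases from 0.5 on take a user-item matrix, i.e. the transpose of this one.
sparse_content_person = sparse.csr_matrix((grouped['event_strength'].astype(float), (product_idx, user_idx)))

model = implicit.als.AlternatingLeastSquares(factors=20, regularization=0.1, iterations=50)

alpha = 15  # confidence scaling applied to the interaction strengths
data = (sparse_content_person * alpha).astype('double')
model.fit(data)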
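
Raw `user_id` and `product_id` values are large, so using them directly as matrix coordinates gives a matrix far bigger than the number of distinct users and products. Below is a minimal sketch of one alternative: mapping IDs to contiguous indices through pandas categoricals and keeping lookup tables for the way back. The toy IDs and scores are made up, and the names `item_user`, `index_to_product`, and `index_to_user` are hypothetical. Model fitting is left out because it depends on the installed `implicit` version.

```python
import pandas as pd
import scipy.sparse as sparse

# Toy aggregated scores; the IDs are invented, large, and non-contiguous.
grouped = pd.DataFrame({
    "user_id":        [500001, 500001, 500007, 500009],
    "product_id":     [5809910, 5700037, 5809910, 5751422],
    "event_strength": [6.0, 1.0, 3.0, 2.0],
})

# Categorical codes give each distinct ID a 0-based index (sorted by ID).
user_cat = grouped["user_id"].astype("category")
product_cat = grouped["product_id"].astype("category")

alpha = 15  # confidence scaling constant, same value as in the notebook

# Item-user matrix: one row per product, one column per user.
item_user = sparse.csr_matrix(
    (
        (grouped["event_strength"] * alpha).to_numpy(),
        (product_cat.cat.codes.to_numpy(), user_cat.cat.codes.to_numpy()),
    ),
    shape=(len(product_cat.cat.categories), len(user_cat.cat.categories)),
)

# Lookup tables to translate matrix indices back to the original IDs.
index_to_product = dict(enumerate(product_cat.cat.categories))
index_to_user = dict(enumerate(user_cat.cat.categories))

print(item_user.shape)
print(item_user.toarray())
```

The matrix here is 3 x 3 instead of millions of rows and columns, and the lookup tables let any row or column index be reported as a real product or user ID.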

In [None]:
grouped = grouped[grouped.event_strength > 1]
grouped