# Recommendation Systems

Example application: Movie Recommendation Software

In this section of the course, we will build a movie recommendation system as a worked example of recommendation systems.

<IMG src="https://miro.medium.com/max/1132/1*N0-ikjPv4RUVvS-6KCgLPg.jpeg" width="500" height="500" >

In [1]:
import numpy as np
import pandas as pd

In [2]:
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('users.data', sep='\t', names=column_names)

In [3]:
df.head()


Unnamed: 0,user_id,item_id,rating,timestamp
0,0,50,5,881250949
1,0,172,5,881250949
2,0,133,1,881250949
3,196,242,3,881250949
4,186,302,3,891717742
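
The loading step can be tried out without the real file. The sketch below parses a small in-memory sample in the same tab-separated, header-less layout; the three sample rows are copied from the output above, and `sample` and `sample_df` are names made up for this illustration.

```python
import io
import pandas as pd

# Hypothetical three-line sample: tab-separated, no header row.
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"

column_names = ['user_id', 'item_id', 'rating', 'timestamp']

# names= supplies the column labels because the data has no header line.
sample_df = pd.read_csv(io.StringIO(sample), sep='\t', names=column_names)

print(sample_df)
print(sample_df.dtypes)
```

Without `names=`, pandas would treat the first data row as the header, so one rating would silently go missing.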


In [4]:
# Let's see how many records there are:

len(df)

100003

### Now let's load our other file:

In [5]:

movie_titles = pd.read_csv("movie_id_titles.csv")
movie_titles.head()

Unnamed: 0,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [6]:
# Let's see how many records there are:

len(movie_titles)

1682

In [7]:

df = pd.merge(df, movie_titles, on='item_id')
df.head()

Unnamed: 0,user_id,item_id,rating,timestamp,title
0,0,50,5,881250949,Star Wars (1977)
1,290,50,5,880473582,Star Wars (1977)
2,79,50,4,891271545,Star Wars (1977)
3,2,50,5,888552084,Star Wars (1977)
4,8,50,5,879362124,Star Wars (1977)
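
`pd.merge` performs an inner join by default, so any rating whose `item_id` has no entry in the titles table is dropped without warning. The sketch below uses made-up toy frames (`ratings_df`, `titles_df`, and the id 9999 are all hypothetical) to show one way to check for such rows.

```python
import pandas as pd

ratings_df = pd.DataFrame({
    'user_id': [0, 290, 1, 2],
    'item_id': [50, 50, 1, 9999],   # 9999 is a made-up id with no title
    'rating':  [5, 5, 4, 3],
})
titles_df = pd.DataFrame({
    'item_id': [1, 50],
    'title': ['Toy Story (1995)', 'Star Wars (1977)'],
})

# Default how='inner': rows whose item_id has no match are dropped.
inner = pd.merge(ratings_df, titles_df, on='item_id')

# how='left' keeps every rating; indicator=True adds a _merge column
# that records whether a matching title was found.
left = pd.merge(ratings_df, titles_df, on='item_id', how='left', indicator=True)
unmatched = left[left['_merge'] == 'left_only']

print(len(ratings_df), len(inner), len(unmatched))
```

Comparing `len(df)` before and after the merge is a quick version of the same check.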


### Building Our Recommendation System:


In [8]:
# First, we build a structure similar to an Excel pivot table.
# Each row is a user (the DataFrame's index is user_id),
# each column is a movie title (the title column),
# and each cell holds the rating that user gave that movie.

moviemat = df.pivot_table(index='user_id',columns='title',values='rating')
moviemat.head()

user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
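
To see what `pivot_table` does on a scale that fits on screen, here is a minimal sketch with a made-up five-row frame (`toy` and `toy_mat` are hypothetical names). It shows two behaviours that matter later: user/movie pairs with no rating become NaN, and duplicate pairs are averaged, since the default `aggfunc` is the mean.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'title':   ['A', 'B', 'A', 'B', 'B'],   # user 3 rated 'B' twice
    'rating':  [5, 3, 4, 2, 4],
})

# Rows: users, columns: titles, cells: ratings (mean if duplicated).
toy_mat = toy.pivot_table(index='user_id', columns='title', values='rating')
print(toy_mat)

# User 2 never rated 'B', so that cell is NaN.
print(np.isnan(toy_mat.loc[2, 'B']))
```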


In [9]:
type(moviemat)

pandas.core.frame.DataFrame

### Goal: Recommend movies similar to Star Wars

Let's look at the user ratings for Star Wars (1977):

In [10]:
starwars_user_ratings = moviemat['Star Wars (1977)']
starwars_user_ratings.head()

user_id
0    5.0
1    5.0
2    5.0
3    NaN
4    5.0
Name: Star Wars (1977), dtype: float64

Let's use the corrwith() method to compute each movie's correlation with Star Wars:

In [11]:
similar_to_starwars = moviemat.corrwith(starwars_user_ratings)


  c = cov(x, y, rowvar, dtype=dtype)
  c *= np.true_divide(1, fact)


In [12]:
similar_to_starwars

title
'Til There Was You (1997)                0.872872
1-900 (1994)                            -0.645497
101 Dalmatians (1996)                    0.211132
12 Angry Men (1957)                      0.184289
187 (1997)                               0.027398
                                           ...   
Young Guns II (1990)                     0.228615
Young Poisoner's Handbook, The (1995)   -0.007374
Zeus and Roxanne (1997)                  0.818182
unknown                                  0.723123
Á köldum klaka (Cold Fever) (1994)            NaN
Length: 1664, dtype: float64
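
For intuition about what `corrwith` returns, the sketch below compares it with a hand-computed Pearson correlation on a made-up two-column frame (`mat`, `target`, and `other` are hypothetical names). The hand computation uses only the users who rated both movies, which is the set of rows the pandas result is based on here.

```python
import numpy as np
import pandas as pd

mat = pd.DataFrame({
    'target': [5.0, 4.0, 1.0, np.nan, 2.0],
    'other':  [4.0, 5.0, 2.0, 3.0, np.nan],
})

# corrwith correlates every column of mat with the given Series.
via_corrwith = mat.corrwith(mat['target'])['other']

# By hand: keep only users who rated both, then apply the Pearson formula.
both = mat.dropna()
x = both['target'] - both['target'].mean()
y = both['other'] - both['other'].mean()
by_hand = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(via_corrwith, by_hand)
```

Because only shared raters count, two movies with a tiny overlap can still get a very high or very low correlation.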

In [13]:
type(similar_to_starwars)

pandas.core.series.Series

#### The call above emits warnings because some records have missing values. similar_to_starwars is a Series; let's convert it to a DataFrame named corr_starwars, drop the NaN rows, and take a look:

In [14]:
corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
corr_starwars.dropna(inplace=True)
corr_starwars.head()

title,Correlation
'Til There Was You (1997),0.872872
1-900 (1994),-0.645497
101 Dalmatians (1996),0.211132
12 Angry Men (1957),0.184289
187 (1997),0.027398


### Let's sort the resulting DataFrame and see which movies it recommends as closest to Star Wars:

In [15]:
corr_starwars.sort_values('Correlation',ascending=False).head(10)

title,Correlation
Hollow Reed (1996),1.0
Stripes (1981),1.0
Star Wars (1977),1.0
Man of the Year (1995),1.0
"Beans of Egypt, Maine, The (1994)",1.0
Safe Passage (1994),1.0
"Old Lady Who Walked in the Sea, The (Vieille qui marchait dans la mer, La) (1991)",1.0
"Outlaw, The (1943)",1.0
"Line King: Al Hirschfeld, The (1996)",1.0
Hurricane Streets (1998),1.0


#### As you can see, the results are irrelevant. A little digging shows the cause: these movies received very few ratings. To fix this, let's keep only movies with more than 100 ratings. For that purpose, let's create a DataFrame named ratings and store in it how many ratings (the rating count) each movie received.
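
Why do rarely rated movies produce correlations of exactly 1.0? If only two users rated both movies and their ratings differ, the two points always lie on a straight line, so the Pearson correlation is +1 or -1 regardless of taste. The sketch below uses made-up Series (`star` and `rare` are hypothetical) and also shows the `min_periods` argument of `Series.corr`, which returns NaN when too few shared ratings exist.

```python
import numpy as np
import pandas as pd

star = pd.Series([5.0, 4.0, np.nan, np.nan, 3.0], name='popular')
rare = pd.Series([4.0, 2.0, np.nan, np.nan, np.nan], name='rare')

# Only two users rated both movies, so the correlation is trivially perfect.
r_two_points = star.corr(rare)

# Require at least 3 shared raters; otherwise the result is NaN.
r_guarded = star.corr(rare, min_periods=3)

print(r_two_points, r_guarded)
```

Filtering by total rating count, as this notebook does, is a coarser guard than counting shared raters, but it is simpler to apply to the whole table at once.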


In [16]:
df.head()

Unnamed: 0,user_id,item_id,rating,timestamp,title
0,0,50,5,881250949,Star Wars (1977)
1,290,50,5,880473582,Star Wars (1977)
2,79,50,4,891271545,Star Wars (1977)
3,2,50,5,888552084,Star Wars (1977)
4,8,50,5,879362124,Star Wars (1977)


We don't need the timestamp column, so let's drop it:

In [17]:
# drop returns a new DataFrame, so assign the result back to df
df = df.drop(['timestamp'], axis=1)
df

Unnamed: 0,user_id,item_id,rating,title
0,0,50,5,Star Wars (1977)
1,290,50,5,Star Wars (1977)
2,79,50,4,Star Wars (1977)
3,2,50,5,Star Wars (1977)
4,8,50,5,Star Wars (1977)
...,...,...,...,...
99998,840,1674,4,Mamma Roma (1962)
99999,655,1640,3,"Eighth Day, The (1996)"
100000,655,1637,3,Girls Town (1996)
100001,655,1630,3,"Silence of the Palace, The (Saimt el Qusur) (1..."
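
One detail worth keeping in mind: `DataFrame.drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch on a made-up two-row frame (`frame` is a hypothetical name):

```python
import pandas as pd

frame = pd.DataFrame({'rating': [5, 4], 'timestamp': [881250949, 891717742]})

# The result is discarded here, so frame still has the timestamp column.
frame.drop(['timestamp'], axis=1)
still_there = 'timestamp' in frame.columns

# Assigning the result back makes the change stick.
frame = frame.drop(columns=['timestamp'])
gone = 'timestamp' not in frame.columns

print(still_there, gone)
```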


In [19]:
# Let's find the mean rating of each movie
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())

# Sort in descending order and take a look
ratings.sort_values('rating',ascending=False).head()


title,rating
They Made Me a Criminal (1939),5.0
Marlene Dietrich: Shadow and Light (1996),5.0
"Saint of Fort Washington, The (1993)",5.0
Someone Else's America (1995),5.0
Star Kid (1997),5.0


#### Note: These averages were computed without regard to how many ratings each movie received, which is why movies we have never heard of come out on top.

In [20]:
# Now let's find the number of ratings each movie received
# (rating_oy_sayisi means "rating vote count")
ratings['rating_oy_sayisi'] = df.groupby('title')['rating'].count()
ratings.head()

title,rating,rating_oy_sayisi
'Til There Was You (1997),2.333333,9
1-900 (1994),2.6,5
101 Dalmatians (1996),2.908257,109
12 Angry Men (1957),4.344,125
187 (1997),3.02439,41
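
The mean and the count can also be computed in a single pass with `agg`. The sketch below is an alternative formulation on made-up data (`toy` and `summary` are hypothetical names); it also reproduces the effect noted above, where a movie with a single 5-star rating outranks a well-rated popular one.

```python
import pandas as pd

toy = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B'],
    'rating': [5, 4, 3, 5],
})

# One groupby, two aggregations: mean rating and number of ratings.
summary = toy.groupby('title')['rating'].agg(['mean', 'count'])
summary = summary.rename(columns={'mean': 'rating', 'count': 'rating_oy_sayisi'})

# 'B' tops the list with a mean of 5.0, but it rests on one rating only.
print(summary.sort_values('rating', ascending=False))
```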


In [21]:
# Now let's sort the movies by number of ratings, in descending order, and take a look
ratings.sort_values('rating_oy_sayisi',ascending=False).head()


title,rating,rating_oy_sayisi
Star Wars (1977),4.359589,584
Contact (1997),3.803536,509
Fargo (1996),4.155512,508
Return of the Jedi (1983),4.00789,507
Liar Liar (1997),3.156701,485


In [None]:
# Let's return to our main goal and add the rating_oy_sayisi column to our corr_starwars DataFrame (using join)

In [23]:
corr_starwars.sort_values('Correlation',ascending=False).head(10)

title,Correlation
Hollow Reed (1996),1.0
Stripes (1981),1.0
Star Wars (1977),1.0
Man of the Year (1995),1.0
"Beans of Egypt, Maine, The (1994)",1.0
Safe Passage (1994),1.0
"Old Lady Who Walked in the Sea, The (Vieille qui marchait dans la mer, La) (1991)",1.0
"Outlaw, The (1943)",1.0
"Line King: Al Hirschfeld, The (1996)",1.0
Hurricane Streets (1998),1.0


In [24]:
corr_starwars = corr_starwars.join(ratings['rating_oy_sayisi'])
corr_starwars.head()

title,Correlation,rating_oy_sayisi
'Til There Was You (1997),0.872872,9
1-900 (1994),-0.645497,5
101 Dalmatians (1996),0.211132,109
12 Angry Men (1957),0.184289,125
187 (1997),0.027398,41


### And the result:

In [25]:
corr_starwars[corr_starwars['rating_oy_sayisi']>100].sort_values('Correlation',ascending=False).head()

title,Correlation,rating_oy_sayisi
Star Wars (1977),1.0,584
"Empire Strikes Back, The (1980)",0.748353,368
Return of the Jedi (1983),0.672556,507
Raiders of the Lost Ark (1981),0.536117,420
Austin Powers: International Man of Mystery (1997),0.377433,130
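
The steps of this notebook can be wrapped in one reusable function. What follows is a sketch, not the notebook's own code: `recommend_similar`, its parameters, and the toy matrix are all made up for illustration. It makes two assumptions that differ slightly from the cells above. First, it counts ratings with `moviemat.count()`, which matches the groupby count only if no user rated the same title twice. Second, it removes the target movie itself from the result.

```python
import numpy as np
import pandas as pd

def recommend_similar(moviemat, title, min_ratings=100, top_n=5):
    """Hypothetical helper: titles most correlated with `title`."""
    # Correlate every movie's rating column with the target column.
    corr = moviemat.corrwith(moviemat[title]).rename('Correlation').to_frame()
    # Non-NaN cells per column, i.e. users who rated each movie.
    corr['rating_oy_sayisi'] = moviemat.count()
    corr = corr.dropna(subset=['Correlation'])
    # Keep only movies with more than min_ratings ratings.
    corr = corr[corr['rating_oy_sayisi'] > min_ratings]
    # The target always correlates perfectly with itself, so leave it out.
    corr = corr.drop(index=title, errors='ignore')
    return corr.sort_values('Correlation', ascending=False).head(top_n)

# Made-up user x movie matrix for a quick check.
toy_mat = pd.DataFrame({
    'Target':   [5, 4, 1, 2, 5, np.nan],
    'Close':    [5, 5, 1, 1, 4, 3],
    'Opposite': [1, 2, 5, 4, 1, np.nan],
    'Rare':     [4, 2, np.nan, np.nan, np.nan, np.nan],
})

result = recommend_similar(toy_mat, 'Target', min_ratings=2, top_n=2)
print(result)
```

With the real data, the call would look like `recommend_similar(moviemat, 'Star Wars (1977)')`.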
