# Makale Ã–neri Sistemi | Content-Based Recommendation

Makale iÃ§eriklerine dayalÄ± **iÃ§erik temelli Ã¶neri sistemi**. TF-IDF vektÃ¶rleÅŸtirme ve Cosine Similarity kullanÄ±larak her makale iÃ§in benzer iÃ§erikler Ã¶nerilmektedir.

| Ã–zellik | Detay |
|---------|-------|
| **Veri Seti** | articles.csv â€” 34 makale, 2 sÃ¼tun (article, title) |
| **Problem TÃ¼rÃ¼** | Ä°Ã§erik TabanlÄ± Filtreleme (GÃ¶zetimsiz) |
| **YÃ¶ntem** | TF-IDF VektÃ¶rleÅŸtirme + Cosine Similarity |
| **Ã‡Ä±ktÄ±** | Her makale iÃ§in en benzer 4 makale Ã¶nerisi |

**Ä°ÅŸ AkÄ±ÅŸÄ±:** Veri YÃ¼kleme â†’ Metin Temizleme â†’ TF-IDF DÃ¶nÃ¼ÅŸÃ¼mÃ¼ â†’ Cosine Similarity Matrisi â†’ Ã–neri Fonksiyonu â†’ KayÄ±t

In [None]:
# Temel KÃ¼tÃ¼phaneler
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Makine Ã–ÄŸrenmesi
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier, GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso
from sklearn.svm import SVR, SVC
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report, confusion_matrix
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import joblib

print('âœ… KÃ¼tÃ¼phaneler yÃ¼klendi.')


## ðŸ“Š 1. Veri YÃ¼kleme ve Ä°lk Ä°nceleme

In [17]:


try:
    data = pd.read_csv('articles.csv', encoding='utf-8')
except:
    data = pd.read_csv('articles.csv', encoding='latin1')

if 'df' in locals():
    print('--- Ä°lk 5 SatÄ±r ---')
    display(df.head())
    display(df.tail())
    display(df.sample(5))
    print(df.info())
    display(df.describe().T)
    print(df.columns.tolist())
    print(df.isnull().sum()[df.isnull().sum() > 0])

--- Ä°lk 5 SatÄ±r ---


Unnamed: 0,article,title
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms
2,You must have seen the news divided into categ...,News Classification with Machine Learning
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning


Unnamed: 0,article,title
29,Many machine learning algorithms can be used t...,Applications of Deep Learning
30,Almost every app or website you visit today sh...,Introduction to Recommendation Systems
31,There are so many algorithms in machine learni...,Use Cases of Different Machine Learning Algori...
32,"In machine learning, the Naive Bayes algorithm...",Naive Bayes Algorithm in Machine Learning
33,Swapping elements of a Python list is very sim...,Swap Items of a Python List


Unnamed: 0,article,title
12,API stands for Application Programming Interfa...,Best Python Frameworks to Build APIs
25,A scatter plot is one of the most useful ways ...,Animated Scatter Plot using Python
31,There are so many algorithms in machine learni...,Use Cases of Different Machine Learning Algori...
18,Dictionaries are one of the most useful data s...,For Loop Over Keys and Values in a Python Dict...
29,Many machine learning algorithms can be used t...,Applications of Deep Learning


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34 entries, 0 to 33
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   article  34 non-null     object
 1   title    34 non-null     object
dtypes: object(2)
memory usage: 676.0+ bytes
None


Unnamed: 0,count,unique,top,freq
article,34,33,You must have seen the news divided into categ...,2
title,34,33,News Classification with Machine Learning,2


['article', 'title']
Series([], dtype: int64)


In [18]:
# Kolon isimlerini standartlaÅŸtÄ±r
df.columns = df.columns.str.lower().str.replace(' ', '_').str.replace(r'[^a-z0-9_]', '', regex=True)

# ðŸ“š 4. Content-Based Recommendation System (TF-IDF & Cosine Similarity)

In [19]:
# TF-IDF Vectorization
articles = data['Article'].tolist()
uni_tfidf = text.TfidfVectorizer(stop_words='english')
uni_matrix = uni_tfidf.fit_transform(articles)

# Cosine Similarity
uni_sim = cosine_similarity(uni_matrix)

# Recommendation Function
def recommend_articles(x):
    # Get top 4 similar articles (excluding itself)
    top_indices = x.argsort()[-5:-1]
    return ', '.join(data['Title'].iloc[top_indices].tolist())

# Apply to DataFrame
data['Recommended Articles'] = [recommend_articles(x) for x in uni_sim]

# Display Result
data.head()

Unnamed: 0,Article,Title,Recommended Articles
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis,"Introduction to Recommendation Systems, Best B..."
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms,"Applications of Deep Learning, Best Books to L..."
2,You must have seen the news divided into categ...,News Classification with Machine Learning,"Language Detection with Machine Learning, Appl..."
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...,"Assumptions of Machine Learning Algorithms, Be..."
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning,"Assumptions of Machine Learning Algorithms, Me..."


In [20]:
import joblib
# Save the processed data for the App
joblib.dump(data, 'article_data.pkl')
print('Data saved as article_data.pkl')

Data saved as article_data.pkl


## SonuÃ§ ve DeÄŸerlendirme

Bu proje, 34 makale Ã¼zerinde TF-IDF + Cosine Similarity tabanlÄ± iÃ§erik Ã¶neri sistemi geliÅŸtirmiÅŸtir.

| Parametre | DeÄŸer |
|-----------|-------|
| Corpus Boyutu | 34 makale |
| VektÃ¶rleÅŸtirme | TF-IDF (stop_words='english') |
| Benzerlik MetriÄŸi | Cosine Similarity |
| Ã–neri SayÄ±sÄ± | Her makale iÃ§in top-4 |

**Temel Bulgular:**
- Ä°Ã§erik tabanlÄ± filtreleme, kullanÄ±cÄ± geÃ§miÅŸi gerektirmeden soÄŸuk baÅŸlangÄ±Ã§ sorununu Ã§Ã¶zer
- TF-IDF, makine Ã¶ÄŸrenmesi ve Python odaklÄ± makaleler arasÄ±nda yÃ¼ksek benzerlik saptamÄ±ÅŸtÄ±r
- KÃ¼Ã§Ã¼k corpus (34 makale), Ã¶nerilerin kalitesini sÄ±nÄ±rlar; daha bÃ¼yÃ¼k veri setleriyle sistem Ã¶lÃ§eklenebilir
- Daha zengin iÃ§erik Ã¶zellikleri (yazar, etiket, kategori) eklenirse Ã¶neri isabeti artacaktÄ±r