# Dimensionality Reduction

# PCA (Principal Component Analysis)

**Use Cases**
* Image filtering
* Visualization
* Feature extraction
* Feature elimination/transformation
* Stock market analysis
* Health data/genetic data

<br>

## PCA Algorithm

* Let k be the target number of dimensions
* Standardize the data
* Obtain the eigenvalues and eigenvectors from the covariance or correlation matrix, or use SVD
* Sort the eigenvalues in descending order and keep the k largest
* Build the projection matrix W from the eigenvectors that correspond to the k selected eigenvalues
* Transform the original dataset X using W to obtain the k-dimensional representation Y
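
The steps above can be sketched directly in NumPy. This is a minimal illustration rather than how scikit-learn implements PCA: the function name `pca_project` and the synthetic data are assumptions made for the example, and it uses the covariance-matrix route instead of SVD.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its k leading principal components."""
    # Step 1: standardize each feature to zero mean and unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # Step 3: eigh is meant for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort descending and keep the k largest
    order = np.argsort(eigvals)[::-1][:k]
    # Step 5: projection matrix W, shape (n_features, k)
    W = eigvecs[:, order]
    # Step 6: k-dimensional representation Y
    Y = X_std @ W
    return Y, W, eigvals[order]

# Hypothetical demo data: 100 samples, 5 features, one strongly correlated pair
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=100)

Y_demo, W_demo, top_vals = pca_project(X_demo, k=2)
print(Y_demo.shape)  # (100, 2)
```

The columns of `W_demo` are orthonormal, so the projection rotates the standardized data and then discards the low-variance directions.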

<br>

## Eigenvalue and Eigenvector

**If multiplying a square matrix by a nonzero vector yields a scalar multiple of that same vector, the scalar is an eigenvalue of the matrix and the vector is the corresponding eigenvector.**

$${\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 2 \\ -1 & 1 & -2 \end{bmatrix}} * {\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}} =  {\begin{bmatrix} 3 \\ 3 \\ 0 \end{bmatrix}} = 3 * {\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}}$$

The result is a scalar multiple of the vector [1 1 0].
<br><br>

$${\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 0 & 2 \end{bmatrix}} * {\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}} =  {\begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix}} = 3 * {\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}$$

The result is a scalar multiple of the vector [1 1 1].
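
The [1 1 1] example can be checked numerically. The sketch below multiplies that matrix by the vector and then asks `np.linalg.eig` for the eigenvalues; the variable names are chosen only for this illustration.

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 1, 2],
              [1, 0, 2]], dtype=float)
v = np.array([1.0, 1.0, 1.0])

# Direct check of the definition: A v should equal 3 v
Av = A @ v
print(Av)  # [3. 3. 3.]
is_eigenpair = np.allclose(Av, 3 * v)

# eig returns all eigenvalues; for this matrix the other two are complex
eigvals, eigvecs = np.linalg.eig(A)
has_three = np.isclose(eigvals, 3).any()
print(is_eigenpair, has_three)
```

A general square matrix can have complex eigenvalues, as this one does. Covariance matrices are symmetric, so in PCA the eigenvalues are always real.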

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA


from warnings import filterwarnings
filterwarnings("ignore")

In [4]:
veriler = pd.read_csv("../Docs/Wine.csv")
X = veriler.iloc[:, 0:13].values
y = veriler.iloc[:, 13].values

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature scaling (fit on the training set only, then apply to the test set)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X_test

array([[ 9.38847070e-01, -6.32166068e-01, -4.35010303e-01,
        -9.19695615e-01,  1.26324041e+00,  5.59998633e-01,
         9.77754158e-01, -1.20637533e+00,  2.36680192e-02,
         3.39284695e-01, -1.45574805e-01,  8.52295413e-01,
         1.04940526e+00],
       [-2.42263344e-01,  2.67579163e-01,  4.20859365e-01,
         7.12764102e-01,  8.40672358e-01, -1.27747161e+00,
        -6.05828120e-01, -9.70634096e-01, -5.87397203e-01,
         2.42611713e+00, -2.06608025e+00, -1.55017035e+00,
        -8.66598582e-01],
       [-7.64438475e-01, -1.11802849e+00, -7.69915825e-01,
        -1.61767889e-01, -9.20027861e-01,  2.03653722e+00,
         1.18341419e+00, -1.36353615e+00,  4.48018868e-01,
        -2.50930538e-01,  1.16386073e+00,  3.94021597e-01,
        -1.06480588e+00],
       [ 7.15057728e-01, -5.78181354e-01,  3.46435916e-01,
         2.75498106e-01,  1.12238439e+00,  1.15061407e+00,
         8.54358136e-01, -1.28495574e+00,  1.43251284e+00,
         5.07917619e-01,  1.16312302e

In [26]:
# PCA
pca = PCA(n_components = 2)

X_train2 = pca.fit_transform(X_train)
X_test2 = pca.transform(X_test)

# Logistic regression on the original features (before PCA)
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train,y_train)

# Logistic regression on the PCA-transformed features
classifier2 = LogisticRegression(random_state=0)
classifier2.fit(X_train2,y_train)

# Predictions
y_pred = classifier.predict(X_test)
y_pred2 = classifier2.predict(X_test2)
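
One way to judge whether two components are enough is to look at how much variance they retain. The sketch below uses scikit-learn's bundled `load_wine` dataset as a stand-in, since it is only an assumption that it matches `Wine.csv`. It also scales the full dataset for brevity instead of splitting it first.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 13 numeric features, 3 classes
X_wine, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X_wine)

pca_demo = PCA(n_components=2).fit(X_scaled)

# Fraction of total variance captured by each retained component
ratio = pca_demo.explained_variance_ratio_
print(ratio, ratio.sum())
```

If the summed ratio is low, a larger `n_components` may be worth trying. Passing a float between 0 and 1 as `n_components` asks PCA to keep enough components to reach that share of the variance.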

In [27]:
?pca

Type:        PCA
String form: PCA(n_components=2)
File:        c:\users\tolga\appdata\local\programs\python\python310\lib\site-packages\sklearn\decomposition\_pca.py
Docstring:
Principal component analysis (PCA).

Linear dimensionality reduction using Singular Value Decomposition of the
data to project it to a lower dimensional space. The input data is centered
but not scaled for each feature before applying the SVD.

It uses the LAPACK implementation of the full SVD or a randomized truncated
SVD by the method of Halko et al. 2009, depending on the shape of the input
data and the number of components to extract.

It can also use the scipy.sparse.linalg ARPACK implementation of the
truncated SVD.

Notice that this class does not support sparse input. See
:class:`TruncatedSVD` for an alternative with sparse data.

Read more in the :ref:`User Guide <PCA>`.

Parameters
----------
n_components : int, float or 'mle', default=None
    Number of co

In [28]:
# Actual labels vs. predictions without PCA
print("Actual / without PCA")
cm = confusion_matrix(y_test,y_pred)
print(cm, "\n")

# Actual labels vs. predictions with PCA
print("Actual / with PCA")
cm2 = confusion_matrix(y_test,y_pred2)
print(cm2, "\n")

# Predictions without PCA (rows) vs. predictions with PCA (columns)
print("Without PCA / with PCA")
cm3 = confusion_matrix(y_pred,y_pred2)
print(cm3)

Actual / without PCA
[[14  0  0]
 [ 0 16  0]
 [ 0  0  6]] 

Actual / with PCA
[[14  0  0]
 [ 1 15  0]
 [ 0  0  6]] 

Without PCA / with PCA
[[14  0  0]
 [ 1 15  0]
 [ 0  0  6]]
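
The first two confusion matrices can be reduced to a single accuracy figure: correct predictions lie on the diagonal, so accuracy is the trace divided by the total count. The helper name `accuracy_from_cm` is introduced only for this sketch, and the matrices are copied from the output above.

```python
import numpy as np

# Confusion matrices copied from the printed output
cm_no_pca = np.array([[14, 0, 0],
                      [0, 16, 0],
                      [0, 0, 6]])
cm_pca = np.array([[14, 0, 0],
                   [1, 15, 0],
                   [0, 0, 6]])

def accuracy_from_cm(cm):
    """Share of samples on the diagonal, i.e. correctly classified."""
    return np.trace(cm) / cm.sum()

acc_no_pca = accuracy_from_cm(cm_no_pca)
acc_pca = accuracy_from_cm(cm_pca)
print(acc_no_pca)         # 1.0
print(round(acc_pca, 4))  # 0.9722
```

Reducing 13 features to 2 costs one misclassified sample out of the 36 in the test set.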


# LDA (Linear Discriminant Analysis)

* A dimensionality transformation/reduction algorithm similar to PCA
* Unlike PCA, it takes the separation between classes into account and tries to maximize it
* In this respect PCA is unsupervised, whereas LDA is supervised
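
To make "maximizing class separation" concrete, the two-class case can be sketched in NumPy using Fisher's discriminant, where the projection direction is proportional to the inverse within-class scatter times the difference of the class means. This is an illustrative sketch on synthetic data, not scikit-learn's implementation, and all names in it are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes with the same spread but different means
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(50, 2))

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter: each class measured around its own mean, then summed
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Fisher direction: solve S_w w = (m1 - m0), then normalize to unit length
w = np.linalg.solve(S_w, m1 - m0)
w /= np.linalg.norm(w)

# Project both classes onto the one-dimensional discriminant axis
z0, z1 = X0 @ w, X1 @ w
print(z0.mean(), z1.mean())
```

Unlike PCA, the class labels determine `w` here: the direction is chosen to push the projected class means apart relative to the spread inside each class.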

In [29]:
# LDA
lda = LDA(n_components = 2)

X_train_lda = lda.fit_transform(X_train,y_train)
X_test_lda = lda.transform(X_test)

# Logistic regression on the LDA-transformed features
classifier_lda = LogisticRegression(random_state=0)
classifier_lda.fit(X_train_lda,y_train)

# Predict on the LDA-transformed test data
y_pred_lda = classifier_lda.predict(X_test_lda)

# Predictions on the original features (rows) vs. predictions after LDA (columns)
print('LDA and Original')
cm4 = confusion_matrix(y_pred,y_pred_lda)
print(cm4)

LDA and Original
[[14  0  0]
 [ 0 16  0]
 [ 0  0  6]]


In [30]:
?lda

Type:        LinearDiscriminantAnalysis
String form: LinearDiscriminantAnalysis(n_components=2)
File:        c:\users\tolga\appdata\local\programs\python\python310\lib\site-packages\sklearn\discriminant_analysis.py
Docstring:
Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class
conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes
share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input
by projecting it to the most discriminative directions, using the
`transform` method.

.. versionadded:: 0.17
   *LinearDiscriminantAnalysis*.

Read more in the :ref:`User Guide <lda_qda>`.

Parameters
----------
solver : {'svd', 'lsqr', 'eigen'}, default='svd'
    Solver to use, possible values:
      - 'svd': Singular value decomposition (default).
        Does not compute th