## Section 0: Setup and Imports

This section imports the PyTorch libraries and checks whether training will run on the GPU or the CPU.

In [None]:
# Section 0: Setup and Required Imports
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset

# Check PyTorch version and GPU availability
print(f"PyTorch Version: {torch.__version__}")
print(f"GPU Available: {torch.cuda.is_available()}")
print(f"GPU Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU'}")

# Set device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Current Device: {device}")
print(f"CPU threads used by PyTorch: {torch.get_num_threads()}")

Question 1: Which libraries do we need to import to build a neural network?
Answer:
torch: PyTorch's core module (tensors, autograd)
torch.nn: layers, activation functions, and loss functions
torch.optim: optimizers, which update the weights
torch.utils.data: DataLoader and TensorDataset for batching data
numpy: numerical operations on arrays
matplotlib: plotting

Question 2: Will we train on the GPU or on the CPU?
Answer:
First check whether a GPU is available with torch.cuda.is_available()
Then write device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Move the model and the data with .to(device)
Training is usually much faster on a GPU, especially for larger models and batches
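A minimal sketch of this pattern, using a hypothetical toy model (`nn.Linear(5, 1)`) and a random batch chosen only for illustration. The model and its input must end up on the same device before the forward pass:

```python
import torch
import torch.nn as nn

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy model and batch, used only to illustrate .to(device)
model = nn.Linear(5, 1).to(device)   # moves the weight and bias tensors
x = torch.randn(8, 5).to(device)     # moves the input batch

out = model(x)  # works because the model and its input live on the same device
print(out.shape, out.device)
```

When checking placement, compare `.device.type`: `torch.device("cuda")` and a tensor's `cuda:0` device do not compare equal.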

## Section 1: PyTorch Basics and Building a Simple Model

Question 1: How do we create random training data?
Answer:
torch.randn() generates random numbers drawn from a standard normal distribution
X = torch.randn(100, 5) gives 100 samples with 5 features each
y = torch.randn(100, 1) is the target variable
Then send them to the GPU or CPU with .to(device), or pass device=device when creating them
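A short sketch of the points above, assuming the same shapes (100 samples, 5 features, 1 target). Reseeding with `torch.manual_seed` reproduces the same random draws, which is why the notebook seeds the generator before creating data:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Seeding the global generator makes torch.randn repeatable
torch.manual_seed(42)
X_a = torch.randn(100, 5)
torch.manual_seed(42)
X_b = torch.randn(100, 5)
print(torch.equal(X_a, X_b))  # True: same seed, same numbers

# Creating tensors directly on the target device avoids an extra CPU-to-GPU copy
X = torch.randn(100, 5, device=device)  # 100 samples, 5 features
y = torch.randn(100, 1, device=device)  # one target value per sample
print(X.shape, y.shape)
```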

Question 2: How do we write a model in PyTorch?
Answer:
Subclass nn.Module
In __init__, call super().__init__() and define the layers (fc1, relu, fc2, etc.)
Write the data flow inside forward():
 x comes in, passes through layer 1, ReLU is applied, it passes through layer 2, and comes out
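A minimal sketch of such a module, with a hypothetical class name `TwoLayerNet` and the 5 → 10 → 1 sizes used in Section 1. Counting parameters by hand is a quick sanity check: each `nn.Linear(in, out)` has `in * out` weights plus `out` biases.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    """Hypothetical example: Linear -> ReLU -> Linear, as described above."""
    def __init__(self, input_size=5, hidden_size=10, output_size=1):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x -> layer 1 -> ReLU -> layer 2 -> output
        return self.fc2(self.relu(self.fc1(x)))

net = TwoLayerNet()
n_params = sum(p.numel() for p in net.parameters())
# fc1: 5*10 weights + 10 biases = 60; fc2: 10*1 weights + 1 bias = 11
print(n_params)                      # 71
print(net(torch.randn(3, 5)).shape)  # torch.Size([3, 1])
```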

Question 3: How do we write the training loop?
Answer:
Loop over epochs:
1. Forward: prediction = model(input)
2. Compute the loss: loss = loss_function(prediction, target)
3. Reset the gradients: optimizer.zero_grad()
4. Backward: loss.backward()
5. Update: optimizer.step()
Repeat this for the chosen number of epochs
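Step 3 is needed because PyTorch accumulates gradients in `.grad` across `backward()` calls instead of overwriting them. A tiny sketch with a single hypothetical weight `w` and the toy loss L = (3w)^2, whose derivative is 18w, makes this visible:

```python
import torch

# A single trainable weight and a toy loss L = (3w)^2 = 9w^2, so dL/dw = 18w
w = torch.tensor(1.0, requires_grad=True)

loss = (3 * w) ** 2
loss.backward()
first = w.grad.item()        # 18.0

# Backward again WITHOUT clearing: the new gradient is added to the old one
loss = (3 * w) ** 2
loss.backward()
accumulated = w.grad.item()  # 36.0

# optimizer.zero_grad() prevents this; here the gradient is cleared by hand
w.grad.zero_()
loss = (3 * w) ** 2
loss.backward()
fresh = w.grad.item()        # 18.0
print(first, accumulated, fresh)
```

In recent PyTorch versions `optimizer.zero_grad()` sets gradients to `None` by default rather than filling them with zeros; either way the next `backward()` starts fresh.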

In [None]:
# Section 1: PyTorch Fundamentals - Create Random Data and Simple Model

# 1. Create random data and target variables
np.random.seed(42)
torch.manual_seed(42)

# Generate random input data: 100 samples, 5 features
X = torch.randn(100, 5, device=device)  # Input features
y = torch.randn(100, 1, device=device)  # Target variable

print(f"Input shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"X device: {X.device}")
print(f"y device: {y.device}")

In [None]:
# 2. Define a simple PyTorch nn.Module
class SimpleNN(nn.Module):
    def __init__(self, input_size=5, hidden_size=10, output_size=1):
        super(SimpleNN, self).__init__()
        # Layer 1: Input to Hidden
        self.fc1 = nn.Linear(input_size, hidden_size)
        # Activation function
        self.relu = nn.ReLU()
        # Layer 2: Hidden to Output
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        """Forward pass through the network"""
        x = self.fc1(x)           # Apply first linear layer
        x = self.relu(x)          # Apply activation function
        x = self.fc2(x)           # Apply second linear layer
        return x

# Initialize the model
model = SimpleNN(input_size=5, hidden_size=10, output_size=1).to(device)
print("Model architecture:")
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")

In [None]:
# 3. Training loop - Forward, Loss Calculation, and Optimizer Steps
# Define loss function and optimizer
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training parameters
num_epochs = 50
losses = []

print("Training the model...")
for epoch in range(num_epochs):
    # Forward pass
    y_pred = model(X)
    
    # Loss calculation
    loss = loss_fn(y_pred, y)
    
    # Backward pass (gradient computation)
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Compute gradients
    
    # Optimizer step
    optimizer.step()       # Update weights
    
    losses.append(loss.item())
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.6f}")

print(f"\nFinal Loss: {losses[-1]:.6f}")

# Visualize training loss
plt.figure(figsize=(10, 5))
plt.plot(losses, linewidth=2)
plt.xlabel("Epoch")
plt.ylabel("Loss (MSE)")
plt.title("Training Loss Over Epochs")
plt.grid(True, alpha=0.3)
plt.show()

---

## Section 2: Building a Neural Network to Solve the XOR Problem

In [None]:
# Section 2: XOR Problem Solution with Neural Network

# XOR Problem: A classic problem where linear models fail
# Input: 2 binary features, Output: 1 binary output
# XOR Truth Table:
# 0,0 -> 0
# 0,1 -> 1
# 1,0 -> 1
# 1,1 -> 0

# Create XOR dataset
X_xor = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32, device=device)
y_xor = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32, device=device)

print("XOR Dataset:")
print(f"Inputs:\n{X_xor}")
print(f"Targets:\n{y_xor}")

# Define XOR Neural Network
class XOR_NN(nn.Module):
    def __init__(self):
        super(XOR_NN, self).__init__()
        self.fc1 = nn.Linear(2, 4)      # 2 input features → 4 hidden neurons
        self.relu = nn.ReLU()           # Hidden layer activation
        self.fc2 = nn.Linear(4, 1)      # 4 hidden neurons → 1 output neuron
        self.sigmoid = nn.Sigmoid()     # Output activation (0-1 range)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

# Initialize XOR model
xor_model = XOR_NN().to(device)
xor_optimizer = optim.Adam(xor_model.parameters(), lr=0.01)
xor_loss_fn = nn.BCELoss()  # Binary Cross Entropy Loss

# Train XOR model
xor_losses = []
xor_epochs = 1000

print("\nTraining XOR model...")
for epoch in range(xor_epochs):
    # Forward pass
    y_pred = xor_model(X_xor)
    
    # Loss and backward
    loss = xor_loss_fn(y_pred, y_xor)
    xor_optimizer.zero_grad()
    loss.backward()
    xor_optimizer.step()
    
    xor_losses.append(loss.item())
    
    if (epoch + 1) % 200 == 0:
        print(f"Epoch [{epoch+1}/{xor_epochs}], Loss: {loss.item():.6f}")

# Test the model
print("\nXOR Model Predictions:")
with torch.no_grad():
    predictions = xor_model(X_xor)
    for i in range(len(X_xor)):
        print(f"Input: {X_xor[i].cpu().numpy()} -> Predicted: {predictions[i].item():.4f}, Target: {y_xor[i].item()}")

# Visualize XOR training loss
plt.figure(figsize=(10, 5))
plt.plot(xor_losses, linewidth=2, color='green')
plt.xlabel("Epoch")
plt.ylabel("Loss (BCE)")
plt.title("XOR Problem - Training Loss Over Epochs")
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Understanding Gradients and Mathematical Operations in Neural Networks

# Example: Manual gradient calculation
print("\n" + "="*60)
print("Understanding Gradients and Backpropagation")
print("="*60)

# Create simple tensors with requires_grad=True
x = torch.tensor([[2.0, 3.0]], requires_grad=True, device=device)
print(f"\nInput x: {x}")
print(f"x.requires_grad: {x.requires_grad}")

# Forward pass: y = x^2
y = (x ** 2).sum()
print(f"\nFunction: y = sum(x^2)")
print(f"y value: {y.item()}")

# Backward pass: compute gradients
y.backward()
print(f"\nGradients dy/dx: {x.grad}")
print(f"Expected gradients: [2*2, 2*3] = [4, 6]")

# Gradient explanation
print("\nüìö Gradient (T√ºrev) Nedir?")
print("- Gradient, fonksiyonun her parametreye g√∂re kƒ±smi t√ºrevi")
print("- Modelin hatasƒ±nƒ± azaltmak i√ßin parametreleri hangi y√∂nde g√ºncelleyeceƒüini g√∂sterir")
print("- Negatif gradient y√∂n√ºne (gradient descent) hareket ederek loss azaltƒ±lƒ±r")

In [None]:
# Understanding Overfitting, Epochs, and Data Retention

print("\n" + "="*60)
print("Overfitting, Epochs, and Data Retention Concepts")
print("="*60)

# Create a simple regression problem: y = 2x + 3 (with some noise)
torch.manual_seed(42)  # the noise below is drawn with torch.randn, so seed PyTorch's generator
X_train = torch.linspace(0, 10, 20, device=device).reshape(-1, 1)
y_train = 2 * X_train + 3 + torch.randn(20, 1, device=device) * 0.5
X_test = torch.linspace(0.5, 9.5, 10, device=device).reshape(-1, 1)
y_test = 2 * X_test + 3 + torch.randn(10, 1, device=device) * 0.5

# Create models with different complexities
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1, 1)
    
    def forward(self, x):
        return self.fc(x)

class ComplexModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 1)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

# Train both models
def train_model(model, X_train, y_train, X_test, y_test, epochs=200, lr=0.01):
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    train_losses = []
    test_losses = []
    
    for epoch in range(epochs):
        # Training
        model.train()
        y_pred_train = model(X_train)
        loss_train = loss_fn(y_pred_train, y_train)
        optimizer.zero_grad()
        loss_train.backward()
        optimizer.step()
        train_losses.append(loss_train.item())
        
        # Validation
        model.eval()
        with torch.no_grad():
            y_pred_test = model(X_test)
            loss_test = loss_fn(y_pred_test, y_test)
            test_losses.append(loss_test.item())
    
    return train_losses, test_losses

simple_model = SimpleModel().to(device)
complex_model = ComplexModel().to(device)

print("\nTraining simple model...")
simple_train_loss, simple_test_loss = train_model(simple_model, X_train, y_train, X_test, y_test, epochs=200)

print("Training complex model...")
complex_train_loss, complex_test_loss = train_model(complex_model, X_train, y_train, X_test, y_test, epochs=200)

# Visualize overfitting
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Simple model
axes[0].plot(simple_train_loss, label='Training Loss', linewidth=2)
axes[0].plot(simple_test_loss, label='Test Loss', linewidth=2)
axes[0].set_title('Simple Model (Linear, Matches y = 2x + 3)')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Complex model
axes[1].plot(complex_train_loss, label='Training Loss', linewidth=2)
axes[1].plot(complex_test_loss, label='Test Loss', linewidth=2)
axes[1].set_title('Complex Model (High Capacity, Prone to Overfitting)')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nüìö Overfitting Kavramƒ±:")
print("- Model eƒüitim verisine √ßok iyi uyarlanƒ±r, test verisinde ba≈üarƒ±sƒ±z olur")
print("- Training loss d√º≈üer, test loss y√ºkselir")
print("- Modeli engelleme: Early stopping, dropout, regularization")
print("\nüìö Epoch Kavramƒ±:")
print("- Epoch = t√ºm eƒüitim verisinin bir kez model tarafƒ±ndan i≈ülenmesi")
print("- Daha fazla epoch = daha fazla √∂ƒürenme (limit var: overfitting)")
print("\nüìö Veri Tutulmasƒ±:")
print("- Training set: Modeli eƒüitmek i√ßin")
print("- Validation set: Eƒüitim sƒ±rasƒ±nda performans kontrol√º")
print("- Test set: Final model performansƒ±nƒ± deƒüerlendirmek")

Question 1: What is the XOR problem, and why does it need a neural network?
Answer:
XOR is a classic problem that a single linear layer cannot solve, because its classes are not linearly separable
Adding a hidden layer lets the network represent non-linear decision boundaries
Input(2) → Hidden(4 neurons) → Output(1)
With ReLU in the hidden layer and Sigmoid at the output, the network can learn XOR
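To see why the hidden layer matters, here is a sketch that trains a purely linear classifier (no hidden layer) on the four XOR points. The optimizer settings are arbitrary assumptions. Because the best linear fit on XOR predicts 0.5 for every input, the binary cross-entropy cannot fall below ln 2 ≈ 0.693 no matter how long it trains:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A purely linear classifier: its decision boundary is a single straight line
linear = nn.Linear(2, 1)
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
opt = torch.optim.Adam(linear.parameters(), lr=0.05)

for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(linear(X), y)
    loss.backward()
    opt.step()

final_loss = loss.item()
with torch.no_grad():
    preds = (linear(X) > 0).float()  # logit > 0 means predicted class 1
print(f"Linear model loss: {final_loss:.4f}")
print(preds.squeeze().tolist(), "vs targets", y.squeeze().tolist())
```

No line can put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so at least one of the four predictions is always wrong.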

Question 2: How do the mathematical operations in the network work?
Answer:
Each layer performs a matrix operation: z = W·x + b
Then an activation is applied, for example ReLU(z)
The backward pass uses the chain rule
Gradients flow from each layer back to the previous one
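A sketch of the matrix operation a single layer performs, with arbitrary sizes (3 inputs, 2 outputs, batch of 4). PyTorch stores a batch as rows, so `nn.Linear` computes z = x·Wᵀ + b with W of shape (out_features, in_features); this is the row-vector form of z = W·x + b:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)   # weight shape (2, 3), bias shape (2,)
x = torch.randn(4, 3)     # batch of 4 samples, 3 features each

# nn.Linear computes z = x @ W^T + b for every row of the batch
z_manual = x @ layer.weight.T + layer.bias
z_layer = layer(x)

# The activation is then applied element-wise, e.g. ReLU(z)
a = torch.relu(z_layer)

print(torch.allclose(z_manual, z_layer, atol=1e-6))
print(a)
```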

Question 3: What exactly is a gradient?
Answer:
Gradient = the derivative of the loss with respect to the parameters
It points in the direction in which the loss increases fastest, so stepping the opposite way decreases the loss
Weights are updated as w_new = w_old - learning_rate × gradient
Without this update the network could not learn anything
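The update rule can be checked against PyTorch's own optimizer. This sketch, on hypothetical random data with a plain MSE loss, applies w_new = w_old - learning_rate × gradient by hand and compares it with one `torch.optim.SGD` step (no momentum or weight decay):

```python
import torch

torch.manual_seed(0)
lr = 0.1
x = torch.randn(5, 2)   # hypothetical inputs
y = torch.randn(5, 1)   # hypothetical targets

# Two identical copies of the same starting weights
w_manual = torch.randn(2, 1, requires_grad=True)
w_sgd = w_manual.detach().clone().requires_grad_(True)
w_start = w_manual.detach().clone()

def mse(w):
    return ((x @ w - y) ** 2).mean()

# Manual rule: w_new = w_old - learning_rate * gradient
mse(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step performed by torch.optim.SGD
opt = torch.optim.SGD([w_sgd], lr=lr)
opt.zero_grad()
mse(w_sgd).backward()
opt.step()

print(torch.allclose(w_manual, w_sgd, atol=1e-6))
```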

Question 4: What are overfitting, epochs, and the data split?
Answer:
Overfitting: the network has adapted too closely to the training data and does not fit new data
Epoch: one full pass over the entire training set
Data split: for example 70% training, 30% test
Also monitor a validation set to check whether the model is overfitting
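A sketch of a three-way split with `torch.utils.data.random_split`, assuming a hypothetical 100-sample dataset and a 70/15/15 ratio (the 70/30 figure above is only a rough example; here part of the held-out data becomes a validation set). Seeding the generator keeps the split repeatable:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 100 random samples with 5 features and 1 target each
X = torch.randn(100, 5)
y = torch.randn(100, 1)
dataset = TensorDataset(X, y)

# 70% train, 15% validation, 15% test; the seeded generator fixes the shuffle
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```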

---

## Section 3: Mathematical Definitions

In [None]:
# Section 3: Mathematical Definitions

print("="*70)
print("MATHEMATICAL FOUNDATIONS OF NEURAL NETWORKS")
print("="*70)

# 1. ACTIVATION FUNCTIONS
print("\n1️⃣ ACTIVATION FUNCTIONS")
print("-" * 70)

x_range = torch.linspace(-5, 5, 100, device=device)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# ReLU: f(x) = max(0, x)
relu = nn.ReLU()
y_relu = relu(x_range)
axes[0, 0].plot(x_range.cpu().numpy(), y_relu.cpu().numpy(), linewidth=2, color='red')
axes[0, 0].set_title('ReLU: f(x) = max(0, x)')
axes[0, 0].grid(True, alpha=0.3)
print("ReLU (Rectified Linear Unit):")
print("  Formula: f(x) = max(0, x)")
print("  Advantage: Simple, fast, reduces vanishing gradient problem")
print("  Use: Hidden layers, most common")

# Sigmoid: f(x) = 1 / (1 + e^-x)
sigmoid = nn.Sigmoid()
y_sigmoid = sigmoid(x_range)
axes[0, 1].plot(x_range.cpu().numpy(), y_sigmoid.cpu().numpy(), linewidth=2, color='blue')
axes[0, 1].set_title(r'Sigmoid: $f(x) = 1/(1+e^{-x})$')
axes[0, 1].grid(True, alpha=0.3)
print("\nSigmoid:")
print("  Formula: f(x) = 1 / (1 + e^-x)")
print("  Range: (0, 1)")
print("  Use: Binary classification output layer")

# Tanh: f(x) = (e^x - e^-x) / (e^x + e^-x)
tanh = nn.Tanh()
y_tanh = tanh(x_range)
axes[0, 2].plot(x_range.cpu().numpy(), y_tanh.cpu().numpy(), linewidth=2, color='green')
axes[0, 2].set_title(r'Tanh: $f(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$')
axes[0, 2].grid(True, alpha=0.3)
print("\nTanh (Hyperbolic Tangent):")
print("  Formula: f(x) = (e^x - e^-x) / (e^x + e^-x)")
print("  Range: (-1, 1)")
print("  Use: Hidden layers; zero-centered output, often preferred over sigmoid there")

# Leaky ReLU: f(x) = x if x > 0 else 0.01*x
leaky_relu = nn.LeakyReLU(0.01)
y_leaky = leaky_relu(x_range)
axes[1, 0].plot(x_range.cpu().numpy(), y_leaky.cpu().numpy(), linewidth=2, color='orange')
axes[1, 0].set_title('Leaky ReLU: f(x) = x or 0.01x')
axes[1, 0].grid(True, alpha=0.3)
print("\nLeaky ReLU:")
print("  Formula: f(x) = x if x > 0 else 0.01*x")
print("  Advantage: Allows small negative gradients")
print("  Use: When ReLU causes dead neurons")

# ELU: Exponential Linear Unit
elu = nn.ELU()
y_elu = elu(x_range)
axes[1, 1].plot(x_range.cpu().numpy(), y_elu.cpu().numpy(), linewidth=2, color='purple')
axes[1, 1].set_title('ELU: Exponential Linear Unit')
axes[1, 1].grid(True, alpha=0.3)
print("\nELU (Exponential Linear Unit):")
print("  Formula: f(x) = x if x > 0 else α(e^x - 1)")
print("  Advantage: Smooth, with a non-zero gradient for negative inputs (no dead neurons)")

# Softmax: Used for multi-class classification
x_softmax = torch.tensor([[1.0, 2.0, 3.0]], device=device)
softmax = nn.Softmax(dim=1)
y_softmax = softmax(x_softmax)
axes[1, 2].bar(['Class 1', 'Class 2', 'Class 3'], y_softmax[0].cpu().numpy(), color=['red', 'green', 'blue'])
axes[1, 2].set_title('Softmax Output (Multi-class)')
axes[1, 2].set_ylim([0, 1])
axes[1, 2].grid(True, alpha=0.3, axis='y')
print("\nSoftmax:")
print("  Formula: σ(x_i) = e^x_i / Σ(e^x_j)")
print("  Range: (0, 1), Sum = 1")
print("  Use: Multi-class classification output layer")

plt.tight_layout()
plt.show()
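The softmax formula printed above can be checked by hand on the same logits [1, 2, 3]. This sketch subtracts the maximum logit before exponentiating, a common numerical-stability trick that leaves the result unchanged:

```python
import torch

logits = torch.tensor([[1.0, 2.0, 3.0]])

# σ(x_i) = e^{x_i} / Σ_j e^{x_j}, with the max subtracted first for stability
shifted = logits - logits.max(dim=1, keepdim=True).values
manual = torch.exp(shifted) / torch.exp(shifted).sum(dim=1, keepdim=True)
builtin = torch.softmax(logits, dim=1)

print(manual)        # approximately [[0.0900, 0.2447, 0.6652]]
print(manual.sum())  # 1, up to floating-point rounding
```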

In [None]:
# 2. LOSS FUNCTIONS
print("\n2️⃣ LOSS FUNCTIONS")
print("-" * 70)


fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# MSE Loss: (1/n) * Σ(y_true - y_pred)^2
mse_loss_fn = nn.MSELoss()
y_pred_mse = torch.randn(100, 1, device=device)
y_true_mse = torch.ones(100, 1, device=device)
mse_losses = [mse_loss_fn(y_pred_mse[i:i+1], y_true_mse[i:i+1]).item() for i in range(100)]
axes[0, 0].hist(mse_losses, bins=20, color='blue', alpha=0.7)
axes[0, 0].set_title('MSE Loss Distribution')
axes[0, 0].set_xlabel('Loss Value')
axes[0, 0].grid(True, alpha=0.3)
print("\nMean Squared Error (MSE):")
print("  Formula: MSE = (1/n) * Σ(y_true - y_pred)²")
print("  Range: [0, ∞)")
print("  Use: Regression problems")
print("  Advantage: Smooth, differentiable everywhere")

# Cross Entropy Loss: -Σ y_true * log(y_pred)
ce_loss_fn = nn.CrossEntropyLoss()
y_pred_ce = torch.randn(50, 3, device=device)
y_true_ce = torch.randint(0, 3, (50,), device=device)
ce_losses = [ce_loss_fn(y_pred_ce[i:i+1], y_true_ce[i:i+1]).item() for i in range(50)]
axes[0, 1].hist(ce_losses, bins=20, color='green', alpha=0.7)
axes[0, 1].set_title('Cross Entropy Loss Distribution')
axes[0, 1].set_xlabel('Loss Value')
axes[0, 1].grid(True, alpha=0.3)
print("\nCross Entropy Loss:")
print("  Formula: CE = -Σ y_true * log(y_pred)")
print("  Range: [0, ∞)")
print("  Use: Multi-class classification")
print("  Note: nn.CrossEntropyLoss expects raw logits (it applies log-softmax internally)")

# Binary Cross Entropy: -(y*log(p) + (1-y)*log(1-p))
bce_loss_fn = nn.BCELoss()
y_pred_bce = torch.rand(50, device=device)
y_true_bce = torch.randint(0, 2, (50,), dtype=torch.float32, device=device)
bce_losses = [bce_loss_fn(y_pred_bce[i:i+1], y_true_bce[i:i+1]).item() for i in range(50)]
axes[1, 0].hist(bce_losses, bins=20, color='red', alpha=0.7)
axes[1, 0].set_title('Binary Cross Entropy Loss Distribution')
axes[1, 0].set_xlabel('Loss Value')
axes[1, 0].grid(True, alpha=0.3)
print("\nBinary Cross Entropy (BCE):")
print("  Formula: BCE = -(y*log(p) + (1-y)*log(1-p))")
print("  Range: [0, ∞)")
print("  Use: Binary classification")

# L1 Loss (MAE): (1/n) * Σ|y_true - y_pred|
l1_loss_fn = nn.L1Loss()
y_pred_l1 = torch.randn(50, 1, device=device)
y_true_l1 = torch.randn(50, 1, device=device)
l1_losses = [l1_loss_fn(y_pred_l1[i:i+1], y_true_l1[i:i+1]).item() for i in range(50)]
axes[1, 1].hist(l1_losses, bins=20, color='orange', alpha=0.7)
axes[1, 1].set_title('L1 Loss (MAE) Distribution')
axes[1, 1].set_xlabel('Loss Value')
axes[1, 1].grid(True, alpha=0.3)
print("\nL1 Loss (Mean Absolute Error):")
print("  Formula: L1 = (1/n) * Σ|y_true - y_pred|")
print("  Range: [0, ∞)")
print("  Use: Regression, robust to outliers")

plt.tight_layout()
plt.show()
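A sketch that reproduces two of the formulas above by hand on small made-up inputs. It also shows that `nn.CrossEntropyLoss` takes raw logits plus integer class indices and applies log-softmax internally, so CE = -Σ y_true · log(y_pred) is evaluated with a one-hot y_true:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # class indices, not one-hot

# Built-in cross entropy on raw logits
ce_builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual CE: -Σ y_true * log(softmax(logits)), averaged over the batch
log_probs = F.log_softmax(logits, dim=1)
one_hot = F.one_hot(targets, num_classes=3).float()
ce_manual = -(one_hot * log_probs).sum(dim=1).mean()

# BCE on probabilities: -(y*log(p) + (1-y)*log(1-p)), averaged
p = torch.tensor([0.9, 0.2, 0.6])
t = torch.tensor([1.0, 0.0, 1.0])
bce_builtin = nn.BCELoss()(p, t)
bce_manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()

print(ce_builtin.item(), ce_manual.item())
print(bce_builtin.item(), bce_manual.item())
```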

In [None]:
# 3. GRADIENTS - Detailed Explanation
print("\n3️⃣ GRADIENTS AND BACKPROPAGATION")
print("-" * 70)

print("\nWhat is a Gradient?")
print("  - Gradient is the partial derivative of a loss function with respect to parameters")
print("  - It shows the direction and magnitude of steepest ascent")
print("  - ∇f = [∂f/∂w1, ∂f/∂w2, ..., ∂f/∂wn]")

# Visualize gradient descent
def create_loss_surface():
    w1 = np.linspace(-3, 3, 100)
    w2 = np.linspace(-3, 3, 100)
    W1, W2 = np.meshgrid(w1, w2)
    Z = W1**2 + W2**2  # Simple quadratic loss surface
    return W1, W2, Z

W1, W2, Z = create_loss_surface()

fig = plt.figure(figsize=(14, 5))

# 3D surface plot
ax1 = fig.add_subplot(121, projection='3d')
ax1.plot_surface(W1, W2, Z, cmap='viridis', alpha=0.8)
ax1.set_xlabel('w1')
ax1.set_ylabel('w2')
ax1.set_zlabel('Loss')
ax1.set_title(r'Loss Surface: $L = w_1^2 + w_2^2$')

# Contour plot with gradient descent path
ax2 = fig.add_subplot(122)
contour = ax2.contour(W1, W2, Z, levels=20, cmap='viridis')
ax2.clabel(contour, inline=True, fontsize=8)

# Simulate gradient descent
w = np.array([-2.5, 2.5])
learning_rate = 0.1
path = [w.copy()]

for _ in range(50):
    # Compute gradient: ∇L = [2w1, 2w2]
    gradient = 2 * w
    # Update weights: w = w - lr * ∇L
    w = w - learning_rate * gradient
    path.append(w.copy())
    if np.linalg.norm(gradient) < 0.01:
        break

path = np.array(path)
ax2.plot(path[:, 0], path[:, 1], 'ro-', linewidth=2, markersize=4, label='Gradient Descent Path')
ax2.plot(0, 0, 'g*', markersize=20, label='Optimum')
ax2.set_xlabel('w1')
ax2.set_ylabel('w2')
ax2.set_title('Gradient Descent Optimization')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nGradient Descent Update Rule:")
print("  w_new = w_old - learning_rate × ∇L")
print("  - learning_rate: step size (0.001-0.01 is common in practice; this toy demo uses 0.1)")
print("  - ∇L: direction of steepest ascent (we move in the opposite direction)")
print("  - This iterative process finds optimal weights that minimize loss")

print("\nWhy Gradients are Important:")
print("  1. Without gradients, we don't know which direction to update weights")
print("  2. Backpropagation efficiently computes gradients via chain rule")
print("  3. Enables learning in deep networks (deep learning)")
print("  4. Chain rule: ∂L/∂w = (∂L/∂a) × (∂a/∂z) × (∂z/∂w)")

print("\nGradient Issues:")
print("  - Vanishing Gradient: Gradients become too small, learning stops")
print("  - Exploding Gradient: Gradients become too large, training unstable")
print("  - Dead ReLU: a neuron whose ReLU always outputs 0 passes no gradient → it stops learning")
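The chain rule printed above can be verified for a single hypothetical neuron, z = w·x + b, a = sigmoid(z), L = (a - y)², with arbitrary values. Autograd's result should match the product of the three local derivatives:

```python
import torch

# One neuron with arbitrary example values
x, y = torch.tensor(2.0), torch.tensor(1.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(-0.3, requires_grad=True)

z = w * x + b
a = torch.sigmoid(z)
L = (a - y) ** 2
L.backward()  # autograd applies the chain rule automatically

# The same gradients by hand: ∂L/∂w = (∂L/∂a) * (∂a/∂z) * (∂z/∂w)
with torch.no_grad():
    dL_da = 2 * (a - y)
    da_dz = a * (1 - a)          # derivative of the sigmoid
    manual_grad_w = dL_da * da_dz * x
    manual_grad_b = dL_da * da_dz * 1.0  # ∂z/∂b = 1

print(w.grad.item(), manual_grad_w.item())
print(b.grad.item(), manual_grad_b.item())
```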


---

## Section 4: Data Flow and Approaches

In [None]:
# Section 4: Data Flow in Neural Networks and Problem-Solving Approaches

print("="*70)
print("DATA FLOW AND PROBLEM-SOLVING APPROACHES")
print("="*70)

# 1. Data Flow Diagram
print("\n1️⃣ DATA FLOW IN NEURAL NETWORKS")
print("-" * 70)

print("""
┌──────────────────────────────────────────────────────────────┐
│                    DATA FLOW THROUGH NN                      │
└──────────────────────────────────────────────────────────────┘

    Input Data (X)
         ↓
    [Input Layer: 5 neurons]
         ↓ (matrix multiply: X × W1 + b1)
    [z1 = X·W1 + b1]
         ↓ (activation function)
    [a1 = ReLU(z1)]  ← Hidden Layer: 10 neurons
         ↓ (matrix multiply: a1 × W2 + b2)
    [z2 = a1·W2 + b2]
         ↓ (activation function)
    [Output = Sigmoid(z2)]  ← Output Layer: 1 neuron
         ↓
    Loss Calculation: L = BCE_Loss(Output, y_true)
         ↓
    [BACKWARD PASS - Gradient Computation]
         ↓
    ∂L/∂W2, ∂L/∂b2 (output layer gradients)
         ↓
    ∂L/∂W1, ∂L/∂b1 (hidden layer gradients)
         ↓
    Weight Update: W = W - lr × ∂L/∂W
""")

# Visualize data flow with tensor shapes
print("\n2️⃣ TENSOR SHAPES THROUGH FORWARD PASS")
print("-" * 70)

class DataFlowModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(5, 10)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(10, 1)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        print(f"Input shape: {x.shape}")
        
        x = self.fc1(x)
        print(f"After fc1 (5→10): {x.shape}")
        
        x = self.relu(x)
        print(f"After ReLU: {x.shape}")
        
        x = self.fc2(x)
        print(f"After fc2 (10→1): {x.shape}")
        
        x = self.sigmoid(x)
        print(f"After Sigmoid: {x.shape}")
        
        return x

model_flow = DataFlowModel().to(device)
X_demo = torch.randn(32, 5, device=device)  # Batch size 32, 5 features
print("\nForward Pass Shape Propagation:")
with torch.no_grad():
    _ = model_flow(X_demo)

In [None]:
# 3. Problem-Solving Framework
print("\n3️⃣ GENERAL FRAMEWORK FOR NN MODELING")
print("-" * 70)

print("""
STEP 1: PROBLEM ANALYSIS
├─ Identify problem type: Classification or Regression?
├─ Multi-class or Binary classification?
├─ Output activation: Sigmoid (binary), Softmax (multi-class), none/linear (regression)?
└─ Loss function: MSE/BCE/CrossEntropy?

STEP 2: DATA PREPARATION
├─ Load and explore data
├─ Handle missing values
├─ Normalize/Standardize features (e.g. scale to [0, 1] or to zero mean, unit variance)
├─ Split into train/val/test sets
└─ Create data loaders for batching

STEP 3: MODEL ARCHITECTURE
├─ Input layer size = number of features
├─ Hidden layers: Start simple, increase if needed
├─ Number of hidden neurons: 32-512 typically
├─ Output layer size = number of classes/targets
├─ Choose activations: ReLU for hidden, Sigmoid/Softmax for output
│  (with nn.CrossEntropyLoss, output raw logits and skip the Softmax)
└─ Add regularization: Dropout, L1/L2 if overfitting

STEP 4: TRAINING SETUP
├─ Choose optimizer: Adam (recommended for beginners)
├─ Set learning rate: 0.001-0.01
├─ Define loss function
├─ Set batch size: 32-128
└─ Set number of epochs: 50-500

STEP 5: TRAINING LOOP
├─ For each epoch:
│  ├─ For each batch:
│  │  ├─ Forward pass: y_pred = model(X)
│  │  ├─ Compute loss: loss = loss_fn(y_pred, y)
│  │  ├─ Reset gradients: optimizer.zero_grad()
│  │  ├─ Backward: loss.backward()
│  │  └─ Update: optimizer.step()
│  ├─ Validate on validation set
│  └─ Monitor: training loss, validation loss, metrics
└─ Early stopping if validation loss stops improving

STEP 6: EVALUATION
├─ Test on test set (never seen during training)
├─ Compute metrics: Accuracy, Precision, Recall, F1
├─ Create confusion matrix for classification
└─ Analyze failure cases

STEP 7: OPTIMIZATION
├─ If underfitting: Add complexity
├─ If overfitting: Add regularization
├─ Tune hyperparameters: lr, batch_size, hidden_size
└─ Try different architectures
""")

# Example: Actual problem-solving
print("\n4️⃣ PRACTICAL EXAMPLE: IRIS CLASSIFICATION")
print("-" * 70)

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()

# Split the NumPy arrays directly; no need to round-trip through tensors
X_train_np, X_test_np, y_train_np, y_test_np = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Normalize: fit the scaler on the training set only, then apply it to both splits
scaler = StandardScaler()
X_train = torch.from_numpy(scaler.fit_transform(X_train_np)).float().to(device)
X_test = torch.from_numpy(scaler.transform(X_test_np)).float().to(device)
y_train = torch.from_numpy(y_train_np).long().to(device)
y_test = torch.from_numpy(y_test_np).long().to(device)

print(f"Dataset: Iris Classification")
print(f"Train set: {X_train.shape}, Test set: {X_test.shape}")
print(f"Number of classes: 3 (Setosa, Versicolor, Virginica)")

# Build model
class IrisClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 16)       # 4 features → 16 hidden
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(16, 8)       # 16 → 8 hidden
        self.fc3 = nn.Linear(8, 3)        # 8 → 3 classes
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

iris_model = IrisClassifier().to(device)
iris_optimizer = optim.Adam(iris_model.parameters(), lr=0.01)
iris_loss_fn = nn.CrossEntropyLoss()

# Train
print("\nTraining Iris Classifier...")
for epoch in range(100):
    iris_model.train()
    y_pred = iris_model(X_train)
    loss = iris_loss_fn(y_pred, y_train)
    iris_optimizer.zero_grad()
    loss.backward()
    iris_optimizer.step()
    
    if (epoch + 1) % 25 == 0:
        iris_model.eval()
        with torch.no_grad():
            y_pred_test = iris_model(X_test)
            test_loss = iris_loss_fn(y_pred_test, y_test)
            _, predictions = torch.max(y_pred_test, 1)
            accuracy = (predictions == y_test).float().mean()
        print(f"Epoch {epoch+1}: Train Loss: {loss.item():.4f}, Test Loss: {test_loss.item():.4f}, Accuracy: {accuracy.item():.4f}")

# Final evaluation
iris_model.eval()
with torch.no_grad():
    y_pred_final = iris_model(X_test)
    predictions = y_pred_final.argmax(dim=1)  # index of the highest logit per sample
    num_correct = (predictions == y_test).sum().item()
    final_accuracy = num_correct / len(y_test)
    print(f"\nFinal Test Accuracy: {final_accuracy:.4f} ({num_correct}/{len(y_test)} correct)")
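STEP 6 of the framework mentions a confusion matrix. A minimal sketch in plain PyTorch, with a hypothetical helper `confusion_matrix` and toy labels (rows are true classes, columns are predicted classes); on the Iris split above it could be called as `confusion_matrix(y_test, predictions, 3)`:

```python
import torch

def confusion_matrix(y_true, y_pred, num_classes):
    """Hypothetical helper: rows = true class, columns = predicted class."""
    # Encode each (true, pred) pair as one index, then count occurrences
    idx = y_true * num_classes + y_pred
    counts = torch.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

# Toy labels for 3 classes (stand-ins for y_test and predictions)
y_true = torch.tensor([0, 0, 1, 1, 2, 2, 2])
y_pred = torch.tensor([0, 1, 1, 1, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [1, 0, 2]])
```

`sklearn.metrics.confusion_matrix` uses the same row/column convention if scikit-learn is preferred.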


Question 1: What is the general approach to modeling a problem with a neural network?
Answer:
1. Determine the problem type: classification or regression?
2. Prepare the data: normalization, train/val/test split
3. Design the model architecture: Input → Hidden layers → Output
4. Choose a suitable activation and loss function
5. Choose an optimizer (Adam usually works well)
6. Training loop: Forward → Loss → Backward → Optimize
7. Monitor overfitting with a validation set
8. Apply early stopping
9. Run the final evaluation on the test set
10. Tune hyperparameters if needed
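Steps 7 and 8 can be sketched with a small patience-based rule. The helper name `early_stopping_epoch`, the `patience` value, and the validation curve below are made up for illustration; in a real loop you would also keep a copy of the weights from the best epoch:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Hypothetical helper: return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on the best
    value seen so far (by more than min_delta) for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs were run

# Made-up validation curve that bottoms out at epoch 3 and then rises
val_curve = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.66]
print(early_stopping_epoch(val_curve, patience=3))  # 6
```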

Question 2: Explain intuitively how data flows through a neural network
Answer:
Input (x): the feature vector
Layer 1: z1 = W1·x + b1, a1 = σ(z1) - non-linear transformation
Layer 2: z2 = W2·a1 + b2, a2 = σ(z2) - more complex features
Layer 3 (Output): z3 = W3·a2 + b3 - final prediction
(σ here stands for a generic activation function such as ReLU or sigmoid)

Forward pass: data flows from the input to the output
Backward pass: the loss gradient propagates from the output back toward the input
Gradients are computed for every layer, and the optimizer uses them to update the weights
Early layers tend to learn low-level features; later layers learn higher-level features
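A sketch of the forward and backward passes on a hypothetical three-layer network (sizes 4 → 8 → 6 → 1 chosen arbitrarily). After a single `loss.backward()`, every layer's weights and biases hold a gradient of the same shape, which is what the optimizer then uses:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical 3-layer network mirroring z1, z2, z3 above
net = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),   # layer 1: z1 = W1·x + b1, a1 = ReLU(z1)
    nn.Linear(8, 6), nn.ReLU(),   # layer 2: z2 = W2·a1 + b2, a2 = ReLU(z2)
    nn.Linear(6, 1),              # layer 3 (output): z3 = W3·a2 + b3
)
x = torch.randn(16, 4)
y = torch.randn(16, 1)

# Forward pass: data flows input -> output
loss = nn.functional.mse_loss(net(x), y)
# Backward pass: the loss gradient flows output -> input through every layer
loss.backward()

for name, p in net.named_parameters():
    # Each weight/bias now has a gradient with the same shape as the parameter
    print(name, tuple(p.shape), tuple(p.grad.shape))
```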

---

## Section 5: Evaluation and Tasks

In [None]:
# Section 5: Complete PyTorch and NN Project - Fashion MNIST Classification

print("="*70)
print("COMPLETE PROJECT: FASHION MNIST CLASSIFICATION")
print("="*70)

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Load Fashion MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Image shape: {train_dataset[0][0].shape}")

# Class names
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Build deep neural network model
class DeepNN(nn.Module):
    def __init__(self):
        super(DeepNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.3)
        
        self.fc2 = nn.Linear(256, 128)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(0.3)
        
        self.fc3 = nn.Linear(128, 64)
        self.relu3 = nn.ReLU()
        self.dropout3 = nn.Dropout(0.2)
        
        self.fc4 = nn.Linear(64, 10)
    
    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.dropout1(x)
        
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.dropout2(x)
        
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.dropout3(x)
        
        x = self.fc4(x)
        return x

# Initialize model
model_final = DeepNN().to(device)
optimizer_final = optim.Adam(model_final.parameters(), lr=0.001)
loss_fn_final = nn.CrossEntropyLoss()

print("\nModel Architecture:")
print(model_final)
print(f"Total parameters: {sum(p.numel() for p in model_final.parameters())}")

# Training function
def train_epoch(model, train_loader, loss_fn, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    
    return total_loss / len(train_loader), correct / total

# Validation function
def validate(model, test_loader, loss_fn, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = loss_fn(outputs, labels)
            
            total_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
    return total_loss / len(test_loader), correct / total

# Train model
print("\nTraining Fashion MNIST Classifier...")
num_epochs = 15
train_losses = []
train_accs = []
val_losses = []
val_accs = []

for epoch in range(num_epochs):
    train_loss, train_acc = train_epoch(model_final, train_loader, loss_fn_final, optimizer_final, device)
    # Note: the test set doubles as the validation set in this notebook
    val_loss, val_acc = validate(model_final, test_loader, loss_fn_final, device)
    
    train_losses.append(train_loss)
    train_accs.append(train_acc)
    val_losses.append(val_loss)
    val_accs.append(val_acc)
    
    if (epoch + 1) % 3 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} | Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

print(f"\nFinal Validation Accuracy: {val_accs[-1]:.4f}")

# Visualize training curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss curves
axes[0].plot(train_losses, label='Training Loss', linewidth=2)
axes[0].plot(val_losses, label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training and Validation Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy curves
axes[1].plot(train_accs, label='Training Accuracy', linewidth=2)
axes[1].plot(val_accs, label='Validation Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Training and Validation Accuracy')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Test on sample images
print("\nSample Predictions:")
model_final.eval()
with torch.no_grad():
    images, labels = next(iter(test_loader))
    images = images[:5].to(device)
    labels = labels[:5]
    
    outputs = model_final(images)
    _, predicted = torch.max(outputs, 1)
    
    for i in range(5):
        print(f"True: {classes[labels[i]]}, Predicted: {classes[predicted[i]]}")


In [None]:
# Confusion Matrix and Detailed Metrics
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns

print("\nDetailed Evaluation Metrics:")

model_final.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        outputs = model_final(images)
        _, predicted = torch.max(outputs, 1)
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.numpy())

# Classification report
print(classification_report(all_labels, all_preds, target_names=classes))

# Confusion matrix
cm = confusion_matrix(all_labels, all_preds)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes, yticklabels=classes)
plt.title('Confusion Matrix - Fashion MNIST')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()


Question: Complete the PyTorch and NN projects.
Answer:

A deep neural network project on the Fashion MNIST dataset has been completed.

Model Architecture:
- Input layer: 784 neurons (28×28 image, flattened)
- Hidden layer 1: 256 neurons + ReLU + Dropout(0.3)
- Hidden layer 2: 128 neurons + ReLU + Dropout(0.3)
- Hidden layer 3: 64 neurons + ReLU + Dropout(0.2)
- Output layer: 10 neurons (10 classes)
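Each `nn.Linear(in, out)` layer holds `in × out` weights plus `out` biases, while ReLU and Dropout add no parameters, so the model size can be checked by hand. A minimal sketch, assuming the layer sizes listed above; the result should match the `Total parameters` value printed by the first code cell.

```python
# (in_features, out_features) for the four Linear layers listed above
layer_sizes = [(784, 256), (256, 128), (128, 64), (64, 10)]

def linear_params(n_in, n_out):
    # weight matrix (n_out x n_in) plus one bias per output neuron
    return n_in * n_out + n_out

per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)
print(per_layer)  # [200960, 32896, 8256, 650]
print(total)      # 242762
```

Most of the parameters (about 83%) sit in the first layer, because it connects every one of the 784 pixels to 256 neurons.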

Training Parameters:
- Optimizer: Adam (lr=0.001)
- Loss function: CrossEntropyLoss
- Batch size: 64
- Number of epochs: 15
- Training set: 60,000 images
- Test set: 10,000 images
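With `batch_size=64`, the `DataLoader` splits each set into `ceil(N / 64)` batches, and the last batch is smaller unless `drop_last=True` is passed. This also fixes how many times `optimizer.step()` runs. A quick sketch of the arithmetic, assuming the default `drop_last=False` used in this section:

```python
import math

train_size, test_size = 60_000, 10_000
batch_size, num_epochs = 64, 15

train_batches = math.ceil(train_size / batch_size)          # batches per training epoch
last_batch = train_size - (train_batches - 1) * batch_size  # size of the final, smaller batch
test_batches = math.ceil(test_size / batch_size)
total_steps = train_batches * num_epochs                    # one optimizer.step() per training batch

print(train_batches, last_batch)  # 938 32
print(test_batches)               # 157
print(total_steps)                # 14070
```

`train_batches` is the same value as `len(train_loader)`, which the training loops divide by when averaging the per-epoch loss.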

Applied Techniques:
- Dropout for regularization (to reduce overfitting)
- Normalization to standardize the input (pixel values scaled to [-1, 1])
- Separate training and test sets (the test set is also used for per-epoch validation)
- Learning curves to monitor training
- Confusion matrix for detailed evaluation
- Classification report with precision, recall and F1 metrics

Outputs:
- Training and validation loss plots
- Training and validation accuracy plots
- Confusion matrix visualization
- Detailed classification report
- Sample predictions shown next to the true labels
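The per-class numbers in the classification report can be read directly from the confusion matrix. For class k, precision is the diagonal entry divided by its column sum (everything predicted as k). Recall is the diagonal entry divided by its row sum (everything that truly is k). F1 is their harmonic mean. A minimal sketch on toy 3-class labels, made up for illustration and not Fashion MNIST results, cross-checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy labels for 3 classes (illustrative only)
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)  # column sums: how often each class was predicted
recall = tp / cm.sum(axis=1)     # row sums: how often each class actually occurs
f1 = 2 * precision * recall / (precision + recall)

print(cm)
print(precision.round(3), recall.round(3), f1.round(3))

# Cross-check against scikit-learn (average=None returns per-class values)
p_sk, r_sk, f_sk, _ = precision_recall_fscore_support(y_true, y_pred, average=None)
print(np.allclose(precision, p_sk), np.allclose(recall, r_sk), np.allclose(f1, f_sk))
```

The same row/column convention holds in the notebook's heatmaps, where the y-axis is the true label and the x-axis the predicted label.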

The project is complete and applies the core PyTorch fundamentals and neural network concepts covered in this notebook.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
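import seaborn as sns

# Use the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")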

# Download the Fashion MNIST dataset
# Normalize the data (scale pixel values to the range [-1, 1])
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

print("Downloading data...")
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Class names
classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Define the model (Dropout added to reduce overfitting)
class FashionModel(nn.Module):
    def __init__(self):
        super(FashionModel, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.relu = nn.ReLU()
        self.dropout1 = nn.Dropout(0.3)
        
        self.fc2 = nn.Linear(256, 128)
        self.dropout2 = nn.Dropout(0.3)
        
        self.fc3 = nn.Linear(128, 64)
        self.dropout3 = nn.Dropout(0.2)
        
        self.fc4 = nn.Linear(64, 10)
    
    def forward(self, x):
        x = x.view(-1, 28 * 28) # Flatten the image
        x = self.relu(self.fc1(x))
        x = self.dropout1(x)
        
        x = self.relu(self.fc2(x))
        x = self.dropout2(x)
        
        x = self.relu(self.fc3(x))
        x = self.dropout3(x)
        
        x = self.fc4(x)
        return x

# Move the model to the GPU (or the CPU if no GPU is available)
model = FashionModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 15
train_losses = []
test_accs = []

print("Training started...")

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        # Reset gradients, backpropagate, update weights
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    
    # Average loss over the batches of this epoch
    epoch_loss = running_loss / len(train_loader)
    train_losses.append(epoch_loss)
    
    # Check accuracy on the test set (used here as validation)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    acc = 100 * correct / total
    test_accs.append(acc)
    
    print(f"Epoch {epoch+1}/{num_epochs} -> Loss: {epoch_loss:.4f}, Accuracy: {acc:.2f}%")

print("Training complete.")

# Plot the training curves
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(train_losses)
plt.title("Training Loss")
plt.xlabel("Epoch")

plt.subplot(1, 2, 2)
plt.plot(test_accs, color='orange')
plt.title("Test Accuracy")
plt.xlabel("Epoch")
plt.show()

# Examine the results in detail (eval mode disables Dropout)
model.eval()
all_preds = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        all_preds.extend(predicted.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

print("\nConfusion Matrix:")
cm = confusion_matrix(all_labels, all_preds)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes, yticklabels=classes)
plt.show()

print("\nDetailed Report:")
print(classification_report(all_labels, all_preds, target_names=classes))