# MAGIC Gamma Telescope Analysis

## Dataset Information

This dataset consists of Monte Carlo (MC) generated data simulating high-energy gamma particle registration in a Cherenkov gamma telescope using imaging techniques. The telescope detects gamma rays by capturing the Cherenkov radiation emitted by charged particles formed in electromagnetic showers initiated by gamma interactions in the atmosphere.

The recorded data include pulses from Cherenkov photons impacting the photomultiplier tubes arranged in a plane (the camera). Depending on the gamma energy, anywhere from a few hundred to 10,000 photons are collected, forming a shower image that helps distinguish between gamma-initiated showers (signal) and hadronic showers caused by cosmic rays (background).

After pre-processing, the shower image generally appears as an elongated cluster, with its long axis pointing toward the camera center if the telescope is aligned with a point source. A principal component analysis (PCA) is performed to determine correlation axes and define an ellipse, aiding in classification. Features such as Hillas parameters, asymmetry along the major axis, and cluster extent further assist in discrimination.
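
The ellipse fit described above can be sketched as an eigen-decomposition of the intensity-weighted covariance of the pixel positions. This is an illustration only, not the MAGIC reconstruction code: the function name `image_ellipse`, its arguments, and the synthetic cluster are all assumptions made for the example. In the dataset, `fLength` and `fWidth` correspond to the major and minor axes of such an ellipse.

```python
import numpy as np

def image_ellipse(x, y, weights):
    """Return (length, width, angle_deg) of the intensity-weighted image ellipse."""
    w = np.asarray(weights, dtype=float)
    coords = np.vstack([x, y]).astype(float)           # shape (2, n_pixels)
    centroid = np.average(coords, axis=1, weights=w)   # intensity-weighted centre
    centered = coords - centroid[:, None]
    cov = (centered * w) @ centered.T / w.sum()        # weighted 2x2 covariance
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    width, length = np.sqrt(np.clip(eigvals, 0.0, None))
    major = eigvecs[:, 1]                              # direction of the long axis
    angle_deg = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    return length, width, angle_deg

# Made-up cluster: spread 3 along an axis tilted by 30 degrees, spread 1 across it
rng = np.random.default_rng(0)
t, s = rng.normal(0, 3, 5000), rng.normal(0, 1, 5000)
theta = np.radians(30.0)
x = t * np.cos(theta) - s * np.sin(theta)
y = t * np.sin(theta) + s * np.cos(theta)
length, width, angle_deg = image_ellipse(x, y, np.ones_like(x))
```

For this synthetic cluster the recovered length, width, and angle should come out close to 3, 1, and 30 degrees respectively.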

The data were generated with the Monte Carlo simulation program CORSIKA, described in:

D. Heck et al., CORSIKA: A Monte Carlo Code to Simulate Extensive Air Showers, Forschungszentrum Karlsruhe FZKA 6019 (1998).
http://rexa.info/paper?id=ac6e674e9af20979b23d3ed4521f1570765e8d68

The simulation was run with parameters that allow events with **energies down to below 50 GeV** to be observed.

**Source:** [UCI Machine Learning Repository - MAGIC Gamma Telescope](https://archive.ics.uci.edu/dataset/159/magic+gamma+telescope)

## Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

## Load and Explore the Dataset

In [None]:
# Define column names
cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]

# Load the dataset
url = "https://raw.githubusercontent.com/usernameneo/Data-Science-Projects/refs/heads/day-II/fcc-project/magic04.data?token=GHSAT0AAAAAADEBRHU436LOFW5OQM3BDD2M2BITYGA"
df = pd.read_csv(url, names=cols)

# Display first few rows
df.head()
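
Beyond `df.head()`, a few quick checks (shape, missing values, class counts) help confirm that the file loaded as expected. The helper below is a hypothetical convenience function, not part of the original notebook, and the small frame with made-up values stands in for `df` so the snippet runs on its own.

```python
import pandas as pd

def summarize(frame, label_col="class"):
    """Collect basic sanity checks for a labelled DataFrame."""
    return {
        "shape": frame.shape,
        "missing": int(frame.isna().sum().sum()),            # total NaN cells
        "class_counts": frame[label_col].value_counts().to_dict(),
    }

# Tiny stand-in for df; the values are invented for illustration
demo = pd.DataFrame({
    "fLength": [28.8, 31.6, 162.1, 23.8],
    "fWidth": [16.0, 11.7, 136.0, 9.6],
    "class": ["g", "g", "h", "g"],
})
summary = summarize(demo)
print(summary)
```

In the notebook, `summarize(df)` would report the same three items for the full dataset.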

## Data Preprocessing

### Class Values
The class column takes only two values: "g" for gamma and "h" for hadron. We convert them to binary values (1 for gamma, 0 for hadron) to simplify computation.

In [None]:
# Check unique class values
print("Unique class values:", df["class"].unique())

# Convert class labels to binary (0 for hadron 'h', 1 for gamma 'g')
df["class"] = (df["class"] == "g").astype(int)

# Display the updated dataset
df.head()

## Exploratory Data Analysis

Let's visualize the distribution of each feature for both gamma and hadron particles to understand the differences between the two classes.

In [None]:
# Create histograms for each feature, comparing gamma and hadron particles
for col in cols[:-1]:  # Exclude the 'class' column
    plt.figure(figsize=(10, 6))
    plt.hist(df[df["class"]==1][col], color="blue", label="Gamma-Ray Particles", alpha=0.7, density=True)
    plt.hist(df[df["class"]==0][col], color="red", label="Hadronic Particles", alpha=0.7, density=True)
    plt.title(f"Distribution of {col}")
    plt.ylabel("Probability density")
    plt.xlabel(col)
    plt.legend()
    plt.show()
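
In the loop above, each `plt.hist` call chooses its own default bins from the range of that class alone, so the two sets of bars do not line up. One possible refinement, sketched here with a hypothetical helper and synthetic values, is to compute a single set of bin edges from both classes and reuse it.

```python
import numpy as np

def shared_bin_histograms(values_a, values_b, bins=30):
    """Density histograms of two samples on one common set of bin edges."""
    edges = np.histogram_bin_edges(np.concatenate([values_a, values_b]), bins=bins)
    dens_a, _ = np.histogram(values_a, bins=edges, density=True)
    dens_b, _ = np.histogram(values_b, bins=edges, density=True)
    return edges, dens_a, dens_b

# Synthetic stand-ins for one feature in each class
rng = np.random.default_rng(1)
gamma_like = rng.normal(0.0, 1.0, 1000)
hadron_like = rng.normal(1.5, 2.0, 600)
edges, dens_gamma, dens_hadron = shared_bin_histograms(gamma_like, hadron_like)
```

The same `edges` array can be passed as `bins=edges` to both `plt.hist` calls so the two distributions are drawn on identical bins.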

## Data Splitting and Preprocessing

### Train, Validation and Test Datasets
We'll split the data into training (60%), validation (20%), and test (20%) sets.

In [None]:
# Shuffle the rows, then slice into train (60%), validation (20%), and test (20%) sets
shuffled, n = df.sample(frac=1), len(df)
train, val, test = shuffled.iloc[:int(0.6*n)], shuffled.iloc[int(0.6*n):int(0.8*n)], shuffled.iloc[int(0.8*n):]
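
The shuffle above is unseeded, so every run produces a different split and slightly different results. A reproducible variant might look like the sketch below; the function name `split_frame`, its parameters, and the seed value are assumptions introduced for the example.

```python
import pandas as pd

def split_frame(frame, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle with a fixed seed and slice into train/validation/test frames."""
    shuffled = frame.sample(frac=1, random_state=seed)
    n = len(shuffled)
    train_end = int(train_frac * n)
    val_end = int((train_frac + val_frac) * n)
    return shuffled.iloc[:train_end], shuffled.iloc[train_end:val_end], shuffled.iloc[val_end:]

# Stand-in frame with 100 rows
demo = pd.DataFrame({"feature": range(100), "class": [0, 1] * 50})
train_d, val_d, test_d = split_frame(demo)
train_again, _, _ = split_frame(demo)   # same seed, same split
```

Because the seed is fixed, `train_d` and `train_again` contain the same rows in the same order.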

### Scaling and Oversampling Function
We'll create a function to standardize features and optionally oversample the minority class to handle class imbalance.

In [None]:
def scale_dataset(dataframe, oversample=False):
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values

    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    if oversample:
        ros = RandomOverSampler()
        X, y = ros.fit_resample(X, y)

    data = np.hstack((X, np.reshape(y, (-1, 1))))
    return data, X, y
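
Note that `scale_dataset` fits a fresh `StandardScaler` on whatever frame it receives, so calling it separately on each split standardizes every split with its own mean and variance. A common alternative is to fit the scaler on the training features only and reuse it for the other splits. The sketch below shows that pattern on synthetic arrays; `scale_with_train_stats` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_with_train_stats(X_train, *others):
    """Fit the scaler on X_train only, then apply it to every array."""
    scaler = StandardScaler().fit(X_train)
    return (scaler.transform(X_train), *[scaler.transform(X) for X in others])

# Synthetic feature matrices standing in for the train and test splits
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_te = rng.normal(loc=50.0, scale=10.0, size=(50, 3))
X_tr_s, X_te_s = scale_with_train_stats(X_tr, X_te)
```

With this approach the test features are shifted and scaled by the training statistics, so they are not exactly zero-mean, which is the intended behaviour.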

### Check Class Distribution in Training Set

In [None]:
# Check class distribution before oversampling
print("Before oversampling:")
print("Gamma:", len(train[train["class"]==1]))
print("Hadron:", len(train[train["class"]==0]))

### Apply Scaling and Oversampling

In [None]:
# Apply scaling and oversampling to training set, only scaling to validation and test sets
train, X_train, y_train = scale_dataset(train, oversample=True)
val, X_val, y_val = scale_dataset(val, oversample=False)
test, X_test, y_test = scale_dataset(test, oversample=False)

# Check class distribution after oversampling
print("After oversampling:")
print("Total:", len(y_train))
print("Gamma:", sum(y_train==1))
print("Hadron:", sum(y_train==0))
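
To make the oversampling step less of a black box, here is a minimal NumPy sketch of the idea: rows of the smaller class are drawn again with replacement until every class has as many rows as the largest one. It is conceptually similar to what `RandomOverSampler` does with its default settings, but it is an illustration written for this note, not the library's implementation.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)  # empty for the largest class
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Made-up imbalanced data: 7 rows of class 1, 3 rows of class 0
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.array([1] * 7 + [0] * 3)
X_bal, y_bal = random_oversample(X_demo, y_demo)
```

Oversampling only repeats existing rows, so it balances the class counts without adding new information about the minority class.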

## Machine Learning Model: K-Nearest Neighbors

We'll implement a K-Nearest Neighbors classifier to distinguish between gamma and hadron particles.

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Create and train the KNN model
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Make predictions on the test set
y_predicts = knn_model.predict(X_test)

# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_predicts))
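
The validation set created earlier is not used by the cell above. One natural use, sketched here under the assumption that `n_neighbors` is the hyperparameter to tune, is to compare several values of k on the validation split and keep the test set for the final report. The helper `pick_k`, the candidate list, and the synthetic data from `make_classification` are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate k with the highest validation accuracy, plus all scores."""
    scores = {}
    for k in candidates:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = model.score(X_val, y_val)   # mean accuracy on the validation split
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic binary classification data standing in for the scaled splits
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
best_k, val_scores = pick_k(X_demo[:200], y_demo[:200], X_demo[200:], y_demo[200:])
print(best_k, val_scores)
```

In the notebook this would be called as `pick_k(X_train, y_train, X_val, y_val)`, after which a model with the chosen k can be evaluated once on `X_test`.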

## Results Interpretation

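In one run, the classification report showed the following (exact figures vary between runs because the shuffle and the oversampling are unseeded):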
- **Class 0 (Hadron particles)**: Precision: 76%, Recall: 76%, F1-score: 76%
- **Class 1 (Gamma particles)**: Precision: 87%, Recall: 87%, F1-score: 87%
- **Overall Accuracy**: 83%

The model identifies gamma events more reliably than hadron events. One plausible contributor is that hadron events are the minority class in the raw data, so even after oversampling the training set contains fewer distinct hadron examples. An accuracy of 83% suggests that a KNN classifier with k=5 separates the two classes reasonably well using the Cherenkov image parameters alone.
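
For readers who want to see where the precision, recall, and F1 figures come from, the sketch below derives them from true-positive, false-positive, and false-negative counts. The function `binary_metrics` and the tiny label arrays are made up for illustration; `classification_report` computes the same quantities for each class.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))   # correctly flagged
    fp = np.sum((y_pred == positive) & (y_true != positive))   # wrongly flagged
    fn = np.sum((y_pred != positive) & (y_true == positive))   # missed
    precision = float(tp / (tp + fp)) if tp + fp else 0.0
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented labels: 4 gamma (1) and 4 hadron (0) events, with one error in each class
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 1, 0, 0, 0]
gamma_metrics = binary_metrics(y_true_demo, y_pred_demo, positive=1)
hadron_metrics = binary_metrics(y_true_demo, y_pred_demo, positive=0)
```

Calling the function once per class, as above, mirrors the per-class rows of the report.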