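- Temperature -- temperature in kelvin (K)
- L -- luminosity relative to the Sun (L/Lo)
- R -- radius relative to the Sun (R/Ro)
- A_M -- absolute magnitude (Mv)
- Color -- general color of the spectrum
- Spectral_Class -- O, B, A, F, G, K, M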
  - O: The hottest stars, blue in color, with temperatures above 30,000 K.
  - B: Very hot stars, blue-white in color, with temperatures between 10,000 K and 30,000 K.
  - A: Hot stars, white to blue-white in color, with temperatures between 7,500 K and 10,000 K.
  - F: Slightly cooler stars, yellow-white in color, with temperatures between 6,000 K and 7,500 K.
  - G: Moderately cool stars, yellow in color, with temperatures between 5,200 K and 6,000 K. Our Sun is a G-type star.
  - K: Cool stars, orange in color, with temperatures between 3,700 K and 5,200 K.
  - M: The coolest stars, red in color, with temperatures below 3,700 K.
- TARGET: Type (integer from 0 to 5)
  - Red Dwarf - 0
  - Brown Dwarf - 1
  - White Dwarf - 2
  - Main Sequence - 3
  - Supergiants - 4
  - Hypergiants - 5
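
The temperature ranges listed for each spectral class can be written as a small lookup. The sketch below is illustrative only: `spectral_class_from_temperature` is a hypothetical helper that is not part of this notebook, and it assumes the boundaries quoted above with each lower bound treated as inclusive. Rows in the dataset need not follow these ranges exactly, because the `Spectral_Class` column is taken as given.

```python
# Lower temperature bound (in K) for each class, hottest first.
# The boundaries are the ones quoted in the list above; treating each
# lower bound as inclusive is an assumption of this sketch.
SPECTRAL_LOWER_BOUNDS = [
    ("O", 30_000),
    ("B", 10_000),
    ("A", 7_500),
    ("F", 6_000),
    ("G", 5_200),
    ("K", 3_700),
]


def spectral_class_from_temperature(temperature_k: float) -> str:
    """Return the spectral class letter for a temperature in kelvin."""
    for letter, lower_bound in SPECTRAL_LOWER_BOUNDS:
        if temperature_k >= lower_bound:
            return letter
    return "M"  # everything below 3,700 K


print(spectral_class_from_temperature(3068))  # prints M
print(spectral_class_from_temperature(5778))  # roughly the Sun, prints G
```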

- MATH:
  - Lo = 3.828 x 10^26 W (average luminosity of the Sun)
  - Ro = 6.9551 x 10^8 m (average radius of the Sun)
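
Because `L` and `R` are stored as ratios to the solar values, multiplying by the two constants above recovers SI units. A minimal sketch, in which `to_si` is a hypothetical helper name and the constants are the ones listed above:

```python
L_SUN_W = 3.828e26   # solar luminosity in watts, as listed above
R_SUN_M = 6.9551e8   # solar radius in metres, as listed above


def to_si(l_ratio: float, r_ratio: float) -> tuple[float, float]:
    """Convert L/Lo and R/Ro ratios to watts and metres."""
    return l_ratio * L_SUN_W, r_ratio * R_SUN_M


# Example ratios: L/Lo = 0.0024 and R/Ro = 0.17
luminosity_w, radius_m = to_si(0.0024, 0.17)
print(f"{luminosity_w:.3e} W, {radius_m:.3e} m")
```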

In [None]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')  # silence library warnings to keep cell output readable

In [None]:
df = pd.read_csv('Stars.csv')

In [None]:
df.shape

(240, 7)

In [None]:
df.head()

,Temperature,L,R,A_M,Color,Spectral_Class,Type
0,3068,0.0024,0.17,16.12,Red,M,0
1,3042,0.0005,0.1542,16.6,Red,M,0
2,2600,0.0003,0.102,18.7,Red,M,0
3,2800,0.0002,0.16,16.65,Red,M,0
4,1939,0.000138,0.103,20.06,Red,M,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Temperature     240 non-null    int64  
 1   L               240 non-null    float64
 2   R               240 non-null    float64
 3   A_M             240 non-null    float64
 4   Color           240 non-null    object 
 5   Spectral_Class  240 non-null    object 
 6   Type            240 non-null    int64  
dtypes: float64(3), int64(2), object(2)
memory usage: 13.2+ KB


In [None]:
df['Color'].unique()

array(['Red', 'Blue White', 'White', 'Yellowish White', 'Blue white',
       'Pale yellow orange', 'Blue', 'Blue-white', 'Whitish',
       'yellow-white', 'Orange', 'White-Yellow', 'white', 'yellowish',
       'Yellowish', 'Orange-Red', 'Blue-White'], dtype=object)

In [None]:
df['Spectral_Class'].unique()

array(['M', 'B', 'A', 'F', 'O', 'K', 'G'], dtype=object)

In [None]:
df['Color'] = df['Color'].str.lower().str.strip()

color_mapping = {
    'blue white': 'Blue-White',
    'blue-white': 'Blue-White',
    'yellowish white': 'Yellowish White',
    'yellow-white': 'Yellowish White',
    'white-yellow': 'White-Yellow',
    'pale yellow orange': 'Pale Yellow Orange',
    'blue': 'Blue',
    'white': 'White',
    'whitish': 'White',
    'yellowish': 'Yellowish',
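    # NOTE: the data spells this value 'orange-red' (with a hyphen), so this
    # key never matches and that value stays lower-case in the output below.
    'orange red': 'Orange-Red',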
    'orange': 'Orange',
    'red': 'Red',
    'white blue': 'White-Blue',
}

# Apply the mapping; values with no entry in color_mapping are kept unchanged
df['Color'] = df['Color'].map(color_mapping).fillna(df['Color'])
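
The mapping above depends on listing every spelling by hand. One alternative, sketched here on the assumption that spellings differ only in case, surrounding whitespace, and the choice of space or hyphen as separator, is to normalize those three things directly. `normalize_color` is a hypothetical helper, and it does not merge synonyms such as 'whitish' and 'white'.

```python
import re


def normalize_color(raw: str) -> str:
    """Lower-case, trim, and unify separators, then capitalize each word."""
    words = re.split(r"[\s\-]+", raw.strip().lower())
    return "-".join(word.capitalize() for word in words)


raw_colors = ["Blue White", "Blue white", "Blue-white", "Blue-White",
              "yellow-white", "Orange-Red", "white"]
print(sorted({normalize_color(c) for c in raw_colors}))
# prints ['Blue-White', 'Orange-Red', 'White', 'Yellow-White']
```

With a DataFrame, the same function could be applied as `df['Color'].map(normalize_color)`.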

In [None]:
df['Color'].unique()

array(['Red', 'Blue-White', 'White', 'Yellowish White',
       'Pale Yellow Orange', 'Blue', 'Orange', 'White-Yellow',
       'Yellowish', 'orange-red'], dtype=object)

In [None]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
df['Color'] = label_encoder.fit_transform(df['Color'])

In [None]:
from sklearn.preprocessing import OrdinalEncoder

ordinal_encoder = OrdinalEncoder(categories=[['O', 'B', 'A', 'F', 'G', 'K', 'M']])
df['Spectral_Class'] = ordinal_encoder.fit_transform(df[['Spectral_Class']])
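
`LabelEncoder` assigns integer codes in sorted order of the labels, which gives `Color` a numeric ordering with no physical meaning. A distance-based model such as k-nearest neighbors treats those codes as distances. One common alternative is one-hot encoding. The sketch below uses `pandas.get_dummies` on a small made-up frame; the `toy` rows are illustrative and do not come from `Stars.csv`.

```python
import pandas as pd

# Toy frame standing in for the cleaned Color column (made-up rows).
toy = pd.DataFrame({
    "Temperature": [3068, 25000, 8500],
    "Color": ["Red", "Blue-White", "White"],
})

# One indicator column per color; no ordering is implied between colors.
encoded = pd.get_dummies(toy, columns=["Color"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```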

In [None]:
df.head()

,Temperature,L,R,A_M,Color,Spectral_Class,Type
0,3068,0.0024,0.17,16.12,4,6.0,0
1,3042,0.0005,0.1542,16.6,4,6.0,0
2,2600,0.0003,0.102,18.7,4,6.0,0
3,2800,0.0002,0.16,16.65,4,6.0,0
4,1939,0.000138,0.103,20.06,4,6.0,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Temperature     240 non-null    int64  
 1   L               240 non-null    float64
 2   R               240 non-null    float64
 3   A_M             240 non-null    float64
 4   Color           240 non-null    int64  
 5   Spectral_Class  240 non-null    float64
 6   Type            240 non-null    int64  
dtypes: float64(4), int64(3)
memory usage: 13.2 KB


In [None]:
X = df.drop('Type', axis=1)
y = df['Type']

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=11)

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
model = KNeighborsClassifier()

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.6666666666666666
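
k-nearest neighbors measures distance in raw feature units, so a feature that spans thousands of units (such as `Temperature`) can dominate one that spans only a few (such as `Spectral_Class`). Standardizing the features before fitting is a common remedy. The sketch below shows the pattern on synthetic data only; whether it helps on `Stars.csv` has not been measured here. `make_pipeline` and `StandardScaler` are scikit-learn utilities, while the data and variable names are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic two-class data: the informative feature has a small scale,
# the uninformative one has a large scale.
n = 200
labels = rng.integers(0, 2, size=n)
informative = labels + rng.normal(0, 0.3, size=n)
noisy_large = rng.normal(0, 5000, size=n)
X_demo = np.column_stack([noisy_large, informative])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, labels, test_size=0.25, random_state=0
)

# Same classifier, with and without standardization in front of it.
raw_knn = KNeighborsClassifier().fit(X_tr, y_tr)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X_tr, y_tr)

raw_acc = raw_knn.score(X_te, y_te)
scaled_acc = scaled_knn.score(X_te, y_te)
print("raw accuracy:   ", raw_acc)
print("scaled accuracy:", scaled_acc)
```

Putting the scaler inside a pipeline means it is fitted on the training rows only, so the same object can also be passed to `GridSearchCV` without leaking test-fold statistics.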


In [None]:
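param_grid = {
    'n_neighbors': [3, 5, 6, 7, 8, 9, 10, 12],
    'weights': ['uniform', 'distance'],
    # 'algorithm' and 'leaf_size' control how neighbors are searched for
    # (speed and memory), not which neighbors are found.
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'leaf_size': [20, 30, 40, 50]
}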

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
grid = GridSearchCV(model, param_grid, cv=5)

In [None]:
grid.fit(X_train, y_train)

In [None]:
grid.best_params_

{'algorithm': 'auto', 'leaf_size': 20, 'n_neighbors': 3, 'weights': 'distance'}

In [None]:
grid.best_score_

0.6470731707317073

In [None]:
y_pred_grid = grid.predict(X_test)

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred_grid))

Accuracy: 0.6666666666666666
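
A single accuracy figure on a 36-row test set (15% of 240) does not show which star types are confused with each other. A confusion matrix breaks the errors down by class. The sketch below uses made-up labels, since the notebook's actual predictions are not reproduced here; `confusion_matrix` is from scikit-learn.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted star types (0 to 5), for illustration only.
y_true_demo = [0, 0, 1, 1, 2, 3, 3, 4, 5, 5]
y_pred_demo = [0, 1, 1, 1, 2, 3, 4, 4, 5, 4]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2, 3, 4, 5])
print(cm)

# Per-class recall: correct predictions divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)
print(recall)
```

In this notebook the equivalent call would be `confusion_matrix(y_test, y_pred_grid)`.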
