<a href="https://colab.research.google.com/github/ubsuny/CompPhys/blob/EclipseExample/DataScience/EclipseCategorization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Categorization of solar eclipses

([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), Tim Thomay, 2024)

Ideas based on this repository: https://github.com/MrVtR/Solar_And_Lunar_Eclipses_Machine_Learning_Classification_Project/tree/main

Using data from: https://www.kaggle.com/datasets/nasa/solar-eclipses


In [None]:
#import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Get eclipse data

In [None]:
url = "https://raw.githubusercontent.com/ubsuny/CompPhys/main/DataScience/solar.csv"
dfSolar = pd.read_csv(url)
dfSolar.info()

### Convert latitude / longitude data to a machine-readable format (signed decimal degrees)

In [None]:
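def lat_conv(lat):
  """Convert a latitude string with a trailing N/S letter to signed degrees (south is negative)."""
  value = float(lat[:-1])
  return -value if lat[-1] == "S" else value

def lon_conv(lon):
  """Convert a longitude string with a trailing E/W letter to signed degrees (west is negative)."""
  value = float(lon[:-1])
  return -value if lon[-1] == "W" else value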


In [None]:
# Apply the converters column-wise; no need to pass whole rows through a lambda
dfSolar['declat'] = dfSolar['Latitude'].apply(lat_conv)
dfSolar['declon'] = dfSolar['Longitude'].apply(lon_conv)
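
The conversion can also be written as one function that handles both hemispheres and rejects unexpected input. This is only a minimal sketch: the name `to_decimal_degrees` and the sample strings are made up for illustration and are not taken from the data set.

```python
def to_decimal_degrees(coord):
    """Convert a string such as '42.9N' or '78.9W' to signed decimal degrees."""
    value, hemisphere = float(coord[:-1]), coord[-1].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere letter in {coord!r}")
    # South and West are negative by convention
    return -value if hemisphere in ("S", "W") else value

# Made-up sample coordinates (assumption: same 'number + letter' layout as the CSV)
samples = ["6.0N", "32.9S", "33.3W", "10.8E"]
decimal = [to_decimal_degrees(s) for s in samples]
print(decimal)
```

Raising an error on an unknown letter makes malformed rows visible instead of silently treating them as positive values.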

## Types of solar eclipse
From [Wikipedia](https://en.wikipedia.org/wiki/Solar_eclipse?wprov=sfti1#Types):

There are four types of solar eclipses:

- A total eclipse occurs on average every 18 months when the dark silhouette of the Moon completely obscures the intensely bright light of the Sun, allowing the much fainter solar corona to be visible. During any one eclipse, totality occurs at best only in a narrow track on the surface of Earth. This narrow track is called the path of totality.
- An annular eclipse occurs once every one or two years when the Sun and Moon are exactly in line with Earth, but the apparent size of the Moon is smaller than that of the Sun. Hence the Sun appears as a very bright ring, or annulus, surrounding the dark disk of the Moon.
- A hybrid eclipse (also called annular/total eclipse) shifts between a total and annular eclipse. At certain points on the surface of Earth, it appears as a total eclipse, whereas at other points it appears as annular. Hybrid eclipses are comparatively rare.
- A partial eclipse occurs about twice a year, when the Sun and Moon are not exactly in line with Earth and the Moon only partially obscures the Sun. This phenomenon can usually be seen from a large part of Earth outside of the track of an annular or total eclipse. However, some eclipses can be seen only as a partial eclipse, because the umbra passes above Earth's polar regions and never intersects Earth's surface. Partial eclipses are virtually unnoticeable in terms of the Sun's brightness, as it takes well over 90% coverage to notice any darkening at all. Even at 99%, it would be no darker than civil twilight.

In [None]:
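# Show which eclipse type labels occur in the data
print(dfSolar['Eclipse Type'].unique())
# Restrict the analysis to the four main types
typeselector = ["T","A","P","H"]
clist = plt.cm.gist_rainbow(np.linspace(0, 1, len(typeselector)))
for t,c in zip(typeselector,clist):
  subset = dfSolar[dfSolar['Eclipse Type']==t]
  # Longitude on the x axis and latitude on the y axis, as on a map
  plt.scatter(subset['declon'],subset['declat'],color=c,label=t,s=1)
plt.xlabel("Longitude (deg)")
plt.ylabel("Latitude (deg)")
plt.legend(bbox_to_anchor=(1.04, 1), loc="upper left")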

### Select categorization features

In [None]:
selected = dfSolar[dfSolar['Eclipse Type'].isin(typeselector)]
eclipse_features = selected[['declon','declat']]
# Convert types to integers 1..4 in typeselector order, so that typeselector[code-1] recovers the letter
type_codes = {t: i+1 for i, t in enumerate(typeselector)}
eclipse_features_types = selected['Eclipse Type'].map(type_codes)
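
Integer codes are only useful if they can be turned back into type letters later. `pd.factorize` numbers labels in order of first appearance in the data, whereas a lookup such as `typeselector[code-1]` assumes the order of `typeselector`. The sketch below, using a made-up sequence of labels, shows how the two orders can differ and how an explicit mapping keeps them aligned.

```python
import pandas as pd

typeselector = ["T", "A", "P", "H"]
types = pd.Series(["A", "T", "A", "P", "H"])  # made-up labels, not the real data

# factorize: codes follow first appearance, so here "A" becomes 1 and "T" becomes 2
codes, uniques = pd.factorize(types)
print(list(codes + 1), list(uniques))

# explicit mapping: codes follow typeselector, so "T" is 1 and "A" is 2
type_codes = {t: i + 1 for i, t in enumerate(typeselector)}
mapped = types.map(type_codes)

# decoding with typeselector only round-trips for the explicit mapping
decoded_mapped = [typeselector[c - 1] for c in mapped]
decoded_factorized = [typeselector[c] for c in codes]
print(decoded_mapped, decoded_factorized)
```

When `pd.factorize` is used, the returned `uniques` is the matching lookup table for decoding.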

### Choose ML model

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

### Split into training and verification data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(eclipse_features, eclipse_features_types, test_size=0.3, random_state=42)
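
Because the eclipse types are not equally frequent (hybrid eclipses are comparatively rare), a purely random split can leave a rare type under-represented in the verification set. One option is the `stratify` argument of `train_test_split`, which preserves the class proportions in both parts. The sketch below uses made-up data with an 80/20 class imbalance, not the eclipse table.

```python
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 80 of class 0 and 20 of class 1
X = [[i, i] for i in range(100)]
y = [0] * 80 + [1] * 20

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The 80/20 ratio is kept in both parts
print(len(X_tr), len(X_te))
print(y_tr.count(1), y_te.count(1))
```

In the notebook, the equivalent would be to pass the type labels as `stratify`; whether that changes the score noticeably depends on the data.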

### Train the model

In [None]:
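model = KNeighborsClassifier()  # default n_neighbors=5; try other values, e.g. n_neighbors=8
# pass the labels as a 1-D array, which is the shape scikit-learn expects
model.fit(X_train, y_train.values.ravel())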
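
To make clear what the classifier does, here is a minimal pure-Python sketch of the k-nearest-neighbours idea: find the `k` training points closest to the query (Euclidean distance, as in the scikit-learn default) and take a majority vote over their labels. The function `knn_predict`, the points, and the labels are made up for illustration; scikit-learn's implementation is more efficient and may break ties differently.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=5):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up (declon, declat) points with made-up type labels
points = [(-80, 40), (-75, 45), (-79, 43), (100, -20), (110, -25), (105, -22)]
labels = ["T", "T", "A", "P", "P", "P"]

result_west = knn_predict(points, labels, (-78.9, 42.9), k=3)
result_east = knn_predict(points, labels, (104, -21), k=3)
print(result_west, result_east)
```

Note that Euclidean distance on raw degrees treats longitudes -179 and 179 as far apart, although they are neighbours on the globe.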

### Verify the model

In [None]:
model.score(X_test, y_test)
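
For a classifier, `score` returns the mean accuracy, that is, the fraction of verification samples whose predicted label matches the true one. Accuracy is easier to judge next to a simple baseline such as always predicting the most frequent class. The sketch below computes both by hand on made-up labels; the helper `accuracy` is hypothetical and not part of the notebook.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and truth agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up true and predicted labels
y_true = ["P", "P", "A", "T", "P", "A", "T", "P", "H", "A"]
y_pred = ["P", "A", "A", "T", "P", "P", "T", "P", "T", "A"]

model_acc = accuracy(y_true, y_pred)

# Baseline: always predict the most frequent true label
majority = Counter(y_true).most_common(1)[0][0]
baseline_acc = accuracy(y_true, [majority] * len(y_true))

print(model_acc, baseline_acc)
```

A model is only informative if its accuracy is clearly above this baseline.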

### Use the model to predict the category of a new eclipse

In [None]:
number_of_features = eclipse_features.shape[1]
# Buffalo, NY: longitude -78.859415, latitude 42.892251
new_eclipse_features = pd.DataFrame({"declon":-78.859415, "declat":42.892251},index=[0])

prediction = model.predict(new_eclipse_features)
predicted_category = typeselector[prediction[0]-1]

In [None]:
print("Predicted category: {}".format(predicted_category))

## PyCaret for multi-model testing

In [None]:
# import pycaret classification
from pycaret.classification import *
# drop some columns we don't want to test against
dfSolar.drop(["Catalog Number","Saros Number","Lunation Number", "Latitude", "Longitude"], axis=1, inplace=True)
# init setup, specifying the target feature
s = setup(dfSolar, target="Eclipse Type")

In [None]:
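# optional: load a previously saved model to save time
# (needs the file written by save_model in an earlier run; skip this cell on a first run)
best = load_model('eclipse_model')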

In [None]:
# compare multiple models
# might take some time
modellist = ["lr","ridge","xgboost","nb","et","svm","rf","knn","gbc","ada","qda","dt","lda"]
exclmodel = []
best = compare_models(include=modellist) #, exclude=exclmodel)

In [None]:
# save model for reuse
save_model(best, 'eclipse_model')

In [None]:
# plot feature importance
plot_model(best, plot = 'feature')

In [None]:
# Several evaluation plots
evaluate_model(best)