# UFOS
The well-known UFO (Unidentified Flying Object) sightings dataset collects reports of unexplained aerial phenomena submitted by witnesses around the world over the years. Its columns describe different aspects of each sighting: the date and time, the geographic location, the shape of the object, the duration, and sometimes additional details provided by the witness.

We will do something very simple: take the latitude and longitude from this dataset and train a model that predicts which country a sighting occurred in from those two values.

## Cleaning Data

In [74]:
import pandas as pd
import numpy as np

ufox = pd.read_csv('./data/ufos.csv')
ufox.head(3)

| | datetime | city | state | country | shape | duration (seconds) | duration (hours/min) | comments | date posted | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10/10/1949 20:30 | san marcos | tx | us | cylinder | 2700.0 | 45 minutes | This event took place in early fall around 194... | 4/27/2004 | 29.883056 | -97.941111 |
| 1 | 10/10/1949 21:00 | lackland afb | tx | NaN | light | 7200.0 | 1-2 hrs | 1949 Lackland AFB&#44 TX. Lights racing acros... | 12/16/2005 | 29.38421 | -98.581082 |
| 2 | 10/10/1955 17:00 | chester (uk/england) | NaN | gb | circle | 20.0 | 20 seconds | Green/Orange circular disc over Chester&#44 En... | 1/21/2008 | 53.2 | -2.916667 |


In [75]:
ufos = pd.DataFrame({'Country': ufox['country'],'Latitude': ufox['latitude'],'Longitude': ufox['longitude']})

ufos.Country.unique()

array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)

In [76]:
# Reduce the amount of data by dropping rows with null values
ufos.dropna(inplace=True)

ufos.info()

<class 'pandas.core.frame.DataFrame'>
Index: 70662 entries, 0 to 80331
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Country    70662 non-null  object 
 1   Latitude   70662 non-null  float64
 2   Longitude  70662 non-null  float64
dtypes: float64(2), object(1)
memory usage: 2.2+ MB


In [77]:
# Import Scikit-learn's `LabelEncoder` class to convert the text values for countries to numbers:
from sklearn.preprocessing import LabelEncoder

ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])

ufos.head()

| | Country | Latitude | Longitude |
|---|---|---|---|
| 0 | 4 | 29.883056 | -97.941111 |
| 2 | 3 | 53.2 | -2.916667 |
| 3 | 4 | 28.978333 | -96.645833 |
| 4 | 4 | 21.418056 | -157.803611 |
| 5 | 4 | 36.595 | -82.188889 |
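
The integer codes above are easier to read when the fitted encoder is kept around. The following is a minimal sketch, not part of the original notebook: it uses a small hypothetical list of country labels (`countries`) instead of the real column, and relies on `LabelEncoder` storing its labels in sorted order in `classes_`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels mirroring the country codes seen in the dataset
countries = ['us', 'gb', 'ca', 'au', 'de', 'us']

# Keep a reference to the fitted encoder instead of discarding it
encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ is sorted, so the position of each label is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# The same encoder can turn codes back into labels
decoded = [str(label) for label in encoder.inverse_transform(codes)]
print(decoded)
```

Under this ordering, code 3 corresponds to `gb` and code 4 to `us`, which matches the rows shown above.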


## Build the model

In [78]:
from sklearn.model_selection import train_test_split

Selected_features = ['Latitude','Longitude']

X = ufos[Selected_features]
y = ufos['Country']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
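
Most sightings in this dataset are labelled `us`, so a plain random split can leave the rarer countries unevenly represented in the test set. One option, shown here only as a sketch on hypothetical synthetic data (`X_demo`, `y_demo`), is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both splits. The notebook itself does not use it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels: 900 samples of class 4, 100 of class 1
y_demo = np.array([4] * 900 + [1] * 100)
X_demo = rng.normal(size=(1000, 2))

# stratify=y_demo preserves the 90/10 class ratio in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Count how many samples of each class landed in the test split
test_counts = {int(c): int((y_te == c).sum()) for c in np.unique(y_te)}
print(test_counts)
```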

In [79]:
# Train your model using logistic regression:
from sklearn.metrics import accuracy_score, classification_report 
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(classification_report(y_test, predictions))
print('Predicted labels: ', predictions)
print('Accuracy: ', accuracy_score(y_test, predictions))


              precision    recall  f1-score   support

           0       1.00      1.00      1.00       112
           1       0.68      0.31      0.42       611
           2       1.00      1.00      1.00        22
           3       1.00      1.00      1.00       386
           4       0.97      0.99      0.98     13002

    accuracy                           0.96     14133
   macro avg       0.93      0.86      0.88     14133
weighted avg       0.96      0.96      0.96     14133

Predicted labels:  [4 4 1 ... 4 4 4]
Accuracy:  0.9638434868746905


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


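The accuracy is good (about 96%), which is unsurprising because `Country` is largely determined by `Latitude` and `Longitude`. The weak spot is class 1 (`ca`, since the encoder sorts labels alphabetically), with a recall of only 0.31. The warning in the output also shows that the solver hit its iteration limit, so raising `max_iter` or scaling the features may be worth trying.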
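
The solver warning in the output suggests scaling the data or increasing `max_iter`. A minimal sketch of both, assuming a hypothetical synthetic stand-in for the coordinates (`coords_demo`, `labels_demo`) rather than the real dataset, could look like this. The results reported in this notebook come from the unscaled model, not from this pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two clusters of (latitude, longitude) points
centers = np.array([[38.0, -97.0], [53.0, -2.0]])
labels_demo = rng.integers(0, 2, size=500)
coords_demo = centers[labels_demo] + rng.normal(scale=2.0, size=(500, 2))

# Standardize the features, then fit logistic regression with a higher
# iteration limit than the default
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(coords_demo, labels_demo)

train_accuracy = scaled_model.score(coords_demo, labels_demo)
print(train_accuracy)
```

Wrapping the scaler and the classifier in one pipeline means new coordinates passed to `predict` are scaled with the same statistics that were learned during `fit`.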

In [90]:
# Import LabelEncoder
from sklearn.preprocessing import LabelEncoder

# Define the new coordinates
new_latitude = 53.200000
new_longitude = -2.916667

# Predict the encoded country for the new coordinates
# (a DataFrame with the training column names avoids a feature-name warning)
new_point = pd.DataFrame([[new_latitude, new_longitude]], columns=['Latitude', 'Longitude'])
predicted_country_encoded = model.predict(new_point)

# Decode the predicted country. Fit on the non-null labels only, so the
# codes match the encoding applied to `ufos` after dropna()
predicted_country_decoded = LabelEncoder().fit(ufox['country'].dropna()).inverse_transform(predicted_country_encoded)

# Print the predicted country
print("Predicted country for coordinates ({}, {}): {}".format(new_latitude, new_longitude, predicted_country_decoded))



Predicted country for coordinates (53.2, -2.916667): ['gb']


