# Meteorite Landings

Introduction:
Meteorites are fascinating objects that have captivated the imaginations of people for centuries. These extraterrestrial objects provide a wealth of information about the origins and evolution of our solar system.
Our objective is to analyze various parameters associated with meteorite landings. This document presents our findings graphically.

About the Dataset:
The meteorite landings dataset maintained by NASA is a comprehensive collection of data on meteorites that have been found on Earth. It contains the name, ID, geolocation, latitude, longitude, class, mass, fell/found status, name type, and year of each meteorite. The dataset has been widely used in research and provides valuable information for understanding the composition and origins of meteorites. It includes a date/time variable (year), several categorical variables, and numerical values. The dataset has 10 columns and 45715 rows.

Reading the Dataset:
Our dataset is stored in 'Meteorite_Landings.csv'. We use the following libraries: pandas, NumPy, Matplotlib, seaborn, and Plotly.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

# Dataset At A Glance

In [4]:
data = pd.read_csv("Meteorite_Landings.csv")
df = data  # alias for the same DataFrame (not a copy)
df

# Insights about the dataset

In [5]:
data.info()

In [6]:
# Number of rows and columns in the dataframe
data.shape

# Handling Missing Values

In [None]:
#columns with missing values
data.isnull().sum()

In [None]:
#missing value %
data.isnull().sum().sum()/data.size * 100
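
The overall percentage above is the number of null cells divided by the total number of cells (`data.size`, i.e. rows × columns). A minimal sketch on a small made-up frame (column names mirror the dataset's; the values are hypothetical) also shows the per-column breakdown with `isnull().mean()`:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame mimicking a few of the dataset's columns
toy = pd.DataFrame({
    "mass (g)": [21.0, np.nan, 107000.0, 1914.0],
    "year": [1880.0, 1951.0, np.nan, 1976.0],
    "reclat": [50.775, np.nan, np.nan, 16.883],
})

# Overall share of missing cells: null cells / (rows * columns)
overall_pct = toy.isnull().sum().sum() / toy.size * 100

# Per-column share of missing values (mean of a boolean column = fraction True)
per_column_pct = toy.isnull().mean() * 100

print(f"Overall missing: {overall_pct:.2f}%")
print(per_column_pct)
```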

In [None]:
#Focusing individual columns
meteor = data['mass (g)'].value_counts(dropna=False)
meteor

In [None]:
# 'mass (g)' is NaN for some rows
# Replace NaN with the median of the column (assigning back avoids a chained inplace call)
data['mass (g)'] = data['mass (g)'].fillna(data['mass (g)'].median())

# verifying values to check if NaNs are replaced by median value
data['mass (g)'].unique()

In [None]:
# year column
meteor_year = data['year'].value_counts(dropna=False)
meteor_year

In [None]:
# Find the mode of the 'year' column
year_mode = data['year'].mode()[0]

# Replace missing values in 'year' column with mode
data['year'] = data['year'].fillna(year_mode)

# verifying values to check if they are replaced by the mode
data['year'].unique()

In [None]:
# Value counts (including NaN) for the reclat and reclong columns
meteor_latitude = data['reclat'].value_counts(dropna=False)
meteor_longitude = data['reclong'].value_counts(dropna=False)
print(meteor_latitude)
print(meteor_longitude)

In [None]:
# Replace missing values in 'reclat' and 'reclong' columns with the median
data['reclat'] = data['reclat'].fillna(data['reclat'].median())
data['reclong'] = data['reclong'].fillna(data['reclong'].median())

# Verify that missing values have been replaced with median
print("Latitude - ", data['reclat'].unique())
print("Longitude - ", data['reclong'].unique())
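
Median imputation places every record with a missing coordinate at the same latitude and longitude, which can appear as an artificial cluster in the later scatter plot and map. One option, sketched here on hypothetical values, is to flag which rows were imputed before filling so they can be excluded from location-based plots. The `coords_imputed` column name is an assumption, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates with two rows missing at least one value
coords = pd.DataFrame({
    "reclat": [50.775, np.nan, -76.7, np.nan],
    "reclong": [6.083, np.nan, 159.5, 10.0],
})

# Flag rows where either coordinate is missing, *before* imputing
coords["coords_imputed"] = coords[["reclat", "reclong"]].isna().any(axis=1)

# Median imputation, assigned back to each column
for col in ["reclat", "reclong"]:
    coords[col] = coords[col].fillna(coords[col].median())

# Location-based analyses can then keep only the originally observed coordinates
observed = coords[~coords["coords_imputed"]]
print(observed)
```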

In [None]:
#Final Validation of missing values
data.isnull().sum()

# Number of Meteorites Recorded Each Year in the 20th and 21st Centuries

In [None]:
data_before_2000 = data[data['year'] < 2000]
data_after_2000 = data[data['year'] >= 2000]
# Group the data by year and count the meteorites recorded each year, before and after 2000
counts_before_2000 = data_before_2000.groupby('year')['fall'].count()
counts_after_2000 = data_after_2000.groupby('year')['fall'].count()

# Plot a line chart for each subset of the data
plt.plot(counts_before_2000.index, counts_before_2000.values, label='Before 2000')
plt.plot(counts_after_2000.index, counts_after_2000.values, label='After 2000')
plt.xlabel('Year')
plt.xlim(1970,2013)
plt.ylabel('Number of Meteorites')
plt.title('Number of Meteorites Recorded Each Year')
plt.legend()
plt.show()

The line chart shows that the highest number of meteorites recorded in a single year was 3323, in 2003. Counts were very low from 1970 to 1977 but rose sharply toward 1980. After 1985 the yearly count fluctuated, with an overall increase from 1985 to 2010. This suggests that more meteorites will be found in the near future.
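
Rather than reading the peak off the chart, it can be computed from the per-year counts with `Series.idxmax()` and `Series.max()`. A minimal sketch on hypothetical records (the years and counts are made up); in the notebook the same two calls would apply to the result of `data.groupby('year')['fall'].count()`:

```python
import pandas as pd

# Hypothetical records: one row per meteorite, with its year and fell/found status
records = pd.DataFrame({
    "year": [1998, 2001, 2003, 2003, 2003, 2006, 2006],
    "fall": ["Found", "Found", "Found", "Fell", "Found", "Found", "Found"],
})

# Number of records per year (same pattern as the notebook)
per_year = records.groupby("year")["fall"].count()

peak_year = per_year.idxmax()  # year with the most records (first one if tied)
peak_count = per_year.max()    # number of records in that year
print(peak_year, peak_count)
```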

# Total Number of Meteorites found before and after the 21st century

In [None]:
# Create two dataframes for records before 2000 and after 2000
df_before_2000 = data[data['year'] < 2000]
df_after_2000 = data[data['year'] >= 2000]
# Get the count of meteorites before and after 2000
count_before_2000 = len(df_before_2000)
count_after_2000 = len(df_after_2000)

# Create a bar chart showing the total count of meteorites before and after 2000
fig, ax = plt.subplots(figsize=(8, 6))
ax.bar(['Before 2000', 'After 2000'], [count_before_2000, count_after_2000], color=['skyblue', 'lightgreen'])
ax.set_title('Total Count of Meteorites Before and After 2000')
ax.set_ylabel('Count')

# Display count values inside each bar
for i, v in enumerate([count_before_2000, count_after_2000]):
    ax.text(i, v/2, str(v), color='black', fontweight='bold', ha='center', va='center')
plt.show()


The bar chart shows that 25704 meteorites were recorded before 2000, which is 23% more than were recorded after 2000. However, the post-2000 period spans far fewer years, so the average number of meteorites recorded per year is higher after 2000 (1972.1) than before 2000 (856.8). This suggests that more meteorites will be found in the near future.
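
Two different comparisons are involved here: the relative difference between the two totals, and the average per year, which also depends on how many years each period spans. A hypothetical helper, `compare_periods`, makes the arithmetic explicit; the counts and year spans in the example are illustrative, not taken from the dataset.

```python
def compare_periods(count_a: int, years_a: int, count_b: int, years_b: int) -> dict:
    """Compare two periods by total count and by average count per year."""
    return {
        # How much larger total A is than total B, using B as the base
        "pct_more_a_than_b": (count_a - count_b) / count_b * 100,
        "per_year_a": count_a / years_a,
        "per_year_b": count_b / years_b,
    }

# Illustrative numbers: period A has more records overall but spans more years
result = compare_periods(count_a=3000, years_a=30, count_b=2000, years_b=10)
print(result)
```

"X% more than B" uses B's total as the base; using A's total as the base gives a different percentage, so it is worth stating which base is meant.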

# Most prevalent classes of Meteorites before and after the 21st century

In [None]:
# Extract the first letter of each meteorite's class name
class_first_letter = data['recclass'].str[0]

# Count the number of occurrences of each first letter across all classes
class_first_letter_counts = class_first_letter.value_counts()

# Select the 5 most common first letters
top_classes = class_first_letter_counts.head(5)

# Create a pie chart for the top 5 class groups (percentages are relative to these 5 only)
fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(top_classes, labels=top_classes.index, autopct='%1.1f%%', startangle=90)
ax.set_title('Top 5 Meteorite Class Groups by First Letter')
plt.show()


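The pie chart above shows the five most common meteorite class groups, grouped by the first letter of the class name. Percentages are relative to these five groups only, not to all meteorites.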
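
Grouping by the first letter of `recclass` (used for this chart and, via prefixes, by `group_recclass` in the next cell) merges classes that merely share an initial letter. A small sketch on a few example class names shows the effect; the list of names is illustrative:

```python
import pandas as pd

# Example class names of the kind found in the 'recclass' column (illustrative list)
classes = pd.Series(["L6", "LL5", "H5", "Howardite", "CM2", "Eucrite", "EH4", "Iron, IIIAB"])

# First-letter grouping, equivalent to data['recclass'].str[0]
first_letter = classes.str[0]
print(first_letter.value_counts())
```

Here LL is counted with L, and the achondrite classes Howardite and Eucrite are counted with the H and E chondrite groups, so the first-letter groups are only a coarse approximation of the chemical groups.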

In [None]:
# Create a function to group similar recclass types
def group_recclass(recclass):
    if recclass.startswith('L'):
        return 'L'
    elif recclass.startswith('H'):
        return 'H'
    elif recclass.startswith('C'):
        return 'C'
    elif recclass.startswith('E'):
        return 'E'
    else:
        return 'Other'

# Create a new column 'recclass_group' by applying the group_recclass function to the 'recclass' column
df['recclass_group'] = data['recclass'].apply(group_recclass)

# Create two dataframes for records before 2000 and after 2000
df_before_2000 = df[df['year'] < 2000]
df_after_2000 = df[df['year'] >= 2000]

# Get the value counts of the recclass_group column for both dataframes
counts_before_2000 = df_before_2000['recclass_group'].value_counts()
counts_after_2000 = df_after_2000['recclass_group'].value_counts()

# Create two pie charts, one for before 2000 and one for after 2000
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

ax1.pie(counts_before_2000.values, labels=counts_before_2000.index, autopct='%1.1f%%')
ax1.set_title('Records Before 2000')

ax2.pie(counts_after_2000.values, labels=counts_after_2000.index, autopct='%1.1f%%')
ax2.set_title('Records After 2000')

plt.show()


Across both periods, more than 80% of recorded meteorites belong to the L and H groups. However, after 2000 the share of L-group meteorites increased by 11.6%, while the share of H-group meteorites decreased by 12.1%. This suggests that the composition of recorded meteorites may be changing over time, with a relative increase in L-group meteorites and a relative decrease in H-group meteorites.
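
The change in each group's share can be computed directly with `value_counts(normalize=True)` instead of being read off the two pie charts. A sketch on hypothetical group labels; subtracting the two shares gives the change in percentage points:

```python
import pandas as pd

# Hypothetical group labels for two periods
before = pd.Series(["L", "L", "H", "H", "H", "C", "Other", "L", "H", "E"])
after = pd.Series(["L", "L", "L", "H", "H", "L", "C", "L", "H", "Other"])

# Share of each group within its own period, in percent
share_before = before.value_counts(normalize=True) * 100
share_after = after.value_counts(normalize=True) * 100

# Change in share (percentage points); a group absent from one period counts as 0 there
change = share_after.sub(share_before, fill_value=0).sort_values()
print(change.round(1))
```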

# Frequency distribution of Meteorite masses

In [None]:

# Plot a histogram of meteorite masses
plt.hist(data['mass (g)'], bins=100, range=(0, 100000))
plt.xlabel('Meteorite Mass (grams)')
plt.ylabel('Frequency')
plt.xlim([0, 20000])
plt.ylim([0, 2000])
plt.title('Distribution of Meteorite Masses')
plt.show()

The histogram shows that most meteorites (around 40000) weigh between 0 and 1000 grams. We limited the axis ranges (up to 20 kg on the x-axis and 2000 meteorites per bin on the y-axis) to get a better view of the tail, so the first bin is cut off. Beyond about 7.5 kg the number of meteorites falls off sharply, and very heavy meteorites are rare. We conclude that most meteorites found are in the 0 to 1000 gram range.
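
Because meteorite masses span several orders of magnitude, a histogram with logarithmically spaced bins is one alternative to clipping the axes: the whole range stays visible without one bin dominating. A sketch using `np.logspace` and `np.histogram` on synthetic masses (the log-normal parameters are arbitrary assumptions, not fitted to the data). Log bins need strictly positive values, so zero masses would have to be dropped first.

```python
import numpy as np

# Synthetic, strictly positive masses in grams (arbitrary log-normal parameters)
rng = np.random.default_rng(0)
masses = rng.lognormal(mean=3.0, sigma=2.0, size=10_000)

# 30 bin edges evenly spaced in log10(mass), spanning the data's full range
edges = np.logspace(np.log10(masses.min()), np.log10(masses.max()), num=30)
edges[0], edges[-1] = masses.min(), masses.max()  # guard against rounding at the ends

counts, _ = np.histogram(masses, bins=edges)
print(len(counts), counts.sum())
```

To plot it, the same `edges` can be passed to `plt.hist(..., bins=edges)` followed by `plt.xscale('log')`.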

# Box Plots of Mass for the Top 5 Meteorite Groups (by Total Mass), with Mean Mass

In [None]:
filtered_data = data[['recclass', 'mass (g)']]

# Group the filtered data by the first letter of the class and calculate the total mass of each group
total_masses = filtered_data.groupby(filtered_data['recclass'].str[0])['mass (g)'].sum()

# Sort the data in descending order by total mass and take the top 5 groups
top_groups = total_masses.sort_values(ascending=False).head(5)

# Filter the data to only include the top 5 groups (.copy() avoids a SettingWithCopyWarning below)
filtered_data = filtered_data[filtered_data['recclass'].str[0].isin(top_groups.index)].copy()

# Convert mass from grams to kilograms
filtered_data['mass (kg)'] = filtered_data['mass (g)'] / 1000.0

# Calculate the interquartile range (IQR)
q1, q3 = np.percentile(filtered_data['mass (kg)'], [25, 75])
iqr = q3 - q1

# Set the threshold for outliers
threshold = 1.5 * iqr

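# Remove outliers: keep masses within 1.5 * IQR of the median
# (IQR computed over all five groups pooled; narrower than the usual Tukey fences)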
filtered_data = filtered_data.loc[abs(filtered_data['mass (kg)'] - np.median(filtered_data['mass (kg)'])) <= threshold]

# Plot the box plots, with each group's mean mass shown as a red dot
plt.figure(figsize=(10,6))
plt.title('Mass Distribution and Mean Mass of the Top 5 Meteorite Groups (by Total Mass)')
plt.xlabel('First Letter of Class')
plt.ylabel('Mass (kg)')
plt.boxplot([filtered_data[filtered_data['recclass'].str[0] == group]['mass (kg)'] for group in top_groups.index], showfliers=False)
plt.xticks(range(1, len(top_groups.index)+1), top_groups.index)
plt.plot(range(1, len(top_groups.index)+1), [np.mean(filtered_data[filtered_data['recclass'].str[0] == group]['mass (kg)']) for group in top_groups.index], 'ro')
plt.show()

The box plots show the mass distribution of the five class groups with the largest total mass, with each group's mean marked as a red dot. After outlier removal, the mean mass of each of the top five groups is below 0.1 kg, indicating that very small meteorite fragments were found in large numbers.
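
The outlier step above keeps values within 1.5 × IQR of the median, with the IQR computed over all five groups pooled. The more common Tukey rule keeps values between Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and since the median lies between Q1 and Q3 it is never narrower. A sketch comparing the two on a small made-up sample:

```python
import numpy as np

# Small made-up sample of masses in kg, including a few large values
x = np.array([0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.30, 0.70, 2.5, 12.0])

q1, med, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

# Rule used in the notebook: distance from the median within 1.5 * IQR
median_rule = x[np.abs(x - med) <= 1.5 * iqr]

# Tukey fences: keep values in [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
tukey_rule = x[(x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)]

print(len(median_rule), len(tukey_rule))
```

In this sample the median rule drops 0.70 kg while Tukey's fences keep it, so means computed after the median rule come from a more heavily trimmed sample.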

# Scatter Plot of the Number of Meteorite Classes Found vs. Latitude

In [None]:
# Group the data by latitude and count the number of unique classes
lat_counts = data.groupby('reclat')['recclass'].nunique()

# Create scatter plot
fig, ax = plt.subplots()
ax.scatter(lat_counts.index, lat_counts.values)

# set x-axis and y-axis limits
plt.xlim(-90,90)
plt.ylim(0, 30)

# set axis labels
plt.xlabel('Latitude')
plt.ylabel('Total Classes of Meteorites')

# Set title
ax.set_title('Scatter Plot of Latitude vs Total Classes of Meteorites')

# Show the plot
plt.show()


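The scatter plot shows the highest number of distinct classes near 30° latitude, and most meteorite falls occur between 30° north and 30° south. This pattern more likely reflects where meteorites are recovered than where they fall: meteorites are easiest to spot and best preserved in dry, sparsely vegetated regions such as hot deserts. The Earth's magnetic field does not explain it; the field is strongest, not weakest, near the poles, and it has little effect on whether a meteoroid survives atmospheric entry.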
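
Grouping by exact `reclat` values treats every distinct latitude (often recorded to several decimals) as its own point, so each point usually covers only a handful of records. Binning latitudes into bands, here 10° wide with `pd.cut`, gives a coarser but more comparable count of distinct classes per band. A sketch on hypothetical records; the band width is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Hypothetical records: latitude and class
records = pd.DataFrame({
    "reclat": [-76.7, -72.0, -71.5, 21.3, 25.9, 28.1, 50.8, np.nan],
    "recclass": ["H5", "L6", "LL5", "L6", "H5", "CM2", "L6", "H4"],
})

# 10-degree latitude bands from -90 to 90; a missing latitude gets no band
bands = pd.cut(records["reclat"], bins=np.arange(-90, 91, 10))

# Number of distinct classes in each band (bands with no records are omitted)
classes_per_band = records.groupby(bands, observed=True)["recclass"].nunique()
print(classes_per_band)
```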

# Map of Meteorites Found on the Globe

In [None]:

# Filter the dataset to include only landings between 1970 and 2015
meteorites = data[(data['year'] >= 1970) & (data['year'] <= 2015)]


# Create an interactive map of the meteorite landings
fig = px.scatter_geo(meteorites, lat='reclat', lon='reclong',
                     color='year', hover_name='name',
                     scope='world', projection='natural earth')

# Add a title and adjust marker size
fig.update_layout(title='Meteorite Landings on Earth (1970-2015)',
                  geo=dict(landcolor='white', coastlinecolor='grey'),
                  showlegend=True)
fig.update_traces(marker=dict(size=4))

# Note: layout shapes (fig.add_shape) are positioned in cartesian or paper
# coordinates, not latitude/longitude, so they cannot shade hemispheres on a geo
# map; the equator line below marks the division between the two halves instead.

# Add equator line (latitude 0 at every whole-degree longitude from -180 to 180)
eq_lon = list(range(-180, 181))
eq_lat = [0] * len(eq_lon)
fig.add_trace(go.Scattergeo(lat=eq_lat, lon=eq_lon, mode='lines',
                             line=dict(color='black', width=2), showlegend=False,
                             legendgroup='equator', name='Equator'))



# Calculate the percentage of meteorites above, below, and on the equator
meteorites_above_equator = meteorites[meteorites['reclat'] > 0]
meteorites_below_equator = meteorites[meteorites['reclat'] < 0]
meteorites_on_equator = meteorites[meteorites['reclat'] == 0]
percent_above = meteorites_above_equator.shape[0] / meteorites.shape[0] * 100
percent_below = meteorites_below_equator.shape[0] / meteorites.shape[0] * 100
percent_on_equator = meteorites_on_equator.shape[0] / meteorites.shape[0] * 100

# Print the results
print(f'Percentage of meteorites above equator: {percent_above:.2f}%')
print(f'Percentage of meteorites below equator: {percent_below:.2f}%')
print(f'Percentage of meteorites on equator: {percent_on_equator:.2f}%')

# Show the map
fig.show()


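The map suggests that more meteorites are found south of the equator, near the South Pole, than near the equator. This is most likely a recovery effect rather than a difference in how many meteorites reach the ground: Antarctic ice fields concentrate meteorites and make them easy to spot, and dedicated search expeditions have recovered large numbers there. Near the equator, vegetation and a warm, humid climate make meteorites harder to find and quicker to weather away.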
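
The hemisphere percentages printed above count every row, including imputed coordinates and any recorded as exactly (0, 0), which in many geographic datasets is a placeholder for an unknown location rather than a real find on the equator. Whether this dataset uses (0, 0) that way is an assumption to check. A hedged sketch with a hypothetical helper, `hemisphere_shares`, that optionally treats (0, 0) as unknown before computing the shares:

```python
import pandas as pd

def hemisphere_shares(lat: pd.Series, lon: pd.Series, drop_null_island: bool = True) -> dict:
    """Percentage of records north of, south of, and on the equator.

    If drop_null_island is True, rows at exactly (0, 0) are treated as unknown
    (an assumption about how missing locations were encoded) and excluded.
    """
    keep = lat.notna() & lon.notna()
    if drop_null_island:
        keep &= ~((lat == 0) & (lon == 0))
    lat = lat[keep]
    total = len(lat)
    return {
        "north": (lat > 0).sum() / total * 100,
        "south": (lat < 0).sum() / total * 100,
        "equator": (lat == 0).sum() / total * 100,
    }

# Hypothetical coordinates, two of them at the (0, 0) placeholder
lat = pd.Series([50.8, -76.7, -72.0, 0.0, 0.0, 21.3])
lon = pd.Series([6.1, 159.5, 26.0, 0.0, 0.0, 10.0])
print(hemisphere_shares(lat, lon))
print(hemisphere_shares(lat, lon, drop_null_island=False))
```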