# UFO Sightings

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### The placeholder cells should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` functions will be used throughout the program, such as the Pandas `DataFrame` as well as the `read_csv` function.

In [None]:
import pandas as pd

Uses pandas to read the CSV file at `csv_path` into a DataFrame named `ufo_df`, then displays the first five rows with `head()`.
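As a minimal sketch of what `read_csv` does, the snippet below reads a small made-up CSV from memory instead of from disk. The sample rows and the names `sample_csv` and `sample_df` are illustrative assumptions, not part of the assignment data.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real file; the values are illustrative only.
sample_csv = io.StringIO(
    "datetime,city,state\n"
    "1/1/2000 20:00,austin,tx\n"
    "1/2/2000 21:30,reno,nv\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
sample_df = pd.read_csv(sample_csv)

# head() shows up to the first five rows (here there are only two).
print(sample_df.head())
```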

In [None]:
csv_path = "Resources/ufoSightings.csv"

ufo_df = pd.read_csv(csv_path)

ufo_df.head()

Counts the non-null entries in each column. Columns with lower counts contain missing values, so this is a quick way to spot gaps before cleaning the data down to complete rows only.
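A small sketch of how `count()` exposes missing values, using a made-up DataFrame (`toy_df` is a hypothetical name, not part of the assignment):

```python
import pandas as pd

# Made-up data: the "city" column has one missing value.
toy_df = pd.DataFrame({
    "city": ["austin", "reno", None],
    "state": ["tx", "nv", "ca"],
})

# count() returns the number of non-null entries per column.
non_null_counts = toy_df.count()

# isna().sum() gives the complementary view: missing entries per column.
missing_counts = toy_df.isna().sum()

print(non_null_counts)
print(missing_counts)
```

Here `city` has a lower count than `state`, which is the kind of discrepancy to look for in the real data.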

In [None]:
ufo_df.count()

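Removes rows with missing values. With `how="any"`, `dropna` drops a row if any of its values is NA; with `how="all"`, it drops a row only if all of its values are NA. (Columns are dropped instead of rows only when `axis=1` is passed.)
Pros of "any": every remaining row is complete, so later calculations are not affected by gaps. Cons: data points are lost.
Pros of "all": retains as many data points as possible. Cons: the remaining rows may still be incomplete.
The second line counts the entries per column again to confirm that every column now has the same number of values.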
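The difference between the two settings can be sketched on a made-up three-row DataFrame (`toy_df`, `any_dropped`, and `all_dropped` are hypothetical names used only for illustration):

```python
import pandas as pd

# Made-up data: row 0 is complete, row 1 is partly missing, row 2 is entirely missing.
toy_df = pd.DataFrame({
    "city": ["austin", None, None],
    "shape": ["disk", "light", None],
})

# how="any": drop a row if at least one value is missing (keeps only row 0).
any_dropped = toy_df.dropna(how="any")

# how="all": drop a row only if every value is missing (drops only row 2).
all_dropped = toy_df.dropna(how="all")

print(len(any_dropped), len(all_dropped))
```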

In [None]:
clean_ufo_df = ufo_df.dropna(how="any")
clean_ufo_df.count()

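The list names the columns to keep. `.loc` then builds a new DataFrame from `clean_ufo_df`, keeping only the rows where `country` equals "us" and only the listed columns.
`head()` displays the first five rows of the result.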
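Selecting rows with a condition and columns with a list in a single `.loc` call can be sketched as follows. The data and the names `toy_df`, `is_us`, and `us_only` are made up for illustration.

```python
import pandas as pd

# Made-up data with one non-US row.
toy_df = pd.DataFrame({
    "city": ["austin", "toronto", "reno"],
    "country": ["us", "ca", "us"],
    "shape": ["disk", "light", "oval"],
})

# Boolean mask: True for rows where country is "us".
is_us = toy_df["country"] == "us"

# .loc[rows, columns]: the mask picks the rows, the list picks the columns.
us_only = toy_df.loc[is_us, ["city", "country"]]

print(us_only)
```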

In [None]:
columns = [
    "datetime",
    "city",
    "state",
    "country",
    "shape",
    "duration (seconds)",
    "duration (hours/min)",
    "comments",
    "date posted"
]

usa_ufo_df = clean_ufo_df.loc[clean_ufo_df["country"] == "us", columns]
usa_ufo_df.head()

Returns the number of sightings per state. `value_counts` counts how many times each distinct value appears in the column (`state`, in this case) and sorts the result from most to least frequent.
This is a quick way to see the most common entries in any column.
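A minimal sketch of `value_counts` on a made-up Series of state codes (the values and names are illustrative only):

```python
import pandas as pd

# Made-up state codes: "tx" appears three times, "nv" twice, "ca" once.
states = pd.Series(["tx", "nv", "tx", "ca", "tx", "nv"])

# value_counts returns a Series indexed by the distinct values,
# sorted from the most frequent to the least frequent.
counts = states.value_counts()

print(counts)
```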

In [None]:
state_counts = usa_ufo_df["state"].value_counts()
state_counts

Creates a new DataFrame from `state_counts`, the Series computed in the previous cell.
`.head()` displays only the first five rows.

In [None]:
state_ufo_counts_df = pd.DataFrame(state_counts)
state_ufo_counts_df.head()

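Renames the `state` column to `Sum of Sightings`, which better describes the data being presented. Note that in pandas 2.0 and later, `value_counts` names its result `count` rather than `state`, so on those versions the key in the `columns` dictionary would need to be `"count"` for the rename to take effect.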
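Because the column name produced by `value_counts` depends on the pandas version, one possible alternative is to name the Series before converting it to a DataFrame. This is a sketch on made-up data, not the assignment's own approach; `states` and `counts_df` are hypothetical names.

```python
import pandas as pd

# Made-up state codes for illustration.
states = pd.Series(["tx", "nv", "tx"], name="state")

# Series.rename with a string sets the Series name;
# to_frame() then uses that name as the column label.
counts_df = states.value_counts().rename("Sum of Sightings").to_frame()

print(counts_df)
```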

In [None]:
state_ufo_counts_df = state_ufo_counts_df.rename(
    columns={"state": "Sum of Sightings"})
state_ufo_counts_df.head()

Calling `dtypes` shows whether each column is stored as an object, integer, float, etc. Knowing the types lets us write the following code with them in mind.

In [None]:
usa_ufo_df.dtypes

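Converts `duration (seconds)` to float64 so that the values are treated as numbers. Summing only gives a meaningful total when every entry in the column is numeric. The second line shows `dtypes` again to check the result.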
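The conversion can be sketched on a made-up Series of numeric strings. The second half shows `pd.to_numeric` with `errors="coerce"` as one option when some entries cannot be parsed; whether the real column contains such entries is an assumption, and all names and values here are illustrative.

```python
import pandas as pd

# Made-up durations stored as text.
durations = pd.Series(["10", "20.5", "3"])

# astype("float") parses each string into a float64 value.
as_float = durations.astype("float")
total = as_float.sum()

# If an entry cannot be parsed, astype raises an error.
# pd.to_numeric with errors="coerce" replaces such entries with NaN instead.
messy = pd.Series(["10", "unknown", "3"])
coerced = pd.to_numeric(messy, errors="coerce")

print(as_float.dtype, total)
print(coerced)
```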

In [None]:
usa_ufo_df.loc[:, "duration (seconds)"] = usa_ufo_df["duration (seconds)"].astype("float")
usa_ufo_df.dtypes

With the column converted to floats, `sum()` adds up the duration of every US sighting, in seconds.

In [None]:
usa_ufo_df["duration (seconds)"].sum()

Groups the rows by state and then by city. Counting the `datetime` values in each group gives the number of sightings for every city, organized by state.
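Grouping by two columns and counting can be sketched on a made-up DataFrame (the rows and the names `toy_df` and `counts` are illustrative only):

```python
import pandas as pd

# Made-up sightings: two in Austin, one in Dallas, one in Reno.
toy_df = pd.DataFrame({
    "state": ["tx", "tx", "tx", "nv"],
    "city": ["austin", "austin", "dallas", "reno"],
    "datetime": ["1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000"],
})

# Group by state, then city, and count non-null datetime values per group.
# The result is a Series indexed by (state, city) pairs.
counts = toy_df.groupby(["state", "city"])["datetime"].count()

print(counts)
```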

In [None]:
grouped_data = usa_ufo_df.groupby(['state', 'city'])

grouped_data['datetime'].count()