# UFO Sightings

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### Each placeholder cell should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` functions will be used throughout the program, such as the Pandas `DataFrame` as well as the `read_csv` function.

In [1]:
import pandas as pd

_[This stores the relative path to the CSV file in csv_path, reads the file into a DataFrame with pd.read_csv, and displays the first five rows with head().]_

In [2]:
csv_path = "Resources/ufoSightings.csv"

ufo_df = pd.read_csv(csv_path)

ufo_df.head()



Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,45 minutes,This event took place in early fall around 194...,4/27/2004,29.8830556,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082
2,10/10/1955 17:00,chester (uk/england),,gb,circle,20,20 seconds,Green/Orange circular disc over Chester&#44 En...,1/21/2008,53.2,-2.916667
3,10/10/1956 21:00,edna,tx,us,circle,20,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004,28.9783333,-96.645833
4,10/10/1960 20:00,kaneohe,hi,us,light,900,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004,21.4180556,-157.803611
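
The reading step above can be tried without the real file. This is a minimal sketch that assumes a hypothetical two-row sample held in memory (`sample_csv` is not the assignment's data); it shows that an empty field in the CSV becomes a missing value (NaN) in the DataFrame.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same layout as the file; not the real data.
sample_csv = io.StringIO(
    "datetime,city,state,country,shape,duration (seconds)\n"
    "10/10/1949 20:30,san marcos,tx,us,cylinder,2700\n"
    "10/10/1955 17:00,chester (uk/england),,gb,circle,20\n"
)

sample_df = pd.read_csv(sample_csv)  # parse the CSV text into a DataFrame
first_row = sample_df.head(1)        # head(n) returns the first n rows (default 5)
```

The second row has no state, so `sample_df.loc[1, "state"]` is NaN.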


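_[This counts the non-null values in each column of ufo_df; it does not count rows. Columns with fewer than 80332 values, such as state, country, shape, and comments, contain missing data, so the output shows both the size of the data set and where values are missing.]_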

In [3]:
ufo_df.count()

datetime                80332
city                    80332
state                   74535
country                 70662
shape                   78400
duration (seconds)      80332
duration (hours/min)    80332
comments                80317
date posted             80332
latitude                80332
longitude               80332
dtype: int64
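
The difference between counting non-null values and counting rows can be sketched on a small example. The frame below is a hypothetical toy (`toy_df` is an assumed name, not part of the assignment).

```python
import pandas as pd

# Hypothetical toy frame; None marks a missing value.
toy_df = pd.DataFrame({
    "city": ["edna", "kaneohe", "bristol"],
    "state": ["tx", None, "tn"],
})

non_null_per_column = toy_df.count()  # one count per column, skipping missing values
total_rows = len(toy_df)              # number of rows, missing values included
```

Here `count()` reports 3 for city and 2 for state, while `len()` reports 3 rows.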

_[dropna removes rows that contain NaN (missing) values; it does not replace them. With how="any", a row is dropped if even one of its values is NaN. This makes later calculations simpler, but the other values in each dropped row are lost too: the count falls from 80332 to 66516. With how="all", a row is dropped only if every value in it is NaN, so no potentially useful information is lost.]_

In [4]:
clean_ufo_df = ufo_df.dropna(how="any")
clean_ufo_df.count()

datetime                66516
city                    66516
state                   66516
country                 66516
shape                   66516
duration (seconds)      66516
duration (hours/min)    66516
comments                66516
date posted             66516
latitude                66516
longitude               66516
dtype: int64
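
The two `how` options can be compared side by side. This is a minimal sketch on a hypothetical three-row frame (`toy_df` is an assumed name), not the assignment's data.

```python
import pandas as pd

# Hypothetical toy frame: row 0 is complete, row 1 is partly missing,
# row 2 is entirely missing.
toy_df = pd.DataFrame({
    "city": ["edna", "kaneohe", None],
    "state": ["tx", None, None],
})

dropped_any = toy_df.dropna(how="any")  # keeps only rows with no missing values
dropped_all = toy_df.dropna(how="all")  # drops only rows where every value is missing
```

`dropped_any` keeps one row and `dropped_all` keeps two. The original `toy_df` is unchanged, because `dropna` returns a new DataFrame.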

_[Below we create a list of the column names to keep and pass it as the second argument to loc. The first argument, clean_ufo_df["country"] == "us", keeps only the rows where country is us. The result contains only US sightings and only the listed columns, so latitude and longitude are left out.]_

In [5]:
columns = [
    "datetime",
    "city",
    "state",
    "country",
    "shape",
    "duration (seconds)",
    "duration (hours/min)",
    "comments",
    "date posted"
]

usa_ufo_df = clean_ufo_df.loc[clean_ufo_df["country"] == "us", columns]
usa_ufo_df.head()

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,45 minutes,This event took place in early fall around 194...,4/27/2004
3,10/10/1956 21:00,edna,tx,us,circle,20,1/2 hour,My older brother and twin sister were leaving ...,1/17/2004
4,10/10/1960 20:00,kaneohe,hi,us,light,900,15 minutes,AS a Marine 1st Lt. flying an FJ4B fighter/att...,1/22/2004
5,10/10/1961 19:00,bristol,tn,us,sphere,300,5 minutes,My father is now 89 my brother 52 the girl wit...,4/27/2007
7,10/10/1965 23:45,norwalk,ct,us,disk,1200,20 minutes,A bright orange color changing to reddish colo...,10/2/1999
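
The row filter and the column selection in `loc` can be looked at separately. This is a minimal sketch on a hypothetical toy frame; the names `toy_df`, `keep_columns`, and `is_us` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frame with one non-US row and one column we do not want to keep.
toy_df = pd.DataFrame({
    "city": ["san marcos", "chester (uk/england)", "edna"],
    "country": ["us", "gb", "us"],
    "latitude": [29.88, 53.2, 28.98],
})

keep_columns = ["city", "country"]
is_us = toy_df["country"] == "us"          # boolean Series, one True/False per row
us_only = toy_df.loc[is_us, keep_columns]  # rows where True, only the listed columns
```

The result keeps the original index labels (0 and 2), which is why the index in the output above skips numbers.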


_[Below we count how many sightings each state has with value_counts, which returns a Series indexed by state and sorted from most to fewest sightings. This makes it easy to compare states.]_

In [7]:
state_counts = usa_ufo_df["state"].value_counts()
state_counts

ca    8683
fl    3754
wa    3707
tx    3398
ny    2915
il    2447
az    2362
pa    2319
oh    2251
mi    1781
nc    1722
or    1667
mo    1431
co    1385
in    1268
va    1248
ma    1238
nj    1236
ga    1235
wi    1205
tn    1091
mn     996
sc     986
ct     865
ky     843
md     818
nv     778
ok     714
nm     693
ia     669
al     629
ut     611
ks     599
ar     578
la     547
me     544
id     508
nh     482
mt     460
wv     438
ne     373
ms     368
ak     311
hi     257
vt     254
ri     224
sd     177
wy     169
de     165
nd     123
pr      24
dc       7
Name: state, dtype: int64
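
The behavior of `value_counts` can be checked on a short Series. This is a minimal sketch with hypothetical values; the `normalize=True` variant is an extra that the notebook does not use.

```python
import pandas as pd

# Hypothetical state labels; not the real data.
states = pd.Series(["tx", "ca", "tx", "hi", "ca", "tx"], name="state")

counts = states.value_counts()                # occurrences per value, largest first
shares = states.value_counts(normalize=True)  # same, as fractions of the total
```

Here tx comes first with a count of 3, which is half of the six entries.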

_[Below we convert the Series of state counts from the previous step into a DataFrame, so that DataFrame methods such as rename can be applied to it.]_

In [8]:
state_ufo_counts_df = pd.DataFrame(state_counts)
state_ufo_counts_df.head()

Unnamed: 0,state
ca,8683
fl,3754
wa,3707
tx,3398
ny,2915


_[Below we rename the state column to Sum of Sightings so that it is more descriptive for readers of the data, who will probably not be the coder.]_

In [9]:
state_ufo_counts_df = state_ufo_counts_df.rename(
    columns={"state": "Sum of Sightings"})
state_ufo_counts_df.head()

Unnamed: 0,Sum of Sightings
ca,8683
fl,3754
wa,3707
tx,3398
ny,2915
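
The conversion and the rename can also be done in one step. This is a sketch of an alternative, not the notebook's method: in pandas 2.0 and later the Series returned by `value_counts` is named `count` rather than `state`, so a rename keyed on `"state"` would leave the column unchanged there, while `to_frame(name=...)` sets the label directly.

```python
import pandas as pd

# Hypothetical state labels; not the real data.
states = pd.Series(["tx", "ca", "tx"], name="state")
state_counts = states.value_counts()

# to_frame(name=...) sets the column label directly, whatever the Series was called.
counts_df = state_counts.to_frame(name="Sum of Sightings")
```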


_[Below we display the dtypes, which show the data type of each column. This tells us which methods can be applied to each column; every column here is object, which usually means text.]_

In [10]:
usa_ufo_df.dtypes

datetime                object
city                    object
state                   object
country                 object
shape                   object
duration (seconds)      object
duration (hours/min)    object
comments                object
date posted             object
dtype: object

_[Since duration (seconds) is an object column, we cannot do arithmetic on it. Below we convert it to float so that we can sum it in the following step.]_

In [11]:
usa_ufo_df.loc[:, "duration (seconds)"] = usa_ufo_df["duration (seconds)"].astype("float")
usa_ufo_df.dtypes

datetime                 object
city                     object
state                    object
country                  object
shape                    object
duration (seconds)      float64
duration (hours/min)     object
comments                 object
date posted              object
dtype: object
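
The type conversion can be sketched on a toy column. This example assumes hypothetical values and assigns the converted Series back by column name rather than through `.loc[:, ...]`, because in newer pandas versions assignment through `.loc[:, ...]` may keep the existing object dtype. The `pd.to_numeric` call is an optional extra for text that cannot be parsed as a number.

```python
import pandas as pd

# Hypothetical durations stored as text, as they would be in an object column.
toy_df = pd.DataFrame({"duration (seconds)": ["2700", "20", "900.5"]})

# Assigning the converted Series back to the column name replaces the column,
# so the new dtype sticks.
toy_df["duration (seconds)"] = toy_df["duration (seconds)"].astype("float")
total_seconds = toy_df["duration (seconds)"].sum()

# pd.to_numeric with errors="coerce" turns unparseable text into NaN instead of raising.
messy = pd.Series(["2700", "about 5 min", "20"])
coerced = pd.to_numeric(messy, errors="coerce")
```

After the conversion the column is float64 and `sum()` works; in `coerced`, the entry that is not a number becomes NaN and is skipped by `sum()`.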

_[Now that the column is a float, we sum it to get the total duration of all US sightings in seconds.]_

In [13]:
usa_ufo_df["duration (seconds)"].sum()

351281285.38

_[We group the data by state and then by city, so each state and city pair forms its own group. Counting the datetime values in each group gives the number of sightings in each city.]_

In [12]:
grouped_data = usa_ufo_df.groupby(['state', 'city'])

grouped_data['datetime'].count()

state  city                                                     
ak     adak                                                          1
       anchor point                                                  1
       anchorage                                                    82
       angoon                                                        1
       auke bay                                                      2
       bethel                                                        8
       big lake                                                      1
       butte                                                         1
       chugiak                                                       2
       clam gulch                                                    1
       cold bay                                                      1
       cordova                                                       2
       council                                                       1
       craig