# M1L7 Data Challenge: Data Manipulation

 We'll continue to work with UFO sighting data.

### **Dataset:** [UFO Sightings](https://www.kaggle.com/datasets/jonwright13/ufo-sightings-around-the-world-better?resource=download) (also available in your data folder)

### **Objectives:**

- Use string methods to manipulate data 
- Filter data
- Work more with dates in Python



**Let's get started!**

### Step 1:  Import Pandas & Numpy

In [2]:
# Import pandas, NumPy, and the datetime module
import pandas as pd
import numpy as np
import datetime as dt

### Step 2: Load the dataset (CSV file stored in the data folder) into a Pandas DataFrame called `ufo`

- The file is called `ufo-sightings-transformed.csv`


In [3]:
ufo = pd.read_csv("ufo-sightings-transformed.csv")
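If a CSV was written together with its row index, pandas reads that index back as an ordinary column named `Unnamed: 0`. A minimal sketch, assuming that is how this file was saved: the two-row CSV below is made up for illustration, and passing `index_col=0` reuses the first column as the index instead of keeping it as data.

```python
import io

import pandas as pd

# Made-up two-row CSV saved with its row index, so the first header cell is empty
csv_text = """,Date_time,Region
0,1970-10-10 16:00:00,New York
1,1978-10-10 02:00:00,New York
"""

# Default behaviour: the saved index becomes an extra "Unnamed: 0" column
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'Date_time', 'Region']

# index_col=0 turns that first column into the DataFrame index
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())  # ['Date_time', 'Region']
```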


### Step 3: Explore the Data

Use any method(s) of your choice to look at the data and explore it 


In [4]:
ufo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80328 entries, 0 to 80327
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Unnamed: 0                   80328 non-null  int64  
 1   Date_time                    80328 non-null  object 
 2   date_documented              80328 non-null  object 
 3   Year                         80328 non-null  int64  
 4   Month                        80328 non-null  int64  
 5   Hour                         80328 non-null  int64  
 6   Season                       80328 non-null  object 
 7   Country_Code                 80069 non-null  object 
 8   Country                      80069 non-null  object 
 9   Region                       79762 non-null  object 
 10  Locale                       79871 non-null  object 
 11  latitude                     80328 non-null  float64
 12  longitude                    80328 non-null  float64
 13  UFO_shape       
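The `info()` output shows that some location columns (`Country_Code`, `Country`, `Region`, `Locale`) have fewer non-null entries than the 80,328 rows. One way to count the gaps directly is `isna().sum()`; the small DataFrame below is a made-up sample, not the real data.

```python
import numpy as np
import pandas as pd

# Made-up sample with a few missing location values
sample = pd.DataFrame({
    "Country": ["United States", np.nan, "Canada"],
    "Region": ["New York", np.nan, np.nan],
    "UFO_shape": ["light", "circle", "disk"],
})

# isna() marks each missing cell as True; sum() counts them per column
missing = sample.isna().sum()
print(missing)
```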

### Step 4:  Clean the UFO_shape column 
- Make the column all uppercase 
- Strip off any leading and trailing spaces 

Even if no spaces are visible, it is still good practice to trim them, because stray whitespace is easy to miss with the naked eye.

Hint: You will use both `str.upper()` and `str.strip()`; you can chain them in one step or apply them in two separate steps.

In [5]:
ufo['UFO_shape'] = ufo['UFO_shape'].str.upper().str.strip()
print(ufo['UFO_shape'])

0        CYLINDER
1           LIGHT
2          CIRCLE
3          CIRCLE
4           LIGHT
           ...   
80323       LIGHT
80324      CIRCLE
80325       OTHER
80326      CIRCLE
80327       CIGAR
Name: UFO_shape, Length: 80328, dtype: object
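Cleaning matters because labels that differ only in case or padding are counted as separate categories. A small sketch with made-up labels: after `str.upper().str.strip()`, the three "light" variants collapse into one, and the missing value stays missing (the `.str` methods pass `NaN` through unchanged).

```python
import numpy as np
import pandas as pd

# Made-up shape labels with inconsistent case, stray spaces, and one missing value
shapes = pd.Series(["light", "LIGHT ", " Light", np.nan, "disk"])

cleaned = shapes.str.upper().str.strip()

# value_counts() skips missing values by default
counts = cleaned.value_counts()
print(counts["LIGHT"])  # 3
print(counts["DISK"])   # 1
```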


### Step 5: Use `pd.crosstab` to count the number of sightings of each shape by season

- Add a comment with a main takeaway from the output

In [6]:
pd.crosstab(ufo['UFO_shape'], ufo['Season'])
# Takeaway: among the shapes shown, summer has the most sightings for the most common shapes (CIRCLE, CIGAR, CYLINDER), although some shapes such as CHEVRON and CONE peak in autumn.

| UFO_shape | Autumn | Spring | Summer | Winter |
|---|---|---|---|---|
| CHANGED | 0 | 0 | 1 | 0 |
| CHANGING | 552 | 443 | 565 | 402 |
| CHEVRON | 333 | 215 | 225 | 179 |
| CIGAR | 519 | 421 | 783 | 334 |
| CIRCLE | 2002 | 1464 | 2604 | 1537 |
| CONE | 85 | 70 | 84 | 77 |
| CRESCENT | 1 | 1 | 0 | 0 |
| CROSS | 63 | 53 | 70 | 47 |
| CYLINDER | 332 | 272 | 449 | 230 |
| DELTA | 3 | 1 | 2 | 1 |
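To back up a takeaway like "summer has the most sightings" with totals rather than eyeballing rows, `pd.crosstab` can append row and column totals via `margins=True`. A sketch on a made-up six-row sample; the `margins_name="Total"` label is an arbitrary choice.

```python
import pandas as pd

# Made-up sample of shapes and seasons
sample = pd.DataFrame({
    "UFO_shape": ["LIGHT", "LIGHT", "CIRCLE", "DISK", "CIRCLE", "LIGHT"],
    "Season": ["Summer", "Winter", "Summer", "Summer", "Autumn", "Summer"],
})

# margins=True adds a "Total" row and column with the counts summed
table = pd.crosstab(sample["UFO_shape"], sample["Season"],
                    margins=True, margins_name="Total")
print(table)

# The "Total" row holds per-season counts across all shapes
season_totals = table.loc["Total"].drop("Total")
print(season_totals.idxmax())  # Summer
```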


In [7]:
# Run this cell without changes before moving on to step 6!

ufo['Date_time'] = pd.to_datetime(ufo['Date_time'], format="%Y-%m-%d %H:%M:%S")
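The cell above parses `Date_time` with an explicit format, which raises an error if any value does not match. As a sketch of an alternative (not what the cell uses), `errors="coerce"` turns unparseable values into `NaT`, and the `.dt` accessor then exposes parts of each timestamp. The three strings below are made up.

```python
import pandas as pd

# Made-up raw values, one of which is not a valid date
raw = pd.Series(["1970-10-10 16:00:00", "2014-05-08 18:45:00", "not a date"])

# errors="coerce" replaces values that do not match the format with NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")
print(parsed.isna().tolist())  # [False, False, True]

# The .dt accessor pulls out components such as month or hour
print(parsed.dt.month)
```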

### Step 6:  Filter the data where the region is equal to `New York`

In [12]:
nyufo = ufo[ufo['Region'] == 'New York']
nyufo.head()

|  | Unnamed: 0 | Date_time | date_documented | Year | Month | Hour | Season | Country_Code | Country | Region | Locale | latitude | longitude | UFO_shape | length_of_encounter_seconds | Encounter_Duration | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 12 | 1970-10-10 16:00:00 | 5/11/2000 | 1970 | 10 | 16 | Autumn | USA | United States | New York | Nassau County | 40.668611 | -73.5275 | DISK | 1800.0 | 30 min. | silver disc seen by family and neighbors |
| 27 | 27 | 1978-10-10 02:00:00 | 2/1/2007 | 1978 | 10 | 2 | Autumn | USA | United States | New York | Alden Manor | 40.700833 | -73.713333 | RECTANGLE | 300.0 | 5min | A memory I will never forget that happened men... |
| 28 | 28 | 1979-10-10 00:00:00 | 4/16/2005 | 1979 | 10 | 0 | Autumn | USA | United States | New York | Poughkeepsie | 41.700278 | -73.921389 | CHEVRON | 900.0 | 15 minutes | 1/4 moon-like&#44 its &#39chord&#39 or flat s... |
| 38 | 38 | 1984-10-10 22:00:00 | 8/10/1999 | 1984 | 10 | 22 | Autumn | USA | United States | New York | White Plains | 41.033889 | -73.763333 | FORMATION | 20.0 | 15-20 seconds | Saw a hugh object in sky with lights intermitt... |
| 40 | 40 | 1986-10-10 20:00:00 | 10/8/2007 | 1986 | 10 | 20 | Autumn | USA | United States | New York | Holmes | 41.523427 | -73.646795 | CHEVRON | 180.0 | 3 minutes | Football Field Sized Chevron with bright white... |
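Filtering with a boolean mask keeps only the rows where the condition is `True`. A sketch on made-up data; the `.copy()` call is an optional habit that avoids pandas' `SettingWithCopyWarning` if the subset is modified later, and `isin()` is shown as a way to match several regions at once.

```python
import pandas as pd

# Made-up sample of regions and shapes
sample = pd.DataFrame({
    "Region": ["New York", "California", "New York", "Texas"],
    "UFO_shape": ["DISK", "LIGHT", "UNKNOWN", "CIRCLE"],
})

# Boolean mask: True where Region equals "New York"
mask = sample["Region"] == "New York"
ny = sample[mask].copy()  # independent copy, safe to modify
print(len(ny))  # 2

# isin() keeps rows matching any value in the list
ny_or_tx = sample[sample["Region"].isin(["New York", "Texas"])]
print(len(ny_or_tx))  # 3
```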


### Step 7: Get the most recent `Date_time` that a UFO was sighted in New York

Hint: Make sure you saved your filtered data from Step 6 to a new DataFrame variable. You can call `.max()` right after a column name to get the maximum value of that column.

You are using the `Date_time` column for this question

In [10]:
# Use the New York subset from Step 6, not the full ufo DataFrame
recent = nyufo['Date_time'].max()
print(recent)
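`.max()` returns only the latest timestamp. If you also want the rest of that sighting's row, one option is `idxmax()`, which returns the index label of the maximum, combined with `.loc`. A sketch on a made-up three-row sample (the locale names are placeholders).

```python
import pandas as pd

# Made-up sample; Date_time is already a datetime column
sample = pd.DataFrame({
    "Date_time": pd.to_datetime(["1970-10-10 16:00", "2011-10-01 00:00", "1999-10-01 17:00"]),
    "Locale": ["Albany", "Buffalo", "Syracuse"],
})

latest = sample["Date_time"].max()
# idxmax() gives the index label of the latest timestamp; .loc fetches that row
latest_row = sample.loc[sample["Date_time"].idxmax()]

print(latest)                # 2011-10-01 00:00:00
print(latest_row["Locale"])  # Buffalo
```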


## Above and Beyond (AAB): Optional

### Question 1:  How many days have passed between the first UFO sighting in NY and the most recent sighting in NY based on this data?

In [19]:
print(nyufo['Date_time'].max() - nyufo['Date_time'].min())


30654 days 01:04:00
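Subtracting two timestamps produces a `Timedelta`, which is why the answer prints as days plus hours and minutes. Its `.days` attribute gives whole days, and dividing by `pd.Timedelta(days=1)` gives fractional days. A sketch with two made-up timestamps:

```python
import pandas as pd

# Two made-up timestamps 30 days and 6 hours apart
first = pd.Timestamp("2000-01-01 00:00:00")
last = pd.Timestamp("2000-01-31 06:00:00")

gap = last - first                  # a Timedelta
print(gap)                          # 30 days 06:00:00
print(gap.days)                     # 30 (whole days only)
print(gap / pd.Timedelta(days=1))   # 30.25 (fractional days)
```

Applied to the New York data, `.days` on the `30654 days 01:04:00` result above would give 30654.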


### Question 2:  Filter the data where UFO_shape is `UNKNOWN` and the Region is `New York` 

In [20]:
unknownufo = ufo[(ufo['UFO_shape'] == 'UNKNOWN') & (ufo['Region'] == 'New York')]
unknownufo.head()


|  | Unnamed: 0 | Date_time | date_documented | Year | Month | Hour | Season | Country_Code | Country | Region | Locale | latitude | longitude | UFO_shape | length_of_encounter_seconds | Encounter_Duration | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 661 | 661 | 1999-10-01 17:00:00 | 2/18/2001 | 1999 | 10 | 17 | Autumn | USA | United States | New York | New York | 40.714167 | -74.006389 | UNKNOWN | 5.0 | 5 seconds | I witnessed a being in the middle of the day i... |
| 816 | 816 | 2006-10-01 21:00:00 | 10/30/2006 | 2006 | 10 | 21 | Autumn | USA | United States | New York | Village of Orchard Park | 42.7675 | -78.744167 | UNKNOWN | 37800.0 | approx. 1 1/2 hours | Spotted again as before........ |
| 923 | 923 | 2011-10-01 00:00:00 | 12/12/2011 | 2011 | 10 | 0 | Autumn | USA | United States | New York | New York | 40.579532 | -74.150201 | UNKNOWN | 600.0 | 10 minutes | Huge bright fireball descends over Staten Island. |
| 1059 | 1059 | 2003-10-12 02:00:00 | 11/26/2003 | 2003 | 10 | 2 | Autumn | USA | United States | New York | Lark Street | 42.6525 | -73.756667 | UNKNOWN | 30.0 | 30 seconds | object emmited bright light then sped off in a... |
| 1495 | 1495 | 1997-10-14 16:00:00 | 8/5/2001 | 1997 | 10 | 16 | Autumn | USA | United States | New York | Syracuse | 43.048056 | -76.147778 | UNKNOWN | 30.0 | 30 sec. max | 4 Military planes fly past flying rod&#44 and... |
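When combining conditions with `&`, each comparison needs its own parentheses because `&` binds more tightly than `==` in Python. `DataFrame.query` is an alternative that expresses the same filter as a string. A sketch on made-up data showing both forms select the same rows:

```python
import pandas as pd

# Made-up sample of regions and shapes
sample = pd.DataFrame({
    "Region": ["New York", "New York", "Texas"],
    "UFO_shape": ["UNKNOWN", "DISK", "UNKNOWN"],
})

# Parenthesize each comparison before combining with &
by_mask = sample[(sample["UFO_shape"] == "UNKNOWN") & (sample["Region"] == "New York")]

# query() evaluates the same condition written as a string
by_query = sample.query("UFO_shape == 'UNKNOWN' and Region == 'New York'")

print(by_mask.equals(by_query))  # True
print(len(by_mask))              # 1
```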
