In [37]:
import pandas as pd
import persist_ext as PR
import altair as alt

In [7]:
df = pd.read_csv("ufo.csv") 
df.head()

Unnamed: 0,city,state,date__,time__,shape,duration -- seconds,comments,date posted,latitude,longitude
0,boise,id,1991-03-10,03:00:00,disk,600.0,Saucer Shaped Craft Followed Me,5/29/2012,43.613611,-116.2025
1,pasadena,tx,1991-11-15,19:30:00,unknown,600.0,UFO sighting in Pasadena TX in November 1991.,3/19/2009,29.690833,-95.208889
2,commerce,tx,1991-08-01,02:00:00,circle,120.0,spoked wheel flies in front of the moon,12/23/2002,33.246944,-95.899722
3,levittown,ny,1991-06-10,23:35:00,circle,300.0,My Mother her boyfriend and myself seen a spac...,3/19/2009,40.725833,-73.514722
4,grants (southeast of),nm,1991-05-05,20:30:00,light,120.0,bright light travelling normal speed stops abo...,8/28/2003,35.147222,-107.850833


## Interactive Table

### Fixing column names

The column names in the dataframe have a few issues we need to fix:

- Some column names have a trailing `__` (`date__`, `time__`).
- One column is titled `duration -- seconds`; we should rename it to `duration_seconds`.
- Finally, there is a column called `shape`. `shape` is also an attribute of pandas dataframes, so `df.shape` returns the dataframe's dimensions rather than the column. To avoid this conflict, we should rename the column to `ufo_shape`.
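
In this notebook the renaming is done through the interactive table. For comparison, the three fixes listed above can be sketched in plain Python. The helper `clean_column_name` and the `rename_map` variable are hypothetical names introduced for illustration; the column list is copied from the `df.head()` output.

```python
import re


def clean_column_name(name: str) -> str:
    """Apply the three column-name fixes described in the list above."""
    cleaned = name.rstrip("_")                    # "date__" -> "date"
    cleaned = re.sub(r"\s*--\s*", "_", cleaned)   # "duration -- seconds" -> "duration_seconds"
    if cleaned == "shape":                        # avoid clashing with DataFrame.shape
        cleaned = "ufo_shape"
    return cleaned


# Column names as they appear in the df.head() output.
columns = [
    "city", "state", "date__", "time__", "shape",
    "duration -- seconds", "comments", "date posted",
    "latitude", "longitude",
]

# Keep only the names that actually change.
rename_map = {
    old: clean_column_name(old)
    for old in columns
    if clean_column_name(old) != old
}
```

With pandas, such a mapping could be applied as `df.rename(columns=rename_map)`, which returns a new dataframe and leaves `df` untouched.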

### Changing data types

The `ufo_shape` column (previously `shape`) has the dtype `string`. Since it holds a small set of repeated values, we should change its dtype to `category`.
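
Outside the interactive table, the same dtype change might look like the sketch below. The `toy` dataframe is a stand-in built from the five shape values visible in the `df.head()` output, not the full dataset.

```python
import pandas as pd

# Toy stand-in for the cleaned dataframe (shape values from the rows shown earlier).
toy = pd.DataFrame({"ufo_shape": ["disk", "unknown", "circle", "circle", "light"]})

# Convert the text column to a categorical column.
toy["ufo_shape"] = toy["ufo_shape"].astype("category")

# The distinct categories pandas inferred from the values.
shape_categories = list(toy["ufo_shape"].cat.categories)
```

A categorical column stores each distinct value once and refers to it by an integer code, which is usually more compact for columns with few unique values.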

In [8]:
PR.PersistTable(df)

PersistWidget(data_values=[{'__id_column': '1', 'city': 'boise', 'state': 'id', 'date__': '1991-03-10', 'time_…

In [29]:
cleaned_df.shape

(4649, 11)

## Interactive Bar Chart

The bar chart below shows the count of UFO sightings grouped by year. There are very few data points for `1998` and earlier, and `2014` also seems to have fewer records. We can remove these sparsely populated years.

In [26]:
PR.plot.barchart(cleaned_df, x="utcyear(date):O", y="count()")

PersistWidget(data_values=[{'__id_column': '1', 'city': 'boise', 'state': 'id', 'date': 668563200000, 'time': …

In [30]:
df_filtered.head()

Unnamed: 0,city,state,date,time,ufo_shape,duration_seconds,comments,date posted,latitude,longitude,__annotations
0,mecosta,mi,1999-06-12,22:30:00,sphere,240.0,object viewed due North. Low on horizon. Hov...,8/10/1999,43.620278,-85.226389,No Annotation
1,boise,id,1999-09-11,21:30:00,teardrop,5.0,unknown object&#44possible space junk&#44illum...,10/2/1999,43.613611,-116.2025,No Annotation
2,raymond,nh,1999-10-14,22:50:00,triangle,900.0,ITMOVED VERY SLOW&#44FLASHED BRIGHT PINK LIGHT...,8/5/2001,43.036111,-71.183889,No Annotation
3,gibsonville,nc,1999-12-16,07:15:00,sphere,120.0,Yellow-Orange soccer ball sized sphere photogr...,7/4/2012,36.105556,-79.5425,No Annotation
4,rancho cordova,ca,1999-01-01,23:59:00,unknown,300.0,I heard like a swoshing noise that was high pi...,8/30/1999,38.589167,-121.301667,No Annotation


In [31]:
df_filtered.shape

(4170, 11)
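
The filter above is applied through the interactive chart. A rough pandas equivalent is sketched below, assuming the kept range is 1999 to 2013 inclusive; those bounds are inferred from the text and are not confirmed by the notebook. The `toy` dataframe reuses three rows shown in the outputs plus one hypothetical 2014 row.

```python
import pandas as pd

# Toy stand-in for cleaned_df; the last row is hypothetical, added to exercise the upper bound.
toy = pd.DataFrame({
    "city": ["boise", "mecosta", "raymond", "hypothetical city"],
    "date": pd.to_datetime(["1991-03-10", "1999-06-12", "1999-10-14", "2014-01-05"]),
})

# Assumed cut-offs: drop 1998 and earlier, and drop 2014.
FIRST_YEAR, LAST_YEAR = 1999, 2013

years = toy["date"].dt.year
# Series.between is inclusive on both ends by default.
filtered = toy[years.between(FIRST_YEAR, LAST_YEAR)].reset_index(drop=True)
```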

### Duration of sightings

Below is a binned plot of sighting durations. There are a lot of outliers, with some sightings lasting far longer than an hour.

In [63]:
PR.plot.barchart(cleaned_df, x="duration_seconds", y="count()", width=600)

PersistWidget(data_values=[{'__id_column': '1', 'city': 'boise', 'state': 'id', 'date': 668563200000, 'time': …

In [67]:
sightings_sub_1hour.head()

Unnamed: 0,city,state,date,time,ufo_shape,duration_seconds,comments,date posted,latitude,longitude,__annotations
0,boise,id,1991-03-10,03:00:00,disk,600.0,Saucer Shaped Craft Followed Me,5/29/2012,43.613611,-116.2025,No Annotation
1,pasadena,tx,1991-11-15,19:30:00,unknown,600.0,UFO sighting in Pasadena TX in November 1991.,3/19/2009,29.690833,-95.208889,No Annotation
2,commerce,tx,1991-08-01,02:00:00,circle,120.0,spoked wheel flies in front of the moon,12/23/2002,33.246944,-95.899722,No Annotation
3,levittown,ny,1991-06-10,23:35:00,circle,300.0,My Mother her boyfriend and myself seen a spac...,3/19/2009,40.725833,-73.514722,No Annotation
4,grants (southeast of),nm,1991-05-05,20:30:00,light,120.0,bright light travelling normal speed stops abo...,8/28/2003,35.147222,-107.850833,No Annotation


In [66]:
longer_sightings.head()

Unnamed: 0,city,state,date,time,ufo_shape,duration_seconds,comments,date posted,latitude,longitude,__annotations
0,chowchilla,ca,1991-08-15,03:00:00,disk,259200.0,When I was 16 years old I worked on my Dad&#82...,3/18/2014,37.123056,-120.259167,No Annotation
1,madras,or,1995-06-01,20:00:00,other,109800.0,We were possibly abducted by a pyramid shaped ...,5/15/2013,44.633611,-121.128333,No Annotation
2,coopersville,mi,1997-07-07,23:00:00,unknown,64800.0,Fireball impact &#44 intelligent colorful ligh...,1/31/2011,43.063889,-85.934722,No Annotation
3,west liberty,ky,1999-09-05,22:30:00,light,37800.0,bright pulseating light diffferencating from g...,9/6/2002,37.318333,-84.939444,No Annotation
4,north whitefield,me,2001-01-01,19:00:00,light,109800.0,I have seen this craft for years and would lik...,1/3/2001,44.221944,-69.587778,No Annotation
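
The names `sightings_sub_1hour` and `longer_sightings` suggest the data was split at one hour of duration. A minimal pandas sketch of such a split follows, assuming a threshold of 3600 seconds and that a sighting of exactly one hour counts as "longer"; both are assumptions, since the split itself is made in the interactive chart. The `toy` rows are taken from the two tables above.

```python
import pandas as pd

# Toy stand-in using durations from the two tables above.
toy = pd.DataFrame({
    "city": ["boise", "commerce", "chowchilla", "madras"],
    "duration_seconds": [600.0, 120.0, 259200.0, 109800.0],
})

ONE_HOUR = 60 * 60  # assumed threshold, in seconds

# Boolean mask: True for sightings shorter than one hour.
is_short = toy["duration_seconds"] < ONE_HOUR

sightings_sub_1hour = toy[is_short].reset_index(drop=True)
longer_sightings = toy[~is_short].reset_index(drop=True)
```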


In [72]:
PR.plot.scatterplot(sightings_sub_1hour, x="duration_seconds", y="time")

PersistWidget(data_values=[{'__id_column': '1', 'city': 'boise', 'state': 'id', 'date': 668563200000, 'time': …