## 3. What is a pandas Series and how do I select one from a DataFrame?

A pandas Series is a one-dimensional labeled array, and each column of a DataFrame is a Series. We often need to analyze or manipulate one particular column, so how do we select a Series?

In [1]:
import pandas as pd

We will use the UFO sightings report dataset to learn how to select a series.

In [2]:
ufo = pd.read_csv("http://bit.ly/uforeports", parse_dates=["Time"])
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [3]:
type(ufo)

pandas.core.frame.DataFrame

We can select a Series in two ways: ufo["City"] or ufo.City. The dot notation works because pandas exposes each column name as an attribute of the DataFrame. If you have defined a class in Python, you have probably used dot notation to read the value of a data attribute. Unlike a method call, attribute access needs no parentheses "( )". In an interactive environment such as Jupyter, you can press Tab after the dot to list the available attributes and methods. Also, note that column names are case sensitive.

In [4]:
ufo["City"]

0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object

In [5]:
ufo.City

0                      Ithaca
1                 Willingboro
2                     Holyoke
3                     Abilene
4        New York Worlds Fair
                 ...         
18236              Grant Park
18237             Spirit Lake
18238             Eagle River
18239             Eagle River
18240                    Ybor
Name: City, Length: 18241, dtype: object
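
As a quick check of the two points above, here is a minimal sketch using a small hand-made DataFrame (the name `df` and its two rows are stand-ins for the UFO data, so nothing has to be downloaded). It compares the two notations and shows what happens when the case of a column name is wrong.

```python
import pandas as pd

# Hypothetical miniature of the UFO data, built in memory.
df = pd.DataFrame({"City": ["Ithaca", "Holyoke"], "State": ["NY", "CO"]})

by_bracket = df["City"]   # bracket notation
by_dot = df.City          # dot notation

# Both notations return the same Series.
same = by_bracket.equals(by_dot)

# Column names are case sensitive: "city" is not "City",
# so the lookup raises a KeyError.
try:
    df["city"]
    lowercase_found = True
except KeyError:
    lowercase_found = False

print(same, lowercase_found)
```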

We can use the type() function to find out the type of an object. We should also be aware that dot notation does not work in some cases. The first is when the column name contains a space. The second is when the column name conflicts with a built-in attribute or method of the DataFrame (with a column named shape, ufo.shape returns the dimensions of the DataFrame, not the column). The last is when creating a new column: the new column name must be written in bracket notation on the left side of the assignment operator. Assigning to an existing column through dot notation does work, but bracket notation is the safer habit there too.

In [6]:
type(ufo["City"])

pandas.core.series.Series

In [7]:
ufo["Colors Reported"]

0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
        ... 
18236    NaN
18237    NaN
18238    NaN
18239    RED
18240    NaN
Name: Colors Reported, Length: 18241, dtype: object

In [8]:
ufo.shape

(18241, 5)

In [9]:
ufo.City + ", " + ufo.State

0                      Ithaca, NY
1                 Willingboro, NJ
2                     Holyoke, CO
3                     Abilene, KS
4        New York Worlds Fair, NY
                   ...           
18236              Grant Park, IL
18237             Spirit Lake, IA
18238             Eagle River, WI
18239             Eagle River, WI
18240                    Ybor, FL
Length: 18241, dtype: object

In [10]:
ufo["Location"] = ufo.City + ", " + ufo.State
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Location
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00,"Ithaca, NY"
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00,"Willingboro, NJ"
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00,"Holyoke, CO"
3,Abilene,,DISK,KS,1931-06-01 13:00:00,"Abilene, KS"
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00,"New York Worlds Fair, NY"
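
The three situations where dot notation falls short can be reproduced on a small hand-made DataFrame. This is only a sketch: the DataFrame `df`, its column named `shape`, and the attribute name `Place` are made up for illustration and are not part of the UFO dataset.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    "City": ["Ithaca", "Holyoke"],
    "Colors Reported": ["RED", None],
    "shape": ["TRIANGLE", "OVAL"],  # clashes with the DataFrame.shape attribute
})

# 1. Space in the name: df.Colors Reported is a syntax error, so use brackets.
colors = df["Colors Reported"]

# 2. Name clash: dot notation returns the built-in attribute, not the column.
dims = df.shape            # tuple of (rows, columns)
shape_col = df["shape"]    # the actual column

# 3. New column: attribute assignment only sets a plain Python attribute
#    (pandas emits a UserWarning, silenced here) and adds no column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.Place = df["City"] + "!"
attr_made_column = "Place" in df.columns

# Bracket notation on the left side does create the column.
df["Location"] = df["City"] + "!"
bracket_made_column = "Location" in df.columns

print(dims, attr_made_column, bracket_made_column)
```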


A DataFrame is, in essence, a collection of Series that share the same index. To select a subset of columns, we pass a list of column names inside the brackets; the result is a new DataFrame containing just the columns from the list.

In [11]:
ufo[["Shape Reported", "Time", "Location"]]

Unnamed: 0,Shape Reported,Time,Location
0,TRIANGLE,1930-06-01 22:00:00,"Ithaca, NY"
1,OTHER,1930-06-30 20:00:00,"Willingboro, NJ"
2,OVAL,1931-02-15 14:00:00,"Holyoke, CO"
3,DISK,1931-06-01 13:00:00,"Abilene, KS"
4,LIGHT,1933-04-18 19:00:00,"New York Worlds Fair, NY"
...,...,...,...
18236,TRIANGLE,2000-12-31 23:00:00,"Grant Park, IL"
18237,DISK,2000-12-31 23:00:00,"Spirit Lake, IA"
18238,,2000-12-31 23:45:00,"Eagle River, WI"
18239,LIGHT,2000-12-31 23:45:00,"Eagle River, WI"
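
One detail worth checking for yourself is the type you get back: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one item. The sketch below assumes a small made-up DataFrame `df` rather than the full UFO data.

```python
import pandas as pd

# Hypothetical miniature of the UFO data, built in memory.
df = pd.DataFrame({
    "City": ["Ithaca", "Holyoke"],
    "State": ["NY", "CO"],
    "Shape Reported": ["TRIANGLE", "OVAL"],
})

one_col_series = df["City"]     # string key -> Series
one_col_frame = df[["City"]]    # one-item list -> DataFrame with one column

# Columns come back in the order given in the list.
subset = df[["Shape Reported", "City"]]

print(type(one_col_series).__name__, type(one_col_frame).__name__)
print(list(subset.columns))
```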
