

## Basics of Pandas 
1. [Commonly used operations](#1-Commonly-used-operations)
2. [Series](#2-Series)
3. [Dataframes](#3-Data-Frames)
4. [Index Objects](#4-Index-objects)
5. [Data Selections](#5-Data-Selections)
   - [Boolean Maskings](#51-Boolean-Maskings)
   - [Implicit Explicit Selection](#52-Implict-Explicit-Selection)
   - [Operations with Dataframes](#53-Operations-with-Dataframes)
   - [Null Values](#54-Null-Values)
6. [Hierarchical Indexing](#6-Hierarchical-Indexing)
7. [Concats Merge](#7-Concats-Merge)
8. [Grouping Aggregations](#8-Grouping-Aggregations)
9. [Pivot Tables](#9-Pivot-Tables)
10. [Time Series](#10-Time-Series)




In [11]:
import numpy as np 
import pandas as pd 

#### 1 Commonly used operations

In [18]:
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

df.dtypes # the dtype of each column

A    int64
B    int64
dtype: object

In [29]:
df.info() # concise summary: index, column dtypes, non-null counts and memory usage

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


In [28]:
df.shape  # shape of the DataFrame: the length of each dimension, i.e. (rows, columns)

(3, 2)

In [24]:
df.size # total number of elements (rows * columns), i.e. the "volume" of the DataFrame

6

In [27]:
df.describe() # common statistics of numerical columns 

| | A | B |
|---|---|---|
| count | 3.0 | 3.0 |
| mean | 2.0 | 5.0 |
| std | 1.0 | 1.0 |
| min | 1.0 | 4.0 |
| 25% | 1.5 | 4.5 |
| 50% | 2.0 | 5.0 |
| 75% | 2.5 | 5.5 |
| max | 3.0 | 6.0 |


In [31]:
df.columns # columns of the df (features)

Index(['A', 'B'], dtype='object')

#### Helpful Reminder 


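`shape` => the length of each side (dimension), e.g. (rows, columns)

`size` => the "volume" of the shape, i.e. the product of the dimensions (total number of elements)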
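A minimal sketch of the reminder, reusing the same toy `df` (rebuilt so the snippet runs on its own): `size` is just the product of the entries of `shape`.

```python
import pandas as pd

# Same toy DataFrame as above: 3 rows, 2 columns
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape          # shape = length of each dimension
print(rows, cols)              # 3 2
print(df.size)                 # 6 = rows * cols, the "volume"
print(df.size == rows * cols)  # True
```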

--------------------------------------------------------------------------

In [33]:
df.rows # raises an error: there is no `rows` attribute; the row labels live in df.index

AttributeError: 'DataFrame' object has no attribute 'rows'

In [37]:
df.index # the row labels (index) of the DataFrame; here the default RangeIndex 0, 1, 2

RangeIndex(start=0, stop=3, step=1)

In [142]:
df.head(n=5) # first n rows
df.tail(n=5) # last n rows
df.nunique() # number of distinct values per column (only this last line's result is displayed)

A    3
B    3
dtype: int64

#### 2 Series 

In [2]:
# The most basic pandas object: a Series is a 1D array with an explicit index (a label attached to every value)
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data 

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [5]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data 

# Another way to think about a Series: like a dict, mapping index labels (keys) to values

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [7]:
population_dict = {
    'California': 38332521,
    'Texas': 26448193,
    'New York': 19651127,
    'Florida': 19552860,
    'Illinois': 12882135
}

population = pd.Series(population_dict)

population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

In [10]:
population['California':'New York'] # Slicing by LABEL (the explicit index): unlike positional slicing, the end label 'New York' is included

# Series = dict-like construction + array-like slicing

California    38332521
Texas         26448193
New York      19651127
dtype: int64
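To make the endpoint rule concrete, here is a small sketch on the same `population` data (rebuilt so it runs standalone). It borrows `.loc` and `.iloc` from section 5.2: `.loc` slices by label and includes the end label, while `.iloc` slices by position and stops before the end position.

```python
import pandas as pd

population = pd.Series({
    'California': 38332521,
    'Texas': 26448193,
    'New York': 19651127,
    'Florida': 19552860,
    'Illinois': 12882135,
})

# Label-based slice: the end label is INCLUDED
by_label = population.loc['California':'New York']

# Position-based slice: the end position is EXCLUDED (like Python lists)
by_position = population.iloc[0:2]

print(list(by_label.index))     # ['California', 'Texas', 'New York']
print(list(by_position.index))  # ['California', 'Texas']
```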

#### 3 Data Frames 

A DataFrame generalizes the Series: instead of a single column of values (index + value), it holds multiple columns (features).

All columns are aligned on the same index, which is the most intuitive behaviour.

In [41]:
population_dict = {
    'California': 38332521,
    'Texas': 26448193,
    'New York': 19651127,
    'Florida': 19552860,
    'Illinois': 12882135
}

area_dict = {
    'California': 423967, 
    'Texas': 695662, 
    'New York': 141297,
    'Florida': 170312, 
    'Illinois': 149995
}

population = pd.Series(population_dict)
area = pd.Series(area_dict)
states = pd.DataFrame({'population': population, 'area': area}) # two Series, aligned on their shared index (the state names)
states

| | population | area |
|---|---|---|
| California | 38332521 | 423967 |
| Texas | 26448193 | 695662 |
| New York | 19651127 | 141297 |
| Florida | 19552860 | 170312 |
| Illinois | 12882135 | 149995 |


In [46]:
states['population'] # df['col'] selects a COLUMN by name (returned as a Series that keeps the index), not the first row

# Conventionally, on a 2D array mat[0] returns the first ROW, since mat[row][col]

# A DataFrame instead behaves like a dict of columns, which matches the intuition that
# columns = features: df[key] returns that feature. The index comes along so we know which row each value belongs to.

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
Name: population, dtype: int64
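Since `df[...]` picks columns, rows need `.loc` (by label) or `.iloc` (by position), covered in section 5. A small sketch on a three-state subset of the data above:

```python
import pandas as pd

states = pd.DataFrame({
    'population': [38332521, 26448193, 19651127],
    'area': [423967, 695662, 141297],
}, index=['California', 'Texas', 'New York'])

col = states['population']               # a COLUMN: Series indexed by state
row = states.iloc[0]                     # the first ROW: Series indexed by column name
row_by_label = states.loc['California']  # the same row, selected by label

print(col.name, list(col.index))   # population ['California', 'Texas', 'New York']
print(row.name, list(row.index))   # California ['population', 'area']
```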

#### 4 Index objects 

The index is the foundation of pandas: it labels the rows, and pandas uses it to select and align data, while the columns hold the features.

In [83]:
ind = pd.Index([2, 3, 5, 7, 11])
ind 

Index([2, 3, 5, 7, 11], dtype='int64')

In [50]:
# An Index supports the same array-like operations (only the last expression is displayed)

ind[1]      # 3

ind[:1]     # Index([2], dtype='int64')

ind.size    # 5
ind.shape   # (5,)
ind.ndim    # 1, the number of dimensions
ind.dtype


dtype('int64')

In [52]:
ind[1] = 0 # raises TypeError: Index objects are immutable, which makes them safe to share between Series and DataFrames

TypeError: Index does not support mutable operations
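Because an Index cannot be modified in place, changes always produce a new object. A minimal sketch: `Index.insert` returns a new Index, and relabelling a Series means assigning a whole new Index to `.index` (the letter labels are just illustrative).

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# In-place mutation is rejected
mutation_rejected = False
try:
    ind[1] = 0
except TypeError as err:
    mutation_rejected = True
    print("TypeError:", err)

# insert() returns a NEW Index and leaves the original untouched
bigger = ind.insert(1, 0)
print(bigger.tolist())   # [2, 0, 3, 5, 7, 11]
print(ind.tolist())      # [2, 3, 5, 7, 11]

# To relabel a Series, assign a whole new Index
s = pd.Series([10, 20, 30, 40, 50], index=ind)
s.index = pd.Index(['a', 'b', 'c', 'd', 'e'])
print(s.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
```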

#### 5 Data Selections

In [53]:
# Selection on a Series works like a dict: the index labels are the keys

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [None]:
for key, val in data.items(): # iterates over (label, value) pairs, very similar to dict.items()
    print("key", key)
    print("val", val, "\n")

key a
val 0.25 

key b
val 0.5 

key c
val 0.75 

key d
val 1.0 



In [58]:
# Explicit: slice by index LABEL (the end label 'c' is included)
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [59]:
# Implicit: slice by integer POSITION (end position excluded, as with Python lists). Even when the labels are 'a', 'b', 'c', ..., every element still has an implicit integer position.
data[0:2]

a    0.25
b    0.50
dtype: float64

#### 5.1 Boolean Maskings

In [None]:
data[(data > 0.3) & (data < 0.8)] # masking 

# Masking FILTERS a Series or DataFrame: each condition produces a BOOLEAN Series (here) or DataFrame
# of the same shape, and only the entries where the mask is True are kept.

b    0.50
c    0.75
dtype: float64

In [64]:
(data > 0.3)

a    False
b     True
c     True
d     True
dtype: bool

In [65]:
(data < 0.8)

a     True
b     True
c     True
d    False
dtype: bool

In [68]:
# Combine masks with the element-wise operators & (and), | (or) and ~ (not): this keeps the rows where both conditions are True
data[(data > 0.3) & (data < 0.8)] 

# Python's `and`/`or`/`not` expect a single True/False and raise a ValueError on a whole boolean Series, hence the element-wise operators.
# The parentheses matter: & binds more tightly than > and <.

b    0.50
c    0.75
dtype: float64
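A short sketch of the three element-wise operators on the same Series, plus what happens with Python's `and`:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Python's `and` needs ONE truth value, so it fails on a whole Series
ambiguous = False
try:
    data[(data > 0.3) and (data < 0.8)]
except ValueError as err:
    ambiguous = True
    print("ValueError:", err)

between = data[(data > 0.3) & (data < 0.8)]   # element-wise AND
outside = data[(data < 0.3) | (data > 0.8)]   # element-wise OR
not_b = data[~(data == 0.5)]                  # element-wise NOT

print(between.index.tolist())  # ['b', 'c']
print(outside.index.tolist())  # ['a', 'd']
print(not_b.index.tolist())    # ['a', 'c', 'd']
```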

#### 5.2 Implict Explicit Selection

In [None]:
# Previously we saw that:
# implicit = the default integer position (0, 1, 2, ...)
# explicit = the actual index label (which may or may not be an integer)

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# With an integer index this becomes confusing: a single integer is treated as a LABEL (explicit),
# while an integer slice is treated as POSITIONS (implicit).

In [None]:
data[1] # single integer -> looked up as a LABEL (explicit): label 1 holds 'a'

'a'

In [71]:
data[1:3] # integer slice -> POSITIONS 1 and 2 (implicit, end excluded), i.e. labels 3 and 5

3    b
5    c
dtype: object

In [76]:
# To remove the ambiguity, use loc and iloc:
# loc = explicit (labels)
# iloc = implicit (positions)
data.loc[1] # explicitly use the index LABEL 1, which holds 'a'

'a'

In [77]:
data.loc[1:3] # label slice from 1 to 3, end label INCLUDED: 1:a and 3:b

1    a
3    b
dtype: object

In [79]:
data.iloc[1] # position 1, i.e. the second element, whatever its label

'b'

In [80]:
data.iloc[1:3]

3    b
5    c
dtype: object

As a rule of thumb: explicit > implicit. Prefer `.loc` (labels) and `.iloc` (positions) over plain `[]`, especially when the index is made of integers.

#### 5.3 Operations with Dataframes

Commonly used operations on DataFrames.

In [84]:
area = pd.Series({
    'California': 423967, 
    'Texas': 695662,
    'New York': 141297, 
    'Florida': 170312,
    'Illinois': 149995
})

pop = pd.Series({
    'California': 38332521, 
    'Texas': 26448193,
    'New York': 19651127, 
    'Florida': 19552860,
    'Illinois': 12882135
})

data = pd.DataFrame({'area':area, 'pop':pop})
data

| | area | pop |
|---|---|---|
| California | 423967 | 38332521 |
| Texas | 695662 | 26448193 |
| New York | 141297 | 19651127 |
| Florida | 170312 | 19552860 |
| Illinois | 149995 | 12882135 |


In [90]:
data['density'] = data['pop'] / data['area'] # add a new column, computed element-wise from existing columns
data

| | area | pop | density |
|---|---|---|---|
| California | 423967 | 38332521 | 90.413926 |
| Texas | 695662 | 26448193 | 38.01874 |
| New York | 141297 | 19651127 | 139.076746 |
| Florida | 170312 | 19552860 | 114.806121 |
| Illinois | 149995 | 12882135 | 85.883763 |


In [94]:
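data.values # the underlying data as one 2D NumPy array; an attribute (no parentheses) on both Series and DataFrames
# all columns are upcast to a common dtype (float64 here, because of 'density')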

array([[4.23967000e+05, 3.83325210e+07, 9.04139261e+01],
       [6.95662000e+05, 2.64481930e+07, 3.80187404e+01],
       [1.41297000e+05, 1.96511270e+07, 1.39076746e+02],
       [1.70312000e+05, 1.95528600e+07, 1.14806121e+02],
       [1.49995000e+05, 1.28821350e+07, 8.58837628e+01]])
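Recent pandas documentation recommends `DataFrame.to_numpy()` over `.values`; both give the same array here. A small sketch on a two-state subset, showing the upcast to a common dtype:

```python
import pandas as pd

data = pd.DataFrame({
    'area': [423967, 695662],
    'pop': [38332521, 26448193],
}, index=['California', 'Texas'])
data['density'] = data['pop'] / data['area']   # float column

arr = data.to_numpy()   # same contents as data.values
print(arr.shape)        # (2, 3)
print(arr.dtype)        # float64: the int columns are upcast to match 'density'
```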

In [92]:
data.columns

Index(['area', 'pop', 'density'], dtype='object')

In [93]:
data.T # transpose 

| | California | Texas | New York | Florida | Illinois |
|---|---|---|---|---|---|
| area | 423967.0 | 695662.0 | 141297.0 | 170312.0 | 149995.0 |
| pop | 38332520.0 | 26448190.0 | 19651130.0 | 19552860.0 | 12882140.0 |
| density | 90.41393 | 38.01874 | 139.0767 | 114.8061 | 85.88376 |


In [96]:
data.loc[:'Illinois', :'pop']

# Like a 2D NumPy array, where mat[rows, cols] selects rows and columns at once,
# .loc takes a row selection and a column selection: df.loc[rows, cols]. Label slices include the end label.

| | area | pop |
|---|---|---|
| California | 423967 | 38332521 |
| Texas | 695662 | 26448193 |
| New York | 141297 | 19651127 |
| Florida | 170312 | 19552860 |
| Illinois | 149995 | 12882135 |


In [98]:
data[data.density > 100] # boolean mask on the rows; data.density is shorthand for data['density']

| | area | pop | density |
|---|---|---|---|
| New York | 141297 | 19651127 | 139.076746 |
| Florida | 170312 | 19552860 | 114.806121 |
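`.loc` also accepts a boolean mask as the row selection, so masking and column selection combine in one call. A sketch on the same data (rebuilt so it runs standalone):

```python
import pandas as pd

data = pd.DataFrame({
    'area': [423967, 695662, 141297, 170312, 149995],
    'pop': [38332521, 26448193, 19651127, 19552860, 12882135],
}, index=['California', 'Texas', 'New York', 'Florida', 'Illinois'])
data['density'] = data['pop'] / data['area']

# Rows chosen by a boolean mask, columns chosen by label, in a single .loc call
dense = data.loc[data.density > 100, ['pop', 'density']]
print(dense)   # New York and Florida, with only the 'pop' and 'density' columns
```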


#### 5.4 Null Values 

In [103]:
df = pd.DataFrame([
    [1,np.nan, 2],
    [2, 3, 5],
    [np.nan, 4, 6]
])

df

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |


In [104]:
df.isnull() # generates a boolean mask 

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | False | True | False |
| 1 | False | False | False |
| 2 | True | False | False |


In [106]:
df.dropna() # drops the rows that have at least 1 NULL 

| | 0 | 1 | 2 |
|---|---|---|---|
| 1 | 2.0 | 3.0 | 5 |


In [108]:
df.dropna(axis='columns') # drops the COLUMNS that have at least 1 NULL 

| | 2 |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |


In [109]:
df.dropna(axis='columns', how='all') # drop only the columns where ALL values are null (none here, so nothing is dropped)

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | NaN | 4.0 | 6 |


In [110]:
df.fillna(0) # fill the NULL values with 0 

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 0.0 | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | 0.0 | 4.0 | 6 |


In [113]:
df.ffill() # forward fill: propagate the last valid value DOWN each column (replaces the deprecated fillna(method='ffill'))


| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | NaN | 2 |
| 1 | 2.0 | 3.0 | 5 |
| 2 | 2.0 | 4.0 | 6 |


In [117]:
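df.ffill(axis=1) # forward fill ACROSS each row, from left to right

# A common source of confusion:
# axis=0 ('index') = operate down the rows, i.e. along each COLUMN (the default)
# axis=1 ('columns') = operate across the columns, i.e. along each ROW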


| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 2.0 |
| 1 | 2.0 | 3.0 | 5.0 |
| 2 | NaN | 4.0 | 6.0 |
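One way to remember the `axis` rule is to try it on a reduction such as `sum` using the same `df`: `axis=0` produces one result per column, `axis=1` one per row, and `ffill` follows the same convention.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, np.nan, 2],
    [2, 3, 5],
    [np.nan, 4, 6],
])

# axis=0 ('index'): operate DOWN the rows -> one result per column
per_column = df.sum(axis=0)   # 3, 7, 13 (NaN is skipped)

# axis=1 ('columns'): operate ACROSS the columns -> one result per row
per_row = df.sum(axis=1)      # 3, 10, 10

# Same rule for ffill: axis=0 fills downward, axis=1 fills left-to-right
down = df.ffill(axis=0)
right = df.ffill(axis=1)
print(per_column.tolist(), per_row.tolist())
```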


#### 6 Hierarchical Indexing

Consider the following example:

Stores A and B both sell apples and bananas, so each sales figure is identified by two nested keys: the store, then the fruit.

Store A = [apples, bananas]

Store B = [apples, bananas]

In [119]:
index = pd.MultiIndex.from_tuples([
    ("Store A", "Apple"),
    ("Store A", "Banana"),
    ("Store B", "Apple"),
    ("Store B", "Banana")
], names=["Store", "Fruit"])

sales = pd.Series([500, 700, 600, 800], index=index)

print(sales)

# Each tuple is one path down the index hierarchy: Store -> Fruit -> sales value


Store    Fruit 
Store A  Apple     500
         Banana    700
Store B  Apple     600
         Banana    800
dtype: int64


In [122]:
sales.loc["Store A", "Apple"]

np.int64(500)
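Beyond a full `(store, fruit)` key, a MultiIndex supports partial selection. A sketch using the same `sales` Series: `.loc` with only the outer label, and `Series.xs` with `level=` for an inner level.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([
    ("Store A", "Apple"),
    ("Store A", "Banana"),
    ("Store B", "Apple"),
    ("Store B", "Banana"),
], names=["Store", "Fruit"])
sales = pd.Series([500, 700, 600, 800], index=index)

# Partial indexing on the OUTER level returns a sub-Series indexed by Fruit
store_a = sales.loc["Store A"]                # Apple -> 500, Banana -> 700

# Cross-section on an INNER level: every store's Apple sales
apples = sales.xs("Apple", level="Fruit")     # Store A -> 500, Store B -> 600

print(store_a)
print(apples)
```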

In [None]:
# Another way to create a hierarchical index: pass a list of index arrays

df = pd.DataFrame(
    np.random.rand(4, 2),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=['data1', 'data2'])

df # Notice how the outer level groups the rows: each of 'a' and 'b' spans two inner labels

| | | data1 | data2 |
|---|---|---|---|
| a | 1 | 0.294423 | 0.557391 |
| a | 2 | 0.069131 | 0.8774 |
| b | 1 | 0.203344 | 0.519243 |
| b | 2 | 0.754819 | 0.14333 |


In [124]:
data = {
    ('California', 2000): 33871648,
    ('California', 2010): 37253956,
    ('Texas', 2000): 20851820,
    ('Texas', 2010): 25145561,
    ('New York', 2000): 18976457,
    ('New York', 2010): 19378102
}

pd.Series(data)

California  2000    33871648
            2010    37253956
Texas       2000    20851820
            2010    25145561
New York    2000    18976457
            2010    19378102
dtype: int64

#### 7 Concats Merge 

In [125]:
index = pd.MultiIndex.from_tuples([
    ("Store A", "Electronics"),
    ("Store A", "Clothing"),
    ("Store B", "Electronics"),
    ("Store B", "Clothing")
], names=["Store", "Category"])

data = pd.DataFrame({
    "Sales": [10000, 5000, 12000, 6000],
    "Profit": [2000, 800, 2500, 1000]
}, index=index)

print(data)


                     Sales  Profit
Store   Category                  
Store A Electronics  10000    2000
        Clothing      5000     800
Store B Electronics  12000    2500
        Clothing      6000    1000


In [129]:
data2 = pd.DataFrame({
    "Sales": [9000, 4000],
    "Profit": [1800, 700]
}, index=pd.MultiIndex.from_tuples([
    ("Store C", "Electronics"),
    ("Store C", "Clothing")
], names=["Store", "Category"]))

# Concatenate along the rows (default axis=0): data2's rows are stacked below data's
result = pd.concat([data, data2])
print(result)

# DataFrame.append was deprecated and then removed in pandas 2.0; use pd.concat to add rows instead

                     Sales  Profit
Store   Category                  
Store A Electronics  10000    2000
        Clothing      5000     800
Store B Electronics  12000    2500
        Clothing      6000    1000
Store C Electronics   9000    1800
        Clothing      4000     700
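The examples here only stack rows (axis=0). A sketch of the other direction with two small made-up frames (not the store data): `axis=1` places frames side by side and aligns rows by index label, and `ignore_index=True` discards the old labels when stacking.

```python
import pandas as pd

sales = pd.DataFrame({"Sales": [10000, 5000]}, index=["Electronics", "Clothing"])
profit = pd.DataFrame({"Profit": [800, 2000]}, index=["Clothing", "Electronics"])

# axis=1: side by side, rows matched by index label (note the different row order in `profit`)
side_by_side = pd.concat([sales, profit], axis=1)
print(side_by_side)

# ignore_index=True: stack rows and renumber them 0, 1, 2, ...
stacked = pd.concat([sales, sales], ignore_index=True)
print(stacked.index.tolist())   # [0, 1, 2, 3]
```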


In [131]:
additional_info = pd.DataFrame({
    "Category": ["Electronics", "Clothing"],
    "Tax Rate": [0.08, 0.05]
})

# reset_index() turns the Store/Category index levels back into ordinary columns (new index 0, 1, ...)
# merge() is the pandas equivalent of a SQL JOIN
# on = the column to join on
# Here Category is the column the two DataFrames have in common
merged = data.reset_index().merge(additional_info, on="Category").set_index(["Store", "Category"])
print(merged)


                     Sales  Profit  Tax Rate
Store   Category                            
Store A Electronics  10000    2000      0.08
        Clothing      5000     800      0.05
Store B Electronics  12000    2500      0.08
        Clothing      6000    1000      0.05


In [133]:
# DataFrame.join() matches keys against the INDEX of the other DataFrame, whereas merge() matches columns.
# With on="Category", the Category index level of `data` is looked up in the index of tax_rates.

tax_rates = pd.DataFrame({
    "Tax Rate": [0.08, 0.05]
}, index=pd.Index(["Electronics", "Clothing"], name="Category")) # the join key must be the index of tax_rates

result = data.join(tax_rates, on="Category")
print(result)


                     Sales  Profit  Tax Rate
Store   Category                            
Store A Electronics  10000    2000      0.08
        Clothing      5000     800      0.05
Store B Electronics  12000    2500      0.08
        Clothing      6000    1000      0.05


In [134]:
# Left merge: keep every row of the left DataFrame and attach the right-hand columns where CustomerID matches (NaN otherwise)

# Left DataFrame
df1 = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Name": ["Alice", "Bob", "Charlie", "David"]
})

# Right DataFrame
df2 = pd.DataFrame({
    "CustomerID": [1, 3, 4, 5],
    "Order": ["Laptop", "Phone", "Tablet", "TV"]
})

result = pd.merge(df1, df2, how="left", on="CustomerID")
print(result)

   CustomerID     Name   Order
0           1    Alice  Laptop
1           2      Bob     NaN
2           3  Charlie   Phone
3           4    David  Tablet


In [137]:
# Inner join is the default for merge: keep only the rows whose key appears in BOTH DataFrames
# Outer join keeps all rows of both DataFrames, filling NaN wherever there is no match

result = pd.merge(df1, df2, how="outer", on="CustomerID")
print(result)


   CustomerID     Name   Order
0           1    Alice  Laptop
1           2      Bob     NaN
2           3  Charlie   Phone
3           4    David  Tablet
4           5      NaN      TV
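To see which side each row came from, `merge` accepts `indicator=True`, which adds a `_merge` column. A sketch with the same `df1`/`df2`, also comparing row counts across the four join types:

```python
import pandas as pd

df1 = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Name": ["Alice", "Bob", "Charlie", "David"],
})
df2 = pd.DataFrame({
    "CustomerID": [1, 3, 4, 5],
    "Order": ["Laptop", "Phone", "Tablet", "TV"],
})

# '_merge' is 'both', 'left_only' or 'right_only' for each row
result = pd.merge(df1, df2, how="outer", on="CustomerID", indicator=True)
print(result[["CustomerID", "_merge"]])

# Row counts per join type
row_counts = {how: len(pd.merge(df1, df2, how=how, on="CustomerID"))
              for how in ["inner", "left", "right", "outer"]}
print(row_counts)   # {'inner': 3, 'left': 4, 'right': 4, 'outer': 5}
```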


#### 8 Grouping Aggregations

In [138]:
import seaborn as sns

planets = sns.load_dataset('planets')
planets.shape

(1035, 6)

In [139]:
planets.head()

| | method | number | orbital_period | mass | distance | year |
|---|---|---|---|---|---|---|
| 0 | Radial Velocity | 1 | 269.3 | 7.1 | 77.4 | 2006 |
| 1 | Radial Velocity | 1 | 874.774 | 2.21 | 56.95 | 2008 |
| 2 | Radial Velocity | 1 | 763.0 | 2.6 | 19.84 | 2011 |
| 3 | Radial Velocity | 1 | 326.03 | 19.4 | 110.62 | 2007 |
| 4 | Radial Velocity | 1 | 516.22 | 10.5 | 119.47 | 2009 |


In [144]:
planets.dropna().describe()
# Notice that although the earliest discovery in this subset dates back to 1989, half were found in 2009 or later (the median year), largely thanks to the Kepler mission!

| | number | orbital_period | mass | distance | year |
|---|---|---|---|---|---|
| count | 498.0 | 498.0 | 498.0 | 498.0 | 498.0 |
| mean | 1.73494 | 835.778671 | 2.50932 | 52.068213 | 2007.37751 |
| std | 1.17572 | 1469.128259 | 3.636274 | 46.596041 | 4.167284 |
| min | 1.0 | 1.3283 | 0.0036 | 1.35 | 1989.0 |
| 25% | 1.0 | 38.27225 | 0.2125 | 24.4975 | 2005.0 |
| 50% | 1.0 | 357.0 | 1.245 | 39.94 | 2009.0 |
| 75% | 2.0 | 999.6 | 2.8675 | 59.3325 | 2011.0 |
| max | 6.0 | 17337.5 | 25.0 | 354.0 | 2014.0 |


In [None]:
# groupby splits the rows by the values of a key column: each distinct value forms a GROUP
planets.groupby('method')['orbital_period']  # lazy: returns a SeriesGroupBy object; nothing is computed until we aggregate

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000012EDA17CFD0>

In [None]:
planets.groupby('method')['orbital_period'].mean() # This would get the mean for each group 

# Grouping on a single key gives a 1D result (a Series). But what if we want to group further and get a more granular breakdown?

method
Astrometry                          631.180000
Eclipse Timing Variations          4751.644444
Imaging                          118247.737500
Microlensing                       3153.571429
Orbital Brightness Modulation         0.709307
Pulsar Timing                      7343.021201
Pulsation Timing Variations        1170.000000
Radial Velocity                     823.354680
Transit                              21.102073
Transit Timing Variations            79.783500
Name: orbital_period, dtype: float64
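`groupby` follows a split-apply-combine pattern. A minimal sketch on a tiny made-up table (not the seaborn `planets` data, so it runs offline) that reproduces the grouped mean by hand and then asks for several aggregations with `.agg`:

```python
import pandas as pd

# Made-up rows for illustration only
planets = pd.DataFrame({
    "method": ["Transit", "Transit", "Imaging", "Radial Velocity", "Radial Velocity"],
    "orbital_period": [3.0, 5.0, 1000.0, 300.0, 500.0],
})

grouped = planets.groupby("method")["orbital_period"].mean()

# The same result by hand: split by key, apply mean, combine into a Series
manual = pd.Series({
    key: planets.loc[planets["method"] == key, "orbital_period"].mean()
    for key in planets["method"].unique()
})
print(grouped.to_dict() == manual.to_dict())   # True

# Several aggregations at once, one column per function
summary = planets.groupby("method")["orbital_period"].agg(["mean", "count"])
print(summary)
```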

In [148]:
# This is where pivot tables come in: they return a 2D DataFrame instead
titanic = sns.load_dataset('titanic')
titanic.head()

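| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |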


In [150]:
titanic.groupby('sex')['survived'].mean()

# This gives the survival rate of women and men separately.
# But what if we want something more granular, such as survival by class?

sex
female    0.742038
male      0.188908
Name: survived, dtype: float64

In [None]:
# One way is to group by sex AND class. The result has a hierarchical index,
# which we can then unstack
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean')

# Note the output: a two-level MultiIndex (sex, class)


sex     class 
female  First     0.968085
        Second    0.921053
        Third     0.500000
male    First     0.368852
        Second    0.157407
        Third     0.135447
Name: survived, dtype: float64

In [None]:
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()

# unstack() pivots one level of the row MultiIndex into the columns, turning the Series into a DataFrame:
# the 'class' level is removed from the index and becomes the column labels.
# By default, unstack() acts on the LAST (innermost) index level, which here is 'class'.


| sex / class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


In [155]:
# A more complex example: a 3-level MultiIndex
arrays = [
    ['A', 'A', 'A', 'B', 'B', 'B'],
    ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
    ['one', 'two', 'three', 'one', 'two', 'three']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second', 'third'))

df = pd.DataFrame({
    'value': [1, 2, 3, 4, 5, 6]
}, index=index)

print(df)

unstacked = df.unstack(level=-1)  # Unstacking the last level
print(unstacked)


                    value
first second third       
A     X      one        1
      Y      two        2
      Z      three      3
B     X      one        4
      Y      two        5
      Z      three      6
             value           
third          one three  two
first second                 
A     X        1.0   NaN  NaN
      Y        NaN   NaN  2.0
      Z        NaN   3.0  NaN
B     X        4.0   NaN  NaN
      Y        NaN   NaN  5.0
      Z        NaN   6.0  NaN
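`stack()` is the inverse of `unstack()`, and `unstack` can also target a level by name. A sketch on a small Series whose values are rounded from the survival table above (illustrative only):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("female", "First"), ("female", "Third"), ("male", "First"), ("male", "Third")],
    names=["sex", "class"],
)
rates = pd.Series([0.97, 0.50, 0.37, 0.14], index=index)  # rounded, illustrative values

wide = rates.unstack()               # last level 'class' becomes the columns
by_sex = rates.unstack(level="sex")  # pick the level to move by name instead
back = wide.stack()                  # stack() moves the columns back into the index

print(wide)
print(back.to_dict() == rates.to_dict())   # True: the round trip recovers the original
```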


#### 9 Pivot Tables 

In [None]:
# Back to pivot tables: instead of grouping and unstacking manually, pivot_table does it in one call
# index = the row grouping | columns = the column grouping (the finer breakdown)
# values = 'survived', aggregated with the mean by default
titanic.pivot_table('survived', index='sex', columns='class')


| sex / class | First | Second | Third |
|---|---|---|---|
| female | 0.968085 | 0.921053 | 0.5 |
| male | 0.368852 | 0.157407 | 0.135447 |


In [157]:
age = pd.cut(titanic['age'], [0, 18, 80]) # bin ages into (0, 18] and (18, 80]
titanic.pivot_table('survived', ['sex', age], 'class')


| sex | age / class | First | Second | Third |
|---|---|---|---|---|
| female | (0, 18] | 0.909091 | 1.0 | 0.511628 |
| female | (18, 80] | 0.972973 | 0.9 | 0.423729 |
| male | (0, 18] | 0.8 | 0.6 | 0.215686 |
| male | (18, 80] | 0.375 | 0.071429 | 0.133663 |


In [158]:
# pivot_table aggregates with the mean by default; to use other functions, pass aggfunc (here a dict: one function per column)

titanic.pivot_table(index='sex', columns='class', aggfunc={'survived': 'sum', 'fare': 'mean'})


| sex / class | fare: First | fare: Second | fare: Third | survived: First | survived: Second | survived: Third |
|---|---|---|---|---|---|---|
| female | 106.125798 | 21.970121 | 16.11881 | 91 | 70 | 72 |
| male | 67.226127 | 19.741782 | 12.661633 | 45 | 17 | 47 |
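To show that `pivot_table` is shorthand for groupby, aggregate and unstack, a sketch on a tiny made-up stand-in for the Titanic data (the rows are invented, not taken from the dataset):

```python
import pandas as pd

# Invented rows for illustration only
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 1, 0, 1, 0],
    "fare": [80.0, 8.0, 60.0, 7.0, 9.0, 100.0],
})

pivot = df.pivot_table("survived", index="sex", columns="class")   # mean by default
manual = df.groupby(["sex", "class"])["survived"].mean().unstack()
print((pivot.to_numpy() == manual.to_numpy()).all())   # True

# aggfunc as a dict: a different function per column
mixed = df.pivot_table(index="sex", columns="class",
                       aggfunc={"survived": "sum", "fare": "mean"})
print(mixed)
```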


#### 10 Time Series

`Timestamp`: a single point in time

`DatetimeIndex`: an index made of multiple timestamps

`Timedelta`: a duration, e.g. the time between two dates

`Period`: a fixed span of time, such as a month or a year
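A minimal sketch creating one object of each kind (the dates are arbitrary):

```python
import pandas as pd

ts = pd.Timestamp("2025-03-08")                          # a single point in time
idx = pd.date_range("2025-03-01", periods=3, freq="D")   # DatetimeIndex of 3 Timestamps
delta = ts - pd.Timestamp("2025-03-01")                  # Timedelta between two dates
period = pd.Period("2025-03", freq="M")                  # the whole month of March 2025

print(type(idx).__name__)   # DatetimeIndex
print(delta)                # 7 days 00:00:00
print(period.start_time <= ts <= period.end_time)   # True: ts falls inside the period
```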

In [165]:
date = pd.to_datetime("2025-03-08")
date # returns a pandas Timestamp object

Timestamp('2025-03-08 00:00:00')

In [None]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04', '2015-07-04', '2015-08-04']) # we can define datetime as indexes 

data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

In [162]:
# A DatetimeIndex works like any other index, with added bonuses such as partial-string indexing: '2015' selects every entry from that year

data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
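Partial strings also work inside `.loc` slices, and a `Timedelta` can shift a whole DatetimeIndex. A sketch on the same four-date Series:

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04', '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Month-level partial strings in a slice; both ends are inclusive (whole months)
window = data.loc['2014-08':'2015-07']
print(window.tolist())   # [1, 2]

# Adding a Timedelta shifts every timestamp in the index
shifted = data.index + pd.Timedelta(days=1)
print(shifted[0])        # 2014-07-05 00:00:00
```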