# Introduction to Pandas

## Learning outcomes
- Create Pandas series and dataframes.
- Access values from a DataFrame by indexing, slicing and boolean indexing, using notation such as `df[]`, `df.loc[]` and `df.iloc[]`.
- Perform basic arithmetic operations between two series and anticipate the result.
- Describe how Pandas assigns dtypes to Series and what the `object` dtype is.
- Read a standard `.csv` file from a local path or url using Pandas `pd.read_csv()`.

The W3Schools website also has a great [Pandas Tutorial](https://www.w3schools.com/python/pandas/default.asp) for those who are unfamiliar with the library and with Python in general.

## Introduction to Pandas
---

![](img/lecture6/pandas.png)

- The most popular Python library for tabular data structures
- You can think of Pandas as an extremely powerful version of Excel (but free and with a lot more features!) 
- It is like Python's version of R's `dplyr`
- The only tool you'll need for many (most?) data wrangling tasks

![](img/lecture6/computer_panda.gif)

[Source: giphy.com](https://giphy.com/gifs/panda-angry-breaking-EPcvhM28ER9XW)

- Pandas can be installed using `conda` (if it isn't installed already):

```
conda install pandas
```

- We usually import pandas with the alias `pd`
- You'll see these two imports at the top of most data science workflows!

In [1]:
import pandas as pd
import numpy as np

## Pandas Series
---

### What are Series?

- A Series is a labeled list of values, similar to a Python dictionary
- They are strictly 1-dimensional and can contain any data type (integers, strings, floats, objects, etc), including a mix of them
- Series labels may be integers or strings
- Can be created from a scalar, a list, ndarray or dictionary using `pd.Series()` (**note the capital "S"**)
- Here are some example series:

![](img/lecture6/series.png)

### Creating Series

- By default, series are labelled with indices starting from 0
- For example:

In [2]:
pd.Series(data=[-5, 1.3, 21, 6, 3])

0    -5.0
1     1.3
2    21.0
3     6.0
4     3.0
dtype: float64

- But you can add a custom index:

In [7]:
pd.Series(data=[-5, 1.3, 21, 6, 3], index=["a", "b", "c", "d", "e"])

a    -5.0
b     1.3
c    21.0
d     6.0
e     3.0
dtype: float64

- From a dictionary:

In [8]:
pd.Series(data={'a': 10, 'b': 20, 'c': 30})

a    10
b    20
c    30
dtype: int64
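
The list of inputs above also mentions scalars and ndarrays. As a small sketch (the values and variable names are made up for illustration), a scalar is repeated for every label given in `index`, while an ndarray behaves much like a list:

```python
import numpy as np
import pandas as pd

# A scalar is repeated for every label supplied in `index`
s_scalar = pd.Series(data=3.141, index=["a", "b", "c"])

# An ndarray keeps its dtype and gets the default 0-based integer labels
s_array = pd.Series(data=np.array([1.5, 2.5, 3.5]))

print(s_scalar)
print(s_array)
```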

### Series operations

- Unlike ndarrays, operations between Series (`+`, `-`, `/`, `*`) align values based on their **LABELS** (not their position in the structure)
- The resulting index will be the __*sorted union*__ of the two indexes
- This gives you the flexibility to run operations on series even when their labels differ (but you might get some unexpected results!)

In [53]:
s1 = pd.Series(data=range(4),
               index=["A", "B", "C", "D"])
s1

A    0
B    1
C    2
D    3
dtype: int64

In [54]:
s2 = pd.Series(data=range(10, 14),
               index=["B", "C", "D", "E"])
s2

B    10
C    11
D    12
E    13
dtype: int64

In [55]:
s1 + s2

A     NaN
B    11.0
C    13.0
D    15.0
E     NaN
dtype: float64

- Labels that appear in both series are operated on
- Labels that appear in only one series still show up in the result, but with `NaN` values

![](img/lecture6/series_addition.png)
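
If `NaN` for unmatched labels is not what you want, the `.add()` method accepts a `fill_value` that stands in for a label missing on one side. A minimal sketch using the same `s1` and `s2` as above:

```python
import pandas as pd

s1 = pd.Series(data=range(4), index=["A", "B", "C", "D"])
s2 = pd.Series(data=range(10, 14), index=["B", "C", "D", "E"])

# Treat a label missing from one series as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
```

Here "A" and "E" keep the value from the one series that contains them, because the other side is filled with 0 before adding.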

- We can also perform standard operations on a series, like multiplying or squaring
- NumPy also accepts series as an argument to most functions because series are built off numpy arrays (more on that later)

In [56]:
s1 ** 2

A    0
B    1
C    4
D    9
dtype: int64
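
To illustrate the point about NumPy functions, here is a short sketch passing `s1` to `np.sqrt()`. The result is still a Series and the labels are carried over:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(data=range(4), index=["A", "B", "C", "D"])

# NumPy functions are applied element-wise and the labels are preserved
roots = np.sqrt(s1)
print(roots)
print(type(roots))
```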

- Finally, just like arrays, series have many built-in methods for various operations
- You can find them all by running `help(pd.Series)`

In [58]:
s1

A    0
B    1
C    2
D    3
dtype: int64

In [59]:
s1.mean()

1.5

In [60]:
s1.sum()

6

In [63]:
s1.astype(float)

A    0.0
B    1.0
C    2.0
D    3.0
dtype: float64

- **"Chaining"** operations together is also common with pandas

In [67]:
s1.add(3.141).pow(2).astype(int).mean()

22.25
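
Long chains can be easier to read when wrapped in parentheses with one method per line. The sketch below is the same computation as above, split up so that each intermediate step is visible in the comments:

```python
import pandas as pd

s1 = pd.Series(data=range(4), index=["A", "B", "C", "D"])

result = (
    s1.add(3.141)   # 3.141, 4.141, 5.141, 6.141
      .pow(2)       # square each value
      .astype(int)  # drop the decimals: 9, 17, 26, 37
      .mean()       # average of those four integers
)
print(result)  # 22.25
```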

### Data types

- Series can hold all the data types (`dtypes`) you're used to
- e.g., `int`, `float`, `bool`, etc
- There are a few other special data types too (`object`, `datetime64` and `category`) which we'll talk about in this and later lectures
- You can always read more about pandas dtypes [in the documentation too](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes)
- For example, here's a series of `dtype` int64:

In [39]:
x = pd.Series(range(5))
x.dtype

dtype('int64')

- The dtype "`object`" is used for series of strings or mixed data

In [68]:
x = pd.Series(['A', 'B'])
x

0    A
1    B
dtype: object
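
To see how a dtype gets chosen, it can help to compare a few inputs side by side. In this sketch (the example values are made up), integers stay `int64`, adding a float or a missing value upcasts to `float64`, and mixing numbers with strings falls back to `object`:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_float = pd.Series([1, 2, 3.5])
with_missing = pd.Series([1, 2, np.nan])
mixed = pd.Series([1, "A", 2.5])

for name, s in [("ints", ints), ("with_float", with_float),
                ("with_missing", with_missing), ("mixed", mixed)]:
    print(name, s.dtype)
```

With `object`, each element is stored as a generic Python object, so numeric operations on such a series are slower and can fail on the non-numeric entries.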

## Pandas DataFrames
---

### What are DataFrames?

- Pandas DataFrames are your new best friend
- They are like the Excel spreadsheets you may be used to, and like dataframes/tibbles in R
- DataFrames are really just Series stuck together!
- Think of a DataFrame as a dictionary of series, with the "keys" being the column labels and the "values" being the series data



![](img/lecture6/dataframe.png)
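
The "dictionary of series" picture can be checked with a small sketch. The frame here is a made-up example; selecting a column returns a Series that shares the DataFrame's row labels, and `.keys()` returns the column labels:

```python
import pandas as pd

df = pd.DataFrame({"C1": [1, 2, 3], "C2": ["A", "B", "C"]},
                  index=["R1", "R2", "R3"])

# Pulling out a column gives back a Series with the same row labels
col = df["C1"]
print(type(col))
print(col.index.equals(df.index))

# Like a dictionary, the "keys" are the column labels
print(list(df.keys()))
```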

### Creating DataFrames

- Dataframes can be created using `pd.DataFrame()` (note the capital "D" and "F")
- Like series, the rows and columns of a dataframe are labelled with integers starting from 0 by default

In [80]:
pd.DataFrame([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

| |0|1|2|
|---|---|---|---|
|0|1|2|3|
|1|4|5|6|
|2|7|8|9|


- We can use the `index` and `columns` arguments to give them labels:

In [81]:
pd.DataFrame([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]],
             index=["R1", "R2", "R3"],
             columns=["C1", "C2", "C3"])

| |C1|C2|C3|
|---|---|---|---|
|R1|1|2|3|
|R2|4|5|6|
|R3|7|8|9|


- There are so many ways to create dataframes
- I most often create them from dictionaries or ndarrays

In [82]:
pd.DataFrame({"C1": [1, 2, 3],
              "C2": ['A', 'B', 'C']},
             index=["R1", "R2", "R3"])

| |C1|C2|
|---|---|---|
|R1|1|A|
|R2|2|B|
|R3|3|C|


In [87]:
pd.DataFrame(np.array([['Arman', 7], ['Mike', 15], ['Tiffany', 3]]))

| |0|1|
|---|---|---|
|0|Arman|7|
|1|Mike|15|
|2|Tiffany|3|


- But here's a table of the main ways you can create dataframes
- See the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe) for more:

|Create DataFrame from|Code|
|---|---|
|Lists of lists|`pd.DataFrame([['Arman', 7], ['Mike', 15], ['Tiffany', 3]])`|
|ndarray|       `pd.DataFrame(np.array([['Arman', 7], ['Mike', 15], ['Tiffany', 3]]))`|
|Dictionary|    `pd.DataFrame({"Name": ['Arman', 'Mike', 'Tiffany'], "Number": [7, 15, 3]})`|
|List of tuples|`pd.DataFrame(zip(['Arman', 'Mike', 'Tiffany'], [7, 15, 3]))`|
|Series|        `pd.DataFrame({"Name": pd.Series(['Arman', 'Mike', 'Tiffany']), "Number": pd.Series([7, 15, 3])})`|
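
One caveat about the ndarray row: a NumPy array holds a single dtype, so mixing names and numbers inside `np.array()` turns the numbers into strings before Pandas ever sees them. A short sketch comparing it with the list-of-lists version (the column names are added here for readability):

```python
import numpy as np
import pandas as pd

rows = [["Arman", 7], ["Mike", 15], ["Tiffany", 3]]

from_lists = pd.DataFrame(rows, columns=["Name", "Number"])
from_array = pd.DataFrame(np.array(rows), columns=["Name", "Number"])

# From lists, "Number" is inferred as an integer column
print(from_lists["Number"].sum())

# From the ndarray, each "number" is really a string such as '7'
print(type(from_array.loc[0, "Number"]))
```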


### Indexing and slicing DataFrames

- There are several main ways to select data from a DataFrame:
    1. `[]`
    2. `.loc[]`
    3. `.iloc[]`
    4. Boolean indexing

In [89]:
df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})
df

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|
|1|Mike|Python|512|
|2|Tiffany|R|523|


#### Indexing with `[]`
- Select columns by single labels, lists of labels, or slices

In [90]:
df['Name']  # returns a series

0      Arman
1       Mike
2    Tiffany
Name: Name, dtype: object

In [92]:
df[['Name']]  # returns a dataframe!

| |Name|
|---|---|
|0|Arman|
|1|Mike|
|2|Tiffany|


In [93]:
df[['Name', 'Language']]

| |Name|Language|
|---|---|---|
|0|Arman|Python|
|1|Mike|Python|
|2|Tiffany|R|


- With `[]`, rows can only be selected using slices, not single values (this is not recommended; see the preferred methods below)

In [94]:
df[0] # doesn't work

KeyError: 0

In [95]:
df[0:1] # does work

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|


In [96]:
df[1:] # does work

| |Name|Language|Courses|
|---|---|---|---|
|1|Mike|Python|512|
|2|Tiffany|R|523|


#### Indexing with `.loc` and `.iloc`
- Pandas provides the indexers `.loc[]` and `.iloc[]` as more flexible alternatives for accessing data from a dataframe
- Indexing with integer positions: `df.iloc[]`
- Indexing with labels: `df.loc[]`
- These are typically the [recommended methods of indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated)
- **Note**: when you want to modify data in a dataframe you *should* use `.loc` or `.iloc` as opposed to `[]`, or you may run into the common Pandas `SettingWithCopyWarning`, which we'll look at next lecture
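
As a preview of that last point, here is a minimal sketch of an assignment done in a single `.loc[]` call. The frame matches the one defined above, and the new course number 524 is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})

# Select the rows and the column in one step, then assign
df.loc[df["Name"] == "Tiffany", "Courses"] = 524

# The chained form df[df["Name"] == "Tiffany"]["Courses"] = 524 is the
# pattern that tends to trigger the warning, so it is avoided here
print(df)
```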

In [97]:
df

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|
|1|Mike|Python|512|
|2|Tiffany|R|523|


- First we'll try out `.iloc` which accepts *integers* as references to rows/columns

In [100]:
df.iloc[0]  # returns a series

Name         Arman
Language    Python
Courses        511
Name: 0, dtype: object

In [101]:
df.iloc[0:2]  # slicing returns a dataframe

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|
|1|Mike|Python|512|


In [102]:
df.iloc[2, 1]  # returns the indexed object

'R'

In [106]:
df.iloc[[0, 2], [1, 2]]  # returns a dataframe

| |Language|Courses|
|---|---|---|
|0|Python|511|
|2|R|523|


- Now let's look at `.loc` which accepts *labels* as references to rows/columns

In [107]:
df.loc[:, 'Name']

0      Arman
1       Mike
2    Tiffany
Name: Name, dtype: object

In [108]:
df.loc[:, 'Name':'Language']

| |Name|Language|
|---|---|---|
|0|Arman|Python|
|1|Mike|Python|
|2|Tiffany|R|


In [111]:
df.loc[[0, 2], ['Language']]

| |Language|
|---|---|
|0|Python|
|2|R|
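
One difference that is easy to miss: slices in `.iloc[]` exclude the end position, like ordinary Python slices, while slices in `.loc[]` include the end label (which is why `'Name':'Language'` above returned both columns). A quick sketch with the same frame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})

by_position = df.iloc[0:1]  # end position excluded -> 1 row
by_label = df.loc[0:1]      # end label included -> 2 rows

print(len(by_position), len(by_label))  # 1 2
```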


- Sometimes we want to use a mix of integers and labels to reference data in a dataframe
- The easiest way to do this is to use `.loc[]` with a label, then use an integer in combination with `.index` or `.columns`

In [112]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [113]:
df.columns

Index(['Name', 'Language', 'Courses'], dtype='object')

In [119]:
df.loc[df.index[0], 'Courses']  # I want to reference the first row and the column named "Courses"

511

In [120]:
df.loc[2, df.columns[1]]  # I want to reference row "2" and the second column

'R'
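
Going the other way is also possible: `.columns.get_loc()` converts a column label into its integer position, so the whole lookup can be done with `.iloc[]`. This is an alternative sketch, not the approach used in the cells above:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})

# Translate the label "Courses" into its position (2), then index by position
col_position = df.columns.get_loc("Courses")
value = df.iloc[0, col_position]
print(col_position, value)  # 2 511
```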

#### Boolean indexing
- Just like with series, we can select data based on boolean masks

In [121]:
df

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|
|1|Mike|Python|512|
|2|Tiffany|R|523|


In [123]:
df[df['Courses'] > 511]

| |Name|Language|Courses|
|---|---|---|---|
|1|Mike|Python|512|
|2|Tiffany|R|523|


In [124]:
df[df['Name'] == "Arman"]

| |Name|Language|Courses|
|---|---|---|---|
|0|Arman|Python|511|
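
Conditions can also be combined. In this sketch on the same frame, `&` means "and" and `|` means "or"; each condition needs its own parentheses because of operator precedence:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})

# Rows where both conditions hold
mask = (df["Language"] == "Python") & (df["Courses"] > 511)
print(df[mask])

# A mask also works as the row selector in .loc, alongside column labels
names = df.loc[mask, "Name"]
print(names)
```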


#### Indexing cheatsheet

|Method|Syntax|Output|
|---|---|---|
|Select column|`df[col_label]`|Series|
|Select row slice|`df[row_1_int:row_2_int]`|DataFrame|
|Select row/column by label|`df.loc[row_label(s), col_label(s)]`|Object for single selection, Series for one row/column, otherwise DataFrame|
|Select row/column by integer|`df.iloc[row_int(s), col_int(s)]`|Object for single selection, Series for one row/column, otherwise DataFrame|
|Select by row integer & column label|`df.loc[df.index[row_int], col_label]`|Object for single selection, Series for one row/column, otherwise DataFrame|
|Select by row label & column integer|`df.loc[row_label, df.columns[col_int]]`|Object for single selection, Series for one row/column, otherwise DataFrame|
|Select by boolean|`df[bool_vec]`|DataFrame|
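
The output types of these selections can be checked directly. This sketch rebuilds the small frame from earlier and prints the type returned by several of the forms in the table:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Language": ["Python", "Python", "R"],
                   "Courses": [511, 512, 523]})

selections = {
    "df['Name']": df["Name"],
    "df[0:2]": df[0:2],
    "df.loc[0, 'Name']": df.loc[0, "Name"],
    "df.loc[0]": df.loc[0],
    "df.iloc[[0, 2], [1, 2]]": df.iloc[[0, 2], [1, 2]],
    "df[df['Courses'] > 511]": df[df["Courses"] > 511],
}

# Print the name of the type each selection returns
for syntax, result in selections.items():
    print(f"{syntax:<26} -> {type(result).__name__}")
```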

### Reading/Writing data from external sources

#### `.csv` files

- A lot of the time you will be loading `.csv` files for use in pandas
- You can use the `pd.read_csv()` function for this
- Here we'll take a look at the [historical weather data of the Vancouver International Airport](https://climate.weather.gc.ca/historical_data/search_historic_data_e.html)
- `pd.read_csv()` takes many arguments that help you read in a .csv file efficiently and appropriately; feel free to check them out now (by using `shift + tab` in Jupyter, or typing `?pd.read_csv`)

In [215]:
path = 'data/YVR_weather_data.csv'
df = pd.read_csv(path, index_col=0, parse_dates=True)
df

|Date/Time|Year|Month|Mean Max Temp (°C)|Mean Max Temp Flag|Mean Min Temp (°C)|Mean Min Temp Flag|Mean Temp (°C)|Mean Temp Flag|Extr Max Temp (°C)|Extr Max Temp Flag|...|Total Snow (cm)|Total Snow Flag|Total Precip (mm)|Total Precip Flag|Snow Grnd Last Day (cm)|Snow Grnd Last Day Flag|Dir of Max Gust (10's deg)|Dir of Max Gust Flag|Spd of Max Gust (km/h)|Spd of Max Gust Flag|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|Jan-37|1937|1|0.6| |-8.1| |-3.8| |6.1| |...| |M| |M| | | | | | |
|Feb-37|1937|2|5.2| |-1.3| |2.0| |10.0|S|...| |M| |M| | | | | | |
|Mar-37|1937|3|11.7| |2.9| |7.3| |17.2| |...|0.0| |59.7| | | | | | | |
|Apr-37|1937|4|11.9| |4.8| |8.4| |16.1| |...|0.0| |114.0| | | | | | | |
|May-37|1937|5|16.3| |6.6| |11.5| |20.6| |...|0.0| |44.2| | | | | | | |
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|Feb-13|2013|2|7.8| |3.0| |5.4| |10.4| |...|0.0| |74.4| |0.0| |28.0|E|63.0|E|
|Mar-13|2013|3|10.5| |3.9| |7.2| |15.7| |...|0.0| |108.0| |0.0| |27.0| |80.0| |
|Apr-13|2013|4|12.8| |6.2| |9.5| |17.2| |...|0.0|T|115.8| |0.0| |29.0|E|76.0|E|
|May-13|2013|5|17.1| |9.5| |13.3| |22.2| |...|0.0| |66.0| |0.0| |30.0|E|54.0|E|
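
A few of the `pd.read_csv()` arguments can be tried on a tiny in-memory CSV, so the example runs without the data file. The text below is invented for illustration and only loosely mimics the weather data (in particular, using "M" as a value to be treated as missing is an assumption made for this sketch):

```python
import io
import pandas as pd

csv_text = """Date/Time,Year,Month,Mean Temp (°C),Total Snow (cm)
Jan-37,1937,1,-3.8,M
Feb-37,1937,2,2.0,M
Mar-37,1937,3,7.3,0.0
"""

weather = pd.read_csv(
    io.StringIO(csv_text),  # a path or url would go here instead
    usecols=["Date/Time", "Mean Temp (°C)", "Total Snow (cm)"],  # keep only these columns
    index_col="Date/Time",  # use this column as the row labels
    na_values=["M"],        # treat "M" as a missing value
)
print(weather)
```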


- You can write a dataframe to a .csv file using `df.to_csv()`
- Be sure to check out all of the possible arguments to write your dataframe exactly how you want it
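
As a small sketch of the round trip (the frame is a made-up example), calling `.to_csv()` without a path returns the CSV text instead of writing a file, and `index=False` leaves the row labels out:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike", "Tiffany"],
                   "Courses": [511, 512, 523]})

# With no path given, to_csv() returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading that text back recovers the same columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

Passing a filename, e.g. `df.to_csv("my_file.csv")`, writes to disk instead; the filename here is hypothetical.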

#### url

- Pandas also facilitates reading directly from a url
- `pd.read_csv()` accepts urls as input
- For example, take a look at 10 random rows of the [WHO Covid Situation Report](https://github.com/CSSEGISandData/COVID-19/tree/master/who_covid_19_situation_reports):

In [174]:
url = 'https://covid19.who.int/WHO-COVID-19-global-data.csv'
pd.read_csv(url).sample(10)

| |Date_reported|Country_code|Country|WHO_region|New_cases|Cumulative_cases|New_deaths|Cumulative_deaths|
|---|---|---|---|---|---|---|---|---|
|96352|2021-04-22|PK|Pakistan|EMRO|5499|772381|147|16600|
|49995|2021-07-05|GL|Greenland|EURO|1|51|0|0|
|137478|2021-08-22|VI|United States Virgin Islands|AMRO|38|5594|1|45|
|66063|2020-12-04|KE|Kenya|AFRO|1253|86383|16|1500|
|19777|2021-04-28|BF|Burkina Faso|AFRO|0|13263|0|156|
|114601|2020-02-03|RS|Serbia|EURO|0|0|0|0|
|97796|2020-04-22|PG|Papua New Guinea|WPRO|0|7|0|0|
|83054|2021-03-21|MN|Mongolia|WPRO|148|4806|1|5|
|102727|2020-08-07|QA|Qatar|EMRO|287|112092|0|178|
|73741|2020-06-26|LU|Luxembourg|EURO|19|3335|0|110|


#### Other
- Pandas can read data from all sorts of other file types including HTML, JSON, Excel, Parquet, Feather, etc
- There are generally dedicated functions for reading these file types, see the [Pandas documentation here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-tools-text-csv-hdf5)
- You'll explore some of these file types in your lab
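
The other readers follow the same pattern as `pd.read_csv()`. As one hedged example, here is `pd.read_json()` applied to JSON text produced by `.to_json()`; the frame is made up, and an in-memory buffer stands in for a real file:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Arman", "Mike"], "Courses": [511, 512]})

# orient="records" writes one JSON object per row
json_text = df.to_json(orient="records")
print(json_text)

# Read the JSON text back into a dataframe
from_json = pd.read_json(io.StringIO(json_text))
print(from_json)
```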

### Common DataFrame operations

- DataFrames have built-in methods for performing most common operations, e.g., `.sample()`, `.min()`, `.idxmin()`, `.sort_values()`, etc
- They're all documented in the [Pandas documentation here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)
- I'll demonstrate a few below

In [216]:
df = pd.read_csv('data/YVR_weather_data.csv')
df

| |Date/Time|Year|Month|Mean Max Temp (°C)|Mean Max Temp Flag|Mean Min Temp (°C)|Mean Min Temp Flag|Mean Temp (°C)|Mean Temp Flag|Extr Max Temp (°C)|...|Total Snow (cm)|Total Snow Flag|Total Precip (mm)|Total Precip Flag|Snow Grnd Last Day (cm)|Snow Grnd Last Day Flag|Dir of Max Gust (10's deg)|Dir of Max Gust Flag|Spd of Max Gust (km/h)|Spd of Max Gust Flag|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|Jan-37|1937|1|0.6| |-8.1| |-3.8| |6.1|...| |M| |M| | | | | | |
|1|Feb-37|1937|2|5.2| |-1.3| |2.0| |10.0|...| |M| |M| | | | | | |
|2|Mar-37|1937|3|11.7| |2.9| |7.3| |17.2|...|0.0| |59.7| | | | | | | |
|3|Apr-37|1937|4|11.9| |4.8| |8.4| |16.1|...|0.0| |114.0| | | | | | | |
|4|May-37|1937|5|16.3| |6.6| |11.5| |20.6|...|0.0| |44.2| | | | | | | |
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|913|Feb-13|2013|2|7.8| |3.0| |5.4| |10.4|...|0.0| |74.4| |0.0| |28.0|E|63.0|E|
|914|Mar-13|2013|3|10.5| |3.9| |7.2| |15.7|...|0.0| |108.0| |0.0| |27.0| |80.0| |
|915|Apr-13|2013|4|12.8| |6.2| |9.5| |17.2|...|0.0|T|115.8| |0.0| |29.0|E|76.0|E|
|916|May-13|2013|5|17.1| |9.5| |13.3| |22.2|...|0.0| |66.0| |0.0| |30.0|E|54.0|E|


In [230]:
df.sample(5)

| |Date/Time|Year|Month|Mean Max Temp (°C)|Mean Max Temp Flag|Mean Min Temp (°C)|Mean Min Temp Flag|Mean Temp (°C)|Mean Temp Flag|Extr Max Temp (°C)|...|Total Snow (cm)|Total Snow Flag|Total Precip (mm)|Total Precip Flag|Snow Grnd Last Day (cm)|Snow Grnd Last Day Flag|Dir of Max Gust (10's deg)|Dir of Max Gust Flag|Spd of Max Gust (km/h)|Spd of Max Gust Flag|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|812|Sep-04|2004|9|17.7| |11.2| |14.5| |22.4|...|0.0| |169.4| |0.0| |27.0| |56.0| |
|111|Apr-46|1946|4|12.3| |4.9| |8.6| |17.8|...|0.0| |127.0| | | | | | | |
|630|Jul-89|1989|7|21.8| |12.8| |17.3| |24.4|...|0.0| |34.1| |0.0| |29.0| |46.0| |
|399|Apr-70|1970|4|11.7| |4.1| |7.9| |15.6|...|0.0|T|112.3| |0.0| |27.0| |72.0| |
|375|Apr-68|1968|4|12.3| |4.5| |8.4| |18.9|...|0.8| |33.0| |0.0| |32.0| |68.0| |


In [218]:
df.max(numeric_only=True)

Year                          2013.0
Month                           12.0
Mean Max Temp (°C)              25.6
Mean Min Temp (°C)              15.6
Mean Temp (°C)                  20.6
Extr Max Temp (°C)              34.4
Extr Min Temp (°C)              13.2
Total Rain (mm)                350.8
Total Snow (cm)                121.9
Total Precip (mm)              350.8
Snow Grnd Last Day (cm)         48.0
Dir of Max Gust (10's deg)      32.0
Spd of Max Gust (km/h)         129.0
dtype: float64

In [220]:
df['Extr Max Temp (°C)'].max()

34.4

In [221]:
df['Extr Max Temp (°C)'].idxmax()

870

In [222]:
df.iloc[870]

Date/Time                     Jul-09
Year                            2009
Month                              7
Mean Max Temp (°C)              24.1
Mean Max Temp Flag               NaN
Mean Min Temp (°C)              15.0
Mean Min Temp Flag               NaN
Mean Temp (°C)                  19.6
Mean Temp Flag                   NaN
Extr Max Temp (°C)              34.4
Extr Max Temp Flag               NaN
Extr Min Temp (°C)              10.6
Extr Min Temp Flag               NaN
Total Rain (mm)                 20.0
Total Rain Flag                  NaN
Total Snow (cm)                  0.0
Total Snow Flag                  NaN
Total Precip (mm)               20.0
Total Precip Flag                NaN
Snow Grnd Last Day (cm)          0.0
Snow Grnd Last Day Flag          NaN
Dir of Max Gust (10's deg)      30.0
Dir of Max Gust Flag               B
Spd of Max Gust (km/h)          48.0
Spd of Max Gust Flag               B
Name: 870, dtype: object
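
Note that `.idxmax()` returns an index *label*, not a position. With the default 0-based index the two coincide, which is why `.iloc[870]` works above, but with any other index `.loc[]` is the safer choice. A sketch with a small made-up frame:

```python
import pandas as pd

temps = pd.DataFrame({"Extr Max Temp (°C)": [24.4, 34.4, 31.9]},
                     index=["Jul-08", "Jul-09", "Jul-10"])

# idxmax() gives the label of the row holding the maximum
hottest_label = temps["Extr Max Temp (°C)"].idxmax()
print(hottest_label)

# Look the row up by that label
print(temps.loc[hottest_label])
```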

In [223]:
df.mean(axis=0, numeric_only=True)

Year                          1974.751634
Month                            6.480392
Mean Max Temp (°C)              13.694547
Mean Min Temp (°C)               6.319193
Mean Temp (°C)                  10.030643
Extr Max Temp (°C)              19.160742
Extr Min Temp (°C)               1.001309
Total Rain (mm)                 90.259235
Total Snow (cm)                  3.799890
Total Precip (mm)               93.862144
Snow Grnd Last Day (cm)          0.420655
Dir of Max Gust (10's deg)      23.188427
Spd of Max Gust (km/h)          61.519288
dtype: float64
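
The `axis` argument controls the direction of the calculation. As a sketch on two rows and two columns copied from the weather data, `axis=0` gives one mean per column (as above) and `axis=1` gives one mean per row:

```python
import pandas as pd

temps = pd.DataFrame({"Mean Max Temp (°C)": [0.6, 5.2],
                      "Mean Min Temp (°C)": [-8.1, -1.3]},
                     index=["Jan-37", "Feb-37"])

col_means = temps.mean(axis=0)  # one value per column
row_means = temps.mean(axis=1)  # one value per row

print(col_means)
print(row_means)
```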

- Some methods require arguments to be specified, like `.sort_values()`

In [224]:
df.sort_values(by='Extr Max Temp (°C)', ascending=False)

| |Date/Time|Year|Month|Mean Max Temp (°C)|Mean Max Temp Flag|Mean Min Temp (°C)|Mean Min Temp Flag|Mean Temp (°C)|Mean Temp Flag|Extr Max Temp (°C)|...|Total Snow (cm)|Total Snow Flag|Total Precip (mm)|Total Precip Flag|Snow Grnd Last Day (cm)|Snow Grnd Last Day Flag|Dir of Max Gust (10's deg)|Dir of Max Gust Flag|Spd of Max Gust (km/h)|Spd of Max Gust Flag|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|870|Jul-09|2009|7|24.1| |15.0| |19.6| |34.4|...|0.0| |20.0| |0.0| |30.0|B|48.0|B|
|283|Aug-60|1960|8|20.3| |13.0| |16.7| |33.3|...|0.0| |67.6| |0.0| |27.0| |58.0| |
|738|Jul-98|1998|7|23.3| |15.3| |19.3| |31.9|...|0.0| |39.8| |0.0| |30.0|E|41.0|E|
|643|Aug-90|1990|8|23.6| |14.4| |19.0| |31.9|...|0.0| |38.0| |0.0| |11.0| |43.0| |
|294|Jul-61|1961|7|23.3| |14.3| |18.8| |31.7|...|0.0| |37.3| |0.0| |27.0| |53.0| |
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|...|
|229|Feb-56|1956|2|4.2| |-0.9| |1.7| |7.8|...|35.8| |110.2| |0.0| | | | | |
|156|Jan-50|1950|1|-2.9| |-9.7| |-6.3| |7.8|...|94.0| |138.2| |20.0| | | | | |
|384|Jan-69|1969|1|0.1| |-5.8| |-2.9| |6.7|...|64.8| |126.7| |23.0| |29.0| |56.0| |
|0|Jan-37|1937|1|0.6| |-8.1| |-3.8| |6.1|...| |M| |M| | | | | | |


In [225]:
df.sort_values(by='Extr Max Temp (°C)', ascending=False).loc[:, ['Date/Time', 'Extr Max Temp (°C)']]

| |Date/Time|Extr Max Temp (°C)|
|---|---|---|
|870|Jul-09|34.4|
|283|Aug-60|33.3|
|738|Jul-98|31.9|
|643|Aug-90|31.9|
|294|Jul-61|31.7|
|...|...|...|
|229|Feb-56|7.8|
|156|Jan-50|7.8|
|384|Jan-69|6.7|
|0|Jan-37|6.1|
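
When only the top few rows are needed, `.head()` can be chained onto the sort, or `.nlargest()` can do both steps at once. A small sketch using four rows copied from the output above, deliberately placed out of order:

```python
import pandas as pd

temps = pd.DataFrame({"Date/Time": ["Jan-37", "Jul-98", "Jul-09", "Aug-60"],
                      "Extr Max Temp (°C)": [6.1, 31.9, 34.4, 33.3]})

# Sort descending, then keep the first two rows
top2_sorted = temps.sort_values(by="Extr Max Temp (°C)", ascending=False).head(2)

# nlargest() returns the rows with the two largest values directly
top2_nlargest = temps.nlargest(2, "Extr Max Temp (°C)")

print(top2_sorted)
print(top2_nlargest)
```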


- Some methods will operate on the index/columns, like `.sort_index()`

In [229]:
df.sample(20).sort_index()

| |Date/Time|Year|Month|Mean Max Temp (°C)|Mean Max Temp Flag|Mean Min Temp (°C)|Mean Min Temp Flag|Mean Temp (°C)|Mean Temp Flag|Extr Max Temp (°C)|...|Total Snow (cm)|Total Snow Flag|Total Precip (mm)|Total Precip Flag|Snow Grnd Last Day (cm)|Snow Grnd Last Day Flag|Dir of Max Gust (10's deg)|Dir of Max Gust Flag|Spd of Max Gust (km/h)|Spd of Max Gust Flag|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|35|Dec-39|1939|12|9.8| |4.2| |7.0| |13.9|...|0.0|T|217.4| | | | | | | |
|60|Jan-42|1942|1|6.3| |-0.6| |2.9| |11.7|...|3.0| |64.5| | | | | | | |
|81|Oct-43|1943|10|14.5| |7.0| |10.8| |21.1|...|0.0| |128.3| | | | | | | |
|111|Apr-46|1946|4|12.3| |4.9| |8.6| |17.8|...|0.0| |127.0| | | | | | | |
|144|Jan-49|1949|1|2.2| |-5.0| |-1.4| |8.3|...|9.7| |18.3| |3.0| | | | | |
|157|Feb-50|1950|2|6.6| |0.4| |3.5| |11.1|...|3.8| |205.0| |0.0| | | | | |
|193|Feb-53|1953|2|8.0| |1.7| |4.9| |11.1|...|0.0|T|72.6| |0.0| | | | | |
|280|May-60|1960|5|15.4| |8.6| |12.0| |21.7|...|0.0| |87.1| |0.0| |9.0| |55.0|S|
|308|Sep-62|1962|9|19.0| |10.2| |14.6| |23.9|...|0.0| |64.3| |0.0| |29.0| |51.0|S|
|333|Oct-64|1964|10|13.9| |5.2| |9.6| |21.1|...|0.0| |42.7| |0.0| |29.0| |51.0|S|
