---
# Row and Column Selection
Selecting rows and columns of a **dataframe** or **series**.

---

In [None]:
import pandas as pd
import numpy as np
from IPython.display import display

---
## Selecting rows and columns in a dataframe
Rows and columns are commonly selected with the `.loc[]` or `.iloc[]` indexer, as follows:

---

### .loc[]
```
df.loc[<row_indexer>, <column_indexer>]
```

Where:  
 `row_indexer` can be a boolean filter, a row label (index value), a list of labels, or a label slice.  
 `column_indexer` is a column label, a list of labels, or a label slice.
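
As a minimal sketch of both kinds of row indexer, the snippet below uses a small made-up frame (`scores`, with hypothetical labels `r1` to `r3`) that is not part of this notebook's data:

```python
import pandas as pd

# Hypothetical frame, for illustration only
scores = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cy"], "score": [82, 47, 91]},
    index=["r1", "r2", "r3"],
)

# Row label + column label -> a single value
one_value = scores.loc["r2", "name"]

# Boolean filter as the row indexer, list of labels as the column indexer
passed = scores.loc[scores["score"] >= 50, ["name", "score"]]

print(one_value)
print(passed)
```

The filter is just a boolean series aligned with the rows, so any comparison on a column can be used in its place.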

---

### .iloc[]
```
df.iloc[<row_indexer>, <column_indexer>]
```

Where:  
 `row_indexer` is the row's integer position.  
 `column_indexer` is the column's integer position, not its label.
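
A short sketch of position-based selection, again on a made-up frame (`scores` is hypothetical). Note that `iloc` takes integer positions for both rows and columns; `Index.get_loc` is one way to look up a column's position from its label:

```python
import pandas as pd

# Hypothetical frame, for illustration only
scores = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cy"], "score": [82, 47, 91]},
    index=["r1", "r2", "r3"],
)

# Row position 1, column position 0
by_position = scores.iloc[1, 0]

# Negative positions count from the end, as with lists
last_row = scores.iloc[-1]

# Translate a column label into its integer position
score_pos = scores.columns.get_loc("score")
by_lookup = scores.iloc[0, score_pos]

print(by_position, last_row["name"], by_lookup)
```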

---

In [None]:
# A pandas dataframe is like a dict of columns, except it has much more functionality:
# Python dict
people = {
    "first name": ["Klint", "Foo", "Cat"],
    "last name": ["Labadia", "Bar", "Dog"],
    "email": ["ckl@a.a", "foobar@a.a", "catdog@a.a"],
    "age": [25, 19, 7],
}

# Convert dict to a dataframe
people_df = pd.DataFrame(people)
people_df


In [None]:
# Viewing df columns
display(people_df.columns)

# Use dtypes to see data types of columns
people_df.dtypes


In [None]:
## Selecting Columns

# Single column by bracket (preferred)
display(people_df["email"])
# By dot notation (does not work for column names containing spaces
# or names that clash with DataFrame attributes)
display(people_df.email)

# Multiple columns
people_df[["first name", "email"]]
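
---
Two details of bracket selection are easy to miss, so here is a small sketch on a made-up frame (`pets` and its column names are hypothetical): a single label returns a series while a list of labels returns a dataframe, and dot notation can silently pick up a DataFrame method instead of a column.

```python
import pandas as pd

# Hypothetical frame; "count" is chosen because it clashes with DataFrame.count
pets = pd.DataFrame({"name": ["Rex", "Tom"], "count": [2, 5]})

as_series = pets["name"]    # single label -> Series
as_frame = pets[["name"]]   # list of labels -> DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame

# Dot notation resolves to the DataFrame.count method, not the "count" column
print(callable(pets.count))    # True
print(pets["count"].tolist())  # [2, 5]
```

---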


In [None]:
## Selecting Rows using iloc (index location)

a = people_df.iloc[0]

# Multiple rows
x = people_df.iloc[[0, 2]]  # Index 0 and index 2

# Note that slicing in iloc behaves like normal list slicing:
# the start is included but the stop is excluded. loc, by contrast,
# includes both endpoints. I think this is because, with labels,
# there is no obvious "next label" to use as an exclusive stop.

y = people_df.iloc[0:2]  # Index 0 through 1

display(a)
display(x)
display(y)


In [None]:
## Selecting Column from Row using iloc

# 1 = row, 2 = column
# i.e. email (column) of entry at index 1 (row)
x = people_df.iloc[1, 2]

# Example 2 - First name (0) and last name (1) of entries at index 0 to 1 (0:2)
y = people_df.iloc[0:2, [0, 1]]

display(x)
display(y)


In [None]:
# Change the index of people_df to strings so that
# we can practice row selection using loc.
# More on indices in the next notebook.

people_df2 = people_df.copy()
people_df2.index = list("ABC")
people_df2


In [None]:
## Selecting Rows using loc (by label)

x = people_df2.loc["A"]

# Multiple rows
y = people_df2.loc[["A", "C"]]  # Index "A" and index "C"

# Note that slicing in loc includes both endpoints,
# unlike iloc, which excludes the stop position.

# Index "A" through index "C" ("C" included, unlike iloc)
z = people_df2.loc["A":"C"]

display(x)
display(y)
display(z)
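
---
The endpoint difference is easiest to see side by side. The sketch below uses a made-up frame (`letters` is hypothetical) and also shows that the difference remains when the index is the default integer one, where `loc` and `iloc` slices look identical but return different row counts.

```python
import pandas as pd

# Hypothetical frame, for illustration only
letters = pd.DataFrame({"value": [10, 20, 30, 40]}, index=list("ABCD"))

by_position = letters.iloc[0:2]  # positions 0 and 1; stop excluded
by_label = letters.loc["A":"C"]  # labels A, B and C; stop included

# Same frame with the default integer index (0, 1, 2, 3)
numbers = letters.reset_index(drop=True)
print(len(numbers.iloc[0:2]))  # 2
print(len(numbers.loc[0:2]))   # 3
```

---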

In [None]:
## Selecting Column from Row using loc

# "A" = index (row), "email" = column
# Note that unlike iloc, the column should be the column name.

# i.e. email (column) of entry at index A (row)
x = people_df2.loc["A", "email"]

# Example 2 - Last name and email of entries index A and B
y = people_df2.loc[["A", "B"], ["last name", "email"]]

display(x)
display(y)


---
## Example from the Stack Overflow dataset
---

In [None]:
# Load csv files as df
df = pd.read_csv("data/survey_results_public_2022.csv")
schema_df = pd.read_csv("data/survey_results_schema.csv")


In [None]:
# Configure display options
pd.set_option("display.max_rows", 80)
pd.set_option("display.max_columns", 80)


In [None]:
# Return the first 5 rows of df
df.head()


In [None]:
df.columns


In [None]:
# First five rows (labels 0 through 4, since loc includes the stop),
# from column "YearsCode" to column "OrgSize"
df.loc[0:4, "YearsCode":"OrgSize"]
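
---
For readers without the CSV files, the same selection can be tried on a stand-in frame. Everything in `survey` below is invented, including the column order, which is an assumption and not the real layout of the survey file. A column label slice follows the order of the columns in the frame and includes both endpoints.

```python
import pandas as pd

# Stand-in for the survey frame; values and column order are invented
survey = pd.DataFrame(
    {
        "Country": ["X"] * 6,
        "YearsCode": [1, 2, 3, 4, 5, 6],
        "YearsCodePro": [0, 1, 2, 3, 4, 5],
        "OrgSize": ["small"] * 6,
        "Age": [20, 21, 22, 23, 24, 25],
    }
)

# Row labels 0 through 4 and columns "YearsCode" through "OrgSize", both inclusive
subset = survey.loc[0:4, "YearsCode":"OrgSize"]
print(subset.shape)  # (5, 3)
```

---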
