# Selection

After creating a `DataFrame`, we can select data **row-wise** or **column-wise**, by **indexing** or **slicing**, and by **label** or **position**.

I have summarized the ways to select data in the table below.

|                     | **Single Column**                                | **Multiple Columns**                                       | **Continuous Columns**              | **All Columns**  |
| ------------------- | ------------------------------------------------ | ---------------------------------------------------------- | ----------------------------------- | ---------------- |
| **Single Row**      | `df.loc[row, column]` or <br> `df.at[row, column]` | `df.loc[row, [column, column]]`                            | `df.loc[row, column:column]`        | `df.loc[row]`    |
| **Multiple Rows**   | `df.loc[[row, row], column]`                     | `df.loc[[row, row], [column, column]]`                     | `df.loc[[row, row], column:column]` | `df.loc[[row, row]]` |
| **Continuous Rows** | `df.loc[row:row, column]`                        | `df.loc[row:row, [column, column]]`                        | `df.loc[row:row, column:column]`    | `df[row:row]`    |
| **All Rows**        | `df[column]`                                     | `df[[column, column]]` or <br> `df.loc[:, [column, column]]` | `df.loc[:, column:column]`          | `df`             |

- `df.iloc` works like `df.loc` but selects by integer position instead of label.
- `df.iat` works like `df.at` but selects by integer position instead of label.
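
To make the label/position correspondence concrete, here is a minimal sketch. It builds the same 6x5 frame used in the rest of this section; the variable names (`by_label`, `block_by_position`, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

# Same 6x5 frame used throughout this section.
df = pd.DataFrame(
    np.arange(30).reshape(6, 5),
    index=list("abcdef"),
    columns=["col1", "col2", "col3", "col4", "col5"],
)

# One cell, addressed by label and by position (row "b" is position 1,
# column "col3" is position 2).
by_label = df.loc["b", "col3"]
by_position = df.iloc[1, 2]

# Scalar accessors: `at` takes labels, `iat` takes integer positions.
scalar_by_label = df.at["b", "col3"]
scalar_by_position = df.iat[1, 2]

# A list of labels and the matching list of positions select the same block.
block_by_label = df.loc[["a", "c"], ["col1", "col5"]]
block_by_position = df.iloc[[0, 2], [0, 4]]

print(by_label, by_position, scalar_by_label, scalar_by_position)
print(block_by_label.equals(block_by_position))
```

All four scalar lookups address the same cell, and the two block selections return identical frames.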

In [106]:
import numpy as np
import pandas as pd

In [107]:
df = pd.DataFrame(
    np.arange(30).reshape(6, 5),
    index=list("abcdef"),
    columns=["col1", "col2", "col3", "col4", "col5"]
    )
df

Unnamed: 0,col1,col2,col3,col4,col5
a,0,1,2,3,4
b,5,6,7,8,9
c,10,11,12,13,14
d,15,16,17,18,19
e,20,21,22,23,24
f,25,26,27,28,29


We will use the `DataFrame` above to demonstrate the selection techniques.

---

## Selecting Directly with `[]`

In [46]:
df["col1"]  # same as `df.col1`

Unnamed: 0,col1
a,0
b,5
c,10
d,15
e,20
f,25


In [89]:
df[["col1", "col2"]]

Unnamed: 0,col1,col2
a,0,1
b,5,6
c,10,11
d,15,16
e,20,21
f,25,26


In [11]:
df[0:3]  # same as `df["a":"c"]`

Unnamed: 0,col1,col2,col3,col4,col5
a,0,1,2,3,4
b,5,6,7,8,9
c,10,11,12,13,14


---

## Selection with `loc` and `at` (select by label)

The first argument to `loc` selects rows, and the second selects columns.

Each argument can take one of three forms:

- single element (e.g., `"a"`)
- list (e.g., `["a", "c", "e"]`)
- slice (e.g., `"a":"e"`)
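
The three forms also determine what kind of object comes back. The sketch below assumes the same frame as in this section and uses illustrative variable names: a single label on both axes gives a scalar, a single label on one axis gives a `Series`, and a list or slice on both axes gives a `DataFrame`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(30).reshape(6, 5),
    index=list("abcdef"),
    columns=["col1", "col2", "col3", "col4", "col5"],
)

# Single label for row and column -> a scalar value.
single = df.loc["a", "col1"]

# List of row labels, single column label -> a Series indexed by the rows.
row_list = df.loc[["a", "c", "e"], "col1"]

# Slices on both axes -> a DataFrame.
sliced = df.loc["a":"e", "col1":"col2"]

print(type(single).__name__)
print(type(row_list).__name__, list(row_list))
print(type(sliced).__name__, sliced.shape)
```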


In [19]:
df.loc["a":"d", ["col1", "col2"]]

Unnamed: 0,col1,col2
a,0,1
b,5,6
c,10,11
d,15,16


In [26]:
df.loc["a", "col5"]
df.at["a", "col5"]  # same value; `at` accepts only a single row label and a single column label

4

---

## Selection with `iloc` and `iat` (select by position)

`iloc` works almost the same as `loc`, but it takes integer positions instead of labels.
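
One difference is easy to miss: a position slice in `iloc` excludes its end, like ordinary Python slicing, while a label slice in `loc` includes its end label. A minimal sketch, assuming the same frame as in this section (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(30).reshape(6, 5),
    index=list("abcdef"),
    columns=["col1", "col2", "col3", "col4", "col5"],
)

# Position slice: the end position (2) is excluded, so only rows 0 and 1.
rows_by_position = df.iloc[0:2]

# Label slice: the end label ("c") is included.
rows_by_label = df.loc["a":"c"]

print(list(rows_by_position.index))
print(list(rows_by_label.index))
```

So `df.iloc[0:2]` returns two rows, whereas `df.loc["a":"c"]` returns three.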

In [31]:
df.iloc[0:2, [0, 3, 4]]

Unnamed: 0,col1,col4,col5
a,0,3,4
b,5,8,9


In [33]:
df.iloc[1, 4]
df.iat[1, 4]

9

---

## Boolean Indexing

In [56]:
df[(df["col1"] > 18)]  # Using a single column's values to select data.

Unnamed: 0,col1,col2,col3,col4,col5
e,20,21,22,23,24
f,25,26,27,28,29


In [58]:
df[(df > 6) & (df < 25)]  # Keep the values that meet the condition; all other values become NaN.

Unnamed: 0,col1,col2,col3,col4,col5
a,,,,,
b,,,7.0,8.0,9.0
c,10.0,11.0,12.0,13.0,14.0
d,15.0,16.0,17.0,18.0,19.0
e,20.0,21.0,22.0,23.0,24.0
f,,,,,
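
The output above contains floats because the masked-out cells are filled with `NaN`, which forces the integer columns to a float dtype. As a small sketch (same frame as in this section, illustrative variable names), indexing with a boolean `DataFrame` gives the same result as `DataFrame.where`, and `dropna(how="all")` is one way to discard rows in which nothing matched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(30).reshape(6, 5),
    index=list("abcdef"),
    columns=["col1", "col2", "col3", "col4", "col5"],
)

# Boolean DataFrame with the same shape as `df`.
mask = (df > 6) & (df < 25)

# Cells where the mask is False become NaN.
masked = df[mask]
same_as_where = df.where(mask)

# Drop the rows in which every cell was masked out.
kept_rows = masked.dropna(how="all")

print(masked.equals(same_as_where))
print(masked.dtypes.unique())
print(list(kept_rows.index))
```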


In [52]:
df[df["col1"].isin([10, 15, 0])]  # Using the isin() method for filtering.

Unnamed: 0,col1,col2,col3,col4,col5
a,0,1,2,3,4
c,10,11,12,13,14
d,15,16,17,18,19


---

# References

- https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html#selection
- https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#column-selection-addition-deletion