# Miscellaneous Subset Selection

An alternative way to select a single column of data is dot notation. Let's read in the `sample_data2.csv` dataset.

In [1]:
import pandas as pd
df = pd.read_csv('sample_data2.csv')
df

       name  average score  max
0      Niko             99  100
1  Penelope            100  102
2      Aria             88   93


Place the name of the column directly after the dot as if it were an attribute.

In [2]:
df.name

0        Niko
1    Penelope
2        Aria
Name: name, dtype: object

This produces the same result as using *just the brackets*.

In [3]:
df['name']

0        Niko
1    Penelope
2        Aria
Name: name, dtype: object

Dot notation is unable to select columns that share a name with a DataFrame method or attribute. For instance, `max` is a method that all DataFrames have. In this particular DataFrame, it is also the name of a column. Attempting to select it via dot notation accesses the method instead of the column.

In [5]:
df.max

<bound method NDFrame._add_numeric_operations.<locals>.max of        name  average score  max
0      Niko             99  100
1  Penelope            100  102
2      Aria             88   93>

To select this column, use *just the brackets*.

In [6]:
df['max']

0    100
1    102
2     93
Name: max, dtype: int64
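
The same limitation applies to any column whose name is not a valid Python identifier, such as `average score`, which contains a space. The sketch below rebuilds the three-row table by hand (values copied from the output shown earlier, so `sample_data2.csv` is not needed) and checks which access styles reach the data. The variable names are only for illustration.

```python
import pandas as pd

# Rebuild the small table by hand so the example does not depend
# on sample_data2.csv being available.
df = pd.DataFrame({
    'name': ['Niko', 'Penelope', 'Aria'],
    'average score': [99, 100, 88],
    'max': [100, 102, 93],
})

# Dot notation works for a plain column name and matches the brackets.
same_name = df.name.equals(df['name'])

# `df.max` resolves to the DataFrame method, not the column.
max_is_method = callable(df.max)

# Brackets return the column regardless of name clashes or spaces.
max_values = df['max'].tolist()
score_values = df['average score'].tolist()

print(same_name, max_is_method, max_values, score_values)
```

There is no dot-notation spelling for `average score` at all, since `df.average score` is not valid Python.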

Dot notation is also unable to select a column through a variable. Let's say we assign the string `'name'`, the name of the first column, to the variable `col`. Attempting to select the column with `df.col` raises an error, because pandas looks for something literally called `col`.

In [7]:
col = 'name'
df.col

AttributeError: 'DataFrame' object has no attribute 'col'

Once again, use *just the brackets*.

In [8]:
df[col]

0        Niko
1    Penelope
2        Aria
Name: name, dtype: object
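
Selecting through a variable matters most when column names come from a list or a loop. As a rough sketch, again using a hand-built copy of the small table (the names `numeric_cols` and `highest` are made up for this example):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Niko', 'Penelope', 'Aria'],
    'average score': [99, 100, 88],
    'max': [100, 102, 93],
})

col = 'name'

# df.col looks for an attribute literally called "col" and fails.
try:
    df.col
    dot_lookup_failed = False
except AttributeError:
    dot_lookup_failed = True

# Brackets evaluate the variable, so names can come from a list.
numeric_cols = ['average score', 'max']
highest = {c: int(df[c].max()) for c in numeric_cols}

print(dot_lookup_failed, df[col].tolist(), highest)
```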

### I do not recommend using slicing with *just the brackets*

Although slicing with *just the brackets* seems simple, I do not recommend using it. The same syntax selects rows either by integer location or by label, depending on what you pass, which makes it ambiguous. I always prefer explicit, unambiguous methods, and both `loc` and `iloc` are explicit and unambiguous.
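
To make the ambiguity concrete, here is a minimal sketch on a made-up four-row DataFrame with string labels. The data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {'score': [99, 100, 88, 75]},
    index=['a', 'b', 'c', 'd'],
)

# Integer slice: interpreted as positions, end point excluded.
by_position = df[1:3]

# String slice: interpreted as labels, end point included.
by_label = df['b':'d']

# The explicit indexers say which rule is being applied.
explicit_position = df.iloc[1:3]
explicit_label = df.loc['b':'d']

print(by_position.index.tolist(), by_label.index.tolist())
```

The two bracket slices look alike but follow different rules, including whether the end point is kept. Writing `iloc` or `loc` puts the rule in the code itself.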

## Selecting a single cell with `at` and `iat`

pandas provides two more rarely seen indexers, `at` and `iat`. These indexers are analogous to `loc` and `iloc` respectively, but only select a single cell of a DataFrame. Since they only select a single cell, you must pass both a row and a column selection, each as a label (`at`) or as an integer location (`iat`). Let's see an example of each.

In [14]:
bikes.at[40, 'temperature']

87.1

In [15]:
bikes.iat[-30, 5]

15.0

Every use of `at` and `iat` can be replaced with `loc` and `iloc` and produces exactly the same result. The `at` and `iat` indexers are optimized to select a single cell of data and therefore provide slightly better performance than `loc` or `iloc`. Let's verify below that the results match.

In [16]:
bikes.loc[40, 'temperature']

87.1

In [17]:
bikes.iloc[-30, 5]

15.0
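
For readers without the `bikes` data at hand, the same equivalence can be checked on a small stand-in. The DataFrame below is hypothetical (the name `rides`, its columns, and its values are made up for this sketch). It also shows that `at` can assign to a single cell.

```python
import pandas as pd

# Hypothetical stand-in for the bikes data; values are made up.
rides = pd.DataFrame(
    {'temperature': [87.1, 73.0, 64.9], 'wind_speed': [15.0, 9.2, 5.8]},
    index=[40, 41, 42],
)

# Label-based single-cell selection.
by_label = rides.at[40, 'temperature']

# Position-based single-cell selection (last row, second column).
by_position = rides.iat[-1, 1]

# loc and iloc return the same values for the same cells.
matches_loc = by_label == rides.loc[40, 'temperature']
matches_iloc = by_position == rides.iloc[-1, 1]

# at also supports assignment to one cell.
rides.at[41, 'temperature'] = 70.0

print(by_label, by_position, matches_loc, matches_iloc)
```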

### Much bigger performance improvement using numpy directly

If you truly wanted a large performance improvement for single-cell selection, you would select directly from numpy arrays and not a pandas DataFrame. Below, the data is extracted into the underlying numpy array with the `values` attribute. We then time the performance of selecting with numpy and also with `iat` and `iloc` on a DataFrame. 

The timing is done using the magic command `%time`. This is a special command only available in a Jupyter Notebook (or IPython shell). The **Wall time** reports the total elapsed time for the operation. On my machine, `iat` shows a negligible improvement over `iloc`, but selecting with numpy is roughly 20x as fast in the run shown below.

In [18]:
values = bikes.values

In [19]:
%time values[-30, 5]

CPU times: user 7 µs, sys: 2 µs, total: 9 µs
Wall time: 11.2 µs


15.0

In [20]:
%time bikes.iat[-30, 5]

CPU times: user 211 µs, sys: 0 ns, total: 211 µs
Wall time: 214 µs


15.0

In [21]:
%time bikes.iloc[-30, 5]

CPU times: user 199 µs, sys: 32 µs, total: 231 µs
Wall time: 235 µs


15.0
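
A single `%time` run can be noisy, so the numbers above should be read as rough. One way to get steadier figures outside a notebook is the standard-library `timeit` module, sketched below. The random DataFrame `frame` is a hypothetical stand-in for `bikes`, and `to_numpy()` is used in place of the `values` attribute; both return the underlying array here. Actual timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the bikes data: 1000 rows of random floats.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((1000, 8)))
values = frame.to_numpy()

# Repeat each selection many times and report the average.
n = 10_000
timings = {
    'numpy': timeit.timeit(lambda: values[-30, 5], number=n),
    'iat': timeit.timeit(lambda: frame.iat[-30, 5], number=n),
    'iloc': timeit.timeit(lambda: frame.iloc[-30, 5], number=n),
}

for label, seconds in timings.items():
    print(f'{label:>5}: {seconds / n * 1e6:.2f} µs per selection')
```

In a notebook, the `%timeit` magic performs the same kind of repeated measurement.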