# Selectors
- The `polars` library includes a `selectors` submodule.
- `selectors` helps create more complex expressions targeting specific columns.
- The common alias for `selectors` is `cs`.

## Introducing the Dataset
- The `spotify` dataset is a collection of popular tracks on the music streaming service Spotify.

- The `pl.col` function is pretty flexible by itself.
- It can target one column or multiple columns. It accepts a variety of inputs.

- We can pass a data type to `col` to target columns by data type.

- We can pass a regular expression to `col` to target columns by pattern match.
- For `pl.col` to treat a string as a regular expression, the string must start with `^` and end with `$`.
- The `^` and `$` anchors mark the beginning and end of the target string.
- The `.` regex symbol matches any character.
- The `+` symbol means one or more of the preceding token, so `.+` matches one or more of any character.
- The regex below evaluates to "look for 1 or more of any character, then the text `name` before the end of the string".
- The regex matches the two columns that end with `name`.

- The `pl.exclude` function creates an expression that rejects columns.
- We can target a column by name, by type, by regular expression, and more.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/expressions/col.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.all.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.exclude.html

## Introducing Selectors
- The `cs` (column selectors) module has 30+ functions for specifying columns.
- The selectors can be applied anywhere expressions are expected (`select`, `filter`, `group_by`, etc.).

- The `cs.by_name` selector targets a column by name.

- Selectors like `by_name` add little over `pl.col`. Their syntax is more verbose.
- Other selectors can simplify targeting columns. For example, the `ends_with` selector is easier to read than a regex passed to `pl.col`.
- The next example uses `cs.ends_with` to target all columns that end with `"name"`.

- The complementary `cs.starts_with` selector targets columns that start with a prefix.

### Further Reading
- https://docs.pola.rs/user-guide/expressions/expression-expansion/#more-flexible-column-selections
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.by_name
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.starts_with
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.ends_with

## Selecting by Data Type
- There are special selectors for targeting columns by data types.
- For example, `cs.numeric` will target all numeric columns (integers and floating-points).

- `pl.col` can target columns by one or more data types but we have to explicitly write out each type.
- Polars has 5 signed integer types (`Int8`, `Int16`, `Int32`, `Int64`, `Int128`), plus unsigned counterparts such as `UInt8` through `UInt64`.
- The `cs.integer` selector targets all integer columns irrespective of their exact integer type.
- The `cs.float` selector targets all floating-point columns.

- Or perhaps we just want to target all numeric columns irrespective of their data type.

- The `cs.date`, `cs.time`, and `cs.datetime` selectors target date, time, and datetime columns.

- The `cs.temporal` selector targets any date, time, datetime, or duration column.

- The `cs.alpha` selector works on column names rather than data types. It targets columns whose names contain only alphabetic characters.
- The selector therefore skips column names that contain underscores or digits.
- The `cs.alphanumeric` selector targets columns whose names contain only alphabetic characters or digits.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.integer
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.float
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.numeric
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.date
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.time
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.temporal
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.alpha
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.alphanumeric

## Selecting by Column Position

- The `cs.by_index` selector targets columns by index position.
- Index positions start counting at 0. The first column is index 0, the second column is index 1, and so on.

- Passing an out-of-range index raises a `ColumnNotFoundError`.

- Pass multiple values to target multiple columns by index.

- Negative values will extract from the end of the `DataFrame`.
- -1 pulls the last column, -3 pulls the third-to-last column and so on.

- We can mix and match positive and negative values.
- The following targets the last column (-1), third-to-last column (-3), and the third column (2).
- Every column name must be unique so Polars will raise an exception if we target the same column twice.

- The `first` and `last` selectors target the first and last columns.
- The `first` method is equivalent to `cs.by_index(0)`.
- The `last` method is equivalent to `cs.by_index(-1)`.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.by_index
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.first
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.last

## Set Operations with Selectors
- Selectors from `cs` support common set operations (union, intersection, difference, symmetric difference, and complement).
- A set is an unordered collection of unique values. Python has a `set` type.
- Set operations combine or compare two sets to produce a new set.

### Union
- We can use symbols to combine selectors. Different symbols apply different set operations.
- The `|` symbol creates a union (either/or) operation between the selectors.
- The union is the combination of two sets' values regardless of whether the value exists in one set or both.
- The next example targets columns that end with `"name"` or store temporal data (or both).

### Intersection
- The `&` symbol performs an intersection (AND). The value must exist in both sets.
- The `cs.contains` selector checks if the column name contains a substring.
- The code below targets columns that hold string values AND whose names contain `"ar"`.
- The selector excludes the `popularity` column. Its name contains `"ar"` but it is not a string column.

### Difference
- The `-` symbol calculates the difference between two sets.
- A difference operation removes entries from one set if they are found in the other set.
- Think of it as "subtracting" values from the set.
- The following example selects string columns _except for_ those that contain `name`.

### Symmetric Difference/Exclusive Or
- The `^` (exclusive or symbol) selects columns that satisfy either one condition or the other but _not_ both.
- Equivalently, the `^` targets values that exist in the first set or the second set but not both sets.
- The next example selects columns that are either strings or contain the text `name` but not both.
- Polars excludes `track_name` and `album_name` because they are string columns that contain `"name"`.

### Exclusion
- The `~` (tilde) symbol performs exclusion/negation.
- The negated selector targets every column that the original selector does not match.
- For example, `~cs.temporal` will select all non-temporal columns.

- We can accomplish the same result with the `cs.exclude` function.
- The `cs.exclude` function accepts other selectors. The `pl.exclude` function will not work here.
- The next example excludes all temporal columns (date/time/datetime).

- Selectors can utilize other selectors.
- The next example excludes the string columns whose names do not contain `"art"`.
- `track_name` and `album_name` are string columns that do not contain `"art"`.
- Polars thus excludes `track_name` and `album_name` from the `DataFrame`.

### Further Reading
- https://docs.pola.rs/user-guide/expressions/expression-expansion/#combining-selectors-with-set-operations
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.string
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.contains
- https://docs.pola.rs/api/python/stable/reference/selectors.html#polars.selectors.temporal

## Complete List of Selectors
- Docs: https://docs.pola.rs/user-guide/expressions/expression-expansion/#complete-reference