# Reshaping

## Wide vs. Long DataFrames
- Wide and long describe two ways of organizing data in a table.
- A variable is a data attribute that can have multiple values.
- Wide `DataFrames` store the same variable across multiple columns.
- Wide `DataFrames` expand horizontally -- their number of columns grows.
- Long `DataFrames` store each variable in a single column.
- Long `DataFrames` expand vertically -- their number of rows grows.

### Example
- The `wide_store_sales.csv` dataset is a wide dataset.
- It stores the same variable (revenue) across multiple columns (`Jan`, `Feb`, etc.).
- As more revenue values arrive for future months, the `DataFrame` will expand in width.

## The unpivot Method to Convert a Wide DataFrame to a Long DataFrame
- The `unpivot` method transforms a `DataFrame` from a wide format to a long format.
- The `on` parameter accepts the column(s) to unpivot. Their names move into a single variable column and their cell values into a single value column.
- The `index` parameter accepts the identifier column(s). Polars keeps these columns as they are and repeats each identifier once per unpivoted column.
- The equivalent Pandas method is `melt`.

- The `index` parameter supports a list argument too.
- If the `on` parameter is omitted, Polars will include all columns that are not provided to `index`.
- Each original row produces one new row per unpivoted column, so 6 month columns x 5 rows = 30 total rows in the new `DataFrame`.

- Use the `variable_name` parameter to rename the variable column (the one that will hold the former column names).
- Use the `value_name` parameter to rename the value column (the one that will hold the cell values).

### Further Reading
- https://docs.pola.rs/user-guide/transformations/unpivot/
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.unpivot.html

## The pivot Method to Convert a Long DataFrame to a Wide DataFrame
- The complementary `pivot` method converts a `DataFrame` from a long format to a wide format.
- The `student_grades.csv` dataset stores each variable in a single column.
- Student names, subjects, and score values are _not_ scattered across multiple columns.

- A wider view can make it easy to parse a student's performance in all subjects.
- The `pivot` method transforms a long `DataFrame` to a wide one.
- The `index` parameter sets the columns whose values will be kept as unique row identifiers.
- The `on` parameter sets the columns whose distinct values will be extracted to new columns.
- The `values` parameter sets the columns whose values will be distributed in the cells of the new table.
- Polars will use a `null` if there is no value for the intersection of an identifier (student name) and a new column (subject).

### Further Reading
- https://docs.pola.rs/user-guide/transformations/pivot/#eager
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.pivot.html

## Pivot Tables I
- A pivot table reshapes data by turning unique values into new rows or columns, then summarizing corresponding values.
- The `student_grades_expanded` `DataFrame` has multiple entries for the same student and subject.

- The dataset stores student grades over 2 years at the school.
- Thus, certain combinations of student and subject may appear twice.

- A plain `pivot` call (without an aggregation) fails because some combinations of student and subject are duplicated.
- Each combination of student name and subject appears twice (once per year) so Polars cannot choose a single score per combination.

- The `aggregate_function` parameter sets how Polars reduces the duplicate values in each combination to a single value.
- An argument of `first` selects the first value within each combination.
- In this example, Polars will use the test score from the first occurrence of each student name + school subject.
- Provided the rows are ordered by year, the result represents the first year of grades.

- An argument of `last` selects the last value within each combination.
- Provided the rows are ordered by year, the result represents the second year of grades.

- There are complementary `max` and `min` arguments for choosing the largest or smallest value for every match.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.pivot.html

## Pivot Tables II
- The aggregate functions in the previous lesson (`first`, `last`, `max`, and `min`) chose one value from a set of possible values.
- Other functions can perform aggregate operations across all values within each combination.

- The `sum` aggregation adds the values together.

- The `mean` aggregation takes the average of the values.

- The `len` aggregate function counts the number of values in each group.

- One advantage of pivot tables is that we can look at data from different angles.
- Let's swap the axes: subjects as row values, students as columns.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.pivot.html

## The transpose Method
- The `transpose` method swaps the axes of the `DataFrame`.
- Current row values become new column headers, and current column headers become new row values.
- Let's start with the pivot table from the end of the previous lesson.

- The `column_names` parameter identifies the column whose values will become the new column names.
- Polars will arrange the values so they match the original intersection of row and column (but now inverted/transposed).
- Pass `True` to the `include_header` parameter to include the former column headers in a new column.
- The `header_name` parameter sets a custom name for the column of header values.

### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.transpose.html