# Arrays and Lists
- A collection type is one that can store multiple values. The list and the tuple are two examples from Python.
- Arrays (`pl.Array`) and lists (`pl.List`) are two collection types in Polars.
- Certain Polars expressions return a column where every row stores an array/list.
- The design enables each row to store an ordered sequence of values.
- An array column requires every row to hold the same number of values.
- A list column allows rows to hold a different number of values.

In [1]:
import polars as pl

### Python Lists vs. Polars Lists
- Python lists can store heterogeneous data (values of different types).
- Polars lists/arrays store homogeneous data (values must be of the same type).

In [2]:
[1, False, "hello", 4.14]

[1, False, 'hello', 4.14]

- Let's construct a `DataFrame` that stores a column of lists.
- We'll pass the `pl.DataFrame` constructor a dictionary that maps column names to values.
- Pass a list of lists for the column values.
- Polars assumes a type of `pl.List` by default. This is a distinct Polars type.
- Polars will infer the data type of the list's elements.
- Notice the data type of the `names` column's values is a list storing strings (`list[str]`).

In [3]:
pl.DataFrame(
    {"names": [["Paul", "Molly"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]}
)

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


- We can explicitly assign the Polars data type (`pl.List`) to the `names` column.
- The `schema` parameter accepts a dictionary with a complete mapping of columns to corresponding types.
- The `schema_overrides` parameter accepts a dictionary that provides the column data types to _override_.
- The `schema` and `schema_overrides` parameters are equivalent here because there is only one column of data.

In [4]:
pl.DataFrame(
    {"names": [["Paul", "Molly"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]},
    schema_overrides={"names": pl.List},
)

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


- `pl.List` can also be called as a constructor. We can pass it the explicit data type of the list's elements.

In [5]:
pl.DataFrame(
    {"names": [["Paul", "Molly"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]},
    schema_overrides={"names": pl.List(pl.String)},
)

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


- The `pl.List` syntax is helpful when overriding Polars' defaults.
- For example, Polars will infer a default integer type of `i64` for a column of integer lists.
- Use `pl.List` to change the `i64` integer type to another integer type.

In [6]:
pl.DataFrame({"values": [[1, 2, 3]]}, schema_overrides={"values": pl.List(pl.Int8)})

values
list[i8]
"[1, 2, 3]"


### Further Reading
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#the-data-type-list
- https://docs.pola.rs/api/python/stable/reference/api/polars.datatypes.List.html#polars.datatypes.List

## The str.split Method
- You'll often arrive at the `pl.List` data type through a transformation.
- Imagine you are tracking the guests at a dinner party.
- The guests arrive in the Python program as a list of strings.
- Each string represents a group of people at a table.

In [7]:
guests = pl.DataFrame({"names": ["Paul,Molly", "John,Jenna,Jim", "Shenny,Pauline"]})
guests

names
str
"""Paul,Molly"""
"""John,Jenna,Jim"""
"""Shenny,Pauline"""


- Python's `split` method cuts a string at every occurrence of a substring.
- It returns a list with the pieces left after the split.

In [8]:
"Paul,Molly".split(",")

['Paul', 'Molly']

- The `str` namespace/attribute holds methods for string operations/expressions.
- The `str.split` method applies the `split` logic to every row value.
- The `str.split` method returns a column of lists.

In [9]:
guests.select(pl.col("names").str.split(","))

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.str.split.html

## The list Namespace
- Polars defines list functionalities within a `list` attribute/namespace.
- The `len` method returns the length of each list.
- The `head` method extracts the first `n` number of elements from each list. It returns a column of lists.
- The `tail` method extracts the last `n` number of elements from each list. It returns a column of lists.
- The `first` method extracts the first element from each list.
- The `last` method extracts the last element from each list.

In [10]:
guests = pl.DataFrame(
    {"names": [["Paul", "Molly"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]},
)
guests

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


In [11]:
guests.with_columns(
    pl.col("names").list.len().alias("length"),
    pl.col("names").list.head(2).alias("head"),
    pl.col("names").list.tail(2).alias("tail"),
    pl.col("names").list.first().alias("first"),
    pl.col("names").list.last().alias("last"),
)

names,length,head,tail,first,last
list[str],u32,list[str],list[str],str,str
"[""Paul"", ""Molly""]",2,"[""Paul"", ""Molly""]","[""Paul"", ""Molly""]","""Paul""","""Molly"""
"[""John"", ""Jenna"", ""Jim""]",3,"[""John"", ""Jenna""]","[""Jenna"", ""Jim""]","""John""","""Jim"""
"[""Shenny"", ""Pauline""]",2,"[""Shenny"", ""Pauline""]","[""Shenny"", ""Pauline""]","""Shenny""","""Pauline"""
[],0,[],[],,


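- The `contains` method checks for the inclusion of an element within each row's list.
- Pass a Boolean expression into the `filter` method to keep only the rows where it evaluates to true.
- The check compares whole elements, not substrings. Searching for `"J"` matches no rows because no guest is named exactly "J".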

In [12]:
guests.with_columns(pl.col("names").list.contains("Jenna").alias("has_jenna"))

names,has_jenna
list[str],bool
"[""Paul"", ""Molly""]",False
"[""John"", ""Jenna"", ""Jim""]",True
"[""Shenny"", ""Pauline""]",False
[],False


In [13]:
guests.filter(pl.col("names").list.contains("J"))

names
list[str]


### Further Reading
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#the-namespace-list
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#operating-on-lists
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.len.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.head.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.tail.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.first.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.last.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.contains.html

## Sorting the Lists
- The `sort` method on the `DataFrame` will sort the rows based on the list values.
- Polars will compare the elements across lists based on shared index position.
- "John" comes before "Paul", which comes before "Shenny". The empty list sorts ahead of all of them.

In [14]:
guests = pl.DataFrame(
    {"names": [["Paul", "Molly"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]},
)
guests

names
list[str]
"[""Paul"", ""Molly""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


In [15]:
guests.sort("names")

guests.select(pl.col("names").sort())

names
list[str]
[]
"[""John"", ""Jenna"", ""Jim""]"
"[""Paul"", ""Molly""]"
"[""Shenny"", ""Pauline""]"


- The `list.sort` method sorts each list's elements. The order of the rows in the column remains the same.
- Polars will accommodate lists of different lengths.

In [16]:
guests.with_columns(pl.col("names").list.sort().alias("sorted_names"))

names,sorted_names
list[str],list[str]
"[""Paul"", ""Molly""]","[""Molly"", ""Paul""]"
"[""John"", ""Jenna"", ""Jim""]","[""Jenna"", ""Jim"", ""John""]"
"[""Shenny"", ""Pauline""]","[""Pauline"", ""Shenny""]"
[],[]


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.sort.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.sort.html

## The explode Method
- The `list.explode` method creates a row entry for every list value.
- The `list.explode` method is a "flatten" operation. It creates a one-dimensional sequence of values from a collection of nested lists.


In [17]:
guests = pl.DataFrame(
    {"names": [["Molly", "Paul"], ["John", "Jenna", "Jim"], ["Shenny", "Pauline"], []]}
)
guests

names
list[str]
"[""Molly"", ""Paul""]"
"[""John"", ""Jenna"", ""Jim""]"
"[""Shenny"", ""Pauline""]"
[]


- Combining the `with_columns` method with `list.explode` would trigger an error.
- The `list.explode` method returns a column that is longer than `names` (one row per name).
- The `names` column has 4 rows but there are 7 total names to extract.
- The `select` method returns a new column that can be of any length.
- An empty list evaluates to a single null value.

In [18]:
guests.select(pl.col("names").list.explode())

names
str
"""Molly"""
"""Paul"""
"""John"""
"""Jenna"""
"""Jim"""
"""Shenny"""
"""Pauline"""
""


- The `flatten` method accomplishes the same result.

In [19]:
guests.select(pl.col("names").flatten())

names
str
"""Molly"""
"""Paul"""
"""John"""
"""Jenna"""
"""Jim"""
"""Shenny"""
"""Pauline"""
""


- We can also invoke `explode` on the `DataFrame` and pass in a column expression.

In [20]:
guests.explode(pl.col("names")).drop_nulls(pl.col("names"))

names
str
"""Molly"""
"""Paul"""
"""John"""
"""Jenna"""
"""Jim"""
"""Shenny"""
"""Pauline"""


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.explode.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.drop_nulls.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.flatten.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.explode.html

## Exploding with Multiple Columns of Lists
- Let's say we're mapping guests at an event to the entrees delivered to their table.

In [21]:
guests = pl.DataFrame(
    {
        "names": [
            ["Paul", "Molly"],
            ["John", "Jenna", "Jim"],
            ["Shenny", "Pauline"],
            [],
        ],
        "entrees": [
            ["Fish", "Steak"],
            ["Fish", "Chicken", "Steak"],
            ["Chicken", "Steak"],
            [],
        ],
    },
)

guests

names,entrees
list[str],list[str]
"[""Paul"", ""Molly""]","[""Fish"", ""Steak""]"
"[""John"", ""Jenna"", ""Jim""]","[""Fish"", ""Chicken"", ""Steak""]"
"[""Shenny"", ""Pauline""]","[""Chicken"", ""Steak""]"
[],[]


In [22]:
guests.explode("names", "entrees")

names,entrees
str,str
"""Paul""","""Fish"""
"""Molly""","""Steak"""
"""John""","""Fish"""
"""Jenna""","""Chicken"""
"""Jim""","""Steak"""
"""Shenny""","""Chicken"""
"""Pauline""","""Steak"""
,


- The list of entrees in a row may be longer or shorter than the list of guests in that row.
- We cannot explode both columns at once when the list lengths differ within a row.
- Here both columns flatten to 7 values overall, but the second row pairs 3 names with 2 entrees and the third row pairs 2 names with 3 entrees, so Polars raises an error.

In [23]:
guests = pl.DataFrame(
    {
        "names": [
            ["Paul", "Molly"],
            ["John", "Jenna", "Jim"],
            ["Shenny", "Pauline"],
            [],
        ],
        "entrees": [
            ["Fish", "Steak"],
            ["Fish", "Chicken"],
            ["Chicken", "Steak", "Barbecue"],
            [],
        ],
    },
)

guests

names,entrees
list[str],list[str]
"[""Paul"", ""Molly""]","[""Fish"", ""Steak""]"
"[""John"", ""Jenna"", ""Jim""]","[""Fish"", ""Chicken""]"
"[""Shenny"", ""Pauline""]","[""Chicken"", ""Steak"", ""Barbecue""]"
[],[]


In [24]:
# guests.explode("names", "entrees")

- We can explode the `names` column first to get each combination of person with the entrees they ordered.
- Polars matches each name inside a row's list to the full list of `entrees` they received.

In [25]:
guests.explode("names")

names,entrees
str,list[str]
"""Paul""","[""Fish"", ""Steak""]"
"""Molly""","[""Fish"", ""Steak""]"
"""John""","[""Fish"", ""Chicken""]"
"""Jenna""","[""Fish"", ""Chicken""]"
"""Jim""","[""Fish"", ""Chicken""]"
"""Shenny""","[""Chicken"", ""Steak"", ""Barbecue""]"
"""Pauline""","[""Chicken"", ""Steak"", ""Barbecue""]"
,[]


- Now, we can explode the `entrees` list to create a row for every combination of guest and meal in that guest's entrees list.

In [26]:
guests.explode("names").explode("entrees")

names,entrees
str,str
"""Paul""","""Fish"""
"""Paul""","""Steak"""
"""Molly""","""Fish"""
"""Molly""","""Steak"""
"""John""","""Fish"""
…,…
"""Shenny""","""Barbecue"""
"""Pauline""","""Chicken"""
"""Pauline""","""Steak"""
"""Pauline""","""Barbecue"""


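- Reversing the order of the `explode` calls produces the same set of rows, although in a different order. The `drop_nulls` method removes the all-null row created by the empty lists.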

In [27]:
guests.explode("entrees").explode("names").drop_nulls()

names,entrees
str,str
"""Paul""","""Fish"""
"""Molly""","""Fish"""
"""Paul""","""Steak"""
"""Molly""","""Steak"""
"""John""","""Fish"""
…,…
"""Pauline""","""Chicken"""
"""Shenny""","""Steak"""
"""Pauline""","""Steak"""
"""Shenny""","""Barbecue"""


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.explode.html
- https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.drop_nulls.html

## Mathematical Operations
- The `list` attribute includes various mathematical methods.
- Polars will invoke the operation on every row/list within a column.
- Polars may truncate the display of list elements if the list is long.
- The ellipsis (`…`) marks a gap in the list.

In [28]:
work = pl.DataFrame(
    {
        "employee": ["Alice", "Bob", "Carol", "Dave"],
        "emotional_damage_per_meeting": [[6, 10, 8], [7, 8, 1, 1], [4, 4], []],
    }
)

work

employee,emotional_damage_per_meeting
str,list[i64]
"""Alice""","[6, 10, 8]"
"""Bob""","[7, 8, … 1]"
"""Carol""","[4, 4]"
"""Dave""",[]


- The `list.sum` method adds up the values in each list.
- The `list.max` and `list.min` methods extract the largest and smallest value from each list.
- The `list.mean` method returns the average of the values in each list.
- The `list.median` method returns the middle value of each list when its values are sorted.
- The `list.n_unique` method returns the number of unique elements within each list.

In [29]:
work.with_columns(
    pl.col("emotional_damage_per_meeting").list.sum().alias("sum"),
    pl.col("emotional_damage_per_meeting").list.max().alias("max"),
    pl.col("emotional_damage_per_meeting").list.min().alias("min"),
    pl.col("emotional_damage_per_meeting").list.mean().alias("mean"),
    pl.col("emotional_damage_per_meeting").list.median().alias("median"),
    pl.col("emotional_damage_per_meeting").list.n_unique().alias("n_unique"),
)

employee,emotional_damage_per_meeting,sum,max,min,mean,median,n_unique
str,list[i64],i64,i64,i64,f64,f64,u32
"""Alice""","[6, 10, 8]",24,10.0,6.0,8.0,8.0,3
"""Bob""","[7, 8, … 1]",17,8.0,1.0,4.25,4.0,3
"""Carol""","[4, 4]",8,4.0,4.0,4.0,4.0,1
"""Dave""",[],0,,,,,0


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.sum.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.max.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.min.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.mean.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.median.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.n_unique.html

## The list.eval, list.any, and list.all Methods
- The `list.eval` method maps each list element to a new value.
- The `list.eval` method returns a list with the evaluations.

In [30]:
work = pl.DataFrame(
    {
        "employee": ["Alice", "Bob", "Carol", "Dave"],
        "emotional_damage_per_meeting": [[6, 10, 8], [7, 8, 1, 1], [4, 4], []],
    }
)
work

employee,emotional_damage_per_meeting
str,list[i64]
"""Alice""","[6, 10, 8]"
"""Bob""","[7, 8, … 1]"
"""Carol""","[4, 4]"
"""Dave""",[]


- The `pl.element` function returns an expression that stands in for each element of the list.
- The `pl.element` function is designed for use with the `eval` method.

In [31]:
work.with_columns(
    pl.col("emotional_damage_per_meeting").list.eval(pl.element()).alias("new_column")
)

employee,emotional_damage_per_meeting,new_column
str,list[i64],list[i64]
"""Alice""","[6, 10, 8]","[6, 10, 8]"
"""Bob""","[7, 8, … 1]","[7, 8, … 1]"
"""Carol""","[4, 4]","[4, 4]"
"""Dave""",[],[]


In [32]:
work.with_columns(
    pl.col("emotional_damage_per_meeting")
    .list.eval(pl.element() + 1)
    .alias("new_column")
)

employee,emotional_damage_per_meeting,new_column
str,list[i64],list[i64]
"""Alice""","[6, 10, 8]","[7, 11, 9]"
"""Bob""","[7, 8, … 1]","[8, 9, … 2]"
"""Carol""","[4, 4]","[5, 5]"
"""Dave""",[],[]


In [33]:
work.with_columns(
    pl.col("emotional_damage_per_meeting")
    .list.eval(pl.element() > 5)
    .alias("tough_meeting")
)

employee,emotional_damage_per_meeting,tough_meeting
str,list[i64],list[bool]
"""Alice""","[6, 10, 8]","[true, true, true]"
"""Bob""","[7, 8, … 1]","[true, true, … false]"
"""Carol""","[4, 4]","[false, false]"
"""Dave""",[],[]


- The `list.eval` method conveniently gave us a column of Boolean lists.
- The `list.any` method returns True if a list contains at least one True value.
- The `list.all` method returns True if all values inside the list are True.

In [34]:
work.with_columns(
    pl.col("emotional_damage_per_meeting")
    .list.eval(pl.element() > 5)
    .alias("tough_meeting")
).with_columns(
    pl.col("tough_meeting").list.any().alias("at_least_1_tough_meeting"),
    pl.col("tough_meeting").list.all().alias("all_tough_meetings"),
).filter(pl.col("at_least_1_tough_meeting"))

employee,emotional_damage_per_meeting,tough_meeting,at_least_1_tough_meeting,all_tough_meetings
str,list[i64],list[bool],bool,bool
"""Alice""","[6, 10, 8]","[true, true, true]",True,True
"""Bob""","[7, 8, … 1]","[true, true, … false]",True,False


### Further Reading
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#operating-on-lists
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.eval.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.any.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.all.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.element.html

## Concatenating Column Values

In [35]:
action_stars = pl.DataFrame(
    {
        "first_name": ["Arnold", "Sylvester", "Jean-Claude"],
        "last_name": ["Schwarzenegger", "Stallone", "Van Damme"],
    }
)

action_stars

first_name,last_name
str,str
"""Arnold""","""Schwarzenegger"""
"""Sylvester""","""Stallone"""
"""Jean-Claude""","""Van Damme"""


- The `pl.format` function interpolates column values into the `{}` placeholders of a format string.

In [36]:
action_stars.with_columns(
    pl.format(
        "{} {} the Great", pl.col("first_name").str.to_uppercase(), pl.col("last_name")
    ).alias("full_name")
)

first_name,last_name,full_name
str,str,str
"""Arnold""","""Schwarzenegger""","""ARNOLD Schwarzenegger the Grea…"
"""Sylvester""","""Stallone""","""SYLVESTER Stallone the Great"""
"""Jean-Claude""","""Van Damme""","""JEAN-CLAUDE Van Damme the Grea…"


- The `pl.concat_str` function concatenates content across columns.
- We can use the `pl.lit` function (literal) to add a constant space between the first and last names.

In [37]:
action_stars.with_columns(
    pl.concat_str(pl.col("first_name"), pl.col("last_name")).alias("full_name")
)

first_name,last_name,full_name
str,str,str
"""Arnold""","""Schwarzenegger""","""ArnoldSchwarzenegger"""
"""Sylvester""","""Stallone""","""SylvesterStallone"""
"""Jean-Claude""","""Van Damme""","""Jean-ClaudeVan Damme"""


In [38]:
action_stars.with_columns(
    pl.concat_str(pl.col("first_name"), pl.lit(" "), pl.col("last_name")).alias(
        "full_name"
    )
)

first_name,last_name,full_name
str,str,str
"""Arnold""","""Schwarzenegger""","""Arnold Schwarzenegger"""
"""Sylvester""","""Stallone""","""Sylvester Stallone"""
"""Jean-Claude""","""Van Damme""","""Jean-Claude Van Damme"""


- Alternatively, we can use the `separator` parameter.

In [39]:
action_stars.with_columns(
    pl.concat_str(pl.col("first_name"), pl.col("last_name"), separator=" ").alias(
        "full_name"
    )
)

first_name,last_name,full_name
str,str,str
"""Arnold""","""Schwarzenegger""","""Arnold Schwarzenegger"""
"""Sylvester""","""Stallone""","""Sylvester Stallone"""
"""Jean-Claude""","""Van Damme""","""Jean-Claude Van Damme"""


- The `pl.concat_list` function concatenates content across columns into a list instead.

In [40]:
action_stars.with_columns(
    pl.concat_list(pl.col("first_name"), pl.col("last_name")).alias("name_list")
)

first_name,last_name,name_list
str,str,list[str]
"""Arnold""","""Schwarzenegger""","[""Arnold"", ""Schwarzenegger""]"
"""Sylvester""","""Stallone""","[""Sylvester"", ""Stallone""]"
"""Jean-Claude""","""Van Damme""","[""Jean-Claude"", ""Van Damme""]"


- Say we have the reverse situation: a column of lists whose contents we want to concatenate into a single string. The `list.join` method joins each list's elements with the given separator.

In [41]:
action_stars.with_columns(
    pl.concat_list(pl.col("first_name"), pl.col("last_name")).alias("name_list")
).select(pl.col("name_list").list.join(separator=" ").alias("full_name"))

full_name
str
"""Arnold Schwarzenegger"""
"""Sylvester Stallone"""
"""Jean-Claude Van Damme"""


### Further Reading
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#row-wise-computations
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.format.html#polars.format
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_str.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_list.html
- https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.list.join.html

## Arrays
- The Polars array is a complementary storage type for collections.
- A column of lists can have a different list length in each row.
- A column of arrays must have the same array length in each row.
- The consistency in length enables the Array to be more memory efficient and performant.
- The Polars documentation recommends using the Array over the List _if possible_.
- When instantiating a `DataFrame`, Polars will assume a list type even with equally-sized collection inputs.

In [42]:
pl.DataFrame(
    {
        "burritos": [
            ["white rice", "pinto beans", "steak"],
            ["brown rice", "black beans", "chicken"],
        ]
    }
)

burritos
list[str]
"[""white rice"", ""pinto beans"", ""steak""]"
"[""brown rice"", ""black beans"", ""chicken""]"


- Use the `schema` or `schema_overrides` parameters to override the inferred type for a column.
- The `pl.Array` constructor accepts two arguments: the type of each array element and the length of each array.

In [43]:
lunch = pl.DataFrame(
    {
        "burritos": [
            ["white rice", "pinto beans", "steak"],
            ["brown rice", "black beans", "chicken"],
        ]
    },
    schema_overrides={"burritos": pl.Array(pl.String, shape=3)},
)
lunch

burritos
"array[str, 3]"
"[""white rice"", ""pinto beans"", ""steak""]"
"[""brown rice"", ""black beans"", ""chicken""]"


In [44]:
lunch.schema

Schema([('burritos', Array(String, shape=(3,)))])

- Let's expand the `DataFrame` to include a column of integer arrays.

In [45]:
lunch = pl.DataFrame(
    {
        "burritos": [
            ["white rice", "pinto beans", "steak"],
            ["brown rice", "black beans", "chicken"],
        ],
        "calories": [[205, 245, 349], [218, 227, 215]],
    },
    schema_overrides={
        "burritos": pl.Array(pl.String, shape=3),
        "calories": pl.Array(pl.UInt16, shape=3),
    },
)
lunch

burritos,calories
"array[str, 3]","array[u16, 3]"
"[""white rice"", ""pinto beans"", ""steak""]","[205, 245, 349]"
"[""brown rice"", ""black beans"", ""chicken""]","[218, 227, 215]"


### Further Reading
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#the-data-type-array
- https://docs.pola.rs/user-guide/expressions/lists-and-arrays/#creating-an-array-column
- https://docs.pola.rs/api/python/stable/reference/api/polars.datatypes.Array.html#polars.datatypes.Array

## The arr Attribute
- The array supports many of the same methods as the list.
- Polars nests the methods under the `arr` attribute/namespace.

In [46]:
lunch = pl.DataFrame(
    {
        "burritos": [
            ["white rice", "pinto beans", "steak"],
            ["brown rice", "black beans", "chicken"],
        ],
        "calories": [[205, 245, 349], [218, 227, 215]],
    },
    schema_overrides={
        "burritos": pl.Array(pl.String, shape=3),
        "calories": pl.Array(pl.Int32, shape=3),
    },
)

lunch

burritos,calories
"array[str, 3]","array[i32, 3]"
"[""white rice"", ""pinto beans"", ""steak""]","[205, 245, 349]"
"[""brown rice"", ""black beans"", ""chicken""]","[218, 227, 215]"


In [47]:
lunch.with_columns(
    pl.col("calories").arr.sum().alias("calorie_sum"),
    pl.col("calories").arr.mean().alias("calorie_average"),
    pl.col("calories").arr.max().alias("largest"),
    pl.col("calories").arr.min().alias("smallest"),
    pl.col("burritos").arr.head(2).alias("first_two"),
    pl.col("burritos").arr.tail(2).alias("last_two"),
    pl.col("burritos").arr.first().alias("first"),
    pl.col("burritos").arr.last().alias("last"),
)

burritos,calories,calorie_sum,calorie_average,largest,smallest,first_two,last_two,first,last
"array[str, 3]","array[i32, 3]",i32,f64,i32,i32,list[str],list[str],str,str
"[""white rice"", ""pinto beans"", ""steak""]","[205, 245, 349]",799,266.333333,349,205,"[""white rice"", ""pinto beans""]","[""pinto beans"", ""steak""]","""white rice""","""steak"""
"[""brown rice"", ""black beans"", ""chicken""]","[218, 227, 215]",660,220.0,227,215,"[""brown rice"", ""black beans""]","[""black beans"", ""chicken""]","""brown rice""","""chicken"""


### Further Reading
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.sum.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.mean.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.max.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.min.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.head.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.tail.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.first.html
- https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.arr.last.html