### PySpark Explode Array and Map Columns to Rows

1) "**explode**(e: Column)" returns a new row for each element of an array or map column.

      When an array is passed to this function, it creates a new column, named “col” by default, that contains the array elements, one per row.
      When a map is passed, it creates two new columns, “key” and “value”, and each map entry becomes its own row.
      Rows whose array or map is null or empty produce no output rows. Null elements inside an array, and null values inside a map, are kept.

2) "**explode_outer**(e: Column)" also creates a row for each element in the array or map column.
      Unlike explode, if the array or map is null or empty, explode_outer keeps the input row and returns null in the generated column(s).

3) "**posexplode**(e: Column)" creates a row for each element in the array and returns two columns: “pos” to hold the position of the element in the array and “col” to hold its value. When the input column is a map, posexplode returns three columns: “pos” for the position of the map entry, plus “key” and “value”. Like explode, it produces no rows when the array or map is null or empty; null elements inside the array are kept.

4) "**posexplode_outer**(e: Column)" creates a row for each element in the array and returns the same “pos” and “col” columns as posexplode. Unlike posexplode, if the array is null or empty, posexplode_outer keeps the row and returns null for both “pos” and “col”. Similarly, for a null or empty map it returns one row with null in “pos”, “key” and “value”.
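
The difference between the four functions comes down to two switches: whether a position is emitted, and whether a null or empty input still yields a row. The sketch below models that in plain Python for a single input row. It is only an illustration of the row semantics, not how Spark implements them, and `explode_model` with its `is_map`, `outer` and `with_pos` parameters is a hypothetical helper, not part of PySpark.

```python
# Plain-Python model of the row semantics for ONE input row.
# Hypothetical helper for illustration only; this is not PySpark code.
def explode_model(value, is_map=False, outer=False, with_pos=False):
    """Return the generated output tuples for one array or map value.

    value    -- a list (array column), a dict (map column), or None
    is_map   -- True when the column is a map, so rows carry (key, value)
    outer    -- True models explode_outer / posexplode_outer
    with_pos -- True models posexplode / posexplode_outer
    """
    if is_map:
        items = list((value or {}).items())    # one (key, value) pair per entry
    else:
        items = [(v,) for v in (value or [])]  # one (col,) tuple per element

    if not items:
        # Null or empty input: the plain variants emit nothing,
        # the outer variants emit a single all-null row.
        width = (2 if is_map else 1) + (1 if with_pos else 0)
        return [(None,) * width] if outer else []

    if with_pos:
        # Prepend the zero-based position of each element or entry.
        return [(pos, *item) for pos, item in enumerate(items)]
    return items


# A null element inside the array is kept as its own row.
print(explode_model(['Spark', 'Python', None]))
# A null array yields no rows with explode, one null row with explode_outer.
print(explode_model(None))
print(explode_model(None, outer=True))
# posexplode_outer on an empty map yields null for pos, key and value.
print(explode_model({}, is_map=True, outer=True, with_pos=True))
```

Reading the four functions as combinations of `outer` and `with_pos` can make it easier to predict the output of the cells that follow.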

In [0]:
# Create data
arrayData = [
    ('Aarav', ['Python', 'SQL'], {'hair': 'black', 'eye': 'brown'}),
    ('Diya', ['Spark', 'Python', None], {'hair': 'brown', 'eye': None}),
    ('Karan', ['Go', ''], {'hair': 'red', 'eye': ''}),
    ('Isha', None, None),
    ('Rahul', ['1', '2'], {})
]

# Create DataFrame
df = spark.createDataFrame(data=arrayData, schema=['name', 'knownLanguages', 'properties'])

# Print schema and show data
df.printSchema()
df.show(truncate=False)

In [0]:
# explode() on array column
from pyspark.sql.functions import explode
df2 = df.select(df.name,explode(df.knownLanguages))
df2.printSchema()
df2.show()

In [0]:
# explode() on map column
from pyspark.sql.functions import explode
df3 = df.select(df.name,explode(df.properties))
df3.printSchema()
df3.show()

In [0]:
# explode_outer() on array and map column
from pyspark.sql.functions import explode_outer
df.select(df.name,explode_outer(df.knownLanguages)).show()
df.select(df.name,explode_outer(df.properties)).show()

In [0]:
# posexplode() on array and map
from pyspark.sql.functions import posexplode
df.select(df.name,posexplode(df.knownLanguages)).show()
df.select(df.name,posexplode(df.properties)).show()

In [0]:
# posexplode_outer() on array and map 
from pyspark.sql.functions import posexplode_outer
df.select("name",posexplode_outer("knownLanguages")).show()
df.select(df.name,posexplode_outer(df.properties)).show()

IQ:
How can I use explode() with multiple columns?

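Spark allows only one generator function such as explode() per select() clause, so you explode one column at a time. To explode several columns, you can chain multiple select() statements with one explode() each, but the result then contains every combination of the exploded elements for each input row. To pair the elements of several arrays by position instead, combine the arrays with arrays_zip() and explode the combined column once.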