# Complex Array

This example demonstrates how to write arrays to Vertica and read them back. In particular, we will write a regular array, a nested array, and an array of key/value structs that represents a hash map.

## Spark Setup

First we set up Spark to work with Vertica. To do this, we create a Spark session whose configuration includes the Spark Connector JAR, then take the Spark context from that session.

In [None]:
# Get Connector JAR name
import glob
import os

files = glob.glob("/spark-connector/connector/target/scala-2.12/spark-vertica-connector-assembly-*")
os.environ["CONNECTOR_JAR"] = files[0]
print(os.environ["CONNECTOR_JAR"])
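
The cell above takes the first match from `glob.glob`, which raises an `IndexError` when no assembly JAR has been built. A minimal sketch of a more defensive lookup follows. The helper name `find_connector_jar` and the demo file names are hypothetical and not part of the connector; the demo runs against a temporary directory rather than the real build path.

```python
import glob
import os
import tempfile


def find_connector_jar(pattern):
    """Return one path matching pattern, or raise a clear error if none match."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No connector JAR matches {pattern!r}")
    # With several builds present, pick the last name in lexicographic order.
    # This is a plain string sort, not a true version comparison.
    return matches[-1]


# Demonstration against a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("demo-assembly-1.0.jar", "demo-assembly-1.1.jar"):
        open(os.path.join(tmp, name), "w").close()
    found = os.path.basename(find_connector_jar(os.path.join(tmp, "demo-assembly-*")))
    print(found)
```

In the notebook, the same helper could be called with the pattern used above and its result assigned to `os.environ["CONNECTOR_JAR"]`.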

In [None]:
# Create the Spark session and context
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.master", "spark://spark:7077")
    .config("spark.driver.memory", "2G")
    .config("spark.executor.memory", "1G")
    .config("spark.jars", os.environ["CONNECTOR_JAR"])
    .getOrCreate())
sc = spark.sparkContext

In [None]:
# Display the context information
print(sc.version)
print(sc.master)
display(sc.getConf().getAll())

## Read/Write

Now we can build the schema. Using Spark's types, we first create a StructType, which defines the structure of our Spark DataFrame. We then populate it with StructField objects, each of which represents a column in our table. Finally, we create a Row holding some example data.

In [None]:
# Define the schema and example data for the complex arrays
from pyspark.sql import Row
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

schema = StructType([StructField("native_array", ArrayType(IntegerType())), 
                     StructField("nested_array", ArrayType(ArrayType(IntegerType()))),
                     StructField("internal_map", ArrayType(StructType([
                              StructField("key", StringType()),
                              StructField("value", IntegerType()),
                            ])))])
# One row; each positional value lines up with one StructField above
data = [Row([1234567], [[1, 2], [3, 4]], [["key1", 5], ["key2", 6]])]
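
The `internal_map` column is declared as an array of key/value structs rather than as a map type. As a rough illustration in plain Python (no Spark required), the sketch below converts a dict into that list-of-pairs shape and back. The helper names `dict_to_entries` and `entries_to_dict` are hypothetical, and the sketch assumes each entry unpacks as a two-item pair.

```python
def dict_to_entries(mapping):
    """Turn a dict into a list of (key, value) pairs, one pair per map entry."""
    return [(key, value) for key, value in mapping.items()]


def entries_to_dict(entries):
    """Rebuild a dict from (key, value) pairs; a repeated key keeps its last value."""
    return {key: value for key, value in entries}


entries = dict_to_entries({"key1": 5, "key2": 6})
print(entries)
print(entries_to_dict(entries))
```

One consequence of this representation is that nothing in the array itself prevents duplicate keys, so the conversion back to a dict has to decide which value to keep.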

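We now create our Spark DataFrame. To do so, we first pass the example data to the parallelize method, which turns it into an [RDD](https://spark.apache.org/docs/latest/rdd-programming-guide.html), then build the DataFrame from that RDD and the schema and coalesce it into a single partition.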

Finally, we write this DataFrame to a Vertica table called "Complex_Array_Examples", then read the table back and print its contents.

In [None]:
df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema).coalesce(1)

df.write.mode("overwrite").format("com.vertica.spark.datasource.VerticaSource").options(
    host="vertica",
    user="dbadmin",
    password="",
    db="docker",
    table="Complex_Array_Examples",
    staging_fs_url="webhdfs://hdfs:50070/complextypes").save()

df = spark.read.load(format="com.vertica.spark.datasource.VerticaSource",
    host="vertica",
    user="dbadmin",
    password="",
    db="docker",
    table="Complex_Array_Examples",
    staging_fs_url="webhdfs://hdfs:50070/complextypes")
# Collect the rows to the driver and print them; a bare collect() mid-cell displays nothing
print(df.rdd.collect())
df.show()
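
The write and the read above repeat the same connection options. One possible way to keep them in sync is to build the options once as a dict. This is only a sketch: the helper name `vertica_options` is hypothetical, and the values are copied from the cell above.

```python
def vertica_options(table, **overrides):
    """Build the option dict shared by the write and the read in this example."""
    options = {
        "host": "vertica",
        "user": "dbadmin",
        "password": "",
        "db": "docker",
        "table": table,
        "staging_fs_url": "webhdfs://hdfs:50070/complextypes",
    }
    # Keyword overrides replace any of the defaults above.
    options.update(overrides)
    return options


opts = vertica_options("Complex_Array_Examples")
print(opts["table"])
```

With `opts` in hand, the write could become `df.write.mode("overwrite").format("com.vertica.spark.datasource.VerticaSource").options(**opts).save()` and the read `spark.read.load(format="com.vertica.spark.datasource.VerticaSource", **opts)`.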