# Writing Data

Just as there are many ways to read data, there are many ways to write it.

In this notebook, we will take a quick peek at how to write data back out to Parquet files.

**Technical Accomplishments:**
- Writing data to Parquet files

In [None]:
%run Utilities

## ➡️ Writing Data

Let's start with one of our original CSV data sources, the tab-separated file **pageviews-by-second.tsv**:


In [None]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

csvFile = "Files/sampledata/pageviews-by-second.tsv"

# Explicit schema, so Spark does not need an extra pass over the data to infer types
csvSchema = StructType([
  StructField("timestamp", StringType(), False),
  StructField("site", StringType(), False),
  StructField("requests", IntegerType(), False)
])

csvDF = (spark.read                # The DataFrameReader
  .option("header", "true")        # Treat the first line of each file as a header
  .option("sep", "\t")             # Use a tab delimiter (the default is a comma)
  .schema(csvSchema)               # Apply the schema defined above instead of inferring it
  .csv(csvFile)                    # Create a DataFrame from the delimited file
)

Now that we have a `DataFrame`, we can write it back out as Parquet files or in various other formats.

In [None]:
fileName = "Files/sampledata/pageviews-by-second.parquet"
print("Output location: " + fileName)

(csvDF.write                       # Our DataFrameWriter
  .option("compression", "snappy") # Codec, e.g. none, snappy (the default), gzip, or lzo
  .mode("overwrite")               # Replace existing files
  .parquet(fileName)               # Write DataFrame to Parquet files
)
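The same `DataFrameWriter` also handles other formats and save modes. As a minimal sketch, the hypothetical helper `write_df` below (not part of the original notebook) wraps the writer's `format`, `mode`, and `save` calls and rejects an unknown save mode before Spark is invoked. The helper name, its defaults, and the output path in the usage note are assumptions made for illustration.

```python
# Common save modes accepted by DataFrameWriter.mode()
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

def write_df(df, path, fmt="parquet", mode="errorifexists"):
    """Write `df` to `path` in the given format (hypothetical helper).

    fmt  -- a data source name such as "parquet", "json", "csv", or "orc"
    mode -- what to do if `path` already exists
    """
    if mode not in VALID_MODES:
        raise ValueError("Unknown save mode: " + repr(mode))
    # format/mode/save is the generic form of shortcuts like .parquet(path)
    df.write.format(fmt).mode(mode).save(path)
```

For example, `write_df(csvDF, "Files/sampledata/pageviews-by-second.json", fmt="json", mode="overwrite")` would write the same `DataFrame` as JSON; that output path is illustrative only.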

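Now that the data has been written out, we can see it in the lakehouse. Spark writes Parquet output as a directory of part files, so **pageviews-by-second.parquet** appears as a folder: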

In [None]:
%%sh
ls /lakehouse/default/Files/sampledata
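The same listing can be done from Python. The sketch below assumes the default lakehouse is mounted at `/lakehouse/default`, as in the shell cell above, and that Spark names its data files `part-*.parquet`. `parquet_part_files` is a hypothetical helper, not one of the notebook's utilities.

```python
import os

def parquet_part_files(directory):
    """Return the sorted names of the Parquet part files in `directory`.

    Marker files such as _SUCCESS and hidden checksum files are skipped.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.startswith("part-") and name.endswith(".parquet")
    )

# Assumed mount point of the default lakehouse plus the output path used above
output_dir = "/lakehouse/default/Files/sampledata/pageviews-by-second.parquet"
if os.path.isdir(output_dir):
    for name in parquet_part_files(output_dir):
        print(name)
```

The number of part files generally follows the number of partitions in the `DataFrame` at write time, so it can differ from run to run.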

Lastly, we can read that same Parquet output back in and display the results:

In [None]:
display(
  spark.read.parquet(fileName)
)
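To go one step beyond eyeballing the output, we could compare the reloaded data with the original. The sketch below is one possible check, not part of the original notebook: the hypothetical `round_trip_matches` compares column names, column types, and row counts. It deliberately ignores the nullable flags, because Spark reports columns read from Parquet as nullable.

```python
def round_trip_matches(original_df, reloaded_df):
    """Return True if both DataFrames have the same columns, types, and row count."""
    original_cols = [(f.name, f.dataType) for f in original_df.schema.fields]
    reloaded_cols = [(f.name, f.dataType) for f in reloaded_df.schema.fields]
    if original_cols != reloaded_cols:
        return False
    # count() triggers a Spark job on each DataFrame
    return original_df.count() == reloaded_df.count()
```

In this notebook the call would look like `round_trip_matches(csvDF, spark.read.parquet(fileName))`. A matching row count does not prove the values are identical; it is only a quick sanity check.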