###What is a Parquet file?
Parquet is a columnar storage file format. Because the values of each column are stored together, a query can read only the columns it needs and skip the rest, which makes execution faster. As a result, aggregation queries usually take less time than they would on row-oriented storage. <br>
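The difference between the two layouts can be sketched in plain Python. This is only a toy model: the variable names are made up for illustration, and real Parquet files add row groups, encodings, and compression on top of the basic idea.

```python
# Toy illustration only: this is not how Parquet is implemented.
rows = [
    {"firstname": "Aarav", "gender": "M", "salary": 5500},
    {"firstname": "Diya", "gender": "F", "salary": 6200},
    {"firstname": "Karan", "gender": "M", "salary": 7200},
]

# Row-oriented layout: each record is stored together, so summing one
# field still walks through every full record.
row_total = sum(record["salary"] for record in rows)

# Column-oriented layout: the values of each column are stored together.
columns = {name: [record[name] for record in rows] for name in rows[0]}

# The same aggregation now reads one column and never touches the others.
column_total = sum(columns["salary"])

print(row_total, column_total)  # 18900 18900
```

Both layouts give the same answer; the columnar one simply has less data to scan when a query needs only a few columns.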


###PySpark Read/Write Parquet File 
- Read a Parquet file into a PySpark DataFrame
- Write a PySpark DataFrame to a Parquet file

Syntax: <br>
df.**write**.parquet("/path/file.parquet") <br>
df=spark.**read**.parquet("/path/file.parquet") <br>

df=spark.read.**format**("parquet").**load**("/path/file.parquet")

In [0]:
file_path = "/Volumes/workspace/training/test_data/parquet"
file_name1 = "employees.parquet"
full_path = f"{file_path}/{file_name1}"

In [0]:
# Create dataframe
data = [
    ("Aarav", "Kumar", "Patel", "1993-08-14", "M", 5500),
    ("Diya", "Rani", "Sharma", "1998-03-22", "F", 6200),
    ("Karan", "", "Mehta", "1989-11-10", "M", 7200),
    ("Meera", "Anand", "Nair", "1995-07-05", "F", 5800),
    ("Rohan", "", "Verma", "1990-12-30", "M", 5000),
    ("Sneha", "L.", "Reddy", "1996-04-18", "F", 6100),
    ("Vikram", "", "Singh", "1988-09-25", "M", 6800),
    ("Priya", "G.", "Iyer", "1992-01-16", "F", 6400),
    ("Aditya", "", "Khan", "1999-02-28", "M", 4700),
    ("Neha", "", "Chopra", "1997-10-12", "F", 5900)
]

columns = ["firstname", "middlename", "lastname", "dob", "gender", "salary"]

df = spark.createDataFrame(data, columns)
df.show()


In [0]:
# Write the DataFrame to a Parquet file using write.parquet(), replacing any existing data
df.write.mode("overwrite").parquet(full_path)

In [0]:
# Read the Parquet file back into a DataFrame using read.parquet()
parDF = spark.read.parquet(full_path)
display(parDF)

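####Save Modes for Writing a Parquet File:
- df.write.mode("**append**").parquet("path/to/parquet/file"): adds the new rows to any data already at the path
- df.write.mode("**overwrite**").parquet("path/to/parquet/file"): replaces any data already at the path
- df.write.mode("**ignore**").parquet("path/to/parquet/file"): writes nothing if data already exists at the path
- df.write.mode("**error**").parquet("path/to/parquet/file"): raises an error if data already exists at the path (the default, also spelled "errorifexists")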

In [0]:
# Create a temporary view directly on the Parquet file
spark.sql(f"""
          CREATE OR REPLACE TEMPORARY VIEW employee 
          USING parquet 
          OPTIONS (path '{full_path}')
          """)
spark.sql("SELECT * FROM employee").show()


In [0]:
# Register the DataFrame as a temporary view and query it with Spark SQL
parDF.createOrReplaceTempView("ParquetTable")
spark.sql("SELECT * FROM ParquetTable WHERE salary >= 6000").show()