## Validate Spark 3 using CLIs

Let us validate Spark 3.1.2 with Scala, Python, and SQL using the CLIs. We will also confirm that Spark 3.1.2 is integrated with Hive by running queries against the databases and tables that were created earlier using Hive.

* Validate Spark using Scala by running `/opt/spark3/bin/spark-shell --master yarn --conf spark.ui.port=0`.
* Validate Spark using Python by running `/opt/spark3/bin/pyspark --master yarn --conf spark.ui.port=0`.
* If the default Python version is 2.7, export `PYSPARK_PYTHON` so that it points to Python 3 before launching `pyspark`.

```shell
export PYSPARK_PYTHON=python3
/opt/spark3/bin/pyspark --master yarn --conf spark.ui.port=0
```
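
Once the `pyspark` shell is up, one quick way to confirm that `PYSPARK_PYTHON` took effect is to inspect the interpreter from inside the shell. The snippet below is a minimal sketch rather than part of the original setup: `describe_python` is a hypothetical helper name, and it only reports the interpreter used by the driver.

```python
import sys


def describe_python():
    """Return (major, minor) of the interpreter running this shell."""
    return sys.version_info.major, sys.version_info.minor


major, minor = describe_python()
print(f"Driver Python: {major}.{minor}")
if major < 3:
    # PYSPARK_PYTHON was not picked up; export it and relaunch the shell.
    print("Still on Python 2: run `export PYSPARK_PYTHON=python3` and relaunch pyspark.")
```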

* You can add `export PYSPARK_PYTHON=python3` to **.profile** so that you don't need to export it in every session before launching `pyspark`.
* Validate Spark SQL by running `/opt/spark3/bin/spark-sql --master yarn --conf spark.ui.port=0`.
* You can also validate that Spark can access the databases and tables in the Hive Metastore, for example from the `pyspark` shell.

```python
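# List the databases registered in the Hive Metastore
spark.sql('SHOW databases').show()
spark.sql('USE retail_db')

# List the tables in retail_db, then preview and count the orders
spark.sql('SHOW tables').show()
spark.sql('SELECT * FROM orders').show()
spark.sql('SELECT count(1) FROM orders').show()

exit()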
```
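
The interactive checks above can also be wrapped in a function, which may help when the validation has to be repeated. This is a minimal sketch under a few assumptions: `validate_hive_access` is a hypothetical helper (not part of Spark), it expects an existing `spark` session such as the one the `pyspark` shell provides, and it assumes `SHOW tables` returns the table name in the second column, as in the `spark-sql` output shown in this section.

```python
def validate_hive_access(spark, database="retail_db", table="orders"):
    """Check that `database`.`table` is visible through the metastore.

    Returns the row count of the table, or raises RuntimeError if the
    database or table cannot be found. Names are assumed to be trusted,
    since they are interpolated directly into the SQL text.
    """
    # First column of SHOW databases holds the database name.
    databases = [row[0] for row in spark.sql("SHOW databases").collect()]
    if database not in databases:
        raise RuntimeError(f"Database {database} is not visible to Spark")

    spark.sql(f"USE {database}")

    # Assumed column order: database, table name, isTemporary.
    tables = [row[1] for row in spark.sql("SHOW tables").collect()]
    if table not in tables:
        raise RuntimeError(f"Table {table} not found in {database}")

    # count(1) returns a single row with a single column.
    return spark.sql(f"SELECT count(1) FROM {table}").collect()[0][0]
```

In the `pyspark` shell, calling `validate_hive_access(spark)` returns the number of rows in `orders`. A `RuntimeError` suggests that Spark is not pointing at the same metastore as Hive.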

The following session launches the Spark SQL CLI with `spark-sql --master yarn --conf spark.ui.port=0 --conf spark.sql.warehouse.dir=/user/${USER}/warehouse` and runs the same checks.


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


spark-sql> show databases;
default
retail_db
Time taken: 2.971 seconds, Fetched 2 row(s)

    
spark-sql> USE retail_db;
Time taken: 0.043 seconds

    
spark-sql> SHOW tables;
retail_db	orders	false
Time taken: 0.141 seconds, Fetched 1 row(s)


    
spark-sql> SELECT * FROM orders LIMIT 10;
1	2013-07-25 00:00:00.0	11599	CLOSED
2	2013-07-25 00:00:00.0	256	PENDING_PAYMENT
3	2013-07-25 00:00:00.0	12111	COMPLETE
4	2013-07-25 00:00:00.0	8827	CLOSED
5	2013-07-25 00:00:00.0	11318	COMPLETE
6	2013-07-25 00:00:00.0	7130	COMPLETE
7	2013-07-25 00:00:00.0	4530	COMPLETE
8	2013-07-25 00:00:00.0	2911	PROCESSING
9	2013-07-25 00:00:00.0	5657	PENDING_PAYMENT
10	2013-07-25 00:00:00.0	5648	PENDING_PAYMENT
Time taken: 24.034 seconds, Fetched 10 row(s)

    

    
spark-sql> SELECT count(1) FROM orders;
68883
Time taken: 11.83 seconds, Fetched 1 row(s)
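
The session above is interactive, but the same queries can be run in one shot with the CLI's `-e` option and checked from a script. The sketch below is one possible way to do that: `build_spark_sql_command`, `parse_rows`, and `run_query` are hypothetical helper names, the code assumes `spark-sql` is on the `PATH`, and it assumes that result rows arrive tab-separated on standard output as in the session shown above.

```python
import getpass
import subprocess


def build_spark_sql_command(query, warehouse_dir):
    """Assemble a non-interactive spark-sql invocation with the options used above."""
    return [
        "spark-sql",
        "--master", "yarn",
        "--conf", "spark.ui.port=0",
        "--conf", f"spark.sql.warehouse.dir={warehouse_dir}",
        "-e", query,  # run the query and exit instead of opening the prompt
    ]


def parse_rows(stdout):
    """Split tab-separated spark-sql output into one list of fields per row."""
    return [line.split("\t") for line in stdout.splitlines() if line.strip()]


def run_query(query, warehouse_dir=None):
    """Run the query through spark-sql and return the parsed rows as strings."""
    if warehouse_dir is None:
        # Mirrors /user/${USER}/warehouse from the command shown above.
        warehouse_dir = f"/user/{getpass.getuser()}/warehouse"
    result = subprocess.run(
        build_spark_sql_command(query, warehouse_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if spark-sql exits with an error
    )
    return parse_rows(result.stdout)
```

For example, `run_query("SELECT count(1) FROM retail_db.orders")` would return the count as a list of string fields. Every value comes back as text, so convert it before comparing numbers.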
