### Schema Management
##### Schema management in Databricks (and Apache Spark in general) is how you define, enforce, and evolve the structure of data, such as column names and data types, in tables and DataFrames.

##### You typically manage schemas in three contexts:

##### When creating DataFrames.

##### When saving DataFrames as tables.

##### When using Unity Catalog (Databricks-managed catalogs, schemas, and tables).

### Spark SQL
##### Spark SQL lets you query structured data with SQL statements, or work with the same data through the DataFrame API in Python or Scala.

In [0]:
%sql
CREATE DATABASE IF NOT EXISTS arya_college;
USE arya_college;

In [0]:
%python
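# Sample rows: (Id, product, category, price)
data = [
    (1, 'Laptop', 'Electronics', 39000),
    (2, 'Tablet', 'Electronic', 25000),
    (3, 'Chair', 'Furniture', 5000),
]
# Column names only; Spark infers the data types from the values
schema = ['Id', 'product', 'category', 'price']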


In [0]:
%python
df = spark.createDataFrame(data,schema)

In [0]:
%python
df.display()

Id,product,category,price
1,Laptop,Electronics,39000
2,Tablet,Electronic,25000
3,Chair,Furniture,5000


In [0]:
%python
df.printSchema()

root
 |-- Id: long (nullable = true)
 |-- product: string (nullable = true)
 |-- category: string (nullable = true)
 |-- price: long (nullable = true)



### Save DataFrame as a Table (Managed Table)

In [0]:
%python
df.write.mode('overwrite').saveAsTable('sales_data_')

In [0]:
%sql
SELECT * FROM sales_data_ WHERE category = 'Electronics';


Id,product,category,price
1,Laptop,Electronics,39000


In [0]:
%python
df_sql = spark.sql("SELECT * FROM sales_data_ WHERE category = 'Electronics'")
df_sql.show()

+---+-------+-----------+-----+
| Id|product|   category|price|
+---+-------+-----------+-----+
|  1| Laptop|Electronics|39000|
+---+-------+-----------+-----+



In [0]:
%sql
SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price
FROM sales_data_
GROUP BY category;


category,product_count,avg_price
Electronics,1,39000.0
Electronic,1,25000.0
Furniture,1,5000.0


In [0]:
%python 
df_grouped = spark.sql("""
    SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price
    FROM sales_data_
    GROUP BY category
""")
df_grouped.show()


+-----------+-------------+---------+
|   category|product_count|avg_price|
+-----------+-------------+---------+
|Electronics|            1|  39000.0|
| Electronic|            1|  25000.0|
|  Furniture|            1|   5000.0|
+-----------+-------------+---------+



In [0]:
%sql
DROP TABLE IF EXISTS sales_data_;
