# PySpark Column Selection & Manipulation: Key Techniques
## 1. Different Methods to Select Columns
###### In PySpark, you can select specific columns in multiple ways:

###### Using the col() function, the column() function, or a plain string name:

In [22]:
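# Read the CSV, using the first row as column names (without a schema, every column is read as a string)
df = spark.read.format("csv").option("header","true").load("Files/Pyspark_files/BigMartSales.csv")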


In [23]:
from pyspark.sql.functions import col, column

# Using col() function
df.select(col("Item_Identifier")).show(5)

# Using column() function
df.select(column("Item_Type")).show(5)

# Directly using string name
df.select("Outlet_Type").show(5)



+---------------+
|Item_Identifier|
+---------------+
|          FDA15|
|          DRC01|
|          FDN15|
|          FDX07|
|          NCD19|
+---------------+
only showing top 5 rows

+--------------------+
|           Item_Type|
+--------------------+
|               Dairy|
|         Soft Drinks|
|                Meat|
|Fruits and Vegeta...|
|           Household|
+--------------------+
only showing top 5 rows

+-----------------+
|      Outlet_Type|
+-----------------+
|Supermarket Type1|
|Supermarket Type2|
|Supermarket Type1|
|    Grocery Store|
|Supermarket Type1|
+-----------------+
only showing top 5 rows



## 2. Selecting Multiple Columns Together
###### You can mix these methods, including attribute access such as df.Item_Type, in a single select() call:

In [24]:
# Select several columns at once, mixing string names, col(), column() and attribute access
df_multiple_col = df.select("Item_Identifier", "Item_Fat_Content",
                            col("Item_Weight"), column("Item_Visibility"), df.Item_Type)
df_multiple_col.show()


+---------------+----------------+-----------+---------------+--------------------+
|Item_Identifier|Item_Fat_Content|Item_Weight|Item_Visibility|           Item_Type|
+---------------+----------------+-----------+---------------+--------------------+
|          FDA15|         Low Fat|        9.3|    0.016047301|               Dairy|
|          DRC01|         Regular|       5.92|    0.019278216|         Soft Drinks|
|          FDN15|         Low Fat|       17.5|    0.016760075|                Meat|
|          FDX07|         Regular|       19.2|              0|Fruits and Vegeta...|
|          NCD19|         Low Fat|       8.93|              0|           Household|
|          FDP36|         Regular|     10.395|              0|        Baking Goods|
|          FDO10|         Regular|      13.65|    0.012741089|         Snack Foods|
|          FDP10|         Low Fat|       NULL|    0.127469857|         Snack Foods|
|          FDH17|         Regular|       16.2|    0.016687114|        Frozen Foods|

## 3. Listing All Columns in a DataFrame
###### The columns attribute returns every column name as a Python list:

In [25]:
# Get all column names as a Python list
df.columns


['Item_Identifier',
 'Item_Weight',
 'Item_Fat_Content',
 'Item_Visibility',
 'Item_Type',
 'Item_MRP',
 'Outlet_Identifier',
 'Outlet_Establishment_Year',
 'Outlet_Size',
 'Outlet_Location_Type',
 'Outlet_Type',
 'Item_Outlet_Sales']

## 4. Renaming Columns with alias()
###### Calling alias() on a column renames it in the result of select(); the original DataFrame keeps its column names:

In [26]:
# alias() gives a column a new name in the result; other selection styles can be mixed in
df.select(
    col("Item_Identifier").alias('Item_Id'),
    col("Outlet_Identifier").alias('Outlet_Id'),
    "Item_MRP",
    column("Outlet_Type"),
    df.Item_Type
).show()


+-------+---------+--------+-----------------+--------------------+
|Item_Id|Outlet_Id|Item_MRP|      Outlet_Type|           Item_Type|
+-------+---------+--------+-----------------+--------------------+
|  FDA15|   OUT049|249.8092|Supermarket Type1|               Dairy|
|  DRC01|   OUT018| 48.2692|Supermarket Type2|         Soft Drinks|
|  FDN15|   OUT049| 141.618|Supermarket Type1|                Meat|
|  FDX07|   OUT010| 182.095|    Grocery Store|Fruits and Vegeta...|
|  NCD19|   OUT013| 53.8614|Supermarket Type1|           Household|
|  FDP36|   OUT018| 51.4008|Supermarket Type2|        Baking Goods|
|  FDO10|   OUT013| 57.6588|Supermarket Type1|         Snack Foods|
|  FDP10|   OUT027|107.7622|Supermarket Type3|         Snack Foods|
|  FDH17|   OUT045| 96.9726|Supermarket Type1|        Frozen Foods|
|  FDU28|   OUT017|187.8214|Supermarket Type1|        Frozen Foods|
|  FDY07|   OUT049| 45.5402|Supermarket Type1|Fruits and Vegeta...|
|  FDA03|   OUT046|144.1102|Supermarket Type1|  