There are multiple ways to rename Spark Data Frame columns or expressions.
* We can name a column or an expression using `alias` as part of `select`.
* We can add a new column, or name the result of an expression, using `withColumn` on top of a Data Frame.
* We can rename one column at a time using `withColumnRenamed` on top of a Data Frame.
* We typically use `withColumn` to perform a row-level transformation and give the result a name. If we provide the name of an existing column, that column is replaced with the new one.
* If we only want to rename a column, it is better to use `withColumnRenamed`.
* If we want to apply a transformation, we need to use either `select` or `withColumn`.
* We can rename all the columns of a Data Frame at once using `toDF`.

In [0]:
%run "./01_Create_Df_Template"

In [0]:
# Naming a derived column using select and alias
# The full name is built with concat and lit
from pyspark.sql.functions import col, concat, lit
usersDf.select(
    "id", "first_name", "last_name",
    (concat(col("first_name"), lit(", "), col("last_name"))).alias("full_name")
).show()

+---+----------+------------+--------------------+
| id|first_name|   last_name|           full_name|
+---+----------+------------+--------------------+
|  1|    Corrie|Van den Oord|Corrie, Van den Oord|
|  2|  Nikolaus|     Brewitt|   Nikolaus, Brewitt|
|  3|    Orelie|      Penney|      Orelie, Penney|
|  4|     Ashby|    Maddocks|     Ashby, Maddocks|
|  5|      Kurt|        Rome|          Kurt, Rome|
+---+----------+------------+--------------------+





+---+----------+------------+--------------------+--------------------+-------+-----------+-----------+-------------+-------------------+
| id|first_name|   last_name|               email|       phone_numbers|courses|is_customer|amount_paid|customer_from|    last_updated_ts|
+---+----------+------------+--------------------+--------------------+-------+-----------+-----------+-------------+-------------------+
|  1|    Corrie|Van den Oord|cvandenoord0@etsy...|{+1 234 567 8901,...| [1, 2]|       true|    1000.55|   2021-01-15|2021-02-10 01:15:00|
|  2|  Nikolaus|     Brewitt|nbrewitt1@dailyma...|{+1 234 567 8923,...|    [3]|       true|      900.0|   2021-02-14|2021-02-18 03:33:00|
|  3|    Orelie|      Penney|openney2@vistapri...|{+1 714 512 9752,...| [2, 4]|       true|     850.55|   2021-01-21|2021-03-15 15:16:55|
|  4|     Ashby|    Maddocks|  amaddocks3@home.pl|        {null, null}|     []|      false|        NaN|         null|2021-04-10 17:45:30|
|  5|      Kurt|        Rome|krome

In [0]:
# Using withColumn syntax
usersDf.select("id", "first_name", "last_name") \
    .withColumn("full_name", concat(col("first_name"), lit(", "), col("last_name"))) \
    .show()

+---+----------+------------+--------------------+
| id|first_name|   last_name|           full_name|
+---+----------+------------+--------------------+
|  1|    Corrie|Van den Oord|Corrie, Van den Oord|
|  2|  Nikolaus|     Brewitt|   Nikolaus, Brewitt|
|  3|    Orelie|      Penney|      Orelie, Penney|
|  4|     Ashby|    Maddocks|     Ashby, Maddocks|
|  5|      Kurt|        Rome|          Kurt, Rome|
+---+----------+------------+--------------------+



In [0]:
help(usersDf.withColumn)

Help on method withColumn in module pyspark.sql.dataframe:

withColumn(colName: str, col: pyspark.sql.column.Column) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` by adding a column or replacing the
    existing column that has the same name.
    
    The column expression must be an expression over this :class:`DataFrame`; attempting to add
    a column from some other :class:`DataFrame` will raise an error.
    
    .. versionadded:: 1.3.0
    
    .. versionchanged:: 3.4.0
        Support Spark Connect.
    
    Parameters
    ----------
    colName : str
        string, name of the new column.
    col : :class:`Column`
        a :class:`Column` expression for the new column.
    
    Returns
    -------
    :class:`DataFrame`
        DataFrame with new or replaced column.
    
    Notes
    -----
    This method introduces a projection internally. Therefore, calling it multiple
    times, for instance, via loops in order to a

In [0]:
from pyspark.sql.functions import size
usersDf.select("id", "courses") \
    .withColumn("course_count", size("courses")) \
    .show()

+---+-------+------------+
| id|courses|course_count|
+---+-------+------------+
|  1| [1, 2]|           2|
|  2|    [3]|           1|
|  3| [2, 4]|           2|
|  4|     []|           0|
|  5|     []|           0|
+---+-------+------------+



In [0]:
help(usersDf.withColumnRenamed)

Help on method withColumnRenamed in module pyspark.sql.dataframe:

withColumnRenamed(existing: str, new: str) -> 'DataFrame' method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` by renaming an existing column.
    This is a no-op if the schema doesn't contain the given column name.
    
    .. versionadded:: 1.3.0
    
    .. versionchanged:: 3.4.0
        Support Spark Connect.
    
    Parameters
    ----------
    existing : str
        string, name of the existing column to rename.
    new : str
        string, new name of the column.
    
    Returns
    -------
    :class:`DataFrame`
        DataFrame with renamed column.
    
    Examples
    --------
    >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
    >>> df.withColumnRenamed('age', 'age2').show()
    +----+-----+
    |age2| name|
    +----+-----+
    |   2|Alice|
    |   5|  Bob|
    +----+-----+



In [0]:
# Renaming columns one at a time with withColumnRenamed
# To rename a bunch of columns at once, toDF is more convenient
usersDf.select("id", "first_name") \
    .withColumnRenamed("id", "user_id") \
    .withColumnRenamed("first_name", "user_first_name") \
    .show()

+-------+---------------+
|user_id|user_first_name|
+-------+---------------+
|      1|         Corrie|
|      2|       Nikolaus|
|      3|         Orelie|
|      4|          Ashby|
|      5|           Kurt|
+-------+---------------+



In [0]:
# Renaming using alias
from pyspark.sql.functions import lit, col, concat

usersDf.select(
    col("id").alias("user_id"), 
    col("first_name").alias("user_first_name"),
    col("last_name").alias("user_last_name"),
    concat(col("first_name"), lit(", "), col("last_name")).alias("user_full_name")) \
    .show()

+-------+---------------+--------------+--------------------+
|user_id|user_first_name|user_last_name|      user_full_name|
+-------+---------------+--------------+--------------------+
|      1|         Corrie|  Van den Oord|Corrie, Van den Oord|
|      2|       Nikolaus|       Brewitt|   Nikolaus, Brewitt|
|      3|         Orelie|        Penney|      Orelie, Penney|
|      4|          Ashby|      Maddocks|     Ashby, Maddocks|
|      5|           Kurt|          Rome|          Kurt, Rome|
+-------+---------------+--------------+--------------------+



In [0]:
# Renaming and reordering multiple columns
required_cols = ["id", "first_name", "last_name"]
tgt_cols = ["user_id", "user_first_name", "user_last_name"]

usersDf.select(required_cols).toDF(*tgt_cols).show()

+-------+---------------+--------------+
|user_id|user_first_name|user_last_name|
+-------+---------------+--------------+
|      1|         Corrie|  Van den Oord|
|      2|       Nikolaus|       Brewitt|
|      3|         Orelie|        Penney|
|      4|          Ashby|      Maddocks|
|      5|           Kurt|          Rome|
+-------+---------------+--------------+

