Friday, November 18, 2022

How to Rename Columns in a DataFrame Using PySpark

 There are several ways to rename columns in a DataFrame using PySpark. Each one returns a new DataFrame, so assign the result back to a variable.

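  1. withColumnRenamed: renames one column per call, so chain calls to rename several. All other columns are kept unchanged.
    1. df = df.withColumnRenamed("Old_ColumnName1", "New_ColumnName1").withColumnRenamed("Old_ColumnName2", "New_ColumnName2")
  2. selectExpr: renames columns with SQL "AS" expressions. Only the columns listed in the call are kept.
    1. df = df.selectExpr("Old_ColumnName1 AS New_ColumnName1", "Old_ColumnName2 AS New_ColumnName2")
  3. select(col().alias(), col()): needs "from pyspark.sql.functions import col". Only the columns listed in the call are kept; here Old_ColumnName2 is selected without being renamed.
    1. df = df.select(col("Old_ColumnName1").alias("New_ColumnName1"), col("Old_ColumnName2"))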
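
When there are many columns to rename, writing each call by hand gets tedious. The following is a minimal sketch, not a built-in PySpark feature, that drives the first two approaches from a dictionary of old-to-new names. The helper names rename_chained and build_select_exprs and the mapping variable are hypothetical, introduced only for illustration; the sketch assumes df is an existing PySpark DataFrame.

```python
from functools import reduce


def rename_chained(df, mapping):
    """Apply df.withColumnRenamed once for each (old, new) pair in mapping.

    Columns that are not in mapping are left untouched.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )


def build_select_exprs(columns, mapping):
    """Build selectExpr strings: rename mapped columns, keep the rest as-is.

    Names are wrapped in backticks so that column names containing spaces
    or dots are still parsed as a single identifier.
    """
    return [
        f"`{c}` AS `{mapping[c]}`" if c in mapping else f"`{c}`"
        for c in columns
    ]


# Hypothetical usage, assuming `df` already exists:
# mapping = {"Old_ColumnName1": "New_ColumnName1",
#            "Old_ColumnName2": "New_ColumnName2"}
# df = rename_chained(df, mapping)
# df = df.selectExpr(*build_select_exprs(df.columns, mapping))
```

Passing df.columns to build_select_exprs keeps every column in the result, which avoids accidentally dropping columns that selectExpr would otherwise leave out.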
