Friday, November 18, 2022

How to Add New Columns to a DataFrame Using PySpark

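Below are three common ways to add new columns to a DataFrame in PySpark. The snippets assume lit and col have been imported from pyspark.sql.functions. DataFrames are immutable, so each call returns a new DataFrame and leaves df unchanged: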

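  1. withColumn and lit (constant value in every row)
    1. df.withColumn("NewColumnName", lit("default value for new column"))
  2. withColumn and col (derived column computed from existing columns)
    1. df.withColumn("NewColumnName", col("Column1") * col("Column2"))
  3. select and alias (select returns only the columns you list, so include every existing column you want to keep)
    1. df.select(lit("default column value").alias("NewColumnName"), col("Column1"), col("Column2"))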
