How To Convert a DataFrame to Map Column in Spark-Scala.

Hi Friends,

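In this post, I'd like to explore a small use case: converting the columns of a Spark DataFrame into a Map column.

I'll use Spark's map function (org.apache.spark.sql.functions.map) to do it. The precondition is that all map values must share one type (and likewise all map keys); otherwise Spark fails with a data type mismatch error, unless it can implicitly convert them to a common type. In this example all the columns are of String type.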
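To make the precondition concrete, here is a minimal sketch that uses a hypothetical two-column DataFrame whose sales column is an Int rather than a String. Each value is cast to string before it goes into map, so the map has a single value type without relying on implicit conversion. The names MixedTypesToMap and withStringMap are illustrative only, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, map}

object MixedTypesToMap {

  // Hypothetical helper: builds map_value from a String "brand" column and a
  // non-String "sales" column. Both values are cast to string so that map()
  // receives a single value type.
  def withStringMap(df: DataFrame): DataFrame =
    df.withColumn("map_value", map(
      lit("brand"), df("brand").cast("string"),
      lit("sales"), df("sales").cast("string")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mixed-types-to-map").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical input: sales is an Int here, unlike the all-String DataFrame in this post
    val mixedDF = Seq(("Samsung", 68), ("Sony", 100)).toDF("brand", "sales")

    val result = withStringMap(mixedDF)
    result.printSchema() // map_value has string keys and string values
    result.show(false)
    spark.stop()
  }
}
```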

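Input DataFrame (rawDF in the code below): two rows (Samsung and Sony) with five String columns, brand, size, sales, city and country.

Output DataFrame: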

1.  Create a Map column in an existing DataFrame, where each map key is a column name and each map value is that column's value.

Example output with the newly created column map_value:



2.  Create a Map column with a nested map in an existing DataFrame: for the regular entries, the key is the column name, but for the one nested-map entry, the key is another column's value (here, the value of country) instead of a column name.

Example output with the newly created column map_value:


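In use case 2 the outer map already holds String values, and a Spark map column can hold only one value type, which is why the code in this post casts the inner map to a string. If the inner map should stay a real map, one alternative (a different layout from the post's, not the author's method) is to keep it in its own column of type map<string, map<string,string>>, keyed by the country value. A minimal sketch, assuming the rawDF columns from this post; NestedMapColumn and withNestedMap are hypothetical names:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map}

object NestedMapColumn {

  // Hypothetical helper: adds nested_map of type map<string, map<string,string>>.
  // Outer key = value of the country column; inner map = column name -> column value.
  def withNestedMap(df: DataFrame): DataFrame =
    df.withColumn("nested_map", map(
      col("country"),
      map(
        lit("brand"), col("brand"),
        lit("size"), col("size"),
        lit("sales"), col("sales"),
        lit("city"), col("city"),
        lit("country"), col("country"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-map").master("local[*]").getOrCreate()
    import spark.implicits._

    // Same input rows as rawDF in this post
    val rawDF = Seq(("Samsung", "L", "68", "Delhi", "India"), ("Sony", "XL", "100", "Tokyo", "Japan"))
      .toDF("brand", "size", "sales", "city", "country")

    val result = withNestedMap(rawDF)
    result.printSchema()
    result.select("nested_map").show(false)
    spark.stop()
  }
}
```

A nested value can then be read with getItem, for example col("nested_map").getItem("India").getItem("city").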
Below is the code for each of the above use cases, followed by its output:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DfToMap extends App {

  // Creating SparkSession (runs locally unless spark.master is already set)
  lazy val conf = new SparkConf().setAppName("df-to-map").set("spark.default.parallelism", "2")
    .setIfMissing("spark.master", "local[*]")
  lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
  import sparkSession.implicits._

  // Creating raw DataFrame; every column is a String, so all map values share one type
  val rawDF = Seq(("Samsung", "L", "68", "Delhi", "India"), ("Sony", "XL", "100", "Tokyo", "Japan"))
    .toDF("brand", "size", "sales", "city", "country")

  rawDF.show(false)

  // Use case 1: map_value with the column name as key and the column value as value
  val finalMapwithAllColumnsAsKey = rawDF.withColumn("map_value", map(
    lit("brand"), $"brand",
    lit("size"), $"size",
    lit("sales"), $"sales",
    lit("city"), $"city",
    lit("country"), $"country"))

  finalMapwithAllColumnsAsKey.show(false)

  // Use case 2: the same entries plus one extra entry whose key is the country value
  // (e.g. "India") and whose value is an inner map of all columns.
  // All map values must share one type, so the inner map is cast to string.
  val finalMapwithCountryValueAsKeyForInnermap = rawDF.withColumn("map_value", map(
    lit("brand"), $"brand",
    lit("size"), $"size",
    lit("sales"), $"sales",
    lit("city"), $"city",
    lit("country"), $"country",
    $"country", map(
      lit("brand"), $"brand",
      lit("size"), $"size",
      lit("sales"), $"sales",
      lit("city"), $"city",
      lit("country"), $"country"
    ).cast("string")
  )).select("map_value")

  finalMapwithCountryValueAsKeyForInnermap.show(false)

  sparkSession.stop()
}
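The map(...) calls above list every column name by hand. A minimal sketch of a more generic version of use case 1 builds the same name/value pairs from df.columns, so added or renamed columns need no code change; DynamicDfToMap and withAllColumnsMap are hypothetical names. Each value is cast to string so the map keeps one value type even if the input mixes column types.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map}

object DynamicDfToMap {

  // Hypothetical helper: expands to map(lit(name1), col(name1), lit(name2), col(name2), ...)
  // over the DataFrame's own columns, casting every value to string.
  def withAllColumnsMap(df: DataFrame, mapCol: String = "map_value"): DataFrame = {
    val entries: Seq[Column] = df.columns.toSeq.flatMap(c => Seq(lit(c), col(c).cast("string")))
    df.withColumn(mapCol, map(entries: _*))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dynamic-df-to-map").master("local[*]").getOrCreate()
    import spark.implicits._

    // Same input rows as rawDF in this post
    val rawDF = Seq(("Samsung", "L", "68", "Delhi", "India"), ("Sony", "XL", "100", "Tokyo", "Japan"))
      .toDF("brand", "size", "sales", "city", "country")

    withAllColumnsMap(rawDF).show(false)
    spark.stop()
  }
}
```

Because col(c) reads a dot in a column name as access to a struct field, column names that contain dots would need backtick quoting.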


Output of the above code:





I hope this post was helpful. Please do like, comment and share.

Thank you!
