How to Convert Map Data to Spark DataFrame in Scala

Hi Friends,

In this post, I'd like to explore how to create a DataFrame from Map data, covering the two use cases below:

Input Map data:

Map(Id -> 111, Name -> Anamika, City -> Bangalore, State -> Karnataka)
Map(Id -> 222, Name -> Divyanshu, City -> Bangalore, State -> Karnataka)
Map(Id -> 333, Name -> Himanshu, City -> Gandhi Nagar, State -> Gujrat)
Map(Id -> 444, Name -> Priyanshu, City -> Allahabad, State -> Uttar Pradesh)
Map(Id -> 555, Name -> Pranjal, City -> Allahabad, State -> Uttar Pradesh)


Case 1. Create a DataFrame from Map data, using the map keys as column names
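
The idea behind case 1 can be sketched in plain Scala, without Spark: take the column names from the keys of the first map, then look up each map's values by those names so that every row follows the same column order. The object name `MapRows` and its two helper methods are hypothetical, introduced only for illustration, and the sketch assumes every map contains all of the keys found in the first map.

```scala
object MapRows {

  // Column names come from the keys of the first map (empty input gives no columns).
  def columnsOf(maps: Seq[Map[String, String]]): Seq[String] =
    maps.headOption.map(_.keys.toList).getOrElse(Nil)

  // Each row is built by looking values up by column name, so the order in which
  // a map stores its entries does not matter. m(c) throws if a key is missing.
  def rowsOf(maps: Seq[Map[String, String]], columns: Seq[String]): Seq[Seq[String]] =
    maps.map(m => columns.map(c => m(c)))

  def main(args: Array[String]): Unit = {
    val maps = Seq(
      Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
      // Same keys, declared in a different order
      Map("State" -> "Karnataka", "City" -> "Bangalore", "Name" -> "Divyanshu", "Id" -> "222"))

    val columns = columnsOf(maps)
    println(columns.mkString(" | "))
    rowsOf(maps, columns).foreach(row => println(row.mkString(" | ")))
  }
}
```

Because the values are looked up by key, the second map still produces its row in the Id, Name, City, State order of the first map.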

Expected Output for case 1 : 




Case 2. Create a DataFrame whose columns hold [key, value] pairs from Map data, using the map keys as column names
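
If a genuine map-typed column is wanted, rather than one column per key, one option is to keep each whole map in a single column. The following is a minimal sketch under that assumption, not the approach used in this post's code: the object name `MapColumnSketch`, the helper `toMapColumn`, and the column name `data` are hypothetical, and it assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MapColumnSketch {

  // Wraps every Map in a Tuple1 so that toDF produces a single column
  // of type map<string,string>, here named "data".
  def toMapColumn(spark: SparkSession, maps: Seq[Map[String, String]]): DataFrame = {
    import spark.implicits._
    maps.map(Tuple1(_)).toDF("data")
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made for this sketch
    val spark = SparkSession.builder()
      .appName("map-column-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = toMapColumn(spark, Seq(
      Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
      Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka")))

    df.printSchema()

    // Individual keys can still be pulled out of the map column by name
    df.select($"data".getItem("Id").as("Id"), $"data".getItem("Name").as("Name")).show(false)

    spark.stop()
  }
}
```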

Expected Output for case 2 :


Below is the code, with explanations in the comments:

import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ConvertMapToDataFrame extends App {


  //Creating SparkSession
  lazy val conf = new SparkConf().setAppName("map-to-dataframe").setIfMissing("spark.master", "local[*]")
  lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
  lazy val sparkContext: SparkContext = sparkSession.sparkContext
  import sparkSession.implicits._

  //Creating Input Map to Convert into DataFrame
  val rawMapRDD: RDD[Map[String, String]] = sparkContext.parallelize(Seq(
    Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "333", "Name" -> "Himanshu", "City" -> "Gandhi Nagar", "State" -> "Gujrat"),
    Map("Id" -> "444", "Name" -> "Priyanshu", "City" -> "Allahabad", "State" -> "Uttar Pradesh"),
    Map("Id" -> "555", "Name" -> "Pranjal", "City" -> "Allahabad", "State" -> "Uttar Pradesh")))

  //Printing above RDD : rawMapRDD (collect() brings the rows to the driver, so they print there even on a cluster)
  rawMapRDD.collect().foreach(println)


  //Getting column names from the keys of the first Map in the RDD (assumes every Map has the same keys)
  val getColumns = rawMapRDD.take(1).flatMap(x => x.keys)

  //Printing column names
  getColumns.foreach(println)

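  //Case 1. Creating DataFrame from Map Data, with column names taken from getColumns
  //Note: value.values must come out in the same order as getColumns; this holds here because Scala immutable Maps with up to 4 entries keep insertion order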
  val finalDF = rawMapRDD.map { value =>
    val dataList = value.values.toList
    (dataList(0), dataList(1), dataList(2), dataList(3))
  }.toDF(getColumns: _*)

  //Printing DataFrame
  finalDF.show(false)


  
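  //Case 2. Creating DataFrame whose columns each hold a (key, value) pair, with column names taken from getColumns
  //Note: value.toList yields the Map entries as (key, value) tuples, which Spark stores as struct columns with fields _1 and _2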
  val finalMapDF = rawMapRDD.map { value =>
    val dataList = value.toList
    (dataList(0), dataList(1), dataList(2), dataList(3))
  }.toDF(getColumns: _*)

  //Printing Map Column's DataFrame
  finalMapDF.show(false)


}
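
The tuple-based code above is tied to exactly four keys, because each row is built as a 4-element tuple. As a possible alternative for maps with any number of keys, the rows can be built as `Row` objects and paired with an explicit schema. This is a minimal sketch, not the method used in the code above: the object name `MapToDataFrameSketch` and the helper `mapsToDF` are hypothetical, every column is assumed to be a string, and a key missing from a map is assumed to become null.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MapToDataFrameSketch {

  def mapsToDF(spark: SparkSession, maps: RDD[Map[String, String]]): DataFrame = {
    // Column names come from the keys of the first Map. This is a local val,
    // so the closure below captures only this list.
    val columns: List[String] = maps.first().keys.toList

    // One nullable string column per key
    val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))

    // Values are looked up by column name, so entry order inside each Map does not matter
    val rows: RDD[Row] = maps.map(m => Row.fromSeq(columns.map(c => m.getOrElse(c, null))))

    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made for this sketch
    val spark = SparkSession.builder()
      .appName("map-to-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()

    val rawMapRDD = spark.sparkContext.parallelize(Seq(
      Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
      Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka")))

    mapsToDF(spark, rawMapRDD).show(false)

    spark.stop()
  }
}
```

Using a `main` method with local values, instead of `extends App`, also keeps the column list out of the object's fields, which the Spark documentation recommends for applications submitted to a cluster.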


Output DataFrames from above code :







I hope this post was helpful. Please like, comment, and share.

Thank You!
