How to Convert Map Data to Spark DataFrame in Scala
Hi Friends,
In this post, I'd like to explore how to create a DataFrame from Map data, covering the two use cases below:
Input Map Data :
Map(Id -> 111, Name -> Anamika, City -> Bangalore, State -> Karnataka)
Map(Id -> 222, Name -> Divyanshu, City -> Bangalore, State -> Karnataka)
Map(Id -> 333, Name -> Himanshu, City -> Gandhi Nagar, State -> Gujrat)
Map(Id -> 444, Name -> Priyanshu, City -> Allahabad, State -> Uttar Pradesh)
Map(Id -> 555, Name -> Pranjal, City -> Allahabad, State -> Uttar Pradesh)
Case 1. Create a DataFrame from the Map data, using the keys as column names
Expected Output for case 1 :
Case 2. Create a DataFrame whose columns each hold a [key, value] pair, again using the keys as column names
Expected Output for case 2 :
Below is the code, with explanations inline:
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
object ConvertMapToDataFrame extends App {

  // Create the SparkSession; local master is set only if none was provided
  lazy val conf = new SparkConf().setAppName("map-to-dataframe").setIfMissing("spark.master", "local[*]")
  lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
  lazy val sparkContext: SparkContext = sparkSession.sparkContext

  import sparkSession.implicits._

  // Input data: an RDD of Maps to convert into a DataFrame
  val rawMapRDD: RDD[Map[String, String]] = sparkContext.parallelize(Seq(
    Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "333", "Name" -> "Himanshu", "City" -> "Gandhi Nagar", "State" -> "Gujrat"),
    Map("Id" -> "444", "Name" -> "Priyanshu", "City" -> "Allahabad", "State" -> "Uttar Pradesh"),
    Map("Id" -> "555", "Name" -> "Pranjal", "City" -> "Allahabad", "State" -> "Uttar Pradesh")))

  // Print the input RDD (runs on executors; visible here because we run locally)
  rawMapRDD.foreach(println)

  // Take the keys of the first Map as the column names for the DataFrame.
  // This assumes every Map has the same keys in the same insertion order,
  // which holds here because small immutable Maps preserve insertion order.
  val getColumns = rawMapRDD.take(1).flatMap(x => x.keys)

  // Print the column names
  getColumns.foreach(println)

  // Case 1. One column per key, holding the plain values
  val finalDF = rawMapRDD.map { value =>
    val dataList = value.values.toList
    (dataList(0), dataList(1), dataList(2), dataList(3))
  }.toDF(getColumns: _*)

  // Print the DataFrame
  finalDF.show(false)

  // Case 2. One column per key, each cell holding a [key, value] pair.
  // toList on a Map yields (key, value) tuples, which Spark encodes as
  // two-field structs, so each cell renders as [key, value] in show().
  val finalMapDF = rawMapRDD.map { value =>
    val dataList = value.toList
    (dataList(0), dataList(1), dataList(2), dataList(3))
  }.toDF(getColumns: _*)

  // Print the Map columns' DataFrame
  finalMapDF.show(false)

  sparkSession.stop()
}
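One limitation of the tuple approach above is that (dataList(0), ..., dataList(3)) hard-codes exactly four columns. As a sketch of a more general alternative (the object name ConvertMapToDataFrameGeneric and the variable names are mine, not from the original post), the same conversion can handle any number of keys by building an explicit schema and Row objects, and looking each value up by key so the result never depends on the Map's iteration order:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ConvertMapToDataFrameGeneric extends App {
  val spark = SparkSession.builder()
    .appName("map-to-dataframe-generic")
    .master("local[*]")
    .getOrCreate()

  val rawMapRDD = spark.sparkContext.parallelize(Seq(
    Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka")))

  // Fix the column order once from the first record's keys
  val columns = rawMapRDD.take(1).head.keys.toSeq

  // One nullable StringType field per key
  val schema = StructType(columns.map(StructField(_, StringType, nullable = true)))

  // Look each value up by key; missing keys become null instead of shifting columns
  val rowRDD = rawMapRDD.map(m => Row.fromSeq(columns.map(m.getOrElse(_, null))))

  val df = spark.createDataFrame(rowRDD, schema)
  df.show(false)

  spark.stop()
}
```

Because values are fetched by key rather than by position, records with missing or extra keys degrade gracefully instead of silently producing misaligned columns.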
Output DataFrames from the above code :
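As a closing tip: if what you actually want is a single real MapType column (rather than one struct column per key, as in Case 2), Spark can encode the Maps directly. This is a sketch under the assumption of Spark 2.3+, where the implicits provide an Encoder for Map[String, String]; the object and column names here are mine:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ConvertMapToMapTypeColumn extends App {
  val spark = SparkSession.builder()
    .appName("map-to-maptype-column")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val rawMapRDD = spark.sparkContext.parallelize(Seq(
    Map("Id" -> "111", "Name" -> "Anamika"),
    Map("Id" -> "222", "Name" -> "Divyanshu")))

  // With the Map encoder in scope, toDF yields one MapType column
  val mapColDF = rawMapRDD.toDF("data")
  mapColDF.printSchema()

  // explode turns each map entry into its own (key, value) row
  mapColDF.select(explode($"data")).show(false)

  spark.stop()
}
```

The exploded form (key and value columns) is often the most convenient starting point for further SQL-style processing.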
I hope this post was helpful. Please like, comment, and share.
Thank You!