Posts

Showing posts from April, 2020

How to Change a Map Key with Another Key's Value of the Same Map in Scala.

Hi Friends, today I'd like to show how we can change or update a key of a Map with the value of another key of the same Map. In the example below, I'll replace the key rplc_key with the value of the key obj_class.

Input Map : Map(obj_class -> Hardware, A -> 1, rplc_key -> Map(No -> 1, Status -> Fine))

Output Map : Map(obj_class -> Hardware, A -> 1, Hardware -> Map(No -> 1, Status -> Fine))

Below is the code :

object UpdateMapKey extends App {
  // Creating the input Map
  val inputMap: scala.collection.immutable.Map[Any, Any] = Map("obj_class" -> "Hardware", "A" -> "1", "rplc_key" -> Map("No" -> "1", "Status" -> "Fine"))
  println(inputMap)
  // Updating the Map key with the value of another key
  val outputMapWithUpdatedKey = (inputMap ++ Map((inputMap.get("obj_class").getOrElse(None), inputMap.get("rplc_key"…
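The excerpt cuts off mid-expression. Here is a minimal, self-contained sketch of the same idea, under the assumption that the old key is simply dropped after the renamed entry is added; names such as UpdateMapKeySketch and outputMap are illustrative, not from the original post.

object UpdateMapKeySketch extends App {
  val inputMap: Map[Any, Any] =
    Map("obj_class" -> "Hardware", "A" -> "1", "rplc_key" -> Map("No" -> "1", "Status" -> "Fine"))

  // The new key is the value stored under "obj_class"; its value is whatever "rplc_key" held.
  val outputMap = inputMap.get("obj_class") match {
    case Some(newKey) => (inputMap ++ Map(newKey -> inputMap("rplc_key"))) - "rplc_key"
    case None         => inputMap // nothing to rename if "obj_class" is missing
  }

  println(outputMap)
  // Map(obj_class -> Hardware, A -> 1, Hardware -> Map(No -> 1, Status -> Fine))
}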

How to Convert a DataFrame to a Map Column in Spark-Scala.

Hi Friends, in this post I'd like to explore a small use case of converting a Spark DataFrame to a Map column. I'll use the Spark map function for this. The precondition is that all the columns must be of the same type, otherwise Spark will throw an error. In this example all columns are of String type.

Input DataFrame :

Output DataFrame :

1. Create a Map column in an existing DataFrame, where each Map key is a column name and each Map value is the corresponding column value. Example output with the newly created column map_value :

2. Create a Map column with a nested Map in an existing DataFrame, where each key is a column name except for one nested Map value, whose key is another column's value instead of a column name. Example output with the newly created column map_value :

Below is the code with output for each of the above use cases :

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DfToMap extends App {…
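The excerpt stops at the object declaration. Below is a minimal sketch of use case 1, assuming a small all-String DataFrame; the sample data and the object name DfToMapColumnSketch are illustrative, not from the original post.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DfToMapColumnSketch extends App {
  val conf = new SparkConf().setAppName("df-to-map-column").setIfMissing("spark.master", "local[*]")
  val spark = SparkSession.builder().config(conf).getOrCreate()
  import spark.implicits._

  val df = Seq(("111", "Anamika", "Bangalore")).toDF("Id", "Name", "City")

  // Build alternating key/value columns: lit(columnName), col(columnName), ...
  val keyValuePairs = df.columns.flatMap(c => Seq(lit(c), col(c)))

  // Wrap the pairs in a single map column named map_value.
  val withMap = df.withColumn("map_value", map(keyValuePairs: _*))

  withMap.show(false)
  spark.stop()
}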

How to Convert Map Data to a Spark DataFrame in Scala

Hi Friends, in this post I'd like to explore how to create a DataFrame from Map data for the two use cases below.

Input Map Data :
Map(Id -> 111, Name -> Anamika, City -> Bangalore, State -> Karnataka)
Map(Id -> 222, Name -> Divyanshu, City -> Bangalore, State -> Karnataka)
Map(Id -> 333, Name -> Himanshu, City -> Gandhi Nagar, State -> Gujrat)
Map(Id -> 444, Name -> Priyanshu, City -> Allahabad, State -> Uttar Pradesh)
Map(Id -> 555, Name -> Pranjal, City -> Allahabad, State -> Uttar Pradesh)

Case 1. Create a DataFrame from the Map data, mapping the keys to column names. Expected output for case 1 :

Case 2. Create a DataFrame with Map columns [key, value] from the Map data, mapping the keys to column names. Expected output for case 2 :

Below is the code with explanation :

import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd…
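The code excerpt ends at the imports. A minimal sketch of case 1 follows, assuming every Map in the input has the same keys, so that the keys become column names and the values become one row per Map; object and variable names here are illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object MapToDfSketch extends App {
  val conf = new SparkConf().setAppName("map-to-df").setIfMissing("spark.master", "local[*]")
  val spark = SparkSession.builder().config(conf).getOrCreate()
  import spark.implicits._

  val mapData = Seq(
    Map("Id" -> "111", "Name" -> "Anamika", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "222", "Name" -> "Divyanshu", "City" -> "Bangalore", "State" -> "Karnataka"),
    Map("Id" -> "333", "Name" -> "Himanshu", "City" -> "Gandhi Nagar", "State" -> "Gujrat")
  )

  // Pull the values out of each Map in a fixed key order, then name the columns after those keys.
  val df = mapData
    .map(m => (m("Id"), m("Name"), m("City"), m("State")))
    .toDF("Id", "Name", "City", "State")

  df.show(false)
  spark.stop()
}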

How to Convert Map Data into a Key-Value Columns DataFrame in Scala.

Hi Friends, in this post I'd like to explain how to convert Map data into a Spark DataFrame with two columns, key and value, which contain the keys and values of the given Map.

Input Data : Map("Id" -> "111", "Name" -> "Anamika Singh", "City" -> "Bangalore")

Output DataFrame :

Below is the code with explanation to achieve the above output from the given Map data.

import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._

object ConvertMapToColumn extends App {
  // Creating SparkSession
  lazy val conf = new SparkConf().setAppName("map-to-key-value-column").setIfMissing("spark.master", "local[*]")
  lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
  import sparkSession.implicits._
  // Creating the input Map to convert into a DataFrame…
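The excerpt ends just before the Map is defined. A self-contained sketch of the likely remainder, assuming the conversion is done with toSeq followed by toDF (the object name and the exact steps are mine, not confirmed by the full post):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ConvertMapToColumnSketch extends App {
  lazy val conf = new SparkConf().setAppName("map-to-key-value-column").setIfMissing("spark.master", "local[*]")
  lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
  import sparkSession.implicits._

  val inputMap = Map("Id" -> "111", "Name" -> "Anamika Singh", "City" -> "Bangalore")

  // Each (key, value) pair of the Map becomes one row of a two-column DataFrame.
  val df = inputMap.toSeq.toDF("key", "value")

  df.show(false)
  // rows: (Id, 111), (Name, Anamika Singh), (City, Bangalore)
}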

How to Convert a Spark DataFrame to a Map in Scala

Hi Friends, in this post I am going to explain how we can convert a Spark DataFrame to a Map.

Input DataFrame :

Output Map :

Below is the code, explained step by step, along with its output.

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ConvertDFToMap {
  def main(args: Array[String]) {
    // Creating SparkSession
    lazy val conf = new SparkConf().setAppName("DataFrame-to-Map").set("spark.default.parallelism", "1")
      .setIfMissing("spark.master", "local[*]")
    lazy val sparkSession = SparkSession.builder().config(conf).getOrCreate()
    // Creating a Seq of raw data to build a test DataFrame
    val rawData = Seq(
      Row(1, "Anamika", "Uttar Pradesh"),
      Row(2, "Ananya", "Delhi"),
      Row(3, "Ambika",…
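The excerpt cuts off inside the test data. Below is a self-contained sketch of one way to finish the conversion, assuming the goal is a Map keyed by Id with the remaining columns as a nested Map of column name to value; the output shape and names are my assumption, not taken from the full post.

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ConvertDfToMapSketch extends App {
  lazy val conf = new SparkConf().setAppName("DataFrame-to-Map").setIfMissing("spark.master", "local[*]")
  lazy val spark = SparkSession.builder().config(conf).getOrCreate()

  val rawData = Seq(
    Row(1, "Anamika", "Uttar Pradesh"),
    Row(2, "Ananya", "Delhi")
  )
  val schema = StructType(Seq(
    StructField("Id", IntegerType),
    StructField("Name", StringType),
    StructField("State", StringType)
  ))
  val df = spark.createDataFrame(spark.sparkContext.parallelize(rawData), schema)

  // Collect the rows to the driver and turn each one into Id -> Map(columnName -> value).
  val resultMap: Map[Int, Map[String, Any]] = df.collect().map { row =>
    row.getInt(0) -> row.schema.fieldNames.zip(row.toSeq).toMap
  }.toMap

  println(resultMap)
  spark.stop()
}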

How to Convert a Spark DataFrame String Type Column to Array Type and Split all the JSON files of this column into rows : Part - 2

Hi Friends, in this post I'd like to explore a project scenario involving JSON data. Suppose we receive a DataFrame from a source with a column ArrayOfJsonStrings, which actually holds an array of JSON documents, but the data type of this column is String. We need to split all the JSON documents in this ArrayOfJsonStrings column into as many rows as there are documents. This use case was already explained in detail in the previous post. In this post I'll take another approach to solve the same use case and get the expected output.

Below are the input and output DataFrames :

Input DataFrame :

Output DataFrame :

Below is the code with explanation and output.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SplitArrayOfJsonsToRows {
  def main(args: Array[String]) {
    // Creating SparkSession
    lazy val conf = new SparkConf().setAppName("split-array-of-json-to…
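The excerpt ends at the SparkSession setup, so the actual approach is not visible here. One possible approach, sketched below, is to parse the string with from_json using an ArrayType(StringType) schema (which keeps each array element as its raw JSON text) and then explode the resulting array; this is my assumption, not necessarily the technique the full post uses, and the sample data is illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{ArrayType, StringType}

object SplitJsonArrayWithFromJsonSketch extends App {
  lazy val conf = new SparkConf().setAppName("split-array-of-json-to-row").setIfMissing("spark.master", "local[*]")
  lazy val spark = SparkSession.builder().config(conf).getOrCreate()
  import spark.implicits._

  val df = Seq(
    """[{"id":1,"name":"Anamika"},{"id":2,"name":"Ananya"}]"""
  ).toDF("ArrayOfJsonStrings")

  // from_json with ArrayType(StringType) turns the whole string into an array whose elements
  // are the individual JSON documents; explode then gives one row per document.
  val exploded = df
    .withColumn("json", explode(from_json(col("ArrayOfJsonStrings"), ArrayType(StringType))))
    .drop("ArrayOfJsonStrings")

  exploded.show(false)
  spark.stop()
}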

How to Convert a Spark DataFrame String Type Column to Array Type and Split all the JSON files of this column into rows : Part - 1

Hi Friends, in this post I'd like to explore a project scenario involving JSON data. Suppose we receive a DataFrame from a source with a column ArrayOfJsonStrings, which actually holds an array of JSON documents, but the data type of this column is String. We need to split all the JSON documents in this ArrayOfJsonStrings column into as many rows as there are documents.

Below are the input and output DataFrames :

Input DataFrame :

Output DataFrame :

Below is the code for the same, with the steps explained :

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SplitArrayOfJsonsToRows {
  def main(args: Array[String]) {
    // Creating SparkSession
    lazy val conf = new SparkConf().setAppName("split-array-of-json-to-row").set("spark.default.parallelism", "1")
      .setIfMissing("spark.master", "local[*]")
    lazy val sparkSession = SparkSession.builder().conf…
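Again the excerpt stops before the transformation itself. A rough sketch of one way to do the split, not necessarily the post's exact steps: strip the outer brackets, mark the "},{" boundary between objects with a separator character, and explode the split result into rows. It assumes the separator "|" does not occur inside the data; all names and sample values are illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SplitJsonArrayWithSplitSketch extends App {
  lazy val conf = new SparkConf().setAppName("split-array-of-json-to-row").setIfMissing("spark.master", "local[*]")
  lazy val spark = SparkSession.builder().config(conf).getOrCreate()
  import spark.implicits._

  val df = Seq(
    """[{"id":1,"name":"Anamika"},{"id":2,"name":"Ananya"}]"""
  ).toDF("ArrayOfJsonStrings")

  val exploded = df
    .withColumn("noBrackets", regexp_replace(col("ArrayOfJsonStrings"), "^\\[|\\]$", "")) // drop the outer [ and ]
    .withColumn("separated", regexp_replace(col("noBrackets"), "\\},\\{", "}|{"))         // mark object boundaries
    .withColumn("json", explode(split(col("separated"), "\\|")))                          // one row per JSON object
    .select("json")

  exploded.show(false)
  spark.stop()
}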

How to Create a Map Column with a Different Type of Key from a Spark DataFrame in Scala

Hi Friends, in this post I'd like to explore a small use case of creating a Map column from a Spark DataFrame. The precondition is that all the columns must be of the same type, otherwise Spark will throw an error.

1. Create a Map column in an existing DataFrame, where each Map key is a column name and each Map value is the corresponding column value. Example output with the newly created column created_map :

2. Create a Map column in an existing DataFrame, where each key is a column name except for one Map value, whose key is another column's value instead of a column name. Example output with the newly created column created_map_key_from_value : Example output with the newly created column created_map_key_from_another_column_value :

Below is the code with output for each of the above use cases :

1. Create a Map column in an existing DataFrame, where each Map key is a column name and each Map value is the corresponding column value.

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Column, SparkSession}…
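The excerpt ends at the imports for use case 1. A minimal sketch of use case 2 follows, assuming a small all-String DataFrame: every entry's key is a column name except one entry whose key is taken from another column's value. The sample columns and the object name are illustrative, not from the original post.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MapKeyFromColumnValueSketch extends App {
  val conf = new SparkConf().setAppName("map-key-from-value").setIfMissing("spark.master", "local[*]")
  val spark = SparkSession.builder().config(conf).getOrCreate()
  import spark.implicits._

  val df = Seq(("111", "Anamika", "Bangalore")).toDF("Id", "Name", "City")

  // Normal entries: "Id" -> Id value, "Name" -> Name value.
  // Special entry: the value of the Name column becomes the key for the City value.
  val withMap = df.withColumn(
    "created_map_key_from_another_column_value",
    map(
      lit("Id"), col("Id"),
      lit("Name"), col("Name"),
      col("Name"), col("City")
    )
  )

  withMap.show(false)
  spark.stop()
}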

How to Handle and Convert DateTime Formats in Spark-Scala.

Hi Friends, in this post I'd like to explain how to handle a date string and convert it into the expected date format. Below are a few examples :

1. Change the date format to yyyy-MM-dd from dd-MM-yyyy or MM-dd-yyyy :

Input Date : 03-04-2020T15:26:51+05:30
Converted Output : 2020-04-03T15:26:051

Code to get the same output :

def changeDateFormat(inputDateString: String) = {
  var convertedDateFormat = ""
  try {
    val inputDateFormat: SimpleDateFormat = new SimpleDateFormat("dd-MM-yyyy'T'HH:mm:sss")
    val date = inputDateFormat.parse(inputDateString)
    val df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:sss")
    convertedDateFormat = df.format(date)
  } catch {
    case ex: Exception => {
      "Exception is : " + ex
      logger.info("[HandleDateTimeUtils] Exception while converting time to 'yyyy-MM-dd'T'HH:mm:sss' Format. Input is: " + inputDateString, ex.getMessag…
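The excerpt is cut off inside the catch block. Here is a self-contained sketch completing the helper, under the assumption that it returns the reformatted string (or an empty string on failure) and simply prints the error instead of using the post's logger; the "sss" pattern is kept from the excerpt, which is why the seconds print as three digits in the converted output.

import java.text.SimpleDateFormat

object ChangeDateFormatSketch extends App {

  def changeDateFormat(inputDateString: String): String = {
    var convertedDateFormat = ""
    try {
      val inputDateFormat = new SimpleDateFormat("dd-MM-yyyy'T'HH:mm:sss")
      val date = inputDateFormat.parse(inputDateString) // the trailing "+05:30" is ignored by parse
      val outputDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:sss")
      convertedDateFormat = outputDateFormat.format(date)
    } catch {
      case ex: Exception =>
        println("Exception while converting to 'yyyy-MM-dd'T'HH:mm:sss' format. Input is: " + inputDateString + " - " + ex.getMessage)
    }
    convertedDateFormat
  }

  println(changeDateFormat("03-04-2020T15:26:51+05:30")) // 2020-04-03T15:26:051
}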