Posts

Showing posts from February, 2020

How to create a Singleton SparkSession in a Scala Application?

Hi Friends, today I am going to explain how to create a singleton SparkSession in a Scala application. We can define a method in a class that builds and returns the SparkSession, and call it from the main class. Below is the code, in which I've also used a Logger to log info.

```scala
import java.util.Properties

import org.apache.log4j.PropertyConfigurator
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory

/**
 * Created by anamika_singh on 2/13/2020.
 */
class SingletonSparkSession {

  val logger = LoggerFactory.getLogger(classOf[SingletonSparkSession])

  // Read the properties file and store the properties to use during processing.
  // Create a connection to read the values of the variables provided in the properties file.
  val connectionParam = new Properties
  connectionParam.load(getClass().getResourceAsStream("/generic.properties"))
  PropertyConfigurator.configure(connectionParam)

  // Get the require
```
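A common way to guarantee a single SparkSession across an application is a Scala `object` holding a `lazy val`, since a Scala `object` is itself initialized only once per JVM. This is a minimal sketch, not the post's full code; the app name and `local[*]` master are placeholder values you would set per environment:

```scala
import org.apache.spark.sql.SparkSession

// A Scala `object` is initialized exactly once per JVM, so the
// SparkSession it holds behaves as a singleton.
object SparkSessionProvider {
  // `lazy` defers creation until the session is first requested.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("singleton-spark-session-demo") // placeholder app name
    .master("local[*]")                      // placeholder; configure per environment
    .getOrCreate()
}

object MainApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSessionProvider.spark
    // Any later access to SparkSessionProvider.spark returns the same session.
    println(s"Spark version: ${spark.version}")
    spark.stop()
  }
}
```

`SparkSession.builder().getOrCreate()` also returns an existing active session if one exists, which reinforces the singleton behavior.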

How to read values from a properties file in Scala to make a generic Application?

Hello Friends, due to requests from many friends, today I'm going to explain how to read values from a properties file in Scala to make an application generic, so that we can pass values based on the requirement and environment without any changes in code. This is very helpful for maintaining the project: we can reuse the same application, with only a few configuration-level changes, for a similar requirement. We can also use/test the same application in different environments, providing the environment-level details in the properties file. For example, let's say I have a Scala project config-file-test-in-scala which inserts data into the Hive table hive_test_1. If, with the same logic, we have to work with a different module/data and insert the data into Hive table hive_test_2, then we can read the table name from the properties file and reuse the same application without any code change. In a properties file we provide the data in key=value pair
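The idea above can be sketched with `java.util.Properties` loading a file from the classpath. This is a minimal illustration: the resource name `/generic.properties` and the key `hive.table.name` are assumptions for the example, not values from the post:

```scala
import java.util.Properties

object ConfigReader {

  // Loads a properties file from the classpath (e.g. src/main/resources).
  def load(resource: String): Properties = {
    val props = new Properties()
    val stream = getClass.getResourceAsStream(resource)
    require(stream != null, s"Resource not found on classpath: $resource")
    try props.load(stream) finally stream.close()
    props
  }

  def main(args: Array[String]): Unit = {
    // Assumed file /generic.properties containing, e.g.:
    //   hive.table.name=hive_test_1
    val props = load("/generic.properties")
    // Second argument is a default, used if the key is absent.
    val tableName = props.getProperty("hive.table.name", "hive_test_1")
    println(s"Inserting into Hive table: $tableName")
  }
}
```

Switching the target from hive_test_1 to hive_test_2 then requires editing only the properties file, not the code.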

A sample use case for Spark SQL with a solution in Scala

Use Case

We are given mobile log files from mobile SDKs, where each line is in the below format:

orgGpackageGmobileIdHnameH1

where:
org is the organization name
package is the Android application package name
mobileId is the mobile unique identifier
name is the application name corresponding to the package

All of the above fields are always in lower case. The application name for any given package is not unique; for example, we may have the following 2 entries for the same package:

airtelGcom.whatsappG1Hwhatsapp messengerH1
airtelGcom.whatsappG2Hwhatsapp indiaH1

Expected Output:

Output_1: org G package G total count for the package
For the above example: airtelGcom.whatsappG3

Output_2: org G package G list of app name aliases
For the above example: airtelGcom.whatsappGwhatsapp messenger, whatsapp india

Below is the sample data, which I'll save as a log.txt file to use:

airtelGcom.whatsappG1Hwhatsapp messengerH1
airtelGcom.whatsappG2Hwhatsapp indiaH1
airtelGcom.ubercabG1Huber indiaH1
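One possible Spark solution for both outputs can be sketched as below. This is an assumption-laden sketch, not the post's own solution: it assumes the per-package total is the sum of the third field (which matches the expected total of 3 for the whatsapp example above), and the file path, app name, and master are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_set, concat_ws, sum}

object MobileLogAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mobile-log-aggregation") // placeholder app name
      .master("local[*]")                // placeholder; configure per environment
      .getOrCreate()
    import spark.implicits._

    // Each line: orgGpackageGmobileIdHnameH1, with literal 'G' and 'H'
    // as delimiters. All data fields are lower case, so uppercase G/H
    // appear only as delimiters and splitting on them is safe.
    val lines = spark.read.textFile("log.txt") // placeholder path

    // fields(0)=org, fields(1)=package, fields(2)=count, fields(3)=name
    val parsed = lines.map { line =>
      val fields = line.split("[GH]")
      (fields(0), fields(1), fields(2).toInt, fields(3))
    }.toDF("org", "pkg", "cnt", "name")

    // Output_1: org G package G total count for the package
    val totals = parsed.groupBy("org", "pkg")
      .agg(sum($"cnt").as("total"))

    // Output_2: org G package G comma-separated list of app name aliases
    val aliases = parsed.groupBy("org", "pkg")
      .agg(concat_ws(", ", collect_set($"name")).as("aliases"))

    totals.show(false)
    aliases.show(false)
    spark.stop()
  }
}
```

`collect_set` deduplicates alias names per package; if duplicates should be kept, `collect_list` could be used instead.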