Interacting with Hive from Spark




My dear friends, today we will learn how to interact with Hive from Spark and run whatever Hive operations we need!


Sample data to use for the Spark-Hive interaction:

File Name : Hive_Test.csv
111,Raghav,Delhi
222,Hemant,Bangalore
333,Anushka,Chennai
444,John,Mumbai
555,Karan,Delhi




If we want to work with Hive using Spark, we can do it with the commands below:
  

First, we need the imports below. Note that import spark.implicits._ and import spark.sql work in spark-shell because it pre-creates a session named spark; in a standalone application you would import them from the session you create yourself.

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import spark.implicits._
import spark.sql
  



Now we create a variable warehouseLocation that points at the Hive warehouse directory:

val warehouseLocation = "spark-warehouse"




  
Next, we run the command below to build a SparkSession with all the required configuration (Hive support enabled and the warehouse location set):

val Hive_with_Spark = SparkSession.builder().
  appName("Spark Hive Example").
  config("spark.sql.warehouse.dir", warehouseLocation).
  enableHiveSupport().
  getOrCreate()
 


Finally, we are ready to work with Hive from Spark using the commands below:


First, I’ll show you in Hive that there is no database named learning yet.



After running the commands below, we will find the database learning, the table spark_table, and whatever else we do from Spark reflected in Hive. Let’s start!


With the code below, I’ll create the database learning, then create the Hive table spark_table, load the data into it from the local file system, and finally display the contents of the table.



sql("create database learning") 

sql("use learning")

sql("CREATE TABLE IF NOT EXISTS spark_table (Emp_No int, Emp_Name string, Emp_City string) row format delimited fields terminated by ','")

sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table")

sql("SELECT * FROM spark_table ").show()
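As a sanity check, the rows spark_table should now hold can be sketched in plain Scala (no Spark needed), parsed straight from the Hive_Test.csv sample shown earlier; the Emp case class is just an illustrative name of mine, not part of the original code:

```scala
// Plain-Scala sketch of the data spark_table should contain after the LOAD,
// parsed from the Hive_Test.csv sample shown earlier.
case class Emp(empNo: Int, empName: String, empCity: String)

val csvLines = Seq(
  "111,Raghav,Delhi",
  "222,Hemant,Bangalore",
  "333,Anushka,Chennai",
  "444,John,Mumbai",
  "555,Karan,Delhi"
)

// Split each comma-delimited line into the three columns of the table.
val rows = csvLines.map { line =>
  val Array(no, name, city) = line.split(",")
  Emp(no.toInt, name, city)
}

rows.foreach(println)
```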




Finally, we got our desired output. Let’s verify it in Hive:





                                                      **********************************

We can achieve the same output with the code below, this time storing each step of the process in a Spark variable. These variables can then be reused with other variables for different operations, or written out to different locations, as per our needs.



Here I’ll show you the same code using Spark variables, with the small differences highlighted; both versions give the same output.



1. Create Spark variables using the Hive_with_Spark session created above.

val create_db = Hive_with_Spark.sql("create database learning1")

val use_db = Hive_with_Spark.sql("use learning1")

val create_tbl = Hive_with_Spark.sql("CREATE TABLE IF NOT EXISTS spark_table1 (Emp_No int, Emp_Name string, Emp_City string) row format delimited fields terminated by ','")

val load_data = Hive_with_Spark.sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table1")

val select_data = Hive_with_Spark.sql("SELECT * FROM spark_table1 ")

val show_data = select_data.show()  // note: show() only prints and returns Unit; the reusable DataFrame is select_data






Let’s verify in Hive that the database learning1 and table spark_table1 were created and the data was loaded:





2. Create Spark variables without using the Hive_with_Spark variable, calling sql directly (via import spark.sql).

val create_db = sql("create database learning2")

val use_db = sql("use learning2")

val create_tbl = sql("CREATE TABLE IF NOT EXISTS spark_table2 (Emp_No int, Emp_Name string, Emp_City string) row format delimited fields terminated by ','")

val load_data = sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table2")

val select_data = sql("SELECT * FROM spark_table2 ")

val show_data = select_data.show()
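To make the reuse point concrete, here is a hedged, self-contained sketch (local mode, no Hive required; the names reuse-demo and delhi_emps are my own illustrations): the value returned by sql(...) or toDF is a DataFrame that can feed further operations, whereas show() returns Unit, so show_data above holds nothing reusable.

```scala
import org.apache.spark.sql.SparkSession

// Local session; Hive support is not needed for this illustration.
val spark = SparkSession.builder()
  .appName("reuse-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Stand-in for select_data: the same sample rows as a DataFrame.
val select_data = Seq(
  (111, "Raghav", "Delhi"),
  (222, "Hemant", "Bangalore"),
  (333, "Anushka", "Chennai"),
  (444, "John", "Mumbai"),
  (555, "Karan", "Delhi")
).toDF("Emp_No", "Emp_Name", "Emp_City")

// Reuse the DataFrame variable for a different operation...
val delhi_emps = select_data.filter($"Emp_City" === "Delhi")
delhi_emps.show()

// ...or store it to a different location as per our needs:
// delhi_emps.write.mode("overwrite").csv("/tmp/delhi_emps")
```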

Comments

  1. Shouldn’t it be df = Hive_with_Spark.sql("sql query")?

  2. We can also communicate directly via sqlContext.sql("sql query"). hive-site.xml needs to be in spark/conf.
