Interacting Spark with Hive

December 24, 2018

My dear friends, today we will learn that in Spark how to interact with hive and do whatever we want!

Sample data to use for Spark-Hive Interaction :

File Name : Hive_Test.csv

111,Raghav,Delhi

222,Hemant,Bangalore

333,Anushka,Chennai

444,John,Mumbai

555,Karan,Delhi

If we want to work with Hive using Spark, then we can do it using below commands :

First we need to import below classes :

import org.apache.spark.sql.Row

import org.apache.spark.sql.SparkSession

import spark.implicits._

import spark.sql

Now we have to create a variable warehouseLocation.

val warehouseLocation = "spark-warehouse"

Next, we need to run below command to get required all configuration :

val Hive_with_Spark = SparkSession.builder().appName("Spark Hive Example").

config("spark.sql.warehouse.dir", warehouseLocation).

enableHiveSupport().getOrCreate()

Fianally, We are ready to work with Hive using Spark as below commands :

Here, first I’ll show you in hive that there is no database named learning.

But after run the below command we will be able to find this database learning along with table spark_table and whatever we did in Hive. Let’s start!

Using below code, I’ll create the database learning, then using next command create the hive table spark_table and load the data into it from local, then see the output of table.

sql("create database learning")

sql("use learning")

sql("CREATE TABLE IF NOT EXISTS spark_table (Emp_No int, Emp_Name string, Emp_City string)row format delimited fields terminated by ','")

sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table")

sql("SELECT * FROM spark_table ").show()

Finally, we got our desired output, let’s verify in hive :

**********************************

Same above output we can achieve using below codes too, where we are using Spark variables to store all this process, which can be use with another variables for different operations or store to any different locations as per our needs.

Here I’ll show you the same above code using Spark variables with little highlighted difference, but both will give the same output.

1. Create Spark Variables using above created variable Hive_with_Spark.

val create_db = Hive_with_Spark.sql("create database learning1")

val use_db = Hive_with_Spark.sql("use learning1")

val create_tbl = Hive_with_Spark.sql("CREATE TABLE IF NOT EXISTS spark_table1 (Emp_No int, Emp_Name string, Emp_City string)row format delimited fields terminated by ','")

val load_data = Hive_with_Spark.sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table1")

val select_data = Hive_with_Spark.sql("SELECT * FROM spark_table1 ")

val show_data = select_data.show()

Let’s verify in hive (database learning1 and table spark_table1 created and loaded data) :

2. Create Spark Variables without using above created variable Hive_with_Spark.

val create_db = sql("create database learning2")

val use_db = sql("use learning2")

val create_tbl = sql("CREATE TABLE IF NOT EXISTS spark_table2 (Emp_No int, Emp_Name string, Emp_City string)row format delimited fields terminated by ','")

val load_data = sql("LOAD DATA LOCAL INPATH '/home/g.aspiredev.001/hive_test.csv' INTO TABLE spark_table2")

val select_data = sql("SELECT * FROM spark_table2 ")

val show_data = select_data.show()

Let’s verify in hive (database learning2 and table spark_table2 created and loaded data) :

Thank you very much to read this post, kindly share it if you like, and if you have any questions or suggestions kindly leave your comment, I’ll be back to you soon.

Thank You!

Comments

UnknownDecember 24, 2018 at 6:13 AM
Should not it be df=Hive_with_Spark.sql("sql query")
ReplyDelete
Replies
UnknownDecember 24, 2018 at 6:15 AM
We can also communicate directly sqlContext.sql(sql query)..

Hive-site.xml needs to be in spark/conf
ReplyDelete
Replies

Add comment

Search This Blog

Big Data!! Simple Solutions!!!

Interacting Spark with Hive

Comments

Post a Comment

Popular posts from this blog

Knowledge about Apache Sqoop and its all basic commands to import and export the Data

Transformations and Actions in Spark

How to Convert a Spark DataFrame String Type Column to Array Type and Split all the json files of this column into rows : Part - 1