How to convert XML String to JSON String in Spark-Scala


Hello Friends,

Today I'd like to share a method for converting XML data to Json data using Scala.
The conversion is done with the org.json library, which we use to encode or decode JSON data.

Input XML :

<Summary_Log>
    <Monitor>
        <HardwareInfo>FD2F</HardwareInfo>
        <Id>0</Id>
        <Monitor_Type>LCD</Monitor_Type>
        <Vendor_Data>110HL</Vendor_Data>
    </Monitor>
    <Processor>
        <HardwareInfo>FD2F</HardwareInfo>
        <Id>1</Id>
        <Used>80</Used>
        <Utilization>19</Utilization>
    </Processor>
</Summary_Log>

Output Json :

{"Summary_Log": {
    "Monitor": {
        "Monitor_Type": "LCD",
        "Vendor_Data": "110HL",
        "HardwareInfo": "FD2F",
        "Id": 0
    },
    "Processor": {
        "Utilization": 19,
        "Used": 80,
        "HardwareInfo": "FD2F",
        "Id": 1
    }
}}
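At its core, the conversion is a single call to org.json's XML.toJSONObject, followed by toString with an indent factor for pretty printing. A minimal sketch, using a shortened XML literal for brevity:

import org.json.XML

// Parse the XML string into a JSONObject, then pretty-print it with an indent factor of 4.
val json = XML.toJSONObject("<Monitor><Id>0</Id><Monitor_Type>LCD</Monitor_Type></Monitor>")
println(json.toString(4))

The full program further below wraps exactly this call in a small reusable method.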



To work with org.json, add the below dependency to your pom.xml file:


<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180813</version>
</dependency>
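If your project uses sbt instead of Maven, the equivalent dependency (same coordinates and version) in build.sbt would be:

// build.sbt - sbt equivalent of the Maven dependency above
libraryDependencies += "org.json" % "json" % "20180813"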


Program to convert XML Data into Json.

import scala.io.Source
import org.json.XML

object XmlToJsonConverter {

  def main(args: Array[String]): Unit = {

    // Get the input XML to convert into Json.

    // 1. Read the XML from a file path
    val xmlFile: String = Source.fromFile("C:/Users/anamika_singh/Desktop/Test.xml").getLines().mkString

    // 2. Or use an XML String directly
    val xmlString = "<Summary_Log>\n<Monitor>\n<HardwareInfo>FD2F</HardwareInfo>\n<Id>0</Id>\n<Monitor_Type>LCD</Monitor_Type>" +
      "\n<Vendor_Data>110HL</Vendor_Data>\n</Monitor><Processor>\n<HardwareInfo>FD2F</HardwareInfo>\n<Id>1</Id>" +
      "\n<Used>80</Used>\n<Utilization>19</Utilization>\n</Processor>\n</Summary_Log>"

    // Method to convert XML to pretty-printed Json
    def xmlToJson(xml: String): String = {
      val PRETTY_PRINT_INDENT_FACTOR = 4
      try {
        // Parse the XML and pretty-print the resulting JSONObject.
        val xmlJSONObj = XML.toJSONObject(xml)
        xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR)
      } catch {
        case ex: Exception =>
          println(ex.toString)
          ""
      }
    }

    // Convert the given XML to Json using the method above and print the output.
    println(xmlToJson(xmlFile))
    println(xmlToJson(xmlString))
  }
}

Output Json :

Running the above program prints the pretty-printed Json shown at the beginning of this post, once for the file input and once for the XML String.
If you instead want the Json as a single-line String, a small modification to the method, removing PRETTY_PRINT_INDENT_FACTOR, provides the expected output as shown below.
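The difference comes down to which toString overload of org.json's JSONObject is called; a minimal contrast, reusing the xmlJSONObj value from the method above:

// Pretty-printed, multi-line Json (indent factor 4):
xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR)

// Compact, single-line Json (no indent factor):
xmlJSONObj.toString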


Modified Program to convert XML Data into Single line Json.

import scala.io.Source
import org.json.XML

object XmlToJsonConverter {

  def main(args: Array[String]): Unit = {

    // Get the input XML to convert into Json.

    // 1. Read the XML from a file path
    val xmlFile: String = Source.fromFile("C:/Users/anamika_singh/Desktop/Test.xml").getLines().mkString

    // 2. Or use an XML String directly
    val xmlString = "<Summary_Log>\n<Monitor>\n<HardwareInfo>FD2F</HardwareInfo>\n<Id>0</Id>\n<Monitor_Type>LCD</Monitor_Type>" +
      "\n<Vendor_Data>110HL</Vendor_Data>\n</Monitor><Processor>\n<HardwareInfo>FD2F</HardwareInfo>\n<Id>1</Id>" +
      "\n<Used>80</Used>\n<Utilization>19</Utilization>\n</Processor>\n</Summary_Log>"

    // Method to convert XML to single-line Json
    def xmlToJson(xml: String): String = {
      try {
        // toString without an indent factor produces a compact, single-line Json String.
        val xmlJSONObj = XML.toJSONObject(xml)
        xmlJSONObj.toString
      } catch {
        case ex: Exception =>
          println(ex.toString)
          ""
      }
    }

    // Convert the given XML to Json using the method above and print the output.
    println(xmlToJson(xmlFile))
    println(xmlToJson(xmlString))
  }
}



Output Json as a Single line Json String :

{"Summary_Log":{"Monitor":{"Monitor_Type":"LCD","Vendor_Data":"110HL","HardwareInfo":"FD2F","Id":0},"Processor":{"Utilization":19,"Used":80,"HardwareInfo":"FD2F","Id":1}}}
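Since this is a Spark-Scala post, the same conversion also drops neatly into a Spark job. Below is a minimal sketch, not part of the original program, assuming a hypothetical local SparkSession and the Spark SQL dependency on the classpath: it maps XML.toJSONObject over a small Dataset of XML strings and reads the resulting single-line Json back with spark.read.json.

import org.apache.spark.sql.SparkSession
import org.json.XML

object XmlToJsonSparkSketch {

  def main(args: Array[String]): Unit = {

    // Hypothetical local SparkSession; adjust appName/master for your environment.
    val spark = SparkSession.builder()
      .appName("XmlToJsonSparkSketch")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A tiny stand-in Dataset of XML strings (in practice this would come from your data source).
    val xmlDs = Seq(
      "<Summary_Log><Monitor><HardwareInfo>FD2F</HardwareInfo><Id>0</Id>" +
        "<Monitor_Type>LCD</Monitor_Type><Vendor_Data>110HL</Vendor_Data></Monitor>" +
        "<Processor><HardwareInfo>FD2F</HardwareInfo><Id>1</Id>" +
        "<Used>80</Used><Utilization>19</Utilization></Processor></Summary_Log>"
    ).toDS()

    // Apply the same org.json conversion to each row, producing single-line Json strings.
    val jsonDs = xmlDs.map(xml => XML.toJSONObject(xml).toString)

    // spark.read.json accepts a Dataset[String] where each element is one Json record.
    val df = spark.read.json(jsonDs)
    df.printSchema()
    df.show(false)

    spark.stop()
  }
}

Because the map runs on the executors, the conversion is distributed, so the same approach scales from one XML record to millions.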
I hope this post was helpful. Please do like, comment and share.

Thank You!
