
Showing posts from November, 2019

Apache Sqoop and all its basic commands to import and export data

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. We can say that Sqoop is a connector between RDBMS and Hadoop: RDBMS to Hadoop (import) or Hadoop to RDBMS (export).

Options while importing data to Hadoop:
- Importing table data from an RDBMS table to HDFS (file system)
- Importing table data from an RDBMS table to a Hive table
- Importing table data from an RDBMS table to HBase

Options while exporting data to RDBMS:
- Exporting data from HDFS (file system) to an RDBMS table
- Exporting data from Hadoop (Hive) to an RDBMS table

List all databases in MySQL:
sqoop-list-databases --connect jdbc:mysql://localhost --username hadoop --password hadoop

List all tables in MySQL:
sqoop-list-tables --connect jdbc:mysql://localhost/hive --username hadoop -P

Note: if we pass -P as a parameter, we can give the password at run time, so the password is not hard-coded, for security reasons. Pass param
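To make the import and export options concrete, here is a minimal sketch of one import and one export command. The database name retail_db, the tables customers and customers_copy, and the HDFS paths are placeholders assumed for this example, not values from the post.

# Import a MySQL table into HDFS (RDBMS -> Hadoop)
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username hadoop -P \
  --table customers \
  --target-dir /user/hadoop/customers \
  --num-mappers 1

# Export the HDFS directory back into a MySQL table (Hadoop -> RDBMS)
sqoop export \
  --connect jdbc:mysql://localhost/retail_db \
  --username hadoop -P \
  --table customers_copy \
  --export-dir /user/hadoop/customers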

How YARN works in an Agile style?

Hadoop: How YARN works in an Agile style?

1. Resource Manager - Project Manager (PM)
2. Application Master - Scrum Master (SM)
3. Node Manager - Technical Resources

1. The PM gets a requirement from the client.
2. The PM consults the bench (NameNode) about resource availability; in our case that is the data block locations.
3. The PM appoints a Scrum Master (Application Master) for every client requirement: the RM initiates an AM container on a Node Manager for every application.
4. The SM knows which resources are required to accomplish the task at the current location: the AM requests container allocation from the RM based on data locality.
5. The PM either accepts the resource request given by the SM or allocates relevant resources: the RM will provide the available resources to the AM.
6. Tasks given to the technical resourc
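This flow can be watched on a real cluster with the YARN command line. The sketch below assumes a standard Hadoop installation; the example jar path and the application id are placeholders, not values from the post.

# Submit any YARN application (here the bundled MapReduce pi example);
# the ResourceManager (the "PM") accepts it and starts an Application Master.
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 4 100

# List the applications the ResourceManager is currently tracking.
yarn application -list

# Inspect container logs once the AM (the "SM") has been granted containers
# on the NodeManagers (the "technical resources").
yarn logs -applicationId application_1574000000000_0001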

All basic commands and use cases of Apache Spark

// spark2-shell --conf spark.ui.port=4040

Create an RDD by parallelizing a collection in memory and collect the output:
val x = sc.parallelize(List("spark rdd example", "sample example"))
val x = sc.parallelize(List("spark rdd example", "sample example"), 4)
x.collect()

Read a file from local: first create a test file in your local path and try the commands below to understand it better:
val textFileLocalTest = sc.textFile("test.txt")
val textBigFileLocalTest = sc.textFile("/Users/Anamika_Singh3/HadoopExamples/file1.txt")
val textFileLocalTest = sc.textFile("file:///home/Anamika_Singh3/test.txt")
textFileLocalTest.first()
val textFileLocalTest = sc.textFile("file:///home/Anamika_Singh3/HadoopExamples/file1.txt")

Read a file from HDFS:
val textFileFromHDFS = sc.textFile("test.txt")
val textFileFromHDFS = sc.textFile("/hdfs_path/test.txt")
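A small use case building on the parallelized RDD above: a word count run in the same spark-shell session. This is a minimal sketch; the variable names words, pairs and counts are just for illustration and are not from the post.

// assumes the RDD created above:
// val x = sc.parallelize(List("spark rdd example", "sample example"))
val words = x.flatMap(line => line.split(" "))   // split each line into words
val pairs = words.map(word => (word, 1))         // pair every word with a count of 1
val counts = pairs.reduceByKey(_ + _)            // sum the counts per word
counts.collect().foreach(println)                // e.g. (example,2), (spark,1), ...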