What are DStream and readStream in Spark Streaming


           



DStream : A Discretized Stream (DStream) is the basic abstraction in Spark Streaming. It is a continuous sequence of RDDs (all of the same type) representing a continuous stream of data; see the RDD documentation (org.apache.spark.rdd.RDD) for more details on RDDs. A DStream can be created either from live input data (for example, from HDFS, Kafka or Flume) or by transforming existing DStreams with operations such as map, window and reduceByKeyAndWindow.
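As an illustration, here is a minimal sketch of a DStream word count in Scala. It assumes a text source on localhost:9999 (a hypothetical host and port, for example one started with `nc -lk 9999`), and the batch, window and slide durations are arbitrary choices for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the socket receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamWordCount")

    // Each 1-second batch of input becomes one RDD in the DStream
    val ssc = new StreamingContext(conf, Seconds(1))

    // DStream created from live data (host and port are assumptions)
    val lines = ssc.socketTextStream("localhost", 9999)

    // New DStreams derived by transforming existing ones
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))

    // Count words over the last 30 seconds, recomputed every 10 seconds
    val windowedCounts =
      pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    windowedCounts.print()

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the job is stopped
  }
}
```

The window and slide durations must be multiples of the batch interval, which is why they are expressed in whole seconds here.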

readStream : readStream is the entry point for reading streaming data in Spark Structured Streaming. Calling it on a SparkSession returns a DataStreamReader, which loads a source as a streaming DataFrame. Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express a streaming computation the same way you would express a batch computation on static data; the Spark SQL engine runs it incrementally and continuously, updating the final result as streaming data continues to arrive. By default, Structured Streaming queries are processed by a micro-batch engine, which treats a data stream as a series of small batch jobs, achieving end-to-end latencies as low as 100 ms with exactly-once fault-tolerance guarantees.
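For comparison, a minimal sketch of the same word count using readStream. The socket source on localhost:9999 is again a hypothetical choice, and the console sink with "complete" output mode is used only to make the running result visible.

```scala
import org.apache.spark.sql.SparkSession

object StructuredWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StructuredWordCount")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // readStream returns a DataStreamReader; load() yields a streaming
    // DataFrame with a single string column named "value"
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost") // assumed host
      .option("port", 9999)        // assumed port
      .load()

    // Written exactly like a batch query on a static Dataset
    val words = lines.as[String].flatMap(_.split(" "))
    val wordCounts = words.groupBy("value").count()

    // Print the full updated counts after every micro-batch
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Unlike the DStream version, no batch interval is set here: the query is declared once and the engine decides how to run it incrementally.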

