What are DStream and readStream in Spark Streaming
DStream : A Discretized Stream (DStream), the basic abstraction in Spark
Streaming, is a continuous sequence of RDDs (of the same type) representing a
continuous stream of data (see spark.RDD for more details on RDDs). A DStream
can either be created from live data (such as data from HDFS, Kafka or Flume)
or be generated by transforming existing DStreams using operations such as
map, window and reduceByKeyAndWindow.
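To make the "sequence of RDDs" idea concrete, here is a minimal plain-Python sketch of it. This is not Spark code: each list stands in for one RDD, and the names discretize, map_batches and reduce_by_key_and_window are hypothetical helpers written for this illustration. It also assumes batches are cut by record count, whereas Spark Streaming cuts them by a time interval.

```python
from collections import deque
from itertools import islice


def discretize(records, batch_size):
    """Cut a stream of records into consecutive batches (stand-ins for RDDs)."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def map_batches(batches, fn):
    """Apply fn to every record of every batch, in the spirit of a map operation."""
    for batch in batches:
        yield [fn(record) for record in batch]


def reduce_by_key_and_window(batches, reduce_fn, window_length):
    """Reduce (key, value) pairs over the last `window_length` batches.

    The window slides forward by one batch each time a new batch arrives.
    """
    window = deque(maxlen=window_length)
    for batch in batches:
        window.append(batch)
        reduced = {}
        for windowed_batch in window:
            for key, value in windowed_batch:
                reduced[key] = reduce_fn(reduced[key], value) if key in reduced else value
        yield reduced


# Assumed toy input: six words, cut into batches of two records each.
words = ["a", "b", "a", "c", "b", "a"]
batches = discretize(words, batch_size=2)
pairs = map_batches(batches, lambda word: (word, 1))
windowed_counts = list(
    reduce_by_key_and_window(pairs, lambda x, y: x + y, window_length=2)
)

for counts in windowed_counts:
    print(counts)
```

Each printed dictionary is the word count over the two most recent batches, which roughly mirrors how a windowed operation on a DStream produces one result per batch interval.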
readStream : readStream is the entry point for reading data in Spark
Structured Streaming: spark.readStream returns a DataStreamReader, which loads
a streaming source as a streaming DataFrame. Structured Streaming is a scalable
and fault-tolerant stream processing engine built on the Spark SQL engine. You
can express your streaming computation the same way you would express a batch
computation on static data. The Spark SQL engine takes care of running it
incrementally and continuously, updating the final result as streaming data
continues to arrive. By default, Structured Streaming queries are processed
using a micro-batch processing engine, which processes data streams as a series
of small batch jobs, thereby achieving end-to-end latencies as low as 100 ms
and exactly-once fault-tolerance guarantees.
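The idea that a batch-style computation can be run incrementally over micro-batches can be sketched in plain Python. This is only a conceptual model, not the Structured Streaming API: batch_word_count and incremental_word_count are hypothetical names, and the running Counter stands in for the state the engine keeps between micro-batches.

```python
from collections import Counter


def batch_word_count(lines):
    """The query as it would be written for static data: count words in all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def incremental_word_count(micro_batches):
    """Run the same query incrementally, touching only the newly arrived lines.

    Yields the updated result after each micro-batch.
    """
    state = Counter()
    for batch in micro_batches:
        state.update(batch_word_count(batch))
        yield dict(state)


# Assumed toy input: two micro-batches of text lines arriving over time.
micro_batches = [
    ["spark streaming"],
    ["spark sql", "streaming engine"],
]

results = list(incremental_word_count(micro_batches))
all_lines = [line for batch in micro_batches for line in batch]

for result in results:
    print(result)

# The last incremental result matches a single batch run over all the data.
print(results[-1] == dict(batch_word_count(all_lines)))
```

The final comparison is the point of the sketch: processing the stream as a series of small batch jobs ends with the same answer as one batch job over everything that has arrived so far.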