Performance Tuning in Apache Spark
Tuning is the process of adjusting settings for the memory, cores, and executor instances used by the system. It ensures that Spark performs optimally and prevents resource bottlenecks. Each property and setting is adjusted so that resources are used correctly for the specific cluster setup. Because Spark is an in-memory computation engine, cluster resources (CPU, memory, etc.) can become bottlenecks. To decrease memory usage, RDDs are sometimes stored in serialized form. Data serialization plays a very important role in good network performance and, together with memory tuning, can also help reduce memory usage. Applied properly, tuning can:

1. Ensure all resources are used in an effective manner.
2. Eliminate jobs that run too long.
3. Improve the execution time of the system.
4. Guarantee that jobs run on the correct execution engine.

Now we can
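As a concrete illustration of adjusting memory, cores, and instances, here is a minimal Scala sketch that sets these properties when building a `SparkSession`. The numbers shown are placeholders, not recommendations; the right values depend on your cluster's node sizes and workload and must be measured.

```scala
import org.apache.spark.sql.SparkSession

// Example values only: tune these per cluster, they are not defaults.
val spark = SparkSession.builder()
  .appName("ResourceTuning")
  .config("spark.executor.memory", "4g")     // heap size per executor
  .config("spark.executor.cores", "2")       // concurrent tasks per executor
  .config("spark.executor.instances", "10")  // number of executors requested
  .getOrCreate()
```

The same properties can also be passed on the command line via `spark-submit --conf`, which keeps resource settings out of application code.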
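To illustrate serialized RDD storage, a hedged Scala sketch follows, assuming a local `SparkSession` named `spark`: it switches the serializer to Kryo (generally faster and more compact than Java serialization) and caches an RDD in serialized form.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// A minimal sketch: enable Kryo serialization for shuffles and caching.
val spark = SparkSession.builder()
  .appName("SerializationTuning")
  .master("local[*]")  // local run for illustration; use your cluster master in practice
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val rdd = spark.sparkContext.parallelize(1 to 1000000)

// MEMORY_ONLY_SER stores each partition as a serialized byte array:
// slower to access (extra deserialization CPU cost), but a much
// smaller memory footprint than the default MEMORY_ONLY level.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)
println(rdd.count())
```

The trade-off is CPU for memory: serialized caching is a good fit when executors are memory-constrained and partitions would otherwise spill or be evicted.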