CI/CD for Data Engineers, Made Easy!
Let's say we are working on a Customer Analysis project and have a Jira ticket, CA-111, assigned to us. As developers, we would create a feature branch named feature/CA-111 and work on it.

As soon as we do a git push and GitHub receives the new code, a pipeline should run with the following steps:
1. Build: Create a virtual environment and install all the dependencies (in the case of Python).
2. Test: Run unit test cases / quality checks.
3. Package: Create a package, which can be as simple as a zip of the code.
4. Deploy: Send the code bundle to an edge node using SCP (Secure Copy).

Doing all of this manually would be time-consuming and error-prone, so these steps should run as an automated pipeline, one after another. We can automate this with an automation server such as Jenkins. That way, any time a new branch is created or a git push happens, the whole build -> test -> package -> deploy sequence runs without any manual involvement.

Now let us understa