From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Data pipeline deployment
- [Instructor] In this video, we will connect what we have learned so far in the course and deploy the data pipeline to GitHub Actions. We will use a deployment method similar to the one we saw in the previous video. Let's start with the workflow's general requirements. We want to pull the repository content using the checkout action, which will enable us to use the data pipeline functions and files from the repository during the runtime of the workflow. In addition, we want to run this workflow every 12 hours. We will set up two versions of the deployment, one for R and a second for Python. Other than that, we will use the exact same settings as before.

Let's review the deployment files of the Python and R versions side by side. As before, both files are in the workflows folder, and they are named data_refresh_py.yml for the Python version and data_refresh_R.yml for the R version. Let's go over the workflow functionality. We set the scheduler to trigger the cron job every 12 hours. Like the previous example, we use the checkout action to pull the course repository, using the main branch as the reference. Last but not least, we load the API key into the workflow's environment variables and call a bash script with the run argument, which triggers the data refresh process. As you can see, the bash script names are the only difference between the R and Python versions. (A sketch of this workflow appears at the end of this transcript.)

Here are the bash scripts that trigger the data pipeline process. Both bash scripts have the same functionality, with one exception: since we cannot deploy two dashboards on GitHub Actions at the same time, we make the dashboard deployment only in the Python version, and this is the difference between the two bash scripts. In the next chapter, we will dive into more details about the dashboard deployment process.

Let's review the functionality of the bash script using the R example. It starts by cleaning the previously rendered files, if they exist. It then renders the Quarto doc; this is where the data pipeline starts to run. Next, it copies the outputs to the docs folder, which enables you to link the data pipeline's rendered outputs to the dashboard if needed. We will dive into more details about the functionality of the docs folder in the next chapter. Last but not least, it commits the changes back to the main repository, as we saw in the previous example. (A sketch of this script follows the workflow sketch below.)

Congratulations! We now have two pipelines running on GitHub Actions. On the left side is the Python version, and on the right side is the R version.
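For reference, here is a minimal sketch of what the Python workflow file (data_refresh_py.yml) might look like, assembled from the steps described above: a cron schedule every 12 hours, the checkout action pinned to the main branch, the API key passed through environment variables, and a run step that calls the bash script. The exact cron expression, secret name (API_KEY), runner image, action versions, and script name (data_refresh_py.sh) are assumptions for illustration and may differ from the course repository.

```yaml
name: Data Refresh (Python)

on:
  schedule:
    # Trigger the cron job every 12 hours (hypothetical schedule)
    - cron: "0 */12 * * *"

permissions:
  # Allow the job to push the refreshed data back to the repository
  contents: write

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout the repository
        uses: actions/checkout@v4
        with:
          # Use the main branch as the reference
          ref: main

      - name: Run the data refresh script
        env:
          # Hypothetical secret name for the API key
          API_KEY: ${{ secrets.API_KEY }}
        run: bash ./data_refresh_py.sh
```

The R version would be identical apart from the name of the bash script it calls, which matches what the transcript describes as the only difference between the two workflow files.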
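And here is a hedged sketch of the bash script for the R version, following the four steps walked through above: clean the previously rendered files, render the Quarto doc, copy the outputs to the docs folder, and commit the changes back. The Quarto file name (data_refresh.qmd), the rendered output names, and the commit identity are hypothetical placeholders, not the course's actual values.

```bash
#!/usr/bin/env bash

# Step 1: clean the previously rendered files, if they exist
# (hypothetical output names)
rm -rf ./data_refresh.html ./data_refresh_files

# Step 2: render the Quarto doc -- this is where the data pipeline runs
quarto render data_refresh.qmd

# Step 3: copy the rendered outputs to the docs folder so the
# dashboard can link to them if needed
cp -R ./data_refresh.html ./docs/
cp -R ./data_refresh_files ./docs/

# Step 4: commit the changes back to the main branch
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git add .
git commit -m "Data refresh" || echo "No changes to commit"
git push origin main
```

The Python version of the script would add the dashboard deployment step on top of this; per the transcript, that is the only difference between the two scripts.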