From the course: Data Pipeline Automation with GitHub Actions Using R and Python

Setting GitHub Actions workflow

- [Instructor] In this video, we will create our first workflow. Let's first define the workflow scope. We want the workflow to simply print "Hello World." We are going to use Ubuntu as our OS, and we will use our course image to set the environment. There are two methods for adding a new workflow. The first is via GitHub. Let's open the browser and go to GitHub. On the main menu, go to Actions, and then, on the left side, click New workflow. As you can see, GitHub offers many templates. Over here, you can see there is a Python version and an R version. We will set our workflow from scratch, so select set up a workflow yourself. This leads to the GitHub editor, where you can start to edit and set your workflow. At the end, you save it by committing the changes back to the repository. You can see that by default GitHub named this workflow main.yml, and this file is under the .github/workflows folder.
Let's not go with this route, and instead see how we can do it in VS Code. So we're now in the course repository. To set a workflow via the local machine, you first need to make sure that you have the .github folder, and below it, the workflows folder. If this is your first workflow, go ahead and create those folders and open the workflows folder. Within here, you can see that I already have some workflows, and we are going to create a new one. So I'm going to create here, under the workflows folder, a new file, and I'm going to name it hello-world, with the extension YAML without the A, yml. Press Enter. And now we have the file over here in the editor and we can start to edit. This is a YAML file, and it's fairly similar to a JSON file. We'll start by defining the workflow name with the name argument. So let's set name, add a colon, and call it Hello World. Next, we want to set the trigger mechanism. Given that we want to set a cron job, we will use the on argument. So on defines what is going to trigger this job.
Beneath this, we will set the scheduler with the schedule argument, and under it we set cron. The cron argument enables us to set the time at which we want to trigger a job. So let's go ahead and define cron, and we put its value inside quotes. By default, we use asterisks to set cron. So let me just write the five asterisks, and I'll explain what each one defines. Each of those five asterisks defines a time unit. From left to right, these are the minute, the hour, the day of the month, the month, and the day of the week. By editing them, you can define when you want to run the job. So for example, say we want to run it hourly. On the second asterisk, which defines the hours, we will add a forward slash and a one. We're just saying we want to run it every one hour. If, for example, you want to run it every 12 hours, you just set it here to 12, and so on. If you want to run it every four hours, it will be four. So let's run it every one hour. And let's say that you want to define the minute in the hour at which it runs. Here you can, for example, set it at the beginning of the hour, so let's set it to zero. If, for example, you want to have it 10 minutes after the hour, you'll set it to 10. Okay? So let's just keep it as zero. I will note here that the free version depends on availability. So if you set it to zero, it will not necessarily start at the beginning of the hour; it depends on the availability of the resources that GitHub offers for the free version.
Okay, so we set the cron job. Let's now go ahead and set the jobs that we want to run. We use the jobs argument, and we can start to set the job. Here we first give the job a name. So let's call the job hello-world, and end it with a colon. The first thing we want to define is the OS, or what type of machine we want to run this job on. So let's use the runs-on argument and define it as Ubuntu, and the version should be 22.04.
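As a rough sketch of the trigger described so far, the top of the file might look like the fragment below. The indentation and the list dash in front of cron follow the usual GitHub Actions layout rather than anything quoted in this transcript, and the commented lines are simply the other schedules mentioned in the narration. GitHub evaluates these schedules in UTC.

```yaml
name: Hello World

on:
  schedule:
    # Fields, left to right: minute, hour, day of month, month, day of week
    - cron: "0 */1 * * *"      # minute 0 of every hour
    # Other schedules mentioned in the narration (assumed spellings):
    # - cron: "10 */1 * * *"   # 10 minutes past every hour
    # - cron: "0 */4 * * *"    # minute 0 of every fourth hour
    # - cron: "0 */12 * * *"   # minute 0 of every twelfth hour
```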
Ubuntu 22.04 is aligned with the version we are using in our Docker container. Then we want to set our container. So we're going to use the container argument and define it with the image argument. Over here I will copy it, 'cause it's too long: rkrispin/data-pipeline-automation-with-github-actions, and the version is prod. Next, we want to set the steps in this job, so I need to indent it back. We have here a single step: we want to print "Hello World." First, we need to name the step. As you can see, for each object here we define a name. This object is a step, so we start by defining the name of the step. We again use name, and we call the step print-hello-world. Next, we want to define the command, so we're going to use the run argument. Once you set the run argument, you are essentially in the command line of this Ubuntu machine. That's the way you should think about it. From here, you can execute scripts. For example, if you want to call a Bash script, you use the bash command and the path of the file. Or if you want to run a Python or R file, for Python, for example, you use python or python3 and the file name. So you get the gist. Since we just want to print "Hello World," I'm going to use the echo command, and I want it to say "Hello World." And that's it. This is how you set a workflow.
So the next step is to save it. Okay? And we want to commit this workflow, 'cause right now it only lives on our local machine. We now want to commit it to our repository on GitHub. Let's go ahead and commit the changes in the terminal. First, we can check that we have a change, so let's do git status. You can see there is a new workflow. So let's now go ahead and add it: git add .github/workflows/, and the file name. Add a commit message, git commit -m, and we call it "adding new workflow to actions." And let's push the change, git push, okay.
So there are probably workflows that are already running, so we first need to merge. Let's do git pull. On the backend, there are some workflows that are already running, and that's why I get this error message. And now let's do git push. Now let's go back to the repository and see if the workflow was committed. Let's refresh first, and you can see there was a commit over here, with the commit message, "adding new workflow to actions." And if you go to Actions, you can see the Hello World. Now it might take some time until it runs, because we set it to run every hour. So from the moment you push it, it will typically run within the next hour. In the next video, we're going to wait until the job runs, and then we'll go ahead and review the logs of the job.
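Putting the pieces from this video together, the hello-world.yml file might look roughly like the sketch below. This is a reconstruction from the narration, not a copy of the instructor's file: the runner label spelling ubuntu-22.04 and the image:tag form are assumptions based on the usual GitHub Actions and Docker conventions, and the script paths in the comments are hypothetical placeholders.

```yaml
# .github/workflows/hello-world.yml
name: Hello World

on:
  schedule:
    - cron: "0 */1 * * *"   # minute 0 of every hour; start time is best effort

jobs:
  hello-world:
    runs-on: ubuntu-22.04   # assumed label for the Ubuntu 22.04 runner
    container:
      # Course image with the prod tag, as named in the narration
      image: rkrispin/data-pipeline-automation-with-github-actions:prod
    steps:
      - name: print-hello-world
        run: echo "Hello World"
        # Hypothetical alternatives for the run line:
        #   run: bash path/to/script.sh
        #   run: python3 path/to/script.py
```

With only a schedule trigger, nothing runs at push time; the first run appears under Actions once the next scheduled slot comes up.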
