From the course: Data Pipeline Automation with GitHub Actions Using R and Python

API limitations with Python - GitHub Tutorial

- [Instructor] In the previous video, we saw how to send a GET request to pull data from the API using the eia_get function. In this video, we'll review the output and explore some of the limitations of the GET request. Let's go back to the data frame we pulled in the previous video, df1. Recall that the data frame has 5,000 rows, or observations, and seven variables. Let's use Plotly to plot the data. We'll use the line function, since this is a time series. We start with the input object, which is df1.data; our X axis is the timestamp, which is period, and our numeric value is the value column. Let's go ahead and run it.

As you can see in this time series plot, there are some odd straight lines that do not fit the series pattern. Let's zoom in and explore them. Some observations are missing, so the plot draws straight lines across the gaps. For example, between December 14th, 2018 and January 13th, 2019, there are no observations at all. If you keep exploring a denser area of the plot, you can see that this pattern recurs: each of these straight lines spans a stretch of missing values.

We get these missing values because of the API's limit of 5,000 observations per GET request. Five years of hourly time series data is more than 40,000 observations (roughly 5 × 8,760 = 43,800), so we cannot pull it with a single request. One way to address this issue is to bound the GET request with a time range using the start and end arguments. This is useful if you want to pull a short period of the series. Let's modify the GET request we used previously and add the start and end arguments to limit the request to observations between January 1st and February 24th, 2024. We'll use datetime to set the start and end as time objects. The start is 2024, January 1st, at the first hour.
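The straight lines in the plot are simply jumps between consecutive timestamps that are more than one hour apart. As a minimal sketch, assuming the period values have already been parsed into Python datetime objects, the gaps in an hourly series can be located without a plot. The helper names find_gaps and missing_hours are hypothetical, not part of the course code, and the series below is toy data, not real API output.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)  # expected spacing of an hourly series


def find_gaps(timestamps, step=ONE_HOUR):
    """Return (last_seen, next_seen) pairs for consecutive timestamps
    that are more than `step` apart, i.e. where observations are missing."""
    ordered = sorted(timestamps)
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > step
    ]


def missing_hours(gaps, step=ONE_HOUR):
    """Count how many hourly observations fall inside the gaps."""
    # A jump of k steps between two observations means k - 1 are missing.
    return sum(int((curr - prev) / step) - 1 for prev, curr in gaps)


# Toy data (hypothetical): four hours on Dec 14, 2018, then nothing
# until Jan 13, 2019, mimicking the gap seen in the plot.
series = [datetime(2018, 12, 14, h) for h in range(4)] + [
    datetime(2019, 1, 13, h) for h in range(3)
]

gaps = find_gaps(series)
for prev, curr in gaps:
    print(f"gap from {prev} to {curr}")
print(missing_hours(gaps))  # 716
```

Each reported gap corresponds to one of the straight segments Plotly draws between the last observation before the gap and the first one after it.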
For the end time, we want February 24th: the year is 2024, the month is 2 for February, the day is 24, and we'll select the last hour, 23. This is the exact same request as before; we just added the start and end arguments. Let's execute the request and replot the data. Now the data looks fairly normal, which makes sense: this window covers at most 55 days of hourly data, or 1,320 observations, well under the 5,000 limit. We'll dive into more detail about monitoring the data output and identifying missing values in the next chapters. While you could use the eia_get function to pull a large dataset by manually looping over time ranges of the series, doing that by hand is tedious. This is where the eia_backfill function comes in, enabling us to pull large datasets beyond the API limitation. In the next video, we'll pull the series again, this time using the eia_backfill function.
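Looping over time ranges is the core idea behind a backfill. The sketch below only splits a long hourly range into consecutive windows of at most 5,000 hours; it is an assumption about how such a loop could be organized, not the implementation of the course's backfill function, and hourly_windows is a hypothetical helper. Each (start, end) pair could then be passed as the start and end arguments of one GET request.

```python
from datetime import datetime, timedelta

API_ROW_LIMIT = 5000  # per-request row cap described in the video
ONE_HOUR = timedelta(hours=1)


def hourly_windows(start, end, max_rows=API_ROW_LIMIT):
    """Split the inclusive hourly range [start, end] into consecutive
    (window_start, window_end) pairs, each covering at most max_rows hours."""
    windows = []
    window_start = start
    while window_start <= end:
        # Last hour this window can include while staying within max_rows rows.
        window_end = min(window_start + (max_rows - 1) * ONE_HOUR, end)
        windows.append((window_start, window_end))
        # The next window starts one hour later, so no hour is skipped or repeated.
        window_start = window_end + ONE_HOUR
    return windows


# Five years of hourly data: 2019 through 2023 (hypothetical range).
windows = hourly_windows(datetime(2019, 1, 1, 0), datetime(2023, 12, 31, 23))
print(len(windows))  # 9

for window_start, window_end in windows:
    # In a real pipeline, one GET request per window would go here.
    print(window_start, "->", window_end)
```

For 2019 through 2023 the range holds 43,824 hourly observations (including the 2020 leap day), so it needs nine requests, the last covering 3,824 hours. A window like the one in the video, January 1st to February 24th, 2024, fits in a single request.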

Contents