From the course: Data Pipeline Automation with GitHub Actions Using R and Python
API limitations with Python - GitHub Tutorial
- [Instructor] In the previous video, we saw how to send a GET request to pull data from the API using the eia_get function. In this video, we'll review the output and explore some of the limitations of the GET request. Let's go back to the data frame we pulled in the previous video, df1. Recall that the data frame has 5,000 rows, or observations, and seven variables. Let's use Plotly to plot the data. We're going to use the line function, since this is a time series. The input object is df1.data, our x-axis is the timestamp, which is the period column, and our numeric value is the value column. Let's go ahead and run it.

As you can see in this time series plot, there are some odd straight lines that do not fit the series pattern. Let's zoom in and explore. Some observations are missing, so the plot draws straight lines across the gaps. For example, between December 14, 2018 and January 13, 2019, there are no observations at all. If you keep exploring a denser area of the plot, you can see the same pattern recur: each of these straight lines spans a stretch of missing values.

We get those missing values because of the API's limit of 5,000 observations per GET request. Five years of hourly time series data is more than 40,000 observations (roughly 5 × 8,760 = 43,800), so we cannot pull it with a single request. One way to address this issue is to bound the GET request to a time range using the start and end arguments. This is useful if you want to pull a short period of the series. Let's modify the GET request we used previously and add the start and end arguments to limit the request to observations between January 1 and February 24, 2024. We'll use datetime to set the start and end as datetime objects. The start is 2024, January 1, hour 1.
For the end time, we want February 24: the year is 2024, the month is 2 for February, the day is 24, and we'll select the last hour, 23. Notice that this is the exact same request as before; we just added the start and end arguments. Let's execute the request and plot the output again. Now the data looks fairly normal. We'll dive into more detail about monitoring the data output and identifying missing values in the next chapters.

While you could use the eia_get function to pull a large dataset by manually looping over time ranges of the series, doing so would be very tedious. This is where the eia_backfill function comes into play, enabling us to pull large datasets beyond the API limitation. In the next video, we'll pull the series again, this time using the eia_backfill function.
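The straight lines in the plot appear because a line chart connects each observation to the next one, even when the hours in between are missing. A minimal sketch of how those gaps could be located programmatically, assuming the period values have already been parsed into sorted, timezone-naive datetime objects. The find_gaps helper and the sample timestamps are hypothetical illustrations, not part of the course code:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return (last_seen, next_seen, missing_count) for each gap in a sorted hourly series."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > step:
            # Number of expected hourly observations between prev and curr that are absent
            missing = delta // step - 1
            gaps.append((prev, curr, missing))
    return gaps


# Hypothetical sample: three consecutive hours, then a jump that skips two hours
periods = [
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 1, 1),
    datetime(2024, 1, 1, 2),
    datetime(2024, 1, 1, 5),
]

for last_seen, next_seen, missing in find_gaps(periods):
    print(f"{missing} missing hours between {last_seen} and {next_seen}")
# prints: 2 missing hours between 2024-01-01 02:00:00 and 2024-01-01 05:00:00
```

On the real output, you would first convert the period column of df1.data into datetime objects and sort it before running a check like this.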
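The manual looping mentioned above amounts to splitting a long time range into windows that each fit under the 5,000-row limit. A minimal sketch, assuming hourly data and inclusive start and end bounds; hours_in_range and split_into_windows are hypothetical helpers, not part of the course code, the eia_backfill function, or the EIA API. Each window would then be passed as the start and end arguments of a separate eia_get call, whose exact signature is not reproduced here:

```python
from datetime import datetime, timedelta

API_ROW_LIMIT = 5000  # per-request limit described in the video


def hours_in_range(start, end):
    """Inclusive count of hourly observations between start and end."""
    return int((end - start) / timedelta(hours=1)) + 1


def split_into_windows(start, end, max_rows=API_ROW_LIMIT):
    """Split [start, end] into consecutive hourly windows of at most max_rows observations."""
    windows = []
    window_start = start
    # Window ends are inclusive, so max_rows observations span max_rows - 1 hours
    step = timedelta(hours=max_rows - 1)
    while window_start <= end:
        window_end = min(window_start + step, end)
        windows.append((window_start, window_end))
        # Next window starts one hour after the previous one ends, leaving no gap
        window_start = window_end + timedelta(hours=1)
    return windows


# The bounded request from the video: January 1, 2024 (hour 1) to February 24, 2024 (hour 23)
start = datetime(2024, 1, 1, 1)
end = datetime(2024, 2, 24, 23)
print(hours_in_range(start, end))  # 1319, well under the 5,000-row limit

# Five calendar years of hourly data (43,824 hours) needs several bounded requests
windows = split_into_windows(datetime(2019, 1, 1, 0), datetime(2023, 12, 31, 23))
print(len(windows))  # 9
```

Keeping the windows contiguous and non-overlapping means the pieces can be concatenated without duplicate or missing hours, which is the bookkeeping that a dedicated backfill function saves you from doing by hand.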