From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Handling a large data request with Python
- [Instructor] In the previous video, we saw the limitation of the eia_get function when pulling a large dataset, due to the API's row limit per GET request. In this video, we'll see how to handle a large data request from the API using the eia_backfill function. The eia_backfill function splits a large data request into a sequence of small requests, sends those requests to the API using the eia_get function on the backend, and then appends the outputs into a single table. The function uses the same arguments as the eia_get function. Let's re-pull the series, this time using the backfill function. We'll set the start argument to July 1st, 2018, and the end argument to February 24th, 2024. We'll again use the datetime module to reformat these values as datetime objects. The start is July 1st, 2018, with the first data point at eight o'clock in the morning. For the end, we'll set February 24th, 2024, as before, at the end of the day. Next is the offset argument. The offset argument enables us to control the size of the sequential requests the function sends to the API. For example, if we are pulling a series with 10,000 observations and we set the offset to 1,000, the function will generate 10 sequential requests, each of a size of 1,000 observations. While you can set the offset as high as 5,000 observations, it is recommended not to go beyond 2,500. Let's set the offset variable to 2,250. Now we can resend the request, this time naming the output df3. It might take a few seconds to complete, as we are pulling close to 50,000 observations. And it's done; it took close to 20 seconds. Let's now check the data structure. As you can see, it follows the same structure as the eia_get output, and the table has close to 50,000 observations. Now we can re-plot the data using the same Plotly function, and you can see that the series now looks much better than before. You can still notice that some observations are missing, but those are not available on the API either.
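Putting the steps together, a minimal sketch of this workflow might look like the following. It assumes the eia_backfill helper from the course repository (imported here from a hypothetical eia_api module), that it returns a pandas DataFrame with period and value columns like eia_get, and that the API route and facet values shown are illustrative placeholders rather than the exact series used in the video.

```python
import datetime
import os

import plotly.express as px

from eia_api import eia_backfill  # hypothetical import path for the course helper

# Start: July 1st, 2018; the first data point is at 8:00 AM
start = datetime.datetime(2018, 7, 1, 8)
# End: February 24th, 2024, at the end of the day
end = datetime.datetime(2024, 2, 24, 23)

# Offset: size of each sequential request the function sends to the API
# (the API caps a single request at 5,000 rows; staying at or below
# 2,500 is recommended)
offset = 2250

df3 = eia_backfill(
    api_key=os.getenv("EIA_API_KEY"),              # key stored as an env variable
    api_path="electricity/rto/region-data/data/",  # illustrative API route
    facets={"respondent": "US48", "type": "D"},    # illustrative facets
    start=start,
    end=end,
    offset=offset,
)

# Same structure as the eia_get output; expect close to 50,000 rows
print(df3.shape)
print(df3.head())

# Re-plot the series with the same Plotly call as before
px.line(df3, x="period", y="value").show()
```

With an offset of 2,250 and roughly 50,000 observations, the function would issue a little over twenty sequential requests behind the scenes, which is why the pull takes on the order of 20 seconds.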