From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Handling a large data request with Python
- [Instructor] In the previous video, we saw the limitation of the eia_get function when pulling a large dataset, due to the API's row limit per GET request. In this video, we'll see how to handle a large data request from the API using the eia_backfill function. The eia_backfill function splits a large data request into a sequence of small requests, sends those requests to the API using the eia_get function on the backend, and then appends the outputs into a single table. The function uses the same arguments as the eia_get function. Let's re-pull the series, this time using the backfill function. We'll set the start argument to July 1st, 2018, and the end argument to February 24th, 2024. We'll again use the datetime module to reformat these values as datetime objects. The start is July 1st, 2018, with the first data point at eight o'clock in the morning. For the end, we'll set February 24th, 2024, as before, at the end of the day. Next is the offset argument. The offset argument enables us to control the size of the sequential requests the function sends to the API. For example, if we are pulling a series with 10,000 observations and we set the offset to 1,000, the function will generate 10 sequential requests, each of a size of 1,000 observations. While you can set the offset as high as 5,000 observations, it is recommended not to go beyond 2,500. Let's set the offset variable to 2,250. Now we can resend the request, this time naming the output df3. It might take a few seconds to complete, as we are pulling close to 50,000 observations. And it's done; it took close to 20 seconds. Let's now check the data structure. As you can see, it follows the same structure as the eia_get output, and the table has close to 50,000 observations. Now we can re-plot the data using the same Plotly function, and you can see that the series now looks much better than before. You can still notice that some observations are missing, but those are not available on the API either.
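Putting the steps together, a minimal sketch of this workflow might look like the following. It assumes the eia_backfill helper from the course repository (imported here from a hypothetical eia_api module), that it returns a pandas DataFrame with period and value columns like eia_get, and that the API route and facet values shown are illustrative placeholders rather than the exact series used in the video.

```python
import datetime
import os

import plotly.express as px

from eia_api import eia_backfill  # hypothetical import path for the course helper

# Start: July 1st, 2018; the first data point is at 8:00 AM
start = datetime.datetime(2018, 7, 1, 8)
# End: February 24th, 2024, at the end of the day
end = datetime.datetime(2024, 2, 24, 23)

# Offset: size of each sequential request the function sends to the API
# (the API caps a single request at 5,000 rows; staying at or below
# 2,500 is recommended)
offset = 2250

df3 = eia_backfill(
    api_key=os.getenv("EIA_API_KEY"),              # key stored as an env variable
    api_path="electricity/rto/region-data/data/",  # illustrative API route
    facets={"respondent": "US48", "type": "D"},    # illustrative facets
    start=start,
    end=end,
    offset=offset,
)

# Same structure as the eia_get output; expect close to 50,000 rows
print(df3.shape)
print(df3.head())

# Re-plot the series with the same Plotly call as before
px.line(df3, x="period", y="value").show()
```

With an offset of 2,250 and roughly 50,000 observations, the function would issue a little over twenty sequential requests behind the scenes, which is why the pull takes on the order of 20 seconds.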