From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Solution: Query the API with Python - GitHub Tutorial
(upbeat music) - [Instructor] We are going to use this Jupyter notebook for the solution of the chapter one challenge. This notebook, Py_challenge_solutions, can be found in the course repository under the chapter one folder. In addition, we're going to use the EIA_API Python script as before. This Python script provides a set of functions that will enable us to query data from the API. Let's get started by loading the libraries, and we're going to use the same libraries as before: EIA_API, which we set as API. The OS library we're going to use to load environment variables. The datetime library we'll use to reformat date and time objects. And last is Plotly, which we're going to use to visualize the data.

Let's go to the first question. We were asked to extract from the EIA dashboard the metadata of the San Diego Gas and Electric balancing authority sub region. Let's go to the eia.gov website and navigate to the dashboard. So on the main page of eia.gov, scroll down to the feature section, click on the API icon, and then click Browse the API, which will lead you to the API dashboard. Let's start by setting the route. As the main category is electricity, let's go ahead and select electricity. Next we want to select the subcategory. We're interested in hourly data, so let's go ahead and select electric power operations, daily and hourly. And the last subcategory is the sub region: hourly demand by sub region.

Once the page finishes loading the underlying information, we can start to filter by using the facets. The facets enable us to select a specific series under this route. Let's go ahead and expand them. There are two ways we can go about filtering. The first is to go directly to the sub region facet. As you can see, there are 83 sub regions overall, and we filter for the one that we are interested in. In this case, we are interested in San Diego. So let's search for San Diego, and you see it pop up over here, and select it. Then click save selection and submit.
You can see in the output that the facet here is just specifying the subba, and in this case this is SDGE, which stands for San Diego Gas and Electric. This is sufficient, but let's see the other method you can use to filter. Let's remove the facet and start with the balancing authority. Given that San Diego is under the California Independent System Operator, let's select CISO and save the selection. Next, let's select the sub region. As you can see, the list is now narrowed down to the four sub regions under the parent balancing authority. Here we can go ahead and select San Diego. Don't forget to save the selection. And let's resubmit.

As you can see, the API returned the metadata and the query information. On the left side, in the URL, you can see the GET request. We are interested here in the API path, which indicates which route we want to select in the API. In this case, this is the electricity route, so it's electricity/rto/region-sub-ba-data. And at the end we need to add data to indicate to the API that we are interested in data and not metadata. From there we can extract some information, such as the frequency. We are again going to use hourly. For the facets, we are going to use CISO as the parent, and for the subba we're going to use SDGE.

Let's go back to VS Code and set up the GET request. We're going to define the following six arguments, assign them to variables, and then pass those variables to the function's arguments. Like before, we are going to use the getenv function from the OS library to load the API key. If you didn't set the environment variable, you can set the key here directly and load it from here. Next we're going to set the API path. As we saw before, this is the path. Don't forget to add data at the end. The frequency we are going to set to hourly. If you remember from the previous videos, hourly stands for setting the timestamp in the UTC time zone. And then the facets argument.
We are going to set this list, where the parent is set to CISO and the sub region, or subba, is set to SDGE. We then want to bound the request between January 1st and January 31st, 2024. So we're going to use the start and end arguments to set the time. We will use the datetime function to convert the input into a time object. Once it's set, we can run it and send the request. We can go ahead and check the output, the F1.data. And as you can see, the data frame is as expected, and this time the subba is set to SDGE, or San Diego Gas and Electric. You can also see that the first data point of the series is January 1st at 1:00 AM and the last is January 31st at 11 at night. Next we want to visualize the data. Let's use the plotly function again to visualize the data. And as you can see, the series looks great, as expected.

Let's go to the next question. In question three, we were asked to use the backfill function to pull data between January 1st, 2020 and February 1st, 2024. Given this is a large pull with more than 5,000 observations, we're going to use the backfill function. We need to set the start and end according to this time range. So we're going to set the start, again with datetime, to January 1st, 2020, and the end to February 1st, 2024. We're then going to set the offset to 2,000 observations per request, and then pass all of them to the backfill function. Let's go ahead and execute the code. This might take a few seconds, as we are now going to pull about 35,000 observations. It took 12 seconds to pull the data. Let's go ahead and check the output. As you can see, we received a data frame with the same structure, where now the starting point is January 1st, 2020, and the endpoint is February 1st at 23:00, or 11 at night. And as you can see, this data frame has more than 35,000 observations. Last but not least, let's plot the data. And as you can see, the series aligns with our expectation. There don't seem to be any missing values or anything suspicious.
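
The two requests described above can be sketched with the standard library alone. This is not the course's EIA_API script, whose function names and signatures are not shown in this transcript. The helpers `build_url` and `split_range`, the environment variable name `EIA_API_KEY`, and the query-string layout (`data[0]`, `facets[...][]`) are assumptions made for illustration. The sketch only builds the request URL for the January 2024 query and shows how a backfill could split the 2020 to 2024 range into windows of at most 2,000 hourly observations. It does not send any request.

```python
import os
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Assumed base URL of the EIA API (version 2).
BASE_URL = "https://api.eia.gov/v2/"


def build_url(api_key, api_path, frequency, facets, start, end):
    """Build a GET request URL (hypothetical helper, not the course's function)."""
    params = [
        ("api_key", api_key),
        ("frequency", frequency),
        ("data[0]", "value"),  # assumed name of the requested data column
    ]
    # One facets[<name>][] parameter per facet, e.g. parent=CISO, subba=SDGE.
    for name, value in facets.items():
        params.append((f"facets[{name}][]", value))
    # Hourly timestamps formatted as YYYY-MM-DDTHH.
    params.append(("start", start.strftime("%Y-%m-%dT%H")))
    params.append(("end", end.strftime("%Y-%m-%dT%H")))
    return BASE_URL + api_path + "?" + urlencode(params)


def split_range(start, end, offset):
    """Split [start, end] into windows of at most `offset` hourly points."""
    windows = []
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(hours=offset - 1), end)
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(hours=1)
    return windows


# Load the key from the environment; the variable name is an assumption.
api_key = os.getenv("EIA_API_KEY", "YOUR_API_KEY")
api_path = "electricity/rto/region-sub-ba-data/data/"
facets = {"parent": "CISO", "subba": "SDGE"}

# Single request: January 2024.
url = build_url(
    api_key, api_path, "hourly", facets,
    start=datetime(2024, 1, 1, 0), end=datetime(2024, 1, 31, 23),
)

# Backfill: one window per request, each below the 5,000-observation limit.
windows = split_range(datetime(2020, 1, 1), datetime(2024, 2, 1), offset=2000)
```

Each `(start, end)` pair in `windows` could then be passed to `build_url` and the responses appended into a single data frame, which is the general idea behind a backfill with an offset of 2,000.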
Contents
- EIA API (2m 47s)
- Setting an environment variable (3m 22s)
- The EIA API dashboard (4m 10s)
- GET request structure (5m 41s)
- Querying the data via the browser (4m 4s)
- Querying data with R and Python (2m 50s)
- Pulling metadata from API with R (3m 5s)
- Sending a simple GET request with R (5m 19s)
- API limitations with R (4m 43s)
- Handling a large data request with R (4m 27s)
- Pulling metadata from API with Python (3m 47s)
- Sending a simple GET request with Python (4m 44s)
- API limitations with Python (3m 54s)
- Handling a large data request with Python (3m 10s)
- Challenge: Query the API (1m 2s)
- Solution: Query the API with R (7m 28s)
- Solution: Query the API with Python (7m 45s)