Using R and Python to get forest resource data through the EVALIDator API

If you work with forestry data across the United States, you’ve probably had to turn to Forest Inventory and Analysis (FIA) data at one time or another to look at the status and trends in our forest resources. The FIA database has information contained from over 130,000 forest inventory plots across the country.

One of the tools that analysts often use to access FIA data is the the EVALIDator web interface. EVALIDator is a web-based Application Programming Interface (API) that generates population estimates for core forestry metrics. It provides the ability to query FIA information at the state level or a circular retrieval of information around a fixed geographic point.

There was an excellent webinar offered in August hosted by the FIA National User Group that highlighted some new features available through EVALIDator. For a long while, analysts have used it’s web interface to point and click their way to estimating forest resources after selecting the forestry metrics and geographic scale they’re interested in. As a long-time EVALIDator user, this was a great way to access FIA data. But dealing with the output in an HTML format and getting it into different software to analyze was always an additional step.

I was happy to learn in the webinar about new ways to directly access FIA estimates through R and Python using the EVALIDator API. For me, this can save time and effort given most of my analyses, graphing, and reporting are done through R.

All of the documentation for these steps can be found in the FIADB-API & EVALIDator user documentation. I’ll go through a few examples where I access data from the state of Maine.

Using R to access FIA data at the state level

This first example will access the total amount of carbon stored in the live aboveground portions of trees growing on forestland in Maine, measured in metric tonnes. I’ll ask for estimates to be provided by forest type group and ownership.

First, I’ll use R and will load a few packages to access the data:

library(tidyverse)
library(httr)
library(jsonlite)
library(rlist)
library(knitr)

If you look at the bottom of the documentation webpage, you’ll see example use cases for extracting FIA estimates using R and Python. There are two examples for each that use GET and POST scripts. The GET scripts allow you to enter a complete URL if you know the attributes you’re interested in. The POST script, which I copy here with the fiadb_api_POST() function, allows you to directly input the variables you’re interested in in R:

# fiadb_api_POST() will accept a FIADB-API full report URL and return data frames
# See descriptor: https://apps.fs.usda.gov/fiadb-api/

fiadb_api_POST <- function(argList){
  # make request
  resp <- POST(url = "https://apps.fs.usda.gov/fiadb-api/fullreport", 
               body = argList, 
               encode = "form")
  # parse response from JSON to R list
  respObj <- content(resp, "parsed", encoding = "ISO-8859-1")
  # create empty output list
  outputList <- list()
  # add estimates data frame to output list
  outputList[['estimates']] <- as.data.frame(do.call(rbind,respObj$estimates))

  # if estimate includes totals and subtotals, add those data frames to output list
  if ('subtotals' %in% names(respObj)){
    subtotals <- list()
    # one subtotal data frame for each grouping variable
    for (i in names(respObj$subtotals)){
      subtotals[[i]] <- as.data.frame(do.call(rbind,respObj$subtotals[[i]]))
    }
    outputList[['subtotals']] <- subtotals

    # totals data frame
    outputList[['totals']] <- as.data.frame(do.call(rbind,respObj$totals))
  }

  # add estimate metadata
  outputList[['metadata']] <- respObj$metadata

  return(outputList)
}

The first item to know before accessing FIA data is the numeric code corresponding to the variable you’re interested in. In my case, this is snum = 98, corresponding to the variable that represents “Forest carbon pool 1: live aboveground, in metric tonnes, on forest land.” A note of caution: there are thousands of variables to choose from, but I suspect the most popular ones are listed toward the top of the page.

The second item to know is which wc code you’re interested in. I have no idea why it’s abbreviated “wc”, but it connects the state FIPS code with the 4-digit FIA inventory year. For example, I’m interested in wc = 232021 to obtain data from Maine’s (FIPS code 23) most recent inventory, collected in 2021. You could go back to inventories collected decades ago if you were interested in looking at trends, and those codes are described here.

Finally, you can use the rselected and cselected statements to identify the variables you’d like to group by in rows and columns. In our case this is forest type group and ownership group. I’ll obtain the data in an NJSON format, but you can also obtain the data in HTML, XML, or other formats. You can specify these parameters in arg_list and then store the data in a data frame called all_rows. I love how the data are presented in a long and “tidy” format:

arg_list_maine <- list(snum = 98,
               wc = 232021,
               rselected = 'Forest type group', 
               cselected = 'Ownership group - Major',
               outputFormat = 'NJSON')

# submit list to POST request function
post_data_maine <- fiadb_api_POST(arg_list_maine)

# estimate data frame
all_rows_maine <- post_data_maine[['estimates']]

print(all_rows_maine)

    ESTIMATE                                    GRP1          GRP2 PLOT_COUNT
1    3904898 `0001 White \\/ red \\/ jack pine group  `0001 Public         22
2   34789659 `0001 White \\/ red \\/ jack pine group `0002 Private        248
3   14310434              `0002 Spruce \\/ fir group  `0001 Public        110
4  105397247              `0002 Spruce \\/ fir group `0002 Private       1022
5   62478.28 `0005 Loblolly \\/ shortleaf pine group `0002 Private          1
6   200517.1            `0018 Exotic softwoods group  `0001 Public          1
7     531124            `0018 Exotic softwoods group `0002 Private          5
8   823381.8                `0020 Oak \\/ pine group  `0001 Public          5
9   10607119                `0020 Oak \\/ pine group `0002 Private         73
10  992844.6             `0021 Oak \\/ hickory group  `0001 Public          8
11  11742364             `0021 Oak \\/ hickory group `0002 Private         85
12  409060.3  `0023 Elm \\/ ash \\/ cottonwood group  `0001 Public          7
13   6009873  `0023 Elm \\/ ash \\/ cottonwood group `0002 Private         85
14  14742425   `0024 Maple \\/ beech \\/ birch group  `0001 Public        111
15 136482887   `0024 Maple \\/ beech \\/ birch group `0002 Private       1244
16   3260952             `0025 Aspen \\/ birch group  `0001 Public         28
17  29091440             `0025 Aspen \\/ birch group `0002 Private        330
18  5415.465             `0030 Other hardwoods group  `0001 Public          2
19  734850.1             `0030 Other hardwoods group `0002 Private         26
20  2668.444                        `0034 Nonstocked  `0001 Public          1
21  56632.09                        `0034 Nonstocked `0002 Private         10
         SE SE_PERCENT     VARIANCE
1  989847.4   25.34887 979797935531
2   2438499   7.009263 5.946276e+12
3   1346913   9.412105 1.814175e+12
4   3093327   2.934922 9.568672e+12
5  69535.69   111.2958   4835212844
6  197339.8   98.41544   3.8943e+10
7  322121.4     60.649 103762218840
8  383501.6    46.5764 147073471553
9   1405489   13.25043 1.975398e+12
10 376671.9   37.93866 141881751049
11  1392492    11.8587 1.939034e+12
12 210167.8    51.3782  44170517675
13 792626.1   13.18873 628256055243
14  1316485   8.929905 1.733132e+12
15  3598018   2.636241 1.294574e+13
16   719201   22.05494 517250138459
17  1931424   6.639148 3.730397e+12
18 4321.885   79.80635     18678690
19 191250.2   26.02575  36576657832
20 2380.922    89.2251      5668789
21 31032.09   54.79594    962990504

This makes the output easy to plot immediately. Here’s a graph of the output, where you can see the vast amount of carbon stored on private land in Maine, mostly in the spruce/fir and maple/beech/birch forest type groups:

ggplot(all_rows_maine, aes(x = as.character(GRP2), 
                           y = as.numeric(ESTIMATE))) +
  geom_bar(stat = "identity") +
  facet_wrap(~as.character(GRP1)) +
  labs(x = "Ownership",
       y = "Forest carbon (metric tonnes)")

Note you’ll need to play with the names of variables a bit to tidy them up, e.g., turning “`0001 Public” to simply “Public”. But the R functions allow you to quickly obtain the data of interest. You can also grab the subtotals of the output to sum all values within each forest type or ownership category:

subtotals_maine <- post_data_maine[['subtotals']]

print(subtotals_maine)

$GRP1
    ESTIMATE                                    GRP1 PLOT_COUNT       SE
1   38694556 `0001 White \\/ red \\/ jack pine group        269  2618535
2  119707681              `0002 Spruce \\/ fir group       1127  3344993
3   62478.28 `0005 Loblolly \\/ shortleaf pine group          1 69535.69
4   731641.2            `0018 Exotic softwoods group          6 377174.6
5   11430501                `0020 Oak \\/ pine group         78  1452201
6   12735208             `0021 Oak \\/ hickory group         92  1432097
7    6418933  `0023 Elm \\/ ash \\/ cottonwood group         92   815680
8  151225312   `0024 Maple \\/ beech \\/ birch group       1348  3791540
9   32352393             `0025 Aspen \\/ birch group        357  2057692
10  740265.6             `0030 Other hardwoods group         28 191299.1
11  59300.54                        `0034 Nonstocked         11 31123.29
   SE_PERCENT     VARIANCE
1    6.767193 6.856727e+12
2    2.794301 1.118898e+13
3    111.2958   4835212844
4    51.55185 142260655272
5    12.70461 2.108887e+12
6    11.24518 2.050902e+12
7    12.70741 665333890414
8    2.507213 1.437578e+13
9    6.360247 4.234097e+12
10   25.84195  36595336521
11     52.484    968659293

$GRP2
   ESTIMATE          GRP2 PLOT_COUNT      SE SE_PERCENT     VARIANCE
1  38652597  `0001 Public        274 1659512   4.293404  2.75398e+12
2 335505672 `0002 Private       2891 3373908   1.005619 1.138325e+13

Finally, you can grab the total number that sums all values. In this case, we learn that there’s about 374 million metric tonnes of carbon stored in the live aboveground portions of trees in Maine forests:

totals_maine <- post_data_maine[['totals']]
print(totals_maine)

   ESTIMATE PLOT_COUNT      SE SE_PERCENT     VARIANCE
1 374158269       3143 3430416  0.9168357 1.176776e+13

Using R to access FIA data around a fixed geographic point

A favorite use of EVALIDator by many is the ability to query FIA data around a specific location. For example, a user can generate population estimates for a 50-mile radius around a proposed mill that uses wood.

Here’s an example that queries FIA data in a 25-mile radius surrounding Bangor, Maine. It estimates the total forestland area (snum = 2) by stand age class in 20-year increments and stand size class (large-, medium-, or small-diameter trees). The latitude and longitude coordinates are specified in the function:

arg_list_bangor <- list(lat = 44.809,
                        lon = -68.771, 
                        radius = 25,
                        wc = 232021,
                        snum = 2,
                        rselected = 'Stand-size class', 
                        cselected = 'Stand age 20 yr classes (0 to 100 plus)',
                        outputFormat = 'NJSON')

# submit list to POST request function
post_data_bangor <- fiadb_api_POST(arg_list_bangor)

# estimate data frame
all_rows_bangor <- post_data_bangor[['estimates']]

print(all_rows_bangor)

   ESTIMATE                  GRP1               GRP2 PLOT_COUNT       SE
1   14155.6  `0001 Large diameter  `0002 21-40 years          4 7338.611
2  45473.05  `0001 Large diameter  `0003 41-60 years         10    15372
3  90436.76  `0001 Large diameter  `0004 61-80 years         18 22385.23
4    107109  `0001 Large diameter `0005 81-100 years         23 24368.78
5  41243.83  `0001 Large diameter   `0006 100+ years          7  15696.3
6  51611.41 `0002 Medium diameter  `0002 21-40 years         12 16379.57
7  113830.6 `0002 Medium diameter  `0003 41-60 years         25 24352.08
8  199754.6 `0002 Medium diameter  `0004 61-80 years         37 33455.07
9  103521.8 `0002 Medium diameter `0005 81-100 years         21 24195.48
10   5758.7 `0002 Medium diameter   `0006 100+ years          1 6011.665
11  19655.8  `0003 Small diameter   `0001 0-20 years          4 10104.88
12 90910.76  `0003 Small diameter  `0002 21-40 years         16 23393.71
13    40205  `0003 Small diameter  `0003 41-60 years          7 15304.59
14 13209.02  `0003 Small diameter  `0004 61-80 years          3 8759.869
15 21404.85  `0003 Small diameter `0005 81-100 years          3  11056.3
16 5305.999  `0003 Small diameter   `0006 100+ years          1 5045.363
   SE_PERCENT   VARIANCE
1    51.84246   53855209
2    33.80465  236298512
3    24.75236  501098555
4    22.75138  593837455
5    38.05733  246373784
6    31.73633  268290228
7    21.39326  593023809
8    16.74808 1119241412
9    23.37235  585421221
10   104.3927   36140117
11   51.40919  102108689
12   25.73261  547265614
13   38.06637  234230329
14   66.31735   76735306
15   51.65323  122241669
16    95.0879   25455691

You can plot these data directly to visualize the trends within the 25-mile radius:

ggplot(all_rows_bangor, aes(x = as.character(GRP2), 
                            y = as.numeric(ESTIMATE),
                            fill = as.character(GRP1))) +
  geom_bar(stat = "identity") +
  labs(x = "Stand age",
       y = "Forestland area (acres)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Conclusion

The new features available in the EVALIDator API that allow you to access data through R or Python can help to speed up data analysis using FIA data. You can access the entire history of FIA data and analyze forest resources data by state or a circular retrieval from a fixed geographic point. Special thanks to the USDA Forest Service staff that have made this available!

–

By Matt Russell. Email Matt with any questions or comments.