Importing FIA data to analyze forest carbon with tidyFIA

April 5, 2021
analytics data import tidyFIA

The tidyFIA package.

I used to cringe every time I needed to import Forest Inventory and Analysis data. The data files that contain tree observations for a single state can exceed 100 megabytes. The file I downloaded six months ago might or might not contain the most recent data for that state.

There are a few R packages that have been developed in the last several years that help import FIA data. The tidyFIA package is one of them. Developed by the biometricians at SilviaTerra, the package allows you to download and import FIA data from the FIA database.

The tidyFIA package

To install tidyFIA on your version of R, you can obtain it from GitHub:


And then you can load the library to use the package:


I’ll also load a few other packages to help with my data analysis, such as the tidycensus package to work with state and county names and the urbnmapr package to make maps:


The tidy_fia() function will import any data table from the FIA database using either a state (e.g., states = "ME") or an area of interest. I’ll use the package to import data from Maine to analyze forest carbon across the state.

The table_names = statement allows you to specify the data tables from the FIA database that you want to work with. I’ll gather the PLOT, COND, and TREE tables to have a suite of information on the forest and trees to determine carbon

The postgis statement can be set to FALSE to access the database without any required authentication. You can set this to TRUE to access the files from SilviaTerra’s PostGIS database. This will be quicker, but you would need to email the developers for the password for the database:

me_data <- tidy_fia(
  states = "ME",
  table_names = c("plot", "tree", "cond"),
  postgis = FALSE

A map of the FIA plot locations can be obtained with the plot() function in tidyFIA: <- plot(me_data)

Depending on the amount of data collected in the state or geographic area of interest, it can take some time to import the data of interest. I will query the Maine data to obtain all inventories which have occurred since 1999, the year that annual inventories began in the state (i.e., when data began to be collected in a consistent manner):

plot <- me_data[["plot"]] %>% filter(invyr >= 1999)
cond <- me_data[["cond"]] %>% filter(invyr >= 1999)
tree <- me_data[["tree"]] %>% filter(invyr >= 1999)

The tidyFIA vignette contains some useful functions that can complement your forest data analysis. The plot_stats function returns the basal area (bapa) and aboveground carbon in live trees (tons per acre; cpa_tons). The diameter of the tree (dia) and the amount of carbon in the aboveground portion of the tree (carbon_ag) are multiplied by the FIA plot expansion factor (tpa_unadj) to obtain these values. Carbon is converted to US tons per acre:

plot_stats <- tree %>%
  group_by(plt_cn) %>%
  filter(statuscd == 1) %>% 
    bapa = sum(tpa_unadj * 0.005454 * dia ^ 2, na.rm = TRUE),
    cpa =  sum(tpa_unadj * carbon_ag, na.rm = TRUE)
  ) %>%
    plot %>% select(cn, statecd, unitcd, countycd, plot, invyr),
    by = c("plt_cn" = "cn")
  ) %>%
  mutate(cpa_tons = cpa /2000) %>% 
  replace_na(replace = list(bapa = 0, cpa = 0))

There is a strong positive correlation between the basal area and aboveground carbon in live trees in Maine:

p.carbon <- ggplot(plot_stats, aes(x = bapa, y = cpa_tons)) +
  geom_point() +
  labs (x = "Basal area (sq. ft per acre)",
        y = "Aboveground carbon (tons per acre)")

You might also be interested in the most common species found in Maine. In the code below, the ref_species data frame contains the FIA species code and common names. The six most common species are gathered from the FIA database and then graphed to observe trends between tree diameter and aboveground carbon:

ref_species <- read_ref_table("REF_SPECIES") %>%
  select(spcd, common_name)

common_spp <- me_data[["tree"]] %>%
  group_by(spcd) %>%
  tally %>%
  top_n(6, wt = n) %>%

me_data[["tree"]] %>%
  filter(spcd %in% common_spp) %>%
  left_join(ref_species) %>%
  ggplot(aes(x = dia, y = carbon_ag)) +
    geom_point(alpha = 0.2) +
    facet_wrap(~ common_name) +
    labs (x = "Tree diameter (inches)",
          y = "Aboveground carbon (pounds)")

In Maine, some FIA plots have been remeasured up to five times. The next set of code filters the data to obtain only the most recent inventory by matching the remeasurement id number (rem_id; 1 through 5) with the most recent remeasurement (max_rem_id):

plot2 <- plot_stats %>% 
  group_by(statecd, countycd, plot) %>% 
  mutate(plot_dummy = 1,
         rem_id = cumsum(plot_dummy))

plot3 <- plot2 %>% 
  group_by(statecd, countycd, plot) %>% 
  summarize(max_rem_id = max(rem_id))
plot4 <- inner_join(plot3, plot2) %>%  
  filter(rem_id == max_rem_id)

Across Maine’s 3,592 FIA plots, we can calculate the mean, standard deviation, minimum, and maximum values for the aboveground carbon values. Values are in US tons per acre:

plot_stats_c<- plot4 %>% 
  ungroup() %>% 
  summarize(`FIA plots` = n(), 
            `Mean carbon` = round(mean(cpa_tons, na.rm = T), 1),
            `SD carbon` = round(sd(cpa_tons, na.rm = T), 1),
            `Min carbon` = round(min(cpa_tons, na.rm = T), 1),
            `Max carbon` = round(max(cpa_tons, na.rm = T), 1))
FIA plots Mean carbon SD carbon Min carbon Max carbon
3592 18.1 10.8 0.1 64.4

We can also examine the data by county. The largest number of FIA plots (736) are located in Aroostook County and on average the greatest amount of aboveground carbon (24.8 tons per acre) is found in Cumberland County:

fips <- fips_codes %>% 
  rename(countycd = county_code) %>% 
  mutate(countycd = as.numeric(countycd)) %>% 
  filter(state == "ME")

plot4 <- inner_join(plot4, fips)

plot_stats_c_county<- plot4 %>% 
  group_by(county) %>% 
  summarize(`FIA plots` = n(), 
            `Mean carbon` = round(mean(cpa_tons, na.rm = T), 1),
            `SD carbon` = round(sd(cpa_tons, na.rm = T), 1),
            `Min carbon` = round(min(cpa_tons, na.rm = T), 1),
            `Max carbon` = round(max(cpa_tons, na.rm = T), 1))
county FIA plots Mean carbon SD carbon Min carbon Max carbon
Androscoggin County 53 19.9 10.3 6.4 56.3
Aroostook County 736 15.9 10.0 0.1 64.4
Cumberland County 111 24.8 10.8 2.0 48.5
Franklin County 187 20.8 11.3 0.3 50.4
Hancock County 198 17.4 10.3 1.0 50.0
Kennebec County 104 21.1 11.3 1.3 47.6
Knox County 42 18.2 10.0 2.4 44.9
Lincoln County 53 22.2 10.5 1.6 48.5
Oxford County 251 24.5 11.9 0.4 54.9
Penobscot County 383 16.1 9.7 0.2 52.9
Piscataquis County 491 17.9 10.7 0.2 57.2
Sagadahoc County 32 24.0 10.8 0.2 52.7
Somerset County 445 17.3 10.9 0.2 59.3
Waldo County 81 19.7 10.5 2.0 49.8
Washington County 308 14.3 8.1 0.2 49.5
York County 117 23.4 11.8 1.5 52.5

We can also make a map of the average carbon (US tons per acre) found across Maine’s 16 counties by producing a choropleth map. We’ll do some data manipulation to bind together the state and county FIPS codes found in the FIA data, in addition to some functions from the urbnmapr package:

plot_stats_c_county<- plot4 %>% 
  group_by(statecd, countycd) %>% 
  summarize(`FIA plots` = n(), 
            `Mean carbon` = round(mean(cpa_tons, na.rm = T), 1),
            `SD carbon` = round(sd(cpa_tons, na.rm = T), 1),
            `Min carbon` = round(min(cpa_tons, na.rm = T), 1),
            `Max carbon` = round(max(cpa_tons, na.rm = T), 1))

plot_stats_c_county <- plot_stats_c_county %>% 
  mutate(county_fips = paste(statecd, ifelse(countycd < 10, paste0("00", countycd),
                              paste0("0", countycd)), sep = ""))

carbon_data <- left_join(plot_stats_c_county, counties, by = "county_fips") 

carbon_data %>%
  ggplot(aes(long, lat, group = group, fill = `Mean carbon`)) +
  geom_polygon(color = NA) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  labs(fill = "Mean carbon \n(tons per acre)")


The tidyFIA package is a useful one to quickly bring in FIA data into R. Plus, it works easily with the tidyverse suite of functions. Try it out to see how it can help your data analysis.

By Matt Russell. Sign up for my monthly newsletter for in-depth analysis on data and analytics in the forest products industry.

Testing the assumption: Half of the biomass of a tree is carbon

June 18, 2021
analytics carbon forest measurements

Forest carbon cheat sheet: updated for 2021

June 12, 2021
analytics carbon forest measurements forest products climate

How well does the "quick and dirty" standard deviation approximation work?

June 6, 2021
analytics Data science statistics teaching statistics standard deviation