Getting your data into R from Google Sheets

May 31, 2020
analytics Data science data management Google Sheets googlesheets4 data import

If you’re like me, more of the data you use on a daily basis are stored in Google Sheets. You might be using fewer Excel spreadsheets and .csv files. I recently researched the most up-to-date ways to import data from an Access database into R. (It had been a while since I imported data from a database.)

Using Google Sheets to store data has several advantages:

According to a 2018 post, analysts that use Google Sheets tend to be younger and work for younger organizations. Excel still dominates the spreadsheet universe, especially within established organizations.

If you perform your forest data analysis in R, getting data imported and in the format you want is often half the battle. This post describes the googlesheets4 package, an R interface to the Google Sheets that is a part of the tidyverse.

The googlesheets4 package

The googlesheets4 package is a re-release of the googlesheets package developed by Jenny Bryan with R Studio. The “4” is designated because it uses the fourth version of the Google Sheets API.

The package can be installed from CRAN:

install.packages("googlesheets4")

and attached with the library() function:

library(googlesheets4)

For this example, I’ll import that data located in this Google Sheet. The data contain a list of the 15 most forested countries, with forest area measured in millions of hectares. The data set will be named forestArea.

First, you’ll need to authenticate with your Google account if you’re using the read_sheet() function to access a Google Sheet. R will attempt to authenticate your Google account through the Tidyverse API:

The read_sheet() will read in the data for you. Those of you familiar with read_csv() or readxl() functions will see the similarities.

forestArea <- read_sheet("https://docs.google.com/spreadsheets/d/12V808zXNxFvWldpsfU3A8vo9ZNVbqFB7LQmTht_Pjbc/edit?usp=sharing")
## Using an auto-discovered, cached token.
## To suppress this message, modify your code or options to clearly consent to the use of a cached token.
## See gargle's "Non-interactive auth" vignette for more details:
## https://gargle.r-lib.org/articles/non-interactive-auth.html
## The googlesheets4 package is using a cached token for matt@arbor-analytics.com.
## Reading from "ForestArea"
## Range "Sheet1"

We can print the data set to view its contents:

forestArea
## # A tibble: 15 x 2
##    Country            ForestAreaMillHect
##    <chr>                           <dbl>
##  1 Russian Federation                815
##  2 Brazil                            494
##  3 Canada                            347
##  4 USA                               310
##  5 China                             208
##  6 Dem. Rep. Congo                   153
##  7 Australia                         125
##  8 Indonesia                          91
##  9 Peru                               74
## 10 India                              71
## 11 Mexico                             66
## 12 Colombia                           59
## 13 Angola                             58
## 14 Bolivia                            55
## 15 Zambia                             49

You can also read in the Sheet ID after you’ve authenticated your account. The Sheet ID is the “long” portion of the Google Sheets URL:

read_sheet("12V808zXNxFvWldpsfU3A8vo9ZNVbqFB7LQmTht_Pjbc")

Now that you have the data, you can plot it:

library(ggplot2)

p.forestArea <- ggplot(forestArea, aes(x = reorder(Country, ForestAreaMillHect), y = ForestAreaMillHect)) +
  geom_bar(stat =  "identity") +
  coord_flip() +
  labs(x = "Country",
       y = "Forest area (million hectares)",
       caption = "Source: UN Food and Agriculture Organization, 2017") +
    theme(panel.background = element_rect(fill = "NA"),
        axis.line = element_line(color = "black"))

p.forestArea

You can also create Google Sheets directly from R using the gs4_create() and sheet_write() functions found in the package.

Other helpful tutorials include:

Conclusion

The googlesheets4 package can import data in Google Sheets directly into R. It uses many functions used in the tidyverse suite of data analysis tools. So long as you have a Google account to make authentication easier, the package is a handy one for importing data from a Google Sheet into R.

By Matt Russell. Email Matt with any questions or comments.

New state-level forest carbon fact sheets available

September 14, 2020
analytics carbon forest products greenhouse gases

States with the biggest gains in forest carbon over the last 30 years

September 7, 2020
analytics carbon forest inventory forestry

31 R packages for forest analysts

August 26, 2020
analytics R forest measurements Data science statistics R packages