May 31, 2020analytics Data science data management Google Sheets googlesheets4 data import
If you’re like me, more of the data you use on a daily basis are stored in Google Sheets. You might be using fewer Excel spreadsheets and .csv files. I recently researched the most up-to-date ways to import data from an Access database into R. (It had been a while since I imported data from a database.)
Using Google Sheets to store data has several advantages:
- Multiple people can collaborate on the same data set,
- Information entered into a Google Form can populate data in a Google Spreadsheet,
- Data can be accessed and edited both online and offline, and
- Google Sheets can be converted to Excel spreadsheets, and vice versa.
According to a 2018 post, analysts that use Google Sheets tend to be younger and work for younger organizations. Excel still dominates the spreadsheet universe, especially within established organizations.
If you perform your forest data analysis in R, getting data imported and in the format you want is often half the battle. This post describes the googlesheets4 package, an R interface to the Google Sheets that is a part of the tidyverse.
The googlesheets4 package
The package can be installed from CRAN:
and attached with the
For this example, I’ll import that data located in this Google Sheet. The data contain a list of the 15 most forested countries, with forest area measured in millions of hectares. The data set will be named
First, you’ll need to authenticate with your Google account if you’re using the
read_sheet() function to access a Google Sheet. R will attempt to authenticate your Google account through the Tidyverse API:
read_sheet() will read in the data for you. Those of you familiar with
readxl() functions will see the similarities.
forestArea <- read_sheet("https://docs.google.com/spreadsheets/d/12V808zXNxFvWldpsfU3A8vo9ZNVbqFB7LQmTht_Pjbc/edit?usp=sharing")
## Using an auto-discovered, cached token. ## To suppress this message, modify your code or options to clearly consent to the use of a cached token. ## See gargle's "Non-interactive auth" vignette for more details: ## https://gargle.r-lib.org/articles/non-interactive-auth.html
## The googlesheets4 package is using a cached token for firstname.lastname@example.org.
## Reading from "ForestArea"
## Range "Sheet1"
We can print the data set to view its contents:
## # A tibble: 15 x 2 ## Country ForestAreaMillHect ## <chr> <dbl> ## 1 Russian Federation 815 ## 2 Brazil 494 ## 3 Canada 347 ## 4 USA 310 ## 5 China 208 ## 6 Dem. Rep. Congo 153 ## 7 Australia 125 ## 8 Indonesia 91 ## 9 Peru 74 ## 10 India 71 ## 11 Mexico 66 ## 12 Colombia 59 ## 13 Angola 58 ## 14 Bolivia 55 ## 15 Zambia 49
You can also read in the Sheet ID after you’ve authenticated your account. The Sheet ID is the “long” portion of the Google Sheets URL:
Now that you have the data, you can plot it:
library(ggplot2) p.forestArea <- ggplot(forestArea, aes(x = reorder(Country, ForestAreaMillHect), y = ForestAreaMillHect)) + geom_bar(stat = "identity") + coord_flip() + labs(x = "Country", y = "Forest area (million hectares)", caption = "Source: UN Food and Agriculture Organization, 2017") + theme(panel.background = element_rect(fill = "NA"), axis.line = element_line(color = "black")) p.forestArea
You can also create Google Sheets directly from R using the
sheet_write() functions found in the package.
Other helpful tutorials include:
googlesheets4 package can import data in Google Sheets directly into R. It uses many functions used in the tidyverse suite of data analysis tools. So long as you have a Google account to make authentication easier, the package is a handy one for importing data from a Google Sheet into R.
By Matt Russell. Email Matt with any questions or comments.