Ask how you want your data - and be specific

October 7, 2020
analytics data management Communicating data forest measurements

I have been analyzing the submissions to the University of Minnesota Extension’s Tree and Woodland Carbon Capture Challenge. The Challenge is an activity that asked Minnesotans to measure trees and woodlands to observe how much carbon they sequester over a year.

The audience that participated in this activity included homeowners and private woodland owners with a love for trees. In total, 26 people measured an individual tree and typed their measurements into a Google Form answering the question “What is the diameter of your tree, measured in inches?”

Here are the responses I received:

Entries in response to the question: What is the diameter of your tree?

I received answers that mixed decimals and fractions, had different significant digits, and mixed character and numeric values. Needless to say, the data were messy and difficult to immediately analyze.

The take-home message is to be specific about how you ask for data and give good examples of what you expect. I did not do this for the activity, and while the quality of the data was good, it’s taking more time to clean and reorganize them.
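To give a sense of what that cleanup can involve, here is a minimal Python sketch that turns free-text diameter entries into numbers. The function name `parse_diameter` and the sample entries are hypothetical illustrations, not the actual Challenge responses, and the rules (mixed numbers first, then simple fractions, then plain numbers) are assumptions about the kinds of answers described above.

```python
import re
from typing import Optional

# Patterns are tried from most to least specific.
MIXED = re.compile(r"(\d+)\s+(\d+)\s*/\s*(\d+)")   # e.g. "10 1/2"
FRACTION = re.compile(r"(\d+)\s*/\s*(\d+)")        # e.g. "3/4"
DECIMAL = re.compile(r"\d+(?:\.\d+)?|\.\d+")       # e.g. "10.7", "11", ".5"


def parse_diameter(entry: str) -> Optional[float]:
    """Return the diameter in inches, or None if no number is found.

    Units and other surrounding text (such as "inches") are ignored.
    """
    text = entry.strip()

    match = MIXED.search(text)
    if match:
        whole, num, den = (int(g) for g in match.groups())
        return whole + num / den if den != 0 else None

    match = FRACTION.search(text)
    if match:
        num, den = (int(g) for g in match.groups())
        return num / den if den != 0 else None

    match = DECIMAL.search(text)
    if match:
        return float(match.group())

    return None  # flag for manual review


# Hypothetical entries, for illustration only.
for raw in ["10.7", "10 1/2", "12 inches", "twelve"]:
    print(repr(raw), "->", parse_diameter(raw))
```

Entries that come back as `None` still need a person to look at them, which is part of why asking for a specific format up front saves time.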

I could have used data validation features in the Google Form I provided. I could have given an example like “Measure and enter data for your tree to the nearest tenth of an inch. For example, enter 10.7 for a tree that is 10.7 inches in diameter at breast height.”
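As a rough sketch of what such a validation rule might check, the Python snippet below accepts only entries written to the nearest tenth of an inch. The pattern and the name `is_valid_diameter` are hypothetical, and the limit of three digits before the decimal point is an assumption. A similar pattern could likely be used in a form's regular-expression validation, though exact matching behavior there should be tested before relying on it.

```python
import re

# Hypothetical rule: one to three digits, a decimal point, exactly one digit.
TENTH_OF_INCH = re.compile(r"\d{1,3}\.\d")


def is_valid_diameter(entry: str) -> bool:
    """True if the whole entry looks like a diameter to the nearest tenth."""
    return TENTH_OF_INCH.fullmatch(entry.strip()) is not None


# Hypothetical entries, for illustration only.
for raw in ["10.7", "10 1/2", "10.75", "11"]:
    print(repr(raw), "valid" if is_valid_diameter(raw) else "rejected")
```

Rejecting an entry like 11 may feel strict, but it prompts the person to confirm whether they mean 11.0, which keeps precision consistent across responses.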

Audience matters too. If I had asked professional foresters the same question, I likely would have received a data set of diameters measured to the nearest tenth of an inch, without my asking specifically for it.

This activity is a reminder to know your audience when asking for data. Automating a data analysis workflow by using data validation can benefit the data analyst and lead to fewer problems when data are ultimately used to make decisions.

By Matt Russell.
