The three kinds of data scientists in forestry

January 2, 2020
Data science jobs careers hiring statistics analytics

Data scientists are highly sought-after professionals across tens of thousands of companies. And there’s a difference between what a data scientist does and what a statistician does.

The field of statistics has been around for centuries, while data science programs at universities are just now beginning to graduate their first students.

In statistics, hypotheses are evaluated. In data science, hypotheses are generated. Statistics is often considered a primary data analysis technique, while data science is often considered secondary.

Data scientists aren’t just found in tech companies. Job titles that include the phrase “data science” are increasingly found across fields in natural resources.

We’ve hosted the Data Science for Forestry Applications workshop at the Society of American Foresters Convention every year since 2016. The University of Florida was seeking to hire a data science instructor within its School of Forest Resources and Conservation. Forestry departments within universities are increasingly teaching data science skills with tools like R.

Despite the relative infancy of the field, there are different “flavors” of data scientists. The different skills that data scientists have can bring added value to any forestry company. Recently, Elena Grewal outlined three kinds of data scientists in a LinkedIn article.

Elena is the Head of Data Science at Airbnb, a company that receives over 30,000 applications annually for its job openings. She and her company have learned a great deal about the different skills that data scientists have.

So what are the different kinds of data scientists in forestry? This post describes data scientists in forestry who typically focus on one of three areas: analytics, algorithms, or inference. It’s a reflection of my own observations across several forestry organizations.

The analytics-focused forest data scientist

The analytics-focused forest data scientist builds tools that are operational. Their roles within organizations are often focused and they are typically business-minded. While they may not have advanced degrees in statistics or computer science, these professionals can often see how data are integrated across an organization.

Analytics-focused data scientists also tend to be great communicators. In short, they have the ability to tell stories with data. They may have skills in data visualization, be strong writers, and may work across several teams in your forestry organization.

These data scientists are likely using programs like R, Excel, GIS, and Tableau on an everyday basis.

The algorithms-focused forest data scientist

The algorithms-focused forest data scientist develops tools that allow an organization to get the most out of its data. They may have a background in computer science and/or information technology. Their roles within organizations are often designed so that they collaborate with the field crews that collect the data in the woods and the analysts that summarize the data in the office.

These professionals often lead efforts in data management and quality assurance within a company. They may also use data to perform machine learning tasks. A large part of their role is likely in helping others within their organization to use data effectively across diverse platforms. These platforms could include shared internal networks and cloud-based software.

These data scientists are likely using programs like R, Python, MS Access, SQL, and Amazon Web Services on an everyday basis.

The inference-focused forest data scientist

The inference-focused forest data scientist investigates causes and effects with an organization’s data. This kind of forest data scientist often has significant training in statistics and modeling, and they likely have the word “biometrician” in their job title. They are often the first ones to analyze and model data after they have been collected, and they probably provided input into how the data were collected in the first place (e.g., by determining how much data to collect).

These professionals and their perspectives are often valued in a consulting role. They may also be an integral part of a company’s research and development efforts and may represent the company as a part of industry-university research cooperatives.

These data scientists are likely using programs like R and SAS on an everyday basis.

Conclusion

As the field of data science has grown, specialties have arisen, with individuals typically focusing on analytics, algorithms, or inference. In small forestry organizations, one person may fill two of these roles. In even smaller organizations, one person may fill all three.

How do people in your organization fit within these three data science roles? Leave a comment below or email Matt with any questions or comments.

By Matt Russell.
