Using clustering with logging industry data

June 30, 2020
analytics clustering Machine learning k-means logging forest industry

A forest products company may not always require a precise numeric answer to solve a problem. Oftentimes, understanding a pattern and how data are clustered can suffice.

The benefit of using pattern recognition techniques is that data need not be labeled prior to analyzing them. In contrast to statistical techniques like regression and classification, identifying patterns can unearth new trends in how data are arranged in your organization.

Clustering is an unsupervised ML algorithm that groups items. Data points that get grouped in the same cluster are similar in nature. A variety of clustering algorithms, such as k-means clustering, can identify how to group data.

Analysts can identify the appropriate number of clusters to group data. The process of clustering allows us to recognize patterns that might appear in the data, without needing to analyze it through detailed quantitative analysis.

In the forest products industry, potential applications of clustering and other unsupervised ML techniques include the following:

Identifying patterns through techniques like clustering can reduce the burden of having too much data that are messy and complex. Patterns can help to partition large volumes of data into smaller groups that can be used to more efficiently solve complex tasks.

Case study: Logger demographics and equipment

A recent study of loggers in Minnesota indicated that logging businesses are aging and many of them use old equipment in their operations. The age of a logging business is likely related to other demographics such as the amount of wood volume produced by a logger, how much revenue the business takes in, and the kind of equipment used by the logger.

We can use clustering to help us find patterns or groups within the Minnesota logger data. The figure shows four clusters of loggers based on their years in operation and the age of the newest feller buncher used in their operations:

Clustering can be used to identify logger needs for new harvesting equipment based on their years in operation and the age of the equipment they currently use.

Investigating each cluster provides insight into each group that emerges:

  • Established companies, new equipment (far right). These are established companies that are working with new logging equipment. They have a long history and likely have strong relationships with landowners that supply the wood that they harvest.
  • Younger companies, new equipment (middle, bottom). These are younger companies that are working with new logging equipment. They likely have solid growth opportunities as a company because their equipment can lead to efficient harvesting of wood and increased in-woods productivity.
  • Older companies, older equipment (middle, top). These are older companies that are working with relatively old logging equipment. These loggers may be at the greatest risk of market changes: they have not invested in capital expenditures in their equipment, yet likely have established relationships with landowners that provide wood for harvest.
  • Young companies, old equipment (far left). These are young companies that do not have a long history and are working with old logging equipment. They may be seeking to establish relationships with landowners.

Conclusion

Clustering can identify patterns and unearth new trends in how data are arranged in your organization. Data points are grouped into clusters if they have similar characteristics. It can be a useful technique for forest analysts that don’t necessarily require a numeric answer to analytics problem.

By Matt Russell. Email Matt with any questions or comments.

Polar area diagrams in celebration of Florence Nightingale

June 21, 2020
analytics Data viz forest inventory polar area diagram Communicating data

How do loggers feel? Sentiment analysis on the opinions of logging contractors

June 14, 2020
analytics Data science forest products logging tidytext sentiment analysis

Getting your data into R from Google Sheets

May 31, 2020
analytics Data science data management Google Sheets googlesheets4 data import