2024 HARPER PRIZE SHORTLIST: For the next two weeks, we are featuring the articles shortlisted for the 2024 Harper Prize. The Harper Prize is an annual award for the best early career research paper published in Journal of Ecology. Karina Guo’s ‘Using machine learning to link climate, phylogeny and leaf area in eucalypts through a 50-fold expansion of leaf trait datasets’ is one of those shortlisted for the award.
About me:
I’m Karina Guo (she/her), now a doctoral student with the Research Centre for Ecosystem Resilience at the Botanic Gardens of Sydney, Australia, and at the University of Sydney, Australia. My interest in ecology started off in zoology, with dreams of working in veterinary science. After I realised a love for plants, that interest quickly pivoted to botany, then inevitably to plant ecology. The plant ecology aspect has stuck around and expanded into finding different ways of applying techniques to learn more. From the machine learning in this Honours project on leaf size on herbarium sheets, to genetics and evolution in my current doctoral project, I’ve discovered that: 1. Each application brings a new perspective and helps reveal one piece of the bigger ecological puzzle, 2. It can be valuable to pivot and dive deep – new techniques and their jargon may seem daunting at the start, but going straight in with a project at hand is the best way to learn. Now that I’m working on myrtle rust (Austropuccinia psidii), my ecological research has shifted further, but at the same time my background in different areas has interwoven and reshaped my ways of thinking.
https://www.botanicgardens.org.au/our-science/what-we-do/research-centre-ecosystem-resilience

About the paper:
In the shortlisted paper, I used machine learning to extract leaf traits across the large eucalypt genera. Traditionally, leaf traits such as leaf area are measured tediously by hand. This means that the volume of measurements taken can be highly limited by the funding and labour available. Machine learning was used to bypass this (Fig. 1). By pairing the machine learning model with digitised herbarium images, I was able to expand our trait datasets 50-fold! The expanded dataset has taxonomic and geographical coverage that was lacking in existing datasets generated with traditional methods (Fig. 2). From this I was able to interrogate questions that were previously out of reach.


One aspect that I found surprising was the exploration into the evolution of the relationship between trait and climate. Our expanded dataset gave me the ability to investigate this in greater depth. Traditionally, the trait ecology field has tended to investigate trait-climate relationships either at the species level or across larger taxonomic groupings such as genera. This has revealed that trait-climate relationships within species can be starkly different from the same relationships across diverse species, with some species’ relationships trending neutral across climate or even in the opposite direction (Ackerly et al., 2002; An et al., 2021; McDonald et al., 2003; Westerband et al., 2021; Wilde et al., 2023; Fig. 3). Knowing this, I wanted to ask what happens between these two groupings. How do the widely varying species-level trends converge into what we see at the broader level: is the change linear, or perhaps exponential? In my paper, I outline the novel method we used to ask this and the findings we reached. In brief, we saw a large shift in the mean trait-climate slope when species were grouped by their most recent common ancestor (MRCA) at a depth of 8.5 million years ago (MYA) in the dated eucalypt phylogeny (Thornhill et al., 2019). Here, the trait-climate relationship lost the majority of its variation and plateaued into the trait-climate relationship observed in eucalypts as a whole (Fig. 3)!
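To make the within-species versus across-species contrast concrete, here is a minimal sketch in Python. It is not the method from the paper: the records, species names and clade labels are invented for illustration, the clade label simply stands in for a grouping by MRCA at some phylogenetic depth, and a plain least-squares slope is assumed as the measure of the trait-climate relationship.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical records (NOT data from the paper):
# (species, clade, mean annual precipitation in mm, log10 leaf area)
records = [
    ("sp_a", "clade_1", 400, 2.9), ("sp_a", "clade_1", 500, 2.8), ("sp_a", "clade_1", 600, 2.7),
    ("sp_b", "clade_1", 700, 3.1), ("sp_b", "clade_1", 800, 3.1), ("sp_b", "clade_1", 900, 3.1),
    ("sp_c", "clade_2", 1000, 3.3), ("sp_c", "clade_2", 1100, 3.4), ("sp_c", "clade_2", 1200, 3.5),
]


def slopes_by(records, key_index):
    """Fit one trait-climate slope per group, grouping on the given column."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key_index], []).append(rec)
    return {
        key: ols_slope([r[2] for r in grp], [r[3] for r in grp])
        for key, grp in groups.items()
    }


species_slopes = slopes_by(records, 0)  # one slope per species
clade_slopes = slopes_by(records, 1)    # one slope per (hypothetical) clade
overall_slope = ols_slope([r[2] for r in records], [r[3] for r in records])

for name, slope in species_slopes.items():
    print(f"{name}: {slope:+.5f}")
print(f"clade_1: {clade_slopes['clade_1']:+.5f}")
print(f"all records: {overall_slope:+.5f}")
```

In this toy dataset one species trends negative, one is flat and one is positive, yet the slope fitted across all records is positive. Repeating the grouping step at successively deeper levels is one simple way to picture how species-level variation could converge towards the broader trend.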

The next step in this research area is to expand to different taxonomic groups with different leaf shapes. My previous model was focused on eucalypts, which have simple leaves with entire margins. To encompass different leaf shapes, the model will need to be retrained to ensure it accurately recognises and measures leaves with different characteristics. Examples include the lobed and serrated leaves of taxa such as Quercus sp. and Veronica sp., and the compound leaves of Acacia sp., which will require the model to recognise both a ‘leaf’ and its ‘leaflets’.
Another key step for this field will be the responsible integration of machine learning derived traits into trait databases. It is important for users of databases to know whether there are potential differences in trait measurements between machine learning models and traditional methods. For instance, in my paper, the model also retrieves leaf area data from juvenile leaves, rather than limiting measurements to fully expanded leaves as the standard plant physiology protocol does. Even so, it is important that machine learning trait data is easily accessible to users and integrated into these databases. My paper clearly shows how the significantly larger dataset generated by machine learning can open up new opportunities to explore questions hidden by the scarcity of traditionally measured data.
The combination of these two steps will allow us to expand this project to other taxa. With herbaria across the world digitising their herbarium sheets, there is a treasure trove of information just waiting to be uncovered. This will result in richer datasets, at a scale previously unimaginable. From this, we can reinterrogate fundamental hypotheses in trait ecology and integrate the data into modern modelling software, for example to improve our understanding of the links between traits and global carbon biogeochemistry.
However, it is important for me to note that this project was not an easy task. It was undeniably daunting at the start! Machine learning was a new field not only to me but also relatively new to ecology. This paper highlights one of the first analytical applications of machine learning in trait ecology with herbarium sheets! From this project, I’ve learnt a lot, including how to appreciate and respect the capabilities and limitations of machine learning in science. It has encouraged me to be creative and see potential applications in all the new work I’ve since embarked on.
Find the other early career researchers and their articles that have been shortlisted for the 2024 Harper Prize here!
References
Ackerly, D., Knight, C., Weiss, S., Barton, K. and Starmer, K. (2002) Leaf size, specific leaf area and microhabitat distribution of chaparral woody plants: contrasting patterns in species level and community level analyses. Oecologia 130, 449–457. https://doi.org/10.1007/s004420100805
An, N., Lu, N., Fu, B., Wang, M. and He, N. (2021) Distinct Responses of Leaf Traits to Environment and Phylogeny Between Herbaceous and Woody Angiosperm Species in China. Front. Plant Sci. 12, 799401. https://doi.org/10.3389/fpls.2021.799401
Falster, D., Gallagher, R., Wenk, E.H., Wright, I.J., Indiarto, D., Andrew, S.C. et al. (2021) AusTraits, a curated plant trait database for the Australian flora. Sci Data 8, 254. https://doi.org/10.1038/s41597-021-01006-6
McDonald, P.G., Fonseca, C.R., Overton, J.McC. and Westoby, M. (2003) Leaf‐size divergence along rainfall and soil‐nutrient gradients: is the method of size reduction common among clades? Funct. Ecol. 17, 50–57. https://doi.org/10.1046/j.1365-2435.2003.00698.x
Thornhill, A.H., Crisp, M.D., Külheim, C., Lam, K.E., Nelson, L.A., Yeates, D.K. and Miller, J.T. (2019) A dated molecular perspective of eucalypt taxonomy, evolution and diversification. Aust. Syst. Bot. 32, 29–48. https://doi.org/10.1071/SB18015
Westerband, A.C., Funk, J.L. and Barton, K.E. (2021) Intraspecific trait variation in plants: a renewed focus on its role in ecological processes. Ann. Bot. 127, 397–410. https://doi.org/10.1093/aob/mcab011
Wilde, B.C., Bragg, J.G. and Cornwell, W. (2023) Analyzing trait‐climate relationships within and among taxa using machine learning and herbarium specimens. Am. J. Bot. 110, e16167. https://doi.org/10.1002/ajb2.16167
Wright, I.J., Dong, N., Maire, V., Prentice, I.C., Westoby, M., Díaz, S. et al. (2017) Global climatic drivers of leaf size. Science 357, 917–921. https://doi.org/10.1126/science.aal4760