Should we sample vegetation following statistical rules or botanists’ experience? Possibly both!

Post provided by Nicola Alessi, Gianmaria Bonari and Piero Zannini

Italian forest types examples: A) Warm Temperate forest type represented by Quercus ilex-dominated forests in Tuscany; B) Temperate forest type represented by Fagus sylvatica-dominated forests in Northern Apennines; C) Cold Temperate forest type represented by Picea abies-dominated forests in the Alps. Photo credit: Gianmaria Bonari (2022).

This post refers to the article Probabilistic and preferential sampling approaches offer integrated perspectives of Italian forest diversity by Alessi et al., published in the Journal of Vegetation Science.

Field surveying is one of the most important and exciting parts of a vegetation scientist's job. It allows them to explore wildlands and to see many plant species and the communities they belong to. This experience has undeniably endeared the discipline to generations of botanists. However, since there are neither enough funds nor enough time to carry out complete censuses, ecologists typically collect samples of vegetation to investigate biodiversity patterns and phenomena. By using a randomized procedure, probabilistic sampling gives every site an equal chance of being chosen and therefore yields representative samples. This approach estimates the most frequent species and aggregates of species in an area. However, complex environmental gradients, biogeography, and human history may generate local anomalies in the distribution of diversity, and these are rarely detected by chance. It is the equivalent of looking for a needle in a haystack, but with eyes closed. For this reason, representative samples have rarely been obtained throughout the history of vegetation ecology. Ecologists interested in representing the diversity of plant communities have typically used preferential sampling, which corresponds to the traditional phytosociological method of Josias Braun-Blanquet: sampling sites are selected according to expert judgment, based on the uniformity and composition of the vegetation. This approach better accounts for local environmental conditions and gradients, because expert knowledge considers not only vegetation uniformity but also geology, pedology, geomorphology, and biogeography.

Modern field ecologists know the frustration of walking for miles to reach the site where a random vegetation plot falls, without being able to include in the data collection all the peculiar and rare aspects of diversity encountered along the way. Given the urgent need to monitor both common and rare species and habitats, as required by national and international environmental agencies, we looked for efficient, cost-effective sampling approaches capable of capturing the multiple aspects of species diversity. Besides the probabilistic and preferential approaches, there are many other possible choices when sampling plant communities. Nevertheless, comparing these two methodological categories may help to clarify the criteria for solving this sampling dilemma.

We thus developed a conceptual framework to estimate the shared and exclusive information on plant diversity detected by the two sampling approaches (see the scheme below). We then calculated the performance of each sampling approach with an index that estimates the additional information it provides relative to the information collected by both approaches. We tested the framework by evaluating the number of habitat-specialist species, the diversity of species aggregates, and the species richness of the three major forest types of Italy (see the photos at the beginning of this post). Given the difficulty of carrying out ad hoc probabilistic and preferential vegetation surveys across the whole country, we filtered and aggregated the two data sets from several national and international vegetation databases. The results of the comparison were rendered graphically as Venn diagrams of the detected diversity information.

Graphical conceptualization of the methodology used to measure the performance of the two sampling approaches. The shared and exclusive biodiversity information resulting from the data sets highlights similarities, differences, and the overall performance of the two sampling approaches. The index of performance is presented to evaluate the additional information with respect to the common information collected by each sampling approach, weighted on their sum.
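To make the idea of shared and exclusive information more concrete, here is a minimal sketch in Python. It is not the code or the exact index used in the article: the function name `compare_approaches`, the placeholder species names, and the formula (exclusive items divided by the sum of exclusive and shared items) are all assumptions chosen only to illustrate the logic of the Venn diagram.

```python
def compare_approaches(probabilistic, preferential):
    """Split two sets of detected items (e.g. species) into shared and
    exclusive parts, and compute an illustrative performance index.

    ASSUMPTION: the index below, exclusive / (exclusive + shared), is a
    hypothetical stand-in; the published index may be defined differently.
    """
    probabilistic = set(probabilistic)
    preferential = set(preferential)

    shared = probabilistic & preferential      # detected by both approaches
    only_prob = probabilistic - preferential   # detected only by probabilistic sampling
    only_pref = preferential - probabilistic   # detected only by preferential sampling

    def index(exclusive):
        # Additional information relative to the shared information,
        # weighted on their sum (hypothetical formulation).
        total = len(exclusive) + len(shared)
        return len(exclusive) / total if total else 0.0

    return {
        "shared": shared,
        "only_probabilistic": only_prob,
        "only_preferential": only_pref,
        "index_probabilistic": index(only_prob),
        "index_preferential": index(only_pref),
    }


# Placeholder species lists, invented purely for illustration.
prob_species = {"species_a", "species_b", "species_c", "species_d"}
pref_species = {"species_b", "species_c", "species_e", "species_f", "species_g"}

result = compare_approaches(prob_species, pref_species)
print(sorted(result["shared"]))
print(result["index_probabilistic"], result["index_preferential"])
```

With these made-up lists, two species are shared, two are exclusive to the probabilistic set and three to the preferential set, so the preferential set scores slightly higher on this illustrative index. The same set logic could be applied to habitat-specialist species or to species aggregates instead of the full species lists.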

We found that the two approaches perform well for different goals. While the probabilistic approach performed better at detecting species and vegetation diversity, the preferential approach performed better at detecting regional richness and habitat-specialist species. However, the performance of the approaches seems to depend on the complexity of the surveyed area, which originates from its geological, biogeographical, and human history. Increasing complexity requires targeted diversity surveys designed around the available knowledge of the area, whether that knowledge comes from experts or from existing data. We believe that the two approaches should, where possible, be used in combination to obtain a multifaceted evaluation of species diversity, with probabilistic approaches used to draw inferences and preferential approaches used to describe particular ecological conditions.

Brief personal summaries:

Nicola Alessi is currently a technologist at the Italian Institute for Environmental Protection and Research (Rome, Italy). His scientific interests include ecology and biogeography of vegetation, with a focus on forest vegetation. He is also interested in remote sensing of the environment and nature conservation.

Gianmaria Bonari is currently a researcher at the Free University of Bozen-Bolzano (Italy). His scientific interests revolve around plants and involve a thorough understanding of plant communities and habitats at different spatial scales. He is especially interested in vegetation classification for better conservation. He is a member of the Council of the European Vegetation Archive (EVA), of the European Vegetation Classification Committee, of the Global Vegetation Database (sPlot), and the custodian of the international database CircumMed Forest.

Piero Zannini is currently a postdoctoral researcher and adjunct professor at Bologna University (Italy), where he teaches R programming for statistics courses at the MSc level. He is also taking a continuous professional education course at MIT in data science. He is interested in conservation biology, biogeography, and vegetation science. He has been a founding member of the Italian Chapter of the Society for Conservation Biology, where he has also served as Secretary. You can find him on Twitter (@PieroZannini).