Back to index

Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson3.html",
              destfile = file.path(dir, "lesson3.html"))
htmlFile <- file.path(dir, "lesson3.html")
rstudioapi::viewer(htmlFile)
In this class, we will go through the two main strategies to compare species richness while taking into account differences in sampling effort: rarefaction and species accumulation curves.
How to compare these two plots?

How to compare these two plots?


We apply these techniques to the Barro Colorado Island dataset. Refer to the practical 2 for more details regarding this dataset.

Import data

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # does the file have column names
                  sep = ",", # what separates the column
                  row.names = 1) # does the file have row names
BCI <- t(BCI) # transposing the matrix to have sites in rows

BCI[4:6, 5:7]
##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              3                    0                      18
## Plot5              1                    1                       3
## Plot6              0                    0                       2

1. Rarefaction

Rarefaction estimates the species richness across sites based on a pre-defined minimum number of individuals.
Rarefaction principle.

Rarefaction principle.

This method relies on the construction of rarefaction curves that opposes the species richness in a plot to the sample size, here the number of individuals. From these rarefaction curves, we can then extract a rarefied species richness, comparable across plots with different sampling effort.

1.1. Rarefaction curves

First, we manually construct a rarefaction curve for one plot and then use integrated functions from vegan package.

1.1.1. Based on a for-loop

Let’s consider the plot 35. What species does this plot have?

Solution
plot35 <- BCI[35, ][BCI[35, ] > 0]
head(plot35)
##           Adelia.triloba  Alchornea.costaricensis         Alseis.blackiana 
##                        6                        8                       93 
##            Apeiba.aspera         Apeiba.tibourbou Astrocaryum.standleyanum 
##                        5                        1                        3


For the rarefaction, we will randomly sample species from the list of individuals in the plot 35. We can therefore repeat each species name according its number of individuals.

sp35 <- rep(names(plot35), plot35)
sp35[1:10]
##  [1] "Adelia.triloba"          "Adelia.triloba"         
##  [3] "Adelia.triloba"          "Adelia.triloba"         
##  [5] "Adelia.triloba"          "Adelia.triloba"         
##  [7] "Alchornea.costaricensis" "Alchornea.costaricensis"
##  [9] "Alchornea.costaricensis" "Alchornea.costaricensis"
length(sp35)
## [1] 601

Now that we have the pool of individuals for the plot 35, we will draw random samples of various sizes from this pool and calculate the species richness at each step.

For example, if we sample randomly 10 individuals from the pool, how many species do we get?

set.seed(1)
ex_sample <- sample(x = sp35, size = 10)
ex_sample
##  [1] "Cecropia.insignis"    "Platypodium.elegans"  "Inga.sapindoides"    
##  [4] "Gustavia.superba"     "Gustavia.superba"     "Faramea.occidentalis"
##  [7] "Gustavia.superba"     "Gustavia.superba"     "Macrocnemum.roseum"  
## [10] "Gustavia.superba"
unique(ex_sample)
## [1] "Cecropia.insignis"    "Platypodium.elegans"  "Inga.sapindoides"    
## [4] "Gustavia.superba"     "Faramea.occidentalis" "Macrocnemum.roseum"
length(unique(ex_sample))
## [1] 6

To repeat this step over many sample sizes, we can use a for-loop. The size of the random samples vary from one individual to the actual number of individuals of the community.

# Empty data.frame that will contain the results
rar <- data.frame()

# for-loop varying across the different sample sizes (i is the sample size)
for(i in 1:length(sp35)){
  tmp <- sample(x = sp35, size = i)
  # binding results
  rar <- rbind(rar,
               data.frame(sample_size = i,
                          rich = length(unique(tmp))))
}

We can now plot the rarefaction curve. The rarefied richness would fall where the minimum number of individuals is reached.

plot(rar$sample_size, rar$rich,
     type = "l", pch = 16,
     main = "Rarefaction curve for plot 35",
     xlab = "Sample size (nb of individuals)",
     ylab = "Species richness")
abline(v = min(rowSums(BCI)), col = "firebrick3")
legend(x = 400, y = 40, legend = "Minimum nb\nof individuals",
       col = "firebrick3", lwd = 1, bty = "n")

1.1.2. Integrated vegan functions

The code above is quite tedious, and it could get even longer if we wanted to draw the rarefaction curves for every plot. Fortunately, there are ready-to-use functions implemented in the vegan package. The one to draw rarefaction curve is called rarecurve().

Before using it, we first need to define a rarefied number, which is usually the minimum number of individuals found in a community.

What is the minimum number of individuals found in one plot?
Solution
# Minimum community size
min_ind <- min(rowSums(BCI))
min_ind
## [1] 340


We can now compare our manual rarefaction curve on plot 35 with the one provided by vegan::rarecurve().

library(vegan)

sum(BCI[35, ][BCI[35, ] > 0]) # 601 individuals
## [1] 601
length(BCI[35, ][BCI[35, ] > 0]) # 83 species
## [1] 83
# Using a binary site/species matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
sum(BCI_bin[35, ])
## [1] 83
plot(rar$sample_size, rar$rich, type = "l", pch = 16, lwd = 0.5,
     xlab = "", ylab = "")
par(new = TRUE)
rarecurve(BCI[35, , drop = FALSE], # drop = FALSE to keep the matrix format
          sample = min_ind,
          main = "Plot 35", col = "dodgerblue", cex = 1, lwd = 3)
legend(x = 400, y = 40, title = "Rarefaction", legend = c("manual", "vegan"),
       col = c("black", "dodgerblue"), lwd = 2, bty = "n")

Both curves coincide, although the one provided by vegan is smoother. The vertical line correspond to the minimum number of individuals in one plot. It crosses with the horizontal line at the corresponding species richness, which is therefore the rarefied richness.

We could also plot these rarefaction curves for all the BCI plots but it would not be readable, due to the large amount of plots. To get an idea of the variation between rarefaction curves, we can compare two extreme plots.

Which plots have the minimum and maximum number of individuals?

Solution
which.min(rowSums(BCI))
## Plot23 
##     23
which.max(rowSums(BCI))
## Plot35 
##     35


How many individuals do they have?

Solution
sum(BCI[which.min(rowSums(BCI)), ])
## [1] 340
sum(BCI[which.max(rowSums(BCI)), ])
## [1] 601


Let’s draw the rarefaction curves for both these plots.

rarecurve(BCI[c(23, 35), ], sample = min_ind,
          main = "Plots 23 and 35", col = "blue", cex = 0.6)

We can observe that the rarefied richness of the plot 35 is around 60 species, which is actually far less than the amount of species plot 23 has for the same number of individuals.

1.2. Rarefied versus observed richness

Now that we visualized the rarefied richness with the rarefaction curves, let’s extract it. The function rarefy() returns the rarefied richness per community. We can plot the rarefied species richness against the observed one.

S <- specnumber(BCI) # observed species richness
Srare <- rarefy(BCI, min_ind) # rarefied richness

plot(S, Srare, main = "Observed vs Rarefied Richness",
     xlab = "Observed No. of Species",
     ylab = "Rarefied No. of Species", pch = 16)
abline(0, 1) # 1:1 line

All rarefied richness, except the one for the plot having the smallest number of individuals, falls below the 1:1.

2. Species richness estimators

While rarefaction is based on subsampling communities, species accumulation curves and species richness estimators are conversely used to extrapolate richness beyond the limited sampling size. This part shows how to estimate the number of unseen species.

SAC principle.

SAC principle.

So, how do we extrapolate species richness using R?

2.1. Plot level

Based on the observed species richness and the distribution of species abundances, we can use several indices to extrapolate the species richness. One of them is the Chao’s index, which formula is:

\[ Chao = S_o + \frac{f_1(f_1-1)}{2(f_2+1)} \]

with \(S_o\) the observed species richness in a given plot, \(f_1\) the number of species with one single individual and \(f_2\) the number of species with 2 individuals.

Let’s first calculate manually this index on the plot 35 and then use dedicated vegan functions.

2.1.1. Chao manually

# Observed species richness in plot 35
s0 <- length(plot35)

# Number of singletons
plot35[plot35 == 1]
##              Apeiba.tibourbou          Astronium.graveolens 
##                             1                             1 
##         Beilschmiedia.pendula           Brosimum.alicastrum 
##                             1                             1 
##       Calophyllum.longifolium     Cavanillesia.platanifolia 
##                             1                             1 
##               Ceiba.pentandra       Chrysophyllum.argenteum 
##                             1                             1 
##      Coccoloba.manzanillensis         Coussarea.curvigemmia 
##                             1                             1 
##          Croton.billbergianus             Cupania.sylvatica 
##                             1                             1 
##            Drypetes.standleyi     Erythroxylum.macrophyllum 
##                             1                             1 
##          Eugenia.coloradensis              Ficus.yoponensis 
##                             1                             1 
##           Guapira.standleyana             Hirtella.triandra 
##                             1                             1 
##                Inga.marginata                 Inga.punctata 
##                             1                             1 
##              Inga.umbellifera          Lacistema.aggregatum 
##                             1                             1 
##          Lacmellea.panamensis             Licania.hypoleuca 
##                             1                             1 
##           Maquira.costaricana                 Ocotea.cernua 
##                             1                             1 
##                Ocotea.oblonga              Ormosia.coccinea 
##                             1                             1 
##         Phoebe.cinnamomifolia           Picramnia.latifolia 
##                             1                             1 
##         Platymiscium.pinnatum           Platypodium.elegans 
##                             1                             1 
##          Protium.costaricense           Protium.tenuifolium 
##                             1                             1 
##            Sapium.glandulosum               Solanum.hayesii 
##                             1                             1 
##           Spondias.radlkoferi Swartzia.simplex.var.ochnacea 
##                             1                             1 
##             Tabebuia.guayacan          Tachigali.versicolor 
##                             1                             1 
##       Tetragastris.panamensis             Trichilia.pallida 
##                             1                             1 
##          Triplaris.cumingiana         Zanthoxylum.panamense 
##                             1                             1
f1 <- length(plot35[plot35 == 1])

# Number of doublon species
f2 <- length(plot35[plot35 == 2])

# Chao value
chao35 <- s0 + (f1*(f1 - 1))/(2*(f2 + 1))
chao35
## [1] 169

2.1.2. With vegan package

estimateR() from vegan allows us to compute several species richness estimators, including Chao’s index.

sr_estim <- estimateR(BCI)
sr_estim[, c(23, 35)] # two plots 23 and 35
##              Plot23     Plot35
## S.obs     99.000000  83.000000
## S.chao1  140.130435 169.000000
## se.chao1  16.395758  37.212366
## S.ACE    155.213269 149.766025
## se.ACE     7.091696   6.510576
identical(chao35, sr_estim[, 35][["S.chao1"]])
## [1] TRUE

With these results, we can plot the extrapolated species richness versus the observed one.

# Observed species richness
S <- specnumber(BCI)

plot(S, sr_estim["S.chao1", ], main = "Observed vs Extrapolated Richness",
     xlab = "Observed No. of Species",
     ylab = "Extrapolated No. of Species", pch = 16, col = "dodgerblue")
abline(0, 1) # 1:1 line
legend(x = 105, y = 130, title = "Index", legend = c("Chao1"),
       col = c("dodgerblue"), pch = c(16), bty = "n")

2.2. Pool level

specpool() returns several estimators of species richness based on species accumulation curves at the pool level. poolaccum() returns species richness estimators for different sample size (in terms of number of sampled communities) for the whole pool of sample sites.

specpool(BCI)
##     Species     chao chao.se  jack1 jack1.se    jack2     boot  boot.se  n
## All     225 236.3732 6.54361 245.58 5.650522 247.8722 235.6862 3.468888 50
plot(poolaccum(BCI),
     xlab = "Number of 1ha plots",
     ylab = "Species richness")

2.3. Sample coverage

What index to chose based on the sample coverage, i.e. how many species does each plot cover??

# Total sp number
pool <- ncol(BCI)

# Richness per plots
rowSums(BCI_bin)
##  Plot1  Plot2  Plot3  Plot4  Plot5  Plot6  Plot7  Plot8  Plot9 Plot10 Plot11 
##     93     84     90     94    101     85     82     88     90     94     87 
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 
##     84     93     98     93     93     93     89    109    100     99     91 
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33 
##     99     95    105     91     99     85     86     97     77     88     86 
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44 
##     92     83     92     88     82     84     80    102     87     86     81 
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50 
##     81     86    102     91     91     93
# Sample coverage per plot
100*rowSums(BCI_bin)/pool
##    Plot1    Plot2    Plot3    Plot4    Plot5    Plot6    Plot7    Plot8 
## 41.33333 37.33333 40.00000 41.77778 44.88889 37.77778 36.44444 39.11111 
##    Plot9   Plot10   Plot11   Plot12   Plot13   Plot14   Plot15   Plot16 
## 40.00000 41.77778 38.66667 37.33333 41.33333 43.55556 41.33333 41.33333 
##   Plot17   Plot18   Plot19   Plot20   Plot21   Plot22   Plot23   Plot24 
## 41.33333 39.55556 48.44444 44.44444 44.00000 40.44444 44.00000 42.22222 
##   Plot25   Plot26   Plot27   Plot28   Plot29   Plot30   Plot31   Plot32 
## 46.66667 40.44444 44.00000 37.77778 38.22222 43.11111 34.22222 39.11111 
##   Plot33   Plot34   Plot35   Plot36   Plot37   Plot38   Plot39   Plot40 
## 38.22222 40.88889 36.88889 40.88889 39.11111 36.44444 37.33333 35.55556 
##   Plot41   Plot42   Plot43   Plot44   Plot45   Plot46   Plot47   Plot48 
## 45.33333 38.66667 38.22222 36.00000 36.00000 38.22222 45.33333 40.44444 
##   Plot49   Plot50 
## 40.44444 41.33333
# Average sample coverage
mean(100*rowSums(BCI_bin)/pool)
## [1] 40.34667

Jacknife order 3 would be recommended (see lecture).

3. References

vegan R package

Back to Index

Index