Back to index

Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:

install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
              destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson3.html",
              destfile = file.path(dir, "lesson3.html"))
htmlFile <- file.path(dir, "lesson3.html")
rstudioapi::viewer(htmlFile)
In this class, we will go through the two main strategies to compare species richness while taking into account differences in sampling effort: rarefaction and species accumulation curves.
How to compare these two plots?

How to compare these two plots?


We will apply these techniques to the Barro Colorado Island (BCI) dataset. Refer to the practical 2 for more details regarding this dataset. (If you still have not downloaded the data, please download the file BCI_Data.csv on Stud-IP and copy it into the data directory in your project folder.)

Import data

BCI <- read.table("data/BCI_Data.csv",
                  header = TRUE, # does the file have column names
                  sep = ",", # what separates the column
                  row.names = 1) # does the file have row names
# Remember from Lesson 2 that this data set has species as rows and plots as columns. For 
# convenience, we want the opposite, so we have to transpose our matrix: 

BCI <- t(BCI) # transposing the matrix to have sites (plots) in rows and species as columns

BCI[4:6, 5:7] # plotting plots 4 to 6, and species in columns 5 to 7
##       Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4              3                    0                      18
## Plot5              1                    1                       3
## Plot6              0                    0                       2

1. Rarefaction

Rarefaction estimates the species richness across sites based on a pre-defined minimum number of individuals.
Rarefaction principle.

Rarefaction principle.

This method relies on the construction of rarefaction curves that opposes the species richness in a plot to the sample size, here the number of individuals. From these rarefaction curves, we can then extract a rarefied species richness, comparable across plots with different sampling effort.

1.1. Rarefaction curves

First, we manually construct a rarefaction curve for one plot and then use integrated functions from vegan package.

1.1.1. Based on a for-loop

Let’s consider the plot 35. What species does this plot have (i.e. what species are actually present in this plot)?

Solution
plot35 <- BCI[35, ][BCI[35, ] > 0]
head(plot35)
##           Adelia.triloba  Alchornea.costaricensis         Alseis.blackiana 
##                        6                        8                       93 
##            Apeiba.aspera         Apeiba.tibourbou Astrocaryum.standleyanum 
##                        5                        1                        3


What is the Species Richness of this plot?

Solution
length(BCI[35, ][BCI[35, ] > 0])
## [1] 83


How many individuals of these species we have in total?

Solution
sum(plot35)
## [1] 601


For the rarefaction, we will randomly sample species from the list of individuals in the plot 35. To do this, we first repeat each species name according its number of individuals.

sp35 <- rep(names(plot35), plot35) # Generate a new character vector in which each species name
# is repeated based on its number of individuals
sp35[1:10] 
##  [1] "Adelia.triloba"          "Adelia.triloba"         
##  [3] "Adelia.triloba"          "Adelia.triloba"         
##  [5] "Adelia.triloba"          "Adelia.triloba"         
##  [7] "Alchornea.costaricensis" "Alchornea.costaricensis"
##  [9] "Alchornea.costaricensis" "Alchornea.costaricensis"
length(sp35) # the length of the vector is equal to the total number of individuals in plot 35
## [1] 601

Now that we have the pool of individuals for the plot 35, we will draw random samples of various sizes from this pool and calculate the species richness at each step.

For example, if we sample randomly 10 individuals from the pool, how many species do we get?

set.seed(1) #this function permits to specify a Random Generator Number (RNG) seed. It's useful when 
# performing "random" sampling to obtain same (i.e. comparable) results if we perform the analysis again. 
ex_sample <- sample(x = sp35, size = 10) # we randomly sample 10 individuals from vector sp35
ex_sample
##  [1] "Cecropia.insignis"    "Platypodium.elegans"  "Inga.sapindoides"    
##  [4] "Gustavia.superba"     "Gustavia.superba"     "Faramea.occidentalis"
##  [7] "Gustavia.superba"     "Gustavia.superba"     "Macrocnemum.roseum"  
## [10] "Gustavia.superba"
unique(ex_sample) # unique() functions returns only the "unique" values (i.e. filters duplicate values),
## [1] "Cecropia.insignis"    "Platypodium.elegans"  "Inga.sapindoides"    
## [4] "Gustavia.superba"     "Faramea.occidentalis" "Macrocnemum.roseum"
# so in this case the species that are present in our random subsample
length(unique(ex_sample)) # Which actually tells us the SPECIES RICHNESS of our subsample
## [1] 6

To repeat this step over many sample sizes, we can use a for-loop. The size of the random samples vary from one individual to the actual number of individuals of the community.

# Empty data.frame that will contain the results
rar <- data.frame()

# for-loop varying across the different sample sizes (i is the sample size)
for(i in 1:length(sp35)){
  tmp <- sample(x = sp35, size = i)
  # binding results
  rar <- rbind(rar,  # rbind () combines objects by rows 
               data.frame(sample_size = i,
                          rich = length(unique(tmp))))
}

We can now plot the rarefaction curve. The rarefied richness would fall where the minimum number of individuals (considering all plots) is reached.

plot(rar$sample_size, rar$rich,
     type = "l", pch = 16,
     main = "Rarefaction curve for plot 35",
     xlab = "Sample size (nb of individuals)",
     ylab = "Species richness")
abline(v = min(rowSums(BCI)), col = "firebrick3") # plot a vertical line corresponding to the minimum number of individuals found in a plot   
legend(x = 400, y = 40, legend = "Minimum nb\nof individuals in all plots",
       col = "firebrick3", lwd = 1, bty = "n")

1.1.2. Integrated vegan functions

The code above is quite tedious, and it could get even longer if we wanted to draw the rarefaction curves for every plot. Fortunately, there are ready-to-use functions implemented in the vegan package. The one to draw rarefaction curve is called rarecurve().

Before using it, we first need to define a rarefied number, which is usually the minimum number of individuals found in a community.

What is the minimum number of individuals found in one plot?
Solution
# Minimum community size
min_ind <- min(rowSums(BCI))
min_ind
## [1] 340


To what plot it correspond?
Solution
sort(rowSums(BCI), decreasing = F) # Plot 23
## Plot23 Plot18 Plot29 Plot12 Plot17 Plot28 Plot24 Plot11 Plot41 Plot26 Plot43 
##    340    347    364    366    381    387    392    401    402    407    407 
## Plot21  Plot9 Plot13 Plot44  Plot6 Plot42 Plot48  Plot7 Plot27 Plot22 Plot31 
##    408    409    409    409    412    414    415    416    417    418    421 
## Plot39 Plot47 Plot49 Plot20 Plot36 Plot46  Plot8 Plot50 Plot19  Plot2 Plot37 
##    424    425    427    429    430    430    431    432    433    435    435 
## Plot33 Plot16 Plot14 Plot25 Plot45 Plot34 Plot38  Plot1 Plot32 Plot15  Plot3 
##    436    437    438    442    444    447    447    448    459    462    463 
## Plot30 Plot10 Plot40  Plot5  Plot4 Plot35 
##    475    483    489    505    508    601


We can now compare our manual rarefaction curve on plot 35 with the one provided by vegan::rarecurve().

library(vegan)

sum(BCI[35, ][BCI[35, ] > 0]) # 601 individuals
## [1] 601
length(BCI[35, ][BCI[35, ] > 0]) # 83 species
## [1] 83
# Our manual rarefaction curve for plot 35
plot(rar$sample_size, rar$rich, type = "l", pch = 16, lwd = 0.5,
     xlab = "", ylab = "")  

par(new = T) # parameter to specify that the next plot will be plotted over our previous one

rarecurve(BCI[35, , drop = FALSE], # drop = FALSE to keep the matrix format
          sample = min_ind, # specify the minimum n of individuals
          main = "Plot 35", col = "dodgerblue", cex = 1, lwd = 3)

legend(x = 400, y = 40, title = "Rarefaction", legend = c("manual", "vegan"),
       col = c("black", "dodgerblue"), lwd = 2, bty = "n")

Both curves coincide, although the one provided by vegan is smoother. The vertical line correspond to the minimum number of individuals in one plot. It crosses with the horizontal line at the corresponding species richness, which is therefore the rarefied richness.

We could also plot these rarefaction curves for all the BCI plots but it would not be readable, due to the large amount of plots. To get an idea of the variation between rarefaction curves, we can compare two extreme plots.

Which plots have the minimum and maximum number of individuals?

Solution
which.min(rowSums(BCI))
## Plot23 
##     23
which.max(rowSums(BCI))
## Plot35 
##     35


How many individuals do they have?

Solution
sum(BCI[which.min(rowSums(BCI)), ])
## [1] 340
sum(BCI[which.max(rowSums(BCI)), ])
## [1] 601


Let’s draw the rarefaction curves for both these plots.

rarecurve(BCI[c(23, 35), ], 
          sample = min_ind,
          main = "Plots 23 and 35", col = "blue", cex = 0.6)

We can observe that the rarefied richness of the plot 35 is around 60 species, which is actually far less than the amount of species plot 23 has for the same number of individuals.

1.2. Rarefied versus observed richness

Now that we visualized the rarefied richness with the rarefaction curves, let’s extract it. The function rarefy() returns the rarefied richness per community. We can plot the rarefied species richness against the observed one.

S <- specnumber(BCI) # observed species richness
head(S)
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 
##    93    84    90    94   101    85
Srare <- rarefy(BCI, min_ind) # rarefied richness
head(Srare)
##    Plot1    Plot2    Plot3    Plot4    Plot5    Plot6 
## 84.33992 76.53165 79.11504 82.46571 86.90901 78.50953
plot(S, Srare, main = "Observed vs Rarefied Richness",
     xlab = "Observed No. of Species",
     ylab = "Rarefied No. of Species", pch = 16)
abline(0, 1) # plotting a 1:1 line

All rarefied richness, except the one for the plot having the smallest number of individuals, falls below the 1:1.

2. Species richness estimators

While rarefaction is based on subsampling communities, species accumulation curves and species richness estimators are conversely used to extrapolate richness beyond the limited sampling size. This part shows how to estimate the number of unseen species.

SAC principle.

SAC principle.

So, how do we extrapolate species richness using R?

2.1. Plot level

Based on the observed species richness and the distribution of species abundances, we can use several indices to extrapolate the species richness. One of them is the Chao’s index, which formula is:

\[ Chao = S_o + \frac{f_1(f_1-1)}{2(f_2+1)} \]

with \(S_o\) the observed species richness in a given plot, \(f_1\) the number of species with only one single individual (singletons) and \(f_2\) the number of species with only 2 individuals (doubletons). This index adds an estimate of unobserved species (calculated from the singleton/doubleton ratio) to the count of observed species, this way providing a more “realistic” richness value.

Let’s first calculate manually this index on the plot 35 and then use dedicated vegan functions.

2.1.1. Chao manually

# Observed species richness in plot 35
s0 <- length(plot35)

# Number of singletons
plot35[plot35 == 1]
##              Apeiba.tibourbou          Astronium.graveolens 
##                             1                             1 
##         Beilschmiedia.pendula           Brosimum.alicastrum 
##                             1                             1 
##       Calophyllum.longifolium     Cavanillesia.platanifolia 
##                             1                             1 
##               Ceiba.pentandra       Chrysophyllum.argenteum 
##                             1                             1 
##      Coccoloba.manzanillensis         Coussarea.curvigemmia 
##                             1                             1 
##          Croton.billbergianus             Cupania.sylvatica 
##                             1                             1 
##            Drypetes.standleyi     Erythroxylum.macrophyllum 
##                             1                             1 
##          Eugenia.coloradensis              Ficus.yoponensis 
##                             1                             1 
##           Guapira.standleyana             Hirtella.triandra 
##                             1                             1 
##                Inga.marginata                 Inga.punctata 
##                             1                             1 
##              Inga.umbellifera          Lacistema.aggregatum 
##                             1                             1 
##          Lacmellea.panamensis             Licania.hypoleuca 
##                             1                             1 
##           Maquira.costaricana                 Ocotea.cernua 
##                             1                             1 
##                Ocotea.oblonga              Ormosia.coccinea 
##                             1                             1 
##         Phoebe.cinnamomifolia           Picramnia.latifolia 
##                             1                             1 
##         Platymiscium.pinnatum           Platypodium.elegans 
##                             1                             1 
##          Protium.costaricense           Protium.tenuifolium 
##                             1                             1 
##            Sapium.glandulosum               Solanum.hayesii 
##                             1                             1 
##           Spondias.radlkoferi Swartzia.simplex.var.ochnacea 
##                             1                             1 
##             Tabebuia.guayacan          Tachigali.versicolor 
##                             1                             1 
##       Tetragastris.panamensis             Trichilia.pallida 
##                             1                             1 
##          Triplaris.cumingiana         Zanthoxylum.panamense 
##                             1                             1
f1 <- length(plot35[plot35 == 1])

# Number of doubletons species
f2 <- length(plot35[plot35 == 2])

# Chao value
chao35 <- s0 + (f1*(f1 - 1))/(2*(f2 + 1))
chao35
## [1] 169

2.1.2. With vegan package

estimateR() from vegan allows us to compute several species richness estimators, including Chao’s index.

sr_estim <- estimateR(BCI) # Computes indices for all plots in our data set
sr_estim[, c(23, 35)] # two plots 23 and 35
##              Plot23     Plot35
## S.obs     99.000000  83.000000
## S.chao1  140.130435 169.000000
## se.chao1  16.395758  37.212366
## S.ACE    155.213269 149.766025
## se.ACE     7.091696   6.510576
identical(chao35, sr_estim[, 35][["S.chao1"]]) # Is the result for estimatedR() equal to the one we obtained manually? 
## [1] TRUE

With these results, we can plot the extrapolated species richness versus the observed one.

# Observed species richness
S <- specnumber(BCI)

plot(S, sr_estim["S.chao1", ], main = "Observed vs Extrapolated Richness",
     xlab = "Observed No. of Species",
     ylab = "Extrapolated No. of Species", pch = 16, col = "dodgerblue")
abline(0, 1) # 1:1 line
legend(x = 105, y = 130, title = "Index", legend = c("Chao1"),
       col = c("dodgerblue"), pch = c(16), bty = "n")

2.2. Pool level

specpool() returns several estimators of species richness based on species accumulation curves at the pool level (i.e. considering all the sites!). poolaccum() returns species richness estimators for different sample size (in terms of number of sampled communities) for the whole pool of sample sites.

specpool(BCI)
##     Species     chao chao.se  jack1 jack1.se    jack2     boot  boot.se  n
## All     225 236.3732 6.54361 245.58 5.650522 247.8722 235.6862 3.468888 50
plot(poolaccum(BCI),
     xlab = "Number of 1ha plots",
     ylab = "Species richness")

2.3. Sample coverage

What index to chose based on the sample coverage (see Lecture 3 Slide 40), i.e. how many species does each plot cover??

# Total sp number
pool <- ncol(BCI)

# Using a binary site/species matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
sum(BCI_bin[35, ])
## [1] 83
# Richness per plots
rowSums(BCI_bin)
##  Plot1  Plot2  Plot3  Plot4  Plot5  Plot6  Plot7  Plot8  Plot9 Plot10 Plot11 
##     93     84     90     94    101     85     82     88     90     94     87 
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 
##     84     93     98     93     93     93     89    109    100     99     91 
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33 
##     99     95    105     91     99     85     86     97     77     88     86 
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44 
##     92     83     92     88     82     84     80    102     87     86     81 
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50 
##     81     86    102     91     91     93
# Sample coverage per plot
100*rowSums(BCI_bin)/pool
##    Plot1    Plot2    Plot3    Plot4    Plot5    Plot6    Plot7    Plot8 
## 41.33333 37.33333 40.00000 41.77778 44.88889 37.77778 36.44444 39.11111 
##    Plot9   Plot10   Plot11   Plot12   Plot13   Plot14   Plot15   Plot16 
## 40.00000 41.77778 38.66667 37.33333 41.33333 43.55556 41.33333 41.33333 
##   Plot17   Plot18   Plot19   Plot20   Plot21   Plot22   Plot23   Plot24 
## 41.33333 39.55556 48.44444 44.44444 44.00000 40.44444 44.00000 42.22222 
##   Plot25   Plot26   Plot27   Plot28   Plot29   Plot30   Plot31   Plot32 
## 46.66667 40.44444 44.00000 37.77778 38.22222 43.11111 34.22222 39.11111 
##   Plot33   Plot34   Plot35   Plot36   Plot37   Plot38   Plot39   Plot40 
## 38.22222 40.88889 36.88889 40.88889 39.11111 36.44444 37.33333 35.55556 
##   Plot41   Plot42   Plot43   Plot44   Plot45   Plot46   Plot47   Plot48 
## 45.33333 38.66667 38.22222 36.00000 36.00000 38.22222 45.33333 40.44444 
##   Plot49   Plot50 
## 40.44444 41.33333
# Average sample coverage
mean(100*rowSums(BCI_bin)/pool)
## [1] 40.34667

Jacknife order 3 would be recommended (see lecture).

3. References

vegan R package

Back to Index

Index