Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson3.html",
destfile = file.path(dir, "lesson3.html"))
htmlFile <- file.path(dir, "lesson3.html")
rstudioapi::viewer(htmlFile)How to compare these two plots?
We apply these techniques to the Barro Colorado Island dataset. Refer to the practical 2 for more details regarding this dataset.
Import data
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
BCI <- t(BCI) # transposing the matrix to have sites in rows
BCI[4:6, 5:7]## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
Rarefaction principle.
This method relies on the construction of rarefaction curves that opposes the species richness in a plot to the sample size, here the number of individuals. From these rarefaction curves, we can then extract a rarefied species richness, comparable across plots with different sampling effort.
First, we manually construct a rarefaction curve for one plot and then use integrated functions from vegan package.
Let’s consider the plot 35. What species does this plot have?
plot35 <- BCI[35, ][BCI[35, ] > 0]
head(plot35)## Adelia.triloba Alchornea.costaricensis Alseis.blackiana
## 6 8 93
## Apeiba.aspera Apeiba.tibourbou Astrocaryum.standleyanum
## 5 1 3
For the rarefaction, we will randomly sample species from the list of individuals in the plot 35. We can therefore repeat each species name according its number of individuals.
sp35 <- rep(names(plot35), plot35)
sp35[1:10]## [1] "Adelia.triloba" "Adelia.triloba"
## [3] "Adelia.triloba" "Adelia.triloba"
## [5] "Adelia.triloba" "Adelia.triloba"
## [7] "Alchornea.costaricensis" "Alchornea.costaricensis"
## [9] "Alchornea.costaricensis" "Alchornea.costaricensis"
length(sp35)## [1] 601
Now that we have the pool of individuals for the plot 35, we will draw random samples of various sizes from this pool and calculate the species richness at each step.
For example, if we sample randomly 10 individuals from the pool, how many species do we get?
set.seed(1)
ex_sample <- sample(x = sp35, size = 10)
ex_sample## [1] "Cecropia.insignis" "Platypodium.elegans" "Inga.sapindoides"
## [4] "Gustavia.superba" "Gustavia.superba" "Faramea.occidentalis"
## [7] "Gustavia.superba" "Gustavia.superba" "Macrocnemum.roseum"
## [10] "Gustavia.superba"
unique(ex_sample)## [1] "Cecropia.insignis" "Platypodium.elegans" "Inga.sapindoides"
## [4] "Gustavia.superba" "Faramea.occidentalis" "Macrocnemum.roseum"
length(unique(ex_sample))## [1] 6
To repeat this step over many sample sizes, we can use a for-loop. The size of the random samples vary from one individual to the actual number of individuals of the community.
# Empty data.frame that will contain the results
rar <- data.frame()
# for-loop varying across the different sample sizes (i is the sample size)
for(i in 1:length(sp35)){
tmp <- sample(x = sp35, size = i)
# binding results
rar <- rbind(rar,
data.frame(sample_size = i,
rich = length(unique(tmp))))
}We can now plot the rarefaction curve. The rarefied richness would fall where the minimum number of individuals is reached.
plot(rar$sample_size, rar$rich,
type = "l", pch = 16,
main = "Rarefaction curve for plot 35",
xlab = "Sample size (nb of individuals)",
ylab = "Species richness")
abline(v = min(rowSums(BCI)), col = "firebrick3")
legend(x = 400, y = 40, legend = "Minimum nb\nof individuals",
col = "firebrick3", lwd = 1, bty = "n")The code above is quite tedious, and it could get even longer if we wanted to draw the rarefaction curves for every plot. Fortunately, there are ready-to-use functions implemented in the vegan package. The one to draw rarefaction curve is called rarecurve().
Before using it, we first need to define a rarefied number, which is usually the minimum number of individuals found in a community.
What is the minimum number of individuals found in one plot?# Minimum community size
min_ind <- min(rowSums(BCI))
min_ind## [1] 340
We can now compare our manual rarefaction curve on plot 35 with the one provided by vegan::rarecurve().
library(vegan)
sum(BCI[35, ][BCI[35, ] > 0]) # 601 individuals## [1] 601
length(BCI[35, ][BCI[35, ] > 0]) # 83 species## [1] 83
# Using a binary site/species matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
sum(BCI_bin[35, ])## [1] 83
plot(rar$sample_size, rar$rich, type = "l", pch = 16, lwd = 0.5,
xlab = "", ylab = "")
par(new = TRUE)
rarecurve(BCI[35, , drop = FALSE], # drop = FALSE to keep the matrix format
sample = min_ind,
main = "Plot 35", col = "dodgerblue", cex = 1, lwd = 3)
legend(x = 400, y = 40, title = "Rarefaction", legend = c("manual", "vegan"),
col = c("black", "dodgerblue"), lwd = 2, bty = "n")Both curves coincide, although the one provided by vegan is smoother. The vertical line correspond to the minimum number of individuals in one plot. It crosses with the horizontal line at the corresponding species richness, which is therefore the rarefied richness.
We could also plot these rarefaction curves for all the BCI plots but it would not be readable, due to the large amount of plots. To get an idea of the variation between rarefaction curves, we can compare two extreme plots.
Which plots have the minimum and maximum number of individuals?
which.min(rowSums(BCI))## Plot23
## 23
which.max(rowSums(BCI))## Plot35
## 35
How many individuals do they have?
sum(BCI[which.min(rowSums(BCI)), ])## [1] 340
sum(BCI[which.max(rowSums(BCI)), ])## [1] 601
Let’s draw the rarefaction curves for both these plots.
rarecurve(BCI[c(23, 35), ], sample = min_ind,
main = "Plots 23 and 35", col = "blue", cex = 0.6)We can observe that the rarefied richness of the plot 35 is around 60 species, which is actually far less than the amount of species plot 23 has for the same number of individuals.
Now that we visualized the rarefied richness with the rarefaction curves, let’s extract it. The function rarefy() returns the rarefied richness per community. We can plot the rarefied species richness against the observed one.
S <- specnumber(BCI) # observed species richness
Srare <- rarefy(BCI, min_ind) # rarefied richness
plot(S, Srare, main = "Observed vs Rarefied Richness",
xlab = "Observed No. of Species",
ylab = "Rarefied No. of Species", pch = 16)
abline(0, 1) # 1:1 lineAll rarefied richness, except the one for the plot having the smallest number of individuals, falls below the 1:1.
While rarefaction is based on subsampling communities, species accumulation curves and species richness estimators are conversely used to extrapolate richness beyond the limited sampling size. This part shows how to estimate the number of unseen species.
SAC principle.
So, how do we extrapolate species richness using R?
Based on the observed species richness and the distribution of species abundances, we can use several indices to extrapolate the species richness. One of them is the Chao’s index, which formula is:
\[ Chao = S_o + \frac{f_1(f_1-1)}{2(f_2+1)} \]
with \(S_o\) the observed species richness in a given plot, \(f_1\) the number of species with one single individual and \(f_2\) the number of species with 2 individuals.
Let’s first calculate manually this index on the plot 35 and then use dedicated vegan functions.
# Observed species richness in plot 35
s0 <- length(plot35)
# Number of singletons
plot35[plot35 == 1]## Apeiba.tibourbou Astronium.graveolens
## 1 1
## Beilschmiedia.pendula Brosimum.alicastrum
## 1 1
## Calophyllum.longifolium Cavanillesia.platanifolia
## 1 1
## Ceiba.pentandra Chrysophyllum.argenteum
## 1 1
## Coccoloba.manzanillensis Coussarea.curvigemmia
## 1 1
## Croton.billbergianus Cupania.sylvatica
## 1 1
## Drypetes.standleyi Erythroxylum.macrophyllum
## 1 1
## Eugenia.coloradensis Ficus.yoponensis
## 1 1
## Guapira.standleyana Hirtella.triandra
## 1 1
## Inga.marginata Inga.punctata
## 1 1
## Inga.umbellifera Lacistema.aggregatum
## 1 1
## Lacmellea.panamensis Licania.hypoleuca
## 1 1
## Maquira.costaricana Ocotea.cernua
## 1 1
## Ocotea.oblonga Ormosia.coccinea
## 1 1
## Phoebe.cinnamomifolia Picramnia.latifolia
## 1 1
## Platymiscium.pinnatum Platypodium.elegans
## 1 1
## Protium.costaricense Protium.tenuifolium
## 1 1
## Sapium.glandulosum Solanum.hayesii
## 1 1
## Spondias.radlkoferi Swartzia.simplex.var.ochnacea
## 1 1
## Tabebuia.guayacan Tachigali.versicolor
## 1 1
## Tetragastris.panamensis Trichilia.pallida
## 1 1
## Triplaris.cumingiana Zanthoxylum.panamense
## 1 1
f1 <- length(plot35[plot35 == 1])
# Number of doublon species
f2 <- length(plot35[plot35 == 2])
# Chao value
chao35 <- s0 + (f1*(f1 - 1))/(2*(f2 + 1))
chao35## [1] 169
estimateR() from vegan allows us to compute several species richness estimators, including Chao’s index.
sr_estim <- estimateR(BCI)
sr_estim[, c(23, 35)] # two plots 23 and 35## Plot23 Plot35
## S.obs 99.000000 83.000000
## S.chao1 140.130435 169.000000
## se.chao1 16.395758 37.212366
## S.ACE 155.213269 149.766025
## se.ACE 7.091696 6.510576
identical(chao35, sr_estim[, 35][["S.chao1"]])## [1] TRUE
With these results, we can plot the extrapolated species richness versus the observed one.
# Observed species richness
S <- specnumber(BCI)
plot(S, sr_estim["S.chao1", ], main = "Observed vs Extrapolated Richness",
xlab = "Observed No. of Species",
ylab = "Extrapolated No. of Species", pch = 16, col = "dodgerblue")
abline(0, 1) # 1:1 line
legend(x = 105, y = 130, title = "Index", legend = c("Chao1"),
col = c("dodgerblue"), pch = c(16), bty = "n")specpool() returns several estimators of species richness based on species accumulation curves at the pool level. poolaccum() returns species richness estimators for different sample size (in terms of number of sampled communities) for the whole pool of sample sites.
specpool(BCI)## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 225 236.3732 6.54361 245.58 5.650522 247.8722 235.6862 3.468888 50
plot(poolaccum(BCI),
xlab = "Number of 1ha plots",
ylab = "Species richness")What index to chose based on the sample coverage, i.e. how many species does each plot cover??
# Total sp number
pool <- ncol(BCI)
# Richness per plots
rowSums(BCI_bin)## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 93 84 90 94 101 85 82 88 90 94 87
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 84 93 98 93 93 93 89 109 100 99 91
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 99 95 105 91 99 85 86 97 77 88 86
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 92 83 92 88 82 84 80 102 87 86 81
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 81 86 102 91 91 93
# Sample coverage per plot
100*rowSums(BCI_bin)/pool## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8
## 41.33333 37.33333 40.00000 41.77778 44.88889 37.77778 36.44444 39.11111
## Plot9 Plot10 Plot11 Plot12 Plot13 Plot14 Plot15 Plot16
## 40.00000 41.77778 38.66667 37.33333 41.33333 43.55556 41.33333 41.33333
## Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 Plot23 Plot24
## 41.33333 39.55556 48.44444 44.44444 44.00000 40.44444 44.00000 42.22222
## Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32
## 46.66667 40.44444 44.00000 37.77778 38.22222 43.11111 34.22222 39.11111
## Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40
## 38.22222 40.88889 36.88889 40.88889 39.11111 36.44444 37.33333 35.55556
## Plot41 Plot42 Plot43 Plot44 Plot45 Plot46 Plot47 Plot48
## 45.33333 38.66667 38.22222 36.00000 36.00000 38.22222 45.33333 40.44444
## Plot49 Plot50
## 40.44444 41.33333
# Average sample coverage
mean(100*rowSums(BCI_bin)/pool)## [1] 40.34667
Jacknife order 3 would be recommended (see lecture).