Before starting
If you would like to see this tutorial in your viewer within RStudio,
please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson3.html",
destfile = file.path(dir, "lesson3.html"))
htmlFile <- file.path(dir, "lesson3.html")
rstudioapi::viewer(htmlFile)How to compare these two plots?
We will apply these techniques to the Barro Colorado Island
(BCI) dataset. Refer to the practical 2 for
more details regarding this dataset. (If you still have not downloaded
the data, please download the file BCI_Data.csv
on Stud-IP and copy it into the data directory in your project
folder.)
Import data
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
# Remember from Lesson 2 that this data set has species as rows and plots as columns. For
# convenience, we want the opposite, so we have to transpose our matrix:
BCI <- t(BCI) # transposing the matrix to have sites (plots) in rows and species as columns
BCI[4:6, 5:7] # plotting plots 4 to 6, and species in columns 5 to 7## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
Rarefaction principle.
This method relies on the construction of rarefaction curves that opposes the species richness in a plot to the sample size, here the number of individuals. From these rarefaction curves, we can then extract a rarefied species richness, comparable across plots with different sampling effort.
First, we manually construct a rarefaction curve for one plot and
then use integrated functions from vegan package.
Let’s consider the plot 35. What species does this plot have (i.e. what species are actually present in this plot)?
## Adelia.triloba Alchornea.costaricensis Alseis.blackiana
## 6 8 93
## Apeiba.aspera Apeiba.tibourbou Astrocaryum.standleyanum
## 5 1 3
What is the Species Richness of this plot?
How many individuals of these species we have in total?
For the rarefaction, we will randomly sample species from the list of individuals in the plot 35. To do this, we first repeat each species name according its number of individuals.
sp35 <- rep(names(plot35), plot35) # Generate a new character vector in which each species name
# is repeated based on its number of individuals
sp35[1:10] ## [1] "Adelia.triloba" "Adelia.triloba"
## [3] "Adelia.triloba" "Adelia.triloba"
## [5] "Adelia.triloba" "Adelia.triloba"
## [7] "Alchornea.costaricensis" "Alchornea.costaricensis"
## [9] "Alchornea.costaricensis" "Alchornea.costaricensis"
## [1] 601
Now that we have the pool of individuals for the plot 35, we will draw random samples of various sizes from this pool and calculate the species richness at each step.
For example, if we sample randomly 10 individuals from the pool, how many species do we get?
set.seed(1) #this function permits to specify a Random Generator Number (RNG) seed. It's useful when
# performing "random" sampling to obtain same (i.e. comparable) results if we perform the analysis again.
ex_sample <- sample(x = sp35, size = 10) # we randomly sample 10 individuals from vector sp35
ex_sample## [1] "Cecropia.insignis" "Platypodium.elegans" "Inga.sapindoides"
## [4] "Gustavia.superba" "Gustavia.superba" "Faramea.occidentalis"
## [7] "Gustavia.superba" "Gustavia.superba" "Macrocnemum.roseum"
## [10] "Gustavia.superba"
unique(ex_sample) # unique() functions returns only the "unique" values (i.e. filters duplicate values),## [1] "Cecropia.insignis" "Platypodium.elegans" "Inga.sapindoides"
## [4] "Gustavia.superba" "Faramea.occidentalis" "Macrocnemum.roseum"
# so in this case the species that are present in our random subsample
length(unique(ex_sample)) # Which actually tells us the SPECIES RICHNESS of our subsample## [1] 6
To repeat this step over many sample sizes, we can use a
for-loop. The size of the random samples vary from one
individual to the actual number of individuals of the community.
# Empty data.frame that will contain the results
rar <- data.frame()
# for-loop varying across the different sample sizes (i is the sample size)
for(i in 1:length(sp35)){
tmp <- sample(x = sp35, size = i)
# binding results
rar <- rbind(rar, # rbind () combines objects by rows
data.frame(sample_size = i,
rich = length(unique(tmp))))
}We can now plot the rarefaction curve. The rarefied richness would fall where the minimum number of individuals (considering all plots) is reached.
plot(rar$sample_size, rar$rich,
type = "l", pch = 16,
main = "Rarefaction curve for plot 35",
xlab = "Sample size (nb of individuals)",
ylab = "Species richness")
abline(v = min(rowSums(BCI)), col = "firebrick3") # plot a vertical line corresponding to the minimum number of individuals found in a plot
legend(x = 400, y = 40, legend = "Minimum nb\nof individuals in all plots",
col = "firebrick3", lwd = 1, bty = "n")The code above is quite tedious, and it could get even longer if we
wanted to draw the rarefaction curves for every plot. Fortunately, there
are ready-to-use functions implemented in the vegan
package. The one to draw rarefaction curve is called
rarecurve().
Before using it, we first need to define a rarefied number, which is usually the minimum number of individuals found in a community.
What is the minimum number of individuals found in one plot?## Plot23 Plot18 Plot29 Plot12 Plot17 Plot28 Plot24 Plot11 Plot41 Plot26 Plot43
## 340 347 364 366 381 387 392 401 402 407 407
## Plot21 Plot9 Plot13 Plot44 Plot6 Plot42 Plot48 Plot7 Plot27 Plot22 Plot31
## 408 409 409 409 412 414 415 416 417 418 421
## Plot39 Plot47 Plot49 Plot20 Plot36 Plot46 Plot8 Plot50 Plot19 Plot2 Plot37
## 424 425 427 429 430 430 431 432 433 435 435
## Plot33 Plot16 Plot14 Plot25 Plot45 Plot34 Plot38 Plot1 Plot32 Plot15 Plot3
## 436 437 438 442 444 447 447 448 459 462 463
## Plot30 Plot10 Plot40 Plot5 Plot4 Plot35
## 475 483 489 505 508 601
We can now compare our manual rarefaction curve on plot 35 with the
one provided by vegan::rarecurve().
## [1] 601
## [1] 83
# Our manual rarefaction curve for plot 35
plot(rar$sample_size, rar$rich, type = "l", pch = 16, lwd = 0.5,
xlab = "", ylab = "")
par(new = T) # parameter to specify that the next plot will be plotted over our previous one
rarecurve(BCI[35, , drop = FALSE], # drop = FALSE to keep the matrix format
sample = min_ind, # specify the minimum n of individuals
main = "Plot 35", col = "dodgerblue", cex = 1, lwd = 3)
legend(x = 400, y = 40, title = "Rarefaction", legend = c("manual", "vegan"),
col = c("black", "dodgerblue"), lwd = 2, bty = "n")Both curves coincide, although the one provided by vegan
is smoother. The vertical line correspond to the minimum number of
individuals in one plot. It crosses with the horizontal line at the
corresponding species richness, which is therefore the rarefied
richness.
We could also plot these rarefaction curves for all the BCI plots but it would not be readable, due to the large amount of plots. To get an idea of the variation between rarefaction curves, we can compare two extreme plots.
Which plots have the minimum and maximum number of individuals?
How many individuals do they have?
## [1] 340
## [1] 601
Let’s draw the rarefaction curves for both these plots.
We can observe that the rarefied richness of the plot 35 is around 60 species, which is actually far less than the amount of species plot 23 has for the same number of individuals.
Now that we visualized the rarefied richness with the rarefaction
curves, let’s extract it. The function rarefy() returns the
rarefied richness per community. We can plot the rarefied species
richness against the observed one.
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 93 84 90 94 101 85
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 84.33992 76.53165 79.11504 82.46571 86.90901 78.50953
plot(S, Srare, main = "Observed vs Rarefied Richness",
xlab = "Observed No. of Species",
ylab = "Rarefied No. of Species", pch = 16)
abline(0, 1) # plotting a 1:1 lineAll rarefied richness, except the one for the plot having the smallest number of individuals, falls below the 1:1.
While rarefaction is based on subsampling communities, species accumulation curves and species richness estimators are conversely used to extrapolate richness beyond the limited sampling size. This part shows how to estimate the number of unseen species.
SAC principle.
So, how do we extrapolate species richness using R?
Based on the observed species richness and the distribution of species abundances, we can use several indices to extrapolate the species richness. One of them is the Chao’s index, which formula is:
\[ Chao = S_o + \frac{f_1(f_1-1)}{2(f_2+1)} \]
with \(S_o\) the observed species richness in a given plot, \(f_1\) the number of species with only one single individual (singletons) and \(f_2\) the number of species with only 2 individuals (doubletons). This index adds an estimate of unobserved species (calculated from the singleton/doubleton ratio) to the count of observed species, this way providing a more “realistic” richness value.
Let’s first calculate manually this index on the plot 35 and then use
dedicated vegan functions.
# Observed species richness in plot 35
s0 <- length(plot35)
# Number of singletons
plot35[plot35 == 1]## Apeiba.tibourbou Astronium.graveolens
## 1 1
## Beilschmiedia.pendula Brosimum.alicastrum
## 1 1
## Calophyllum.longifolium Cavanillesia.platanifolia
## 1 1
## Ceiba.pentandra Chrysophyllum.argenteum
## 1 1
## Coccoloba.manzanillensis Coussarea.curvigemmia
## 1 1
## Croton.billbergianus Cupania.sylvatica
## 1 1
## Drypetes.standleyi Erythroxylum.macrophyllum
## 1 1
## Eugenia.coloradensis Ficus.yoponensis
## 1 1
## Guapira.standleyana Hirtella.triandra
## 1 1
## Inga.marginata Inga.punctata
## 1 1
## Inga.umbellifera Lacistema.aggregatum
## 1 1
## Lacmellea.panamensis Licania.hypoleuca
## 1 1
## Maquira.costaricana Ocotea.cernua
## 1 1
## Ocotea.oblonga Ormosia.coccinea
## 1 1
## Phoebe.cinnamomifolia Picramnia.latifolia
## 1 1
## Platymiscium.pinnatum Platypodium.elegans
## 1 1
## Protium.costaricense Protium.tenuifolium
## 1 1
## Sapium.glandulosum Solanum.hayesii
## 1 1
## Spondias.radlkoferi Swartzia.simplex.var.ochnacea
## 1 1
## Tabebuia.guayacan Tachigali.versicolor
## 1 1
## Tetragastris.panamensis Trichilia.pallida
## 1 1
## Triplaris.cumingiana Zanthoxylum.panamense
## 1 1
f1 <- length(plot35[plot35 == 1])
# Number of doubletons species
f2 <- length(plot35[plot35 == 2])
# Chao value
chao35 <- s0 + (f1*(f1 - 1))/(2*(f2 + 1))
chao35## [1] 169
estimateR() from vegan allows us to compute
several species richness estimators, including Chao’s index.
sr_estim <- estimateR(BCI) # Computes indices for all plots in our data set
sr_estim[, c(23, 35)] # two plots 23 and 35## Plot23 Plot35
## S.obs 99.000000 83.000000
## S.chao1 140.130435 169.000000
## se.chao1 16.395758 37.212366
## S.ACE 155.213269 149.766025
## se.ACE 7.091696 6.510576
identical(chao35, sr_estim[, 35][["S.chao1"]]) # Is the result for estimatedR() equal to the one we obtained manually? ## [1] TRUE
With these results, we can plot the extrapolated species richness versus the observed one.
# Observed species richness
S <- specnumber(BCI)
plot(S, sr_estim["S.chao1", ], main = "Observed vs Extrapolated Richness",
xlab = "Observed No. of Species",
ylab = "Extrapolated No. of Species", pch = 16, col = "dodgerblue")
abline(0, 1) # 1:1 line
legend(x = 105, y = 130, title = "Index", legend = c("Chao1"),
col = c("dodgerblue"), pch = c(16), bty = "n")specpool() returns several estimators of species
richness based on species accumulation curves at the pool level
(i.e. considering all the sites!). poolaccum() returns
species richness estimators for different sample size (in terms of
number of sampled communities) for the whole pool of sample sites.
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 225 236.3732 6.54361 245.58 5.650522 247.8722 235.6862 3.468888 50
What index to chose based on the sample coverage (see Lecture 3 Slide 40), i.e. how many species does each plot cover??
# Total sp number
pool <- ncol(BCI)
# Using a binary site/species matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
sum(BCI_bin[35, ])## [1] 83
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 93 84 90 94 101 85 82 88 90 94 87
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 84 93 98 93 93 93 89 109 100 99 91
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 99 95 105 91 99 85 86 97 77 88 86
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 92 83 92 88 82 84 80 102 87 86 81
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 81 86 102 91 91 93
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8
## 41.33333 37.33333 40.00000 41.77778 44.88889 37.77778 36.44444 39.11111
## Plot9 Plot10 Plot11 Plot12 Plot13 Plot14 Plot15 Plot16
## 40.00000 41.77778 38.66667 37.33333 41.33333 43.55556 41.33333 41.33333
## Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 Plot23 Plot24
## 41.33333 39.55556 48.44444 44.44444 44.00000 40.44444 44.00000 42.22222
## Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32
## 46.66667 40.44444 44.00000 37.77778 38.22222 43.11111 34.22222 39.11111
## Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40
## 38.22222 40.88889 36.88889 40.88889 39.11111 36.44444 37.33333 35.55556
## Plot41 Plot42 Plot43 Plot44 Plot45 Plot46 Plot47 Plot48
## 45.33333 38.66667 38.22222 36.00000 36.00000 38.22222 45.33333 40.44444
## Plot49 Plot50
## 40.44444 41.33333
## [1] 40.34667
Jacknife order 3 would be recommended (see lecture).