Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson5.html",
destfile = file.path(dir, "lesson5.html"))
htmlFile <- file.path(dir, "lesson5.html")
rstudioapi::viewer(htmlFile)In this practical, we will go through the three main approaches to compute \(\beta\)-diversity.
Species turnover between 3 plots.
We will use the same dataset for this practical than for the practical 2, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
BCI <- t(BCI)
# Presence/absence matrix
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1Lande’s additive partitionning. \[ \beta = \gamma - \overline{\alpha} \]
Calculate the Lande’s index on BCI data.Solution
library(vegan)
alpha <- specnumber(BCI) # or use the binary site-species matrix
gamma <- ncol(BCI[, colSums(BCI) > 0])
gamma - mean(alpha)## [1] 134.22
Whittaker’s multiplicative partitionning. \[ \beta = \frac{\gamma}{\overline{\alpha}} \]
Calculate the Whittaker’s index.Solution
## [1] 2.478519
The slope of the species-area curve also gives us a measure of \(\beta\) diversity since it tells us how many new species are added when we consider a bigger area.
With BCI data, we now that each plot has a 1 hectare surface. We can therefore reconstruct a species-surface curve using random sample of a varying number of plots. At each random sample, we calculate the average species number. The approach is similar to the species accumulation curve we saw in the practical 4.
To do so, we will use a for-loop with the \(\alpha\) richness of random plot aggregates of 2, 5, 10, 20, 30 and then 40 plots.
Let us start with a random sample of two plots.
## [1] "Plot4" "Plot39"
# Warning: some species occur in both plots so
# we cannot simply take the sum of the alpha values
sum(alpha[sample(names(alpha), 2)]) # simple sum## [1] 185
## [1] 111
# now we repeat this 100 times to have a solid average estimate
mean_S <- replicate(100, {
tmp <- BCI_bin[sample(rownames(BCI_bin), 2), ]
ncol(tmp[, colSums(tmp) > 0])
})
# mean of the 100 replicates
length(mean_S)## [1] 100
## [1] 121.17
We now repeat this step with several aggregates of plot number (and therefore surface):
# and now a for-loop for several sample sizes
sample_size <- c(1:50)
mean_S <- c()
for(i in sample_size){
S <- replicate(100, {
tmp <- BCI_bin[sample(rownames(BCI_bin), i), ,drop=FALSE]
sum(colSums(tmp) > 0)})
mean_S[i] <- mean(S)
}
# data.frame with species richness and area
sp_area <- data.frame(area = sample_size,
S = mean_S)Now the plot.
plot(sp_area$area, sp_area$S, pch = 16, type = "b", cex = 0.5, lwd = 1,
main = "Species-area curve",
xlab = "Area (ha)", ylab = "Species richness")\(\beta\)-diversity estimate corresponds to the exponent \(z\) of the Arrhenius model \(S = cA^z\), where \(S\) is the species number and \(A\) is the area.
To retrieve it, we can calculate a linear model of the log-transformed number of species as a function of the log-transformed area, since we have: \[ log_{10} = log_{10}(c) + zlog_{10}(A) \]
# Log-log scale
sp_area$log_area <- log(sp_area$area, base = 10)
sp_area$log_S <- log(sp_area$S, base = 10)
plot(sp_area$log_area, sp_area$log_S, pch = 16, type = "b", cex = 0.5,
lwd = 1, main = "Species-area curve",
xlab = "Log Area (ha)", ylab = "Log species richness")
# Extract linear coefficient
log_mod <- lm(log_S ~ log_area, data = sp_area)
log_mod##
## Call:
## lm(formula = log_S ~ log_area, data = sp_area)
##
## Coefficients:
## (Intercept) log_area
## 2.054 0.186
## log_area
## 0.186
The slope of 0.186 is the rate at which \(S\) increases against area and therefore our estimate of \(\beta\)-diversity.
Contrarily to the previous sections, we now evaluate \(\beta\)-diversity for each pair of sites.
Each pair of sites can share some species and have unique set of species, like in the following figure:
Example with two plots.
with \(a\) the set of species shared by both plots, \(b\) the species unique to plot 1 and \(c\) the species unique to plot 2.
The idea behind this concept is to find an index that identify the sites that share a lot of species and the ones that are distant, regarding their species composition.
Let us build a simple virtual matrix to visualize \(abc\) with simple plots.
virt <- matrix(c(1, 1, 1, 0, 0,
1, 1, 1, 0, 0,
1, 0, 0, 0, 0,
0, 0, 0, 1, 1,
0, 1, 1, 1, 0),
byrow = TRUE, nrow = 5, ncol = 5,
dimnames = list(c("plot1","plot2","plot3","plot4", "plot5"),
c("sp1", "sp2", "sp3", "sp4", "sp5")))
virt## sp1 sp2 sp3 sp4 sp5
## plot1 1 1 1 0 0
## plot2 1 1 1 0 0
## plot3 1 0 0 0 0
## plot4 0 0 0 1 1
## plot5 0 1 1 1 0
We can see that plots 1 and 2 are identical, plot 3 is nested within 1 and 2, plot 4 shares no species with the first three plots. Plot 5 displays an intermediate situation.
Let’s manually calculate the number of shared and unique species for a given pair of virtual plots.
Calculate the number of unique and shared species in the first two virtual plots 1 and 5.Solution
# Subset of the binary matrix with just plots 1 and 2
virt_15 <- virt[c(1, 5), ]
# Number of species in each plot
rowSums(virt_15)## plot1 plot5
## 3 3
## [1] 2
## plot1 plot5
## 1 1
To compute automatically these numbers, we can use betadiver() function from vegan package. With method = NA, no index is calculated, but instead an object of class betadiver is returned. This is a list of the elements $a, $b and $c.
Each element is a distance matrix of size \(n\times(n-1)/2\) with \(n\) the number of plots.
We can also plot them in a triangular plot showing how each pair of plots is similar, as suggested by (Koleff, Gaston, and Lennon 2003).
## plot1 plot2 plot3 plot4
## plot2 3
## plot3 1 1
## plot4 0 0 0
## plot5 2 2 0 1
## [1] 10
## [1] 10
## plot1 plot2 plot3 plot4
## plot2 0
## plot3 0 0
## plot4 2 2 2
## plot5 1 1 3 2
## plot1 plot2 plot3 plot4
## plot2 0
## plot3 2 2
## plot4 3 3 1
## plot5 1 1 1 1
# Triangular plot
plot(betadiver(virt, method = NA), pch = 16, cex = 2)
legend("topleft", legend = "Virtual matrix", bty = "n")If we draw the same triangular plot on BCI data, what do we get?
## Plot1 Plot2 Plot3
## Plot1 0 64 64
## Plot2 64 0 62
## Plot3 64 62 0
## Plot1 Plot2 Plot3
## Plot1 0 20 26
## Plot2 20 0 28
## Plot3 26 28 0
# Plot
plot(betadiver(BCI, method = NA))
legend("topleft", legend = "Barro Colorado Island", bty = "n")Low heterogeneity is found among BCI communities, all plots are being clumped in one part of the triangular space. We could say that BCI plots are similarly dissimilar.
Many distances and metrics exist, the three following ones being the most important. With \(a\) the number of species shared by two sites, \(b\) the number of species only in the first site and \(c\) the number of species only in the second one, we have:
Therefore, \(\beta\)-diversity being a metric based on dissimilarity, we get:
\[ \beta_{sor} = 1- \frac{2a}{2a + b + c} \]
and similarly:
\[ \beta_{jac} = 1- \frac{a}{a + b + c} \]
\[ \beta_{sim} = \frac{min(b, c)}{a + min(b, c)} \]
Calculate the Sorensen similarity between the first two plots of BCI.Solution
## Plot1 Plot2
## 93 84
## [1] 64
## Plot1 Plot2
## 29 20
## [1] 0.7231638
betadiver() function from vegan allows us to compute these indices.
## Plot1 Plot2
## Plot1 0.0000000 0.7231638
## Plot2 0.7231638 0.0000000
## [1] 0.7231638
## Plot1 Plot2
## Plot1 0.0000000 0.2380952
## Plot2 0.2380952 0.0000000
## [1] 0.2380952
## Plot1 Plot2
## Plot1 0.0000000 0.5663717
## Plot2 0.5663717 0.0000000
## [1] 0.5663717
# Plot the indices against each other
par(mfrow = c(1, 3))
plot(jBCI, sorBCI, xlab = "Jaccard", ylab= "Sorensen",
pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))
plot(jBCI, simBCI, xlab = "Jaccard", ylab= "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(sorBCI, simBCI, xlab = "Sorensen", ylab= "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))
Note
The above betadiver() function computes the original indices. These can quantify either similarity or dissimilarity between plots. When classifying plots between each other, dissimilarity is often used. Dissimilarity simply equals 1 - similarity.
vegdist() returns the dissimilarity between plots.
## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.2768362 0.3005464
## Plot2 0.2768362 0.0000000 0.2873563
## Plot3 0.3005464 0.2873563 0.0000000
## Plot1 Plot2 Plot3
## Plot1 1.0000000 0.2768362 0.3005464
## Plot2 0.2768362 1.0000000 0.2873563
## Plot3 0.3005464 0.2873563 1.0000000
# Plot: betadiver versus vegdist
par(mfrow = c(1, 2))
plot(dissimBCI, sorBCI, xlab = "Sorensen dissimilarity (vegdist)",
ylab= "Sorensen similarity (betadiver)",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(dissimBCI, 1 - sorBCI,
xlab = "Sorensen dissimilarity (vegdist)",
ylab= "Sorensen dissimilarity (1 - betadiver)",
pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))vegdist() can also be applied on abundances data, with the use of Bray-Curtis index:
\[ \beta_{ij}=1-\frac{2C_{ij}}{S_i + S_j} \]
with \(C_{ij}\) the sum of the lesser individuals for only common species in both \(i\) and \(j\), \(S_i\) the number of individuals in plot \(i\) and \(S_j\) the number of individuals in plot \(j\).
# Presence/absence data
dissimBCI <- vegdist(BCI_bin, method = "bray")
# With abundance data
dissimBCI_ab <- vegdist(BCI, method = "bray")
as.matrix(dissimBCI_ab)[1:3, 1:3]## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.2706682 0.3501647
## Plot2 0.2706682 0.0000000 0.2873051
## Plot3 0.3501647 0.2873051 0.0000000
# Plot: betadiver versus vegdist
par(mfrow = c(1, 1))
plot(dissimBCI, dissimBCI_ab, xlab = "Sorensen binary",
ylab= "Sorensen abundance", pch = 16, col = rgb(0, 0, 0, 0.1))
abline(c(0, 1))The overall pattern is similar but including abundances can lead to a better understanding of the changing of species composition across plots.
\(\beta\)-diversity can be divided into a nestedness and a turnover component. The nestedness component quantifies the part of \(\beta\)-diversity that results from the mere difference in species numbers. The turnover component quantifies the part that actually results from differences in species identities (Baselga, 2010). The Simpson index is a common measure of the turnover component because it is independent of differences in the number of species in the plots. Jaccard and Sorensen quantify the total beta diversity. Therefore:
\[ \beta_{nes} = \beta_{sor} - \beta_{sim} \]
or
\[ \beta_{nes} = \beta_{jac} - \beta_{sim} \]
## Plot1 Plot2 Plot3
## Plot1 0.00000000 0.03874092 0.01165756
## Plot2 0.03874092 0.00000000 0.02545156
## Plot3 0.01165756 0.02545156 0.00000000
## [1] 0.3399075
## [1] 0.3090427
## [1] 0.03086482
df_boxplot <- data.frame(Dissimilarity = c(dissimBCI, simBCI, beta_nest),
Metric = rep(c("Beta", "Turnover", "Nestedness"),
each = length(dissimBCI)))
# Plot
par(mfrow = c(1, 2))
plot(simBCI, beta_nest, xlab = "Turnover", ylab = "Nestedness",
pch = 16, col = rgb(0, 0, 0, 0.1))
boxplot(Dissimilarity ~ Metric, df_boxplot, col = "grey")Note
Other package exist to calculate \(\beta\)-diversity like betarpart. It notably allows to compute the three components of \(\beta\)-diversity (with turnover and nestedness) in a faster way.
library(betapart)
# watch out: beta.pair relies on dissimilarity and not similarity
# for example: the formula of beta_sorensen is (b+c)/(2a+b+c)
betaBCI <- beta.pair(BCI_bin, index.family = "sorensen")
str(betaBCI)
# Convert dist object into matrix
sorBCI <- as.matrix(betaBCI$beta.sor)
sorBCI[1:5, 1:5]
sorBCI[upper.tri(sorBCI)] <- NA
diag(sorBCI) <- NA
sorBCI[1:5, 1:5]Bonus
From such a distance matrix, we can target the closest and most distances plots, regarding their species composition.
## [1] 0.4941176
## [1] 723
## [1] 0.7914439
## [1] 954
We can also highlight these extreme pairs in an “abc” triangular plot.
# Function to return dist index, given row i, column j and n number of plots
distdex <- function(i, j, n){
n*(i-1) - i*(i-1)/2 + j-i
}
# Index of most distant pair
dissim_pair <- distdex(18, 44, 50)
paste("a=", betadiver(BCI, method = NA)$a[dissim_pair], "; b=",
betadiver(BCI, method = NA)$b[dissim_pair], "; c=",
betadiver(BCI, method = NA)$c[dissim_pair])## [1] "a= 42 ; b= 39 ; c= 47"
# Index of closest pair
sim_pair <- distdex(27, 32, 50)
paste("a=", betadiver(BCI, method = NA)$a[sim_pair], "; b=",
betadiver(BCI, method = NA)$b[sim_pair], "; c=",
betadiver(BCI, method = NA)$c[sim_pair])## [1] "a= 74 ; b= 14 ; c= 25"
# Color vector
nb_pairs <- length(betadiver(BCI_bin, method = "sor")) # n*(n-1)/2
col_abc <- c(rep("black", (dissim_pair - 1)),
"dodgerblue",
rep("black", (sim_pair - dissim_pair - 1)),
"firebrick3",
rep("black", (nb_pairs - sim_pair)))
# Size vector
size_abc <- c(rep(0.5, (dissim_pair - 1)), 2,
rep(0.5, (sim_pair - dissim_pair - 1)), 2,
rep(0.5, (nb_pairs - sim_pair)))
# Plot
plot(betadiver(BCI, method = NA), pch = 16, cex = size_abc, col = col_abc)
legend(x = 0.6, y = 0.8,
legend = c("Most dissimilar", "Other pairs", "Most similar"),
col = c("dodgerblue", "black", "firebrick3"), pch = 16,
pt.cex = c(2, 0.5, 2), bty = "n")