In this practical, we will go through the main diversity indices used in ecology to compare plots, based on the abundance distribution of the species they host. For example, the two following plots have the same species richness and number of individuals, but the abundances of the three species is not distributed in the same way. How to rank their diversity?
How to compare these two plots?
Before starting
If you would like to see this practical in your viewer within RStudio,
please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson4.html",
destfile = file.path(dir, "lesson4.html"))
htmlFile <- file.path(dir, "lesson4.html")
rstudioapi::viewer(htmlFile)For this practical, we will use the same dataset we used for practical 2 and 3, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset.
Let us first import the site-species matrix from BCI data.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
BCI <- t(BCI) # transposing the matrix to have sites in rows
BCI[4:6, 5:7]## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 3 0 18
## Plot5 1 1 3
## Plot6 0 0 2
The matrix is filled with abundances. We here create a binary site-species matrix to access in an easier way the species richness of each plot.
## Adelia.triloba Aegiphila.panamensis Alchornea.costaricensis
## Plot4 1 0 1
## Plot5 1 1 1
## Plot6 0 0 1
Now, from this presence/absence matrix, we can count how many different species each plot has. This is a first measure of diversity.
# Histogram
hist(rowSums(BCI_bin), col = "grey",
main = expression(paste(alpha, " richness")), # plot title
xlab = "Number of species") # x-axis titleHowever, as we saw in the lecture, species richness alone has some limitation if we want to rank the diversity of several plots. We should account for the evenness of species in each plot.
Rank-abundance distribution (RAD) is a classic way to represent how the abundances are structured in a community. Note that the rank-abundance curves are also called species-abundance distribution in the literature (SAD, McGill et al. 2007).
The function rank() returns the rank of any feature and
can therefore be used to rank the abundances of species within their
communities.
## [1] 3 1 2
# If we want to assign rank based on ascending order (i.e. the higher values gets rank = 1 etc..) we can just put the minus simbol "-" before our values
rank(-c(10, 2, 5))## [1] 1 3 2
## [1] 1.5 3.0 1.5
## [1] 1 3 2
Let us take two plots (plot 35 and plot 7) with similar richness as examples to illustrate how their abundances can be shaped in a very different way.
# Extract the first community from the site-species table
ex_plot1 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[35, ]))
head(ex_plot1)## sp ab
## 1 Abarema.macradenium 0
## 2 Acacia.melanoceras 0
## 3 Acalypha.diversifolia 0
## 4 Acalypha.macrostachya 0
## 5 Adelia.triloba 6
## 6 Aegiphila.panamensis 0
# Removal of absent species
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
dim(ex_plot1) # 83 species## [1] 83 2
# Add rank of species in the first community
# ?rank # help page of a given function
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")
# Ordering data before plotting: we order the rows of our data set based on the values from our column "rank"
ex_plot1 <- ex_plot1[order(ex_plot1$rank), ]
head( ex_plot1)## sp ab rank
## 91 Gustavia.superba 247 1
## 11 Alseis.blackiana 93 2
## 71 Faramea.occidentalis 37 3
## 208 Trichilia.tuberculata 30 4
## 216 Virola.sebifera 11 5
## 170 Quararibea.asterolepis 11 6
# Plot
plot(ex_plot1$rank, ex_plot1$ab, type = "b",
col = rgb(0.4, 0.4, 0.8, 0.6), # specify a color based on rgb values
pch = 16, # type of symbols we want for our points (this case: filled circles)
lwd = 1, # line width
main = "RAD",
xlab = "Rank", ylab = "Abundances")
# Let's add community 7 on this plot for comparison
ex_plot2 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[7, ]))
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank), ]
head(ex_plot2)## sp ab rank
## 71 Faramea.occidentalis 38 1
## 208 Trichilia.tuberculata 27 2
## 181 Socratea.exorrhiza 23 3
## 170 Quararibea.asterolepis 21 4
## 144 Oenocarpus.mapora 20 5
## 162 Prioria.copaifera 18 6
# instead of creating a new plot we can just add the points over our previous plot
points(ex_plot2$rank, ex_plot2$ab, pch = 16, type = "both",
col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 250, c("35 (83 species)", "7 (82 species)"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1, bg = "gray90")As for the majority of communities of living organisms in the world, the first BCI plot contains many rare species and just a few common species. The second one also follows this trend but in a less straightforward way as its abundances seem more even. This visual impression can also arise from histograms.
par(mfrow = c(1, 2))
hist(ex_plot1$ab, main = "Plot 35", xlab = "Abundances", col = "grey")
hist(ex_plot2$ab, main = "Plot 7", xlab = "Abundances", col = "grey")Rank-abundance distributions can also be visualized using relative abundances. Relative abundances are local abundances divided by the total number of individuals in a community, such as it sums to 1 in every community.
\[ \sum_{i=1}^{n}a_{i} = 1 \]
with \(n\) the number of species in a community and \(a_i\) the relative abundance of species \(i\).
Create a BCI matrix named “BCI_rel” filled with relative abundances.
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 448 435 463 508 505 412
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 93 84 90 94 101 85
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6
## 1 1 1 1 1 1
# Add relative abundances
# Since we ordered our previous dataframe, it's just better to redo the steps we did before and then order it again
ex_plot1 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[35, ]))
ex_plot1$rank <- rank(-ex_plot1$ab, ties.method = "random")
ex_plot1 <- ex_plot1[which(ex_plot1$ab != 0), ]
ex_plot1$ab_rel <- as.numeric(BCI_rel[35, ][BCI_rel[35, ] > 0])
ex_plot1$rank_rel <- rank(-ex_plot1$ab_rel, ties.method = "random")
ex_plot1 <- ex_plot1[order(ex_plot1$rank_rel), ]
# do the same for the second community (Plot 7)
ex_plot2 <- data.frame(sp = colnames(BCI),
ab = as.numeric(BCI[7, ]))
ex_plot2$rank <- rank(-ex_plot2$ab, ties.method = "random")
ex_plot2 <- ex_plot2[which(ex_plot2$ab != 0), ]
ex_plot2$ab_rel <- as.numeric(BCI_rel[7, ][BCI_rel[7, ] > 0])
ex_plot2$rank_rel <- rank(-ex_plot2$ab_rel, ties.method = "random")
ex_plot2 <- ex_plot2[order(ex_plot2$rank_rel), ]
plot(ex_plot1$rank_rel, ex_plot1$ab_rel, type = "b",
col = rgb(0.4, 0.4, 0.8, 0.6), pch = 16, lwd = 1,
main = "RAD, relative scales",
xlab = "Rank",
ylab = "Relative abundances")
points(ex_plot2$rank_rel, ex_plot2$ab_rel, pch = 16, type = "both",
col = rgb(1, 0.4, 0.2, 0.6))
legend(40, 0.4, c("35 (83 species)", "7 (82 species)"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1, bg = "gray90")In the next parts, we quantify the evenness of abundances of plots using two family of indices: the non-parametric and parametric indices.
Non-parametric indices do not rely on any statistical distribution. One of the most classical index in ecology is the Shannon index.
Shannon’s diversity H' is based on the proportion of
individuals each species has in a community. For a species \(i\), its proportion of individuals \(p_i\) in a plot having \(N\) individuals writes as follow:
\[ p_i = \frac{n_i}{N} \]
From these proportions, the formula of Shannon is the following:
\[ H' = -\sum_{i=1}^Sp_ilog(p_i) \]
with \(S\) the species richness and \(p_i\) the proportion of individuals for species \(i\).
The idea behind Shannon’s index is that if these proportions are uneven across species, the Shannon’s index decreases.
Let’s take a simple virtual example to see how Shannon behaves. We
here create a virtual site-species matrix with 4 communities and 4
species. In the first community, the four species have 10 individuals.
In the other communities, the first species is more and more abundant,
making the whole abundance distribution more and more uneven.
The function diversity() from vegan package
enables us to calculate the index of Shannon.
virt <- matrix(10, nrow = 4, ncol = 4)
virt <- matrix(10, nrow = 4, ncol = 6,
dimnames = list(c("plot1", "plot2", "plot3", "plot4"),
c("sp1", "sp2", "sp3", "sp4", "sp5", "sp6")))
virt[, 1] <- c(10, 100, 1000, 10000)
virt # our virtual site-species matrix ## sp1 sp2 sp3 sp4 sp5 sp6
## plot1 10 10 10 10 10 10
## plot2 100 10 10 10 10 10
## plot3 1000 10 10 10 10 10
## plot4 10000 10 10 10 10 10
library(vegan)
diversity(virt, index = "shannon") # Calculate shannon index for each community (i.e. plot) in our site x species matrix## plot1 plot2 plot3 plot4
## 1.79175947 1.17299347 0.26808398 0.03935448
Furthermore, the Shannon index increases if the number of species increases while the abundance distribution remains the same. For example, if we add a fifth community where not all species are present:
## sp1 sp2 sp3 sp4 sp5 sp6
## plot1 10 10 10 10 10 10
## plot2 100 10 10 10 10 10
## plot3 1000 10 10 10 10 10
## plot4 10000 10 10 10 10 10
## plot5 0 0 0 10 10 10
## plot1 plot5
## 1.791759 1.098612
Let us now calculate this index on BCI data.
shan <- diversity(BCI, index = "shannon")
# Histogram
hist(shan, col = "grey", main = "Shannon diversity", xlab = "Index value")One community has a very low Shannon index value, and therefore very uneven distribution of abundances.
The Pielou’s index of evenness (or equitability) specifically quantifies the extent to which species abundances are unbalanced in the community. It ranges from 0 to 1 (abundances completely even), and decreases with the heterogeneity of abundances in one community.
Its formula is based on Shannon’s index:
\[ J' = \frac{H'}{H'_{max}} = \frac{H'}{log(S)} \] with \(S\) the species richness.
Using diversity() and specnumber()
(remember that this function retrieve the number of specie per rows
i.e. plots in our case) we can calculate it on BCI data:
pielou <- diversity(BCI, index = "shannon")/log(specnumber(BCI))
# Histogram
hist(pielou, col = "grey", main = "Pielou's evenness", xlab = "Index value")The second most used index to measure evenness in ecology is probably the Simpson’s index. Like for Shannon, it is based on the proportion of individuals each species has in a community. Its formula is however different:
\[ D = 1 -\sum_{i=1}^Sp_i^2 \]
with \(p_i\) the proportion of individuals for species \(i\) and \(S\) the species richness.
We can calculate it on BCI data using the same
diversity() function but with another argument.
simp <- diversity(BCI, index = "simpson")
invsimp <- diversity(BCI, "inv") # Inverse Simpson index (1/D)
# Histogram
hist(simp, col = "grey", main = "Simpson diversity", xlab = "Index value")The distribution of values also highlights a very uneven plot.
These indices are more or less stronger related to species richness. For example, here is the link between Shannon’s index and species richness in BCI:
plot(rowSums(BCI_bin), shan, pch = 16,
main = expression(paste(alpha, " richness versus Shannon diversity")),
xlab = "Species richness",
ylab = "Shannon")We can do the same for all the indices we computed and plot them. To
do so, we can create a common data.frame with all the
calculated indices.
Create a data.frame named all_div with species richness,
Shannon, Simpson and Pielou indices for each plot.
# Create a data.frame with species richness and Shannon
shan <- data.frame(plot = names(shan),
shan = as.numeric(shan))
rich <- data.frame(plot = names(rowSums(BCI_bin)),
rich = as.numeric(rowSums(BCI_bin)))
all_div <- merge(rich, shan, by = "plot")
# Adding Simpson index
simp <- data.frame(plot = names(simp),
simp = as.numeric(simp))
all_div <- merge(all_div, simp, by = "plot")
# Adding Pielou index
pielou <- data.frame(plot = names(pielou),
pielou = as.numeric(pielou))
all_div <- merge(all_div, pielou, by = "plot")
dim(all_div)## [1] 50 5
## plot rich shan simp pielou
## 1 Plot1 93 4.018412 0.9746293 0.8865579
## 2 Plot10 94 3.889803 0.9663808 0.8561634
## 3 Plot11 87 3.859814 0.9658398 0.8642843
## 4 Plot12 84 3.698414 0.9550599 0.8347024
## 5 Plot13 93 3.982373 0.9692075 0.8786069
## 6 Plot14 98 4.017494 0.9718626 0.8762317
From this common table, we can plot a pair plot using
psych package and showing the correlation between each
index.
library(psych)
pairs.panels(all_div[, c("rich", "shan", "simp", "pielou")],
density = FALSE, ellipses = FALSE, hist.col = "grey")Again, one community is really uneven, especially when compared to the other communities. Which community is it?
## [1] 29
## Faramea.occidentalis Trichilia.tuberculata Oenocarpus.mapora
## 54 47 16
## Alseis.blackiana Prioria.copaifera Quararibea.asterolepis
## 15 14 14
As discussed in the lecture, the above diversity indices have some undesirable characteristics. For example, doubling the number of species with equal distribution of abundances does not necessarily lead to a doubling of the index values (non-linear properties). Furthermore, a community with \(S\) equally common species does not necessarily have a diversity of \(S\) and the unit of the index is not species, although both would be desirable. One solution to this problem is to convert the values into effective numbers of species, i.e. the number of equally common species that would be needed to reach a given index value.
The indices discussed can be transformed into effective numbers of species to meet the above characteristics.
In the case of Shannon \(H\), the transformation is achieved by doing \(e^H\).
In the case of Simpson \(D\), \(\frac{1}{1-D}\).
We will look at the properties of the indices using our artificially generated species-by-sites matrix:
## sp1 sp2 sp3 sp4 sp5 sp6
## plot1 10 10 10 10 10 10
## plot2 100 10 10 10 10 10
## plot3 1000 10 10 10 10 10
## plot4 10000 10 10 10 10 10
## plot5 0 0 0 10 10 10
## plot1 plot2 plot3 plot4 plot5
## 1.79175947 1.17299347 0.26808398 0.03935448 1.09861229
## plot1 plot2 plot3 plot4 plot5
## 6.000000 3.231652 1.307457 1.040139 3.000000
## plot1 plot2 plot3 plot4 plot5
## 0.833333333 0.533333333 0.092517007 0.009920547 0.666666667
## plot1 plot2 plot3 plot4 plot5
## 6.000000 2.142857 1.101949 1.010020 3.000000
The transformation leads to a situation in which the diversity in communities with constant abundances corresponds to the number of species and, accordingly, a community with twice the number of species has twice the diversity with the same abundance.
However, the values of the two indices differ for communities with non-constant abundances, as Simpson gives greater weight to abundances than Shannon.
Hill numbers offers a generalisation of this weighting of abundances while retaining the desired characteristics. The calculation is done as
\[ D^q = \biggl(\sum_{i=1}^Sp_i^{q}\biggr)^\frac{1}{(1-q)} \]
where \(p_i\) is the relative abundance of the species \(i\) and \(S\) is the total number of species. The parameter \(q\) determines the sensitivity of the index to the abundances:
\(q = 0\) corresponds to the number
of species,
\(q = 1\) corresponds to \(e^H\) (Shannon) and
\(q = 2\) corresponds to \(\frac{1}{1-D}\) (Simpson).
However, other values for \(q\) can
also be selected. The function hill_taxa() from
hillR package allows us to do so:
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8 Plot9 Plot10 Plot11
## 93 84 90 94 101 85 82 88 90 94 87
## Plot12 Plot13 Plot14 Plot15 Plot16 Plot17 Plot18 Plot19 Plot20 Plot21 Plot22
## 84 93 98 93 93 93 89 109 100 99 91
## Plot23 Plot24 Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32 Plot33
## 99 95 105 91 99 85 86 97 77 88 86
## Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40 Plot41 Plot42 Plot43 Plot44
## 92 83 92 88 82 84 80 102 87 86 81
## Plot45 Plot46 Plot47 Plot48 Plot49 Plot50
## 81 86 102 91 91 93
## [1] TRUE
# With BCI
# Create a dataframe to contain the data
hill_BCI <- data.frame(q = seq(0, 3, by = 0.25),
Diversity35 = NA,
Diversity7 = NA)
for (i in 1:nrow(hill_BCI)){
hill_BCI[i, c("Diversity35", "Diversity7")] <- hill_taxa(BCI[c(35, 7),],
q = hill_BCI$q[i])
}
plot(hill_BCI$q, hill_BCI$Diversity7, type = "l",
xlab = "q", ylab = "Diversity",
col = rgb(1, 0.4, 0.2, 0.6), ylim = c(0, 90))
points(hill_BCI$q, hill_BCI$Diversity35, type = "l",
xlab = "q", ylab = "Diversity", col = rgb(0.4, 0.4, 0.8, 0.6))
legend(1.7, 85, c("Plot 35", "Plot 7"), title = "Plot",
text.font = 3, lwd = 2, seg.len = 1,
col = c(rgb(0.4, 0.4, 0.8, 0.6), rgb(1, 0.4, 0.2, 0.6)),
lty = 1)Plot 7 has a greater diversity than plot 35 at any \(q\), even if they have a comparable species richness.
The Fisher’s \(\alpha\) index is a parameter describing the standard log-series distribution that can be fitted to the observed pattern of relative abundances. It is widely used as a diversity measure and is one of the most recommended when the number of individuals in a plot exceeds 1000.
Note that other distributions can be fitted to an abundance
distribution and that log-series is not the best one measures but we
won’t do it in this practical (for more details, see
?radfit).
In Fisher’s logarithmic series, the expected number of species \(S\) with \(n\) observed individuals is the following:
\[ S_n = \alpha\frac{x^n}{n} \]
with \(S_n\) the number of species with \(n\) individuals, \(x\) a positive constant that you need to solve in an iterative way and \(\alpha\) an empirical constant determining the diversity index. Since \(x\) is empirically bounded between 0.5 and 1, is approximately the number of species with one individual.
fisherfit() function allows us to visualize abundance
distributions in certain plots and the associated Fisher fitted lines.
Let’s plot the species abundance distribution of community 35 and 7, and
the fitted log-series distribution
par(mfrow = c(1, 2))
plot(fisherfit(BCI[35, ]), main = "Plot 35", bar.col = "grey")
plot(fisherfit(BCI[7, ]), main = "Plot 7", bar.col = "grey")In this graphic, the height of the bar is the number of species that have a given abundance on abscissa. As seen before, we can see that the abundance distribution of the plots 35 and 7 differ quite a lot, the red line also shows very distinct profile.
We now first decompose how to extract the \(\alpha\) parameter for one plot. The first step is to solve the value of \(x\). To solve \(x\), an iterative procedure has to be done. Iteration means trying different values of \(x\) and stop when both terms of the following equation are equal.
\[ S/N = [\frac{1-x}{x}]\times[-log(1-x)] \] In the case of BCI plot 35, we get:
## Plot35
## 0.1381032
## Plot35
## 0.01956696
## Plot35
## 0.0008776296
## Plot35
## 0.0003300087
Once \(x\) has been estimated, we can estimate \(\alpha\) by solving the equation: \[ \alpha = \frac{N(1-x)}{x} \] which gives for plot 35:
## Plot35
## 0.1381032
## Plot35
## 26.02139
Fortunately, we do not have to do all these tedious steps as there is
already a ready-to-use function from vegan package. To
extract parameter for every community, we can use the function
fisher.alpha().
## Plot1 Plot2 Plot3 Plot4 Plot5 Plot6 Plot7 Plot8
## 35.67297 30.99091 33.32033 33.92209 37.96423 32.49374 30.58383 33.44981
## Plot9 Plot10 Plot11 Plot12 Plot13 Plot14 Plot15 Plot16
## 35.67237 34.82320 34.20590 34.12041 37.56974 39.21837 35.07926 36.16928
## Plot17 Plot18 Plot19 Plot20 Plot21 Plot22 Plot23 Plot24
## 39.21057 38.71442 46.85170 40.99634 41.58728 35.84848 46.92764 39.87659
## Plot25 Plot26 Plot27 Plot28 Plot29 Plot30 Plot31 Plot32
## 43.53983 36.40222 41.03649 33.65451 35.54363 36.87417 27.62298 32.34465
## Plot33 Plot34 Plot35 Plot36 Plot37 Plot38 Plot39 Plot40
## 32.08719 35.12348 26.11065 35.88776 33.28235 29.46136 31.41425 27.17138
## Plot41 Plot42 Plot43 Plot44 Plot45 Plot46 Plot47 Plot48
## 44.06386 33.59822 33.31374 30.28666 29.02041 32.32598 42.56011 35.99618
## Plot49 Plot50
## 35.41942 36.40427
alpha35; alpha[35] # The value we calculated by hand is of course identical to the value estimated by the function ## Plot35
## 26.02139
## Plot35
## 26.11065
## Plot7
## 30.58383
We use the same pair plot as before.
# Add Fisher's alpha to all_div data.frame
alpha <- data.frame(plot = names(alpha),
alpha = as.numeric(alpha))
all_div <- merge(all_div, alpha, by = "plot")
# pair plot
pairs.panels(all_div[, c("rich", "shan", "simp", "pielou", "alpha")],
density = FALSE, ellipses = FALSE, hist.col = "grey")Jost, L. (2006) The New Synthesis of Diversity Indices and Similarity
Magurran, A.E. (2013) Measuring biological diversity. John Wiley & Sons.