Before starting
If you would like to see this tutorial in your viewer within RStudio, please execute the following chunk of code:
install.packages("rstudioapi") # installs a required R-package
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/community/index.html",
destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/community/lesson6.html",
destfile = file.path(dir, "lesson6.html"))
htmlFile <- file.path(dir, "lesson6.html")
rstudioapi::viewer(htmlFile)In this practical, we will perform an Ordination and a Cluster analysis on the BCI plots to see in a multivariate space which plots are closer from each other regarding their species composition.
How to classify these 6 plots?
We will use two datasets for this practical. The first one is the same than for the practical 2, i.e. the Barro Colorado Island (BCI) forest-plot inventory in Panama. Refer to the practical 2 for more details regarding this dataset. The second one is introduced in the first section.
BCI <- read.table("data/BCI_Data.csv",
header = TRUE, # does the file have column names
sep = ",", # what separates the column
row.names = 1) # does the file have row names
BCI <- t(BCI)
# Presence/absence table
BCI_bin <- BCI
BCI_bin[BCI_bin > 0] <- 1
# Loading vegan package
library(vegan)The goal of NMDS is to collapse information from multiple dimensions (e.g, from multiple communities, sites, etc) into just a few, so that they can be visualized and interpreted.
The goal of NMDS is to represent the position of communities in multidimensional space as accurately as possible using a reduced number of dimensions that can be easily visualized.
NMDS does not use the absolute abundances of species in communities, but rather their rank orders and as a result is a flexible technique that accepts a variety of types of data. The “non-metric” part of the name comes from this point.
To do
vegdist() function.# Abundance-based Sorensen dissimilarity: Bray Curtis index
sorBCI <- vegdist(BCI, method = "bray")
as.matrix(sorBCI)[1:3, 1:3]## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.2706682 0.3501647
## Plot2 0.2706682 0.0000000 0.2873051
## Plot3 0.3501647 0.2873051 0.0000000
# Abundance-based Jaccard dissimilarity
jacBCI <- vegdist(BCI, method = "jaccard")
as.matrix(jacBCI)[1:3, 1:3]## Plot1 Plot2 Plot3
## Plot1 0.0000000 0.4260250 0.5186992
## Plot2 0.4260250 0.0000000 0.4463668
## Plot3 0.5186992 0.4463668 0.0000000
betadiver() function.
simBCI <- betadiver(BCI, method = "sim") # Simpson dissimilarity
as.matrix(simBCI)[1:2, 1:2]## Plot1 Plot2
## Plot1 0.0000000 0.2380952
## Plot2 0.2380952 0.0000000
par(mfrow = c(1, 3))
plot(jacBCI, sorBCI, xlab = "Jaccard", ylab = "Sorensen",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(jacBCI, simBCI, xlab = "Jaccard", ylab = "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))
plot(sorBCI, simBCI, xlab = "Sorensen", ylab = "Simpson",
pch = 16, col = rgb(0, 0, 0, 0.1))To run the NMDS, we will use the function metaMDS() from vegan. This function needs a number of dimensions as argument.
BCI_NMDS <- metaMDS(jacBCI, k = 2)## Run 0 stress 0.1741506
## Run 1 stress 0.1921146
## Run 2 stress 0.1917546
## Run 3 stress 0.1920244
## Run 4 stress 0.1741506
## ... New best solution
## ... Procrustes: rmse 1.745073e-05 max resid 0.0001097437
## ... Similar to previous best
## Run 5 stress 0.1792234
## Run 6 stress 0.1880654
## Run 7 stress 0.1741506
## ... Procrustes: rmse 3.570759e-05 max resid 0.0002199929
## ... Similar to previous best
## Run 8 stress 0.1900289
## Run 9 stress 0.1741506
## ... Procrustes: rmse 1.020411e-05 max resid 4.894262e-05
## ... Similar to previous best
## Run 10 stress 0.1787986
## Run 11 stress 0.1917546
## Run 12 stress 0.1987336
## Run 13 stress 0.1880654
## Run 14 stress 0.2250408
## Run 15 stress 0.192163
## Run 16 stress 0.1791452
## Run 17 stress 0.1741506
## ... New best solution
## ... Procrustes: rmse 6.130856e-06 max resid 2.208525e-05
## ... Similar to previous best
## Run 18 stress 0.1791815
## Run 19 stress 0.1791813
## Run 20 stress 0.1880656
## *** Solution reached
BCI_NMDS##
## Call:
## metaMDS(comm = jacBCI, k = 2)
##
## global Multidimensional Scaling using monoMDS
##
## Data: jacBCI
## Distance: jaccard
##
## Dimensions: 2
## Stress: 0.1741506
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation
## Species: scores missing
# Stress value
BCI_NMDS$stress## [1] 0.1741506
The differences between the 2D configuration and actual distances are output as a stress value, indicating how well the reduction to 2 dimensions worked. Rule of thumb: stress > 0.05 provides excellent representation in the reduced space, > 0.1 very good, > 0.2 good, and stress > 0.3 provides poor representation. According to Clarke (1993), stress values as low as 0.2 should be treated with caution.
If the stress value is too high, more dimensions are needed. How many axes are needed to maximize the quality of the space can be determined with the help of a screeplot.
# Manual for-loop to create the screeplot
scree_plot <- c()
for(stress in 1:12){
tmp_NMDS <- metaMDS(jacBCI, k = stress)
scree_plot <- rbind(scree_plot,
data.frame(k = stress,
stress = tmp_NMDS$stress))
}plot(scree_plot$k, scree_plot$stress, pch = 16, type = "b",
xlab = "Number of Dimensions", ylab = "Stress",
main = "NMDS stress plot")
abline(h = 0.2, lty = 2) # acceptable threshold for NMDSLet us compare the Shepard diagrams for an NMDS with \(k=2\) dimensions and \(k=12\) dimensions.
A Shepard diagram is a plot opposing the observed dissimilarity for each pair of plots to the dissimilarity within the k-dimensional space.
# Let's perform the NMDS with 12 axes
BCI_NMDS12 <- metaMDS(jacBCI, k = 12)## Run 0 stress 0.02751352
## Run 1 stress 0.02829277
## Run 2 stress 0.02816737
## Run 3 stress 0.02868255
## Run 4 stress 0.02845126
## Run 5 stress 0.02867823
## Run 6 stress 0.02842345
## Run 7 stress 0.02782664
## ... Procrustes: rmse 0.02188365 max resid 0.05968413
## Run 8 stress 0.02833658
## Run 9 stress 0.02775726
## ... Procrustes: rmse 0.02431709 max resid 0.05269706
## Run 10 stress 0.02797396
## ... Procrustes: rmse 0.02537938 max resid 0.05104471
## Run 11 stress 0.02800947
## ... Procrustes: rmse 0.03584418 max resid 0.06977831
## Run 12 stress 0.02877972
## Run 13 stress 0.02860401
## Run 14 stress 0.02758956
## ... Procrustes: rmse 0.02479879 max resid 0.06172679
## Run 15 stress 0.02789097
## ... Procrustes: rmse 0.02267196 max resid 0.04073355
## Run 16 stress 0.02753475
## ... Procrustes: rmse 0.02289932 max resid 0.06964807
## Run 17 stress 0.0290389
## Run 18 stress 0.02766346
## ... Procrustes: rmse 0.02566302 max resid 0.06042142
## Run 19 stress 0.02801252
## ... Procrustes: rmse 0.02685743 max resid 0.06494833
## Run 20 stress 0.02877263
## *** No convergence -- monoMDS stopping criteria:
## 20: no. of iterations >= maxit
# Shepard diagrams
par(mfrow = c(1, 2))
stressplot(BCI_NMDS, main = "2 axes")
stressplot(BCI_NMDS12, main = "12 axes")Now we can plot the NMDS. To make it clear that the NMDS is based on a distance matrix, we have applied the metaMDS() function to a distance matrix so far. However, the function can also be applied directly to the species-by-sites matrix and will then calculate the distances internally for a desired distance measure. This has the advantage that the species are available for plotting.
BCI_NMDS <- metaMDS(BCI, k = 2, distance = "jaccard")We can now plot the NMDS, with 2 axes based on Jaccard distance.
par(mfrow = c(1, 1))
plot(BCI_NMDS)#, type = "t") # type = 't' adds text labels.# sites only
plot(BCI_NMDS, display = "sites", type = "t", main = "Goodness of fit")
# size reflecting goodness of fit per site (bigger = worse fit)
points(BCI_NMDS, display = "sites", cex = goodness(BCI_NMDS)*200)
title(main = "Goodness of fit")# Ordination plot function especially for congested plots
ordiplot(BCI_NMDS, type = "n")
# adding text or points to ordination plots
orditorp(BCI_NMDS, display = "species", col = "firebrick3", air = 0.01)
orditorp(BCI_NMDS, display = "sites", cex = 1.25, air = 0.01)Although NMDS is a non-metric ordination procedure, the result (i.e., the coordinates of species and “sites” along the axes) is metric. Environmental variables can therefore be correlated with the axes to investigate whether they influence species assemblages.
We want to look at soil data for the BCI subplots for this purpose. Please download the BCI soil data BCI_soil.txt from Stud-IP and put it in your data directory in your project folder.
# Import data
BCIsoil <- read.table("data/environment/soil/BCI_soil.txt", row.names = 1)
head(BCIsoil)## x y Al B Ca Cu Fe K Mg
## 1 50 50 901.0908 0.79448 1680.021 6.20312 135.2870 141.88128 279.1291
## 2 50 150 954.2488 0.66968 1503.365 6.03148 141.8080 137.23932 280.4524
## 3 50 250 1114.1122 0.59516 1182.311 6.79768 157.0878 98.69056 230.3973
## 4 50 350 1023.5793 0.56780 1558.020 6.63400 153.1746 98.36412 228.9468
## 5 50 450 1001.8848 0.39876 1242.251 6.44428 149.2509 94.07208 202.6820
## 6 150 50 1091.4672 0.73120 1441.977 6.49552 173.8682 131.89280 276.5010
## Mn P Zn N N.min. pH
## 1 266.9997 1.95248 2.96948 18.46500 -3.88544 4.32432
## 2 320.4786 2.24740 2.53208 21.59896 5.64388 4.37548
## 3 445.0708 1.95484 2.24672 20.24516 -4.06408 4.34700
## 4 407.7580 2.63444 2.44284 20.84232 7.89012 4.46112
## 5 250.5403 1.86356 2.13748 16.94500 8.53716 4.40128
## 6 477.3249 1.61612 2.63148 20.29812 4.38948 4.57252
We can now visualize how are distributed pH values across BCI plot:
# Visualization
BCIsoil[1:10, c("x", "y", "pH")]## x y pH
## 1 50 50 4.32432
## 2 50 150 4.37548
## 3 50 250 4.34700
## 4 50 350 4.46112
## 5 50 450 4.40128
## 6 150 50 4.57252
## 7 150 150 4.55972
## 8 150 250 4.41168
## 9 150 350 4.53336
## 10 150 450 4.55500
matrix(BCIsoil$pH, ncol = 5, byrow = TRUE) # coordinates reordered with pH## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.32432 4.37548 4.34700 4.46112 4.40128
## [2,] 4.57252 4.55972 4.41168 4.53336 4.55500
## [3,] 4.71792 4.29640 4.12084 4.12820 4.15760
## [4,] 4.50452 4.46092 4.05204 4.24652 4.33736
## [5,] 4.77492 4.67260 4.29220 4.69712 4.74796
## [6,] 4.96144 4.97480 4.84068 4.75024 4.75976
## [7,] 5.00428 5.15908 5.05812 5.05132 4.85808
## [8,] 4.93364 4.93540 4.88152 4.93488 4.77104
## [9,] 4.89252 4.82900 4.82804 4.94864 5.02364
## [10,] 4.68412 4.52992 4.87296 5.00388 5.02052
filled.contour(x = seq(50, 950, by = 100), y = seq(50, 450, by = 100),
matrix(BCIsoil$pH, ncol = 5, byrow = TRUE),
color.palette = terrain.colors,
main = "pH", xlab ="(m)", ylab ="(m)")We now want to “fit” the soil variables to the NMDS axes.
ef <- envfit(BCI_NMDS,
BCIsoil[, c("Al","B","Ca","Cu","Fe","K","Mg","Mn",
"P","Zn","N","N.min.","pH")],
permu = 999)
ef##
## ***VECTORS
##
## NMDS1 NMDS2 r2 Pr(>r)
## Al 0.92381 -0.38286 0.3717 0.001 ***
## B -0.77370 0.63356 0.5822 0.001 ***
## Ca -0.79091 0.61193 0.3949 0.001 ***
## Cu -0.99128 0.13179 0.3904 0.001 ***
## Fe -0.72673 0.68692 0.1842 0.008 **
## K -0.80044 0.59941 0.4944 0.001 ***
## Mg -0.69865 0.71546 0.2492 0.002 **
## Mn -0.98717 0.15969 0.3460 0.001 ***
## P 0.71437 0.69977 0.2868 0.001 ***
## Zn -0.53224 0.84660 0.4979 0.001 ***
## N -0.56259 0.82673 0.2636 0.003 **
## N.min. -0.42818 0.90369 0.4127 0.001 ***
## pH -0.68068 0.73258 0.7380 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
Mean concentrations of soil nutrients in 50x50 m quadrats were fitted to the ordinations to test the relationships between species composition and soil nutrients. The orientation of arrows indicates the direction in ordination space in which the soil variables change most rapidly and in which they have maximum correlation with the ordination configuration, whereas the length of the arrows indicates the rate of change.
plot(BCI_NMDS, display = "sites")
plot(ef, p.max = 0.1)In a Hierarchical Cluster Analysis, the plots are either gradually merged into groups (agglomerative; bottom-up) or divided into groups (divisive, top-down). Here, there are different procedures that use different distance and similarity measures to maximize differences between clusters or similarity within clusters. In contrast to non-hierarchical cluster analyses, this results in a nested structure (smaller clusters fall into larger clusters) that can be represented as a dendrogram. The number of clusters does not have to be defined beforehand, but can be chosen based on the dendrogram.
# Agglomerative: Maximizes the distance between clusters
BCIclust <- hclust(jacBCI, method = "complete")
plot(BCIclust, xlab = "", cex = 0.7, ann = FALSE)
abline(h = 0.78, lwd = 2, col = "black", lty = 2)The dendrogram can be cut at a certain height to get an appropriate number of clusters (black dashed line). This can also be optimized to get the ideal number of clusters, but we won’t go into that here. Instead, we choose three clusters.
BCIclust_3 <- cutree(BCIclust, k = 3)
# matrix(names(BCIclust_3), ncol = 5, byrow = TRUE)
# How many plots per cluster?
table(BCIclust_3)## BCIclust_3
## 1 2 3
## 26 19 5
# Visualization
# See https://colorbrewer2.org/ for better color gradients
image(x = seq(50, 950, by = 100), y = seq(50, 450, by = 100),
z = matrix(BCIclust_3, ncol = 5, byrow = TRUE),
col = c("#fc8d59", "#ffffbf", "#91bfdb"),
xlab = "(m)", ylab = "(m)", main = "Clusters on BCI")
# add legend
legend(x = 900, y = 500,
# grconvertX(1, "device"), grconvertY(1, "device"),
legend = c("Cl. 1", "Cl. 2", "Cl. 3"),
fill = c("#fc8d59", "#ffffbf", "#91bfdb"), xpd = NA)We can now mark the plots in the NMDS according to their clusters. Where are the identified clusters on the NMDS visualization?
plot(BCI_NMDS, display = "sites", type = "n")
ordihull(BCI_NMDS, groups = BCIclust_3,
draw = "polygon", col = c("#fc8d59", "#ffffbf", "#91bfdb"),
label = FALSE)
points(BCI_NMDS, display = "sites", pch = 21, col = "black",
bg = c("#fc8d59", "#ffffbf", "#91bfdb")[BCIclust_3])
legend(x = 0.4, y = -0.2, legend = c(paste0("Cl. ", seq(1:3))),
col = c("#fc8d59", "#ffffbf", "#91bfdb"), pch = 16)Tutorial for ordination with vegan
Supplementary: https://www.pnas.org/content/suppl/2007/01/09/0604666104.DC1