Preparation: Please open your RStudio project and
download the new data () for today from Stud-IP
and unzip and copy them into your .Rproj folder ‘/data/’. You can use
getwd()
to locate your current working directory, which
should be your project folder. Please install the following R-packages
using install.packages()
:
dplyr
RColorBrewer
psych
ade4
fundiversity
sf
ggplot2
GGally
FactoMineR
cowplot
bioregion
If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:
install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day8.html", destfile = file.path(dir, "Day8.html"))
htmlFile <- file.path(dir, "Day8.html")
rstudioapi::viewer(htmlFile)
Now you can conveniently copy code from the viewer into your script.
Load R packages & island data set
library("ade4") # Dissimilarity matrix and PCoA
library("dplyr") # data.frame operations
library("FactoMineR") # PCA
library("fundiversity") # Functional diversity
library("ggplot2") # plot
library("GGally") # pair plot
library("psych") # pair plot
library("RColorBrewer") # color gradients
library("sf") # spatial analyses
library("cowplot") # combine several plots
library("bioregion") # conversion between long and wide formats
A database of palm traits has recently been published. These data come from Kissling, W. D., Balslev, H., Baker, W. J., Dransfield, J., Göldel, B., Lim, J. Y., Onstein, R. E., & Svenning, J.-C. (2019). PalmTraits 1.0, a species-level functional trait database of palms worldwide. Scientific Data, 6(1), 1–13.
tra <- read.table("data/palm_traits/PalmTraits_1.0.txt",
stringsAsFactors = FALSE, sep = "\t", header = TRUE)
dim(tra)
## [1] 2557 29
tra[1:2, ]
## SpecName accGenus accSpecies PalmTribe PalmSubfamily
## 1 Acanthophoenix crinita Acanthophoenix crinita Areceae Arecoideae
## 2 Acanthophoenix rousselii Acanthophoenix rousselii Areceae Arecoideae
## Climbing Acaulescent Erect StemSolitary StemArmed LeavesArmed MaxStemHeight_m
## 1 0 0 1 1 1 1 10
## 2 0 0 1 1 1 1 25
## MaxStemDia_cm UnderstoreyCanopy MaxLeafNumber Max_Blade_Length_m
## 1 20 canopy 15 2.3
## 2 30 canopy NA 3.0
## Max_Rachis_Length_m Max_Petiole_length_m AverageFruitLength_cm
## 1 NA NA 0.65
## 2 NA NA 2.00
## MinFruitLength_cm MaxFruitLength_cm AverageFruitWidth_cm MinFruitWidth_cm
## 1 0.6 0.7 0.5 NA
## 2 NA NA 0.8 NA
## MaxFruitWidth_cm FruitSizeCategorical FruitShape FruitColorDescription
## 1 NA small black
## 2 NA small ovoid black
## MainFruitColors Conspicuousness
## 1 black cryptic
## 2 black cryptic
When using a functional trait table, we should look at:
- trait type (continuous, ordinal, categorical, binary)
- trait coverage (percentage of species having a trait value)
- trait distribution (if too asymmetrical, the trait can be transformed)
- trait correlation (are some traits carrying the same information?)
- ecological significance (is the trait varying against environment? biotic variable? physiological process?)
Let’s go through these points.
The traits available relate to growth form and habit, armature, stem size, leaves and fruits. They can be either binary traits (0 or 1, sometimes 2 when one species varies in that trait), categorical or continuous.
str(tra)
## 'data.frame': 2557 obs. of 29 variables:
## $ SpecName : chr "Acanthophoenix crinita" "Acanthophoenix rousselii" "Acanthophoenix rubra" "Acoelorrhaphe wrightii" ...
## $ accGenus : chr "Acanthophoenix" "Acanthophoenix" "Acanthophoenix" "Acoelorrhaphe" ...
## $ accSpecies : chr "crinita" "rousselii" "rubra" "wrightii" ...
## $ PalmTribe : chr "Areceae" "Areceae" "Areceae" "Trachycarpeae" ...
## $ PalmSubfamily : chr "Arecoideae" "Arecoideae" "Arecoideae" "Coryphoideae" ...
## $ Climbing : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Acaulescent : int 0 0 0 0 0 0 1 0 1 0 ...
## $ Erect : int 1 1 1 1 1 1 0 1 0 1 ...
## $ StemSolitary : int 1 1 1 0 1 1 1 1 1 1 ...
## $ StemArmed : int 1 1 1 0 1 1 1 1 1 1 ...
## $ LeavesArmed : int 1 1 1 1 1 1 1 1 1 1 ...
## $ MaxStemHeight_m : num 10 25 15 9.1 12 18 0 NA 0 NA ...
## $ MaxStemDia_cm : num 20 30 18 15 50 35 NA NA NA NA ...
## $ UnderstoreyCanopy : chr "canopy" "canopy" "canopy" "canopy" ...
## $ MaxLeafNumber : int 15 NA 20 25 30 15 NA NA 6 NA ...
## $ Max_Blade_Length_m : num 2.3 3 3.1 1.3 3.5 3 NA NA 0.9 NA ...
## $ Max_Rachis_Length_m : num NA NA 3 0.7 2.5 NA NA NA 0.54 NA ...
## $ Max_Petiole_length_m : num NA NA NA 0.65 NA 0.65 NA NA 0.51 NA ...
## $ AverageFruitLength_cm: num 0.65 2 1 0.7 4.25 2.5 2 NA 2.25 4.6 ...
## $ MinFruitLength_cm : num 0.6 NA NA NA 3.5 NA NA NA 1.5 3.8 ...
## $ MaxFruitLength_cm : num 0.7 NA NA NA 5 NA NA NA 3 5.4 ...
## $ AverageFruitWidth_cm : num 0.5 0.8 0.7 0.7 4.6 1.8 2 NA 2.25 4.6 ...
## $ MinFruitWidth_cm : num NA NA NA 0.5 3.8 NA NA NA 1.5 3.8 ...
## $ MaxFruitWidth_cm : num NA NA NA 0.9 5.4 NA NA NA 3 5.4 ...
## $ FruitSizeCategorical : chr "small" "small" "small" "small" ...
## $ FruitShape : chr "" "ovoid" "ovoid" "ovoid" ...
## $ FruitColorDescription: chr "black" "black" "black" "orange-brown; becomming black" ...
## $ MainFruitColors : chr "black" "black" "black" "brown; black" ...
## $ Conspicuousness : chr "cryptic" "cryptic" "cryptic" "cryptic" ...
Percentage of species having a trait value informed per trait.
nrow(tra) # total number of species
## [1] 2557
# Continuous traits
summary(tra$AverageFruitWidth_cm)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.200 0.750 1.050 1.594 1.800 20.000 563
summary(tra$MaxStemHeight_m)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 2.50 6.00 10.86 15.00 170.00 446
# Binary traits
summary(tra$Climbing)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2198 0.0000 2.0000
# Categorical traits
summary(tra$FruitSizeCategorical)
## Length Class Mode
## 2557 character character
table(tra$FruitSizeCategorical)
##
## large small
## 505 251 1801
To visualize the distribution of a specific trait value, we can plot
a histogram, a boxplot or a violin plot.
Side note
The R
Graph Gallery provides many nice examples of plot types along with R
code.
plot_grid(
ggplot(tra) +
geom_boxplot(aes(x = 1, y = MaxStemHeight_m)),
ggplot(tra) +
geom_violin(aes(x = 1, y = MaxStemHeight_m)),
ggplot(tra) +
geom_histogram(aes(MaxStemHeight_m)),
nrow = 1)
## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_ydensity()`).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_bin()`).
For categorical and binary traits, we can use the function
table()
to retrieve the proportion of each value.
The next chunk illustrates for example how to plot the joined proportion
of palm species with armed leaves and stems.
table(tra$StemArmed); table(tra$LeavesArmed)
##
## 0 1
## 2297 205
##
## 0 1
## 1422 1015
table(tra$StemArmed, tra$LeavesArmed)
##
## 0 1
## 0 1406 775
## 1 16 189
# Plot
plot(table(tra$StemArmed, tra$LeavesArmed),
main = "Frequency of Armature classes",
xlab = "Stem Armed", ylab = "Leaves Armed")
prpX <- prop.table(table(tra$StemArmed))
prpY <- prop.table(table(tra$StemArmed, tra$LeavesArmed), margin = 1)
text(prpX/2 * c(1,-1) + 0:1,
prpY/2 * c(-1,-1,1,1) + c(1,1,0,0),
paste("n=", table(tra$StemArmed, tra$LeavesArmed), sep=""))
Some traits can carry a very similar information. To see this, you
can look at the correlation between two traits.
One way to look at
this is to plot one trait directly against another. For example, we have
two traits related to stem size: maximum height and diameter of stem,
and we can see how much they are related.
ggplot(tra, aes(MaxStemHeight_m, MaxStemDia_cm)) +
geom_point() +
annotate(x = 100, y = 100, geom = "label",
label = paste0("Pearson correlation coefficient = ",
round(cor(tra$MaxStemHeight_m,
tra$MaxStemDia_cm,
use = "complete.obs"), 2))) +
labs(x = "Maximum Stem Height (m)",
y = "Maximum Stem Diameter (cm)") +
theme_bw()
## Warning: Removed 647 rows containing missing values or values outside the scale range
## (`geom_point()`).
Additional categorical trait can add further insight into such plots. In the example above, the climbing feature is providing more details.
ggplot(tra, aes(MaxStemHeight_m, MaxStemDia_cm)) +
geom_point(alpha = 0.5,
aes(color = as.factor(Climbing))) +
scale_color_viridis_d("Climbing") +
labs(x = "Maximum Stem Height (m)",
y = "Maximum Stem Diameter (cm)") +
theme_bw() +
theme(legend.position = "bottom")
## Warning: Removed 647 rows containing missing values or values outside the scale range
## (`geom_point()`).
When you have more than two traits to look at, the best way to visualize the correlation between multiple traits is to use a correlation pair plot. Let’s make one for all the traits related to leaves and fruits.
# Pair plot
ggpairs(tra[, c("MaxLeafNumber", "Max_Blade_Length_m",
"Max_Rachis_Length_m", "Max_Petiole_length_m")])
# pairs.panels(tra[, c("MaxLeafNumber", "Max_Blade_Length_m",
# "Max_Rachis_Length_m", "Max_Petiole_length_m")],
# density = FALSE, ellipses = FALSE, hist.col = "grey")
ggpairs(tra[, c("AverageFruitLength_cm", "MinFruitLength_cm",
"MaxFruitLength_cm", "AverageFruitWidth_cm",
"MinFruitWidth_cm", "MaxFruitWidth_cm")])
For the leaves traits, the last three traits are highly correlated. We can just keep the blade length and the number of leaves. Regarding the fruit traits, they are all highly correlated, we’ll take only AverageFruitLength_cm.
Conclusion:
- to keep it easy: we just keep four continuous
traits - stem: height and diameter are highly correlated, we just take
the height
- leaf: many leaf traits are correlated, we’ll take only Blade length
and the maximum number of leaves.
- fruit: many fruit traits are correlated, we’ll take only
AverageFruitLength_cm
We here create a vector with our selected traits.
# Vector of traits we want to keep
tra_select <- c("MaxStemHeight_m", "MaxLeafNumber", "Max_Blade_Length_m",
"AverageFruitLength_cm")
And then we make a species x trait table (species in rows and
traits in columns).
# Only species column and traits of interest
sp_tra <- tra[, c("SpecName", tra_select)]
Let’s make a pair plot once again to ensure our selected traits are not highly correlated.
# Distribution of continuous traits
ggpairs(sp_tra[, tra_select])
The correlation coefficients are all below 65%. However, some traits
are highly asymmetrical.
This can heavily affect the construction of functional spaces and the
calculation of functional metrics.
We therefore
log-transform all the traits.
# Log-transforming all the continuous traits
sp_tra$log_height <- log(sp_tra$MaxStemHeight_m + 1) # adding 1 here to avoid -Inf values
sp_tra$log_leaf_nb <- log(sp_tra$MaxLeafNumber)
sp_tra$log_blade <- log(sp_tra$Max_Blade_Length_m)
sp_tra$log_fruit_length <- log(sp_tra$AverageFruitLength_cm)
# Plot
ggplot(sp_tra, aes(AverageFruitLength_cm, log_fruit_length)) +
geom_point() +
labs(x = "Fruit length (cm)", y = "Log-transformed fruit length") +
theme_bw()
## Warning: Removed 505 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Change vector of traits of interest
tra_select <- c("log_height", "log_leaf_nb", "log_blade", "log_fruit_length")
We here merge the trait data with the distribution data from Kreft, H.,
Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity patterns
in Neotropical palms. Ecography, 29, 21-30.
We here import a transformed version of the data to speed up the
calculations. Now each cell is at a 2 degrees resolution, and species
occurrences have been aggregated into these bigger cells. See practical
7 for more details.
species <- readRDS("data/palm_species_per_gridcell_2degrees.rds")
length(unique(species$new_ID)) # number of grid cells
## [1] 479
n_distinct(species$species) # number of species
## [1] 545
We can remove from our trait table all the species that are not
present in our species distribution table.
dim(sp_tra[which(sp_tra$SpecName %in% unique(species$species)), ])
## [1] 0 9
head(species$species)
## [1] "Ammandra_natalia" "Ammandra_natalia" "Ammandra_natalia" "Ammandra_natalia"
## [5] "Ammandra_natalia" "Ammandra_natalia"
head(sp_tra$SpecName)
## [1] "Acanthophoenix crinita" "Acanthophoenix rousselii"
## [3] "Acanthophoenix rubra" "Acoelorrhaphe wrightii"
## [5] "Acrocomia aculeata" "Acrocomia crispa"
# Replace space with underscore in trait table before merging
sp_tra$species <- gsub(" ", "_", sp_tra$SpecName)
table(sp_tra$species %in% species$species)
##
## FALSE TRUE
## 2095 462
sp_tra <- sp_tra[which(sp_tra$species %in% species$species), ]
dim(sp_tra)
## [1] 462 10
sp_tra[1:2, ]
## SpecName MaxStemHeight_m MaxLeafNumber Max_Blade_Length_m
## 5 Acrocomia aculeata 12 30 3.5
## 9 Acrocomia hassleri 0 6 0.9
## AverageFruitLength_cm log_height log_leaf_nb log_blade log_fruit_length
## 5 4.25 2.564949 3.401197 1.2527630 1.4469190
## 9 2.25 0.000000 1.791759 -0.1053605 0.8109302
## species
## 5 Acrocomia_aculeata
## 9 Acrocomia_hassleri
Since our distribution data is linked to a georeferenced grid cell, we can map the distribution of our traits.
We first import the georeferenced grid and the outline of Americas. The CRS is missing, and has to be filled manually. Latitude and longitude centroids have already been calculated.
# Spatial data
grid <- st_read("data/2degrees_grid/data_2degrees_grid_.shp")
americas <- st_read("data/americas/americas.shp")
## Reading layer `data_2degrees_grid_' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\2degrees_grid\data_2degrees_grid_.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 479 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS: NA
## Reading layer `americas' from data source
## `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS: NA
# coordinate system
st_crs(grid) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
We here aim at mapping the distribution of one trait. We do it here
with the maximal stem height.
First, we merge the species x grid cell table with our trait table.
# Merging palm height with spatial data frame
species_height <- left_join(
species[, c("new_ID", "species", "Long", "Lat")],
sp_tra[, c("species", "MaxStemHeight_m")],
by = "species")
head(species_height)
## new_ID species Long Lat MaxStemHeight_m
## 1 665 Ammandra_natalia -78.97865 -1.984349 NA
## 2 701 Ammandra_natalia -77.00066 -2.000000 NA
## 3 737 Ammandra_natalia -75.00066 -2.000000 NA
## 4 664 Ammandra_natalia -78.99632 -4.004123 NA
## 5 700 Ammandra_natalia -77.00066 -4.000000 NA
## 6 736 Ammandra_natalia -75.00066 -4.000000 NA
We can then calculate the average species height per grid cell.
# Calculating the average height per cell with tapply function
head(tapply(species_height$MaxStemHeight_m,
species_height$new_ID,
function(x) mean(x, na.rm = TRUE))) # argument na.rm necessary here
## 1009 1010 1011 1012 1013 1014
## 6.00000 9.00000 11.66667 12.00000 16.00000 17.40000
# With dplyr
grid_mean_height <- species_height %>%
group_by(new_ID) %>%
summarise(mean_height = mean(MaxStemHeight_m, na.rm = TRUE))
dim(grid_mean_height); head(grid_mean_height)
## [1] 479 2
## # A tibble: 6 × 2
## new_ID mean_height
## <chr> <dbl>
## 1 1009 6
## 2 1010 9
## 3 1011 11.7
## 4 1012 12
## 5 1013 16
## 6 1014 17.4
Once this merge has been done, we can map the average height per cell.
# Plot
grid$new_ID <- as.character(grid$new_ID)
grid <- left_join(grid, grid_mean_height, by = "new_ID")
ggplot(grid) +
geom_sf(aes(fill = mean_height), color = NA) +
geom_sf(data = americas, color = "black", fill = NA) +
scale_fill_viridis_c("Average height (m)") +
theme_bw()
We saw that not all species have a trait value. This lack of
information can be spatially distributed in an uneven way. We therefore
need to make an assessment of the spatial trait coverage.
tail(tapply(species_height$MaxStemHeight_m,
species_height$new_ID,
function(x) sum(!is.na(x))/length(x))) # coverage calculation
## 990 991 992 993 994 995
## 0.8958333 0.8809524 0.8571429 0.8604651 0.8484848 0.9333333
grid_coverage_height <- species_height %>%
group_by(new_ID) %>%
summarise(coverage_height = sum(!is.na(MaxStemHeight_m))/
length(MaxStemHeight_m))
dim(grid_coverage_height); tail(grid_coverage_height)
## [1] 479 2
## # A tibble: 6 × 2
## new_ID coverage_height
## <chr> <dbl>
## 1 990 0.896
## 2 991 0.881
## 3 992 0.857
## 4 993 0.860
## 5 994 0.848
## 6 995 0.933
# Plot
grid <- left_join(grid, grid_coverage_height, by = "new_ID")
ggplot(grid) +
geom_sf(aes(fill = coverage_height), color = NA) +
geom_sf(data = americas, color = "black", fill = NA) +
scale_fill_viridis_c("Coverage (%)", option = "E") +
theme_bw()
We can see that the coverage has an influence on the conclusion we
drew before.
When we have several functional traits, we can compute a set of
functional diversity indices that can inform us on the assembly
processes within a given assemblage of species.
We here focus on the following indices: functional richness, functional evenness, dispersion and divergence (see the lecture for definition of these indices).
We here at calculating and then mapping the functional richness per
grid cell using the fundiversity
package.
# Before using the functions from fundiversity, we need to build a
# species-by-sites matrix
library("bioregion")
cell_sp <- net_to_mat(species[, c("new_ID", "species")])
# cell_sp <- as.matrix(table(species$new_ID, species$species))
cell_sp[1:3, 1:3]
## Ammandra_natalia Ammandra_decasperma Ammandra_dasyneura
## 665 1 0 0
## 701 1 0 0
## 737 1 0 0
# And we use the selected traits from our species x trait table
rownames(sp_tra) <- sp_tra$species
sp_tra[1:2, tra_select]
## log_height log_leaf_nb log_blade log_fruit_length
## Acrocomia_aculeata 2.564949 3.401197 1.2527630 1.4469190
## Acrocomia_hassleri 0.000000 1.791759 -0.1053605 0.8109302
# Functional richness
fd_palm <- fd_fric(traits = sp_tra[, tra_select],
sp_com = cell_sp)
## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species
## Warning in fd_fric(traits = sp_tra[, tra_select], sp_com = cell_sp): Some sites
## had less species than traits so returned FRic is 'NA'
Many cells are not having a Frich value.
dim(fd_palm)
## [1] 479 2
summary(fd_palm$FRic)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00021 1.45713 4.79481 5.47927 8.86673 17.40310 108
We can now add to this table further indices of functional diversity.
# Functional dispersion
fd_palm <- left_join(fd_palm,
fd_fdis(traits = sp_tra[, tra_select],
sp_com = cell_sp),
by = "site")
## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species
# Functional divergence
fd_palm <- left_join(fd_palm,
fd_fdiv(traits = sp_tra[, tra_select],
sp_com = cell_sp),
by = "site")
## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species
# Functional evenness
fd_palm <- left_join(fd_palm,
fd_feve(traits = sp_tra[, tra_select],
sp_com = cell_sp),
by = "site")
## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species
dim(fd_palm); head(fd_palm)
## [1] 479 5
## site FRic FDis FDiv FEve
## 1 665 7.943896 1.313072 0.7576665 0.8657444
## 2 701 10.795302 1.424242 0.7674720 0.8607406
## 3 737 16.218305 1.528642 0.7683450 0.8703439
## 4 664 6.537796 1.284308 0.7331159 0.8687230
## 5 700 11.512031 1.466312 0.7735406 0.8769224
## 6 736 16.681220 1.555033 0.7703221 0.8795797
By doing a pair plot, we can have a look at the correlations between these different indices, and also with species richness.
# Adding species richness to grid
palm_SR <- species %>%
group_by(new_ID) %>%
summarise(SR = n())
fd_palm <- left_join(fd_palm, palm_SR, by = c("site" = "new_ID"))
ggpairs(fd_palm[, c("SR", "FRic", "FDis", "FDiv", "FEve")])
Like for phylogenetic diversity, there is a positive correlation
between functional richness and species richness. To take this bias into
account, we should perform a null model. We won’t do it here, but the
principle is to reshuffle the trait values per cell a certain number of
times and then to compute the Standardized Effect Size (see practical
7).
Like for individual traits, we can also map the FD per cell.
# Merge results with grid
grid <- left_join(grid, fd_palm, by = c("new_ID" = "site"))
plot_grid(
ggplot(grid) +
geom_sf(aes(fill = SR), color = NA) +
geom_sf(data = americas, color = "black", fill = NA) +
scale_fill_viridis_c("Species Richness") +
labs(title = "Species richness") +
theme_bw() +
theme(legend.position = "bottom"),
ggplot(grid) +
geom_sf(aes(fill = FRic), color = NA) +
geom_sf(data = americas, color = "black", fill = NA) +
scale_fill_viridis_c("Functional Richness") +
labs(title = "Functional richness") +
theme_bw() +
theme(legend.position = "bottom"),
ggplot(grid) +
geom_sf(aes(fill = FDis), color = NA) +
geom_sf(data = americas, color = "black", fill = NA) +
scale_fill_viridis_c("Functional dispersion") +
labs(title = "Functional dispersion") +
theme_bw() +
theme(legend.position = "bottom"),
nrow = 1)
Another classical analysis in functional ecology is to build trait spaces that summarize the main axes of variations between species.
The main objective is then to reduce dimensionality, in order to get the main axes of variation between species, through the use of multivariate analyses.
Since we only have continuous traits, we can build this space using a
PCA.
Side-node
If there are some categorical traits, you can first calculate a distance
matrix and then compute a PCoA (see Bonus if interested).
We use the FactoMineR
package to build the PCA.
sp_tra_pca <- sp_tra[, tra_select]
pca_palm <- PCA(sp_tra_pca, scale.unit = FALSE, graph = FALSE)
## Warning in PCA(sp_tra_pca, scale.unit = FALSE, graph = FALSE): Missing values
## are imputed by the mean of the variable: you should use the imputePCA function
## of the missMDA package
dim(sp_tra_pca)
## [1] 462 4
dim(sp_tra_pca[complete.cases(sp_tra_pca), ])
## [1] 417 4
# Removing NAs
sp_tra_pca <- sp_tra_pca[complete.cases(sp_tra_pca), ]
pca_palm <- PCA(sp_tra_pca, scale.unit = FALSE, graph = TRUE)
# Look at percentage of variation explained by the PCA axes
pca_palm$eig
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 1.4851133 59.581954 59.58195
## comp 2 0.5672445 22.757547 82.33950
## comp 3 0.2548604 10.224863 92.56436
## comp 4 0.1853373 7.435636 100.00000
If needed, the PCA coordinates can be extracted manually in order to
customize the plots with ggplot2
.
# PCA plot
# Individual coordinates with categories (same order as in original data)
PCs <- as.data.frame(pca_palm$ind$coord[, 1:2])
# Variables coordinates
vPCs <- data.frame(pca_palm$var$coord[, 1:2])
# Customary circle
angle <- data.frame(x = sin(seq(-pi, pi, length = 50)),
y = cos(seq(-pi, pi, length = 50)))
# Plot
ggplot(PCs, aes(Dim.1, Dim.2)) +
geom_point(alpha = 0.5) +
geom_path(data = angle, inherit.aes = TRUE, aes(x, y), colour = "grey70") +
geom_text(data = vPCs, aes(Dim.1, Dim.2, label = rownames(vPCs)),
color = "orange2", size = 3, fontface = "bold") +
labs(x = paste0("PC1 (", round(pca_palm$eig[1, 2], 1), " %)"),
y = paste0("PC2 (", round(pca_palm$eig[2, 2], 1), " %)")) +
geom_segment(data = vPCs,
aes(x = 0, y = 0, xend = Dim.1, yend = Dim.2),
arrow = arrow(length = unit(1/2, "picas")),
color = "black", linewidth = 1) +
coord_equal() +
theme_bw()
Pictures of palms: http://idtools.org/id/palms/palmid/gallery.php
First step is to compute a distance matrix.
If you had only numerical traits, you could use the Euclidean distances. If doing so, it would be similar to a PCA.
Now, because we also have categorical traits, we cannot simply use the Euclidean distances and we need a way to combine several types of data.
For this, we’ll use the Gower’s distance implemented in the
dist.ktab
function.
# Making a trait table without NAs
tra_select
sp_tra_complete <- sp_tra[, tra_select]
rownames(sp_tra_complete) <- sp_tra$species
sp_tra_complete <- sp_tra_complete[complete.cases(sp_tra_complete), ]
dim(sp_tra_complete); head(sp_tra_complete)
# Gower distance
?dist.ktab
ktab1 <- ktab.list.df(list(sp_tra_complete))
mat_dissim <- dist.ktab(ktab1, type = "Q", option = "scaledBYrange") # scaled quantitative traits
head(mat_dissim)
# n*(n-1)/2 pairwise combinations
all.equal(length(mat_dissim),
((nrow(sp_tra_complete)*(nrow(sp_tra_complete)-1))/2))
PCoA relies on the distances between samples. Therefore, the first
step of a PCoA is the construction of a (dis)similarity matrix. While
PCA is based on Euclidean distances, PCoA can handle (dis)similarity
matrices calculated from quantitative, semi-quantitative, qualitative,
and mixed variables.
In our case, we would get a similar result with a PCA since we only
have continuous traits and we used Euclidean distances in the
dissimilarity matrix.
As always, the choice of (dis)similarity
measure is critical and must be suitable to the data in question. For
abundance data, Bray-Curtis distance is often recommended. You can use
Jaccard index for presence/absence data. When the distance metric is
Euclidean, PCoA is equivalent to PCA.
# ape::pcoa() # other option
pcoa_palm <- dudi.pco(mat_dissim, scannf = FALSE, nf = 6)
Merging coordinates with trait table and plot
# Coordinates of the individuals
head(pcoa_palm$li)
# Barplot of eigenvectors
barplot((pcoa_palm$eig / sum(pcoa_palm$eig))*100)
# Table with the coordinates on the first 2 axes only
coord_sp_pcoa <- pcoa_palm$li[, 1:2]
coord_sp_pcoa$sp <- rownames(coord_sp_pcoa)
head(coord_sp_pcoa)
# Merge back PCoA coordinates to complete trait table
sp_tra_complete$sp <- rownames(sp_tra_complete)
sp_tra_complete <- left_join(sp_tra_complete, coord_sp_pcoa, by = "sp")
head(sp_tra_complete)
# Plot
plot(sp_tra_complete$A1, sp_tra_complete$A2, pch = 16,
main = "Species coordinates on the PCoA",
xlab = paste0("PCoA1 (",
round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[1],
"%)"),
ylab = paste0("PCoA2 (",
round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[2],
"%)"))
# Add the projection of traits on this plot
ordiplot(pcoa_palm,
main = "Species coordinates on the PCoA",
xlab = paste0("PCoA1 (",
round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[1],
"%)"),
ylab = paste0("PCoA2 (",
round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[2],
"%)"))
# ?envfit
ef <- envfit(pcoa_palm, sp_tra_complete[, tra_select])
ef
plot(ef)
The world’s tallest monocotyledon:
tra$SpecName[which(tra$Climbing == 0)][which.max(tra$MaxStemHeight_m[which(tra$Climbing == 0)])]
## [1] "Ceroxylon quindiuense"