1. Trait data
2. Trait maps
- 2.1. Trait geographical distribution
- 2.2. Trait coverage
3. Functional diversity
- 3.1. FD per grid cell
- 3.2. Map of FD
4. Trait space
References
Bonus: PCoA
- X.1. Distances
- X.2. Principal Coordinate Analysis (PCoA)

Preparation: Please open your RStudio project and download the new data () for today from Stud-IP and unzip and copy them into your .Rproj folder ‘/data/’. You can use getwd() to locate your current working directory, which should be your project folder. Please install the following R-packages using install.packages():

dplyr
RColorBrewer
psych
ade4
fundiversity
sf
ggplot2
GGally
FactoMineR
cowplot
bioregion

If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:

install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day8.html", destfile = file.path(dir, "Day8.html"))
htmlFile <- file.path(dir, "Day8.html")
rstudioapi::viewer(htmlFile)

Now you can conveniently copy code from the viewer into your script.

Load R packages & island data set

library("ade4") # Dissimilarity matrix and PCoA
library("dplyr") # data.frame operations
library("FactoMineR") # PCA
library("fundiversity") # Functional diversity
library("ggplot2") # plot
library("GGally") # pair plot
library("psych") # pair plot
library("RColorBrewer") # color gradients
library("sf") # spatial analyses
library("cowplot") # combine several plots
library("bioregion") # conversion between long and wide formats

1. Trait data

1.1. Trait data

A database of palm traits has recently been published. These data come from Kissling, W. D., Balslev, H., Baker, W. J., Dransfield, J., Göldel, B., Lim, J. Y., Onstein, R. E., & Svenning, J.-C. (2019). PalmTraits 1.0, a species-level functional trait database of palms worldwide. Scientific Data, 6(1), 1–13.

tra <- read.table("data/palm_traits/PalmTraits_1.0.txt",
                  stringsAsFactors = FALSE, sep = "\t", header = TRUE)
dim(tra)

## [1] 2557   29

tra[1:2, ]

##                   SpecName       accGenus accSpecies PalmTribe PalmSubfamily
## 1   Acanthophoenix crinita Acanthophoenix    crinita   Areceae    Arecoideae
## 2 Acanthophoenix rousselii Acanthophoenix  rousselii   Areceae    Arecoideae
##   Climbing Acaulescent Erect StemSolitary StemArmed LeavesArmed MaxStemHeight_m
## 1        0           0     1            1         1           1              10
## 2        0           0     1            1         1           1              25
##   MaxStemDia_cm UnderstoreyCanopy MaxLeafNumber Max_Blade_Length_m
## 1            20            canopy            15                2.3
## 2            30            canopy            NA                3.0
##   Max_Rachis_Length_m Max_Petiole_length_m AverageFruitLength_cm
## 1                  NA                   NA                  0.65
## 2                  NA                   NA                  2.00
##   MinFruitLength_cm MaxFruitLength_cm AverageFruitWidth_cm MinFruitWidth_cm
## 1               0.6               0.7                  0.5               NA
## 2                NA                NA                  0.8               NA
##   MaxFruitWidth_cm FruitSizeCategorical FruitShape FruitColorDescription
## 1               NA                small                            black
## 2               NA                small      ovoid                 black
##   MainFruitColors Conspicuousness
## 1           black         cryptic
## 2           black         cryptic

When using a functional trait table, we should look at:
- trait type (continuous, ordinal, categorical, binary)
- trait coverage (percentage of species having a trait value)
- trait distribution (if too asymmetrical, the trait can be transformed)
- trait correlation (are some traits carrying the same information?)
- ecological significance (is the trait varying against environment? biotic variable? physiological process?)

Let’s go through these points.

1.1.1. Trait type

The traits available relate to growth form and habit, armature, stem size, leaves and fruits. They can be either binary traits (0 or 1, sometimes 2 when one species varies in that trait), categorical or continuous.

str(tra)

## 'data.frame':    2557 obs. of  29 variables:
##  $ SpecName             : chr  "Acanthophoenix crinita" "Acanthophoenix rousselii" "Acanthophoenix rubra" "Acoelorrhaphe wrightii" ...
##  $ accGenus             : chr  "Acanthophoenix" "Acanthophoenix" "Acanthophoenix" "Acoelorrhaphe" ...
##  $ accSpecies           : chr  "crinita" "rousselii" "rubra" "wrightii" ...
##  $ PalmTribe            : chr  "Areceae" "Areceae" "Areceae" "Trachycarpeae" ...
##  $ PalmSubfamily        : chr  "Arecoideae" "Arecoideae" "Arecoideae" "Coryphoideae" ...
##  $ Climbing             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Acaulescent          : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ Erect                : int  1 1 1 1 1 1 0 1 0 1 ...
##  $ StemSolitary         : int  1 1 1 0 1 1 1 1 1 1 ...
##  $ StemArmed            : int  1 1 1 0 1 1 1 1 1 1 ...
##  $ LeavesArmed          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ MaxStemHeight_m      : num  10 25 15 9.1 12 18 0 NA 0 NA ...
##  $ MaxStemDia_cm        : num  20 30 18 15 50 35 NA NA NA NA ...
##  $ UnderstoreyCanopy    : chr  "canopy" "canopy" "canopy" "canopy" ...
##  $ MaxLeafNumber        : int  15 NA 20 25 30 15 NA NA 6 NA ...
##  $ Max_Blade_Length_m   : num  2.3 3 3.1 1.3 3.5 3 NA NA 0.9 NA ...
##  $ Max_Rachis_Length_m  : num  NA NA 3 0.7 2.5 NA NA NA 0.54 NA ...
##  $ Max_Petiole_length_m : num  NA NA NA 0.65 NA 0.65 NA NA 0.51 NA ...
##  $ AverageFruitLength_cm: num  0.65 2 1 0.7 4.25 2.5 2 NA 2.25 4.6 ...
##  $ MinFruitLength_cm    : num  0.6 NA NA NA 3.5 NA NA NA 1.5 3.8 ...
##  $ MaxFruitLength_cm    : num  0.7 NA NA NA 5 NA NA NA 3 5.4 ...
##  $ AverageFruitWidth_cm : num  0.5 0.8 0.7 0.7 4.6 1.8 2 NA 2.25 4.6 ...
##  $ MinFruitWidth_cm     : num  NA NA NA 0.5 3.8 NA NA NA 1.5 3.8 ...
##  $ MaxFruitWidth_cm     : num  NA NA NA 0.9 5.4 NA NA NA 3 5.4 ...
##  $ FruitSizeCategorical : chr  "small" "small" "small" "small" ...
##  $ FruitShape           : chr  "" "ovoid" "ovoid" "ovoid" ...
##  $ FruitColorDescription: chr  "black" "black" "black" "orange-brown; becomming black" ...
##  $ MainFruitColors      : chr  "black" "black" "black" "brown; black" ...
##  $ Conspicuousness      : chr  "cryptic" "cryptic" "cryptic" "cryptic" ...

1.1.2. Trait coverage

Percentage of species having a trait value informed per trait.

nrow(tra) # total number of species

## [1] 2557

# Continuous traits
summary(tra$AverageFruitWidth_cm)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.200   0.750   1.050   1.594   1.800  20.000     563

summary(tra$MaxStemHeight_m)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    2.50    6.00   10.86   15.00  170.00     446

# Binary traits
summary(tra$Climbing)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.2198  0.0000  2.0000

# Categorical traits
summary(tra$FruitSizeCategorical)

##    Length     Class      Mode 
##      2557 character character

table(tra$FruitSizeCategorical)

## 
##       large small 
##   505   251  1801

1.1.3. Trait distribution

To visualize the distribution of a specific trait value, we can plot a histogram, a boxplot or a violin plot.

Side note
The R Graph Gallery provides many nice examples of plot types along with R code.

plot_grid(
  ggplot(tra) +
    geom_boxplot(aes(x = 1, y = MaxStemHeight_m)),
  
  ggplot(tra) +
    geom_violin(aes(x = 1, y = MaxStemHeight_m)),
  
  ggplot(tra) +
    geom_histogram(aes(MaxStemHeight_m)),
  nrow = 1)

## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_ydensity()`).

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 446 rows containing non-finite outside the scale range
## (`stat_bin()`).

For categorical and binary traits, we can use the function table() to retrieve the proportion of each value.
The next chunk illustrates for example how to plot the joined proportion of palm species with armed leaves and stems.

table(tra$StemArmed); table(tra$LeavesArmed)

## 
##    0    1 
## 2297  205

## 
##    0    1 
## 1422 1015

table(tra$StemArmed, tra$LeavesArmed)

##    
##        0    1
##   0 1406  775
##   1   16  189

# Plot
plot(table(tra$StemArmed, tra$LeavesArmed),
     main = "Frequency of Armature classes",
     xlab = "Stem Armed", ylab = "Leaves Armed")

prpX <- prop.table(table(tra$StemArmed))
prpY <- prop.table(table(tra$StemArmed, tra$LeavesArmed), margin = 1)
text(prpX/2 * c(1,-1) + 0:1,
     prpY/2 * c(-1,-1,1,1) + c(1,1,0,0),
     paste("n=", table(tra$StemArmed, tra$LeavesArmed), sep=""))

1.1.4. Trait correlation

Some traits can carry a very similar information. To see this, you can look at the correlation between two traits.
One way to look at this is to plot one trait directly against another. For example, we have two traits related to stem size: maximum height and diameter of stem, and we can see how much they are related.

ggplot(tra, aes(MaxStemHeight_m, MaxStemDia_cm)) +
  geom_point() +
  annotate(x = 100, y = 100, geom = "label",
           label = paste0("Pearson correlation coefficient = ",
                          round(cor(tra$MaxStemHeight_m,
                                    tra$MaxStemDia_cm,
                                    use = "complete.obs"), 2))) +
  labs(x = "Maximum Stem Height (m)",
       y = "Maximum Stem Diameter (cm)") +
  theme_bw()

## Warning: Removed 647 rows containing missing values or values outside the scale range
## (`geom_point()`).

Additional categorical trait can add further insight into such plots. In the example above, the climbing feature is providing more details.

ggplot(tra, aes(MaxStemHeight_m, MaxStemDia_cm)) +
  geom_point(alpha = 0.5,
             aes(color = as.factor(Climbing))) +
  scale_color_viridis_d("Climbing") +
  labs(x = "Maximum Stem Height (m)",
       y = "Maximum Stem Diameter (cm)") +
  theme_bw() +
  theme(legend.position = "bottom")

## Warning: Removed 647 rows containing missing values or values outside the scale range
## (`geom_point()`).

When you have more than two traits to look at, the best way to visualize the correlation between multiple traits is to use a correlation pair plot. Let’s make one for all the traits related to leaves and fruits.

Palm leaf

# Pair plot
ggpairs(tra[, c("MaxLeafNumber", "Max_Blade_Length_m",
                "Max_Rachis_Length_m", "Max_Petiole_length_m")])

# pairs.panels(tra[, c("MaxLeafNumber", "Max_Blade_Length_m",
#                      "Max_Rachis_Length_m", "Max_Petiole_length_m")],
#              density = FALSE, ellipses = FALSE, hist.col = "grey")

ggpairs(tra[, c("AverageFruitLength_cm", "MinFruitLength_cm",
                "MaxFruitLength_cm", "AverageFruitWidth_cm",
                "MinFruitWidth_cm", "MaxFruitWidth_cm")])

For the leaves traits, the last three traits are highly correlated. We can just keep the blade length and the number of leaves. Regarding the fruit traits, they are all highly correlated, we’ll take only AverageFruitLength_cm.

1.2. Trait selection

Conclusion:
- to keep it easy: we just keep four continuous traits - stem: height and diameter are highly correlated, we just take the height
- leaf: many leaf traits are correlated, we’ll take only Blade length and the maximum number of leaves.
- fruit: many fruit traits are correlated, we’ll take only AverageFruitLength_cm

We here create a vector with our selected traits.

# Vector of traits we want to keep
tra_select <- c("MaxStemHeight_m", "MaxLeafNumber", "Max_Blade_Length_m",
                "AverageFruitLength_cm")

And then we make a species x trait table (species in rows and traits in columns).

# Only species column and traits of interest
sp_tra <- tra[, c("SpecName", tra_select)]

Let’s make a pair plot once again to ensure our selected traits are not highly correlated.

# Distribution of continuous traits
ggpairs(sp_tra[, tra_select])

The correlation coefficients are all below 65%. However, some traits are highly asymmetrical.
This can heavily affect the construction of functional spaces and the calculation of functional metrics.
We therefore log-transform all the traits.

# Log-transforming all the continuous traits
sp_tra$log_height <- log(sp_tra$MaxStemHeight_m + 1) # adding 1 here to avoid -Inf values
sp_tra$log_leaf_nb <- log(sp_tra$MaxLeafNumber)
sp_tra$log_blade <- log(sp_tra$Max_Blade_Length_m)
sp_tra$log_fruit_length <- log(sp_tra$AverageFruitLength_cm)

# Plot
ggplot(sp_tra, aes(AverageFruitLength_cm, log_fruit_length)) +
  geom_point() +
  labs(x = "Fruit length (cm)", y = "Log-transformed fruit length") +
  theme_bw()

## Warning: Removed 505 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Change vector of traits of interest
tra_select <- c("log_height", "log_leaf_nb", "log_blade", "log_fruit_length")

1.3. Final trait table

We here merge the trait data with the distribution data from Kreft, H., Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity patterns in Neotropical palms. Ecography, 29, 21-30.

We here import a transformed version of the data to speed up the calculations. Now each cell is at a 2 degrees resolution, and species occurrences have been aggregated into these bigger cells. See practical 7 for more details.

species <- readRDS("data/palm_species_per_gridcell_2degrees.rds")

length(unique(species$new_ID)) # number of grid cells

## [1] 479

n_distinct(species$species) # number of species

## [1] 545

We can remove from our trait table all the species that are not present in our species distribution table.

dim(sp_tra[which(sp_tra$SpecName %in% unique(species$species)), ])

## [1] 0 9

head(species$species)

## [1] "Ammandra_natalia" "Ammandra_natalia" "Ammandra_natalia" "Ammandra_natalia"
## [5] "Ammandra_natalia" "Ammandra_natalia"

head(sp_tra$SpecName)

## [1] "Acanthophoenix crinita"   "Acanthophoenix rousselii"
## [3] "Acanthophoenix rubra"     "Acoelorrhaphe wrightii"  
## [5] "Acrocomia aculeata"       "Acrocomia crispa"

# Replace space with underscore in trait table before merging
sp_tra$species <- gsub(" ", "_", sp_tra$SpecName)

table(sp_tra$species %in% species$species)

## 
## FALSE  TRUE 
##  2095   462

sp_tra <- sp_tra[which(sp_tra$species %in% species$species), ]
dim(sp_tra)

## [1] 462  10

sp_tra[1:2, ]

##             SpecName MaxStemHeight_m MaxLeafNumber Max_Blade_Length_m
## 5 Acrocomia aculeata              12            30                3.5
## 9 Acrocomia hassleri               0             6                0.9
##   AverageFruitLength_cm log_height log_leaf_nb  log_blade log_fruit_length
## 5                  4.25   2.564949    3.401197  1.2527630        1.4469190
## 9                  2.25   0.000000    1.791759 -0.1053605        0.8109302
##              species
## 5 Acrocomia_aculeata
## 9 Acrocomia_hassleri

2. Trait maps

Since our distribution data is linked to a georeferenced grid cell, we can map the distribution of our traits.

We first import the georeferenced grid and the outline of Americas. The CRS is missing, and has to be filled manually. Latitude and longitude centroids have already been calculated.

# Spatial data
grid <- st_read("data/2degrees_grid/data_2degrees_grid_.shp")
americas <- st_read("data/americas/americas.shp")

## Reading layer `data_2degrees_grid_' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\2degrees_grid\data_2degrees_grid_.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 479 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS:           NA

## Reading layer `americas' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS:           NA

# coordinate system
st_crs(grid) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"

2.1. Trait geographical distribution

We here aim at mapping the distribution of one trait. We do it here with the maximal stem height.

First, we merge the species x grid cell table with our trait table.

# Merging palm height with spatial data frame
species_height <- left_join(
  species[, c("new_ID", "species", "Long", "Lat")],
  sp_tra[, c("species", "MaxStemHeight_m")],
  by = "species")
head(species_height)

##   new_ID          species      Long       Lat MaxStemHeight_m
## 1    665 Ammandra_natalia -78.97865 -1.984349              NA
## 2    701 Ammandra_natalia -77.00066 -2.000000              NA
## 3    737 Ammandra_natalia -75.00066 -2.000000              NA
## 4    664 Ammandra_natalia -78.99632 -4.004123              NA
## 5    700 Ammandra_natalia -77.00066 -4.000000              NA
## 6    736 Ammandra_natalia -75.00066 -4.000000              NA

We can then calculate the average species height per grid cell.

# Calculating the average height per cell with tapply function
head(tapply(species_height$MaxStemHeight_m,
            species_height$new_ID,
            function(x) mean(x, na.rm = TRUE))) # argument na.rm necessary here

##     1009     1010     1011     1012     1013     1014 
##  6.00000  9.00000 11.66667 12.00000 16.00000 17.40000

# With dplyr
grid_mean_height <- species_height %>%
  group_by(new_ID) %>%
  summarise(mean_height = mean(MaxStemHeight_m, na.rm = TRUE))

dim(grid_mean_height); head(grid_mean_height)

## [1] 479   2

## # A tibble: 6 × 2
##   new_ID mean_height
##   <chr>        <dbl>
## 1 1009           6  
## 2 1010           9  
## 3 1011          11.7
## 4 1012          12  
## 5 1013          16  
## 6 1014          17.4

Once this merge has been done, we can map the average height per cell.

# Plot
grid$new_ID <- as.character(grid$new_ID)
grid <- left_join(grid, grid_mean_height, by = "new_ID")

ggplot(grid) +
  geom_sf(aes(fill = mean_height), color = NA) +
  geom_sf(data = americas, color = "black", fill = NA) +
  scale_fill_viridis_c("Average height (m)") +
  theme_bw()

2.2. Trait coverage

We saw that not all species have a trait value. This lack of information can be spatially distributed in an uneven way. We therefore need to make an assessment of the spatial trait coverage.

tail(tapply(species_height$MaxStemHeight_m,
            species_height$new_ID,
            function(x) sum(!is.na(x))/length(x))) # coverage calculation

##       990       991       992       993       994       995 
## 0.8958333 0.8809524 0.8571429 0.8604651 0.8484848 0.9333333

grid_coverage_height <- species_height %>%
  group_by(new_ID) %>%
  summarise(coverage_height = sum(!is.na(MaxStemHeight_m))/
              length(MaxStemHeight_m))

dim(grid_coverage_height); tail(grid_coverage_height)

## [1] 479   2

## # A tibble: 6 × 2
##   new_ID coverage_height
##   <chr>            <dbl>
## 1 990              0.896
## 2 991              0.881
## 3 992              0.857
## 4 993              0.860
## 5 994              0.848
## 6 995              0.933

# Plot
grid <- left_join(grid, grid_coverage_height, by = "new_ID")

ggplot(grid) +
  geom_sf(aes(fill = coverage_height), color = NA) +
  geom_sf(data = americas, color = "black", fill = NA) +
  scale_fill_viridis_c("Coverage (%)", option = "E") +
  theme_bw()

We can see that the coverage has an influence on the conclusion we drew before.

3. Functional diversity

When we have several functional traits, we can compute a set of functional diversity indices that can inform us on the assembly processes within a given assemblage of species.

We here focus on the following indices: functional richness, functional evenness, dispersion and divergence (see the lecture for definition of these indices).

3.1. FD per grid cell

We here at calculating and then mapping the functional richness per grid cell using the fundiversity package.

# Before using the functions from fundiversity, we need to build a
# species-by-sites matrix
library("bioregion")
cell_sp <- net_to_mat(species[, c("new_ID", "species")])
# cell_sp <- as.matrix(table(species$new_ID, species$species))
cell_sp[1:3, 1:3]

##     Ammandra_natalia Ammandra_decasperma Ammandra_dasyneura
## 665                1                   0                  0
## 701                1                   0                  0
## 737                1                   0                  0

# And we use the selected traits from our species x trait table
rownames(sp_tra) <- sp_tra$species
sp_tra[1:2, tra_select]

##                    log_height log_leaf_nb  log_blade log_fruit_length
## Acrocomia_aculeata   2.564949    3.401197  1.2527630        1.4469190
## Acrocomia_hassleri   0.000000    1.791759 -0.1053605        0.8109302

# Functional richness
fd_palm <- fd_fric(traits = sp_tra[, tra_select],
                   sp_com = cell_sp)

## Removed 45 species with missing trait(s)

## Differing number of species between trait dataset and site-species matrix
## Taking subset of species

## Warning in fd_fric(traits = sp_tra[, tra_select], sp_com = cell_sp): Some sites
## had less species than traits so returned FRic is 'NA'

Many cells are not having a Frich value.

dim(fd_palm)

## [1] 479   2

summary(fd_palm$FRic)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##  0.00021  1.45713  4.79481  5.47927  8.86673 17.40310      108

We can now add to this table further indices of functional diversity.

# Functional dispersion
fd_palm <- left_join(fd_palm,
                     fd_fdis(traits = sp_tra[, tra_select],
                             sp_com = cell_sp),
                     by = "site")

## Removed 45 species with missing trait(s)

## Differing number of species between trait dataset and site-species matrix
## Taking subset of species

# Functional divergence
fd_palm <- left_join(fd_palm,
                     fd_fdiv(traits = sp_tra[, tra_select],
                             sp_com = cell_sp),
                     by = "site")

## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species

# Functional evenness
fd_palm <- left_join(fd_palm,
                     fd_feve(traits = sp_tra[, tra_select],
                             sp_com = cell_sp),
                     by = "site")

## Removed 45 species with missing trait(s)
## Differing number of species between trait dataset and site-species matrix
## Taking subset of species

dim(fd_palm); head(fd_palm)

## [1] 479   5

##   site      FRic     FDis      FDiv      FEve
## 1  665  7.943896 1.313072 0.7576665 0.8657444
## 2  701 10.795302 1.424242 0.7674720 0.8607406
## 3  737 16.218305 1.528642 0.7683450 0.8703439
## 4  664  6.537796 1.284308 0.7331159 0.8687230
## 5  700 11.512031 1.466312 0.7735406 0.8769224
## 6  736 16.681220 1.555033 0.7703221 0.8795797

By doing a pair plot, we can have a look at the correlations between these different indices, and also with species richness.

# Adding species richness to grid
palm_SR <- species %>%
  group_by(new_ID) %>%
  summarise(SR = n())

fd_palm <- left_join(fd_palm, palm_SR, by = c("site" = "new_ID"))

ggpairs(fd_palm[, c("SR", "FRic", "FDis", "FDiv", "FEve")])

Like for phylogenetic diversity, there is a positive correlation between functional richness and species richness. To take this bias into account, we should perform a null model. We won’t do it here, but the principle is to reshuffle the trait values per cell a certain number of times and then to compute the Standardized Effect Size (see practical 7).

3.2. Map of FD

Like for individual traits, we can also map the FD per cell.

# Merge results with grid
grid <- left_join(grid, fd_palm, by = c("new_ID" = "site"))

plot_grid(
  ggplot(grid) +
    geom_sf(aes(fill = SR), color = NA) +
    geom_sf(data = americas, color = "black", fill = NA) +
    scale_fill_viridis_c("Species Richness") +
    labs(title = "Species richness") +
    theme_bw() +
    theme(legend.position = "bottom"),
  ggplot(grid) +
    geom_sf(aes(fill = FRic), color = NA) +
    geom_sf(data = americas, color = "black", fill = NA) +
    scale_fill_viridis_c("Functional Richness") +
    labs(title = "Functional richness") +
    theme_bw() +
    theme(legend.position = "bottom"),
    ggplot(grid) +
    geom_sf(aes(fill = FDis), color = NA) +
    geom_sf(data = americas, color = "black", fill = NA) +
    scale_fill_viridis_c("Functional dispersion") +
    labs(title = "Functional dispersion") +
    theme_bw() +
    theme(legend.position = "bottom"),
  nrow = 1)

4. Trait space

Another classical analysis in functional ecology is to build trait spaces that summarize the main axes of variations between species.

The main objective is then to reduce dimensionality, in order to get the main axes of variation between species, through the use of multivariate analyses.

Since we only have continuous traits, we can build this space using a PCA.
Side-node
If there are some categorical traits, you can first calculate a distance matrix and then compute a PCoA (see Bonus if interested).

We use the FactoMineR package to build the PCA.

sp_tra_pca <- sp_tra[, tra_select]

pca_palm <- PCA(sp_tra_pca, scale.unit = FALSE, graph = FALSE)

## Warning in PCA(sp_tra_pca, scale.unit = FALSE, graph = FALSE): Missing values
## are imputed by the mean of the variable: you should use the imputePCA function
## of the missMDA package

dim(sp_tra_pca)

## [1] 462   4

dim(sp_tra_pca[complete.cases(sp_tra_pca), ])

## [1] 417   4

# Removing NAs
sp_tra_pca <- sp_tra_pca[complete.cases(sp_tra_pca), ]

pca_palm <- PCA(sp_tra_pca, scale.unit = FALSE, graph = TRUE)

# Look at percentage of variation explained by the PCA axes
pca_palm$eig

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.4851133              59.581954                          59.58195
## comp 2  0.5672445              22.757547                          82.33950
## comp 3  0.2548604              10.224863                          92.56436
## comp 4  0.1853373               7.435636                         100.00000

If needed, the PCA coordinates can be extracted manually in order to customize the plots with ggplot2.

# PCA plot
# Individual coordinates with categories (same order as in original data)
PCs <- as.data.frame(pca_palm$ind$coord[, 1:2])

# Variables coordinates
vPCs <- data.frame(pca_palm$var$coord[, 1:2])
# Customary circle
angle <- data.frame(x = sin(seq(-pi, pi, length = 50)),
                    y = cos(seq(-pi, pi, length = 50)))

# Plot
ggplot(PCs, aes(Dim.1, Dim.2)) +
  geom_point(alpha = 0.5) +
  geom_path(data = angle, inherit.aes = TRUE, aes(x, y), colour = "grey70") +
  geom_text(data = vPCs, aes(Dim.1, Dim.2, label = rownames(vPCs)),
            color = "orange2", size = 3, fontface = "bold") +
  labs(x = paste0("PC1 (", round(pca_palm$eig[1, 2], 1), " %)"),
       y = paste0("PC2 (", round(pca_palm$eig[2, 2], 1), " %)")) +
  geom_segment(data = vPCs,
               aes(x = 0, y = 0, xend = Dim.1, yend = Dim.2),
               arrow = arrow(length = unit(1/2, "picas")),
               color = "black", linewidth = 1) +
  coord_equal() +
  theme_bw()

References

Kissling, W. D., Balslev, H., Baker, W. J., Dransfield, J., Göldel, B., Lim, J. Y., Onstein, R. E., & Svenning, J.-C. (2019). PalmTraits 1.0, a species-level functional trait database of palms worldwide. Scientific Data, 6(1), 1–13.

Kreft, H., Sommer, J.H. and Barthlott, W. (2006), The significance of geographic range size for spatial diversity patterns in Neotropical palms. Ecography, 29: 21-30.

Pictures of palms: http://idtools.org/id/palms/palmid/gallery.php

Bonus: PCoA

X.1. Distances

First step is to compute a distance matrix.

If you had only numerical traits, you could use the Euclidean distances. If doing so, it would be similar to a PCA.

Now, because we also have categorical traits, we cannot simply use the Euclidean distances and we need a way to combine several types of data.

For this, we’ll use the Gower’s distance implemented in the dist.ktab function.

# Making a trait table without NAs
tra_select
sp_tra_complete <- sp_tra[, tra_select]
rownames(sp_tra_complete) <- sp_tra$species
sp_tra_complete <- sp_tra_complete[complete.cases(sp_tra_complete), ]
dim(sp_tra_complete); head(sp_tra_complete)

# Gower distance
?dist.ktab

ktab1 <- ktab.list.df(list(sp_tra_complete))

mat_dissim <- dist.ktab(ktab1, type = "Q", option = "scaledBYrange") # scaled quantitative traits
head(mat_dissim)

# n*(n-1)/2 pairwise combinations
all.equal(length(mat_dissim),
          ((nrow(sp_tra_complete)*(nrow(sp_tra_complete)-1))/2))

X.2. Principal Coordinate Analysis (PCoA)

PCoA relies on the distances between samples. Therefore, the first step of a PCoA is the construction of a (dis)similarity matrix. While PCA is based on Euclidean distances, PCoA can handle (dis)similarity matrices calculated from quantitative, semi-quantitative, qualitative, and mixed variables.

In our case, we would get a similar result with a PCA since we only have continuous traits and we used Euclidean distances in the dissimilarity matrix.
As always, the choice of (dis)similarity measure is critical and must be suitable to the data in question. For abundance data, Bray-Curtis distance is often recommended. You can use Jaccard index for presence/absence data. When the distance metric is Euclidean, PCoA is equivalent to PCA.

# ape::pcoa() # other option
pcoa_palm <- dudi.pco(mat_dissim, scannf = FALSE, nf = 6)

Merging coordinates with trait table and plot

# Coordinates of the individuals
head(pcoa_palm$li)

# Barplot of eigenvectors
barplot((pcoa_palm$eig / sum(pcoa_palm$eig))*100)

# Table with the coordinates on the first 2 axes only
coord_sp_pcoa <- pcoa_palm$li[, 1:2]
coord_sp_pcoa$sp <- rownames(coord_sp_pcoa)
head(coord_sp_pcoa)

# Merge back PCoA coordinates to complete trait table
sp_tra_complete$sp <- rownames(sp_tra_complete)
sp_tra_complete <- left_join(sp_tra_complete, coord_sp_pcoa, by = "sp")
head(sp_tra_complete)

# Plot
plot(sp_tra_complete$A1, sp_tra_complete$A2, pch = 16,
     main = "Species coordinates on the PCoA",
     xlab = paste0("PCoA1 (",
                   round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[1],
                   "%)"),
     ylab = paste0("PCoA2 (",
                   round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[2],
                   "%)"))

# Add the projection of traits on this plot
ordiplot(pcoa_palm,
         main = "Species coordinates on the PCoA",
         xlab = paste0("PCoA1 (",
                       round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[1],
                       "%)"),
         ylab = paste0("PCoA2 (",
                       round((pcoa_palm$eig / sum(pcoa_palm$eig))*100, 2)[2],
                       "%)"))
# ?envfit
ef <- envfit(pcoa_palm, sp_tra_complete[, tra_select])
ef
plot(ef)

The world’s tallest monocotyledon:

tra$SpecName[which(tra$Climbing == 0)][which.max(tra$MaxStemHeight_m[which(tra$Climbing == 0)])]

## [1] "Ceroxylon quindiuense"

Modern concepts and methods in Macroecology and Biogeography

Day 8: Functional biogeography

Pierre Denelle & Holger Kreft

June 4th, 2025