1. Load data into R
- 1.1 Load a phylogeny & look at structure of ‘multiPhylo’ object
- 1.2. Loading palm distribution data
2. Taxonomic standardization
3. Spatial species composition at coarse resolution
4. Calculating phylogenetic community metrics
- 4.1. Faith’s PD
- 4.2. Null models and other PD indices
References

Preparation: Please open your RStudio project and download the new data (‘Phylogeny_ConservativeCon_Checklist.zip’, ‘palms_specsxsites_phylo.csv’ and ‘palmtree_pruned.nex’) for today from Stud-IP and unzip and copy them into your .Rproj folder ‘/Data/’. You can use getwd() to locate your current working directory, which should be your project folder. Please install the following R-packages using install.packages():

ape
TNRS
remotes
sf
dplyr
psych
picante
pez
ggplot2
cowplot
GGally
RColorBrewer
bioregion

If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:

install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day7.html", destfile = file.path(dir, "Day7.html"))
htmlFile <- file.path(dir, "Day7.html")
rstudioapi::viewer(htmlFile)

Now you can conveniently copy code from the viewer into your script.

Load R packages & island data set

library("dplyr") # data manipulation

library("sf") #  Geospatial library

library("TNRS") # Taxonomic standardisation of plant species names
# library("rWCVP") # Taxonomic standardisation of plant species names
# library("rWCVPdata") # According data for standardization
library("ape") # Analyses of phylogenetics and evolution
library("picante") # Phylogenies and ecology
library("pez") # Phylogenies and ecology

library("psych") # basic stats stuff
library("ggplot2") # for plotting
library("cowplot") # for multi-panel plots
library("GGally") # pairs plot
library("RColorBrewer") # colour palettes

library("bioregion") # bioregionalization package used to switch data formats

1. Load data into R

1.1 Load a phylogeny & look at structure of ‘multiPhylo’ object

load multi-tree object into R
Data from Faurby, S., Eiserhardt, W.L., Baker, W.J. & Svenning, J.-C. (2016).
An all-evidence species-level supertree for the palms (Arecaceae).
Molecular Phylogenetics and Evolution, 100, 57-69.

palmtrees <- read.nexus("data/Phylogeny_ConservativeCon_Checklist/Phylogeny_ConservativeCon_Checklist.nex")

palmtrees

## 1000 phylogenetic trees

palmtrees[[1]]

## 
## Phylogenetic tree with 2539 tips and 2538 internal nodes.
## 
## Tip labels:
##   Nypa_fruticans, Sabal_mauritiiformis, Sabal_pumos, Sabal_domingensis, Sabal_minor, Sabal_palmetto, ...
## 
## Rooted; includes branch length(s).

palmtree <- palmtrees[[1]]
str(palmtree)

## List of 4
##  $ edge       : int [1:5076, 1:2] 2540 2541 2541 2542 2543 2544 2545 2546 2546 2547 ...
##  $ edge.length: num [1:5076] 8.89 95.72 9.44 9.78 12.25 ...
##  $ Nnode      : int 2538
##  $ tip.label  : chr [1:2539] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos" "Sabal_domingensis" ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

Explanation of structure of a phylogeny in R
edge: a two-column matrix where each row represents a branch (or edge) of the tree. The nodes and the tips are symbolized with integers. The n tips are numbered from 1 to n, and the m (internal) nodes from n+1 to n+m (the root being n + 1). For each row, the first column gives the ancestor.

edge.length (optional): A numeric vector giving the lengths of the branches given by edge.

tip.label: A vector of mode character giving the labels of the tips. The order of these labels corresponds to the integers 1 to n in edge.

Nnode: An integer value giving the number of nodes in the tree (m).

node.label (optional): A vector of mode character giving the labels of the nodes (ordered in the same way as the tip.label).

root.edge (optional): A numeric value giving the length of the branch at the root if it exists.

# Number of nodes (m) (n is the number of tips)
palmtree$Nnode

## [1] 2538

# Tip labels
head(palmtree$tip.label)

## [1] "Nypa_fruticans"       "Sabal_mauritiiformis" "Sabal_pumos"         
## [4] "Sabal_domingensis"    "Sabal_minor"          "Sabal_palmetto"

length(palmtree$tip.label)

## [1] 2539

# Edges
head(palmtree$edge)

##      [,1] [,2]
## [1,] 2540 2541
## [2,] 2541    1
## [3,] 2541 2542
## [4,] 2542 2543
## [5,] 2543 2544
## [6,] 2544 2545

range(palmtree$edge[, 1]) # ranges from n+1 to n+m (length(palmtree$tip.label) + palmtree$Nnode)

## [1] 2540 5077

is.rooted(palmtree) # does the phylogeny have a root? i.e. can the most basal ancestor be identified?

## [1] TRUE

We can also plot the phylogenetic tree.

plot(palmtree, type = "fan", cex = 0.3,
     edge.color = "gray70", tip.color = "#ef8a62",
     main = paste("Phylogenetic tree for", palmtree$Nnode, "palm species"))

1.2. Loading palm distribution data

Data from Kreft, H., Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity
patterns in Neotropical palms. Ecography, 29, 21-30.

species <- read.csv("data/palms_species_per_gridcell.csv",
                    sep = ",", stringsAsFactors = FALSE)
head(species)

##   grid_id spp_Id    GENUS EPITHET
## 1   13735    550 Ammandra natalia
## 2   13736    550 Ammandra natalia
## 3   13737    550 Ammandra natalia
## 4   13738    550 Ammandra natalia
## 5   13910    550 Ammandra natalia
## 6   13911    550 Ammandra natalia

length(unique(species$grid_id)) # number of grid cells

## [1] 6638

Add a column for the full species name with “_” as separator: “species”.

species$species <- paste(species$GENUS, species$EPITHET, sep = "_")
species$species <- paste0(species$GENUS, "_", species$EPITHET)
unique(species$species)[1:5]; head(species)

## [1] "Ammandra_natalia"        "Ammandra_decasperma"    
## [3] "Ammandra_dasyneura"      "Phytelephas_tumacana"   
## [5] "Phytelephas_tenuicaulis"

##   grid_id spp_Id    GENUS EPITHET          species
## 1   13735    550 Ammandra natalia Ammandra_natalia
## 2   13736    550 Ammandra natalia Ammandra_natalia
## 3   13737    550 Ammandra natalia Ammandra_natalia
## 4   13738    550 Ammandra natalia Ammandra_natalia
## 5   13910    550 Ammandra natalia Ammandra_natalia
## 6   13911    550 Ammandra natalia Ammandra_natalia

How many species in the palm distribution dataset are missing from the phylogeny?

length(unique(species$species))

## [1] 550

table(unique(species$species) %in% palmtree$tip.label) # need for unique species

## 
## FALSE  TRUE 
##    80   470

# length(unique(species$species[which(!species$species %in% palmtree$tip.label)]))
# sum(!unique(species$species) %in% palmtree$tip.label)

Which species are missing?

# first five palm species
unique(species$species[which(!species$species %in% palmtree$tip.label)])[1:5]

## [1] "Ammandra_natalia"    "Ammandra_dasyneura"  "Geonoma_weberbaueri"
## [4] "Geonoma_rubescens"   "Geonoma_polyandra"

# Species that are NOT in the phylogeny
unique(species$species)[!(unique(species$species) %in% palmtree$tip.label)]

##  [1] "Ammandra_natalia"             "Ammandra_dasyneura"          
##  [3] "Geonoma_weberbaueri"          "Geonoma_rubescens"           
##  [5] "Geonoma_polyandra"            "Geonoma_poeppiginana"        
##  [7] "Geonoma_paraguensis"          "Geonoma_myriantha"           
##  [9] "Geonoma_longevaginata"        "Geonoma_longepedunculata"    
## [11] "Geonoma_linearis"             "Geonoma_jussieuana"          
## [13] "Geonoma_gastoniana"           "Geonoma_gamiova"             
## [15] "Geonoma_densa"                "Geonoma_brevispatha"         
## [17] "Geonoma_arundinacea"          "Geonoma_appuniana"           
## [19] "Calyptrogyne_kunaria"         "Calyptrogyne_condesata"      
## [21] "Astrocaryum_gynacanthum"      "Desmoncus_phoenicocarpus"    
## [23] "Desmoncus_cirrhiferus"        "Bactris_trailiana"           
## [25] "Bactris_macana"               "Bactris_horrispatha"         
## [27] "Bactris_hondurensis"          "Bactris_grayumi"             
## [29] "Bactris_corrossilla"          "Aiphanes_leidostachys"       
## [31] "Aiphanes_aculeata"            "Gastrococcus_crispa"         
## [33] "Attalea_septuagenta"          "Attalea_luetzelbergii"       
## [35] "Polyandrococcus_caudescens"   "Syagrus_shmithii"            
## [37] "Syagrus_leptospatha"          "Hypospathe_altissima"        
## [39] "Hypospathe_macrorachis"       "Hypospathe_elegans"          
## [41] "Prestoea_roseospadix"         "Euterpe_longebracteata"      
## [43] "Leopoldinia_major"            "Wettinia_angusta"            
## [45] "Wettinia_aquatorialis"        "Socratea_hecantonandra"      
## [47] "Iriartella_stenocarpua"       "Dictyocarpum_ptarianum"      
## [49] "Dictyocarpum_lamarckianum"    "Dictyocarpum_fuscum"         
## [51] "Chamaedorea_vocanensis"       "Chamaedorea_tuerkheimii"     
## [53] "Chamaedorea_sullivaniorum"    "Chamaedorea_selvae"          
## [55] "Chamaedorea_orephila"         "Chamaedorea_keeleriorum"     
## [57] "Chamaedorea_hopperiana"       "Chamaedorea_ernesti-angustii"
## [59] "Ceroxylon_werberbaueri"       "Sabal_miamiensis"            
## [61] "Sabal_guatemalensis"          "Sabal_gretheriae"            
## [63] "Brahea_nitida"                "Acoelorraphe_wrightii"       
## [65] "Coplothrinax_wrightii"        "Coplothrinax_cookii"         
## [67] "Coccothrinax_hiorami"         "Thrinax_rivularis"           
## [69] "Thrinax_morrisii"             "Thrinax_ekmaniana"           
## [71] "Thrinax_compacta"             "Crysophila_williamsii"       
## [73] "Crysophila_warscewiczii"      "Crysophila_stauracantha"     
## [75] "Crysophila_nana"              "Crysophila_macrocarpa"       
## [77] "Crysophila_kalbreyeri"        "Crysophila_guagara"          
## [79] "Crysophila_grayumii"          "Crysophila_cookii"

# dplyr
setdiff(species$species, palmtree$tip.label)

##  [1] "Ammandra_natalia"             "Ammandra_dasyneura"          
##  [3] "Geonoma_weberbaueri"          "Geonoma_rubescens"           
##  [5] "Geonoma_polyandra"            "Geonoma_poeppiginana"        
##  [7] "Geonoma_paraguensis"          "Geonoma_myriantha"           
##  [9] "Geonoma_longevaginata"        "Geonoma_longepedunculata"    
## [11] "Geonoma_linearis"             "Geonoma_jussieuana"          
## [13] "Geonoma_gastoniana"           "Geonoma_gamiova"             
## [15] "Geonoma_densa"                "Geonoma_brevispatha"         
## [17] "Geonoma_arundinacea"          "Geonoma_appuniana"           
## [19] "Calyptrogyne_kunaria"         "Calyptrogyne_condesata"      
## [21] "Astrocaryum_gynacanthum"      "Desmoncus_phoenicocarpus"    
## [23] "Desmoncus_cirrhiferus"        "Bactris_trailiana"           
## [25] "Bactris_macana"               "Bactris_horrispatha"         
## [27] "Bactris_hondurensis"          "Bactris_grayumi"             
## [29] "Bactris_corrossilla"          "Aiphanes_leidostachys"       
## [31] "Aiphanes_aculeata"            "Gastrococcus_crispa"         
## [33] "Attalea_septuagenta"          "Attalea_luetzelbergii"       
## [35] "Polyandrococcus_caudescens"   "Syagrus_shmithii"            
## [37] "Syagrus_leptospatha"          "Hypospathe_altissima"        
## [39] "Hypospathe_macrorachis"       "Hypospathe_elegans"          
## [41] "Prestoea_roseospadix"         "Euterpe_longebracteata"      
## [43] "Leopoldinia_major"            "Wettinia_angusta"            
## [45] "Wettinia_aquatorialis"        "Socratea_hecantonandra"      
## [47] "Iriartella_stenocarpua"       "Dictyocarpum_ptarianum"      
## [49] "Dictyocarpum_lamarckianum"    "Dictyocarpum_fuscum"         
## [51] "Chamaedorea_vocanensis"       "Chamaedorea_tuerkheimii"     
## [53] "Chamaedorea_sullivaniorum"    "Chamaedorea_selvae"          
## [55] "Chamaedorea_orephila"         "Chamaedorea_keeleriorum"     
## [57] "Chamaedorea_hopperiana"       "Chamaedorea_ernesti-angustii"
## [59] "Ceroxylon_werberbaueri"       "Sabal_miamiensis"            
## [61] "Sabal_guatemalensis"          "Sabal_gretheriae"            
## [63] "Brahea_nitida"                "Acoelorraphe_wrightii"       
## [65] "Coplothrinax_wrightii"        "Coplothrinax_cookii"         
## [67] "Coccothrinax_hiorami"         "Thrinax_rivularis"           
## [69] "Thrinax_morrisii"             "Thrinax_ekmaniana"           
## [71] "Thrinax_compacta"             "Crysophila_williamsii"       
## [73] "Crysophila_warscewiczii"      "Crysophila_stauracantha"     
## [75] "Crysophila_nana"              "Crysophila_macrocarpa"       
## [77] "Crysophila_kalbreyeri"        "Crysophila_guagara"          
## [79] "Crysophila_grayumii"          "Crysophila_cookii"

2. Taxonomic standardization

Missing species from the phylogeny may be synonyms of accepted species names, and are therefore undetected in the phylogenetic tree.
Let’s standardize the missing species names.

Write species missing from phylogeny into a vector

specmissing <- unique(species[which(!species$species %in% palmtree$tip.label), c(2:5)])
head(specmissing)

##      spp_Id    GENUS      EPITHET              species
## 1       550 Ammandra      natalia     Ammandra_natalia
## 173     548 Ammandra    dasyneura   Ammandra_dasyneura
## 808     541  Geonoma  weberbaueri  Geonoma_weberbaueri
## 3948    529  Geonoma    rubescens    Geonoma_rubescens
## 4035    528  Geonoma    polyandra    Geonoma_polyandra
## 4249    526  Geonoma poeppiginana Geonoma_poeppiginana

nrow(specmissing)

## [1] 80

Match palm species names to those in World Checklist of Vascular Plants.
It is always important to check whether you have access to an updated taxonomy. We rely on the TNRS R package.

tnrs_names <- TNRS(taxonomic_names = specmissing$species)

# Alternative with rWCVP
# taxstand <- wcvp_match_names(specmissing,
#                              name_col = "species",
#                              id_col = "spp_Id")
# head(taxstand)

Replace old names by new names

tnrs_names$Accepted_species <- gsub(" ", "_", tnrs_names$Accepted_species)

# update the combined species name
for(i in 1:nrow(tnrs_names)){
  species[which(species$species == tnrs_names[i, "Name_submitted"]), "species"] <-
    tnrs_names[i, "Accepted_species"]
}

Check names that are not in the phylogeny

specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])
specmissing

## [1] "Geonoma_longepedunculata" ""                        
## [3] "Leopoldinia_major"        "Acoelorraphe_wrightii"

length(specmissing)

## [1] 4

Instead of 80 missing species, we only have 4 after taxonomic harmonization.

Remaining missing species would have to be looked up and added manually to the phylogeny by an expert.
If missing species cannot be added reliably, we need to remove them from the dataset for analysis.

# Removal of the species not in the phylogeny
length(unique(species$species))

## [1] 527

species <- species[which(!species$species %in% specmissing), ]
length(unique(species$species))

## [1] 523

3. Spatial species composition at coarse resolution

3.1. Loading data

Read in polygon shapefile

# gridcells corresponding to palm distribution data and americas coastline
grid <- st_read("data/30min_grid_select50/30min_grid_select50%.shp")

## Reading layer `30min_grid_select50%' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\30min_grid_select50\30min_grid_select50%.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 6038 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS:           NA

americas <- st_read("data/americas/americas.shp")

## Reading layer `americas' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS:           NA

# remember to define the Coordinate Reference System (CRS)
st_crs(grid) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- 4326 # "+proj=longlat +ellps=WGS84 +no_defs"

# Rename ID column for matching
names(grid)[1] <- "grid_id"

3.2. Coarsening the resolution

We will now convert the grid shapefile to a coarser resolution to have fewer grid cells.

To do so, we first create an empty grid of a 2-degree resolution.

grid_2degrees <- st_make_grid(x = grid, cellsize = 2)%>%
  st_sf() %>% 
  mutate(ID = row_number()) # add a unique ID to each grid cell

As we can see on the following plot, the aggregated grid has been made after the extent of the grid. We therefore mask it to remove empty cells:

grid_union <- st_union(grid)

sf_use_s2(FALSE)

## Spherical geometry (s2) switched off

grid_2degrees <- st_intersection(grid_2degrees, grid_union)

## although coordinates are longitude/latitude, st_intersection assumes that they
## are planar

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

ggplot() +
  geom_sf(data = grid_2degrees, fill = "white", color = "black") +
  geom_sf(data = grid, fill = NA, color = "orange")+
  geom_sf(data = grid_2degrees, fill = NA, color = "black", linewidth = 1) +
  scale_x_continuous(limits = c(-80, -60)) +
  scale_y_continuous(limits = c(20, 5))

We now make a spatial join between the original grid and the coarser one. We have to specify largest = TRUE in the function st_join() to only keep the coarser cell with the bigger overlap.

grid <- st_join(grid, grid_2degrees, largest = TRUE)

## although coordinates are longitude/latitude, st_intersection assumes that they
## are planar

## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries

3.3. Add new ID also to species data.frame

species <- inner_join(species, st_drop_geometry(grid), by = "grid_id")
head(species)

##   grid_id spp_Id    GENUS EPITHET          species AREA PORTION  ID
## 1   13735    550 Ammandra natalia Aphandra_natalia 0.25     100 675
## 2   13736    550 Ammandra natalia Aphandra_natalia 0.25     100 676
## 3   13737    550 Ammandra natalia Aphandra_natalia 0.25     100 676
## 4   13738    550 Ammandra natalia Aphandra_natalia 0.25     100 676
## 5   13910    550 Ammandra natalia Aphandra_natalia 0.25     100 675
## 6   13911    550 Ammandra natalia Aphandra_natalia 0.25     100 676

We can now remove species duplicates per new ID.

species_coarse <- species %>% 
  group_by(ID) %>% 
  distinct(species, .keep_all = TRUE) %>% 
  ungroup() %>% 
  dplyr::select(-grid_id, -spp_Id, -GENUS, -EPITHET, -AREA, -PORTION)

Alternative way: manual aggregation with a for-loop and the grid centroids

Alternative

Extract the grid centroids

# convert into points shapefile
grid_centroids <- st_centroid(grid)

ggplot() +
  geom_sf(data = grid, fill = NA) +
  geom_sf(data = grid_centroids, shape = 20) +
  geom_sf(data = americas, fill = NA, color = "black") +
  theme_void() +
  scale_x_continuous(limits = c(-80, -60)) +
  scale_y_continuous(limits = c(20, 5))

Add coordinates to SpatialPolygonsDataframe

# Add latitude and longitude of grid centroids as extra columns
head(grid)
grid <- cbind(grid, st_coordinates(grid_centroids))
names(grid)[4:5] <- c("longitude", "latitude")
head(grid)

Make new IDs for neighboring cells
Two steps: i) we build an empty matrix with the new IDs, ii) with a for-loop, we add the new IDs into the grid object.

range(grid$longitude)
range(grid$latitude)

longitude <- seq(-116, -34, by = 2)
latitude  <- seq(-35, 35, by = 2)

new_ID_matrix <- matrix(c(1:(length(longitude)*length(latitude))),
                        nrow = length(longitude), ncol = length(latitude),
                        dimnames = list(longitude, latitude), byrow = TRUE)
dim(new_ID_matrix)
new_ID_matrix[1:5, 1:5]

# Filling
grid$new_ID <- NA

for(i in 1:nrow(grid)){
  grid$new_ID[i] <- new_ID_matrix[which(longitude < grid$longitude[i] & 
                                          (longitude + 2) > grid$longitude[i]),
                                  which(latitude < grid$latitude[i] &
                                          (latitude + 2) > grid$latitude[i])]
}

head(grid)

Join neighboring cells based on new IDs & aggregate species distribution data using the new grid

grid_2degrees <- group_by(grid, new_ID)
grid_2degrees <- summarise(grid_2degrees, do_union = TRUE, is_coverage = TRUE)

head(grid_2degrees)

ggplot(grid_2degrees) +
  geom_sf(fill = "gray90") +
  geom_sf(data = americas, fill = NA, color = "black") +
  theme_void() + ylim(-40, 40)

# Add new ID also to species data.frame
species <- inner_join(species, st_drop_geometry(grid), by = "grid_id")
head(species)

4. Calculating phylogenetic community metrics

4.1. Faith’s PD

Prune phylogeny to exclude species not in palm distribution data
Make sure that the same number of species in your data set are in your phylogeny

palmtree_pruned <- drop.tip(palmtree,
                            palmtree$tip.label[which(!palmtree$tip.label %in% unique(species$species))])
length(palmtree$tip.label)

## [1] 2539

length(palmtree_pruned$tip.label)

## [1] 518

length(unique(species$species))

## [1] 518

# write.nexus(palmtree_pruned, file = "data/palmtree_pruned.nex")

Convert long table to species by grid-cell table (a community matrix)

library("bioregion")
species_grid <- net_to_mat(species_coarse[, c("ID", "species")])

# Alternative without the package bioregion
# species <- unique(species[, c("new_ID", "species")])
# species_grid <- as.data.frame.matrix(table(species_coarse))
# species_grid <- as.matrix(species_grid)

dim(species_grid)

## [1] 479 518

species_grid[1:5, 1:5]

##     Aphandra_natalia Ammandra_decasperma Phytelephas_tumacana
## 675                1                   0                    0
## 676                1                   0                    0
## 677                1                   0                    0
## 634                1                   0                    0
## 635                1                   0                    0
##     Phytelephas_tenuicaulis Phytelephas_seemannii
## 675                       1                     0
## 676                       1                     0
## 677                       1                     0
## 634                       1                     0
## 635                       1                     0

# write.csv(species_grid, file = "data/palms_specsxsites_phylo.csv")

# if you have a hard time reading in the large palm phylogeny or
# in case you got stuck above
palmtree_pruned <- read.nexus("data/palmtree_pruned.nex")
species_grid <- as.matrix(read.csv("data/palms_specsxsites_phylo.csv", row.names = 1))

Calculate phylogenetic diversity

We here calculate the Faith’s PD (phylogenetic diversity index), which can be written like this:
$PD = \sum_{branches}l(b)$ with $l$ the length of a branch $b$ .

# ?pd
pd_palms <- pd(species_grid, palmtree_pruned)
dim(pd_palms) # one value per grid cell

## [1] 479   2

head(pd_palms)

##           PD SR
## 675 1575.824 69
## 676 1465.944 60
## 677 1524.880 62
## 634 1467.002 61
## 635 1525.790 62
## 636 1642.958 71

# Adding the ID of grid cells
pd_palms <- data.frame(ID = as.integer(row.names(species_grid)),
                       pd_palms)
head(pd_palms)

##      ID       PD SR
## 675 675 1575.824 69
## 676 676 1465.944 60
## 677 677 1524.880 62
## 634 634 1467.002 61
## 635 635 1525.790 62
## 636 636 1642.958 71

pd() calculates Faith’s PD and a corrected version of it being the residuals of a regression with total abundance or species richness per plot. Let’s look into this:

pd_model <- lm(PD ~ SR, data = pd_palms)
pd_palms$residuals <- residuals(pd_model)
head(pd_palms)

##      ID       PD SR  residuals
## 675 675 1575.824 69 -29.973347
## 676 676 1465.944 60  44.204392
## 677 677 1524.880 62  62.238817
## 634 634 1467.002 61  24.812397
## 635 635 1525.790 62  63.148715
## 636 636 1642.958 71  -3.741623

plot_grid(
  nrow = 1, ncol = 2,
  ggplot(pd_palms, aes(SR, PD)) +
    geom_point(color = "black", alpha = 0.3) +
    stat_smooth(method = "lm", formula = y ~ x, color = "red") +
    labs(x = "Species_number", y = "Faith's PD") +
    theme_bw(),
  ggplot(pd_palms, aes(SR, residuals)) +
    geom_point(color = "black", alpha = 0.3) +
    labs(x = "Species_number", y = "Faith's PD residuals") +
    theme_bw()
)

4.2. Null models and other PD indices

Calculate generic phylogenetic metrics (absolute and standardized)

Phylogenetic diversity metrics are standardized by first using a null model to generate expected values.
The principle of null models is illustrated in the following figure:

Then, the observed value is compared to the null, usually using standardized effect sizes (SES):
$SES = (observed - mean.null)/sd.null$ where mean.null is the mean of the null values and sd.null is the standard deviation of the null values.

Null models can also apply to other phylogenetic diversity indices like the Mean Pairwise Distance (MPD) and the Mean Nearest Taxon Distance (MNTD).

MPD consists in calculating all the pairwise distances between species in a grid cell and then takes the mean.
MNTD calculates the mean distance separating each species in the grid cell from its closest relative. These indices are described in Webb el al. (2002).

c.data <- comparative.comm(palmtree_pruned , species_grid)
pd.null <- generic.null(c.data, c(.pd, .mpd, .mntd),
                        null.model = "richness",
                        comp.fun = .ses, permute = 100)
colnames(pd.null) <- c("Faith_PD", "Corrected_FaithPD", "MPD", "MNTD")
rownames(pd.null) <- sites(c.data)
pd.null <- data.frame(pd.null)
pd.null$ID <- as.numeric(rownames(pd.null))

dim(pd.null); head(pd.null)

## [1] 479  21

##      Faith_PD.observed Corrected_FaithPD.observed MPD.observed MNTD.observed
## 1000         1161.9594                  210.59122     126.3598      44.62270
## 1001         1020.6110                  253.30102     123.3111      51.51361
## 1034          475.7734                   97.03081     141.8437      64.31507
## 1035          543.9522                   42.50417     122.3554      44.78403
## 1036          862.3636                   13.25003     109.3301      34.82029
## 1037         1101.8782                  -53.99906     105.3146      28.25024
##      Faith_PD.null.mean Corrected_FaithPD.null.mean MPD.null.mean
## 1000          1234.5055                    84.13755      137.9975
## 1001          1027.9350                    95.14096      136.8423
## 1034           506.7864                    33.31505      136.8407
## 1035           691.9129                    73.39233      135.1412
## 1036          1109.5848                    80.09129      136.1568
## 1037          1411.6192                    19.50242      136.1205
##      MNTD.null.mean Faith_PD.SE Corrected_FaithPD.SE    MPD.SE   MNTD.SE
## 1000       40.15389    8.440552             8.420466 0.5828615 0.4672596
## 1001       44.45246    8.624029             8.571403 0.6379790 0.6467057
## 1034       77.77521    6.052392             6.004969 1.1561112 1.8083958
## 1035       58.36659    6.663936             6.705964 0.8910013 1.0680532
## 1036       42.40147    8.190609             8.110140 0.5391931 0.5823474
## 1037       36.05080   10.138888            10.075718 0.4841846 0.4767682
##      Faith_PD.SES Corrected_FaithPD.SES   MPD.SES   MNTD.SES Faith_PD.rank
## 1000   -8.5949529             15.017420 -19.96651   9.563858          0.20
## 1001   -0.8492592             18.452062 -21.20949  10.918635          0.50
## 1034   -5.1240790             10.610507   4.32742  -7.443134          0.33
## 1035  -22.2031939             -4.606073 -14.34994 -12.717120          0.02
## 1036  -30.1834992             -8.241690 -49.75346 -13.018307          0.01
## 1037  -30.5497980             -7.294912 -63.62434 -16.361326          0.01
##      Corrected_FaithPD.rank MPD.rank MNTD.rank   ID
## 1000                   0.94     0.05      0.81 1000
## 1001                   1.00     0.03      0.87 1001
## 1034                   0.86     0.66      0.21 1034
## 1035                   0.34     0.09      0.12 1035
## 1036                   0.20     0.01      0.12 1036
## 1037                   0.26     0.01      0.06 1037

# select columns of interest for final data set
pd.out <- pd.null[, c("ID", "Faith_PD.observed", "Faith_PD.SES",
                      "MPD.observed", "MPD.SES", "MNTD.observed", "MNTD.SES")] 
head(pd.out)

##        ID Faith_PD.observed Faith_PD.SES MPD.observed   MPD.SES MNTD.observed
## 1000 1000         1161.9594   -8.5949529     126.3598 -19.96651      44.62270
## 1001 1001         1020.6110   -0.8492592     123.3111 -21.20949      51.51361
## 1034 1034          475.7734   -5.1240790     141.8437   4.32742      64.31507
## 1035 1035          543.9522  -22.2031939     122.3554 -14.34994      44.78403
## 1036 1036          862.3636  -30.1834992     109.3301 -49.75346      34.82029
## 1037 1037         1101.8782  -30.5497980     105.3146 -63.62434      28.25024
##        MNTD.SES
## 1000   9.563858
## 1001  10.918635
## 1034  -7.443134
## 1035 -12.717120
## 1036 -13.018307
## 1037 -16.361326

# change NAs to zeroes; zero phylogenetic diversity if grid has only 1 spp!
pd.out[is.na(pd.out)] <- 0

head(pd.out)

##        ID Faith_PD.observed Faith_PD.SES MPD.observed   MPD.SES MNTD.observed
## 1000 1000         1161.9594   -8.5949529     126.3598 -19.96651      44.62270
## 1001 1001         1020.6110   -0.8492592     123.3111 -21.20949      51.51361
## 1034 1034          475.7734   -5.1240790     141.8437   4.32742      64.31507
## 1035 1035          543.9522  -22.2031939     122.3554 -14.34994      44.78403
## 1036 1036          862.3636  -30.1834992     109.3301 -49.75346      34.82029
## 1037 1037         1101.8782  -30.5497980     105.3146 -63.62434      28.25024
##        MNTD.SES
## 1000   9.563858
## 1001  10.918635
## 1034  -7.443134
## 1035 -12.717120
## 1036 -13.018307
## 1037 -16.361326

head(pd_palms)

##      ID       PD SR  residuals
## 675 675 1575.824 69 -29.973347
## 676 676 1465.944 60  44.204392
## 677 677 1524.880 62  62.238817
## 634 634 1467.002 61  24.812397
## 635 635 1525.790 62  63.148715
## 636 636 1642.958 71  -3.741623

palmdiversity <- left_join(pd_palms, pd.out, by = "ID")
head(palmdiversity)

##    ID       PD SR  residuals Faith_PD.observed Faith_PD.SES MPD.observed
## 1 675 1575.824 69 -29.973347          1575.824    -21.87064     125.8718
## 2 676 1465.944 60  44.204392          1465.944    -16.84119     126.7775
## 3 677 1524.880 62  62.238817          1524.880    -14.66500     127.4341
## 4 634 1467.002 61  24.812397          1467.002    -19.34955     130.6003
## 5 635 1525.790 62  63.148715          1525.790    -13.95727     129.7774
## 6 636 1642.958 71  -3.741623          1642.958    -15.60304     128.3756
##     MPD.SES MNTD.observed   MNTD.SES
## 1 -35.51768      26.52567 -11.484193
## 2 -27.25207      27.22376 -13.491031
## 3 -24.40306      26.70619 -15.882595
## 4 -20.47196      28.74301  -9.606953
## 5 -20.87424      28.67301  -7.675902
## 6 -24.35456      25.54811 -15.772418

# Plot
plot_grid(
  nrow = 1, ncol = 2,
  
  ggplot(palmdiversity, aes(SR, Faith_PD.SES)) +
    geom_point(color = "black", alpha = 0.3) +
    stat_smooth(method = "lm", formula = y ~ x, color = "red") +
    labs(x = "Species_number", y = "Faith's PD standardized") +
    theme_bw(),
  
  ggplot(palmdiversity, aes(Faith_PD.observed, Faith_PD.SES)) +
    geom_point(color = "black", alpha = 0.3) +
    stat_smooth(method = "lm", formula = y ~ x, color = "red") +
    labs(x = "Faith's PD observed", y = "Faith's PD standardized") +
    theme_bw()
)

Plot correlations among the diversity metrics in the palm diversity data.frame (use ggpairs()).

ggpairs(palmdiversity[, c(3:10)],
        upper = list(continuous = wrap(ggally_cor, digits = 1))) +
  theme_bw()

# psych::pairs.panels(palmdiversity[, c(3:10)], method = "pearson",
#             density = FALSE, ellipses = FALSE, hist.col = "white")

Join the palmdiversity data.frame to the grid_2degrees shapefile to be able to link the diversity metrics and the poylgons for plotting.

head(grid_2degrees); head(palmdiversity)

## Simple feature collection with 6 features and 1 field
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -72.00066 ymin: -35 xmax: -58.00066 ymax: -33
## Geodetic CRS:  WGS 84
##    ID                       geometry
## 22 22 MULTILINESTRING ((-72.00066...
## 23 23 POLYGON ((-71.50066 -35, -7...
## 26 26 GEOMETRYCOLLECTION (POLYGON...
## 27 27 POLYGON ((-64.00066 -33, -6...
## 28 28 POLYGON ((-62.00066 -33, -6...
## 29 29 MULTIPOLYGON (((-58.00066 -...

##    ID       PD SR  residuals Faith_PD.observed Faith_PD.SES MPD.observed
## 1 675 1575.824 69 -29.973347          1575.824    -21.87064     125.8718
## 2 676 1465.944 60  44.204392          1465.944    -16.84119     126.7775
## 3 677 1524.880 62  62.238817          1524.880    -14.66500     127.4341
## 4 634 1467.002 61  24.812397          1467.002    -19.34955     130.6003
## 5 635 1525.790 62  63.148715          1525.790    -13.95727     129.7774
## 6 636 1642.958 71  -3.741623          1642.958    -15.60304     128.3756
##     MPD.SES MNTD.observed   MNTD.SES
## 1 -35.51768      26.52567 -11.484193
## 2 -27.25207      27.22376 -13.491031
## 3 -24.40306      26.70619 -15.882595
## 4 -20.47196      28.74301  -9.606953
## 5 -20.87424      28.67301  -7.675902
## 6 -24.35456      25.54811 -15.772418

grid_2degrees <- left_join(grid_2degrees, palmdiversity, by = "ID")
head(grid_2degrees)

## Simple feature collection with 6 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -72.00066 ymin: -35 xmax: -58.00066 ymax: -33
## Geodetic CRS:  WGS 84
##   ID      PD SR residuals Faith_PD.observed Faith_PD.SES MPD.observed MPD.SES
## 1 22      NA NA        NA                NA           NA           NA      NA
## 2 23 104.614  1 -110.5213           104.614     1.461737            0       0
## 3 26 104.614  1 -110.5213           104.614     4.109849            0       0
## 4 27 104.614  1 -110.5213           104.614     5.236266            0       0
## 5 28 104.614  1 -110.5213           104.614     5.873155            0       0
## 6 29 104.614  1 -110.5213           104.614     6.043440            0       0
##   MNTD.observed MNTD.SES                       geometry
## 1            NA       NA MULTILINESTRING ((-72.00066...
## 2             0        0 POLYGON ((-71.50066 -35, -7...
## 3             0        0 GEOMETRYCOLLECTION (POLYGON...
## 4             0        0 POLYGON ((-64.00066 -33, -6...
## 5             0        0 POLYGON ((-62.00066 -33, -6...
## 6             0        0 MULTIPOLYGON (((-58.00066 -...

Plot maps with colours according to the diversity metrics.

Species richness, PD, MPD & MNTD (absolute)

plot_grid(
  nrow = 2, ncol = 2,
  ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
    geom_sf(color = NA, aes(fill = SR)) +
    geom_sf(data = americas, fill = NA, color = "black") +
    labs(title = "Palm species richness") +
    scale_fill_viridis_c("Species\nnumber") +
    theme_void() +
    theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) + 
    ylim(-40, 40),
  
  ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
    geom_sf(color = NA, aes(fill = PD)) +
    geom_sf(data = americas, fill = NA, color = "black") +
    labs(title = "Faith's Phylogenetic diversity") +
    scale_fill_viridis_c("PD (my)") +
    theme_void() +
    theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) + 
    ylim(-40, 40),
  
  ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
    geom_sf(color = NA, aes(fill = MPD.observed)) +
    geom_sf(data = americas, fill = NA, color = "black") +
    labs(title = "Mean pairwise distance") +
    scale_fill_viridis_c("MPD (my)") +
    theme_void() +
    theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) + 
    ylim(-40, 40),
  
  ggplot(grid_2degrees[complete.cases(grid_2degrees$SR), ]) +
    geom_sf(color = NA, aes(fill = MNTD.observed)) +
    geom_sf(data = americas, fill = NA, color = "black") +
    labs(title = "Mean nearest taxon distance") +
    scale_fill_viridis_c("MNTD (my)") +
    theme_void() +
    theme(plot.title = element_text(margin = margin(0, 0, 10, 0))) + 
    ylim(-40, 40)
)

# par(mfrow = c(2, 2), mar = rep(1, 4), oma = rep(0, 4))
# 
# my.class.fr <- classIntervals(grid_2degrees@data$SR, n = 10, style = "equal") # bin data into n quantiles
# my.pal <- matlab.like(10)
# my.col.fr <- findColours(my.class.fr, my.pal) # ramp colors based on classInts
# 
# plot(grid_2degrees, col = my.col.fr, border = NA, axes = FALSE)
# title(main = "Species_richness", line = -3)
# plot(americas, add = TRUE)
# 
# my.class.fr <- classIntervals(grid_2degrees@data$Faith_PD.observed,
#                               n = 10, style = "equal") # bin data into n quantiles
# my.pal <- matlab.like(10)
# my.col.fr <- findColours(my.class.fr, my.pal) # ramp colors based on classInts
# 
# plot(grid_2degrees, col = my.col.fr, border = NA, axes = FALSE)
# title(main = "Faith's PD",line = -3)
# plot(americas, add = TRUE)
# 
# my.class.mpd <- classIntervals(grid_2degrees@data$MPD.observed,
#                                n = 10, style = "equal") # bin data into n quantiles
# my.pal2 <- matlab.like(10)
# my.col.mpd <- findColours(my.class.mpd, my.pal2) # ramp colors based on classInts
# 
# plot(grid_2degrees, col = my.col.mpd, border = NA, axes = FALSE)
# title(main = "MPD", line = -3)
# plot(americas, add = TRUE)
# 
# my.class.mntd <- classIntervals(grid_2degrees@data$MNTD.observed,
#                                 n = 10, style = "equal") # bin data into n quantiles
# my.pal3 <- matlab.like(10)
# my.col.mntd <- findColours(my.class.mntd, my.pal3) # ramp colors based on classInts
# 
# plot(grid_2degrees, col = my.col.mntd, border = NA, axes = FALSE)
# title(main = "MNTD", line = -3)
# plot(americas, add = TRUE)