back to overview    

Preparation: Please open the RStudio project we established last week and download all data for the class from Stud-IP (log in first) and unzip them into your project folder /data/. You can use getwd() to get your current working directory which is your project folder as long as you do not change it. Please install the following R-packages using install.packages() if not done yet:

 

If you want to visualize this tutorial in the viewer inside RStudio (to save space on your screen) run the following chunk of code:

install.packages("rstudioapi") # install an R-package required for this step
dir <- tempfile()
dir.create(dir)
download.file("https://gift.uni-goettingen.de/mcmmb/index.html", destfile = file.path(dir, "index.html"))
download.file("https://gift.uni-goettingen.de/mcmmb/Day2.html", destfile = file.path(dir, "Day2.html"))
htmlFile <- file.path(dir, "Day2.html")
rstudioapi::viewer(htmlFile)

Now you can conveniently copy code from the viewer into your script.

1. Species distribution

We will work with palm species distributed in the Neotropics described inn Kreft, H., Sommer, J.H. & Barthlott, W. (2006) The significance of geographic range size for spatial diversity patterns in Neotropical palms. Ecography, 29, 21-30.

1.1. Loading data

Load data
To import a table into R, we use the function read.table():

species <- read.table("data/palms_species_per_gridcell.csv",
                      header = TRUE, sep = ",")
species <- read.csv("data/palms_species_per_gridcell.csv")

 

Inspection

str(species)
## 'data.frame':    140860 obs. of  4 variables:
##  $ grid_id: int  13735 13736 13737 13738 13910 13911 13912 13913 13914 14085 ...
##  $ spp_Id : int  550 550 550 550 550 550 550 550 550 550 ...
##  $ GENUS  : chr  "Ammandra" "Ammandra" "Ammandra" "Ammandra" ...
##  $ EPITHET: chr  "natalia" "natalia" "natalia" "natalia" ...
dim(species)
## [1] 140860      4
colnames(species) # rownames(species) for rows
## [1] "grid_id" "spp_Id"  "GENUS"   "EPITHET"

1.2. Species and genus number

How many genera do we have?

unique(species$GENUS)
##  [1] "Ammandra"        "Phytelephas"     "Geonoma"         "Asterogyne"     
##  [5] "Calyptrogyne"    "Calyptronoma"    "Welfia"          "Pholidostachys" 
##  [9] "Astrocaryum"     "Desmoncus"       "Bactris"         "Aiphanes"       
## [13] "Gastrococcus"    "Acrocomia"       "Elaeis"          "Barcella"       
## [17] "Attalea"         "Polyandrococcus" "Allagoptera"     "Parajubaea"     
## [21] "Lytocaryum"      "Syagrus"         "Cocos"           "Jubaea"         
## [25] "Butia"           "Roystonea"       "Hypospathe"      "Oenocarpus"     
## [29] "Neonicholsonia"  "Prestoea"        "Euterpe"         "Reinhardtia"    
## [33] "Leopoldinia"     "Manicaria"       "Wettinia"        "Socratea"       
## [37] "Iriartea"        "Iriartella"      "Dictyocarpum"    "Wendlandiella"  
## [41] "Chamaedorea"     "Synechanthus"    "Gaussia"         "Juania"         
## [45] "Ceroxylon"       "Pseudophoenix"   "Lepidocaryum"    "Mauritiella"    
## [49] "Mauritia"        "Raphia"          "Sabal"           "Washingtonia"   
## [53] "Copernicia"      "Brahea"          "Serenoa"         "Acoelorraphe"   
## [57] "Coplothrinax"    "Rhapidophyllum"  "Zombia"          "Coccothrinax"   
## [61] "Thrinax"         "Schippia"        "Itaya"           "Crysophila"     
## [65] "Chelyocarpus"    "Trithrinax"
length(unique(species$GENUS)) # or dplyr::n_distinct(species$GENUS)
## [1] 66

 

Adding a species column to the data.frame
paste() allows us to combine both character strings of genus and epithet into a full species name.

species$species <- paste(species$GENUS, species$EPITHET, sep = " ")

 

How many species do we have?

length(unique(species$species)) # or dplyr::n_distinct(species$species)
## [1] 550

We can make a table with only one entry per species, i.e. we remove duplicates:

species_unique <- species[!duplicated(species$species), ]
# or species_unique <- unique(species[, c("spp_Id", "GENUS", "EPITHET", "species")])

Using table, we can see how many species there are per genus:

spec_genera <- table(species_unique$GENUS) # contingency table of the counts of factor levels.
spec_genera
## 
##    Acoelorraphe       Acrocomia        Aiphanes     Allagoptera        Ammandra 
##               1               2              22               4               3 
##      Asterogyne     Astrocaryum         Attalea         Bactris        Barcella 
##               5              18              29              64               1 
##          Brahea           Butia    Calyptrogyne    Calyptronoma       Ceroxylon 
##               9               8               8               3              11 
##     Chamaedorea    Chelyocarpus    Coccothrinax           Cocos      Copernicia 
##              77               4              14               1              13 
##    Coplothrinax      Crysophila       Desmoncus    Dictyocarpum          Elaeis 
##               2               9               7               3               1 
##         Euterpe    Gastrococcus         Gaussia         Geonoma      Hypospathe 
##               7               1               5              51               3 
##        Iriartea      Iriartella           Itaya          Juania          Jubaea 
##               1               2               1               1               1 
##     Leopoldinia    Lepidocaryum      Lytocaryum       Manicaria        Mauritia 
##               3               1               2               1               2 
##     Mauritiella  Neonicholsonia      Oenocarpus      Parajubaea  Pholidostachys 
##               3               1               9               2               4 
##     Phytelephas Polyandrococcus        Prestoea   Pseudophoenix          Raphia 
##               6               1              11               4               1 
##     Reinhardtia  Rhapidophyllum       Roystonea           Sabal        Schippia 
##               6               1               9              16               1 
##         Serenoa        Socratea         Syagrus    Synechanthus         Thrinax 
##               1               5              30               2               7 
##      Trithrinax    Washingtonia          Welfia   Wendlandiella        Wettinia 
##               3               2               1               1              21 
##          Zombia 
##               1

These frequencies can be plotted using a histogram:

library("ggplot2")

# Convert to a data.frame
spec_genera <- data.frame(genus = names(spec_genera),
                          count = as.numeric(spec_genera))

# Histogram
ggplot(spec_genera, aes(count)) +
  geom_histogram(color= "black", fill = "grey90") +
  labs(title = "Histogram of species per genus",
       x = "Species per genus", y = "Frequency") +
  theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

 

Below is an alternative using forcats::fct_infreq:

ggplot(species, aes(x = forcats::fct_infreq(GENUS))) +
    geom_histogram(stat = "count", color= "black", fill = "grey90") +
    labs(title = "Histogram of species per genus",
         x = "Species per genus", y = "Frequency") +
    theme_bw()
## Warning in geom_histogram(stat = "count", color = "black", fill = "grey90"):
## Ignoring unknown parameters: `binwidth`, `bins`, and `pad`

1.3. Subsetting the data.frame using logical operators

How many species belong to the genus Ammandra?

which(species$GENUS == "Ammandra") # index of all entries of genus "Ammandra"
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182
head(species[which(species$GENUS == "Ammandra"), ]) # first six rows
##   grid_id spp_Id    GENUS EPITHET          species
## 1   13735    550 Ammandra natalia Ammandra natalia
## 2   13736    550 Ammandra natalia Ammandra natalia
## 3   13737    550 Ammandra natalia Ammandra natalia
## 4   13738    550 Ammandra natalia Ammandra natalia
## 5   13910    550 Ammandra natalia Ammandra natalia
## 6   13911    550 Ammandra natalia Ammandra natalia
# What species belong to this genus?
unique(species[which(species$GENUS == "Ammandra"), "species"])
## [1] "Ammandra natalia"    "Ammandra decasperma" "Ammandra dasyneura"
length(unique(species[which(species$GENUS == "Ammandra"), "species"]))
## [1] 3

 

OR
How many species belong to the genus Ammandra or Washingtonia?

unique(species$species[which(species$GENUS == "Ammandra" |
                               species$GENUS == "Washingtonia")]) # all species names in the genera "Ammandra" and "Washingtonia"
## [1] "Ammandra natalia"      "Ammandra decasperma"   "Ammandra dasyneura"   
## [4] "Washingtonia robusta"  "Washingtonia filifera"
unique(species$species[which(species$GENUS %in% c("Ammandra", "Washingtonia"))])
## [1] "Ammandra natalia"      "Ammandra decasperma"   "Ammandra dasyneura"   
## [4] "Washingtonia robusta"  "Washingtonia filifera"
length(unique(species$species[which(species$GENUS %in%
                                      c("Ammandra", "Washingtonia"))]))
## [1] 5

 

AND
In how many grid cells can we find the species Ammandra decasperma?

head(species[which(species$GENUS=="Ammandra" &
                     species$EPITHET == "decasperma"), ]) # all entries of Ammandra decasperma
##     grid_id spp_Id    GENUS    EPITHET             species
## 153   10937    549 Ammandra decasperma Ammandra decasperma
## 154   10938    549 Ammandra decasperma Ammandra decasperma
## 155   11112    549 Ammandra decasperma Ammandra decasperma
## 156   11113    549 Ammandra decasperma Ammandra decasperma
## 157   11286    549 Ammandra decasperma Ammandra decasperma
## 158   11287    549 Ammandra decasperma Ammandra decasperma
head(species[which(species$species == "Ammandra decasperma"), ])
##     grid_id spp_Id    GENUS    EPITHET             species
## 153   10937    549 Ammandra decasperma Ammandra decasperma
## 154   10938    549 Ammandra decasperma Ammandra decasperma
## 155   11112    549 Ammandra decasperma Ammandra decasperma
## 156   11113    549 Ammandra decasperma Ammandra decasperma
## 157   11286    549 Ammandra decasperma Ammandra decasperma
## 158   11287    549 Ammandra decasperma Ammandra decasperma
nrow(species[which(species$species == "Ammandra decasperma"), ])
## [1] 20

 

Make a subset of the data.frame that only includes the following 4 species Geonoma pauciflora, Geonoma weberbaueri, Phytelephas tenuicaulis & Mauritia flexuosa.

subset_sp <- c("Geonoma pauciflora", "Geonoma weberbaueri",
               "Phytelephas tenuicaulis", "Mauritia flexuosa")

sub_sp <- species[which(species$species %in% subset_sp), ]

According to the World Checklist of Vascular Plants, Geonoma weberbaueri is a synonym of Geonoma undata. Rename it!

unique(sub_sp$species)
## [1] "Phytelephas tenuicaulis" "Geonoma weberbaueri"    
## [3] "Geonoma pauciflora"      "Mauritia flexuosa"
sub_sp[which(sub_sp$species == "Geonoma weberbaueri"), "species"] <-
  "Geonoma undata"
unique(sub_sp$species)
## [1] "Phytelephas tenuicaulis" "Geonoma undata"         
## [3] "Geonoma pauciflora"      "Mauritia flexuosa"

In how many grid cells does each of these species occur?

table(sub_sp$species)
## 
##      Geonoma pauciflora          Geonoma undata       Mauritia flexuosa 
##                     224                     334                    2858 
## Phytelephas tenuicaulis 
##                     135

Other operators that can be used are: which() and operators: ==, >=, >, <, <=, %in%, is.na(), is.finite(), !, &, |

2. Spatial grid

2.1. Load, structure and CRS

We now load the spatial grid, containing the geometries for all the cell IDs.

Loading Geospatial R packages

install.packages("sf")
library("sf")
library("terra")

 

Load spatial poylgons into R We will use the two shapefiles 30min_grid_select50%.shp and americas.shp

grid <- st_read("data/30min_grid_select50/30min_grid_select50%.shp")
## Reading layer `30min_grid_select50%' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\30min_grid_select50\30min_grid_select50%.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 6038 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## CRS:           NA
americas <- st_read("data/americas/americas.shp")
## Reading layer `americas' from data source 
##   `C:\Users\Pierre\Work\classes\macroecology\class_content\Gitlab\mcmmb\data\americas\americas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7158 ymin: -55.91972 xmax: -29.84 ymax: 49.37666
## CRS:           NA
str(grid)
## Classes 'sf' and 'data.frame':   6038 obs. of  4 variables:
##  $ ID      : num  538 539 540 712 713 714 715 716 885 886 ...
##  $ AREA    : num  0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
##  $ PORTION : num  100 100 100 100 100 100 100 100 100 100 ...
##  $ geometry:sfc_MULTIPOLYGON of length 6038; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:5, 1:2] -114 -115 -115 -114 -114 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
##   ..- attr(*, "names")= chr [1:3] "ID" "AREA" "PORTION"
ext(grid) 
## SpatExtent : -116.000656, -35.000656, -35, 36.5 (xmin, xmax, ymin, ymax)
class(americas)
## [1] "sf"         "data.frame"
ext(americas)
## SpatExtent : -124.715843200684, -29.8400001525879, -55.9197235107422, 49.3766555786133 (xmin, xmax, ymin, ymax)

Access the attribute table

head(grid)
## Simple feature collection with 6 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -115.0007 ymin: 35.5 xmax: -113.0007 ymax: 36.5
## CRS:           NA
##    ID AREA PORTION                       geometry
## 1 538 0.25     100 MULTIPOLYGON (((-114.0007 3...
## 2 539 0.25     100 MULTIPOLYGON (((-113.5007 3...
## 3 540 0.25     100 MULTIPOLYGON (((-113.0007 3...
## 4 712 0.25     100 MULTIPOLYGON (((-114.5007 3...
## 5 713 0.25     100 MULTIPOLYGON (((-114.0007 3...
## 6 714 0.25     100 MULTIPOLYGON (((-113.5007 3...

 

CRS

st_crs(grid)
## Coordinate Reference System: NA
st_crs(americas)
## Coordinate Reference System: NA

CRS is not defined yet for any of them!

st_crs(grid) <- "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(americas) <- "+proj=longlat +ellps=WGS84 +no_defs"
st_crs(grid)
## Coordinate Reference System:
##   User input: +proj=longlat +ellps=WGS84 +no_defs 
##   wkt:
## GEOGCRS["unknown",
##     DATUM["Unknown based on WGS 84 ellipsoid",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",7030]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433],
##         ID["EPSG",8901]],
##     CS[ellipsoidal,2],
##         AXIS["longitude",east,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]],
##         AXIS["latitude",north,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433,
##                 ID["EPSG",9122]]]]
st_geometry(grid)
## Geometry set for 6038 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -116.0007 ymin: -35 xmax: -35.00066 ymax: 36.5
## Geodetic CRS:  +proj=longlat +ellps=WGS84 +no_defs
## First 5 geometries:
## MULTIPOLYGON (((-114.0007 36, -114.5007 36, -11...
## MULTIPOLYGON (((-113.5007 36, -114.0007 36, -11...
## MULTIPOLYGON (((-113.0007 36, -113.5007 36, -11...
## MULTIPOLYGON (((-114.5007 35.5, -115.0007 35.5,...
## MULTIPOLYGON (((-114.0007 35.5, -114.5007 35.5,...

Check out http://proj4.org/ or http://epsg.io/ for spatial projections.
 

2.2. Basic plot

We can plot both objects:

ggplot(grid) +
  geom_sf() +
  geom_sf(data = americas, fill = NA) +
  theme_bw()

# Plot with base R
# par(mfrow = c(1, 2))
# plot(st_geometry(grid), border = "darkgrey")
# plot(st_geometry(americas), add=TRUE)

If we zoom in, we will see better the grid cells:

ggplot(grid) +
  geom_sf() +
  geom_sf(data = americas, fill = NA, color = "black") +
  scale_x_continuous(limits = c(-85, -65)) +
  scale_y_continuous(limits = c(5, 25)) +
  theme_bw()

# plot(st_geometry(grid), border = "darkgrey",
#      ylim = c(5, 25), xlim = c(-85, -65))
# plot(st_geometry(americas), add = TRUE)

 

3. Joining distribution and spatial data

3.1. Merge

To perform a join between two tables having a column in common, we rely on the functions available in the dplyr library.

library("dplyr")

To make a join between two tables, they need to have a column in common. The grid ID is the column in common in our case, but it is not named the same. Let’s rename one.

colnames(grid)[1] <- "grid_id"
grid_species <- left_join(grid, species, by = c("grid_id"))

3.2. Plot with facets

subset_sp <- c("Geonoma pauciflora", "Geonoma undata",
               "Phytelephas tenuicaulis", "Mauritia flexuosa")

ggplot(grid_species[which(grid_species$species %in% subset_sp), ]) +
  geom_sf(data = grid, color = "grey90", fill = "grey90") +
  geom_sf(data = americas, fill = NA) +
  geom_sf(aes(color = species, fill = species), show.legend = FALSE) +
  theme_bw() +
  facet_wrap(~species)

3.3. Alternative with wide-format table

Convert long table to species by grid-cell table There are several options to do this, we can use functions from existing R package like bioregion or base R approaches.

library("bioregion")
species <- species[, c("grid_id", "species")]
species_grid <- net_to_mat(net = species)
# Alternative
species_grid <- as.data.frame.matrix(table(species)) # Count species by grid cell combinations
species_grid[1:3, 1:3]

The grid IDs are stored in the rows now, let’s put them as the first column of the wide-format data.frame:

species_grid <- data.frame(grid_id = as.integer(row.names(species_grid)),
                           species_grid)
species_grid[1:3, 1:3]
##       grid_id Ammandra.natalia Ammandra.decasperma
## 13735   13735                1                   0
## 13736   13736                1                   0
## 13737   13737                1                   0

 

Join palm occurrences and spatial shapefile

grid_0 <- left_join(grid_0, species_grid, by = "grid_id") # Perform a "SQL" left join
grid_0[is.na(grid_0)] <- 0
grid_0[which(grid_0$grid_id %in% c(5328:5332)), 1:5]
## Simple feature collection with 5 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -82.00066 ymin: 22.5 xmax: -79.50066 ymax: 23
## Geodetic CRS:  +proj=longlat +ellps=WGS84 +no_defs
##     grid_id    AREA PORTION Ammandra.natalia Ammandra.decasperma
## 558    5328 0.19638  78.552                0                   0
## 559    5329 0.25000 100.000                0                   0
## 560    5330 0.24921  99.684                0                   0
## 561    5331 0.22011  88.044                0                   0
## 562    5332 0.13200  52.800                0                   0
##                           geometry
## 558 MULTIPOLYGON (((-81.50066 2...
## 559 MULTIPOLYGON (((-81.00066 2...
## 560 MULTIPOLYGON (((-80.50066 2...
## 561 MULTIPOLYGON (((-80.3008 23...
## 562 MULTIPOLYGON (((-79.93695 2...

 

Make distribution maps for the four species

ggplot(grid_0) +
  geom_sf(aes(color = ifelse(Geonoma.pauciflora == 1, "forestgreen",
                             "gray90"),
              fill = ifelse(Geonoma.pauciflora == 1, "forestgreen",
                            "gray90")),
          show.legend = FALSE) +
  geom_sf(data = americas, fill = NA) +
  scale_color_identity() +
  scale_fill_identity() +
  labs(title = expression(italic("Geonoma pauciflora"))) +
  theme_bw()

 

Below is the equivalent plotting made from base R code:

# One species
plot(st_geometry(grid_0),
     col = ifelse(grid_0$Geonoma.undata == 1, "forestgreen", "grey80"),
     border=NA, main = expression(italic("Geonoma pauciflora")))
plot(st_geometry(americas), add=TRUE)


Below is the equivalent plotting made from base R code:

# Four species together
plot(grid_0[c("Geonoma.pauciflora", "Geonoma.undata",
              "Phytelephas.tenuicaulis", "Mauritia.flexuosa")],
     border = NA, pal = c("grey80", "forestgreen"))

To go further on the topic of plots with spatial data, you can have a look at these nice tutorials: http://edrub.in/ARE212/section12.html, https://r-spatial.github.io/sf/articles/sf5.html