Delimiting the modelling background for scattered uneven occurrence data

In species distribution modelling and ecological niche modelling (SDM & ENM), the region from which background or pseudoabsence points are drawn is key to how well a model turns out. This region should include enough localities for the model to assess the species’ (non-)preferences, but it should also be within the species’ reach AND reasonably evenly surveyed for it. Survey bias within the background or pseudoabsence region can seriously reduce model quality.

Yet real-life occurrence data (especially at the wide scale of species distributions) are often strongly biased, and it’s rarely straightforward to define an adequate region around the existing points. Very nice criteria exist that work well in theory, but not under most actual scenarios of biodiversity data bias. Until recently, my preferred approach was a buffer around the known occurrences, with a radius of e.g. the mean pairwise distance between them (as a rough measure of their spatial spread, or of the spatial reach of the species), or half the width of the area defined by the occurrences (usually better for elongated distributions or surveys):

# download some example occurrence records:

occs <- geodata::sp_occurrence("Rupicapra", "pyrenaica", args = c("year=2024"))


# map occurrence records:

occs <- terra::vect(occs, geom = c("lon", "lat"), crs = "EPSG:4326")
terra::plot(occs, cex = 0.2, ext = terra::ext(occs) + 4)
cntry <- geodata::world(path = tempdir())
terra::plot(cntry, lwd = 0.2, add = TRUE)
# note you should normally clean these records from errors before moving on!


# compute mean distance and width of occurrence records:

occs_mdist <- mean(terra::distance(occs))
occs_width <- terra::width(terra::aggregate(occs))


# compute buffers around occurrences using both metrics:

occs_buff_d <- terra::aggregate(terra::buffer(occs, width = occs_mdist))
occs_buff_w <- terra::aggregate(terra::buffer(occs, width = occs_width / 2))


# plot both buffers and occurrence records:

par(mfrow = c(1, 2))
terra::plot(occs_buff_d, col = "yellow", border = "orange", lwd = 3, ext = terra::ext(occs) + 4, main = "Mean distance")
terra::plot(cntry, lwd = 0.2, add = TRUE)
terra::plot(occs, cex = 0.2, add = TRUE)
terra::plot(occs_buff_w, col = "yellow", border = "orange", lwd = 3, ext = terra::ext(occs) + 4, main = "Half width")
terra::plot(cntry, lwd = 0.2, add = TRUE)
terra::plot(occs, cex = 0.2, add = TRUE)
[Figure: Rupicapra pyrenaica occurrences with buffers based on the mean pairwise distance (left) and half the occurrence width (right)]

This may work well for relatively evenly surveyed species or regions, where we can reasonably assume that most of the buffered area has been (roughly evenly) surveyed, and that most places without presence records are likely absences. However, many species have isolated records in remote, understudied areas, with survey effort hugely imbalanced across the distribution range. So, a buffer with a uniform radius will not work well:

occs <- geodata::sp_occurrence("Lutra", "lutra", args = c("year=2024"))

# everything else same code as above...
[Figure: the same uniform-radius buffers applied to the Lutra lutra occurrence records]

You can use smaller multipliers for distance and width, but these uniform-radius buffers will always include large regions with only scarce and isolated occurrence records, where it cannot be assumed that absence of records mostly reflects absence of the species. Normally I would terra::erase() from the buffer the regions that look insufficiently surveyed, but that implies discarding potentially valuable occurrence data.
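
For illustration, that manual trimming could look something like this (the rectangle below is a made-up placeholder for an area judged insufficiently surveyed; in practice you would draw or import such a polygon):

# erase a (hypothetical) insufficiently surveyed area from the uniform buffer:
undersurveyed <- terra::as.polygons(terra::ext(60, 100, 40, 60), crs = "EPSG:4326")
occs_buff_trimmed <- terra::erase(occs_buff_d, undersurveyed)
terra::plot(occs_buff_trimmed, col = "yellow", border = "orange")
terra::plot(cntry, lwd = 0.2, add = TRUE)
terra::plot(occs, cex = 0.2, add = TRUE)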

So, my latest approach consists of finding clusters of occurrence points that are within a given distance of one another; and then defining a varying buffer radius of, e.g., the width or the mean pairwise distance among the points in each cluster, down-weighted by the distance to other points or clusters (as an indicator of isolated observations in under-surveyed areas). This way, the modelling background will avoid areas that were most likely not surveyed, yet without discarding the occurrence records from those areas. I’ve implemented several variations of this and related approaches in the new getRegion() function of R package fuzzySim (version 4.26):
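Just to illustrate the clustering idea (this is only a rough sketch using single-linkage clustering on the pairwise distance matrix, not the actual getRegion() implementation; the 200-km threshold is arbitrary):

# group occurrence points that are chained within e.g. 200 km of one another:
d <- terra::distance(occs)                  # pairwise distances (in metres)
cl <- hclust(as.dist(d), method = "single")
groups <- cutree(cl, h = 200000)            # cut at a 200-km distance threshold
table(groups)                               # number of points per cluster
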

# try regions with two different methods:

reg1 <- fuzzySim::getRegion(occs, type = "inv_dist")
terra::plot(cntry, lwd = 0.2, add = TRUE)
terra::plot(reg1, lwd = 4, border = "orange", add = TRUE)

reg2 <- fuzzySim::getRegion(occs, type = "clust_width", width_mult = 0.5)
terra::plot(cntry, lwd = 0.2, add = TRUE)
terra::plot(reg2, lwd = 4, border = "orange", add = TRUE)
[Figure: backgrounds delimited by getRegion() with type = "inv_dist" and type = "clust_width"]

Read the function help file for all the different options available, and try them out to see what feels best for your study system. Feedback and bug reports welcome!

Extract raster values to points with bilinear interpolation

A student recently asked me how exactly the terra::extract() R function works when using method="bilinear" to get raster values for points. The help file rightly says that 'With "bilinear" the returned values are interpolated from the values of the four nearest raster cells', but this wasn't immediately clear without a visual example. So, I put together some code to illustrate it:

library(terra)

# create a simple raster map:
r <- rast(nrows = 3, ncols = 3, xmin = 0, xmax = 3, ymin = 0, ymax = 3)
values(r) <- 1:ncell(r)
plot(r)

# get the pixel centroids:
centr <- as.points(r)
plot(centr, add = TRUE)

# show pixel values on the map:
plot(r, main = "Pixel values")
text(centr, values(r))

[Figure: 3x3 example raster with each pixel's value shown at its centroid]

With terra::extract(), when a point falls exactly at the centre of a pixel, "bilinear" gives the same result as the "simple" method, i.e. the value of the pixel in which the point falls:

# get the middle pixel centroid:
pnt <- centr[5, ]
plot(pnt, col = "blue", pch = 4, cex = 2, add = TRUE)

extract(r, pnt, ID = FALSE, method = "simple")
extract(r, pnt, ID = FALSE, method = "bilinear")
# both 5

Now here’s a function to check what values are extracted for points at other parts of this raster map:

checkPoint <- function(dx = 0, dy = 0, col = "black") {
  pnt_shift <- shift(pnt, dx = dx, dy = dy)
  plot(pnt_shift, col = col, add = TRUE)
  val <- extract(r, pnt_shift, ID = FALSE, method = "bilinear")
  text(pnt_shift, val, pos = 3, col = col, cex = 0.8, halo = TRUE)
}

Let’s use this function to see what values we get from this raster (with method="bilinear") for different points:

plot(r)
text(centr, values(r))

checkPoint(dx = 0.2, col = "blue")
checkPoint(dx = 0.5, col = "purple")
checkPoint(dx = 0.8, dy = 0.2, col = "steelblue")
checkPoint(dx = 0.5, dy = -0.5, col = "magenta")
checkPoint(dx = -0.3, dy = 0.3, col = "darkgreen")
checkPoint(dx = -1.3, dy = 0, col = "darkturquoise")
checkPoint(dx = -1.3, dy = 0.3, col = "red3")
checkPoint(dx = -1.3, dy = 1.2, col = "cadetblue")
[Figure: values returned by method = "bilinear" for points at different positions relative to the pixel centroids]
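
To make the weighting explicit, here is a manual check for one of the points above (shifted by dx = 0.8, dy = 0.2 from the centroid of pixel 5). This just restates the standard bilinear weighting of the four nearest cells (here cells 5, 6, 2 and 3), using the example raster built above:

# fractional distances to the nearest cell centroids along x and y:
fx <- 0.8; fy <- 0.2
# weighted average of the four nearest cell values (5, 6, 2, 3):
(1 - fx) * (1 - fy) * 5 + fx * (1 - fy) * 6 + (1 - fx) * fy * 2 + fx * fy * 3
# 5.2
extract(r, shift(pnt, dx = 0.8, dy = 0.2), ID = FALSE, method = "bilinear")
# should also give 5.2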

So that’s how this works! Pretty cool, isn’t it? Consider using "bilinear" rather than the default "simple" method when extracting raster values to points with 'terra'. And thanks to Robert J. Hijmans for such a great package!

Actual pixel sizes of unprojected raster maps

It is well known, though often disregarded, that the areas of spatial units (cells, pixels) based on unprojected coordinates (longitude-latitude degrees, arc-minutes or arc-seconds) are wildly inconsistent across the globe. Towards the poles, as the meridians approach each other, the actual ground width of the pixels sharply decreases. So, for non-tropical regions, the actual area covered by these pixels can be quite far from the nominal "approximately 1 km at the Equator" (or 10 km, etc.) that is often cited in biogeography and macroecology studies.

The function below combines several tools from the ‘terra’ R package to compute the (mean or centroid) pixel area for a given raster map. By default, it also shows a map illustrating the latitudinal variation in the pixel areas.

pixelArea <- function(rast, # SpatRaster
                      type = "mean", # can also be "centroid"
                      unit = "m",  # can also be "km"
                      mask = TRUE,  # to use only non-NA pixels
                      map = TRUE) {

  # by A. Marcia Barbosa (https://modtools.wordpress.com/)
  # version 1.4 (17 May 2024)

  r <- rast

  stopifnot(inherits(r, "SpatRaster"),
            type %in% c("mean", "centroid"))

  r_size <- terra::cellSize(r, unit = unit, mask = mask)
  if (map) terra::plot(r_size, main = paste0("Pixel area (", unit, "2)"))
  areas <- terra::values(r_size, mat = FALSE, dataframe = FALSE, na.rm = FALSE)  # na.rm must be FALSE for areas[centr_pix] to be correct

  if (type == "mean") {
    out <- mean(areas, na.rm = TRUE)
    cat(paste0("Mean pixel area (", unit, "2):\n", out, "\n\n"))
    return(out)
  }

  if (type == "centroid") {
    r_pol <- terra::as.polygons(r * 0, aggregate = TRUE)
    centr <- terra::centroids(r_pol)
    if (map) {
      if (!mask) terra::plot(r_pol, lwd = 0.2, add = TRUE)
      terra::plot(centr, pch = 4, col = "blue", add = TRUE)
    }

    centr_pix <- terra::cellFromXY(r, terra::crds(centr))
    out <- areas[centr_pix]
    if (!is.finite(out)) message("The centroid of your region may not have a pixel value; consider using mask=FALSE, or type = 'mean'.")
    cat(paste0("Centroid pixel area (", unit, "2):\n", out, "\n\n"))
    return(out)
  }
}

You can source the function directly from GitHub with the first command below, which is followed by a usage example for a central European country:

source("https://raw.githubusercontent.com/AMBarbosa/unpackaged/master/pixelArea.R")
elev <- geodata::elevation_30s("Switzerland", path = tempdir())
terra::plot(elev)

terra::res(elev)
# 0.008333333 degree resolution
# i.e. half arc-minute, or 30 arc-seconds (1/60/2)
# "approx. 1 km at the Equator"

pixelArea(elev, unit = "km")
# Mean pixel area (km2):
# [1] 0.5892033

[Figure: map of pixel areas (km2) for the Switzerland elevation raster]

As you can see, the mean pixel area here is quite different from 1 square km… and Switzerland isn’t even one of the northernmost countries! So, when using unprojected rasters (e.g. WorldClim, CHELSA climate, Bio-ORACLE, MARSPEC and so many others) in your analysis, consider choosing the spatial resolution that most closely matches the resolution you want in your study region, rather than the nominal resolution at the Equator, when the Equator is actually far away.
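
As a quick (approximate) illustration of how much this matters, you can compare the area of a single 30 arc-second cell at the Equator with one at Swiss latitudes using terra::cellSize() directly (the coordinates below are arbitrary example locations):

# area of one 30 arc-second (1/120 degree) pixel at the Equator vs. at ~47 degrees N:
cell_eq <- terra::rast(xmin = 0, xmax = 1/120, ymin = 0, ymax = 1/120,
                       ncols = 1, nrows = 1, crs = "EPSG:4326")
cell_ch <- terra::rast(xmin = 8, xmax = 8 + 1/120, ymin = 46.8, ymax = 46.8 + 1/120,
                       ncols = 1, nrows = 1, crs = "EPSG:4326")
terra::values(terra::cellSize(cell_eq, unit = "km", mask = FALSE))  # roughly 0.85 km2
terra::values(terra::cellSize(cell_ch, unit = "km", mask = FALSE))  # roughly 0.59 km2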

You can read more details about geographic areas in the help file of the terra::expanse() function. Feedback welcome!

Getting continent, mainland and island maps in R

Maps of continents, mainlands and islands can be useful, for example, for selecting areas — and then cropping or masking variables — for modelling a species’ distribution. Here’s a way to obtain such maps using the ‘geodata’ and ‘terra’ R packages:

# load required packages:
library(terra)
library(geodata)

# import a world countries map:
countries <- world(resolution = 5, path = "maps")  # you may choose a smaller (more detailed) resolution for the polygon borders, and a different folder path to save the imported map
head(countries)

# import a table with country codes and continents:
cntry_codes <- country_codes()
head(cntry_codes)

# add this table to the countries map attributes:
head(countries)
head(cntry_codes[ , 1:4])
countries <- merge(countries, cntry_codes, by.x = "GID_0", by.y = "ISO3", all.x = TRUE)
head(countries)

# plot the countries map coloured according to "continent":
plot(countries, "continent", lwd = 0.2, main = "Countries by continent")
[Figure: world countries coloured by continent]

# dissolve (aggregate) countries into a continents map:
continents <- aggregate(countries, by = "continent")
values(continents)
plot(continents, "continent", lwd = 0.2)
[Figure: countries aggregated (dissolved) into continents]
# note that each continent (colour) is a multi-part polygon including mainland and islands - see also:
plot(continents[1, ])

# disaggregate continent polygons, to then separate islands and mainlands:
continents <- disagg(continents)

# get a map of just the continent mainlands (largest polygons):
unique(continents$continent)
largest <- (order(expanse(continents), decreasing = TRUE))[1:length(unique(continents$continent))]
mainlands <- continents[largest, ]
plot(mainlands, "continent", lwd = 0.2, main = "Continent mainlands")
[Figure: continent mainlands (largest polygon per continent)]

# get a map of just the islands (i.e. polygons except mainlands):
islands <- erase(continents, mainlands)
plot(islands, "continent", lwd = 0.2, main = "World islands")
[Figure: world islands]
# you can then crop and mask a raster map to given islands or continents, e.g.:
elevation <- elevation_global(res = 10, path = "maps") # you may choose a smaller (more detailed) resolution or pixel size
afr_mainland <- subset(mainlands, mainlands$continent == "Africa")
elev_afr_mainland <- crop(elevation, afr_mainland, mask = TRUE)
plot(elev_afr_mainland, main = "Elevation in mainland Africa")
[Figure: elevation cropped and masked to mainland Africa]
# you can also compute a buffer around a given continent, and use it to crop marine layers to get only the near-shore waters of that continent:

afr_buff <- terra::buffer(afr_mainland, width = 200000)  # 200 km
plot(afr_buff, col = "darkblue", background = "lightblue")
plot(afr_mainland, col = "tan", add = TRUE)

# import a marine variable, e.g. bathymetry:
bathy_source <- "https://gebco2023.s3.valeria.science/gebco_2023_land_cog.tif" # from https://gist.github.com/mdsumner/aaa6f1d2c1ed107fbdd7e83f509a7cf3
bathy <- terra::rast(bathy_source, vsi = TRUE)

bathy_afr <- terra::crop(bathy, afr_buff, mask = TRUE)
plot(bathy_afr, col = hcl.colors(100, "blues"))
plot(countries, col = "tan", add = TRUE)
[Figure: bathymetry cropped to a 200-km buffer around mainland Africa]

You can use the geodata::gadm() function (instead of world) if you want to download polygons for particular countries, and then apply similar procedures to separate islands from mainlands there.
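
For example, something like the sketch below ("Portugal" and the level-0 GADM layer are just illustrative choices; the same disaggregate-and-sort logic as above is applied within one country):

# download one country's boundary, split it into its polygon parts, and take
# the largest part as the mainland and the remaining parts as islands:
prt <- gadm("Portugal", level = 0, path = "maps")
prt_parts <- disagg(prt)
ord <- order(expanse(prt_parts), decreasing = TRUE)
prt_mainland <- prt_parts[ord[1], ]
prt_islands <- prt_parts[ord[-1], ]
plot(prt_parts, lwd = 0.2)
plot(prt_mainland, col = "tan", add = TRUE)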

Range change based on continuous (fuzzy) values

This function, fuzzyRangeChange, included in the fuzzySim package (version 1.6 just released), uses the fuzzyOverlay function from the previous post to quantify overall range change (including gain, loss, maintenance/stability and total change) based on the continuous predictions of two species distribution models (see Gutiérrez-Rodríguez et al., in press).

fuzzyRangeChange <- function(pred1, pred2, number = TRUE, prop = TRUE, na.rm = TRUE,
                             round.digits = 2,
                             measures = c("Gain", "Loss", "Stable_presence", "Stable_absence", "Balance"),
                             plot = TRUE,
                             col = colorRampPalette(c("white", "black"))(length(measures)),
                             ...) {
 
  # version 1.4 (23 Mar 2016)
 
  stopifnot(ncol(pred1) == ncol(pred2),
            all(pred1[is.finite(pred1)] >= 0 & pred1[is.finite(pred1)] <= 1),
            all(pred2[is.finite(pred2)] >= 0 & pred2[is.finite(pred2)] <= 1)
  )
 
  if (!number & !prop) stop ("Nothing to calculate if both 'number' and 'prop' are FALSE.")
 
  values <- vector("numeric", length(measures))
  names(values) <- measures
  if ("Gain" %in% measures)  values["Gain"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "expansion", na.rm = na.rm), na.rm = na.rm)
  if ("Loss" %in% measures)  values["Loss"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "contraction", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_presence" %in% measures)  values["Stable_presence"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_absence" %in% measures)  values["Stable_absence"] <- sum(fuzzyOverlay(1 - data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Balance" %in% measures)  values["Balance"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "change", na.rm = na.rm), na.rm = na.rm)
 
  result <- data.frame(Measure = measures, Number = values)
 
  if (prop) {
    if (na.rm) n <- length(na.omit(pred1))
    else n <- length(pred1)
    range.size <- sum(pred1, na.rm = na.rm)
    stable.abs <- result[result$Measure == "Stable_absence", "Number"]
    result$Proportion <- result[ , "Number"] / range.size
    result[result$Measure == "Stable_absence", "Proportion"] <- stable.abs / (n - range.size)
  }
 
  if (!number) {
    result <- result[ , - (ncol(result) - 1)]
  }
 
  if (plot) {
    barplot(result[ , ncol(result)], legend.text = rownames(result), col = col, ...)
    abline(h = 0)
  }
 
  result[ , "Measure"] <- NULL
  if (is.finite(round.digits))  result <- round(result, round.digits)
 
  result
}


The function produces numeric and (since fuzzySim 1.7.2) also graphical results quantifying the fuzzy overall changes:

[Figure: barplot of the fuzzyRangeChange measures]

You can install and load fuzzySim (>= 1.6) and then check help(fuzzyRangeChange) for further info and reproducible usage examples.
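
For instance, a minimal sketch with made-up prediction values, just to illustrate the input and output formats:

# two vectors of fuzzy (0-1) predictions for the same localities (made-up values):
set.seed(1)
pred.current <- runif(100)
pred.future <- pmin(pmax(pred.current + rnorm(100, 0, 0.2), 0), 1)

library(fuzzySim)
fuzzyRangeChange(pred.current, pred.future)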

 

REFERENCES

Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography.

Plot column(s) of a polygon vector map

NOTE: I recently posted a tutorial on the cartography R package, which makes mapping columns of a data frame much less troublesome. You may want to look at that instead.

If you have a polygon vector map in R and want to quickly map the values in one or more columns of its attribute table, you can use the plotMapColumns function below. There’s an option to rasterize the map before plotting, which can be considerably faster and is TRUE by default, but then you’ll need an appropriate raster.upscale value: the number by which the range of the coordinates is divided to get the number of pixels in the plotted maps. It’s advisable to first check range(coordinates(map)) and judge which raster.upscale divisor will give a reasonably computable number of pixels. For geographical lat-lon coordinates, an upscale factor of 1 will usually work (you’ll get at most 360 x 180 pixels; you may lower raster.upscale to 0.5 or 0.3 if you need more detailed maps), whereas for a UTM projection (whose coordinates can be much larger values) you may need a raster.upscale of 10000 to get a reasonably computable number of pixels.

plotMapColumns <- function(map, # SpatialPolygons* object
                           columns, # index(es) of the column(s) of map@data containing the values to plot (one output map per column)
                           rasterize = TRUE, # whether to rasterize the map before plotting (usually much faster)
                           raster.upscale = 1, # number by which the range of the coordinates is divided to get the number of pixels (if rasterize = TRUE); e.g. 1 for geographical lat-lon, but ~10000 for UTM coordinates
                           ...) # additional arguments for the (sp)plot function
  {

  stopifnot(raster.upscale > 0 | is.null(raster.upscale),
            require(raster) | rasterize == FALSE
            )
 
  if (!all(columns %in% 1:length(names(map)))) stop ("index out of bounds; 'columns' must exist in map@data.")
 
  if (rasterize) {
    xmin <- min(coordinates(map)[,1])
    xmax <- max(coordinates(map)[,1])
    ymin <- min(coordinates(map)[,2])
    ymax <- max(coordinates(map)[,2])
    wdth <- round(xmax - xmin)
    hght <- round(ymax - ymin)
 
    #if (raster.upscale == "auto") {
      #max.length <- max(wdth, hght)
      #if (max.length > 500) raster.upscale <-
    #}
 
    #if (!is.null(max.rast.dim)) {
    #  rast.dim <- wdth * hght
    #}
 
    wdth <- wdth / raster.upscale
    hght <- hght / raster.upscale
    message("plotting map(s) with ", wdth, "x", hght, " pixels; consider rising 'raster.upscale' if this is taking too long, or lowering it if the resulting maps are too coarse.")
    require(raster)
    rast <- raster(nrows = hght, ncols = wdth, xmn = xmin, xmx = xmax, ymn = ymin, ymx = ymax)
  }  # end if rasterize I
 
  #if (centroids) {
  #  attr.table <- map@data
  #  map <- SpatialPointsDataFrame(coordinates(map))
  #  map@data <- attr.table
  #  rast <- raster(map)
  #} else {
    require(sp)
  #}
 
  n.cols <- length(columns)
  col.count <- 0
  for (i in columns) {
    col.count <- col.count + 1
    message("Plotting column ", col.count, " of ", n.cols, "...")
    if (rasterize) {
      map.rast <- rasterize(x = map, y = rast, field = names(map)[i], fun = 'last')
      plot(map.rast, main = names(map)[i], ...)
    }  # end if rasterize II
    else {
      print(spplot(map, zcol = names(map)[i], main = names(map)[i], ...))
    }  # end else
  }  # end for i
  message("Finished!")
}


Usage example:

# download, unzip and import a map of countries:
download.file("http://biogeo.ucdavis.edu/data/world/countries_shp.zip", destfile = "countries_shp.zip")
unzip("countries_shp.zip")
countries <- rgdal::readOGR(dsn = ".", layer= "countries")
 
# see the data in the attributes table:
head(countries)
names(countries)
 
# use plotMapColumns with and without rasterizing:
plotMapColumns(countries, columns = 17:18, rasterize = TRUE, raster.upscale = 1)
plotMapColumns(countries, columns = 18, rasterize = FALSE)  # slower

You can add arguments for the (sp)plot function, to get e.g. different colour schemes. The plotMapColumns function is not (yet) included in a package.
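
For instance, reusing the countries map imported above (the colour palette is just an illustration of passing graphical arguments through '...'):

# pass a colour palette to the underlying plot function:
plotMapColumns(countries, columns = 17, rasterize = TRUE, raster.upscale = 1,
               col = terrain.colors(20))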

Calculate zonal statistics from rasters in multiple zip files

This is a wrapper for the zonalFromZip function published in the previous post, for when you have multiple zip files with multiple raster files each (as in the WorldClim paleo-climate database), and you want to extract zonal statistics for them all automatically. To use it, you’ll need to have the zonalFromZip function loaded, as well as the raster R package.

zonalsFromZips <- function(zip.files, zones.map, rast.file.ext = ".tif", aux.file.ext = NULL, verbosity = 1, ...) {
  # v2.0 (2018/07/11)
  results <- vector("list", length(zip.files))
  names(results) <- basename(tools::file_path_sans_ext(zip.files))
  for (f in 1:length(zip.files)) {
    message("\nUNZIPPING FILE ", f, " OF ", length(zip.files), " (", basename(zip.files[f]), ")...")
    results[[f]] <- zonalFromZip(zip.file = zip.files[f], zones.map = zones.map, rast.file.ext = rast.file.ext, aux.file.ext = aux.file.ext, verbosity = verbosity, ...)
  }; rm(f)
  message("\nFINISHED ALL!")
  return(results)
}  # end zonalsFromZips function

The result is a list of dataframes, each containing the zonal stats for one of the .zip files of rasters. Usage example:

LGM.zonals <- zonalsFromZips(zip.files = list.files("/home/joe/LGM", full.names = TRUE), zones.map = provinces)
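
Each element of the resulting list can then be inspected or exported separately, e.g.:

# names of the list elements (one per zip file) and a peek at the first one:
names(LGM.zonals)
head(LGM.zonals[[1]])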


 

Calculate zonal statistics from rasters in a zip file

Imagine you have a zipped folder with a bunch of raster maps containing variables (e.g. the ones you can download from WorldClim or from CliMond), and you need to calculate zonal statistics from each of these rasters. The zonalFromZip function, provided below, automates this process without the need to unzip the folder. It extracts one raster at a time from the .zip, imports it to R, calculates zonal statistics for your zones map (with the ‘mean’ function or any other function accepted by the zonal function of the R raster package), and then deletes the unzipped file before unzipping the next one, therefore requiring minimal disk space.

zonalFromZip <- function (zip.file, zones.map, fun, rast.file.ext = ".tif", aux.file.ext = NULL, delete.unzipped = TRUE, verbosity = 2, ...) {
  # version 2.1 (updated 11/12/2018)
  # zip.file: path to the zip containing the raster maps to calculate zonal stats from
  # zones.map: map (in your R workspace) containing the spatial units to calculate zonal stats to; must be of class 'raster', 'SpatialPolygons' or 'SpatialPolygonsDataFrame' and have the same CRS (and resolution if raster) of the maps in 'zip.file'
  # fun: function to calculate (e.g. mean)
  # rast.file.ext: file extension of the raster maps in 'zip.file'
  # aux.file.ext: file extension for the auxiliary files (e.g. ".hdr" for .bil raster files, or ".rdc" for Idrisi .rst files)
  # ...: additional arguments to pass to the 'raster::zonal' (if 'zones.map' is a raster) or the 'raster::extract' function (if 'zones.map' is a SpatialPolygons* map), such as na.rm = TRUE

  require(raster)
  rast.files <- unzip(zip.file, list = TRUE)$Name
  var.names <- unique(tools::file_path_sans_ext(rast.files))
  n.var <- length(var.names)
  zonal.stats <- vector("list", length(var.names))
  names(zonal.stats) <- var.names

  for (i in 1:n.var) {
    if (verbosity >= 1) message("Getting variable ", i, " of ", n.var)
    if (verbosity >= 2) message(" - unzipping file...")
    unzip(zip.file, files = paste0(var.names[i], rast.file.ext))
    if (!is.null(aux.file.ext)) unzip(zip.file, files = paste0(var.names[i], aux.file.ext))
    var.rast <- raster(paste0(var.names[i], rast.file.ext))
    if (!compareRaster(var.rast, zones.map, stopiffalse = FALSE)) {
      if (verbosity >= 2) message(" - cropping to zones raster...")
      var.rast <- crop(var.rast, zones.map)
    }

    if (verbosity >= 2) message(" - calculating zonal stats...")
    if (class(zones.map) %in% c("raster", "RasterLayer"))
      zonal.stats[[i]] <- raster::zonal(var.rast, zones.map, fun = fun, ...)
    else if (class(zones.map) %in% c("SpatialPolygons", "SpatialPolygonsDataFrame"))
      zonal.stats[[i]] <- raster::extract(var.rast, zones.map, fun = fun, df = TRUE, ...)
    else stop("'zones.map' must be of class 'raster', 'RasterLayer', 'SpatialPolygons' or 'SpatialPolygonsDataFrame'")

    if (verbosity >= 2) message(" - deleting unzipped file...")
    if (delete.unzipped) {
      unlink(list.files()[grep(pattern = paste0(var.names[i], rast.file.ext), x = list.files())])
      if (!is.null(aux.file.ext)) unlink(list.files()[grep(pattern = paste0(var.names[i], aux.file.ext), x = list.files())])
    }  # end if delete
  }  # end for i

  if (verbosity >= 1) message("Converting results to data frame...")
  zonal.stats <- as.data.frame(zonal.stats)
  zonal.stats <- subset(zonal.stats, select = c(1, seq(2, ncol(zonal.stats), 2)))
  colnames(zonal.stats)[1] <- "zone"
  colnames(zonal.stats)[-1] <- var.names
  if (verbosity >= 1) message("Finished!")
  return(zonal.stats)
}

Mind that you need the raster R package installed for this, and a map (raster or SpatialPolygons*) of the spatial units (zones) to which you want to summarize the raster variables. Usage examples:

LGM.CCSM4.utm10 <- zonalFromZip(zip.file = "LGM/CCSM4.zip", zones.map = utm10, rast.file.ext = ".tif", aux.file.ext = NULL)
head(LGM.CCSM4.utm10)
 
WClim.utm10 <- zonalFromZip(zip.file = "bio_2-5m_bil.zip", zones.map = utm10, rast.file.ext = ".bil", aux.file.ext = ".hdr")
head(WClim.utm10)

Example for several .zip files within a folder at once (DEPRECATED – see next post’s zonalsFromZips function instead):

for (f in list.files("LGM")) {  
# "LGM" is the folder containing the zip files to extract zonal stats from
  name <- paste("LGM", tools::file_path_sans_ext(f), "utm10", sep = ".")
  zonstat <- zonalFromZip(zip.file = paste("LGM", f, sep = "/"), zones.map = utm10, rast.file.ext = ".tif", fun = "mean")
  assign(name, zonstat)
}

Convert geographical coordinates from degree-minute-second to decimal degrees

If you have latitude and/or longitude data in sexagesimal degrees, in the form degrees° minutes′ seconds″ hemisphere (e.g. 41° 34′ 10.956″ N  or  8° 37′ 47.1036″ W, with or without spaces in between), the dms2dec function can convert them to decimal degrees, which are usually required for mapping. This function is not included in a package, but it is provided in the “Supporting Information” of Zanolla et al. (2018), so please cite this paper if you use the function.

dms2dec <- function(dms, separators = c("º", "°", "\'", "\"")) {
  # version 1.0 (25 Sep 2013)
  # dms: a vector (or column) of latitude or longitude in degrees-minutes-seconds-hemisphere, e.g. 41° 34' 10.956" N (with or without spaces)
  # separators: the characters that are separating degrees, minutes and seconds in dms

  dms <- as.character(dms)
  dms <- gsub(pattern = " ", replacement = "", x = dms)
  for (s in separators) dms <- gsub(pattern = s, replacement = "_splitHere_", x = dms)

  splits <- strsplit(dms, split = "_splitHere_")
  n <- length(dms)
  deg <- min <- sec <- hem <- vector("character", n)

  for (i in 1:n) {
    deg[i] <- splits[[i]][1]
    min[i] <- splits[[i]][2]
    sec[i] <- splits[[i]][3]
    hem[i] <- splits[[i]][4]
  }

  dec <- as.numeric(deg) + (as.numeric(min) / 60) + (as.numeric(sec) / 3600)
  sign <- ifelse (hem %in% c("N", "E"), 1, -1)
  dec <- sign * dec
  return(dec)
}  # end dms2dec function 


Usage example, from a table called mydata:

[Table: 'mydata' example table with Latitude and Longitude columns in degree-minute-second format]

mydata$latitude.decimal <- dms2dec(mydata$Latitude)

mydata$longitude.decimal <- dms2dec(mydata$Longitude)
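
For a quick check without a table, you can also pass literal character values (these are the same example coordinates mentioned above; note the escaped inner quotes):

dms2dec("41° 34' 10.956\" N")   # approximately 41.56971
dms2dec("8° 37' 47.1036\" W")   # approximately -8.629751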

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (2018) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, 54: 12-24

Upscale UTM 10 x 10 km cells to 20, 50 or 100 km

If you have a map and/or a data table of UTM 10 x 10 km cells with the corresponding UTM codes, and want to upscale them to 20 x 20, 50 x 50 or 100 x 100 km cells, you can use the UTM10upscale function to create cell codes that group the UTM 10-km codes accordingly. This function is not included in an R package; it is used as a stand-alone function.

UTM10upscale <- function(UTMcode, size) {
  # version 1.3 (14 May 2013)
  # UTMcode: a character vector of UTM 10x10 km codes (e.g. MC58 or 29SMC58)
  # size: the side length (in km) of the UTM cells for which to create codes; must be 20, 50 or 100
  if (missing(size)) stop ("Argument 'size' is missing, with no default.")
  if (!(size %in% c(20, 50, 100))) stop ("'size' must be a multiple of 10 and a divisor of 100 - i.e. 20, 50 or 100.")
  UTMcode <- as.character(UTMcode)
  nc <- unique(nchar(UTMcode))
  if (length(nc) != 1) stop ("All elements of UTMcode must have the same number of characters.")
  utm10letters <- substr(UTMcode, nc - 4, nc -2)
  if(size == 100)  upscaled.code <- utm10letters
  else {
    utm10x <- substr(UTMcode, nc - 1, nc - 1)  # penultimate digit of UTMcode
    utm10y <- substr(UTMcode, nc, nc)  # last digit of UTMcode
    utm10index <- 0 : 9
    if (size == 20)  upscaled.index <- rep(letters[1 : 5], each = 2)
    else if (size == 50) upscaled.index <- rep(LETTERS[1 : 2], each = 5)
    n <- length(UTMcode)
    upscaled.x <- upscaled.y <- vector("integer", n)
    for (i in 1 : n) {
      x <- which(utm10index == utm10x[i])
      y <- which(utm10index == utm10y[i])
      upscaled.x[i] <- upscaled.index[x]
      upscaled.y[i] <- upscaled.index[y]
    }  # end for i
    upscaled.code <- paste(utm10letters, upscaled.x, upscaled.y, sep = "")
    }  # end if size 100 else
  return(upscaled.code)
}  # end UTM10upscale function


Note that the resulting codes for 20×20 and for 50×50 km cells (namely the last two characters) are not “official” UTM codes, just ones that I made up. I used letters instead of numbers, with lower case for 20-km and upper case for 50-km cells, so that these codes cannot be confused with the UTM 10×10 km codes.

Then you just have to add a column with the new codes to your map’s table of attributes, and use e.g. a spatial R package or a GIS to dissolve the UTM 10 x 10 km cells using the values in this column. You can also use R’s aggregate function to summarize your data into the upscaled cells.

mydata$utm20 <- UTM10upscale(mydata$utm10, size = 20)
mydata$utm50 <- UTM10upscale(mydata$utm10, size = 50)
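
And a quick check with literal (made-up) UTM 10x10 km codes, following the coding scheme described above:

# with size = 50, x/y digits 0-4 map to "A" and 5-9 to "B":
UTM10upscale(c("MC58", "MC59", "MC64"), size = 50)
# "MCBB" "MCBB" "MCBA"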