New features in fuzzySim functions

Just two quick notes on recent improvements to functions in the fuzzySim package:

– Thanks to a bug report by José Carlos Guerrero, the multTSA function now also works when there is only a single species to analyse, rather than multiple species;

– Thanks to a feature request and a bit of assistance from Alba Estrada, the getPreds function now also works if the data argument is a RasterStack rather than a data frame.

Check out the new versions of these functions in the latest version of fuzzySim!

Find pairs of highly correlated variables

Correlations among variables are problematic in multivariate models, and good practice commonly recommends eliminating a priori one variable from each pair of highly correlated ones. The corPairs function below takes a matrix of correlations among variables, plots a cluster dendrogram based on it, and returns the pairs of variables with absolute correlations above a given threshold (0.8 by default):

corPairs <- function(cor.mat, cor.thresh = 0.8, ...) {
  # A.M. Barbosa, version 1.0 (22 Feb 2018)
  dist_matrix <- as.dist(1 - abs(cor.mat))
  plot(hclust(dist_matrix, ...), ylab = "1 - absolute correlation", xlab = "", sub = "")
  abline(h = 1 - cor.thresh, col = "red")
  cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA
  high.cor.mat <- numeric(0)
  if (max(abs(cor.mat), na.rm = TRUE) >= cor.thresh) {
    high.cor.rowcol <- as.matrix(which(abs(cor.mat) >= cor.thresh, arr.ind = TRUE))
    high.cor.inds <- sort(unique(as.vector(high.cor.rowcol)))
    high.cor.mat <- data.frame(var1 = rownames(cor.mat)[high.cor.rowcol[ , "row"]], var2 = colnames(cor.mat)[high.cor.rowcol[ , "col"]])
    high.cor.mat$corr <- NULL
    for (r in 1:nrow(high.cor.mat)) 
      high.cor.mat$corr[r] <- cor.mat[high.cor.rowcol[ ,"row"][r], high.cor.rowcol[ ,"col"][r]]
    high.cor.mat <- high.cor.mat[order(abs(high.cor.mat$corr), decreasing = TRUE), ]
    return (high.cor.mat)
  } # end if max
  else return("No correlations exceed the threshold.")
} # end corPairs function
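
For example, a minimal sketch with made-up data (variable names are purely illustrative):

set.seed(1)
mydata <- data.frame(a = rnorm(100))
mydata$b <- mydata$a + rnorm(100, sd = 0.1)  # strongly correlated with 'a'
mydata$c <- rnorm(100)                       # roughly independent of both
corPairs(cor(mydata))                    # dendrogram + pairs with |r| >= 0.8
corPairs(cor(mydata), cor.thresh = 0.7)  # same, with a different threshold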

This function is closely related to (and will soon be included in) the corSelect function in the fuzzySim package (Barbosa, 2015), but operates on correlation matrices rather than the raw variables, and adds a cluster dendrogram.

 

REFERENCES

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853–858

Potential biodiversity and relative favourability

The two functions below allow calculating potential biodiversity and relative favourability, respectively, according to the procedures described by Real et al. (2017) — which should be cited if you use either of them — and also used by Estrada et al. (2018):

potentialBiodiv <- function(models = NULL,  # a list of binomial glm model objects
                            data = NULL,  # alternatively, data frame with observations and predictions
                            sp.cols = NULL,  # index numbers of the columns with presence-absence data
                            prob.cols = NULL,  # index numbers of the columns with probabilities
                            fav.cols = NULL,  # alternatively, index numbers of columns with favourabilities
                            id.col = NULL,  # optionally, index number of a column with row identifiers
                            zero.action = "increase"  # what to do with zero favourability values when calculating the geometric mean; alternatives are "exclude" or "maintain"
                            ) {
  if (!(zero.action %in% c("increase", "exclude", "maintain"))) stop ("Invalid 'zero.action'.")

  if (!is.null(models)) {
    if (!all(sapply(models, inherits, "glm"))) stop("All 'models' must be of class 'glm'.")
    if (!all(sapply(models, function(m) family(m)$family) == "binomial")) stop("All 'models' must be of family 'binomial'.")
    obs.data <- as.data.frame(lapply(models, `[[`, "y"))
    prob.data <- as.data.frame(lapply(models, `[[`, "fitted.values"))
  } else {
    if (is.null(data) || is.null(sp.cols) || (is.null(prob.cols) && is.null(fav.cols))) stop ("You need to provide either 'models', or 'data' + 'sp.cols' + either 'prob.cols' or 'fav.cols'.")
    obs.data <- data[ , sp.cols, drop = FALSE]
    if (!is.null(prob.cols)) prob.data <- data[ , prob.cols, drop = FALSE]
  }  # end if !null models else

  Nobs <- nrow(obs.data)
  Nsp <- ncol(obs.data)

  if (is.null(fav.cols)) {
    favs <- matrix(NA, ncol = Nsp, nrow = Nobs)
    for (s in 1:Nsp) {
      # favs[ , s] <- fuzzySim::Fav(obs = obs.data[ , s], pred = prob.data[ , s])
      n1 <- sum(obs.data[ , s] == 1, na.rm = TRUE)
      n0 <- sum(obs.data[ , s] == 0, na.rm = TRUE)
      prob <- prob.data[ , s]
      favs[ , s] <- (prob / (1 - prob)) / ((n1 / n0) + (prob / (1 - prob)))
    }; rm(s)
  } else {  # if !null fav.cols
    favs <- data[ , fav.cols, drop = FALSE]
  }  # end if null fav else

  if (zero.action == "increase") favs[favs == 0] <- 2.2e-16  # or .Machine$double.xmin, the smallest computable positive number
  else if (zero.action == "exclude") favs[favs == 0] <- NA

  F_lns <- log(favs, base = exp(1))
  F_ln_sums <- rowSums(F_lns)  # eliminated 'na.rm = na.rm'

  result <- data.frame(SpRich = rowSums(obs.data),
                       Fmin = apply(favs, 1, min),
                       Fmax = apply(favs, 1, max),
                       Fsum = rowSums(favs),
                       Fmean = rowMeans(favs),
                       Fgmean = exp((1 / Nsp) * F_ln_sums))

  if (!is.null(data) && !is.null(id.col))
    result <- data.frame(data[ , id.col, drop = FALSE], result)

  result
}
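
Here is a minimal usage sketch with simulated (made-up) data, just to illustrate the arguments; the column positions and names are purely hypothetical:

set.seed(1)
n <- 200
dat <- data.frame(id = 1:n,
                  sp1 = rbinom(n, 1, 0.3), sp2 = rbinom(n, 1, 0.5), sp3 = rbinom(n, 1, 0.2),  # presence-absence
                  p1 = runif(n), p2 = runif(n), p3 = runif(n))  # modelled probabilities
pot <- potentialBiodiv(data = dat, sp.cols = 2:4, prob.cols = 5:7, id.col = 1)
head(pot)  # observed richness plus several favourability-based summaries per row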

 

relativeFav <- function(target,  # matrix or data frame of favourabilities in target period
                        ref) {   # idem for reference period, with columns in the same order
  stopifnot(ncol(target) == ncol(ref))
  N <- ncol(target)
  fav_ratios <- target / ref
  logs <- log(fav_ratios, base = exp(1))
  log_sums <- rowSums(logs)
  exp((1 / N) * log_sums)
}
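
And a minimal sketch for relativeFav, again with made-up favourability values:

set.seed(1)
fav_ref <- data.frame(sp1 = runif(100), sp2 = runif(100), sp3 = runif(100))  # reference period
fav_tar <- data.frame(sp1 = runif(100), sp2 = runif(100), sp3 = runif(100))  # target period
rel_fav <- relativeFav(target = fav_tar, ref = fav_ref)
head(rel_fav)  # per-row geometric mean of the favourability ratios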

 

REFERENCES

Estrada A., Barbosa A.M., & Real R. (2018) Changes in potential mammal diversity in National Parks and their implications for conservation. Current Zoology, https://doi.org/10.1093/cz/zoy001

Real R., Barbosa A.M., & Bull J.W. (2017) Species Distributions, Quantum Theory, and the Enhancement of Biodiversity Measures. Systematic Biology, 66: 453-462

Trend surface analysis for multiple species

Trend surface analysis is a way to model the spatial structure in species distributions by regressing occurrence data on the spatial coordinates x and y, for a linear trend, and/or on polynomial terms of these coordinates (including the sums of powers and cross-products of the coordinates), for curvilinear trends (Legendre & Legendre, 1998; Borcard et al., 2011). Second- and third-degree polynomials are often used. The multTSA function below, which is included in the fuzzySim package, allows specifying the degree of the spatial polynomial to use. By default, it uses a 3rd-degree polynomial, performs backward stepwise AIC selection of the polynomial terms to include, and returns the presence probability resulting from the spatial trend, although these options can be changed — for example, if you want the result on the scale of the linear predictor (i.e. a new spatial trend variable) rather than as a probability, you can set type = "Y". This function can be used for one or multiple species at a time, in a similar way to the multGLM function in the same package.

 

multTSA <- function(data, sp.cols, coord.cols, id.col = NULL, degree = 3,
                    step = TRUE, criterion = "AIC", type = "P",
                    Favourability = FALSE, suffix = "_TS",
                    save.models = FALSE, ...) {

  # version 2.4 (13 Nov 2018)

  start.time <- Sys.time()
  on.exit(timer(start.time))

  stopifnot(
    na.omit(as.matrix(data[ , sp.cols])) %in% c(0, 1),
    length(sp.cols) > 0,
    length(sp.cols) <= ncol(data) - length(coord.cols) - length(id.col),
    sp.cols %in% 1:ncol(data) | sp.cols %in% colnames(data),
    is.null(coord.cols) | length(coord.cols) == 2,
    is.null(coord.cols) | coord.cols %in% 1:ncol(data) | coord.cols %in% colnames(data),
    is.numeric(as.matrix(data[ , coord.cols])),
    is.null(id.col) | id.col %in% 1:ncol(data) | id.col %in% colnames(data),
    degree %% 1 == 0,
    is.logical(step),
    type %in% c("Y", "P", "F"),
    criterion %in% c("AIC", "significance"),  # new
    is.logical(Favourability),
    is.logical(save.models)
  )

  if (Favourability == TRUE) {
    warning("Argument 'Favourability' is deprecated; internally converted to type = 'F'.")
    type <- "F"
  }

  coords.poly <- as.data.frame(poly(as.matrix(data[ , coord.cols]),
                                    degree = degree, raw = TRUE, simple = TRUE))
  n.poly.terms <- ncol(coords.poly)

  if (is.character(coord.cols)) coord.names <- coord.cols
  else coord.names <- colnames(data)[coord.cols]
  colnames(coords.poly) <- gsub(pattern = "\\.", replacement = "_", x = colnames(coords.poly))
  colnames(coords.poly) <- paste0(coord.names[1], colnames(coords.poly))
  colnames(coords.poly) <- gsub(pattern = "_", replacement = paste0("_", coord.names[2]), x = colnames(coords.poly))

  sp.data <- as.matrix(data[ , sp.cols])
  colnames(sp.data) <- colnames(data[ , sp.cols, drop = FALSE])
  n.subjects <- length(sp.cols)
  if (save.models) TSA.models <- vector("list", n.subjects)
  subj.count <- 0

  data.input <- data
  data <- cbind(data, coords.poly)

  for (s in 1:n.subjects) {
    subj.count <- subj.count + 1
    subj.name <- colnames(sp.data)[s]
    message("Computing TSA ", subj.count, " of ", n.subjects, " (", subj.name, ") ...")
    model.formula <- as.formula(paste(subj.name, "~", paste(colnames(coords.poly), collapse = "+")))
    model.expr <- expression(with(data, glm(model.formula, family = binomial, data = data)))

    if (step) {
      if (criterion == "AIC") model <- step(eval(model.expr), trace = 0)
      else if (criterion == "significance") model <- modelTrim(eval(model.expr), ...)
    }
    else model <- eval(model.expr)

    if (type == "Y") tp <- "link"
    else if (type == "P" | type == "F") tp <- "response"

    pred <- predict(model, coords.poly, type = tp)

    if (type == "F") {
      # n1 <- sum(sp.data[ , s] == 1)
      # n0 <- sum(sp.data[ , s] == 0)
      # pred <- (pred / (1 - pred)) / ((n1 / n0) + (pred / (1 - pred)))
      pred <- Fav(obs = sp.data[ , s], pred = pred)
    }

    data[ , ncol(data) + 1] <- pred
    colnames(data)[ncol(data)] <- paste0(subj.name, suffix)

    if (save.models) {
      TSA.models[[subj.count]] <- model
      names(TSA.models)[[subj.count]] <- subj.name
    }
  }

  predictions <- data.frame(data[ , id.col], data[ , ((ncol(data.input) + 1 + n.poly.terms) : ncol(data)), drop = FALSE])

  if (!is.null(id.col)) {
    if (is.character(id.col)) colnames(predictions)[1] <- id.col
    else colnames(predictions)[1] <- colnames(data)[id.col]
  }

  message("Finished!")

  if (save.models) return(list(predictions = data.frame(predictions), TSA.models = TSA.models))
  else return(predictions)

}
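
A minimal usage sketch with simulated (made-up) data, assuming fuzzySim is installed and loaded (the packaged multTSA relies on internal helpers such as timer and Fav):

library(fuzzySim)
set.seed(1)
n <- 300
dat <- data.frame(id = 1:n, x = runif(n, -10, 10), y = runif(n, 35, 45))
dat$sp1 <- rbinom(n, 1, plogis(0.5 * dat$x - 0.2 * dat$y + 7))  # made-up occurrence data
tsa <- multTSA(dat, sp.cols = "sp1", coord.cols = c("x", "y"), id.col = "id")
head(tsa)  # column 'sp1_TS' holds the spatial-trend probability for each row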

 

REFERENCES

Borcard D., Gillet F. & Legendre P. (2011) Numerical Ecology with R. Springer, New York.

Legendre P. & Legendre L. (1998) Numerical Ecology. Elsevier, Amsterdam.

Recent updates to previous functions

Just a quick note to let you know of changes introduced to some functions in recent versions of their packages:

  • multGLM, FDR and corSelect (package fuzzySim) now also include the BIC (Bayesian Information Criterion, also known as Schwarz criterion, SBC or SBIC) as an option for variable selection, which tends to be more parsimonious, especially for large sample sizes
  • multGLM (package fuzzySim) now allows choosing which FDR correction to apply (if FDR is set to TRUE). It uses "fdr" (equivalent to "BH") by default, but check p.adjust for other available methods, namely "BY", which allows for non-independent data (see the brief illustration after this list)
  • varPart (package modEvA) now does not ask for model.type, includes in the output diagram the proportion of unexplained variance, and allows defining a title (main) for the diagram
  • several modEvA functions now automatically omit NAs, and emit a warning instead of an error if there are pred values outside the [0, 1] interval (which some models can produce in their output)
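
For a quick look at these correction methods with base R's p.adjust (the p-values below are made up):

pvals <- c(0.001, 0.012, 0.030, 0.045, 0.20)
p.adjust(pvals, method = "BH")  # "fdr" is an alias for "BH" (Benjamini-Hochberg), the default
p.adjust(pvals, method = "BY")  # Benjamini-Yekutieli, valid under dependence among tests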

The help files of these functions have also been updated with further details. It is advisable to uninstall fuzzySim and modEvA and install their latest versions.

Range change based on continuous (fuzzy) values

This function, included in the fuzzySim package (version 1.6 just released), uses the fuzzyOverlay function from the previous post and then quantifies overall range change (including gain, loss, maintenance/stability and total change) based on the continuous predictions of two species distribution models (see Gutiérrez-Rodríguez et al., in press).

fuzzyRangeChange <- function(pred1, pred2, number = TRUE, prop = TRUE, na.rm = TRUE, round.digits = 2, measures = c("Gain", "Loss", "Stable_presence", "Stable_absence", "Balance"), plot = TRUE, col = colorRampPalette(c("white", "black"))(length(measures)), ...) {
 
  # version 1.4 (23 Mar 2016)
 
  stopifnot(ncol(pred1) == ncol(pred2),
            all(pred1[is.finite(pred1)] >= 0 & pred1[is.finite(pred1)] <= 1),
            all(pred2[is.finite(pred2)] >= 0 & pred2[is.finite(pred2)] <= 1)
  )
 
  if (!number & !prop) stop ("Nothing to calculate if both 'number' and 'prop' are FALSE.")
 
  values <- vector("numeric", length(measures))
  names(values) <- measures
  if ("Gain" %in% measures)  values["Gain"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "expansion", na.rm = na.rm), na.rm = na.rm)
  if ("Loss" %in% measures)  values["Loss"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "contraction", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_presence" %in% measures)  values["Stable_presence"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_absence" %in% measures)  values["Stable_absence"] <- sum(fuzzyOverlay(1 - data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Balance" %in% measures)  values["Balance"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "change", na.rm = na.rm), na.rm = na.rm)
 
  result <- data.frame(Measure = measures, Number = values)
 
  if (prop) {
    if (na.rm) n <- length(na.omit(pred1))
    else n <- length(pred1)
    range.size <- sum(pred1, na.rm = na.rm)
    stable.abs <- result[result$Measure == "Stable_absence", "Number"]
    result$Proportion <- result[ , "Number"] / range.size
    result[result$Measure == "Stable_absence", "Proportion"] <- stable.abs / (n - range.size)
  }
 
  if (!number) {
    result <- result[ , - (ncol(result) - 1)]
  }
 
  if (plot) {
    barplot(result[ , ncol(result)], legend.text = rownames(result), col = col, ...)
    abline(h = 0)
  }
 
  result[ , "Measure"] <- NULL
  if (is.finite(round.digits))  result <- round(result, round.digits)
 
  result
}
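
A minimal usage sketch with made-up favourability vectors, assuming fuzzySim (>= 1.6) is installed and loaded (fuzzyRangeChange relies on fuzzyOverlay):

library(fuzzySim)
set.seed(1)
fav_now <- runif(500)  # favourability predicted for the current period
fav_fut <- runif(500)  # favourability predicted for a future period
fuzzyRangeChange(fav_now, fav_fut)  # table and bar plot of fuzzy gain, loss, stability and balance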


The function will produce numerical and (since fuzzySim 1.7.2) also graphical results quantifying the overall fuzzy changes:

[Figure: example bar plot produced by fuzzyRangeChange]

You can install and load fuzzySim (>= 1.6) and then check help(fuzzyRangeChange) for further info and reproducible usage examples.

 

REFERENCES

Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography.

Model overlay operations based on fuzzy logic

Logical and set operations are useful for comparative distribution modelling, to assess consensus or mismatch between the predictions of different models, and to quantify differences between models obtained for different time periods. Fuzzy set theory (Zadeh 1965, Barbosa & Real 2012) allows performing such operations without first converting the predictions from continuous to binary, a conversion that entails applying arbitrary thresholds and over-simplifying model predictions. The result is a continuous numerical value quantifying the intersection, union, sum, or other operation among model predictions, whether binary or continuous.

The fuzzyOverlay function, used e.g. by Gutiérrez-Rodríguez et al. (in press) and by Reino et al. (in press) and included in the fuzzySim package (>= 1.6), requires a data frame (or a matrix) containing the model prediction columns to compare, an indication of which columns are to be compared (overlay.cols; by default, all of them), and an op indicating the operation to perform between them:

  • "consensus" for the arithmetic mean of predictions (the fuzzy equivalent of the proportion of models that agree that the species occurs at each site);
  • "fuzzy_and" or "intersection" for fuzzy intersection;
  • "fuzzy_or" or "union" for fuzzy union;
  • "prob_and" or "prob_or" for probabilistic and/or, respectively (see the Details section of the function's help file);
  • "maintenance" for the values where all predictions for the same row (rounded to the number of digits specified in the round.digits argument) are the same.

If data has only two columns to compare, you can also calculate:

  • "xor" for exclusive or;
  • "AnotB" for the occurrence of the species in column 1 to the detriment of that in column 2;
  • "expansion" for the prediction increase in rows where column 2 has higher values than column 1;
  • "contraction" for the prediction decrease in rows where column 2 has lower values than column 1;
  • "change" for a mix of the latter two, with positive values where there was an increase and negative values where there was a decrease in favourability from column 1 to column 2.

fuzzyOverlay <- function(data, 
                         overlay.cols = 1:ncol(data),
                         op = "intersection",
                         na.rm = FALSE,
                         round.digits = 2
                         ) {
 
  # version 1.2 (13 Nov 2015)
 
  data <- data[ , overlay.cols]
  stopifnot(all(as.matrix(data) >= 0 & as.matrix(data) <= 1, na.rm = TRUE))
 
  if (op == "consensus") rowSums(data, na.rm = na.rm) / ncol(data)
  else if (op %in% c("fuzzy_and", "intersection")) apply(data, MARGIN = 1, FUN = min, na.rm = na.rm)
  else if (op == "prob_and") apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op %in% c("fuzzy_or", "union")) apply(data, MARGIN = 1, FUN = max, na.rm = na.rm)
  else if (op == "prob_or") apply(data, MARGIN = 1, FUN = sum, na.rm = na.rm) - apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op == "maintenance") ifelse(round(data[ , 2], digits = round.digits) == round(data[ , 1], digits = round.digits), round(data[ , 1], digits = round.digits), 0)
 
  else if (op %in% c("xor", "AnotB", "expansion", "contraction", "change")) {
    if (ncol(data) != 2) stop ("This 'op' works only for 'data' with 2 columns.")
    if (op == "xor") pmax(pmin(data[,1], 1 - data[,2], na.rm = na.rm), pmin(1 - data[,1], data[,2], na.rm = na.rm), na.rm = na.rm)  # http://www.dmitry-kazakov.de/ada/fuzzy.htm#Fuzzy
    else if (op == "AnotB") pmin(data[,1], 1 - data[,2], na.rm = na.rm)
    else if (op == "expansion") ifelse(data[ , 2] > data[ , 1], 
                                       data[ , 2] - data[ , 1], 
                                       0)
    else if (op == "contraction") ifelse(data[ , 2] < data[ , 1], 
                                         data[ , 2] - data[ , 1], 
                                         0)
    else if (op == "change") data[ , 2] - data[ , 1]
  }
  else stop ("Invalid 'op' name.")
}
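
A minimal usage sketch with made-up predictions, just to illustrate a few of the available operations:

set.seed(1)
preds <- data.frame(mod1 = runif(100), mod2 = runif(100))
fuzzyOverlay(preds, op = "intersection")  # fuzzy intersection (row-wise minimum)
fuzzyOverlay(preds, op = "union")         # fuzzy union (row-wise maximum)
fuzzyOverlay(preds, op = "change")        # mod2 minus mod1 at each locality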


You can install and load fuzzySim (>= 1.6) and then check help(fuzzyOverlay) for further info and reproducible usage examples.

REFERENCES

Barbosa A.M. & Real R. (2012) Applying fuzzy logic to comparative distribution modelling: a case study with two sympatric amphibians. The Scientific World Journal, 2012, Article ID 428206

Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography

Reino L., Ferreira M., Martínez-Solano I., Segurado P., Xu C. & Barbosa A.M. (in press) Favourable areas for co-occurrence of parapatric species: niche conservatism and niche divergence in Iberian tree frogs and midwife toads. Journal of Biogeography

Zadeh, L.A. (1965) Fuzzy sets. Information and Control, 8: 338-353

Assess model overlap with Schoener’s D, Hellinger distance and Warren’s I

Metrics for quantifying the similarity among ecological niche models are important for testing patterns of niche evolution. The modOverlap function, used e.g. by Reino et al. (in press) and now included in the fuzzySim package (>= 1.5), calculates three such metrics: Schoener's D statistic for niche overlap, the Hellinger distance between probability distributions, and the I similarity statistic of Warren et al. (2008).

These formulas are also implemented e.g. within the niche.overlap function of the R package phyloclim, but there they require input data in complex, software-specific formats. The function below calculates these indices simply from two vectors or columns containing the predictions of the two models to compare.

modOverlap <- function (pred1, pred2, na.rm = TRUE) 
{
    p1 <- pred1/sum(pred1, na.rm = na.rm)
    p2 <- pred2/sum(pred2, na.rm = na.rm)
    SchoenerD <- 1 - 0.5 * sum(abs(p1 - p2), na.rm = na.rm)
    HellingerDist <- sqrt(sum((sqrt(p1) - sqrt(p2))^2, na.rm = na.rm))
    WarrenI <- 1 - ((HellingerDist^2)/2)
    list(SchoenerD = SchoenerD, WarrenI = WarrenI, HellingerDist = HellingerDist)
}
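
A minimal usage sketch with made-up predictions for the two models to compare:

set.seed(1)
pred1 <- runif(1000)
pred2 <- runif(1000)
modOverlap(pred1, pred2)  # list with Schoener's D, Warren's I and the Hellinger distance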


See also the fuzSim function in the fuzzySim package, which calculates fuzzy versions of classic binary similarity indices such as Jaccard, Baroni-Urbani & Buser, Sorensen and Simpson (Barbosa 2015).

 

REFERENCES

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853-858

Reino L., Ferreira M., Martínez-Solano I., Segurado P., Xu C. & Barbosa A.M. (in press) Favourable areas for co-occurrence of parapatric species: niche conservatism and niche divergence in Iberian tree frogs and midwife toads. Journal of Biogeography

Warren D.L., Glor R.E. & Turelli, M. (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-83 [plus ERRATUM]

Select among correlated variables based on a given criterion

Correlations among variables are problematic in multivariate models, as they inflate the variance of coefficients and thus may bias the interpretation of the effects of those variables on the response. One strategy to circumvent this problem is to eliminate a priori one variable from each pair of correlated variables, but it is not always straightforward to pick the right one. The corSelect function selects among correlated variables based either on their (stepwise) variance inflation factor (VIF), or on their relationship with the response variable: it builds a bivariate model of each individual variable against the response and chooses, from each pair of correlated variables, the one with the best p-value, AIC or BIC.

This function is now included in fuzzySim and depends on other functions in the package, such as FDR and multicol. It is also included as a new option within the multGLM function in the same package, so that multiple models can be built based on uncorrelated variables chosen appropriately for each species.

corSelect <- function(data, sp.cols = NULL, var.cols, cor.thresh = 0.8, select = "p.value", ...) {

  # version 1.6 (6 Jul 2017)

  if (length(sp.cols) > 1) stop ("Sorry, 'corSelect' is currently implemented for only one 'sp.col' at a time.")

  univar.criteria <- c("VIF")
  bivar.criteria <- c("p.value", "AIC", "BIC")

  if (!(select %in% c(univar.criteria, bivar.criteria))) stop ("Invalid 'select' criterion.")

  if (!is.null(sp.cols) & select %in% bivar.criteria) {
    n.in <- nrow(data)
    data <- data[is.finite(data[ , sp.cols]), ]
    n.out <- nrow(data)
    if (n.out < n.in) warning (n.in - n.out, " observations removed due to missing data in 'sp.cols'; ", n.out, " observations actually evaluated.")
  }

  cor.mat <- cor(data[ , var.cols], ...)
  cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA
  high.cor.mat <- bivar.mat <- numeric(0)

  if (max(abs(cor.mat), na.rm = TRUE) >= cor.thresh) {
    high.cor.rowcol <- as.matrix(which(abs(cor.mat) >= cor.thresh, arr.ind = TRUE))
    high.cor.inds <- sort(unique(as.vector(high.cor.rowcol)))
    high.cor.mat <- data.frame(var1 = rownames(cor.mat)[high.cor.rowcol[ , "row"]], var2 = colnames(cor.mat)[high.cor.rowcol[ , "col"]])
    high.cor.mat$corr <- NULL
    for (r in 1:nrow(high.cor.mat)) high.cor.mat$corr[r] <- cor.mat[high.cor.rowcol[ , "row"][r], high.cor.rowcol[ , "col"][r]]
    high.cor.mat <- high.cor.mat[order(abs(high.cor.mat$corr), decreasing = TRUE), ]

    if (is.null(sp.cols) & select %in% bivar.criteria) {
      message("Bivariate relationships not assessable without a response variable ('sp.cols'). Returning high pairwise correlations among 'var.cols'.")
      return (high.cor.mat)
    }

    high.cor.vars <- unique(rownames(cor.mat[high.cor.inds, high.cor.inds]))

    if (select %in% bivar.criteria) {
      bivar.mat <- FDR(data = data, sp.cols = sp.cols, var.cols = match(high.cor.vars, colnames(data)), simplif = TRUE)[ , c("p.value", "AIC", "BIC")]
      if (identical(order(bivar.mat[ , "p.value"]), order(bivar.mat[ , "AIC"])) && identical(order(bivar.mat[ , "AIC"]), order(bivar.mat[ , "BIC"]))) message("Results are identical whether p-value, AIC or BIC is used to select variables.\n")
      else message("Results are NOT identical depending on whether p-value, AIC or BIC is used to select variables.\n")
    }  # end if select in bivar

    data.remaining <- data[ , var.cols]
    while (max(abs(cor.mat), na.rm = TRUE) >= cor.thresh) {
      max.cor.ind <- which(abs(cor.mat) == max(abs(cor.mat), na.rm = TRUE), arr.ind = TRUE)
      v1 <- rownames(cor.mat)[max.cor.ind[1]]
      v2 <- colnames(cor.mat)[max.cor.ind[2]]
      if (select %in% univar.criteria) {
        vif <- multicol(data.remaining)
        targets <- vif[c(v1, v2), "VIF", drop = FALSE]
      }  # end if univar
      else if (select %in% bivar.criteria) {
        targets <- bivar.mat[c(v1, v2), select, drop = FALSE]
      }  # end if bivar

      exclude <- which.max(as.matrix(targets))
      excl.name <- rownames(targets)[exclude]
      excl.ind <- match(excl.name, colnames(cor.mat))
      data.remaining <- data.remaining[ , -excl.ind, drop = FALSE]
      cor.mat <- cor.mat[-excl.ind, -excl.ind, drop = FALSE]
      if (all(is.na(cor.mat))) cor.mat <- numeric(0)
      if (length(cor.mat) == 0) break
    }  # end while max

  }  # end if max > thresh

  selected.vars <- colnames(cor.mat)
  selected.var.cols <- match(colnames(cor.mat), colnames(data))
  excluded.vars <- colnames(data)[var.cols][!(colnames(data)[var.cols] %in% colnames(cor.mat))]

  vif2 <- multicol(data[ , selected.var.cols])

  list(high.correlations = high.cor.mat,
       bivariate.significance = bivar.mat,
       excluded.vars = excluded.vars,
       selected.vars = selected.vars,
       selected.var.cols = selected.var.cols,
       strongest.remaining.corr = cor.mat[which.max(abs(cor.mat))],
       remaining.multicollinearity = vif2
       )
} # end corSelect function
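
A minimal usage sketch with simulated (made-up) data, assuming fuzzySim (>= 1.5) is installed and loaded (corSelect relies on its FDR and multicol functions):

library(fuzzySim)
set.seed(1)
n <- 200
vars <- data.frame(temp = rnorm(n))
vars$temp_max <- vars$temp + rnorm(n, sd = 0.2)  # strongly correlated with 'temp'
vars$precip <- rnorm(n)                          # roughly independent
dat <- data.frame(sp = rbinom(n, 1, plogis(vars$temp)), vars)  # made-up presence-absence
corSelect(dat, sp.cols = 1, var.cols = 2:4, cor.thresh = 0.8, select = "AIC")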

 

You can install and load fuzzySim (>= 1.5) and then check help(corSelect) for reproducible usage examples.