Presence probability, typically obtained with presence-(pseudo)absence modelling methods like GLM, GAM, GBM or Random Forest, is conditional not only on the suitability of the environmental conditions, but also on the general prevalence (proportion of presences) of the species in the study area. So, a species with few presences will generally have low presence probabilities, even in suitable conditions, simply because its presence is indeed rare.
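The dependence of probability on prevalence can be illustrated with a minimal base R sketch. The data here are simulated and purely hypothetical (the variable names `env`, `common` and `rare` are not from any package or data set): two species respond identically to one environmental variable and differ only in how common they are.

```r
# Simulated data (hypothetical): two species with the same response to 'env',
# but different intercepts, i.e. different overall commonness
set.seed(1)
n <- 1000
env <- rnorm(n)
common <- rbinom(n, 1, plogis(0 + 2 * env))   # intercept 0: roughly balanced
rare   <- rbinom(n, 1, plogis(-3 + 2 * env))  # intercept -3: much rarer

mod_common <- glm(common ~ env, family = binomial)
mod_rare   <- glm(rare ~ env, family = binomial)

# In a logistic GLM with an intercept, the mean fitted probability
# matches the sample prevalence
c(prevalence = mean(common), mean_prob = mean(fitted(mod_common)))
c(prevalence = mean(rare), mean_prob = mean(fitted(mod_rare)))

# Predicted probability under the same, clearly suitable conditions (env = 1)
newdata <- data.frame(env = 1)
p_common <- predict(mod_common, newdata, type = "response")
p_rare   <- predict(mod_rare, newdata, type = "response")
c(common = unname(p_common), rare = unname(p_rare))
```

Under these assumptions, the rarer species should get a clearly lower predicted probability at `env = 1`, even though both species respond to the environment in exactly the same way.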
As species distribution modellers often want to remove the effect of unbalanced prevalence from model predictions, a common procedure is to pick only as many (pseudo)absences as there are presences, for a modelled prevalence of 50%, though this may imply a significant loss of data. Alternatively, most modelling functions (e.g. glm() of base R, but also functions that implement GAM, GBM, Random Forest, etc. in a variety of R packages) allow attributing different weights to presences and absences, although they typically do not do this by default. However, some modelling packages that use these functions, like ENMTools and biomod2, use their own defaults and silently apply different weights to presences and absences, to balance their contributions and thus produce prevalence-corrected suitability predictions. Beware: as per these packages’ help files (which users should always read!), ENMTools always does this by default (see e.g. here), while biomod2 does it by default when the pseudoabsences or background points are automatically generated by the package, but not when they are provided by the user (see e.g. here).
A less compromising alternative may be the favourability function (Real et al., 2006; Acevedo & Real, 2012), which removes the effect of species prevalence from predictions of actual presence probability, without the need to restrict the number of (pseudo)absences to the same as the number of presences (i.e. without losing data), and without the need to alter the actual contributions of the data by attributing them different weights. Below is a simple and reproducible comparison in R between favourability, raw probability, and probability based on a model with down-weighted absences so that they balance the number of presences. I’ve used GLM, but this applies to other presence-(pseudo)absence models as well.
library(fuzzySim)
data("rotif.env")
names(rotif.env)
spp_cols <- names(rotif.env)[18:47]
var_cols <- names(rotif.env)[5:17]
nrow(rotif.env) # 291 sites
sort(sapply(rotif.env[ , spp_cols], sum)) # from 99 to 172 presences
sort(sapply(rotif.env[ , spp_cols], fuzzySim::prevalence)) # from 34 to 59%
species <- spp_cols[8] # 8 for example; try with others too
species
npres <- sum(rotif.env[ , species])
nabs <- nrow(rotif.env) - npres
npres
nabs
prevalence(rotif.env[ , species])
# set weights as in weights="equal" of ENMTools::enmtools.glm():
weights <- rep(NA, nrow(rotif.env))
weights[rotif.env[ , species] == 1] <- 1
weights[rotif.env[ , species] == 0] <- npres / nabs
weights
sum(weights[rotif.env[ , species] == 1])
sum(weights[rotif.env[ , species] == 0]) # same
formula <- reformulate(termlabels = var_cols, response = species)
formula
mod <- glm(formula, data = rotif.env, family = binomial)
modw <- glm(formula, data = rotif.env, family = binomial, weights = weights)
pred <- predict(mod, rotif.env, type = "response")
predw <- predict(modw, rotif.env, type = "response")
fav <- Fav(mod) # note favourability is only applicable to unweighted predictions
par(mfrow = c(1, 3))
plot(pred, predw, pch = 20, cex = 0.2) # curve
plot(pred, fav, pch = 20, cex = 0.2) # cleaner curve
plot(fav, predw, pch = 20, cex = 0.2) # ~linear but with noise

par(mfrow = c(1, 1))
plot(pred[order(pred)], pch = 20, cex = 0.5, col = "grey30", ylab = "Prediction")
points(predw[order(pred)], pch = 20, cex = 0.5, col = "blue") # higher, as expected after down-weighting the disproportionately numerous absences
points(fav[order(pred)], pch = 20, cex = 0.5, col = "salmon") # higher like predw, but with less noise (more like the original pred)
legend("topleft", legend = c("probability", "weighted probability", "favourability"), pch = 20, col = c("grey30", "blue", "salmon"), bty = "n")

As you can see, favourability and weighted probability (which serve the same purpose of removing the effect of unbalanced sample prevalence on model predictions) are highly similar. However, favourability does not alter the original data in any way (i.e., it lets the model weigh presences and absences proportionally to the numbers in which they actually occur in the data); and it provides less noisy results that are more aligned with the original (unweighted, non-manipulated) probability.
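One way to see why favourability stays so closely aligned with the original probability is to compute it by hand. The sketch below uses only base R and simulated, hypothetical data (not rotif.env), and applies the favourability formula of Real et al. (2006), which is presumably what Fav() computes internally. On the logit scale, the formula amounts to subtracting the constant log(n1/n0) from the model's linear predictor, so it is a smooth, monotonic transformation of the raw probability.

```r
# Simulated presence-absence data (hypothetical; not the rotif.env data)
set.seed(42)
n <- 500
env <- rnorm(n)
pres <- rbinom(n, 1, plogis(-1.5 + 1.5 * env))  # presences are the minority

mod <- glm(pres ~ env, family = binomial)
p <- fitted(mod)  # raw (unweighted) presence probability

n1 <- sum(pres == 1)  # number of presences
n0 <- sum(pres == 0)  # number of absences

# Favourability as defined by Real et al. (2006):
# F = (P / (1 - P)) / (n1 / n0 + P / (1 - P))
odds <- p / (1 - p)
fav_manual <- odds / (n1 / n0 + odds)

# Equivalent form: a constant shift of the prediction on the logit scale
fav_shift <- plogis(qlogis(p) - log(n1 / n0))
all.equal(fav_manual, fav_shift)

# The transformation is monotonic, so sites keep their ranking
cor(p, fav_manual, method = "spearman")

# Compare the overall levels of probability and favourability
c(mean_prob = mean(p), mean_fav = mean(fav_manual))
```

Because only the intercept is shifted, no site changes its position relative to the others. By the same formula, favourability equals 0.5 exactly where the raw probability equals the sample prevalence, n1 / (n1 + n0).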
I’ve checked this for some species already, but further tests and feedback are welcome!

