Study design:
I have snail count data: one count per sampling date, with sites sampled repeatedly on different dates and sites nested within localities. So each locality contributes counts from several different sites, each sampled on multiple dates.
Goal:
Test whether snail counts differ between localities, and test the influence of environmental factors (e.g. water pH).
Things to account for:
A: About 33% of the sampling dates have a count of zero, which makes me think the data are zero-inflated. See histogram:
[histogram of snail counts]
B: Sites within a locality might differ in their intercepts, e.g. because some sites had a higher initial snail abundance
C: Sampling duration differed (5-33 minutes), which will most likely influence counts
D: The number of sites per locality is unbalanced (one locality with only 1 site, overall range per locality: 1-9)
E: The total number of sampling dates per locality is unbalanced (overall range per locality: 19-35)
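Regarding A: as far as I understand, a large share of zeros does not by itself imply zero inflation, because a Poisson model with a low mean also produces many zeros. A minimal sketch of how I could compare the observed share of zeros with the share a plain Poisson fit would expect is below. The data are simulated stand-ins (all values made up), not my real counts, and the check ignores the site random effect.

```r
# Simulated stand-in data; all names and numbers here are hypothetical.
set.seed(1)
n <- 200
duration <- runif(n, 5, 33)                   # sampling duration in minutes
counts <- rpois(n, lambda = 0.2 * duration)   # Poisson counts scaled by effort
counts[rbinom(n, 1, 0.3) == 1] <- 0           # add extra (structural) zeros

# Plain Poisson fit with sampling duration as an offset
fit <- glm(counts ~ 1 + offset(log(duration)), family = poisson)

# Share of zeros in the data vs. share expected under the fitted Poisson
observed_zero_frac <- mean(counts == 0)
expected_zero_frac <- mean(dpois(0, lambda = fitted(fit)))

print(c(observed = observed_zero_frac, expected = expected_zero_frac))
```

If the observed share is clearly larger than the expected share, that would point towards zero inflation (or overdispersion more generally); if the two are close, the zero-inflation part may not be needed.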
Steps so far:
A: Use glmmTMB to account for the zero inflation
B: Include site as a random intercept, to account for variation in counts between the sites
C: Include sampling duration as an offset, to account for differences in sampling effort
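To make step C concrete, here is a small sketch of what I understand the offset to do: with a log link, offset(log(duration)) fixes the coefficient of log(duration) at 1, so the model describes counts per minute of sampling. The simulated data and the true rate of 0.5 snails per minute are made-up assumptions, and I use base R glm() here only to keep the example self-contained.

```r
# Simulated stand-in data; the rate of 0.5 snails per minute is hypothetical.
set.seed(42)
n <- 500
duration <- runif(n, 5, 33)                        # minutes sampled
true_rate <- 0.5                                   # snails per minute
counts <- rpois(n, lambda = true_rate * duration)  # counts grow with effort

# Intercept-only Poisson model with log(duration) as an offset
fit <- glm(counts ~ 1 + offset(log(duration)), family = poisson)

# Back-transform the intercept to get the estimated rate per minute
rate_hat <- unname(exp(coef(fit)[1]))
print(rate_hat)
```

The offset assumes that counts scale proportionally with sampling duration. If that is doubtful, one alternative would be to include log(duration) as an ordinary covariate and check whether its estimated coefficient is close to 1.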
The current model:
library(glmmTMB)

model <- glmmTMB(
  snail_count ~ (1|site) + locality + pH + locality*pH + offset(log(duration)),
  data = df,
  ziformula = ~1,     # constant zero-inflation probability
  family = poisson
)
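One detail about the fixed-effects part of the formula: as far as I know, R expands locality*pH into both main effects plus their interaction, so writing locality + pH + locality*pH should be redundant but harmless. A quick check using only base R (no data needed, since terms() works on the formula alone):

```r
# Compare the expanded term labels of the long and short formula versions
f_long  <- snail_count ~ locality + pH + locality*pH
f_short <- snail_count ~ locality * pH

labels_long  <- attr(terms(f_long),  "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

print(labels_long)
print(identical(labels_long, labels_short))
```

If both versions give the same term labels, the shorter locality * pH can be used in the model without changing the fit.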
Questions:
- Have I specified the model correctly?
- How do I account for D and E (the unbalanced design)?