HiCPotts

A Bayesian framework for detection of enriched Hi‑C interactions and experimental biases in Hi-C data

1  Why HiCPotts?

Hi‑C interaction counts are

sparse (most bin pairs are zero),
over‑dispersed and often zero‑inflated, and
spatially correlated along the genome.

HiCPotts deals with these challenges by combining

Layer	Details
Spatial layer	Potts model (Wu 1982) with interaction parameter (\gamma) to capture neighbourhood dependence.
Count layer	Mixture of three components (“noise”, “signal”, “false‑positive”) modelled with Poisson
Bias regression	Genomic distance, GC content, TE density and chromatin accessibility are covariates in a log‑linear model for the mean.
Inference	Metropolis‑within‑Gibbs MCMC updates betas, (\gamma), zero‑inflation (\theta) and dispersion size (if NB/ZINB).

2  Installation

## released version (once on Bioconductor)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiCPotts")

Requirements: R ≥ 4.2, C++17 compiler, plus
Rcpp, RcppArmadillo, parallel.

library(HiCPotts)

## 1  Load long‑format interaction table
df <- read.csv("hic_interactions.csv")  # start, end, interactions, GC, ACC, TES

## 2  Convert to N×N matrices
prep   <- process_data(df, N = 40, standardization_y = TRUE)
x_vars <- prep$x_vars
y_list <- prep$y                      # list of 40×40 matrices

## 3  Run the three‑component MCMC
res <- run_chain_betas(
  y_sim_list   = y_list,
  N            = 40,
  iterations   = 5000,
  x_vars       = x_vars,
  thetap       = 0.5,        # initial θ
  size_initial = c(1, 1, 1), # initial NB size for the 3 comps
  dist         = "ZINB",
  mc_cores     = 4
)

graph TD
  A[get_data] --> B[process_data]
  B --> C[run_chain_betas]
  C --> D[Posterior summaries + probability maps]

Key exported functions

Function	What it does
`process_data()`	Converts long‑format counts + covariates to lists of $N \times N$ matrices (counts optionally scaled 1–500).
`run_chain_betas()`	Parallel front‑end to `run_metropolis_MCMC_betas()` for multiple matrices; always uses three components.
`run_metropolis_MCMC_betas()`	Core Metropolis‑within‑Gibbs sampler.
`compute_HMRFHiC_probabilities()`	Converts posterior mean betas into per‑pixel probabilities for components 1, 2, 3.
`prior_combined()`, `likelihood_combined()`, `posterior_combined()`	Compute log‑priors, log‑likelihoods and log‑posteriors.
`pred_combined()`	Calculates log‑linear predictors $\lambda$.
`Neighbours_combined()` (C++)	Counts matching neighbours for the Potts model.
`pz_123()` (C++)	Site‑wise component log‑posterior calculation.

mcmc1 <- res[[1]]

## trace of γ
plot(mcmc1$gamma, type = "l", col = "#1f77b4",
     main = "Potts interaction γ", ylab = "γ")

## posterior means of βs for component 1
colMeans(mcmc1$chains[[1]][-(1:2500), ])

pm <- compute_HMRFHiC_probabilities(
  data         = df,
  chain_betas  = mcmc1$chains,
  iterations   = 5000,
  dist         = "ZINB"
)

head(pm[, c("start", "end", "prob1", "prob2", "prob3")])

Advanced options Distribution choice – dist = "Poisson", "NB" or "ZIP".

Fixed vs. data‑driven priors – set use_data_priors = FALSE in the sampler and supply user_fixed_priors.

Proposal tuning – edit the sd_values vectors inside run_metropolis_MCMC_betas().

set.seed(1)
N <- 10
fake <- data.frame(
  start        = rep(1:N, each = N),
  end          = rep(1:N, N),
  interactions = rpois(N*N, 5),
  GC           = runif(N*N),
  ACC          = runif(N*N),
  TES          = runif(N*N)
)

prep   <- process_data(fake, N)
res    <- run_chain_betas(prep$y, N = N, iterations = 100,
                          x_vars = prep$x_vars,
                          thetap = 0.5, size_initial = c(1,1,1),
                          dist   = "Poisson", mc_cores = 1)

Another example is using the test_data2.csv file inside the folder; inst/extdata

mydata=read.csv("~/inst/extdata/test_data2.csv")

N=20
thetap=0.6

iterations=2000


# Assuming mydata is a data frame
colnames(mydata)[colnames(mydata) %in% c("start.i.", "start.i", "start")] <- "start"
colnames(mydata)[colnames(mydata) %in% c("end.j.", "end.j", "end")] <- "end"
colnames(mydata)[colnames(mydata) %in% c("Acc", "ACC")] <- "ACC"
colnames(mydata)[colnames(mydata) %in% c("GC", "Gc")] <- "GC"
colnames(mydata)[colnames(mydata) %in% c("Tes", "TEs", "TES")] <- "TES"
colnames(mydata)[colnames(mydata) %in% c("interactions", "interaction")] <- "interactions"



scaled_data<-process_data(mydata, N, scale_max = 500, standardization_y = TRUE)


chains=run_chain_betas(
  N,
  gamma_prior=0.3,
  iterations=iterations,
  x_vars=scaled_data[["x_vars"]],
  y=scaled_data[["y"]],
  use_data_priors=TRUE,
  dist="Poisson",
  distance_metric="manhattan",
  mc_cores = 22
)

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
R		R
inst		inst
man		man
src		src
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS		NEWS
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

HiCPotts

1  Why HiCPotts?

2  Installation

About

Uh oh!

Releases

Packages

Languages

License

igosungithub/HiCPotts

Folders and files

Latest commit

History

Repository files navigation

HiCPotts

1 Why HiCPotts?

2 Installation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1  Why HiCPotts?

2  Installation

Packages