Nat Commun. 2020 Apr 29;11(1):2071.
doi: 10.1038/s41467-020-15848-y.

Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle


Wai Yee Low et al. Nat Commun. 2020.

Abstract

Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation, but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline, which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies, and they show that variant detection is sensitive to the choice of reference genome. Six genes with immune-related functions have additional copies in the indicine lineage compared with the taurine lineage, and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes for better exploring evolutionary and functional variation.


Conflict of interest statement

S.B.K., Z.N.K., and E.T. are employees of Pacific Biosciences. A.R.H., J.L., and A.W.C.P. are employees of BioNano Genomics. J.G. is an employee of Dovetail Genomics. The remaining authors declare no competing interests.

Figures

Fig. 1. An overview of the assembly methods.
Long PacBio reads were binned to their respective haplotypes using parental-specific k-mers, and unassigned reads were discarded. TrioCanu was used to assemble the reads from each haplotype into haplotigs. Each set of haplotigs was scaffolded separately with both Hi-C and optical map data (illustrated only for Angus). Optical map breakpoints were accepted but are imprecise; therefore, breakpoint positions were refined by checking for local drops in short-read coverage and/or breaks in sequence alignment with the alternative haplotig. Hi-C and optical map-based scaffolds were checked for consistency and combined into a single set of scaffolds. Cattle recombination maps were used to validate the assembly: each point on the scatter plot represents a recombination marker, plotted at its actual coordinate on the latest reference genome against its expected position based on the previous reference genome, UMD3.1. Finally, haplotype-specific long reads were used to fill gaps and polish the sequence.
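The read-binning step can be illustrated with a toy sketch of trio binning. It is not TrioCanu's implementation: it assumes each parent is represented by a few error-free reads, uses canonical k-mers, treats k-mers seen in only one parent as that parent's markers, and assigns a long read to whichever parent contributes more marker k-mers, leaving ties unassigned. Real pipelines count parental k-mers from short reads, filter them by frequency to remove sequencing errors, and scale hit counts by the size of each marker set. All function names and the `haplotype_a`/`haplotype_b` labels are hypothetical.

```python
"""Toy trio-binning sketch (hypothetical helper names; not TrioCanu's code)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers


def parent_specific_kmers(parent_a_reads, parent_b_reads, k):
    """K-mers present in one parent's reads but absent from the other parent's reads."""
    a = set().union(*(canonical_kmers(r, k) for r in parent_a_reads))
    b = set().union(*(canonical_kmers(r, k) for r in parent_b_reads))
    return a - b, b - a


def assign_read(read, a_only, b_only, k):
    """Label a child read by which parent's marker k-mers it contains more of."""
    kmers = canonical_kmers(read, k)
    hits_a = len(kmers & a_only)
    hits_b = len(kmers & b_only)
    if hits_a > hits_b:
        return "haplotype_a"
    if hits_b > hits_a:
        return "haplotype_b"
    return "unassigned"


def bin_reads(child_reads, parent_a_reads, parent_b_reads, k=21):
    """Bin child reads by haplotype; unassigned reads are discarded."""
    a_only, b_only = parent_specific_kmers(parent_a_reads, parent_b_reads, k)
    bins = {"haplotype_a": [], "haplotype_b": []}
    for read in child_reads:
        label = assign_read(read, a_only, b_only, k)
        if label != "unassigned":
            bins[label].append(read)
    return bins
```

For example, with parent reads `"AAAAACCCCC"` and `"AAAAAGGGGG"` and k = 4, the child read `"AAACCC"` is binned to `haplotype_a`, its reverse-strand counterpart `"CCCTTT"` of `"AAAGGG"` goes to `haplotype_b`, and `"TTTT"` (whose canonical k-mer is shared by both parents) is discarded. Canonical k-mers matter because long reads come from either strand.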
Fig. 2. Sequence contiguity and resolution of repeats.
a Bar plot of the number of gaps per chromosome in various mammalian assemblies. b Violin plot of LINE/L1, LINE/RTE-BovB, and satellite/centromeric repeat families, filtered for those >2.5 kb.
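Panel a counts assembly gaps. A minimal sketch, assuming gaps are encoded as runs of `N` in a FASTA file (the usual convention, although assemblies differ in how they size gaps) and that each record is one chromosome; the parser and function names are hypothetical. Sequence lines are joined before counting so that a gap split across a line break is counted once.

```python
import re

GAP_RE = re.compile(r"[Nn]+")  # one gap = one maximal run of N/n


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(parts)
            # record name is the first whitespace-delimited token of the header
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if name is not None:
        records[name] = "".join(parts)
    return records


def gaps_per_chromosome(text: str, min_gap: int = 1) -> dict[str, int]:
    """Count runs of N that are at least min_gap bases long in each record."""
    return {
        name: sum(1 for m in GAP_RE.finditer(seq) if len(m.group()) >= min_gap)
        for name, seq in read_fasta(text).items()
    }
```

The `min_gap` parameter is a hypothetical knob for ignoring short ambiguous-base runs that are not true scaffold gaps.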
Fig. 3. Divergence of the FADS2P1 locus between indicine and taurine cattle.
a Dot plot of Brahman chromosome 15 from position 3,748,952 to 5,140,465 against the homologous Angus chromosome from position 78,799,177 to 80,168,904. The Brahman sequence was reverse complemented in the plot. b Maximum-likelihood tree (1000 bootstrap replicates) of FADS2P1 homologous protein sequences. The extra Brahman FADS2P1 copy is marked with an asterisk (*) and its branch is colored red. c Microsynteny plot showing a lack of sequence conservation between indicine and taurine breeds around the indicine-specific FADS2P1 gene. All FADS2P1 genes are colored turquoise, other genes purple, and pseudogenes orange. The upper plot compares Brahman to Angus and the lower plot compares Hereford to Angus. The black track in both panels is the Angus reference. The Brahman FADS2P1 gene Ensembl IDs are ENSBIXG00005007613, ENSBIXG00005021668, and ENSBIXG00005022680, whereas the Angus IDs are ENSBIXG00000018262 and ENSBIXG00000018381. The indicus-specific copy of FADS2P1 is ENSBIXG00005007613. d Mapping of 16 positively selected sites onto the exons of Brahman FADS2P1. Residues marked with double asterisks (**) have Prob(ω > 1) > 0.99 (i.e., highly significant positively selected sites).
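Panel a is a dot plot. The sketch below shows the basic idea with exact k-mer matches; it is an illustration only, not the tool used for the figure (which the caption does not name), and its function names are hypothetical. Same-strand matches fall along a diagonal, whereas matches between a sequence and the reverse complement of its homolog fall along an anti-diagonal. This is why reverse complementing one sequence, as was done for Brahman here, turns an inverted alignment into a forward diagonal.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot_points(x: str, y: str, k: int):
    """Return (forward, reverse) lists of (i, j) dot positions.

    forward: x[i:i+k] == y[j:j+k]
    reverse: revcomp(x[i:i+k]) == y[j:j+k]
    """
    index = defaultdict(list)  # k-mer -> start positions in y
    for j in range(len(y) - k + 1):
        index[y[j:j + k]].append(j)
    forward, reverse = [], []
    for i in range(len(x) - k + 1):
        kmer = x[i:i + k]
        forward.extend((i, j) for j in index.get(kmer, []))
        reverse.extend((i, j) for j in index.get(revcomp(kmer), []))
    return forward, reverse
```

Comparing `"AAACCC"` with itself (k = 3) gives only forward dots on the diagonal (0, 0) to (3, 3); comparing it with its reverse complement `"GGGTTT"` gives only reverse dots on the anti-diagonal (0, 3) to (3, 0).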
Fig. 4. Comparison of structural variants between Brahman and Angus.
a Counts (log10 scale) of six classes of SVs overlapping various annotation types. b Population differentiation in copy number variation, estimated by Vst along each chromosome for the taurine versus indicine comparison using UOA_Brahman_1 as the reference.
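Panel b uses Vst, the copy-number analogue of Fst. A minimal sketch, assuming the common definition Vst = (Vt − Vs) / Vt, where Vt is the copy-number variance across all individuals and Vs is the average within-population variance weighted by population size. Population (rather than sample) variances are an assumption here, since the caption does not state which estimator was used; the function name is hypothetical, and every group must be non-empty.

```python
from statistics import pvariance


def vst(groups: list[list[float]]) -> float:
    """Vst = (Vt - Vs) / Vt for copy-number values split into populations.

    Vt: variance of all values pooled together.
    Vs: within-group variances averaged with weights equal to group sizes.
    Returns 0.0 when there is no copy-number variance at all.
    """
    values = [x for g in groups for x in g]
    v_total = pvariance(values)
    if v_total == 0:
        return 0.0
    v_within = sum(len(g) * pvariance(g) for g in groups) / len(values)
    return (v_total - v_within) / v_total
```

For example, `vst([[2.0, 2.0, 2.0], [4.0, 4.0, 4.0]])` returns 1.0 (all variance lies between the groups), and `vst([[1.0, 3.0], [3.0, 5.0]])` returns 0.5. Under this definition, a cut-off such as Vst > 0.3 keeps loci where more than 30% of the total copy-number variance lies between the taurine and indicine groups.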
Fig. 5. Boxplot of normalized copy number of autosomal genes with Vst > 0.3.
Only CNV genes with a mean copy number difference of at least 1.5 copies between the taurine and indicine groups are shown. Individual values are overlaid as dots on the boxplots, with minima and maxima shown as circles. The box bounds mark the 25th and 75th percentiles, and the median is drawn as a thick line between these two quartiles. The reference genomes were a UOA_Angus_1, b ARS-UCD1.2, and c UOA_Brahman_1. d Liftover of CNV regions from Brahman and Angus to common Hereford ARS-UCD1.2 coordinates to assess their intersection at base-pair resolution.
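Panel d compares lifted-over CNV regions at base-pair resolution. Once both call sets share one coordinate system, their overlap can be computed with a merge-and-sweep over sorted intervals. A minimal sketch, assuming half-open (start, end) coordinates grouped by chromosome, with hypothetical function names; the liftover step itself is not modeled.

```python
def merge(intervals):
    """Sort and merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intersect(a, b):
    """Base-pair intersection of two interval lists on the same chromosome."""
    a, b = merge(a), merge(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever interval ends first; the other may still overlap later ones
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_bp_by_chrom(a, b):
    """Shared base pairs for each chromosome present in both {chrom: intervals} maps."""
    return {
        chrom: sum(e - s for s, e in intersect(a[chrom], b[chrom]))
        for chrom in sorted(a.keys() & b.keys())
    }
```

Merging first ensures that overlapping calls within one set are not double-counted, so the shared total is a true base-pair count.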
Fig. 6. Phasing of Iso-Seq full-length transcripts in seven tissues reveals transcriptional complexity and allelic imbalance.
a Characterization of the hybrid animal's transcripts using SQANTI2 against the Brahman annotation. Full-splice match: perfect match with a reference; incomplete-splice match: missing one or more 5’ exons relative to a reference; novel in catalog: novel combinations of known junctions; novel not in catalog: at least one novel splice site. b Histogram of transcript length distribution. c Overlap of SNPs detected in WGS short reads from genomic DNA, Iso-Seq, and RNA-Seq, with Brahman as the reference genome. d Violin plot of the proportion of Brahman alleles, calculated as the normalized Brahman allele count divided by the sum of the normalized Brahman and Angus allele counts. Transcripts showing allelic imbalance with higher expression of the Brahman allele have values closer to 1, whereas those with higher expression of the Angus allele have values closer to 0. e Tissue-specific allelic expression at the gene level for ARIH2, the most highly expressed Angus gene in the brain. f Tissue-specific allelic expression at the transcript level for ARIH2 in the brain, heart, kidney, liver, lung, muscle, and placenta. A denotes Angus and B denotes Brahman.
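The statistic in panel d can be written directly from its definition. A minimal sketch, assuming the Brahman and Angus allele counts have already been normalized (the caption does not state the normalization, so it is not modeled); function and variable names are hypothetical.

```python
def brahman_allele_proportion(brahman_norm: float, angus_norm: float) -> float:
    """Normalized Brahman count / (normalized Brahman + normalized Angus counts).

    Values near 1 indicate higher Brahman-allele expression, values near 0 higher
    Angus-allele expression, and values near 0.5 balanced expression.
    """
    total = brahman_norm + angus_norm
    if total <= 0:
        raise ValueError("no allele-informative counts for this transcript")
    return brahman_norm / total


def proportions_by_transcript(counts: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Apply the proportion to {transcript_id: (brahman_norm, angus_norm)}."""
    return {tx: brahman_allele_proportion(b, a) for tx, (b, a) in counts.items()}
```

Raising an error for transcripts with no allele-informative counts avoids a silent division by zero; such transcripts would simply be excluded from the violin plot.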

