Bioinformatics. 2015 Jan 15;31(2):268-70.
doi: 10.1093/bioinformatics/btu630. Epub 2014 Sep 30.

A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

Reece K Hart et al. Bioinformatics. 2015.

Abstract

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available, comprehensive programming libraries exist. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics.
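
To make "parsing" and "formatting" concrete, the following is a minimal standard-library sketch of both steps for one narrow case: a single-base substitution with an optional intronic offset. It is not the hgvs package or its API. The names `Substitution` and `parse_substitution` and the regular expression are assumptions introduced here for illustration, and the pattern covers only a small fraction of the HGVS recommendations.

```python
import re
from dataclasses import dataclass

# Illustrative pattern (an assumption, not the hgvs package's grammar).
# It accepts only single-base substitutions such as "NM_182763.2:c.688+403C>T".
_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):"       # versioned accession, e.g. NM_182763.2
    r"(?P<type>[cgnm])\."                # coordinate type, e.g. c or g
    r"(?P<base>\d+)"                     # base position
    r"(?P<offset>[+-]\d+)?"              # optional intronic offset, e.g. +403
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"  # reference base > alternate base
)


@dataclass(frozen=True)
class Substitution:
    """Hypothetical container for the parsed parts of a substitution."""
    ac: str
    type: str
    base: int
    offset: int
    ref: str
    alt: str

    def __str__(self):
        # Formatting is the inverse of parsing: rebuild the variant string.
        offset = f"{self.offset:+d}" if self.offset else ""
        return f"{self.ac}:{self.type}.{self.base}{offset}{self.ref}>{self.alt}"


def parse_substitution(text):
    """Parse a substitution string, raising ValueError if it is unsupported."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a supported substitution: {text!r}")
    return Substitution(
        ac=match["ac"],
        type=match["type"],
        base=int(match["base"]),
        offset=int(match["offset"] or 0),
        ref=match["ref"],
        alt=match["alt"],
    )


variant = parse_substitution("NM_182763.2:c.688+403C>T")
print(variant.ac, variant.type, variant.base, variant.offset, variant.ref, variant.alt)
print(variant)  # formatting by "stringifying" the object
```

Keeping the parsed parts in a structured object, rather than as a raw string, is what allows later steps such as mapping and validation to operate on positions and bases directly.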

Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs).

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Using the hgvs package to project a variant in MCL1 from one transcript to another via GRCh37 chromosome 1. (a) An object representation of the result of parsing ‘NM_182763.2:c.688+403C>T’. Selected attributes are shown beneath. (b) A diagram of the MCL1 locus with five representations of a single variant. (c) Python code that demonstrates parsing, mapping between sequences, formatting and validating. Gray outline boxes enclose input, and the results appear immediately beneath. Circled numbers indicate a correspondence between the variants in (a) and code in (c). An SNV ❶, originally reported in literature as NM_182763.2:c.688+403C>T (rs201430561), is projected onto chromosome 1 as variant ❸, and then projected to an alternative transcript as variant ❹. The inferred protein changes of variants ❶ and ❹ are shown as protein variants ❷ and ❺. The results are formatted by ‘stringifying’ them using standard Python printing commands. Validation for a valid variant (281C>T; no error generated) and an error for an invalid variant (281A>T) are shown.
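
The validation step in the caption (281C>T accepted, 281A>T rejected) suggests a check that the stated reference base agrees with the underlying sequence. The sketch below illustrates that idea only. The function `check_reference_base` and the sequence `toy_sequence` are hypothetical and are not taken from the hgvs package or from any real transcript, and the sketch indexes the sequence directly by position, ignoring the coordinate conversions a real tool would need.

```python
def check_reference_base(sequence, position, ref):
    """Raise ValueError if `ref` differs from the base at 1-based `position`."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    observed = sequence[position - 1]  # convert 1-based position to 0-based index
    if observed != ref:
        raise ValueError(
            f"stated reference {ref!r} at position {position} "
            f"does not match sequence base {observed!r}"
        )


# Toy sequence, NOT real transcript data: constructed so that position 281 is "C".
toy_sequence = "G" * 280 + "C" + "G" * 19

# A variant whose reference base matches the sequence passes silently.
check_reference_base(toy_sequence, 281, "C")

# A variant whose reference base disagrees with the sequence raises an error.
try:
    check_reference_base(toy_sequence, 281, "A")
except ValueError as error:
    print("invalid:", error)
```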
