HLA-MA is a method for consistency checking in human high-throughput sequencing (HTS) data analysis. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast and robust matching of samples from whole genome, exome and RNA-seq data.

Our approach uses information from small but genetically highly variable regions and thus complements approaches that rely on genome-wide or exome-wide variant profiles. The software is implemented in Python 3 and freely available under the MIT license on GitHub and via Bioconda.


HLA-MA uses a simple yet effective idea. Given a list of HTS samples (each with single-end or paired-end, DNA-seq or RNA-seq data) and a description of their relation (e.g. being matched as tumor/normal pairs, or their family relation in a pedigree), the HLA types of all samples are inferred. The inferred types are then compared with each other at two-digit (classical serotyping) and four-digit (protein sequence) precision. In matched tumor/normal mode, a full match is expected (i.e. all HLA types should match), whereas in pedigree mode, the inheritance of the HLA genes is checked against the expected Mendelian inheritance rules. HLA-MA then reports the number of mismatches between samples at two-digit and four-digit resolution.
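The comparison step described above can be illustrated with a minimal Python sketch. This is not the HLA-MA source code: the function names (`truncate`, `count_mismatches`, `mendelian_violations`), the allele strings and the assumption that every sample carries exactly two called alleles per gene are hypothetical and chosen for illustration only.

```python
from collections import Counter


def truncate(allele, fields):
    """Cut an allele name such as 'A*02:01' down to the first `fields`
    colon-separated fields: fields=1 gives two-digit resolution ('A*02'),
    fields=2 gives four-digit resolution ('A*02:01')."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def count_mismatches(types_a, types_b, fields):
    """Count alleles of sample A that have no partner in sample B at the
    chosen resolution (multiset difference, so homozygous calls count twice)."""
    a = Counter(truncate(t, fields) for t in types_a)
    b = Counter(truncate(t, fields) for t in types_b)
    return sum((a - b).values())


def by_gene(types, fields):
    """Group truncated alleles by gene name, e.g. {'A': ['A*01', 'A*24']}."""
    grouped = {}
    for t in types:
        t = truncate(t, fields)
        grouped.setdefault(t.split("*")[0], []).append(t)
    return grouped


def mendelian_violations(child, father, mother, fields):
    """Count genes for which the child's two alleles cannot be explained as
    one allele from the father plus one from the mother.
    Assumption (hypothetical): exactly two alleles are called per gene."""
    c, f, m = (by_gene(x, fields) for x in (child, father, mother))
    violations = 0
    for gene, (c1, c2) in c.items():
        fa, mo = f.get(gene, []), m.get(gene, [])
        ok = (c1 in fa and c2 in mo) or (c2 in fa and c1 in mo)
        violations += not ok
    return violations


# Made-up tumor/normal pair that differs only in the second field of one B allele.
tumor = ["A*02:01", "A*24:02", "B*07:02", "B*15:01", "C*03:03", "C*07:02"]
normal = ["A*02:01", "A*24:02", "B*07:02", "B*15:02", "C*03:03", "C*07:02"]
two_digit = count_mismatches(tumor, normal, fields=1)
four_digit = count_mismatches(tumor, normal, fields=2)

# Made-up trio restricted to HLA-A for brevity.
father = ["A*01:01", "A*02:01"]
mother = ["A*03:01", "A*24:02"]
good_child = ["A*01:01", "A*24:02"]
bad_child = ["A*01:01", "A*02:01"]  # both alleles would have to come from the father
```

In this sketch the tumor/normal pair agrees at two-digit resolution but shows one mismatch at four-digit resolution, which is the kind of difference the two reported counts are meant to separate.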

HLA-MA is a Python 3 program that uses Snakemake (Köster and Rahmann, 2012) for the orchestration of its workflow. First, Yara (Siragusa, 2015) is used for prefiltering the reads and then OptiType (Szolek et al., 2014) is used for performing the HLA typing, which internally uses RazerS3 (Weese et al., 2012) for read mapping. OptiType was chosen because it performed best on a selection of datasets compared to competing methods in our benchmarks. The resulting HLA types are then compared in Python code. HLA-MA can also be run in parallel on a compute cluster (e.g. using Grid Engine) through the cluster parallelization support in Snakemake.

Licence and Availability

Last modified: Jan 21, 2021