Background: Whole genome sequencing (WGS) can elucidate Mycobacterium tuberculosis (Mtb) transmission patterns, but more data are needed to guide its use in high-burden settings. In a household-based transmissibility study of 4,000 TB patients in Lima, Peru, we identified a large MIRU-VNTR Mtb cluster with a range of resistance phenotypes and studied host and bacterial factors contributing to its spread.

Methods: WGS was performed on 61 of 148 isolates in the cluster. We compared transmission link inference using epidemiological or genomic data with and without the inclusion of controversial variants, and estimated the dates of emergence of the cluster and antimicrobial drug resistance acquisition events by generating a time-calibrated phylogeny. We validated our findings in genomic data from an outbreak of 325 TB cases in London. Using a larger set of 12,032 public Mtb genomes, we determined bacterial factors characterizing this cluster and under positive selection in other Mtb lineages.

Findings: Four isolates were distantly related and the remaining 57 isolates diverged ca. 1968 (95% HPD: 1945-1985). Isoniazid resistance arose once, whereas rifampicin resistance emerged subsequently at least three times. Amplification of other drug resistance occurred as recently as within the last year of sampling. High-quality PE/PPE variants and indels added information for transmission inference. We identified five cluster-defining SNPs, including esxV S23L, that potentially contribute to transmissibility.

Interpretation: Clusters defined by MIRU-VNTR typing could be circulating for decades in a high-burden setting. WGS allows for an improved understanding of transmission, as well as bacterial resistance and fitness factors.

Funding: The study was funded by the National Institutes of Health (Peru Epi study U19-AI076217 and K01-ES026835 to MRF). The funding sources had no role in any aspect of the study, manuscript or decision to submit it for publication.

Evidence before this study: Whole genome sequencing (WGS) has been shown to resolve tuberculosis (TB) transmission at higher resolution than traditional typing methods in low-burden settings. The implications of its use in high-burden settings are not well understood.

Added value of this study: Using WGS, we found that TB clusters defined by traditional typing methods may be circulating for several decades. Genomic regions typically excluded from WGS analysis contain a large amount of genetic variation that may affect interpretation of transmission events. We also identified five bacterial mutations that may contribute to transmission fitness.

Implications of all the available evidence: The added value of WGS for understanding TB transmission may be even higher in high-burden than in low-burden settings. Methods integrating variants found in polymorphic sites, insertions, and deletions are likely to have higher resolution. Several host and bacterial factors may be responsible for higher transmissibility and could be targets of interventions to interrupt TB transmission in communities.


Complete list of authors: Leonid Lecca, Sergios-Orestis Kolokotronis, Barun Mathema, Maha R. Farhat

Rights Information

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Citation / Publisher Attribution

Scientific Reports, v. 9, art. 5602

Included in

Social Work Commons