Searching for shared sequence between Mycobacterium tuberculosis and Homo sapiens

21 June 2023

Motivation
Shared k-mer content
Aligning reads
Summary
References

Motivation

We are in the early stages of planning a Mycobacterium tuberculosis (MTB) analysis pipeline for a research project in Papua New Guinea. We’ll be sequencing sputum samples with Oxford Nanopore Technologies (ONT) devices and have been considering different ways of decontaminating the data, i.e. removing anything that is not MTB. Sputum samples typically contain a lot of host (human) reads, along with reads from a variety of other bacteria, and the MTB component is usually quite small¹.

One step of this pipeline will be uploading sequencing reads to a remote/cloud server, so any reduction in file size will make uploads faster. As human reads are not used in any analysis step, and must be removed before any data is made available anyway, we thought we could simplify things by removing human data as the first step. Our idea was to align reads to the human genome and remove anything that aligns.

However, one concern with this approach is whether any MTB reads could be lost in the process. This effectively boils down to the question: do Mycobacterium tuberculosis and Homo sapiens share genomic sequence? After a literature search, I was unable to find an answer, which seemed quite surprising. My suspicion is that most people just assume they do not. (Or my literature searching skills are poor.) So let’s take a look.
