Posts on Microbes made me do it

Minimap2 lr:hq preset testing

Wed, 22 Apr 2026 14:16:33 +1000

Evaluating minimap2’s `lr:hq` preset for bacterial nanopore variant calling

Introduction

Oxford Nanopore Technologies (ONT) sequencing accuracy has improved dramatically in recent years. With basecalling models like Dorado v5.2.0 super-accuracy (sup), error rates are consistently hovering around the 1% mark. To match this shift in raw read quality, minimap2[1] introduced the `lr:hq` preset in version 2.27 (March 2024), which is calibrated for long reads with an error rate of <1%.

The preset was prompted by internal benchmarking from ONT developers (see minimap2 issue #1127), who found that `-x map-ont -k19 -w19 -U50,500` maximised both speed and downstream accuracy for high-quality reads. The `lr:hq` preset was added to mirror those options.
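As a minimal sketch of the two invocations described above, the snippet below builds the command line for the `lr:hq` preset and for the explicit `map-ont` options it is said to mirror. The helper name `minimap2_cmd`, the file names `ref.fa` and `reads.fq.gz`, and the thread count are hypothetical placeholders; the snippet only constructs the commands and does not run minimap2.

```python
import shlex


def minimap2_cmd(ref, reads, preset="lr:hq", extra=(), threads=4):
    """Build a minimap2 argument list that writes SAM output (-a).

    `ref` and `reads` are placeholder paths; `extra` holds any options
    that should follow the preset on the command line.
    """
    return ["minimap2", "-a", "-x", preset, *extra, "-t", str(threads), ref, reads]


# Preset form (hypothetical input file names).
hq = minimap2_cmd("ref.fa", "reads.fq.gz")

# Explicit form: the map-ont preset plus the options quoted in the post.
explicit = minimap2_cmd(
    "ref.fa",
    "reads.fq.gz",
    preset="map-ont",
    extra=("-k19", "-w19", "-U50,500"),
)

# Print shell-ready command strings for comparison.
print(shlex.join(hq))
print(shlex.join(explicit))
```

Either list could be handed to `subprocess.run` to compare the resulting alignments, assuming minimap2 2.27 or later is on the `PATH`.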

Searching for shared sequence between Mycobacterium tuberculosis and Homo sapiens

Wed, 21 Jun 2023 00:00:00 +0000

Motivation
Shared k-mer content
Aligning reads
Summary
References

Motivation

We are in the early stages of planning a Mycobacterium tuberculosis (MTB) analysis pipeline for a research project in Papua New Guinea. We’ll be sequencing sputum samples with Oxford Nanopore Technologies (ONT) devices and were thinking of different ways of decontaminating the data - i.e. removing anything non-MTB. Sputum samples typically contain a lot of host (human) reads and reads from a variety of bacteria, while the MTB component is usually quite small¹. One component of this pipeline will be to upload sequencing reads to a remote/cloud server, so any reduction in file size will make uploads faster. As human reads are not used in any analysis steps, and will need to be removed prior to making any data available, we thought we could simplify things by removing human data as the first step. Our idea was to align reads to the human genome and just remove anything that aligns. However, one concern with this approach was whether any MTB reads could be lost in the process. This effectively boils down to the question: do Mycobacterium tuberculosis and Homo sapiens share genomic sequence? After a literature search, I was unable to find an answer, which seemed quite surprising. My suspicion is that most people just assume they do not. (Or my literature searching skills are poor.) So let’s take a look.

Cheap Parallelisation

Mon, 22 Jun 2020 00:00:00 +0000

What is fd?
Baby steps: finding our files
Constructing execution commands
Putting it all together: Parallel MSA
Benchmark
- Results
- Conclusion
Final Remarks

Motivation

I was recently creating a snakemake pipeline and needed to write a rule/process that would perform a multiple sequence alignment (MSA) on 2,582 fasta files. Usually, it is easy to parallelise this kind of task using snakemake. To cut a long story short, using snakemake to parallelise across the files was not feasible in this case. I knew there were ways of doing this kind of thing with tools such as parallel, xargs, and find, but I had never really invested the time to get comfortable with them. This post is an attempt to document that process using one of my favourite CLI tools: fd. We’ll see how fd can be used to execute multiple MSAs (with MAFFT) simultaneously, and benchmark how much faster it is than a conventional “synchronous” approach.

Benchmarking Guppy algorithms

Fri, 01 Feb 2019 00:00:00 +0000

Methods
Results
Conclusions
Supplementary code

ONT’s basecaller Guppy has recently been released to the masses. And with the announcement of the new “flip-flop” basecalling algorithm there is now the choice of two different algorithms for basecalling.

ONT has obviously been singing flip-flop’s praises, and understandably so, as the initial results look like a decent step up in read accuracy.

For an upcoming project I am going to be doing a lot of basecalling of Mycobacterium tuberculosis and, given the project will involve assessing metrics heavily reliant on read accuracy, I thought it best to invest some time in deciding which algorithm to go with. Another reason for my indecision was a recent blog post from Keith Robison, which suggested that the new flip-flop algorithm may not work well with organisms that have a higher GC content.
