Cheap Parallelisation

Table of Contents: What is fd?; Baby steps: finding our files; Constructing execution commands; Simple: counting characters in files; Intermediate: changing file extensions; Advanced: redirecting stdout to a file within the command; Putting it all together: Parallel MSA; Benchmark; Results; Conclusion; Final Remarks

Motivation

I was recently creating a snakemake pipeline and needed to write a rule/process that would perform a multiple sequence alignment (MSA) on 2,582 fasta files. Usually, it is easy to parallelise this kind of task using snakemake. To cut a long story short, in this case using snakemake to parallelise across the files was not feasible. I knew there were ways of doing this kind of thing with tools such as parallel, xargs, and find, but I had never really invested the time to get comfortable with them. This post is an attempt to document that process using one of my favourite CLI tools: fd. We’ll see how fd can be used to execute multiple MSAs (with MAFFT) simultaneously, and benchmark how much faster it is than a conventional “synchronous” approach. ...

June 22, 2020 · Michael Hall

Benchmarking Guppy algorithms

Contents: Methods; Results; Conclusions; Supplementary code

ONT’s basecaller Guppy has recently been released to the masses. With the announcement of the new “flip-flop” basecalling algorithm, there is now a choice of two different algorithms for basecalling. ONT has obviously been singing flip-flop’s praises, and understandably so, as the initial results look like a decent step up in read accuracy. For an upcoming project I am going to be doing a lot of basecalling of Mycobacterium tuberculosis, and given that the project will involve assessing metrics heavily reliant on read accuracy, I thought it best to invest some time in deciding which algorithm to go with. Another reason for my indecision came when I read a recent blog post from Keith Robison, which suggested that the new flip-flop algorithm may not work well with organisms that have a higher GC content. ...

February 1, 2019 · Michael Hall