<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Benchmark on Microbes made me do it</title><link>https://mbhall88.github.io/tags/benchmark/</link><description>Recent content in Benchmark on Microbes made me do it</description><generator>Hugo</generator><language>en-au</language><lastBuildDate>Mon, 22 Jun 2020 00:00:00 +0000</lastBuildDate><atom:link href="https://mbhall88.github.io/tags/benchmark/index.xml" rel="self" type="application/rss+xml"/><item><title>Cheap Parallelisation</title><link>https://mbhall88.github.io/post/cheap-parallelisation/</link><pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate><guid>https://mbhall88.github.io/post/cheap-parallelisation/</guid><description>&lt;h1 id="table-of-contents"&gt;Table of Contents&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-is-fd"&gt;What is &lt;code&gt;fd&lt;/code&gt;?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#baby-steps-finding-our-files"&gt;Baby steps: finding our files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#constructing-execution-commands"&gt;Constructing execution commands&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#simple-counting-characters-in-files"&gt;Simple: counting characters in files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#intermediate-changing-file-extensions"&gt;Intermediate: changing file extensions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#advanced-redirecting-stdout-to-a-file-within-the-command"&gt;Advanced: redirecting stdout to a file within the command&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#putting-it-all-together-parallel-msa"&gt;Putting it all together: Parallel MSA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#benchmark"&gt;Benchmark&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#results"&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion"&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#final-remarks"&gt;Final Remarks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="motivation"&gt;Motivation&lt;/h1&gt;
&lt;p&gt;I was recently creating a &lt;a href="https://snakemake.readthedocs.io/en/stable/"&gt;&lt;code&gt;snakemake&lt;/code&gt;&lt;/a&gt; pipeline and needed to write a
rule/process that would perform a multiple sequence alignment (MSA) on 2,582 fasta
files. Usually, it is easy to parallelise this kind of task using &lt;code&gt;snakemake&lt;/code&gt;. To cut a
long story short, using &lt;code&gt;snakemake&lt;/code&gt; to parallelise across the files was not feasible. I
knew there were ways of doing this kind of thing with tools such as
&lt;a href="https://www.gnu.org/software/parallel/"&gt;&lt;code&gt;parallel&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html"&gt;&lt;code&gt;xargs&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://www.gnu.org/software/findutils/"&gt;&lt;code&gt;find&lt;/code&gt;&lt;/a&gt;, but I had never really
invested the time to get comfortable with them. This post is an attempt to document that
process using one of my favourite CLI tools: &lt;a href="https://github.com/sharkdp/fd"&gt;&lt;code&gt;fd&lt;/code&gt;&lt;/a&gt;. We&amp;rsquo;ll see how &lt;code&gt;fd&lt;/code&gt; can be used
to execute multiple MSAs (with MAFFT) simultaneously, and benchmark how much faster it is than
a conventional &amp;ldquo;synchronous&amp;rdquo; approach.&lt;/p&gt;</description></item><item><title>Benchmarking Guppy algorithms</title><link>https://mbhall88.github.io/post/benchmark-guppy-algorithms/</link><pubDate>Fri, 01 Feb 2019 00:00:00 +0000</pubDate><guid>https://mbhall88.github.io/post/benchmark-guppy-algorithms/</guid><description>&lt;ul&gt;
&lt;li&gt;Methods&lt;/li&gt;
&lt;li&gt;Results&lt;/li&gt;
&lt;li&gt;Conclusions&lt;/li&gt;
&lt;li&gt;Supplementary code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ONT&amp;rsquo;s basecaller Guppy has recently been released to the masses, and with the announcement of the new &amp;ldquo;&lt;a href="https://community.nanoporetech.com/posts/pre-release-of-stand-alone"&gt;flip-flop&lt;/a&gt;&amp;rdquo; basecalling algorithm there is now a choice between two basecalling algorithms.&lt;/p&gt;
&lt;p&gt;ONT has obviously been singing flip-flop&amp;rsquo;s praises, and understandably so, as the &lt;a href="https://community.nanoporetech.com/posts/pre-release-of-stand-alone"&gt;initial results&lt;/a&gt; look like a decent step up in read accuracy.&lt;/p&gt;
&lt;p&gt;For an upcoming project I am going to be doing &lt;em&gt;a lot&lt;/em&gt; of basecalling of &lt;em&gt;Mycobacterium tuberculosis&lt;/em&gt;, and given that the project will involve assessing metrics heavily reliant on read accuracy, I thought it best to invest some time in deciding which algorithm to go with. Another reason for my indecision came when I read a &lt;a href="https://omicsomics.blogspot.com/2018/12/flappie-vs-albacore-via-counterr.html"&gt;recent blog post from Keith Robison&lt;/a&gt;, which suggested that the new flip-flop algorithm may not work well with organisms that have a higher GC content.&lt;/p&gt;</description></item></channel></rss>