<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Speed on Microbes made me do it</title><link>https://mbhall88.github.io/tags/speed/</link><description>Recent content in Speed on Microbes made me do it</description><generator>Hugo</generator><language>en-au</language><lastBuildDate>Mon, 22 Jun 2020 00:00:00 +0000</lastBuildDate><atom:link href="https://mbhall88.github.io/tags/speed/index.xml" rel="self" type="application/rss+xml"/><item><title>Cheap Parallelisation</title><link>https://mbhall88.github.io/post/cheap-parallelisation/</link><pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate><guid>https://mbhall88.github.io/post/cheap-parallelisation/</guid><description>&lt;h1 id="table-of-contents"&gt;Table of Contents&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-is-fd"&gt;What is &lt;code&gt;fd&lt;/code&gt;?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#baby-steps-finding-our-files"&gt;Baby steps: finding our files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#constructing-execution-commands"&gt;Constructing execution commands&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#simple-counting-characters-in-files"&gt;Simple: counting characters in files&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#intermediate-changing-file-extensions"&gt;Intermediate: changing file extensions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#advanced-redirecting-stdout-to-a-file-within-the-command"&gt;Advanced: redirecting stdout to a file within the command&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#putting-it-all-together-parallel-msa"&gt;Putting it all together: Parallel MSA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#benchmark"&gt;Benchmark&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#results"&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion"&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#final-remarks"&gt;Final Remarks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id="motivation"&gt;Motivation&lt;/h1&gt;
&lt;p&gt;I was recently creating a &lt;a href="https://snakemake.readthedocs.io/en/stable/"&gt;&lt;code&gt;snakemake&lt;/code&gt;&lt;/a&gt; pipeline and needed to write a
rule/process that would perform a multiple sequence alignment (MSA) on 2,582 fasta
files. Usually, it is easy to parallelise this kind of task using &lt;code&gt;snakemake&lt;/code&gt;. To cut a
long story short, using &lt;code&gt;snakemake&lt;/code&gt; to parallelise across the files was not feasible. I
knew there were ways of doing this kind of thing with tools such as
&lt;a href="https://www.gnu.org/software/parallel/"&gt;&lt;code&gt;parallel&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://www.man7.org/linux/man-pages/man1/xargs.1.html"&gt;&lt;code&gt;xargs&lt;/code&gt;&lt;/a&gt;, and &lt;a href="https://www.gnu.org/software/findutils/"&gt;&lt;code&gt;find&lt;/code&gt;&lt;/a&gt;, but I had never really
invested the time to get comfortable with them. This post is an attempt to document that
process using one of my favourite CLI tools: &lt;a href="https://github.com/sharkdp/fd"&gt;&lt;code&gt;fd&lt;/code&gt;&lt;/a&gt;. We&amp;rsquo;ll see how &lt;code&gt;fd&lt;/code&gt; can be used
to execute multiple MSAs (with MAFFT) simultaneously, and benchmark how much faster it is than
a conventional &amp;ldquo;synchronous&amp;rdquo; approach.&lt;/p&gt;</description></item></channel></rss>