Example biobox usage
Bioboxes simplify getting and using bioinformatics software. This short guide illustrates this using an example scenario where you would like to assemble some Illumina reads into contigs. This is a common situation for anyone who works in genomics. The purpose of this guide is to illustrate how bioboxes work and this could then be applied for any application for which a biobox exists, not only genome assembly.
This tutorial uses real sequence data so that the example biobox can be run as
as you might do with your own data. The data is available for download
and is a FASTQ file of Illumina reads from a real genome which was sequenced at
the Joint Genome Institute. You can download the data using this link
or on the command line using wget
.
wget \
--output-document reads.fq.gz \
'https://www.dropbox.com/s/uxgn6cqngctqv74/reads.fq.gz?dl=1'
Assuming that you have the biobox CLI installed as described in the
installation instructions. You can use this to run a biobox which
will assemble these reads into longer contigs. Make sure the reads you
downloaded are in a file named reads.fq.gz
in the same directory you are
running the commands.
# Fetch the velvet assembler image using Docker
docker pull bioboxes/velvet
# Use the velvet biobox to assemble these reads
biobox run \
short_read_assembler \
bioboxes/velvet \
--input reads.fq.gz \
--output contigs.fa
You can see this command is specifying the location of the assembly reads using
--input
and the output location for the assembled contigs using --output
.
The advantage provided by bioboxes is that you can try a different biobox
instead of velvet using almost the same command. For example you might try
using megahit.
# Fetch the megahit biobox
docker pull bioboxes/megahit
# Assemble using megahit
biobox run \
short_read_assembler \
bioboxes/megahit \
--input reads.fq.gz \
--output contigs.fa
This examples shows that only the name of the referenced biobox had to be changed to use a different one to assemble the reads. This illustrates the core principles behind bioboxes - that bioinformatics software should be simple to install and equally simple to use.