Using a biobox Docker image
Bioboxes aims to make it much simpler for anyone to use the most recent advances in bioinformatics software. This page provides a short example of using a biobox genome assembler. The purpose of this guide is to illustrate how bioboxes work. The same steps apply to any application for which a biobox exists, not only genome assembly.
This tutorial uses real sequencing data so that the example biobox is run the
same way you would run it on your own data. The data is a FASTQ file of
Illumina reads from a real genome that was sequenced at the JGI. This file can
be downloaded using:
$ mkdir input_data
$ wget \
    --output-document input_data/reads.fq.gz \
    'https://www.dropbox.com/s/uxgn6cqngctqv74/reads.fq.gz?dl=1'
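Before going further, you may want to confirm that the download completed. The sketch below is not part of bioboxes. It defines a hypothetical helper, count_fastq_records, which tests the gzip stream and counts reads, assuming the standard four-line FASTQ layout (header, sequence, '+', quality).

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a gzipped FASTQ file and print its read count.
# Assumes every record occupies exactly four lines.
count_fastq_records() {
  local fq="$1" lines
  # gzip -t exits non-zero if the file is truncated or is not gzip data.
  gzip -t "$fq" || return 1
  lines=$(gzip -cd "$fq" | wc -l)
  if [ $((lines % 4)) -ne 0 ]; then
    echo "line count $lines is not a multiple of 4" >&2
    return 1
  fi
  echo $((lines / 4))
}
```

Run from the tutorial directory, count_fastq_records input_data/reads.fq.gz prints the number of reads. A truncated download should make gzip -t fail.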
Create a biobox.yaml file
The inputs to a biobox are specified using a file named 'biobox.yaml'. An example file specifying this data is:
---
version: "0.9.0"
arguments:
  - fastq:
    - id: "test_reads"
      type: "paired"
      value: "/bbx/input/reads.fq.gz"
In this file we specify the current bioboxes version, 0.9.0, along with the
arguments to the biobox. In this case we're giving a single FASTQ file. This
argument has the identifier test_reads, and its type is paired because the file
contains paired reads. The final field, value, specifies the location of the
file. In the biobox.yaml file this is the directory /bbx/input, which is where
you will place the reads in the biobox container.
The biobox.yaml file can be created as follows:
cat << EOF > input_data/biobox.yaml
---
version: "0.9.0"
arguments:
  - fastq:
    - id: "test_reads"
      type: "paired"
      value: "/bbx/input/reads.fq.gz"
EOF
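Because /bbx/input inside the container will correspond to input_data on your computer, every /bbx/input path listed in biobox.yaml should exist under input_data. The hypothetical check_biobox_inputs helper below is a rough text check, not a YAML parser or an official bioboxes validator. It assumes the one-value-per-line layout used in this tutorial and paths without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each /bbx/input/... path named in
# HOST_DIR/biobox.yaml exists as a file under HOST_DIR.
check_biobox_inputs() {
  local host_dir="$1" yaml paths path rel status=0
  yaml="$host_dir/biobox.yaml"
  if [ ! -f "$yaml" ]; then
    echo "missing $yaml" >&2
    return 1
  fi
  # Extract every quoted value: "/bbx/input/..." entry, one per line.
  paths=$(sed -n 's|^[[:space:]]*value:[[:space:]]*"\(/bbx/input/[^"]*\)".*|\1|p' "$yaml")
  if [ -z "$paths" ]; then
    echo "no /bbx/input paths found in $yaml" >&2
    return 1
  fi
  # Unquoted on purpose: splits on whitespace, so paths must not contain spaces.
  for path in $paths; do
    rel="${path#/bbx/input/}"
    if [ ! -f "$host_dir/$rel" ]; then
      echo "missing: $host_dir/$rel (referenced as $path)" >&2
      status=1
    fi
  done
  return "$status"
}
```

With the files from this tutorial in place, check_biobox_inputs input_data should succeed; if reads.fq.gz has not been downloaded yet, it reports the missing file and returns a non-zero status.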
Run the biobox
The input data and biobox.yaml file are all that's required to test a biobox. Run the following command to use the velvet biobox to assemble the test reads:
mkdir -p output_data
docker run \
  --volume="$(pwd)/input_data:/bbx/input:ro" \
  --volume="$(pwd)/output_data:/bbx/output:rw" \
  --rm \
  bioboxes/velvet \
  default
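If you run bioboxes repeatedly, it can help to wrap this command in a shell function that fails early when biobox.yaml is missing, instead of starting a container without its inputs. run_biobox below is a hypothetical sketch; it assumes the input_data and output_data directory layout of this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the docker run command for a biobox.
# Usage: run_biobox IMAGE [TASK]   (TASK defaults to "default")
run_biobox() {
  local image="$1" task="${2:-default}" in_dir out_dir
  in_dir="$(pwd)/input_data"
  out_dir="$(pwd)/output_data"
  # Stop before creating anything if the biobox has no input description.
  if [ ! -f "$in_dir/biobox.yaml" ]; then
    echo "missing $in_dir/biobox.yaml" >&2
    return 1
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  mkdir -p "$out_dir"
  docker run \
    --volume="$in_dir:/bbx/input:ro" \
    --volume="$out_dir:/bbx/output:rw" \
    --rm \
    "$image" \
    "$task"
}
```

With the inputs in place, run_biobox bioboxes/velvet default runs the same docker command as above.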
The docker run command uses the $(pwd) syntax. If you are unfamiliar with it,
the command pwd prints the current working directory, and the construct $(...)
is replaced by the output of the command inside the parentheses. Therefore
$(pwd) is replaced with the path of the directory you are currently in. This is
necessary because the --volume flags require the full (absolute) directory path.
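A small demonstration of command substitution: it changes into a scratch directory created with mktemp (used only for this illustration) and builds the same $(pwd)/input_data path that the docker run command passes to --volume.

```shell
#!/usr/bin/env bash
# Change into a scratch directory created only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# $(pwd) is replaced by the output of pwd, i.e. the current directory.
mount_source="$(pwd)/input_data"
echo "$mount_source"

# pwd always prints an absolute path, so the result starts with '/'.
case "$mount_source" in
  /*) echo "absolute path" ;;
  *)  echo "relative path" ;;
esac
```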
The --volume flag links a directory on your computer to a directory inside the
biobox. In the example, the first flag mounts the directory input_data to
/bbx/input inside the biobox. The ro suffix is an abbreviation of read-only:
data can only be read from the directory, not written to it. You will almost
always want to use ro for your input data to prevent a biobox from accidentally
changing it. The second flag mounts output_data to /bbx/output inside the
biobox. This is the location where the results will be created. The rw suffix
means read-write and allows the biobox to write to this location.
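Each --volume value combines a host path, a container path and a mode. The hypothetical volume_arg function below is a sketch that assembles that value, resolves the host directory to an absolute path, and rejects modes other than ro and rw. Docker itself accepts further mount options, which this sketch deliberately ignores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --volume=HOST:CONTAINER:MODE argument.
# Usage: volume_arg HOST_DIR CONTAINER_DIR ro|rw
volume_arg() {
  local host_dir="$1" container_dir="$2" mode="$3" abs
  case "$mode" in
    ro|rw) ;;
    *) echo "mode must be ro or rw, got '$mode'" >&2; return 1 ;;
  esac
  # Resolve the host directory to an absolute path; it must already exist.
  abs=$(cd "$host_dir" 2>/dev/null && pwd) || {
    echo "no such directory: $host_dir" >&2
    return 1
  }
  printf '%s\n' "--volume=$abs:$container_dir:$mode"
}
```

Run from the tutorial directory, volume_arg input_data /bbx/input ro prints the first --volume argument of the docker run command, with $(pwd) already expanded.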
The --rm flag specifies that the biobox container should be removed after it
has finished running. If you don't specify it, every run leaves a stopped
container behind on your computer, and if enough of them accumulate you may run
out of disk space.
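If you have already run bioboxes without --rm, the stopped containers can be found with the standard docker ps filters ancestor and status. remove_exited_from_image is a hypothetical helper that removes only exited containers created from one image, whereas docker container prune removes every stopped container on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove exited containers created from IMAGE.
# Usage: remove_exited_from_image IMAGE   (e.g. bioboxes/velvet)
remove_exited_from_image() {
  local image="$1" ids
  if [ -z "$image" ]; then
    echo "usage: remove_exited_from_image IMAGE" >&2
    return 2
  fi
  # --quiet prints only container IDs, one per line.
  ids=$(docker ps --all --quiet --filter "ancestor=$image" --filter "status=exited")
  [ -z "$ids" ] && return 0
  # Unquoted on purpose: each ID becomes a separate argument to docker rm.
  docker rm $ids
}
```

For this tutorial, remove_exited_from_image bioboxes/velvet would clean up any velvet containers left over from runs without --rm.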