Create a Task
Tasks allow the same assembler to be run with different combinations of parameters. You can also think of a task as a command bundle that groups different command line flags together into a simpler interface. This guide describes using tasks in your image.
The biobox interface requires that a container be called with a task parameter. The task follows the name of the biobox when using docker run:
docker run [OPTIONS] BIOBOX_NAME TASK
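Inside the container, anything written after the image name on the docker run line reaches the image's entrypoint as ordinary arguments. The following is a minimal sketch of how a script could check that a task was supplied. It assumes the image defines an entrypoint script, and the function name require_task is hypothetical rather than part of the biobox interface.

```shell
#!/bin/sh
# Hypothetical guard for an entrypoint script (an assumption, not part of
# the biobox specification): print the task name if one was given,
# otherwise print a usage message to stderr and return a non-zero status.
require_task() {
  if [ "$#" -lt 1 ] || [ -z "$1" ]; then
    echo "usage: docker run [OPTIONS] BIOBOX_NAME TASK" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# With a task argument the name is echoed back.
require_task default
```

Calling require_task with no argument returns a failure status, so a calling script can stop before attempting to run the software.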
You should use tasks to describe the different ways your software can be run. Using genome assembly as an example, one way of running an assembler might create large contigs that may also contain errors. A separate, more careful way of running the same assembler might produce smaller but more accurate contigs. These different ways of running the same software are the reason tasks are used.
Each biobox should provide a default task, which should be the set of command line flags that works best in most situations. The task name is later passed to your docker run command as the TASK argument.
Example
In this velvet example we'll define two tasks in a file called Taskfile. It is important for the next sections that you place this file in the same directory as the Dockerfile from the preceding section.
The default task contains the commands to run velvet, which refer to the environment variables ${TMP_DIR} and ${READS}.
default: velveth ${TMP_DIR} 31 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff auto
careful: velveth ${TMP_DIR} 91 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff 10
The second task has a larger kmer size and sets a low assembly coverage cutoff.
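To make the layout of the Taskfile concrete, here is a minimal sketch of looking up the command stored for a task name. It assumes every line has the form "name: command" as in the example above. The function name task_command and the temporary file are hypothetical, and this is not necessarily the lookup the biobox tooling itself uses.

```shell
#!/bin/sh
# Hypothetical helper: print the command stored for a task name.
# Assumes each Taskfile line has the form "name: command" and that the
# task name contains no characters that are special in a sed pattern.
task_command() {
  task_name="$1"
  taskfile="$2"
  # Select the line starting with "<name>:" and strip the name prefix.
  sed -n "s/^${task_name}: *//p" "${taskfile}"
}

# Write a throwaway copy of the two-task Taskfile so the sketch runs anywhere.
# The quoted 'EOF' keeps ${TMP_DIR} and ${READS} as literal text.
taskfile="$(mktemp)"
cat > "${taskfile}" <<'EOF'
default: velveth ${TMP_DIR} 31 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff auto
careful: velveth ${TMP_DIR} 91 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff 10
EOF

# Print the command line stored for the "careful" task.
task_command careful "${taskfile}"
```

An unknown task name produces no output, which a calling script can treat as an error before trying to run anything.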
The Taskfile is added to your image with an ADD command in the Dockerfile, which now looks like this:
FROM ubuntu:14.04
MAINTAINER Michael Barton, mail@michaelbarton.me.uk
ENV PACKAGES make gcc wget libc6-dev zlib1g-dev ca-certificates xz-utils
RUN apt-get update -y && apt-get install -y --no-install-recommends ${PACKAGES}
ENV ASSEMBLER_DIR /tmp/assembler
ENV ASSEMBLER_URL https://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
ENV ASSEMBLER_BLD make 'MAXKMERLENGTH=100' && mv velvet* /usr/local/bin/ && rm -r ${ASSEMBLER_DIR}
RUN mkdir ${ASSEMBLER_DIR}
RUN cd ${ASSEMBLER_DIR} && \
    wget --quiet ${ASSEMBLER_URL} --output-document - | \
    tar xzf - --directory . --strip-components=1 && eval ${ASSEMBLER_BLD}
# Locations for biobox file validator
ENV VALIDATOR /bbx/validator/
ENV BASE_URL https://s3-us-west-1.amazonaws.com/bioboxes-tools/validate-biobox-file
ENV VERSION 0.x.y
RUN mkdir -p ${VALIDATOR}
# download the validate-biobox-file binary and extract it to the directory $VALIDATOR
RUN wget \
    --quiet \
    --output-document - \
    ${BASE_URL}/${VERSION}/validate-biobox-file.tar.xz \
    | tar xJf - \
    --directory ${VALIDATOR} \
    --strip-components=1
ENV PATH ${PATH}:${VALIDATOR}
# download the assembler schema
RUN wget \
--output-document /schema.yaml \
https://raw.githubusercontent.com/bioboxes/rfc/master/container/short-read-assembler/input_schema.yaml
ENV CONVERT https://github.com/bronze1man/yaml2json/raw/master/builds/linux_386/yaml2json
# download yaml2json and make it executable
RUN cd /usr/local/bin && wget --quiet ${CONVERT} && chmod 700 yaml2json
ENV JQ http://stedolan.github.io/jq/download/linux64/jq
# download jq and make it executable
RUN cd /usr/local/bin && wget --quiet ${JQ} && chmod 700 jq
# Add Taskfile to /
ADD Taskfile /
In the next section you will see how you can access the task with a simple shell command.