Putting everything together
The previous guides described building the individual parts of a biobox. This guide puts all these parts together to create a working biobox that you can give to users. This includes parsing the biobox.yaml file.
Now that we have integrated the file-validator and specified the tasks in the previous sections, we will write a script that combines everything.
This script serves as the entrypoint to your container. An entrypoint points to an executable inside your container that is run on every docker run command. Command line arguments that are appended after the image name in docker run are passed on to your entrypoint. That means that the task in the command docker run task is available as the first argument to your entrypoint. You can configure the entrypoint in your Dockerfile with:
ENTRYPOINT ["/path/to/your/script/inside/the/container"]
This script will do the following:
- Check if the provided biobox.yaml is in the correct format using the file-validator.
- Fetch the parameter provided by the input biobox.yaml.
- Run the specified task.
- Generate an output YAML file and return the assembled contigs.
Example
Let's go through the parts of the script. At the end of this section you will find the entire script and an updated Dockerfile.
The first part of the script checks the given /bbx/input/biobox.yaml file with validate-biobox-file:
#!/bin/bash
# exit script if one command fails
set -o errexit
# exit script if Variable is not set
set -o nounset
INPUT=/bbx/input/biobox.yaml
OUTPUT=/bbx/output
METADATA=/bbx/metadata
# Since this script is the entrypoint to your container
# you can access the task in `docker run task` as the first argument
TASK=$1
# Ensure the biobox.yaml file is valid
validate-biobox-file \
  --input ${INPUT} \
  --schema /schema.yaml
# Create the output directory if it does not exist yet
mkdir -p ${OUTPUT}
You can safely reuse this part in your biobox implementation, since all biobox RFCs have to use the validate-biobox-file binary.
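Because the script sets nounset, reading $1 without a task on the command line aborts with a fairly terse shell error. As a minimal sketch, assuming you want a friendlier message, the task could be read through a small helper. The function name get_task and the usage text are hypothetical and not part of the bioboxes tooling:

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Hypothetical helper: print the task name, or fail with a usage hint
# when no task was given.
get_task() {
  # ${1:-} expands to an empty string instead of tripping nounset
  local task="${1:-}"
  if [[ -z "${task}" ]]; then
    echo "Usage: docker run <image> <task>" >&2
    return 1
  fi
  echo "${task}"
}

# In the entrypoint script this would be: TASK=$(get_task "$@")
TASK=$(get_task "default")
echo "Task: ${TASK}"
```

The fixed argument "default" only keeps the sketch runnable on its own; inside the container the arguments of the entrypoint would be passed instead.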
The next part converts the YAML to JSON and uses the jq tool to fetch the paths to the fastq files. jq can slice, filter, map and transform JSON data, and its filters are chained with pipes. In this example we read the following YAML:
---
version: 0.9.0
arguments:
  - fastq:
    - id: "pe"
      value: "/test1/reads.fastq.gz"
      type: paired
    - id: "pe_1"
      value: "/test2/reads.fastq.gz"
Below, we first iterate over the array in the arguments property with .arguments[], keep only the items that have a fastq property with select(has("fastq")), and then access the value entry of each fastq item with the .fastq[].value directive. The last part, "-short \(.)" followed by tr '\n' ' ', prefixes each entry with -short and replaces the newlines with spaces. The -short flag must be specified for the velvet command. The result of the jq command is -short /test1/reads.fastq.gz -short /test2/reads.fastq.gz.
# Parse the read locations from this file
READS=$(yaml2json < ${INPUT} \
  | jq --raw-output '.arguments[] | select(has("fastq")) | .fastq[].value | "-short \(.)"' \
  | tr '\n' ' ')
# Create a temporary directory in /tmp
TMP_DIR=$(mktemp -d)
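To see what the prefix-and-join step produces without needing yaml2json or jq, the following sketch starts from the two paths of the example YAML and uses sed as a stand-in for the jq string interpolation. The variable name PATHS is an assumption made for illustration:

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Stand-in for the newline-separated paths that jq would print
# (values taken from the example YAML).
PATHS=$'/test1/reads.fastq.gz\n/test2/reads.fastq.gz'

# Prefix each path with -short, then join the lines with spaces.
READS=$(printf '%s\n' "${PATHS}" | sed 's/^/-short /' | tr '\n' ' ')

# An unquoted expansion splits READS into separate words,
# one per flag or path, which is what the assembler command needs.
set -- ${READS}
echo "Number of arguments: $#"
printf '[%s]\n' "$@"
```

Note that this relies on word splitting, so paths that contain spaces would be broken into several arguments.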
This part looks up the task provided to the Docker container by using egrep on the Taskfile (see Create a Task):
# Use grep to get $TASK in /Taskfile
CMD=$(egrep ^${TASK}: /Taskfile | cut -f 2 -d ':')
if [[ -z ${CMD} ]]; then
  echo "Abort, no task found for '${TASK}'."
  exit 1
fi
# if /bbx/metadata is mounted create log.txt
if [ -d "$METADATA" ]; then
  CMD="($CMD) >& $METADATA/log.txt"
fi
# Run the given task with eval.
# Eval evaluates a String as if you would use it on a command line.
eval ${CMD}
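The lookup can be tried outside the container with a throwaway Taskfile. The sketch below is a variation on the snippet above, not the script itself: the Taskfile contents and the function name lookup_task are hypothetical, grep -E stands in for egrep, and cut uses -f 2- instead of -f 2 so that commands containing a colon are not truncated.

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Hypothetical Taskfile, written to a temporary file for illustration.
TASKFILE=$(mktemp)
trap 'rm -f "${TASKFILE}"' EXIT
cat << 'EOF' > "${TASKFILE}"
default: echo "running default"
careful: echo "running careful"
fetch: echo "see http://example.org"
EOF

# Print the command stored for a task name; prints nothing for unknown tasks.
lookup_task() {
  local task="$1"
  # -f 2- keeps everything after the first colon, including later colons
  grep -E "^${task}:" "${TASKFILE}" | cut -f 2- -d ':'
}

CMD=$(lookup_task default)
if [[ -z "${CMD}" ]]; then
  echo "Abort, no task found." >&2
  exit 1
fi
# eval runs the string as if it had been typed on the command line
eval "${CMD}"
```

Quoting the grep pattern and the eval argument keeps the shell from splitting or globbing them before they are used.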
The last part copies the contigs to the output directory and creates the output biobox.yaml, which is also specified in the RFC.
cp ${TMP_DIR}/contigs.fa ${OUTPUT}
cat << EOF > ${OUTPUT}/biobox.yaml
version: 0.9.0
arguments:
  - fasta:
    - id: velvet_contigs_1
      value: contigs.fa
      type: contigs
EOF
The final script, which we call assemble, should be placed in the same directory as your Dockerfile and looks like this:
#!/bin/bash
# exit script if one command fails
set -o errexit
# exit script if Variable is not set
set -o nounset
INPUT=/bbx/input/biobox.yaml
OUTPUT=/bbx/output
METADATA=/bbx/metadata
# Since this script is the entrypoint to your container
# you can access the task in `docker run task` as the first argument
TASK=$1
# Ensure the biobox.yaml file is valid
validate-biobox-file \
  --input ${INPUT} \
  --schema /schema.yaml
# Create the output directory if it does not exist yet
mkdir -p ${OUTPUT}
# Parse the read locations from this file
READS=$(yaml2json < ${INPUT} \
  | jq --raw-output '.arguments[] | select(has("fastq")) | .fastq[].value | "-short \(.)"' \
  | tr '\n' ' ')
# Create a temporary directory in /tmp
TMP_DIR=$(mktemp -d)
# Use grep to get $TASK in /Taskfile
CMD=$(egrep ^${TASK}: /Taskfile | cut -f 2 -d ':')
if [[ -z ${CMD} ]]; then
  echo "Abort, no task found for '${TASK}'."
  exit 1
fi
# if /bbx/metadata is mounted create log.txt
if [ -d "$METADATA" ]; then
  CMD="($CMD) >& $METADATA/log.txt"
fi
# Run the given task with eval.
# Eval evaluates a String as if you would use it on a command line.
eval ${CMD}
cp ${TMP_DIR}/contigs.fa ${OUTPUT}
# This command writes yaml into the biobox.yaml until the EOF symbol is reached
cat << EOF > ${OUTPUT}/biobox.yaml
version: 0.9.0
arguments:
  - fasta:
    - id: velvet_contigs_1
      value: contigs.fa
      type: contigs
EOF
The final Dockerfile, which now has additional RUN commands for downloading yaml2json and jq, looks like this:
FROM ubuntu:14.04
MAINTAINER Michael Barton, mail@michaelbarton.me.uk
ENV PACKAGES make gcc wget libc6-dev zlib1g-dev ca-certificates xz-utils
RUN apt-get update -y && apt-get install -y --no-install-recommends ${PACKAGES}
ENV ASSEMBLER_DIR /tmp/assembler
ENV ASSEMBLER_URL https://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
ENV ASSEMBLER_BLD make 'MAXKMERLENGTH=100' && mv velvet* /usr/local/bin/ && rm -r ${ASSEMBLER_DIR}
RUN mkdir ${ASSEMBLER_DIR}
RUN cd ${ASSEMBLER_DIR} &&\
    wget --quiet ${ASSEMBLER_URL} --output-document - |\
    tar xzf - --directory . --strip-components=1 && eval ${ASSEMBLER_BLD}
# Locations for biobox file validator
ENV VALIDATOR /bbx/validator/
ENV BASE_URL https://s3-us-west-1.amazonaws.com/bioboxes-tools/validate-biobox-file
ENV VERSION 0.x.y
RUN mkdir -p ${VALIDATOR}
# download the validate-biobox-file binary and extract it to the directory $VALIDATOR
RUN wget \
      --quiet \
      --output-document - \
      ${BASE_URL}/${VERSION}/validate-biobox-file.tar.xz \
    | tar xJf - \
      --directory ${VALIDATOR} \
      --strip-components=1
ENV PATH ${PATH}:${VALIDATOR}
# download the assembler schema
RUN wget \
      --output-document /schema.yaml \
      https://raw.githubusercontent.com/bioboxes/rfc/master/container/short-read-assembler/input_schema.yaml
ENV CONVERT https://github.com/bronze1man/yaml2json/raw/master/builds/linux_386/yaml2json
# download yaml2json and make it executable
RUN cd /usr/local/bin && wget --quiet ${CONVERT} && chmod 700 yaml2json
ENV JQ http://stedolan.github.io/jq/download/linux64/jq
# download jq and make it executable
RUN cd /usr/local/bin && wget --quiet ${JQ} && chmod 700 jq
# Add Taskfile to /
ADD Taskfile /
# Add the assemble script to the directory /usr/local/bin inside the container.
# /usr/local/bin is part of the $PATH variable, which means that every script
# in that directory can be executed in the shell without providing its path.
ADD assemble /usr/local/bin/
ENTRYPOINT ["assemble"]
Furthermore, the Dockerfile sets the entrypoint to the assemble script so that it is executed on docker run.
If you have followed the examples you should now have the following directory structure:
- /Dockerfile
- /assemble
- /Taskfile
If you now run docker build -t velvet . in the same directory, you should have a biobox that accepts the tasks default and careful.
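Once the image is built, the container expects the input and output directories to be mounted at the paths used in the assemble script. The sketch below only assembles and prints such a docker run command, so it can be inspected without Docker installed. The host directory names are assumptions, and the read files referenced in your biobox.yaml must also be reachable inside the container at the paths given there:

```shell
#!/bin/bash
set -o errexit
set -o nounset

# Hypothetical host directories; adjust them to your own layout.
INPUT_DIR="$(pwd)/input"     # holds biobox.yaml
OUTPUT_DIR="$(pwd)/output"   # receives contigs.fa and the output biobox.yaml

# Build the command as an array so every argument stays intact.
RUN_CMD=(
  docker run
  --volume="${INPUT_DIR}:/bbx/input:ro"
  --volume="${OUTPUT_DIR}:/bbx/output:rw"
  velvet
  default
)

# Print the command instead of executing it.
echo "${RUN_CMD[*]}"
```

To actually start the container, run "${RUN_CMD[@]}" in place of the final echo. The last element is the task and arrives in the assemble script as $1.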