Use the RFC

The value of bioboxes is that software of the same biobox type can be used interchangeably, because every biobox of that type has the same interface. For example, the velvet assembler biobox can be replaced with the ray assembler biobox without changing anything else. This guide covers the RFC documents, which describe how a biobox interface should behave.

The biobox RFC documents describe how data should be given to a container and what format it should be returned in. A standardised interface means that bioboxes of the same type can be swapped for one another without users having to change their pipelines. The interface is represented by a YAML file, and each biobox type (the assembler, for example) defines its own inputs and outputs.
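
To make the idea of a shared interface concrete, here is a minimal sketch of a helper that runs any assembler biobox through one identical invocation. The image names, the `/bbx/input` and `/bbx/output` mount points, and the `default` task are assumptions based on bioboxes conventions and should be confirmed against the RFC. The `DOCKER` variable and the `run_assembler` function are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: image names, mount points and the "default" task are
# assumptions; confirm them against the assembler RFC before relying on them.

DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the command instead of running it

# Run one assembler image using the shared biobox interface.
# $1 is the image name; nothing else changes between assemblers.
run_assembler() {
  image="$1"
  "$DOCKER" run --rm \
    --volume "$(pwd)/input:/bbx/input:ro" \
    --volume "$(pwd)/output:/bbx/output:rw" \
    "$image" default
}
```

Under these assumptions, `run_assembler bioboxes/velvet` and `run_assembler bioboxes/ray` would differ only in the image name, which is the point of the standardised interface.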

  1. First, check whether your tool works with the types provided in the RFC.

  2. If you cannot implement your biobox with these types, create a new issue. The bioboxes community will help.

  3. Each YAML file that is provided to your container should be checked with our file-validator binary. The command-line interface for this tool is:

    validate-biobox-file --schema=schema.yaml --input=input.yaml
    

    Each RFC links to a schema file that can be used with the validator to ensure that the data the user has provided is valid. A description of the biobox file validator and its downloads are available. By using schema.yaml and the validate-biobox-file binary, you can ensure that the YAML provided to your assembler always follows the types and structure defined in the RFC.

  4. The RFC defines where the input biobox.yaml and data are mounted and where the output biobox.yaml is found (see the section Putting everything together).
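
Steps 3 and 4 can be combined in the container's entry script. The following is a minimal sketch, not a definitive implementation. It uses the `validate-biobox-file` options shown above. The `/schema.yaml` location matches the Dockerfile later in this guide, while the `/bbx/input/biobox.yaml` path is an assumption that should be checked against the RFC. The function name `check_biobox_input` and the `SCHEMA`, `INPUT` and `VALIDATOR_BIN` variables are hypothetical.

```shell
#!/bin/sh
# Sketch of an entry-script check. Paths are assumptions:
# /schema.yaml matches the Dockerfile in this guide;
# /bbx/input/biobox.yaml should be confirmed in the RFC.

SCHEMA="${SCHEMA:-/schema.yaml}"
INPUT="${INPUT:-/bbx/input/biobox.yaml}"
VALIDATOR_BIN="${VALIDATOR_BIN:-validate-biobox-file}"

# Return 0 when the input file passes validation, non-zero otherwise.
check_biobox_input() {
  if "$VALIDATOR_BIN" --schema="$SCHEMA" --input="$INPUT"; then
    return 0
  fi
  echo "biobox.yaml failed validation against $SCHEMA" >&2
  return 1
}
```

An entry script could then call `check_biobox_input || exit 1` before starting the assembler, so that invalid input is rejected before any work is done.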

Example

Let's go through the points specified in this section:

  1. The types in the assembler biobox specification are the fastq and fragment_size types. If your assembler works with only the fastq parameter, you don't have to include the fragment_size parameter. The velvet example works with just the fastq type.

  2. Since the biobox works with the provided types, there is no need to open an issue.

  3. The following Dockerfile instructions download the validate-biobox-file binary.


# Locations for biobox file validator
ENV VALIDATOR /bbx/validator/
ENV BASE_URL https://s3-us-west-1.amazonaws.com/bioboxes-tools/validate-biobox-file
ENV VERSION  0.x.y
RUN mkdir -p ${VALIDATOR}

# download the validate-biobox-file binary and extract it to the directory $VALIDATOR
RUN wget \
      --quiet \
      --output-document -\
      ${BASE_URL}/${VERSION}/validate-biobox-file.tar.xz \
    | tar xJf - \
      --directory ${VALIDATOR} \
      --strip-components=1

ENV PATH ${PATH}:${VALIDATOR}

The next part downloads the schema for the assembler biobox specification.


# download the assembler schema
RUN wget \
    --output-document /schema.yaml \
    https://raw.githubusercontent.com/bioboxes/rfc/master/container/short-read-assembler/input_schema.yaml

Combining the instructions above with the initial Dockerfile from the previous section gives:

FROM ubuntu:14.04
MAINTAINER Michael Barton, mail@michaelbarton.me.uk

ENV PACKAGES make gcc wget libc6-dev zlib1g-dev ca-certificates xz-utils
RUN apt-get update -y && apt-get install -y --no-install-recommends ${PACKAGES}

ENV ASSEMBLER_DIR /tmp/assembler
ENV ASSEMBLER_URL https://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
ENV ASSEMBLER_BLD make 'MAXKMERLENGTH=100' && mv velvet* /usr/local/bin/ && rm -r ${ASSEMBLER_DIR}

RUN mkdir ${ASSEMBLER_DIR}
RUN cd ${ASSEMBLER_DIR} &&\
    wget --quiet ${ASSEMBLER_URL} --output-document - |\
    tar xzf - --directory . --strip-components=1 && eval ${ASSEMBLER_BLD}

# Locations for biobox file validator
ENV VALIDATOR /bbx/validator/
ENV BASE_URL https://s3-us-west-1.amazonaws.com/bioboxes-tools/validate-biobox-file
ENV VERSION  0.x.y
RUN mkdir -p ${VALIDATOR}

# download the validate-biobox-file binary and extract it to the directory $VALIDATOR
RUN wget \
      --quiet \
      --output-document -\
      ${BASE_URL}/${VERSION}/validate-biobox-file.tar.xz \
    | tar xJf - \
      --directory ${VALIDATOR} \
      --strip-components=1

ENV PATH ${PATH}:${VALIDATOR}

# download the assembler schema
RUN wget \
    --output-document /schema.yaml \
    https://raw.githubusercontent.com/bioboxes/rfc/master/container/short-read-assembler/input_schema.yaml

This downloads the biobox file validator and adds it to the $PATH. The corresponding assembler schema is also downloaded. If you run the commands below in the directory containing your Dockerfile and then type validate-biobox-file inside the container, you will see its usage output.

docker build -t velvet .
docker run -it velvet /bin/bash
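
As a further check, the validator in the built image can be run against a sample input file. The sketch below writes an example biobox.yaml that supplies only the fastq argument and then validates it inside the velvet image. The YAML field names, the version string, the read file path and the `/bbx/input` mount point are assumptions based on my reading of the assembler RFC, so verify them against the schema. `DOCKER`, `INPUT_DIR`, `write_example_input` and `validate_in_container` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: the YAML fields, version string and /bbx/input mount point
# are assumptions; check them against the assembler RFC and its schema.

DOCKER="${DOCKER:-docker}"              # set DOCKER=echo for a dry run
INPUT_DIR="${INPUT_DIR:-$(pwd)/input}"  # host directory mounted into the container

# Write an example biobox.yaml that supplies only the fastq argument.
write_example_input() {
  mkdir -p "$INPUT_DIR"
  cat > "$INPUT_DIR/biobox.yaml" <<'EOF'
---
version: "0.9.0"
arguments:
  - fastq:
    - id: "pe"
      value: "/bbx/input/reads.fq.gz"
      type: paired
EOF
}

# Run the validator from the velvet image against the mounted input file.
validate_in_container() {
  "$DOCKER" run --rm \
    --volume "$INPUT_DIR:/bbx/input:ro" \
    velvet \
    validate-biobox-file --schema=/schema.yaml --input=/bbx/input/biobox.yaml
}
```

Calling `write_example_input` followed by `validate_in_container` should show whether the sample file satisfies the schema that was downloaded into the image.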