Gito is a lightweight and safe Docker image based on Alpine Linux that contains ready-to-use bioinformatics tools. It is an
open source project that provides an easy, modular way to build, distribute and replicate pipelines. Every container is
built on a shared base image that forms its foundation layer. The base image is only 4.41 MB in size and contains only the
OS libraries and language dependencies required to run the tools, yet it is robust enough to run bioinformatics pipelines
securely.
Gito was conceived with the "start with the minimum and add dependencies as needed" philosophy in mind: we encourage a
separate container for each bioinformatics tool, configured with the minimum it needs to run. Gito uses musl libc as its
standard C library. musl is free, lightweight, security-oriented and smaller than glibc. The source code of the
bioinformatics tools, written in C, C++ and Java, has been compiled as native Alpine executables.
To download and run this Docker image, you first need to set up Docker on your machine.
The easiest way to start with Docker is to install Docker Desktop (https://www.docker.com/products/docker-desktop) by
downloading and running the installer, which is available for both macOS and Windows. Linux users should follow the
instructions here and, to use the CLI without sudo, the post-installation steps.
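Before pulling any image, it can help to confirm that the Docker CLI is installed and that the current user can reach the daemon. The function below is a minimal sketch: `check_docker` is a hypothetical helper name, not part of Gito, and it relies only on the standard `docker info` command.

```shell
# check_docker: hypothetical helper (not part of Gito).
# Returns 0 when the docker CLI is found and the daemon answers for the
# current user, 1 otherwise.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found; install Docker first" >&2
        return 1
    fi
    # `docker info` fails when the daemon is down or the user lacks permission.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed but the daemon is not reachable by this user" >&2
        return 1
    fi
    echo "docker is ready"
}
```

Calling `check_docker` in a fresh terminal after the post-installation steps is a quick way to see whether sudo is still required.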
The images can be downloaded and run through the Docker CLI with the following commands:
1. Pull (download) the Docker images:
$ docker pull gitobioinformatics/fastqc
$ docker pull gitobioinformatics/trimmomatic
$ docker pull gitobioinformatics/trinity
$ docker pull gitobioinformatics/bowtie2
2. Change the working directory to a project and run the following commands:
$ docker run -u $(id -u):$(id -g) -v "$PWD":/data \
    -w /data --rm gitobioinformatics/fastqc
$ docker run -u $(id -u):$(id -g) -v "$PWD":/data \
    -w /data --rm gitobioinformatics/trinity
$ docker run -u $(id -u):$(id -g) -v "$PWD":/data \
    -w /data --rm gitobioinformatics/trimmomatic
$ docker run -u $(id -u):$(id -g) -v "$PWD":/data \
    -w /data --rm gitobioinformatics/bowtie2
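The pull and run steps above can be wrapped in small shell functions. The sketch below is a convenience under stated assumptions, not something shipped with Gito: `gito_pull` and `gito_run` are hypothetical names, and passing extra arguments after the image name assumes that each image's entrypoint accepts the tool's usual command-line options, which this README does not confirm.

```shell
# Hypothetical helpers; not part of Gito.

# gito_pull TOOL...: pull one or more images from the gitobioinformatics
# namespace, stopping at the first failure.
gito_pull() {
    for tool in "$@"; do
        docker pull "gitobioinformatics/$tool" || return 1
    done
}

# gito_run TOOL [ARGS...]: run TOOL as the current user with the current
# directory mounted at /data, removing the container when it exits.
# ARGS are passed through unchanged (assumption: the image accepts them).
gito_run() {
    tool="$1"
    shift
    docker run -u "$(id -u):$(id -g)" -v "$PWD:/data" \
        -w /data --rm "gitobioinformatics/$tool" "$@"
}
```

With these defined, `gito_pull fastqc trimmomatic trinity bowtie2` followed by `gito_run fastqc` from a project directory mirrors the two steps above. Running as the current user keeps files written to `/data` owned by that user rather than by root.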
You can also download and deploy these Docker images on a cloud provider such as DigitalOcean, Amazon Web Services,
HP Enterprise, IBM, Microsoft Azure Cloud or others.
Gito was used to reproduce the pipeline presented in the work of Hernández-Fernández (2017), which is the first published
de novo transcriptome assembly of Eretmochelys imbricata. The instructions used and the execution results can be accessed here.
To build Gito images from source, you can use the following process:
1. Install the latest version of Docker:
Linux users can follow the instructions here to manage Docker as a non-root user. Otherwise, run the gitobld script
with sudo.
2. Get the source code:
$ git clone https://github.com/gitobioinformatics/gito.git
$ cd gito
3. Create a private/public key pair to sign the apk packages:
$ mkdir -p keys
$ openssl genrsa -out keys/packager_key.rsa 2048
$ openssl rsa -in keys/packager_key.rsa -pubout -out keys/packager_key.rsa.pub
4. Run the helper script to build the images:
$ ./gitobld build -rS all
If you run the script with sudo, use the following command instead:
$ sudo ./gitobld build -U $(id -un) -rS all
You can build individual images by using the tool name instead of all.
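Since the apk packages are signed with the key pair from step 3, a mismatched or corrupted key file is worth catching before a long build starts. The following is a minimal sketch, assuming `openssl` and `cmp` are available; `check_keypair` is a hypothetical helper, not part of `gitobld`. It re-derives the public key from the private key and compares the result with the stored public key file.

```shell
# check_keypair PRIVATE PUBLIC: hypothetical helper (not part of gitobld).
# Succeeds only if PUBLIC is the public half of the RSA key in PRIVATE.
check_keypair() {
    priv="$1"
    pub="$2"
    # Both files must exist and be readable.
    [ -r "$priv" ] && [ -r "$pub" ] || return 1
    # Re-derive the public key and compare it byte for byte with the file.
    openssl rsa -in "$priv" -pubout 2>/dev/null | cmp -s - "$pub"
}
```

For example, `check_keypair keys/packager_key.rsa keys/packager_key.rsa.pub && ./gitobld build -rS all` starts the build only when the two files belong together.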
Directory | Description |
---|---|
base | Contains the Dockerfile used to build the Gito base image. |
build | Utilities to create Docker images using Gito as the base. |
examples | Contains examples of bioinformatics pipelines using Gito images. |
images | Contains the images used by this README. |
library | Dockerfiles used to build several tools using Alpine packages from the ports directory. |
ports | Contains APKBUILD files for bioinformatics tools. |
Because of its extremely small size, Gito has a smaller attack surface than containers based on larger images. To assess
its security, we scanned Gito for vulnerabilities with the Quay Security Scanner. Quay identifies insecure packages by
matching image metadata against the Common Vulnerabilities and Exposures (CVE) database. The results of the analysis were
published in the Quay portal, and the scan of each tool can be accessed below:
Image | Scanning Result |
---|---|
Bowtie2 | Quay Security Scan: Passed |
FastQC | Quay Security Scan: Passed |
Jellyfish | Quay Security Scan: Passed |
Prokka | Quay Security Scan: Passed |
Salmon | Quay Security Scan: Passed |
Samtools | Quay Security Scan: Passed |
SPAdes | Quay Security Scan: Passed |
SRA Tools | Quay Security Scan: Passed |
Trimmomatic | Quay Security Scan: Passed |
Trinity | Quay Security Scan: Passed |
MIT License. See LICENSE for more information.
Gito is open source (see LICENSE), and we welcome contributions from anyone interested. To contribute, please open a
pull request. The issue tracker for Gito is also available on GitHub.
The word "gito" is used by people from the North of Brazil to refer to something "very small, smaller than normal".
The logo of the project was inspired by the tucuxi (Sotalia fluviatilis), one of the smallest dolphins of the Delphinidae
family and the only member of this family that lives in rivers. It is a very agile dolphin and, although it is an
endangered species, it can still be found jumping in the air in Amazonian rivers.
This work has been supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq (grants
149985/2018-5; 129954/2018-7).