System requirements and setup
Short description
In short, the requirements of the pipeline are:
- Suitable hardware and operating system: typically a Linux system, although other operating systems may work with some adaptation.
- The pipeline itself, in the form of a cloned Git repository.
- Snakemake, preferably installed with the Python-based Conda package manager.
- Optionally, the Singularity container management system, for enhanced reproducibility and stability.
- A taxonomy reference database in a suitable format, which has to be preprocessed once before the first execution of the pipeline.
Detailed description
Note
The command-line examples on this page are written for a standard Unix Bash terminal.
Operating system and system resources
Operating system
RSP4ABM was designed on Ubuntu 18.04 (Linux) but should be compatible with any system capable of installing the dependencies listed on this page.
RAM
Some tools embedded in RSP4ABM can be quite demanding on RAM. The actual requirement depends on your dataset and is influenced by parameters set in the config file. The bottleneck is usually the taxonomic assignment. Factors that can increase the RAM requirement are:
the number of samples
the bacterial diversity within your samples
the number of cores
Hint
In practice, typical datasets (dozens to hundreds of human microbiome samples) usually require 16 to 32 GB of RAM.
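To compare a machine against these figures, its total memory and core count can be queried from the terminal. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo and provides the GNU nproc command; the helper name total_ram_gb is hypothetical and not part of RSP4ABM.

```shell
# Print the total RAM in whole GiB (rounded down), read from a meminfo-style file.
# Assumption: Linux, where /proc/meminfo contains a line "MemTotal: <n> kB".
# total_ram_gb is a hypothetical helper name used only for illustration.
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

echo "Total RAM (GiB): $(total_ram_gb)"
echo "Available cores: $(nproc)"
```

Since the number of cores is one of the factors that increases the RAM requirement, both values are worth checking together before setting parameters in the config file.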
Software dependencies
Git
What for?
Git is required to download (clone) RSP4ABM.
Install
Git is available by default in most operating systems. If not, follow the instructions on the Git installation page.
Test
To test if git is installed:
# To test if git is installed, make it print its version. It will fail if it is not installed
git --version
Clone RSP4ABM
Once all dependencies are installed and working, RSP4ABM can be cloned with git:
git clone https://github.com/metagenlab/microbiome16S_pipeline.git --recursive
Hint
Please take note of the path of the directory in which you cloned RSP4ABM. You will need it to execute the pipeline.
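One way to keep track of this path is to store it in a shell variable right after cloning. This is only a sketch: it assumes the command is run from the directory in which git clone was executed and that the default directory name (taken from the repository URL) was kept. The variable name RSP4ABM_DIR is hypothetical.

```shell
# Assumption: run from the directory in which "git clone" was executed.
# By default, git clones into a directory named after the repository.
# RSP4ABM_DIR is a hypothetical variable name used only for illustration.
RSP4ABM_DIR="$(pwd)/microbiome16S_pipeline"
echo "RSP4ABM directory: ${RSP4ABM_DIR}"
```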
Conda
What for?
Conda is a convenient Python-based package and environment manager. It enables the easy installation of Snakemake. Furthermore, Snakemake can use it (as an alternative to Singularity containers) to retrieve all the packages required to run RSP4ABM.
Install
Test
To test if Conda is installed:
# To test if Conda is installed, make it print its version. It will fail if it is not installed
conda --version
Mamba
What for?
Mamba is an alternative to the standard Conda package manager. It enables the easy installation of Snakemake. Furthermore, Snakemake can use it (as an alternative to Singularity containers) to retrieve all the packages required to run RSP4ABM.
Install
Install Mamba with Conda:
conda install -n base -c conda-forge mamba
Test
To test if Mamba is installed:
# To test if Mamba is installed, make it print its version. It will fail if it is not installed
mamba --version
Snakemake
What for?
RSP4ABM is a Snakemake [1] pipeline. Therefore, Snakemake must be installed and available to run the pipeline.
Install
Follow the instructions on the Snakemake installation page. It is good practice to create a dedicated Conda environment for Snakemake. Although the pipeline should work with newer versions, it was fully tested with Snakemake version 5.26.1.
To install Snakemake in a dedicated environment:
# Install Snakemake version 5.26.1 in an environment named "snakemake5261"
mamba create -c bioconda -n snakemake5261 snakemake=5.26.1
Test
To test if Snakemake is installed:
# To test if Snakemake is installed, make it print its version. It will fail if it is not installed
snakemake --version
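If Snakemake was installed in the dedicated environment, that environment has to be activated first (conda activate snakemake5261), otherwise the command is not found. The sketch below then compares the reported version with the tested one, 5.26.1. It assumes GNU sort with the -V (version sort) option; the helper name version_ge is hypothetical.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Assumption: GNU sort, which supports version sorting with -V.
# version_ge is a hypothetical helper name used only for illustration.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

tested="5.26.1"
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_ge "$installed" "$tested"; then
        echo "Snakemake $installed is at least the tested version $tested"
    else
        echo "Snakemake $installed is older than the tested version $tested"
    fi
else
    echo "snakemake not found; activate its environment first: conda activate snakemake5261"
fi
```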
Singularity
What for?
Singularity is a container platform. It makes it possible to create, retrieve and install containers, which are predefined, portable sets of software. Singularity is optional for most functions of RSP4ABM, for which the user can choose either Conda or Singularity to retrieve all the required tools, but it is required for the *in silico* prediction pipeline. Running RSP4ABM with Singularity containers is nevertheless recommended, since it enables the best level of reproducibility [2].
Install
Follow the instructions on the Singularity installation page.
Test
To test if Singularity is installed:
# To test if Singularity is installed, make it print its version. It will fail if it is not installed
singularity --version
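The individual tests on this page can also be grouped into a single check. The following is a minimal sketch that only looks for each command on the PATH; the helper name check_tools is hypothetical and not part of RSP4ABM.

```shell
# Report which of the given commands are available on the PATH.
# Returns 0 if all are found, 1 if at least one is missing.
# check_tools is a hypothetical helper name used only for illustration.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools git conda mamba snakemake singularity \
    || echo "Install the missing tools before running the pipeline."
```

Note that snakemake is only reported as found when its Conda environment is active, and that a missing singularity is acceptable when Conda is used to retrieve the tools, except for the in silico prediction pipeline.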
Reference database
As the very last setup step, before the first execution of the pipeline, a dedicated workflow must be run to prepare and format the reference taxonomy database. For this, refer to the Taxonomic reference database preprocessing page.