Pipelines

G-nom supports and integrates various analyses, with more planned for future versions. Pre-made pipelines let users generate data for G-nom with minimal setup overhead.

The Core pipeline

Steps to generate data for the G-nom core analyses are bundled in the core pipeline. It currently includes the following tools:

Tool	Available	Version	Notes
RepeatMasker	✅	4.1.7p1	Databases for RepeatMasker must be downloaded separately.
BUSCO	✅	5.8.3	Gene sets for BUSCO must be downloaded separately.
fCat	❌	-	Pending Bioconda Release
taXaminer	❌	-	Pending Publication

The core pipeline is not executed by G-nom itself; it is designed to run in a separate HPC environment, since it is generally advisable to isolate compute clusters from web-facing applications. The pipeline uses Nextflow and is built on top of nf-core. Nextflow supports all major HPC schedulers, including SLURM and LSF, and can be set up in a few minutes. The nf-core project provides a common style for developing Nextflow pipelines. The environments required to run the various tools are set up using conda.
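
As a rough illustration of launching such a pipeline from a cluster login node, the following Python sketch assembles a `nextflow run` command. It is only a sketch under assumptions: the pipeline name is a placeholder, the `conda` profile and the `--outdir` parameter follow common nf-core conventions and may differ for the core pipeline, and `cluster.config` is a hypothetical file name. The repository's usage instructions are authoritative.

```python
import shlex


def build_nextflow_command(pipeline, profile="conda", outdir="results", config=None):
    """Assemble a `nextflow run` invocation as an argument list.

    `pipeline` is a placeholder for the pipeline repository or local path.
    `--outdir` is an nf-core convention and an assumption here.
    """
    cmd = ["nextflow", "run", pipeline, "-profile", profile, "--outdir", outdir]
    if config is not None:
        # `-c` adds an extra Nextflow config file, e.g. one that selects the
        # scheduler with a line such as: process.executor = 'slurm'
        cmd += ["-c", config]
    return cmd


# Hypothetical values; replace them with those from the pipeline repository.
cmd = build_nextflow_command("<pipeline-repository>", config="cluster.config")
print(shlex.join(cmd))  # shell-quoted command line, ready to inspect or submit
```

Building the command as a list rather than a single string keeps arguments with spaces or special characters intact when the list is later passed to `subprocess.run`.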

Further usage instructions can be found in the repository. Please report issues in the pipeline repository rather than in the main G-nom repository.

Automatic import

Future versions of G-nom will include HTTP endpoints to automate the import of pipeline outputs. To follow progress, check the issue on GitHub.