Importing data

Experimental interface

The visual interface for imports with uploads was added in the recent G-nom rework. It is functional but is expected to undergo visual renovations in the near future.

All imported data references the NCBI taxonomy ID of a species, as outlined in the data hierarchy. The first step of every import is selecting the NCBI taxonomy ID in the dialogue shown below.

Import first section — Taxonomy ID dialogue

Following the data hierarchy, your imported data is either a new assembly or data that references an existing assembly (e.g. mappings or annotations). The next dialogue lets you choose between importing a new assembly and adding data to an existing one.

Import second section — Switching between new / existing genome assembly

The following sections outline the import options available:

Genome assemblies
Genomic annotations
Genomic mappings
BUSCO analyses

Importing Assemblies

Info

A future version of G-nom will allow you to import reference assemblies automatically from their IDs. You can track progress here.


flowchart TD
    id1(Upload assembly) --> id2(Calculate assembly statistics)
    id1(Upload assembly) --> id3(bgzip file for storage)
    id3(bgzip file for storage) --> id4(Generate FASTA index)
    id4(Generate FASTA index) --> id5(JBrowse import)
    id5(JBrowse import) --> id6(Mark assembly completed)
    id2(Calculate assembly statistics) --> id6(Mark assembly completed)

1. Specify the NCBI taxonomy ID of the species corresponding to your assembly, as outlined above.
2. Under Import or select a new assembly, tick the Import a new assembly checkbox.
3. Enter a name for your assembly. This name will later be displayed in the assembly page header and on the assembly card.
4. If your assembly is a reference assembly sourced from a database like NCBI or GENCODE, specify the stable ID of the assembly in the reference assembly ID field.
5. At the bottom of the page, click the start import button.
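
The "Calculate assembly statistics" step in the flowchart above can be illustrated with a small, self-contained sketch. This is not G-nom's implementation; the function names (`read_fasta`, `n50`, `assembly_statistics`) and the choice of reported metrics are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length of the sequence at which half of the total length is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_statistics(lines: Iterable[str]) -> Dict[str, float]:
    """Hypothetical summary of an assembly; the metric set is an assumption."""
    lengths: List[int] = []
    gc_count = 0
    for _name, sequence in read_fasta(lines):
        sequence = sequence.upper()
        lengths.append(len(sequence))
        gc_count += sequence.count("G") + sequence.count("C")
    total = sum(lengths)
    return {
        "num_sequences": len(lengths),
        "total_length": total,
        "longest_sequence": max(lengths, default=0),
        "n50": n50(lengths),
        # GC fraction over all bases, including any N characters
        "gc_fraction": gc_count / total if total else 0.0,
    }
```

Because `assembly_statistics` accepts any iterable of lines, an open file handle can be passed directly, so a large FASTA file does not need to be read into memory as a single string.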

Importing Annotations


flowchart TD
    id1(Upload annotation) --> id2(gff3sort)
    id2(gff3sort) --> id3(bgzip file for storage)
    id3(bgzip file for storage) --> id4(Generate tabix adapters)
    id4(Generate tabix adapters) --> id5(Mark annotation complete)

1. Specify the NCBI taxonomy ID of the species corresponding to your annotation.
2. Select a pre-existing assembly.
3. Under Import Analyses > Annotation, select a GFF file to upload and specify a custom name for the annotation. This name will be used as the track label in the genome browser.
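
The sorting step exists because a tabix index requires features to be ordered by sequence and start position. The sketch below shows that ordering in plain Python. It is a simplified stand-in and not a replacement for gff3sort: it does not resolve parent/child relationships, it moves all comment and directive lines to the top, and it assumes the file has no embedded `##FASTA` section. The function name `sort_gff3_lines` is hypothetical.

```python
from typing import Iterable, List, Tuple


def sort_gff3_lines(lines: Iterable[str]) -> List[str]:
    """Return GFF3 lines with comments first, then features by (seqid, start)."""
    comments: List[str] = []
    features: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line)
        else:
            features.append(line)

    def sort_key(feature: str) -> Tuple[str, int]:
        columns = feature.split("\t")
        # GFF3 column 1 is the sequence ID, column 4 the 1-based start
        return columns[0], int(columns[3])

    # sorted() is stable, so features with equal keys keep their input order
    return comments + sorted(features, key=sort_key)
```

Relying on the stable sort keeps a gene ahead of its transcripts when they share a start coordinate and already appear in that order in the input.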

Importing Mappings

Warning

Mapping to larger genomes can produce mapping files that exceed 20 GB. Depending on your webserver configuration, you may encounter problems uploading files of this size. In this case, dispatch the import job manually in the server filesystem.


flowchart TD
    id1(Upload mapping) --> id2(Compress if uncompressed)
    id2(Compress if uncompressed) --> id3(Index)
    id3(Index) --> id5(Mark mapping complete)

1. Specify the NCBI taxonomy ID of the species corresponding to your mapping.
2. Select a pre-existing assembly.
3. Under Import Analyses > Mapping, select a SAM or BAM file to upload and specify a custom name for the mapping. This name will be used as the track label in the genome browser.
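
The "Compress if uncompressed" step implies that the import has to tell a plain-text SAM file from a BAM file. One way to do this, shown here as a sketch and not as G-nom's actual check, is to inspect the file's leading bytes: BAM files are stored in BGZF, which is gzip-compatible and therefore starts with the gzip magic number, and their decompressed payload begins with `BAM\1`. The function name `classify_mapping_file` and its return labels are assumptions.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip/BGZF stream
BAM_MAGIC = b"BAM\x01"     # first four bytes of a decompressed BAM file


def classify_mapping_file(path: str) -> str:
    """Return 'bam', 'gzip' or 'plain' based on the file's leading bytes."""
    with open(path, "rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            # e.g. a text SAM file, which would still need compressing
            return "plain"
    try:
        with gzip.open(path, "rb") as handle:
            payload_start = handle.read(4)
    except (OSError, EOFError):
        # gzip header present but the stream is unreadable or truncated
        return "gzip"
    return "bam" if payload_start == BAM_MAGIC else "gzip"
```

Only a few bytes are read, so the check stays cheap even for mapping files in the multi-gigabyte range mentioned in the warning above.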

Importing BUSCO analyses

Info

The BUSCO analysis can be generated automatically using the G-nom core pipeline.

1. Specify the NCBI taxonomy ID of the species corresponding to your BUSCO analysis.
2. Select a pre-existing assembly.
3. In your BUSCO output, locate the file matching short_summary.specific.<lineage_dataset>.<output_folder>.json
4. Under Import Analyses > BUSCO, select this file and specify a custom name for the analysis.
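
Locating the summary file can be scripted. The sketch below searches a BUSCO output directory for file names of the form given above. The helper names (`parse_summary_name`, `find_busco_summaries`) are hypothetical, and the regular expression assumes that the lineage dataset name contains no dots (for example `eukaryota_odb10`), while the output folder name may.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# short_summary.specific.<lineage_dataset>.<output_folder>.json
SUMMARY_PATTERN = re.compile(
    r"^short_summary\.specific\.(?P<lineage>[^.]+)\.(?P<output>.+)\.json$"
)


def parse_summary_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (lineage_dataset, output_folder), or None if the name differs."""
    match = SUMMARY_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("lineage"), match.group("output")


def find_busco_summaries(output_dir: str) -> List[Path]:
    """Recursively collect matching JSON summaries below output_dir."""
    candidates = Path(output_dir).rglob("short_summary.specific.*.json")
    return sorted(p for p in candidates if parse_summary_name(p.name))
```

If more than one file is returned, the lineage dataset in each file name indicates which run the summary belongs to.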

Importing fCAT analyses

Currently under development.

Importing RepeatMasker analyses

Currently under development.

Importing taXaminer analyses

Currently under development.