# Data Upload

With the SEQ Platform, you can directly upload your files or use the cloud browser to browse and select the files pre-uploaded to your account.

# Direct Data Upload

Click the "Upload" button on your homepage and select the "Germline" option. Here, you can select the FASTQ or VCF option.

# FASTQ Upload

# Select the files to upload

Click the "Browse" button under “File” and select all the files you want to analyze. For paired-end reads, make sure that you upload both read files. See the table below for the supported input file types:

| Technology | File Types | Batch Sample Upload |
| --- | --- | --- |
| Illumina | .fastq.gz, .fq.gz | Supported |
| Ion Torrent | .bam | Supported |
| MGI | .fastq.gz, .fq.gz | Supported |
| PacBio | .bam | Supported |
| ONT | .fastq.gz, .fq.gz | Supported |
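
As a quick pre-upload check, the supported file types can be expressed as a lookup. This is only a sketch: the dictionary is copied from the table, while the function name `is_supported` is hypothetical and not part of the SEQ Platform.

```python
# Supported extensions per technology, copied from the table of input file types.
SUPPORTED_EXTENSIONS = {
    "Illumina": (".fastq.gz", ".fq.gz"),
    "Ion Torrent": (".bam",),
    "MGI": (".fastq.gz", ".fq.gz"),
    "PacBio": (".bam",),
    "ONT": (".fastq.gz", ".fq.gz"),
}


def is_supported(filename: str, technology: str) -> bool:
    """Return True if the filename ends with an extension listed for the technology."""
    return filename.endswith(SUPPORTED_EXTENSIONS[technology])


print(is_supported("NA10831_ATCACG_L002_R1_001.fastq.gz", "Illumina"))  # True
print(is_supported("reads.fastq", "ONT"))  # False
```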

# Filename formatting for batch upload

# Illumina

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Illumina naming convention” e.g.

NA10831_ATCACG_L002_R1_001.fastq.gz
NA10831_ATCACG_L002_R2_001.fastq.gz

Filenames from the Illumina platform are parsed as follows:

<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz

Name field 1, name field 2, lane #, and read # are used to match the corresponding files correctly. You can change name fields 1 and 2, as long as they contain no spaces or underscore (_) characters. Following this naming convention, you can upload multiple samples (each with multiple fastq.gz files).

Please check the number of samples and matched files on the confirmation screen.
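
To preview how files might be grouped before you reach the confirmation screen, the naming convention can be checked locally. The sketch below is an approximation, not the platform's own matching logic. It assumes that name field 1 plus name field 2 identify a sample and that the `.fq.gz` extension follows the same pattern; `group_illumina` is a hypothetical helper.

```python
import re
from collections import defaultdict

# <name_field1>_<name_field2>_<lane_#>_<read_#>_001.fastq.gz (or .fq.gz, assumed).
# Name fields may not contain spaces or underscores.
ILLUMINA_RE = re.compile(
    r"^(?P<name1>[^_\s]+)_(?P<name2>[^_\s]+)_(?P<lane>L\d+)_(?P<read>R[12])"
    r"_001\.(?:fastq|fq)\.gz$"
)


def group_illumina(filenames):
    """Group files by (name1, name2); also return names that do not fit the pattern."""
    samples = defaultdict(list)
    unmatched = []
    for fn in filenames:
        m = ILLUMINA_RE.match(fn)
        if m is None:
            unmatched.append(fn)
            continue
        samples[(m["name1"], m["name2"])].append((m["lane"], m["read"], fn))
    return {key: sorted(files) for key, files in samples.items()}, unmatched


files = [
    "NA10831_ATCACG_L002_R1_001.fastq.gz",
    "NA10831_ATCACG_L002_R2_001.fastq.gz",
    "bad name.fastq.gz",
]
samples, unmatched = group_illumina(files)
print(len(samples), "sample(s);", "unmatched:", unmatched)
```

Any file listed as unmatched would need to be renamed before a batch upload.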

# MGI

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “MGI naming convention” e.g.

V12345678_L01_16_1.fastq.gz
V12345678_L01_16_2.fastq.gz

Filenames from the MGI platform are parsed as follows:

<flowcell_id>_<lane_#>_<barcode>_<read_#>.fq.gz

Flowcell ID, lane #, barcode, and read # are used to match the corresponding files correctly. You can change the flowcell ID field, as long as it contains no spaces or underscore (_) characters. If you have used more than one barcode for the same sample, you need to rename the files as follows:

Original file names:
sample1234_L01_16_1.fastq.gz
sample1234_L01_16_2.fastq.gz
sample1234_L01_17_1.fastq.gz
sample1234_L01_17_2.fastq.gz

Altered file names:
sample1234_L01_16_1.fastq.gz
sample1234_L01_16_2.fastq.gz
sample1234_L02_16_1.fastq.gz
sample1234_L02_16_2.fastq.gz

In this example, barcode numbers of the last two files are changed to 16, and their lane numbers are increased by 1.

Please check the number of samples and matched files on the confirmation screen.
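
The renaming rule for samples sequenced with more than one barcode can be sketched as a small script. This is an illustration under stated assumptions, not an official tool: it assumes the first filename field identifies the sample, keeps the first barcode seen for it, and moves files with any other barcode to the next unused lane number. `plan_renames` is a hypothetical helper that only computes the new names and does not touch files on disk.

```python
import re

# <flowcell_id>_<lane_#>_<barcode>_<read_#>.fq.gz (or .fastq.gz)
MGI_RE = re.compile(
    r"^(?P<flowcell>[^_\s]+)_L(?P<lane>\d+)_(?P<barcode>[^_\s]+)_(?P<read>[12])"
    r"\.(?P<ext>fastq\.gz|fq\.gz)$"
)


def plan_renames(filenames):
    """Map each original filename to a name that uses a single barcode per sample."""
    parsed = []
    for fn in filenames:
        m = MGI_RE.match(fn)
        if m is None:
            raise ValueError(f"not an MGI-style filename: {fn}")
        parsed.append((fn, m))

    first_barcode = {}  # first field -> barcode kept for that sample
    used_lanes = {}     # first field -> lane numbers already taken
    for fn, m in parsed:
        fc = m["flowcell"]
        first_barcode.setdefault(fc, m["barcode"])
        if m["barcode"] == first_barcode[fc]:
            used_lanes.setdefault(fc, set()).add(int(m["lane"]))

    new_lane = {}       # (first field, lane, barcode) -> reassigned lane number
    renames = {}
    for fn, m in parsed:
        fc, lane, bc = m["flowcell"], int(m["lane"]), m["barcode"]
        if bc != first_barcode[fc]:
            key = (fc, lane, bc)
            if key not in new_lane:
                candidate = lane + 1
                while candidate in used_lanes[fc]:
                    candidate += 1
                used_lanes[fc].add(candidate)
                new_lane[key] = candidate
            lane = new_lane[key]
        width = len(m["lane"])  # preserve zero padding, e.g. L01 -> L02
        renames[fn] = (
            f"{fc}_L{lane:0{width}d}_{first_barcode[fc]}_{m['read']}.{m['ext']}"
        )
    return renames


originals = [
    "sample1234_L01_16_1.fastq.gz",
    "sample1234_L01_16_2.fastq.gz",
    "sample1234_L01_17_1.fastq.gz",
    "sample1234_L01_17_2.fastq.gz",
]
for old, new in plan_renames(originals).items():
    print(old, "->", new)
```

For the four files in the example, the last two are mapped to lane `L02` with barcode `16`, matching the altered names listed above.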

# Ion Torrent

Ion Torrent files need to be unaligned BAM files with the .bam extension.

# Pacific Biosciences

Pacific Biosciences files need to be BAM files with the .bam extension. You can upload multiple samples in a single batch. One file is expected for each sample.

# Oxford Nanopore Technologies

Oxford Nanopore Technologies files need to be in fastq.gz or fq.gz format. You can upload multiple samples in a single batch. One file for each sample is expected.

# Select a previous Run or Create a New Run

You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.

Run information is critical for Copy Number Variation analyses. Therefore, please organize the samples under runs in the same way as you processed the sample materials. Ideally, a run is a set of samples that went through the same wet-lab and sequencing process (same flow cell, etc.).

# Choose the technology type

Choose the next-generation sequencing machine associated with the samples. Mixing different technologies in one run is not permitted.

# Choose the kit type

The SEQ Platform has hundreds of different kits predefined in the system. A new kit can be defined with a set of target coordinates and a list of targeted genes. The addition of a new kit typically takes one business day. For kit requests, please contact us at support@genomize.com.

Every kit is associated with a standardized analysis version in SEQ. Probe-based kits, primer-based kits, Illumina and MGI technology, Ion Torrent technology, germline analyses, and somatic analyses all have a preset analysis version.

# Hotspot VCF file upload

A predefined VCF file can be uploaded to SEQ. The VCF4.2 standard is supported. If the user does not upload a VCF as a hotspot, SEQ automatically subsets Pathogenic or Likely Pathogenic variants from ClinVar as the default hotspot for panels. For Whole Exome Sequencing, the default hotspot assignment is currently not supported.

# Advanced options

# Variant Calling parameters

A set of parameters is used to assess the quality of every variant called in a sample. Two parameters, the primary coverage threshold and the minimum alternative fraction threshold, can cause the classification of the variant as “FAILED”. The “FAILED” variant calls will not be displayed.

The variant calls with an alternative allele count less than the primary coverage threshold will be classified as “FAILED” and not be displayed.

The variant calls with an alternative allele frequency less than the minimum alternative fraction threshold will be classified as “FAILED” and not be displayed.
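
The two rules can be summarized in a few lines of code. The sketch below is an illustration only: the threshold values, the `PASSED` label, and the computation of allele frequency as alternative reads divided by total depth are assumptions, not the platform's documented implementation.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    alt_count: int  # reads supporting the alternative allele
    depth: int      # total reads covering the position


def classify(call, primary_coverage_threshold, min_alt_fraction):
    """Return "FAILED" if either threshold is violated, otherwise "PASSED"."""
    if call.alt_count < primary_coverage_threshold:
        return "FAILED"
    alt_fraction = call.alt_count / call.depth if call.depth else 0.0
    if alt_fraction < min_alt_fraction:
        return "FAILED"
    return "PASSED"


# Hypothetical thresholds: at least 5 alternative reads and a fraction of 0.10.
print(classify(VariantCall(alt_count=3, depth=100), 5, 0.10))   # FAILED (count)
print(classify(VariantCall(alt_count=6, depth=100), 5, 0.10))   # FAILED (fraction)
print(classify(VariantCall(alt_count=20, depth=100), 5, 0.10))  # PASSED
```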

# Other parameters

When calculating coverage metrics for the gene coverage and the kit’s on-target coverage percentages, SEQ uses four different thresholds. 1X and 5X are the preset values. The other two values may be customized by the user per upload.
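
As an illustration of what such a metric means, the percentage of positions covered at or above each threshold can be computed as below. This is a sketch: the per-base depths and the two custom thresholds (20X and 50X) are made-up values, and `percent_covered` is a hypothetical helper rather than the SEQ implementation.

```python
def percent_covered(depths, thresholds):
    """Return {threshold: percent of positions with depth >= threshold}."""
    total = len(depths)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: 100.0 * sum(d >= t for d in depths) / total for t in thresholds}


PRESET = (1, 5)    # preset thresholds named in the text
CUSTOM = (20, 50)  # hypothetical user-chosen thresholds

depths = [0, 3, 7, 25, 60, 18, 5, 0, 42, 100]  # made-up per-base depths
print(percent_covered(depths, PRESET + CUSTOM))
# {1: 80.0, 5: 70.0, 20: 40.0, 50: 20.0}
```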

The default values of the advanced options are set under “Site Settings” in the Settings menu.

# Submit your data

As the last step, you can upload your data by clicking the “Continue” button to start the upload process. After clicking "Continue", you will see the "Case Information" screen. Please refer to the "Genomize's AI-Assisted Variant Prioritization" section for more information on entering the case information. You can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.
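
For readers unfamiliar with checksums, the sketch below shows the general idea: a digest is computed over the file contents, and two copies of a file match only if their digests match. The algorithm used by the SEQ Platform is not stated here, so SHA-256 is an assumption chosen for illustration, and `sha256_of_file` is a hypothetical helper.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 digest by streaming the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTQ/BAM files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording such a digest before upload can also help when you later need to confirm which local file a given analysis came from.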

When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the Run page to see the latest status of the analysis.

# VCF Upload

# Select a Previous Run or Create a New Run

You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.

# Choose the Technology Type

Choose the next-generation sequencing machine associated with the samples. If you do not know which sequencing platform is used, you can select the "Unknown" option. Mixing different technologies in one run is not permitted.

# Select the VCF Files to be Uploaded

You can upload VCF files for SNV, CNV, SV, and STR from DRAGEN or other similar tools for the same sample in any combination you choose. VCF files should have the .vcf.gz extension. Files are matched using the sample info field, not the file names. Multisample VCFs are not supported.
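
Before uploading, you can check which sample name a VCF carries and whether it contains more than one sample. The sketch below assumes that the "sample info field" corresponds to the sample column name on the `#CHROM` header line; the helper names are hypothetical.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample names listed on the #CHROM header line of a .vcf.gz file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first 8 columns are fixed and the 9th is FORMAT;
                # everything after that is a sample name.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached the records without finding the header line
    raise ValueError(f"no #CHROM header line found in {path}")


def check_single_sample(path):
    """Return the only sample name, or raise if the file is not single-sample."""
    names = vcf_sample_names(path)
    if len(names) != 1:
        raise ValueError(f"{path}: expected exactly one sample, found {len(names)}")
    return names[0]
```

Running `check_single_sample` on each SNV, CNV, SV, and STR file of a case and comparing the returned names shows whether the files would be matched to the same sample.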

# Supported Variant Callers

| Upload Type | Variant Types | Supported Callers | File Format (Extension) | Multisample Support |
| --- | --- | --- | --- | --- |
| Small Variant | SNV, INDEL | DeepVariant, Dragen, Freebayes, GATK - Haplotype Caller, Ion Torrent Variant Caller, Isaac Variant Caller, Mutect2, Pivat, Sentieon DNAscope, TNScope, VarDict | VCF (.vcf.gz) | No |
| Copy Number Variant | CNV | Dragen, HifiCNV | VCF (.vcf.gz) | No |
| Structural Variant | DEL (Deletion), DUP (Duplication), INV (Inversion), INS (Insertion), BND (Breakends)\*, CPX (Complex)\*\* | Delly, Dragen, Manta, PBSV, Sentieon Long Read SV, Sniffles2, TIDDIT | VCF (.vcf.gz) | No |
| Short Tandem Repeat | STR\*\*\* | ExpansionHunter, DRAGEN, TRGT | VCF (.vcf.gz) | No |

\* BND (Breakends): 5'–3' fusion transcript events are supported, with potential formation of chimeric transcripts.

\*\* CPX (Complex): Combination of structural variant types. Currently only DEL:INS variants are supported.

\*\*\* STR (Short Tandem Repeat): Only disease-associated STR variants in the gnomAD STR catalog are supported. For details, see the gnomAD STR catalog.

# Unsupported Variant Callers and VCF Version Compatibility

Please note that using unlisted or unsupported variant callers may result in inaccurate VCF metrics. Unrecognized callers are categorized as "other," which may limit the capture of certain metrics. In some cases, custom integration may be required to support these callers fully. Only VCF version 4.1 or newer is supported for Copy Number, Structural, and Short Tandem Repeat VCF files. If issues persist or critical data appears missing, please contact support for assistance.
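
The VCF version is declared on the first line of the file (`##fileformat=VCFv4.x`), so it can be checked before upload. The following is a minimal sketch with hypothetical helper names. It checks only the declared version and says nothing about whether the caller is supported.

```python
import gzip
import re


def vcf_version(path):
    """Return the (major, minor) version declared on the first line of a .vcf.gz file."""
    with gzip.open(path, "rt") as handle:
        first = handle.readline().strip()
    m = re.fullmatch(r"##fileformat=VCFv(\d+)\.(\d+)", first)
    if m is None:
        raise ValueError(f"{path}: first line is not a ##fileformat declaration")
    return int(m[1]), int(m[2])


def meets_minimum(path, minimum=(4, 1)):
    """True if the declared version is at least the minimum (4.1 by default)."""
    return vcf_version(path) >= minimum
```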

# Mitochondrial Variants

The Revised Cambridge Reference Sequence (rCRS, NC_012920.1), which is the recommended sequence for clinical use (McCormick et al., 2020), is used as the reference for the mitochondrial genome regardless of the genome version used in VCF generation. If the VCF file contains variants called using the older Yoruban (YRI) mitochondrial reference genome, errors may result due to incompatibility with our annotation sources. Unsupported chrM variants should therefore be removed before upload to prevent genome compatibility issues. For assistance, or if issues arise, please contact support.
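
One rough way to spot an incompatible mitochondrial reference is the contig length declared in the VCF header: rCRS is 16,569 bp long, whereas the older Yoruban sequence shipped as chrM with UCSC hg19 is 16,571 bp. The sketch below assumes that the VCF header contains `##contig` lines with a `length` attribute, which not every caller writes; the helper name and the set of mitochondrial contig names are illustrative.

```python
import gzip
import re

RCRS_LENGTH = 16569  # length of NC_012920.1 in base pairs
MITO_IDS = {"chrM", "chrMT", "MT", "M"}  # common names for the mitochondrial contig


def mito_contig_length(path):
    """Return the mitochondrial contig length declared in the header, or None."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information lines
            m = re.match(r"##contig=<(.*)>", line.strip())
            if m is None:
                continue
            # Simplified parsing: assumes no commas inside quoted attribute values.
            fields = dict(
                item.split("=", 1) for item in m[1].split(",") if "=" in item
            )
            if fields.get("ID") in MITO_IDS and "length" in fields:
                return int(fields["length"])
    return None
```

A return value other than `RCRS_LENGTH` suggests the chrM calls were made against a different reference. `None` means the header does not say, so the reference used for calling has to be confirmed another way.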

# Submit your data

As the last step, you can upload your data by clicking the “Continue” button to start the upload process. After clicking "Continue", you will see the "Case Information" screen. Please refer to the "Genomize's AI-Assisted Variant Prioritization" section for more information on entering the case information. You can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.

When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the Run page to see the latest status of the analysis.

# Cloud Browser

To use the cloud browser, select or create your run, then select the sequencing platform and the kit by following the directions above. After the kit selection, you will see the option to select either your “COMPUTER” or the “CLOUD BROWSER” as the data source.

When you select the "CLOUD BROWSER" option, click the PLUS (➕) button to open the cloud browser interface. In the cloud browser, choose the files you want to analyze and click “DONE”. The rest of the process is the same as described above. Please note that there is a 3-minute waiting period between consecutive cloud uploads, and that files will be removed from your cloud account once the analysis starts.