# Data Upload
With the SEQ Platform, you can directly upload your files or use the cloud browser to browse and select the files pre-uploaded to your account.
# Direct Data Upload
Click the "Upload" button on your homepage and select the "Germline" option. Here, you can select the FASTQ or VCF options.
# FASTQ Upload
# Select the files to upload
Click the "Browse" button under the “File” to upload all the files you want to analyze. Make sure that you upload both read files for paired-end reads. See the table below for the supported input file types:
Platform | File Types | Batch Sample Upload |
---|---|---|
Illumina | .fastq.gz, .fq.gz | Supported |
Ion Torrent | .bam | Supported |
MGI | .fastq.gz, .fq.gz | Supported |
PacBio | .bam | Supported |
ONT | .fastq.gz, .fq.gz | Supported |
# Filename formatting for batch upload
# Illumina
You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Illumina naming convention” e.g.
NA10831_ATCACG_L002_R1_001.fastq.gz
NA10831_ATCACG_L002_R2_001.fastq.gz
The filenames from Illumina platform are handled as below:
<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz
Name field 1, name field 2, lane #, and read # are used to match the corresponding files correctly. You can change name fields 1 and 2, but they must not contain spaces or underscores (_). Following this naming convention, you can upload multiple samples (each with multiple fastq.gz files).
Please check the number of samples and matched files on the confirmation screen.
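The matching described above can be sketched in Python to pre-check filenames before a batch upload. This is an illustration only, not SEQ's actual matching code: the function name `group_illumina_files`, the regular expression, and the choice to treat the two name fields together as the sample key are all assumptions.

```python
import re
from collections import defaultdict

# <name_field1>_<name_field2>_<lane_#>_<read_#>_001.fastq.gz (or .fq.gz).
# Name fields must not contain spaces or underscores.
ILLUMINA_RE = re.compile(
    r"^(?P<name1>[^_\s]+)_(?P<name2>[^_\s]+)"
    r"_L(?P<lane>\d{3})_R(?P<read>[12])_001\.(?:fastq|fq)\.gz$"
)

def group_illumina_files(filenames):
    """Group files by (name1, name2), then by lane, then by read (R1/R2)."""
    samples = defaultdict(dict)
    for fn in filenames:
        m = ILLUMINA_RE.match(fn)
        if m is None:
            raise ValueError(f"not an Illumina-style filename: {fn}")
        key = (m["name1"], m["name2"])  # assumed sample key
        samples[key].setdefault(m["lane"], {})["R" + m["read"]] = fn
    return dict(samples)

files = [
    "NA10831_ATCACG_L002_R1_001.fastq.gz",
    "NA10831_ATCACG_L002_R2_001.fastq.gz",
]
for sample, lanes in group_illumina_files(files).items():
    print(sample, lanes)
```

A lane that ends up with only `R1` or only `R2` for a paired-end sample indicates a missing read file.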
# MGI
You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “MGI naming convention” e.g.
V12345678_L01_16_1.fastq.gz
V12345678_L01_16_2.fastq.gz
The filenames from MGI platform are handled as below:
<flowcell_id>_<lane_#>_<barcode>_<read_#>.fq.gz
Flowcell ID, lane #, barcode, and read # are used to match the corresponding files correctly. You can change the flowcell ID field, but it must not contain spaces or underscores (_).
If you have used more than one barcode for the same sample, you need to rename the file as follows:
Original file names:
sammple1234_L01_16_1.fastq.gz
sammple1234_L01_16_2.fastq.gz
sammple1234_L01_17_1.fastq.gz
sammple1234_L01_17_2.fastq.gz
Altered file names:
sammple1234_L01_16_1.fastq.gz
sammple1234_L01_16_2.fastq.gz
sammple1234_L02_16_1.fastq.gz
sammple1234_L02_16_2.fastq.gz
In this example, barcode numbers of the last two files are changed to 16, and their lane numbers are increased by 1.
Please check the number of samples and matched files on the confirmation screen.
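The renaming rule above can be sketched as a small Python helper that proposes new names for one sample. It is a hypothetical illustration, not part of SEQ: the function name `merge_barcodes` is made up, it assumes all input files belong to a single sample, and it picks the next unused lane number for each extra barcode.

```python
import re

# <flowcell_id>_<lane_#>_<barcode>_<read_#>.fastq.gz (or .fq.gz)
MGI_RE = re.compile(
    r"^(?P<fc>[^_\s]+)_L(?P<lane>\d+)_(?P<barcode>\d+)"
    r"_(?P<read>[12])\.(?P<ext>fastq\.gz|fq\.gz)$"
)

def merge_barcodes(filenames):
    """Return {old_name: new_name} so all files of one sample share one barcode."""
    parsed = {}
    for fn in filenames:
        m = MGI_RE.match(fn)
        if m is None:
            raise ValueError(f"not an MGI-style filename: {fn}")
        parsed[fn] = m
    pairs = sorted({(int(m["lane"]), int(m["barcode"])) for m in parsed.values()})
    keep = pairs[0][1]  # barcode that every file will carry after renaming
    keep_str = next(m["barcode"] for m in parsed.values() if int(m["barcode"]) == keep)
    used = {lane for lane, bc in pairs if bc == keep}  # lanes already taken
    new_lane = {}
    for lane, bc in pairs:
        if bc == keep:
            new_lane[(lane, bc)] = lane
            continue
        candidate = lane
        while candidate in used:  # move to the next free lane number
            candidate += 1
        used.add(candidate)
        new_lane[(lane, bc)] = candidate
    renamed = {}
    for fn, m in parsed.items():
        width = len(m["lane"])  # keep the zero padding of the lane field
        lane = new_lane[(int(m["lane"]), int(m["barcode"]))]
        renamed[fn] = f"{m['fc']}_L{lane:0{width}d}_{keep_str}_{m['read']}.{m['ext']}"
    return renamed
```

Applied to the four original file names in the example, this sketch leaves the barcode 16 files unchanged and maps the barcode 17 files to lane `L02` with barcode 16.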
# IonTorrent
IonTorrent files need to be unaligned BAM files with the .bam extension.
# Pacific Biosciences
Pacific Biosciences files need to be BAM files with the .bam extension. You can upload multiple samples in a single batch. One file for each sample is expected.
# Oxford Nanopore Technologies
Oxford Nanopore Technologies files need to be in fastq.gz or fq.gz format. You can upload multiple samples in a single batch. One file for each sample is expected.
# Select a previous Run or Create a New Run
You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.
Run information is critical for Copy Number Variation analyses. Therefore, please make sure you organize the samples under runs in the same way as you process your sample materials. Ideally, a run is a set of samples coming from the same wet-lab and run process (flow cell, etc.).
# Choose the technology type
Choose the next-generation sequencing machine associated with the samples. Mixing different technologies in one run is not permitted.
# Choose the kit type
The SEQ platform has hundreds of different kits predefined in the system. A new kit can be defined with a set of target coordinates and a list of targeted genes. The addition of a new kit typically takes one business day. For kit requests, please contact us through support@genomize.com.
Every kit is associated with a standardized analysis version in SEQ. Probe-based kits, primer-based kits, Illumina and MGI technology, Ion Torrent technology, germline analysis, and somatic analysis each have a preset analysis version.
# Hotspot VCF file upload
A predefined VCF file can be uploaded to SEQ. The VCF4.2 standard is supported. If the user does not upload a VCF as a hotspot, SEQ automatically subsets Pathogenic or Likely Pathogenic variants from ClinVar as the default hotspot for panels. For Whole Exome Sequencing, the default hotspot assignment is currently not supported.
# Advanced options
# Variant Calling parameters
A set of parameters is used to assess the quality of every variant called in a sample. Two parameters, the primary coverage threshold and the minimum alternative fraction threshold, can cause the classification of the variant as “FAILED”. The “FAILED” variant calls will not be displayed.
The variant calls with an alternative allele count less than the primary coverage threshold will be classified as “FAILED” and not be displayed.
The variant calls with an alternative allele frequency less than the minimum alternative fraction threshold will be classified as “FAILED” and not be displayed.
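The two rules above can be expressed as a short Python check. This is a sketch under stated assumptions rather than SEQ's implementation: the function name `is_failed` and the threshold values in the usage lines are hypothetical, and the alternative allele frequency is assumed to be the alternative allele count divided by the total depth.

```python
def is_failed(alt_count, total_depth, primary_coverage_threshold, min_alt_fraction):
    """Return True if a variant call would be classified as FAILED."""
    # Rule 1: alternative allele count below the primary coverage threshold.
    if alt_count < primary_coverage_threshold:
        return True
    # Rule 2: alternative allele frequency below the minimum alternative fraction.
    alt_fraction = alt_count / total_depth if total_depth > 0 else 0.0
    return alt_fraction < min_alt_fraction

# Hypothetical thresholds: at least 5 alternative reads and a fraction of at least 0.2.
print(is_failed(3, 100, 5, 0.2))   # too few alternative reads
print(is_failed(10, 100, 5, 0.2))  # fraction 0.10 is below 0.2
print(is_failed(30, 100, 5, 0.2))  # passes both rules
```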
# Other parameters
When calculating coverage metrics for the gene coverage and the kit’s on-target coverage percentages, SEQ uses four different thresholds. 1X and 5X are the preset values. The other two values may be customized by the user per upload.
The default values of the advanced options are set under “Site Settings” in the Settings menu.
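As an illustration of threshold-based coverage metrics, the sketch below computes the percentage of positions covered at or above each threshold. The function name `coverage_percentages` and the two custom values (20X and 50X) are assumptions chosen for the example; only the 1X and 5X presets come from the text above.

```python
def coverage_percentages(depths, custom_thresholds=(20, 50)):
    """Percentage of positions with depth >= each of the four thresholds."""
    thresholds = (1, 5) + tuple(custom_thresholds)  # 1X and 5X are preset
    n = len(depths)
    return {
        f"{t}X": (100.0 * sum(d >= t for d in depths) / n if n else 0.0)
        for t in thresholds
    }

# Per-base depths over a small hypothetical target region.
print(coverage_percentages([0, 1, 5, 20, 50]))
```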
# Submit your data
As the last step, you can upload your data by clicking the “Continue” button to start the upload process. After clicking "Continue", you will see the "Case Information" screen. Please refer to the "Genomize's AI-Assisted Variant Prioritization" section for more information on entering the case information. You can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.
When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.
SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.
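If you want to verify a file locally before or after a transfer, a checksum can be computed in chunks so that large FASTQ or BAM files do not need to fit in memory. The sketch below uses SHA-256 as an assumption; the documentation does not state which checksum algorithm the platform uses, and the helper names are hypothetical.

```python
import hashlib

def file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's content in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk

def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

# Usage: sha256_of_chunks(file_chunks("sample_R1.fastq.gz"))
```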
When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the corresponding Run page to see the last status of the analysis.
# Small Variant Detection
SEQ has standard analysis versions pre-setup for every kit defined in the system. Data processing and variant calling are handled differently based on the sample type, sequencing platform, and selected analysis pipeline.
# Analysis versions for long-read WGS
Name | Explanation | Alignment/ variant calling | CMRG Support* | SV Calling | Phasing | Available Genome Versions | Available Platforms |
---|---|---|---|---|---|---|---|
Sentieon minimap2-DNAScope-SV Calling-Phasing-germline | Optimized for ONT long-read WGS samples | Sentieon minimap2 / DNAScope | No | Sentieon LongReadSV | VariantPhaser | hg38 | Oxford Nanopore |
PacBio pbmm2-DeepVariant-SV calling-Phasing-germline | Optimized for PacBio long-read WGS samples | PacBio pbmm2 / DeepVariant | Paraphase** | PBSV, HifiCNV, TRGT | HiPhase | hg38 | PacBio |
* CMRG: Challenging Medically Relevant Genes (Wagner et al., 2022)
** Paraphase: PacBio's recommended tool for detecting segmental duplication regions and medically relevant genes, such as SMN1/SMN2 and HBA1/HBA2 (Chen et al., 2024). See Targeted Variant Calling for more details.
# Analysis versions for short-read WGS
Name | Explanation | Alignment/ variant calling | BAM processing | SV Calling | Available Genome Versions | Available Platforms |
---|---|---|---|---|---|---|
Sentieon BWA-DNAScope-SV Calling-germline | Optimized for WGS samples prepared with a PCR enrichment step | Sentieon BWA / DNAScope | MarkDuplicate | Delly, Manta, Tiddit, ExpansionHunter | hg19, hg38 | Illumina, MGI |
Sentieon BWA (PCRfree)-DNAScope-SV Calling-germline | Optimized for WGS samples prepared without a PCR enrichment step | Sentieon BWA / DNAScope | MarkDuplicate | Delly, Manta, Tiddit, ExpansionHunter | hg19, hg38 | Illumina, MGI |
BWA-Freebayes-SV Calling-germline | Optimized for WGS samples | BWA / Freebayes | PCR Dedup + Indel Realignment | Delly, Manta, Tiddit, ExpansionHunter | hg19, hg38 | Illumina, MGI |
BWA-GATK-SV Calling-germline | Optimized for WGS samples | BWA / GATK | PCR Dedup + Indel Realignment | Delly, Manta, Tiddit, ExpansionHunter | hg19, hg38 | Illumina, MGI |
# Analysis versions for capture based targeted panels, including WES
Name | Explanation | Alignment/ variant calling** | BAM processing | CNV Calling (Cohort Mode) | Available Genome Versions | Available Platforms |
---|---|---|---|---|---|---|
Sentieon BWA-DNAScope- germline | Optimized for capture-based germline kits. | Sentieon BWA / DNAScope | MarkDuplicate | GATK-CNV + delly* | hg19, hg38 | Illumina, MGI |
BWA-Freebayes-PCR dedup - germline | Optimized for capture-based germline kits. | BWA / Freebayes | PCR Dedup | GATK-CNV + delly* | hg19, hg38 | Illumina, MGI |
BWA-Freebayes-PCR dedup-Indel Realignment - germline | Optimized for capture-based germline kits. Default analysis for most kits. | BWA / Freebayes | PCR Dedup + Indel Realignment | GATK-CNV + delly* | hg19, hg38 | Illumina, MGI |
BWA-GATK-PCR dedup-Indel Realignment - germline | Optimized for capture-based germline kits. Uses a GATK variant caller. | BWA / GATK | PCR Dedup + Indel Realignment | GATK-CNV + delly* | hg19, hg38 | Illumina, MGI |
BWA-Freebayes High Sensitivity-PCR dedup-Indel Realignment - germline | Optimized for capture-based germline kits to call variants with a low fraction (<20%). | BWA / Freebayes | PCR Dedup + Indel Realignment | GATK-CNV + delly* | hg19, hg38 | Illumina, MGI |
* Delly is utilized only for panels with more than 100 genes.
** Mitochondrial analysis is performed following GATK best practices, and gnomAD filters are applied (for details, see Mitochondrial Calling).
# Analysis versions for amplicon based panels
Name | Explanation | Alignment/ variant calling | BAM processing | Primer Trimming | Available Genome Versions | Available Platforms |
---|---|---|---|---|---|---|
BWA-Freebayes-BamKeser-Indel Realignment - germline | Optimized for amplicon-based germline kits. | BWA / Freebayes | Indel Realignment | BamKeser | hg19, hg38 | Illumina, MGI |
BWA-Freebayes-BamKeser - germline | Optimized for amplicon-based germline kits. Does not perform indel realignment step. | BWA / Freebayes | NA | BamKeser | hg19, hg38 | Illumina, MGI |
CVD Specific-BWA-Bowtie2-Freebayes-Bamkeser - germline | Optimized for amplicon-based CVD kits. | BWA + Bowtie2 / Freebayes | NA | BamKeser | hg19 | Illumina, MGI |
Thalassemia Specific-BWA-Freebayes-Bamkeser - germline | Optimized for amplicon-based thalassemia kits. | BWA / Freebayes | NA | BamKeser | hg19 | Illumina, MGI |
CAH Specific v2-BWA-Freebayes-Bamkeser - germline | Optimized for amplicon-based CAH kits. | BWA / Freebayes | NA | BamKeser | hg19 | Illumina, MGI |
BamKeser is our in-house tool designed for precise primer trimming.
# Analysis versions for IonTorrent uBAM samples
Name | Explanation | Alignment/ variant calling | BAM processing | Primer Trimming | Available Genome Versions | Available Platforms |
---|---|---|---|---|---|---|
Torrent Suite 5.8 - No Trimming - Default Parameters v2 - germline | Optimized for IonTorrent samples. Does not perform primer trimming. | Torrent Suite 5.8 | NA | NA | hg19 | IonTorrent |
Torrent Suite 5.8 - No Trimming - BRCA specific - germline | Optimized for IonTorrent BRCA samples. | Torrent Suite 5.8 | NA | NA | hg19 | IonTorrent |
Torrent Suite 5.8 - No Trimming - CFTR specific - germline | Optimized for IonTorrent CFTR samples. | Torrent Suite 5.8 | NA | NA | hg19 | IonTorrent |
After variant calling, Genomize-SEQ processes the resulting VCF file into a Genomize standard VCF file, which can be downloaded through the platform. Each line of the Genomize standard VCF has gstd=1 in the INFO field. Standardization of the VCF file includes the following important steps:
- Minimal variant representation: Some callers produce redundant bases at the left-hand or right-hand side of either alternative or reference allele. This redundancy has to be removed to obtain the correct annotation of variants in the subsequent steps.
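A minimal sketch of this trimming step, assuming 1-based VCF positions: shared trailing bases are removed first, then shared leading bases, with the position shifted for every leading base removed. The function name `minimal_representation` is hypothetical, and the sketch only trims; it does not left-align variants against the reference genome.

```python
def minimal_representation(pos, ref, alt):
    """Trim redundant bases shared by REF and ALT, keeping at least one base each."""
    # Remove shared bases on the right-hand side.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Remove shared bases on the left-hand side and advance the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

print(minimal_representation(1001, "CTCC", "CCC"))  # (1001, 'CT', 'C')
print(minimal_representation(100, "GCAT", "GCGT"))  # (102, 'A', 'G')
```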
# Mitochondrial Variant Detection
The mitochondrial variant calling pipeline follows GATK best practices, using the revised Cambridge Reference Sequence (NC_012920.1) as the reference mitochondrial genome and Mutect2 as the variant caller.
Filtering and genotype assignment for mitochondrial variants are performed in line with the recommendations from gnomAD. Briefly, SNVs with a Variant Allele Frequency (VAF) below 0.01 are removed. Variants with a VAF equal to or higher than 0.95 are classified as homoplasmic. Variants with a VAF lower than 0.95 are classified as heteroplasmic. Any variants in previously reported artifact-prone sites (positions 301, 302, 310, 316, 3107, and 16182) are ignored.
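The thresholds above translate into a small decision function. This is an illustrative sketch, not the pipeline's code; the function name `classify_mito_variant` and the `is_snv` flag are assumptions.

```python
ARTIFACT_PRONE_SITES = {301, 302, 310, 316, 3107, 16182}

def classify_mito_variant(pos, vaf, is_snv=True):
    """Return 'homoplasmic', 'heteroplasmic', or None if the variant is dropped."""
    if pos in ARTIFACT_PRONE_SITES:
        return None  # artifact-prone site, ignored
    if is_snv and vaf < 0.01:
        return None  # SNV below the minimum VAF, removed
    return "homoplasmic" if vaf >= 0.95 else "heteroplasmic"

print(classify_mito_variant(73, 0.99))    # homoplasmic
print(classify_mito_variant(73, 0.40))    # heteroplasmic
print(classify_mito_variant(310, 0.40))   # None
```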
In targeted panel analyses (including WES), mitochondrial variant detection is only available for panels with distinct chrM targets in their bed files.
In WGS analyses, mitochondrial variant detection uses the same variant callers and filtering steps as those applied to other chromosomal regions. If your WGS sample preparation is optimized for mitochondrial DNA isolation and enrichment, please contact support for further assistance.
# Structural Variant Detection
SEQ Platform can detect and report the structural variants listed below. For single samples, the maximum SV size is limited to 100,000 bp.
For cohort CNV analysis in WGS samples, please refer to the Copy Number Variations section.
# Supported Structural Variants
SV-Group | Abbreviation | Supporting callers |
---|---|---|
Deletion | DEL | Manta, Tiddit, Delly, PBSV, HifiCNV, Paraphase |
Duplication | DUP | Manta, Tiddit, Delly, PBSV, HifiCNV, Paraphase |
Insertion | INS | Manta, Delly, PBSV |
Inversion | INV | Tiddit, Delly, PBSV |
Breakend (Unresolved)* | BND | Manta, Tiddit, Delly, PBSV |
Short Tandem Repeat | STR | ExpansionHunter, TRGT |
Complex** | CPX | Manta, Tiddit, Delly |
* Structural variants that cannot be classified into any other type are listed as BND
** If more than one type of SV is detected in combination, they are classified as a CPX variant (e.g., DEL:INS, DUP:INV). Currently, only DEL:INS variants are supported.
# Variant callers used for SV detection
Tool | Algorithm | Supported SV-types |
---|---|---|
Manta | Manta divides the SV and indel discovery process into two primary steps: 1. Scanning the genome to find SV-associated regions. 2. Analysis, scoring, and output of SVs found in these regions. | - Deletions - Duplications - Deletion-Insertions - Insertions - Breakends |
Delly | DELLY analyzes short-range and long-range paired-end libraries for discordantly mapped read pairs. Paired-end predicted structural variants are then refined using split reads and reported at single-nucleotide breakpoint resolution. In addition to the general parameters applied to SVs, two DELLY-specific filters are used: insert size cutoff for split reads ≥ 15 bp and minimum paired-end MAPQ ≥ 20. | - Deletions - Duplications - Deletion-Insertions - Insertions - Inversions - Breakends |
Tiddit | TIDDIT detects structural variants by examining sequences for discordant pairs, split reads, and supplementary alignments, which must exceed a specified quality threshold. It uses a clustering method similar to DBSCAN, where a cluster forms if sufficient signals are within a designated distance. Clusters lacking enough signals are discarded; otherwise, they are included in the output regardless of other quality filters. | - Deletions - Duplications - Inversions - Breakends |
ExpansionHunter | ExpansionHunter is a tool designed for targeted genotyping of short tandem repeats (STRs) and flanking variants. It analyzes BAM files to find reads that span, flank, or are fully contained within each targeted repeat, and uses them to identify and quantify repeat variations. | Short Tandem Repeats |
PBSV | PBSV is a suite of tools to call and analyze structural variants in diploid genomes from PacBio single molecule real-time sequencing (SMRT) reads. | - Deletions - Duplications - Insertions - Inversions - Breakends |
HifiCNV | HifiCNV is a tool for calling copy number variants (CNVs) from high-fidelity (HiFi) sequencing reads. It offers optimized segmentation and calling for germline whole genome sequencing (WGS) using HiFi reads, and it automatically estimates and corrects GC bias. | - Deletions - Duplications |
TRGT | TRGT is a tool for targeted genotyping of tandem repeats from PacBio HiFi data. In addition to basic size genotyping, TRGT profiles the sequence composition, mosaicism, and CpG methylation of each analyzed repeat and provides visualization of the reads overlapping the repeats. | Short Tandem Repeats |
Paraphase | Paraphase is a Python tool that takes HiFi aligned BAMs as input (whole-genome or enrichment), phases haplotypes for genes of the same family, determines copy numbers, and makes phased variant calls. Paraphase supports 160 segmental duplication regions. | - Deletions - Duplications |
# Filtering Parameters Applied to SVs
Allele Fraction (AF) Filter: SVs with fractions lower than 0.2 are filtered out.
Pass Filter: SVs without the “PASS” flag assigned by their respective callers are filtered out.
Depth of Coverage (DP) Filter: SVs with fewer than 10 supporting reads are filtered out.
No Call Filter: SVs that have a 'no call' status in tandem repeat VCFs are filtered out, ensuring that only fully determined genotypes are analyzed.
The Same Gene and Same Oriented Breakpoint (BND) Filter: Structural variants that involve the same gene and are oriented in the same direction are filtered out to reduce complexity and focus on more relevant genomic rearrangements.
Chromosome Filter: SVs that are not on chromosomes 1-22 or X are filtered out.
Genomic Region Filter: Variants overlapping with predefined blacklisted regions (Amemiya et al., 2019) are filtered out. The complete list can be accessed here.
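Four of the filters above (allele fraction, PASS flag, supporting reads, and chromosome) can be sketched as a single predicate. The `SVCall` record and its field names are hypothetical and do not reflect SEQ's internal data model; the breakpoint, no-call, and blacklist filters are left out because they need gene annotation and region data.

```python
from dataclasses import dataclass

ALLOWED_CHROMS = {str(i) for i in range(1, 23)} | {"X"}

@dataclass
class SVCall:
    chrom: str              # e.g. "chr7" or "7"
    allele_fraction: float  # fraction of reads supporting the SV
    filter_flag: str        # FILTER value assigned by the caller
    supporting_reads: int   # number of reads supporting the SV

def passes_sv_filters(sv):
    """Apply the AF, PASS, DP, and chromosome filters to one SV call."""
    chrom = sv.chrom[3:] if sv.chrom.startswith("chr") else sv.chrom
    return (
        sv.allele_fraction >= 0.2
        and sv.filter_flag == "PASS"
        and sv.supporting_reads >= 10
        and chrom in ALLOWED_CHROMS
    )

calls = [
    SVCall("chr7", 0.45, "PASS", 24),
    SVCall("chrY", 0.50, "PASS", 30),  # removed by the chromosome filter
    SVCall("chr2", 0.10, "PASS", 12),  # removed by the allele fraction filter
]
print([passes_sv_filters(c) for c in calls])  # [True, False, False]
```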
# Special Note on Repeat Finding
The repeat catalog focuses exclusively on tandem repeat regions known to cause diseases. We employ gnomAD's algorithm for detecting repeat unit motifs and then use ExpansionHunter on these de novo tandem repeat units to identify repeat sequences.
Occasionally, short-read sequencing technology falls short in accurately genotyping tandem repeats. In particular, tools like ExpansionHunter are not designed to genotype multiallelic repeats where different motifs might vary from each other. As a solution, we run ExpansionHunter for different motifs at the same locus to provide information for all repeat units present in gnomAD. This approach helps capture the variability and complexity of tandem repeats in genomic studies.
# Targeted Variant Calling
# Targeted Variant callers used for Challenging Medically Relevant Genes
Tool | Supported Region Names (Genes) | Variant Types** |
---|---|---|
Paraphase* | SMN1 (SMN1, SMN2) HBA (HBA1, HBA2) PMS2 (PMS2) RCCX (CYP21A2, C4A, C4B, TNXB) STRC (STRC) NCF1 (NCF1) IKBKG (IKBKG) OPN1LW (OPN1LW, OPN1MW, OPN1MW2, OPN1MW3, TEX28) | - Copy Number Variant - Small Variant (Coming Soon) - Structural Variant (Coming Soon) |
* Paraphase: More regions will be added soon.
# gVCF Upload
# Select a Previous Run or Create a New Run
You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.
# Choose the Technology Type
Choose the next-generation sequencing machine associated with the samples. If you do not know which sequencing platform is used, you can select the "Unknown" option. Mixing different technologies in one run is not permitted.
# Select the VCF Files to be Uploaded
You can upload one small-variant gVCF file (for SNVs) per sample. Additionally, you may upload VCF files for CNVs, SVs, and STRs generated by DRAGEN or similar tools for the same sample, in any combination. Files are matched based on the sample information field, not the file names. Multisample VCFs are not supported.
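Because files are matched by sample information rather than by file name, it can help to check the sample name in each file before uploading. The sketch below assumes that the "sample information field" corresponds to the sample column in the `#CHROM` header line of the VCF; the function names are hypothetical.

```python
import gzip

def sample_names_from_header(lines):
    """Return the sample column names from the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            return columns[9:]  # the nine fixed columns end with FORMAT
        if not line.startswith("#"):
            break  # reached the records without finding the header line
    raise ValueError("no #CHROM header line found")

def single_sample_name(path):
    """Read a .vcf.gz file and return its only sample name."""
    with gzip.open(path, "rt") as handle:
        names = sample_names_from_header(handle)
    if len(names) != 1:
        raise ValueError(f"expected one sample, found {len(names)}: {names}")
    return names[0]
```

Files for the same sample (small variants, CNVs, SVs, STRs) should then return the same name from `single_sample_name`.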
# Supported Variant Callers
Upload Type | Variant Types | Supported Callers | File Format (Extension) | Multisample Support |
---|---|---|---|---|
Small Variant | SNV INDEL | Dragen (v4.1, v4.2, v4.3, v4.4) | gVCF (.vcf.gz) (.g.vcf.gz) (.genome.vcf.gz) | No |
Copy Number Variant | CNV | Dragen-CNV HifiCNV | VCF (.vcf.gz) | No |
Structural Variant | DEL (Deletion) DUP (Duplication) INV (Inversion) INS (Insertion) BND (Breakends)* CPX (Complex)** | Delly Dragen-SV Dragen-Targeted Callers (Coming Soon) Manta PBSV Sentieon Long Read SV Sniffles2 TIDDIT | VCF (.vcf.gz) | No |
Short Tandem Repeat | STR*** | ExpansionHunter Dragen-CNV TRGT | VCF (.vcf.gz) | No |
* BND (Breakends): 5'–3' fusion transcript events are supported, with potential formation of chimeric transcripts.
** CPX (Complex): Combination of structural variant types. Currently only DEL:INS variants are supported.
*** STR (Short Tandem Repeat): Only disease-associated STR variants in the gnomAD STR catalog are supported. For details, see the gnomAD STR catalog.
# Unsupported Variant Callers and VCF Version Compatibility
Custom integration may be required to fully support unlisted or unsupported variant callers. Please contact support for assistance. Only VCF version 4.1 or newer is supported for Copy Number, Structural, and Short Tandem Repeat (STR) VCF files.
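The VCF version is declared on the first line of the file (`##fileformat=VCFv4.x`), so it can be checked before upload. A minimal sketch with hypothetical function names:

```python
import re

FILEFORMAT_RE = re.compile(r"^##fileformat=VCFv(\d+)\.(\d+)\s*$")

def vcf_version(first_line):
    """Parse the ##fileformat line into a (major, minor) tuple."""
    m = FILEFORMAT_RE.match(first_line)
    if m is None:
        raise ValueError(f"not a VCF fileformat line: {first_line!r}")
    return int(m.group(1)), int(m.group(2))

def is_supported_version(first_line, minimum=(4, 1)):
    """True if the declared VCF version is 4.1 or newer."""
    return vcf_version(first_line) >= minimum

print(is_supported_version("##fileformat=VCFv4.2\n"))  # True
print(is_supported_version("##fileformat=VCFv4.0\n"))  # False
```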
# Targeted Caller Variants
Small variants in gVCF files generated by Dragen targeted callers are supported. If the gVCF file already includes these variants, there is no need for a separate VCF file. However, if the variants are not included, they should be merged into the gVCF file prior to uploading. For assistance, please contact support.
Structural variants generated by Dragen targeted callers can be uploaded as an additional JSON file. VCF files generated by Dragen targeted callers are currently not supported.
Upload Type | Variant Types | Supported Callers | File Format (Extension) | Multisample Support |
---|---|---|---|---|
Structural Variant | DEL (Deletion) DUP (Duplication) | Dragen (Coming Soon) | VCF (.targeted.vcf.gz) | No |
Targeted Caller Variant | DEL (Deletion) DUP (Duplication) | Dragen | JSON (.targeted.json) | No |
# Mitochondrial Variants
The Revised Cambridge Reference Sequence (rCRS, NC_012920.1), the recommended sequence for clinical use (McCormick et al., 2020), is used as the reference for the mitochondrial genome regardless of the genome version used in VCF generation. If the VCF file contains variants called using the older Yoruban (YRI) mitochondrial reference genome, errors may result due to incompatibility with our annotation sources. Unsupported chrM variants should also be removed before upload to prevent genome compatibility issues. For assistance, or if issues arise, please contact support.
# Submit your data
As the last step, you can upload your data by clicking the “Continue” button to start the upload process. After clicking "Continue", you will see the "Case Information" screen. Please refer to the "Genomize's AI-Assisted Variant Prioritization" section for more information on entering the case information. You can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.
When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.
SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.
When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the corresponding Run page to see the last status of the analysis.
# Filtering Parameters Applied to Small Variants
Pass Filter: Small variants without the “PASS” flag assigned by their respective callers are filtered out.
No Call Filter: Small variants with a 'no call' status are excluded, ensuring that only fully determined genotypes are included in the analysis.
Chromosome Filter: Small variants that are not on chromosomes 1-22, X, or M are filtered out.
# VCF Upload
# Select a Previous Run or Create a New Run
You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.
# Choose the Technology Type
Choose the next-generation sequencing machine associated with the samples. If you do not know which sequencing platform is used, you can select the "Unknown" option. Mixing different technologies in one run is not permitted.
# Select the VCF Files to be Uploaded
You can upload VCF files for SNV, CNV, SV, and STR from DRAGEN or other similar tools for the same sample in any combination you choose. VCF files should have the .vcf.gz extension. Files are matched using the sample info field, not the file names. Multisample VCFs are not supported.
# Supported Variant Callers
Upload Type | Variant Types | Supported Callers | File Format (Extension) | Multisample Support |
---|---|---|---|---|
Small Variant | SNV INDEL | DeepVariant Dragen Freebayes GATK - Haplotype Caller Ion Torrent Variant Caller Isaac Variant Caller Mutect2 Pivat Sentieon DNAscope, TNScope VarDict | VCF (.vcf.gz) | No |
Copy Number Variant | CNV | Dragen HifiCNV | VCF (.vcf.gz) | No |
Structural Variant | DEL (Deletion) DUP (Duplication) INV (Inversion) INS (Insertion) BND (Breakends)* CPX (Complex)** | Delly Dragen Manta PBSV Sentieon Long Read SV Sniffles2 TIDDIT | VCF (.vcf.gz) | No |
Short Tandem Repeat | STR*** | ExpansionHunter DRAGEN TRGT | VCF (.vcf.gz) | No |
* BND (Breakends): 5'–3' fusion transcript events are supported, with potential formation of chimeric transcripts.
** CPX (Complex): Combination of structural variant types. Currently only DEL:INS variants are supported.
*** STR (Short Tandem Repeat): Only disease-associated STR variants in the gnomAD STR catalog are supported. For details, see the gnomAD STR catalog.
# Unsupported Variant Callers and VCF Version Compatibility
Please note that using unlisted or unsupported variant callers may result in inaccurate VCF metrics. Unrecognized callers are categorized as "other," which may limit the capture of certain metrics. In some cases, custom integration may be required to support these callers fully. Only VCF version 4.1 or newer is supported for Copy Number, Structural, and Short Tandem Repeat (STR) VCF files. If issues persist or critical data appears missing, please contact support for assistance.
# Mitochondrial Variants
The Revised Cambridge Reference Sequence (rCRS, NC_012920.1), the recommended sequence for clinical use (McCormick et al., 2020), is used as the reference for the mitochondrial genome regardless of the genome version used in VCF generation. If the VCF file contains variants called using the older Yoruban (YRI) mitochondrial reference genome, errors may result due to incompatibility with our annotation sources. Unsupported chrM variants should also be removed before upload to prevent genome compatibility issues. For assistance, or if issues arise, please contact support.
# Submit your data
As the last step, you can upload your data by clicking the “Continue” button to start the upload process. After clicking "Continue", you will see the "Case Information" screen. Please refer to the "Genomize's AI-Assisted Variant Prioritization" section for more information on entering the case information. You can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.
When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.
SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.
When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the corresponding Run page to see the last status of the analysis.
# Cloud Browser
To use the cloud browser, select or create your run, select the sequencing platform and the kit by following the directions above. After the kit selection, you will see the option to select either your “COMPUTER” or the “CLOUD BROWSER” as the data source.
When you select the "CLOUD BROWSER" option, click the PLUS (➕) button to open the cloud browser interface. Using the cloud browser, you can choose the files with which you want to start the analysis and click “DONE”. The rest of the process is the same as described above. Please note that there is a 3-minute wait between consecutive cloud upload processes, and files are removed from your cloud account once the analysis starts.