Data Processing

Main Workflow Controller

File: main.nf

The main workflow handles channel processing and parallel execution. It automatically detects the input data type (Illumina or Nanopore), performs trimming and host removal, determines the serotype, and then routes the data to the matching alignment and consensus generation workflow.

Pre-processing & Host Removal

Files: trimming.nf, host_removal.nf

  1. Trimming:

    • Illumina: Trimmomatic (Quality trimming, adapter removal).

    • Nanopore: chopper (Quality and length filtering).

  2. Host Removal:

    • Tool: hostile.

    • Process: Aligns reads against a human host reference with bowtie2 (Illumina) or minimap2 (Nanopore) and discards the reads that map to the host.

Serotyping

File: serotyping.nf

Determines the Dengue serotype to select the appropriate reference for alignment.

  1. Tool: minimap2.

  2. Process: Maps a subset of reads against an index of Dengue reference sequences (DENV-1, 2, 3, 4, and Sylvatic strains).
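
The serotype call can be pictured as finding the reference that attracts the most reads. The following Python sketch is not taken from serotyping.nf. It assumes that minimap2 wrote PAF output and that each reference name begins with a serotype label followed by "|" (for example DENV2|accession). The naming scheme, the function name call_serotype, and the min_mapq threshold are all hypothetical.

```python
from collections import Counter


def call_serotype(paf_lines, min_mapq=20):
    """Return (serotype, read_count) for the reference with the most mapped reads.

    Assumption: target names look like 'DENV2|accession' (hypothetical scheme).
    """
    best_hit = {}  # read name -> (mapq, target); keeps one alignment per read
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated lines
        # PAF columns: 1 = query name, 6 = target name, 12 = mapping quality
        query, target, mapq = fields[0], fields[5], int(fields[11])
        if mapq < min_mapq:
            continue  # ignore ambiguous alignments
        if query not in best_hit or mapq > best_hit[query][0]:
            best_hit[query] = (mapq, target)

    counts = Counter(target.split("|")[0] for _, target in best_hit.values())
    if not counts:
        return None, 0
    return counts.most_common(1)[0]
```

Counting each read once, by its best alignment, keeps a read that maps to several serotype references from being counted more than once.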

Nanopore (Long-Read) Workflow

File: nanopore.nf

Processes Oxford Nanopore Technologies (ONT) sequencing data.

  1. QC: FastQC.

  2. Reference Selection: Selects the specific reference genome based on the determined serotype.

  3. Alignment:

    • Tool: minimap2.

  4. Consensus Generation:

    • Tool: bcftools.

Illumina (Short-Read) Workflow

File: illumina.nf

Processes Illumina paired-end sequencing data.

  1. QC: FastQC.

  2. Reference Selection: Selects the specific reference genome based on the determined serotype.

  3. Alignment:

    • Tool: bwa-mem2.

  4. Consensus Generation:

    • Tool: bcftools.

Genotyping & Variant Analysis

File: genotyping.nf

Performs detailed characterization of the consensus sequence.

  1. Tool: Nextclade.

  2. Process:

    • Assigns the genotype and lineage (major/minor) based on the Nextclade Dengue dataset.

    • Identifies amino acid mutations.
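
As an illustration of how the Nextclade results might be consumed downstream, the sketch below reads a Nextclade TSV and collects the assignment and the amino acid substitutions per sequence. It is not the code in genotyping.nf. It relies on the seqName, clade, and aaSubstitutions columns of the standard Nextclade TSV. Which column carries the Dengue genotype or lineage depends on the dataset, so treating clade as the assignment is an assumption, and summarize_nextclade is a hypothetical helper name.

```python
import csv
import io


def summarize_nextclade(tsv_text):
    """Yield one summary dict per sequence found in Nextclade TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # aaSubstitutions is a comma-separated list; it is empty when
        # no substitutions were detected.
        raw = row.get("aaSubstitutions") or ""
        yield {
            "sequence": row.get("seqName", ""),
            "clade": row.get("clade", ""),  # assumed to hold the assignment
            "aa_mutations": [m for m in raw.split(",") if m],
        }
```

A caller would pass the contents of the TSV file, for example list(summarize_nextclade(open(path).read())), where path points to the Nextclade output.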

Reporting

File: report.nf

  1. MultiQC: Aggregates FastQC and trimming logs.

  2. Dengue Report: A text report summarizing Serotype, Genotype, Lineage, Coverage, and Mutations.

FHIR Converter

File: fhir.nf

Converts the genomic analysis results into HL7 FHIR R4 resources.

  1. Input Parsing: Reads the consensus sequence statistics, the serotyping result, and the Nextclade output.

  2. Resource Creation:

    • Observation: For Dengue Classification (Serotype, Genotype, Lineage).

    • Observation: For Viral Consensus Genome Sequence (Sequence string, length, coverage).

    • Observation: For each detected Genetic Variant.

    • DiagnosticReport: Summarizes the overall Dengue analysis and references the Observations above.
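
To make the resource layout concrete, here is a minimal Python sketch of a classification Observation and a DiagnosticReport that points to it. It is an illustration under assumptions, not the converter in fhir.nf. The function names, the resource ids, and the free-text code values are hypothetical, and a real converter would probably use coded terminologies instead of plain text.

```python
import json


def classification_observation(obs_id, serotype, genotype, lineage, patient_ref=None):
    """Build a minimal FHIR R4 Observation for the Dengue classification."""
    observation = {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        # Free-text code (assumption); no specific terminology is implied.
        "code": {"text": "Dengue virus classification"},
    }
    components = [
        {"code": {"text": name}, "valueString": value}
        for name, value in (
            ("Serotype", serotype),
            ("Genotype", genotype),
            ("Lineage", lineage),
        )
        if value  # leave out fields that were not determined
    ]
    if components:  # FHIR does not allow empty arrays
        observation["component"] = components
    if patient_ref:
        observation["subject"] = {"reference": patient_ref}
    return observation


def diagnostic_report(report_id, observations):
    """Build a DiagnosticReport whose results reference the given Observations."""
    return {
        "resourceType": "DiagnosticReport",
        "id": report_id,
        "status": "final",
        "code": {"text": "Dengue genomic analysis"},
        "result": [{"reference": f"Observation/{o['id']}"} for o in observations],
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    obs = classification_observation(
        "dengue-classification", "DENV-2", "placeholder-genotype", "placeholder-lineage"
    )
    print(json.dumps(diagnostic_report("dengue-report", [obs]), indent=2))
```

The sequence and variant Observations would follow the same pattern, each with its own id added to the report's result list.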

Upload to FHIR Server

File: upload_fhir.nf

Uploads the FHIR Genomics bundle together with the clinical metadata. Before running this step, obtain a bearer token with scripts/get_access_token.py and fill in the clinical metadata in each metadata CSV (patient, organization, and practitioner).
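
For orientation, the sketch below shows one way a bundle could be sent with a bearer token using only the Python standard library. It is not the contents of upload_fhir.nf or scripts/get_access_token.py. The function name build_upload_request is hypothetical, and posting the bundle to the server base URL assumes a transaction or batch bundle.

```python
import json
import urllib.request


def build_upload_request(base_url, bundle, token):
    """Prepare (but do not send) a POST of a FHIR bundle to the server base URL."""
    body = json.dumps(bundle).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/"),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/fhir+json",
            # Token as obtained beforehand, e.g. via scripts/get_access_token.py
            "Authorization": f"Bearer {token}",
        },
    )
```

The prepared request can then be sent with urllib.request.urlopen(request). Building the request separately from sending it makes the headers and payload easy to inspect before anything reaches the server.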

Workflow Parameters

nextflow.config defines all input files, directories, version information, and tool-specific parameters. Paths are given relative to the base directory ($baseDir).