MAGs Generation Workflow with Illumina short reads for Pbac version 2

Published on: 2025-09-03 | By: EDITOR


1. MAGs Generation Workflow

2. Isolated Bacterial Genome Assembly Workflow

3. Taxonomic Relative Abundance Estimation via Kraken2 and Bracken

4. Enterotyping Analysis Pipeline


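(1) Data Preprocessing (Trimmomatic and Bowtie2)
Trim adapters (ILLUMINACLIP with TruSeq3-PE.fa) and clip leading and trailing bases below Phred 20. Cut each read once the mean quality in a 4-base sliding window drops below Phred 25, and discard reads shorter than 100 bp. Then remove host contamination by aligning the trimmed read pairs to the host genome with Bowtie2.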

java -jar ~/trimmomatic-0.39-2/trimmomatic.jar PE -threads 64 ${id}_1.fq ${id}_2.fq ${id}_forward_paired.fq.gz ${id}_forward_unpaired.fq.gz ${id}_reverse_paired.fq.gz ${id}_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:100
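A minimal sketch of the Bowtie2 host-removal step, assuming a Bowtie2 index of the host genome already exists. `HOST_INDEX`, its default path and both function names are hypothetical placeholders, not part of the original workflow. Pairs that do not align concordantly to the host are written with `--un-conc`; Bowtie2 replaces the `%` in that path with 1 and 2, which yields the `${id}_paired_1.fastq` / `${id}_paired_2.fastq` names used as SPAdes input.

```shell
# Hypothetical host-removal helpers (not from the original workflow).
# HOST_INDEX is a placeholder: the basename of a Bowtie2 index built from the
# host reference genome, e.g. with `bowtie2-build host.fa host`.
HOST_INDEX=${HOST_INDEX:-$HOME/host_index/host}

# remove_host ID [THREADS]
# Aligns the Trimmomatic paired output of sample ID to the host index and keeps
# only the read pairs that do NOT align concordantly to the host.
remove_host() {
  local id threads
  id=$1
  threads=${2:-64}
  bowtie2 -p "$threads" -x "$HOST_INDEX" \
    -1 "${id}_forward_paired.fq.gz" -2 "${id}_reverse_paired.fq.gz" \
    --un-conc "${id}_paired_%.fastq" \
    -S /dev/null   # the host alignments themselves are discarded
}

# list_sample_ids DIR
# Prints every sample ID that has both ${id}_1.fq and ${id}_2.fq in DIR.
list_sample_ids() {
  local r1 id
  for r1 in "$1"/*_1.fq; do
    [ -e "$r1" ] || continue            # no match: the glob stays literal
    id=$(basename "$r1" _1.fq)
    if [ -e "$1/${id}_2.fq" ]; then
      printf '%s\n' "$id"
    fi
  done
}
```

For example, after running Trimmomatic on every sample: `for id in $(list_sample_ids .); do remove_host "$id" 64; done`.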

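(2) Single-sample Assembly (SPAdes) for Illumina short reads
Assemble each sample de novo with SPAdes in metagenomic mode (--meta, i.e. metaSPAdes), using k-mer sizes 33, 55 and 77 and a 256 GB memory limit (--memory is given in GB). The inputs ${id}_paired_1.fastq and ${id}_paired_2.fastq are the cleaned read pairs from step (1).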

spades.py --meta -1 ${id}_paired_1.fastq -2 ${id}_paired_2.fastq --threads 128 --memory 256 -k 33,55,77 -o ${id}_spades
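A quick sanity check after assembly is to count contigs, sum their length and compute N50 (the contig length at which the contigs, sorted longest first, cover at least half of the total assembly length). The sketch assumes SPAdes' usual output file `${id}_spades/contigs.fasta`; the function name is hypothetical.

```shell
# assembly_stats FASTA   (hypothetical helper)
# Prints "<contigs> <total_bp> <N50>" for a multi-line FASTA file.
assembly_stats() {
  # 1) emit one length per sequence, 2) sort longest first, 3) walk to N50
  awk '
    /^>/ { if (n) print len; n = 1; len = 0; next }
    { len += length($0) }
    END { if (n) print len }
  ' "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      cum = 0; n50 = 0
      for (i = 1; i <= NR; i++) {
        cum += lens[i]
        if (cum * 2 >= total) { n50 = lens[i]; break }
      }
      printf "%d %d %d\n", NR, total, n50
    }'
}
```

For example: `assembly_stats ${id}_spades/contigs.fasta`.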

(3) Quality Assessment (CheckM2)

checkm2 predict --threads 64 -x fasta --database_path ~/CheckM2_database/uniref100.KO.1.dmnd --remove_intermediates --input ./Genomes/ --output-directory ./Genomes_checkM2
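Step (5) passes a `--genomeInfo` CSV to dRep. dRep expects the columns `genome` (the genome's file name, including extension), `completeness` and `contamination`, whereas CheckM2's `quality_report.tsv` uses `Name` (file name without extension), `Completeness` and `Contamination`. A minimal conversion sketch, assuming that CheckM2 column layout and `.fasta` genome files; the function name is hypothetical.

```shell
# checkm2_to_drep_info REPORT_TSV [EXT]   (hypothetical helper)
# Assumes the CheckM2 header contains Name, Completeness and Contamination.
# Writes a dRep --genomeInfo CSV (genome,completeness,contamination) to stdout.
checkm2_to_drep_info() {
  awk -F'\t' -v ext="${2:-fasta}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Name" in col) || !("Completeness" in col) || !("Contamination" in col)) {
        print "unexpected CheckM2 header" > "/dev/stderr"
        exit 1
      }
      print "genome,completeness,contamination"
      next
    }
    # CheckM2 drops the file extension, dRep needs it back
    { printf "%s.%s,%s,%s\n", $(col["Name"]), ext, $(col["Completeness"]), $(col["Contamination"]) }
  ' "$1"
}
```

For example: `checkm2_to_drep_info ./Genomes_checkM2/quality_report.tsv fasta > Pbac_v2_Quality.csv`.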

(4) Taxonomic Classification (GTDB-Tk)

gtdbtk classify_wf --genome_dir ./Genomes --out_dir ./Genomes_GTDBtk_results2 --extension fasta --cpus 64 --skip_ani_screen
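GTDB-Tk writes one summary table per domain, typically `gtdbtk.bac120.summary.tsv` (plus `gtdbtk.ar53.summary.tsv` when archaeal genomes are present) in the output directory, and its `classification` column holds the rank string `d__...;p__...;c__...`. The sketch below tallies genomes per phylum, counting genomes with no `p__` rank as `Unclassified`; it assumes that column name, and the function name is hypothetical.

```shell
# phylum_counts SUMMARY_TSV...   (hypothetical helper)
# Assumes a GTDB-Tk summary table with a "classification" column.
# Prints "<genomes><TAB><phylum>", most frequent phylum first.
phylum_counts() {
  local tab
  tab=$(printf '\t')
  awk -F'\t' '
    FNR == 1 {                       # header of each input file
      c = 0
      for (i = 1; i <= NF; i++) if ($i == "classification") c = i
      next
    }
    c > 0 {
      phylum = "Unclassified"
      n = split($c, ranks, ";")
      for (i = 1; i <= n; i++)
        if (ranks[i] ~ /^p__./) { phylum = substr(ranks[i], 4); break }
      count[phylum]++
    }
    END { for (p in count) printf "%d\t%s\n", count[p], p }
  ' "$@" | sort -t "$tab" -k1,1nr -k2,2
}
```

For example: `phylum_counts ./Genomes_GTDBtk_results2/gtdbtk.bac120.summary.tsv`.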

(5) Genome Dereplication (dRep)

dRep dereplicate --completeness 50 --contamination 10 -sa 0.95 --SkipMash -p 64 ./Pbac_v2_cluster_95 -g ./All_Genomes_MAGs/*fasta --genomeInfo Pbac_v2_Quality.csv
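dRep records cluster membership in `data_tables/Cdb.csv` inside its work directory. The `secondary_cluster` column gives each genome's 95% ANI cluster, so the number of distinct values equals the number of representatives written to `dereplicated_genomes/`. A small sketch, assuming genome names contain no commas; the function name is hypothetical.

```shell
# cluster_sizes CDB_CSV   (hypothetical helper)
# Assumes dRep's Cdb.csv has a "secondary_cluster" header column.
# Prints "<members><TAB><secondary_cluster>", largest cluster first.
cluster_sizes() {
  local tab
  tab=$(printf '\t')
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "secondary_cluster") c = i; next }
    c > 0 { size[$c]++ }
    END { for (k in size) printf "%d\t%s\n", size[k], k }
  ' "$1" | sort -t "$tab" -k1,1nr -k2,2
}
```

For example, `cluster_sizes ./Pbac_v2_cluster_95/data_tables/Cdb.csv | wc -l` reports the number of species-level clusters.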

(6) rRNA Prediction (Barrnap)

barrnap -q -k bac ${genome}.fasta > ${genome}.rRNA.gff
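Barrnap prints GFF3 to standard output, with one `rRNA` feature per hit and the gene name in the `Name=` attribute (e.g. `Name=16S_rRNA`). Assuming that output has been saved to a file such as `${genome}.rRNA.gff` (a hypothetical name), the sketch below counts 5S, 16S and 23S hits. This is useful for the MIMAG high-quality criterion (23S, 16S and 5S rRNA genes present, plus at least 18 tRNAs). Partial hits are included in the counts, and the function name is hypothetical.

```shell
# rrna_counts GFF   (hypothetical helper)
# Assumes Barrnap GFF3: feature type "rRNA" in column 3, Name=<X>_rRNA in column 9.
# Prints "5S=<n> 16S=<n> 23S=<n>" (partial hits included).
rrna_counts() {
  awk -F'\t' '
    $0 !~ /^#/ && $3 == "rRNA" {
      if (match($9, /Name=[^;]+/)) {
        name = substr($9, RSTART + 5, RLENGTH - 5)   # e.g. 16S_rRNA
        sub(/_rRNA$/, "", name)                      # -> 16S
        n[name]++
      }
    }
    END { printf "5S=%d 16S=%d 23S=%d\n", n["5S"], n["16S"], n["23S"] }
  ' "$1"
}
```

For example: `rrna_counts ${genome}.rRNA.gff`.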

(7) tRNA Annotation (tRNAscan-SE)

tRNAscan-SE -B -o ${genome}_tRNA_result.txt -m ${genome}_tRNA_statistic.txt ${genome}.fasta
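For the same MIMAG check, the table written with `-o` can be reduced to the number of distinct standard amino-acid tRNA isotypes. The sketch assumes the usual tab-separated layout (header lines, then one row per tRNA with the type in column 5). It skips rows flagged as pseudogenes and ignores types such as `Undet`. Folding `Ile2` into Ile and `fMet`/`iMet` into Met is an assumption made for illustration, and the function name is hypothetical.

```shell
# trna_isotypes TRNA_TXT   (hypothetical helper)
# Assumes tRNAscan-SE tabular output: Begin in column 3, tRNA type in column 5.
# Prints how many of the 20 standard amino acids have at least one tRNA.
trna_isotypes() {
  awk -F'\t' '
    BEGIN {
      split("Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val", aa, " ")
      for (i in aa) std[aa[i]] = 1
    }
    {
      begin = $3; gsub(/ /, "", begin)
      if (begin !~ /^[0-9]+$/) next          # skip the header lines
      if (tolower($0) ~ /pseudo/) next       # skip predicted pseudogenes
      t = $5; gsub(/ /, "", t)
      sub(/[0-9]+$/, "", t)                  # assumption: Ile2 -> Ile
      if (t == "fMet" || t == "iMet") t = "Met"
      if (t in std) seen[t] = 1
    }
    END { n = 0; for (t in seen) n++; print n }
  ' "$1"
}
```

For example: `trna_isotypes ${genome}_tRNA_result.txt`.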

(8) Gene Prediction (Prodigal)

prodigal -m -p meta -i ${genome}.fasta -a ${genome}.protein.fa -d ${genome}.nucl.fa -f gff -o ${genome}.gff
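Prodigal's protein FASTA headers carry a `partial=` flag: `00` means the gene has a true start and stop, while `10`, `01` or `11` mean it runs off a contig edge or into a gap. Counting headers gives a quick per-genome gene total and the number of complete genes; the function name is hypothetical.

```shell
# prodigal_gene_summary PROTEIN_FAA   (hypothetical helper)
# Assumes Prodigal-style headers containing "partial=NN".
# Prints "genes=<n> complete=<n>", where complete means partial=00.
prodigal_gene_summary() {
  awk '
    /^>/ {
      total++
      if ($0 ~ /partial=00/) complete++
    }
    END { printf "genes=%d complete=%d\n", total, complete }
  ' "$1"
}
```

For example: `prodigal_gene_summary ${genome}.protein.fa`.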

(9) Gene Annotation (eggNOG-mapper)

emapper.py --cpu 64 --itype CDS -m diamond --data_dir ~/eggNOG_database -i ${genome}.nucl.fa -o ${genome}.nucl.eggnog
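With `-o ${genome}.nucl.eggnog`, eggNOG-mapper writes its main table to `${genome}.nucl.eggnog.emapper.annotations`. Comment lines start with `##`, the column header starts with `#query`, and `COG_category` may hold several one-letter categories (e.g. `EG`) or `-` when none is assigned. The sketch counts each letter separately. It assumes the eggNOG-mapper 2.x column names, and the function name is hypothetical.

```shell
# cog_category_counts ANNOTATIONS   (hypothetical helper)
# Assumes an eggNOG-mapper 2.x .emapper.annotations file with a COG_category column.
# Prints "<genes><TAB><COG letter>", most frequent category first.
cog_category_counts() {
  local tab
  tab=$(printf '\t')
  awk -F'\t' '
    /^#query/ { for (i = 1; i <= NF; i++) if ($i == "COG_category") c = i; next }
    /^#/ || !c { next }                      # metadata lines, or header not seen yet
    {
      cats = $c
      if (cats == "-" || cats == "") next    # no COG category assigned
      for (i = 1; i <= length(cats); i++) count[substr(cats, i, 1)]++
    }
    END { for (k in count) printf "%d\t%s\n", count[k], k }
  ' "$1" | sort -t "$tab" -k1,1nr -k2,2
}
```

For example: `cog_category_counts ${genome}.nucl.eggnog.emapper.annotations`.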