Isolated Bacterial Genome Assembly and Quality Assessment Pipeline

Published on: 2025-09-03 | By: EDITOR




  1. Data Pre-processing with Trimmomatic
java -jar ~/trimmomatic-0.39-2/trimmomatic.jar PE -threads 64 ${id}_1.fq ${id}_2.fq ${id}_forward_paired.fq.gz ${id}_forward_unpaired.fq.gz ${id}_reverse_paired.fq.gz ${id}_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:100

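This step uses Trimmomatic to clean the paired-end reads: it removes TruSeq3 adapter sequences, trims bases below Q20 from both read ends, cuts a read where the average quality in a 4-base sliding window drops below 25, and discards reads shorter than 100 bp. Removing adapters and low-quality bases improves assembly quality.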
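When many samples need trimming, the command above can be generated once per sample ID. The following is a minimal dry-run sketch, assuming the raw reads are named `<id>_1.fq` and `<id>_2.fq` in the current directory; the `TRIMMOMATIC_JAR` variable and the helper names `trim_cmd` and `list_sample_ids` are hypothetical and only serve the illustration. It prints the commands instead of running them.

```shell
# Sketch (dry run): print the Trimmomatic command for each sample ID.
# TRIMMOMATIC_JAR, trim_cmd and list_sample_ids are illustrative names.
TRIMMOMATIC_JAR="${TRIMMOMATIC_JAR:-$HOME/trimmomatic-0.39-2/trimmomatic.jar}"

# Print the full Trimmomatic PE command for one sample ID.
trim_cmd() {
    id="$1"
    printf 'java -jar %s PE -threads 64 %s_1.fq %s_2.fq ' "$TRIMMOMATIC_JAR" "$id" "$id"
    printf '%s_forward_paired.fq.gz %s_forward_unpaired.fq.gz ' "$id" "$id"
    printf '%s_reverse_paired.fq.gz %s_reverse_unpaired.fq.gz ' "$id" "$id"
    printf 'ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:100\n'
}

# Derive sample IDs from files named <id>_1.fq in the current directory.
list_sample_ids() {
    for f in *_1.fq; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        printf '%s\n' "${f%_1.fq}"
    done
}

# Usage: list_sample_ids | while read -r id; do trim_cmd "$id"; done
```

Piping the printed commands to `sh` (or replacing `trim_cmd` with a direct call) would run them; reviewing the dry-run output first makes it easier to catch misnamed files.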

  2. De Novo Assembly with SPAdes
spades.py --isolate -1 ${id}_forward_paired.fq.gz -2 ${id}_reverse_paired.fq.gz -t 64 -m 256 --cov-cutoff auto -o ${id}_spades
seqtk seq -L 200 ${id}_spades/scaffolds.fasta > ${id}_spades/scaffolds_clean.fasta
  3. Quality Assessment with CheckM2
checkm2 predict --threads 64 -x fasta  --database_path ~/CheckM2_database/uniref100.KO.1.dmnd --remove_intermediates --input ./Genomes/ --output-directory ./Genomes_checkM2

We use CheckM2 to assess the quality of the assembled genomes; it reports the completeness and contamination of each genome.
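The completeness and contamination values can then be used to shortlist genomes. The sketch below assumes the CheckM2 report is a tab-separated file with a header row whose first three columns are `Name`, `Completeness` and `Contamination`; the function name `filter_checkm2` and the default cutoffs (50 and 10, mirroring the dRep step) are choices made for this illustration.

```shell
# Sketch: list genomes that pass completeness/contamination cutoffs.
# Assumes a tab-separated report with a header row and the columns
# Name, Completeness, Contamination in positions 1 to 3.
filter_checkm2() {
    report="$1"
    min_comp="${2:-50}"   # minimum completeness (%), illustrative default
    max_cont="${3:-10}"   # maximum contamination (%), illustrative default
    awk -F'\t' -v c="$min_comp" -v x="$max_cont" \
        'NR > 1 && $2 >= c && $3 <= x { print $1 }' "$report"
}

# Usage: filter_checkm2 ./Genomes_checkM2/quality_report.tsv 90 5
```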

  4. Taxonomic Classification of Genomes with GTDB-Tk

This command uses GTDB-Tk version 2.4.0 to classify the bacterial genomes taxonomically against the GTDB Release 220 database:

gtdbtk classify_wf --genome_dir ./Genomes --out_dir ./Genomes_GTDBtk_results2 --extension fasta --cpus 64 --skip_ani_screen
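
To pull a species label per genome out of the classification results, something like the following could be used. It is a sketch that assumes a tab-separated summary table (typically named `gtdbtk.bac120.summary.tsv`) with a header row, the genome name in column 1 and the semicolon-separated lineage string in column 2; `gtdb_species` is a hypothetical helper name.

```shell
# Sketch: print "genome<TAB>species" from a GTDB-Tk summary table.
# Assumes column 1 is the genome name and column 2 is a lineage such as
# d__Bacteria;...;g__Escherichia;s__Escherichia coli
gtdb_species() {
    awk -F'\t' 'NR > 1 {
        n = split($2, rank, ";")
        sp = "unclassified"              # used when the s__ rank is empty or absent
        for (i = 1; i <= n; i++)
            if (rank[i] ~ /^s__./) sp = substr(rank[i], 4)
        print $1 "\t" sp
    }' "$1"
}

# Usage: gtdb_species ./Genomes_GTDBtk_results2/gtdbtk.bac120.summary.tsv
```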
  5. Genome Dereplication with dRep
dRep dereplicate --completeness 50 --contamination 10 -sa 0.95 --SkipMash -p 64 ./Pbac_v2_cluster_95 -g ./All_Genomes_MAGs/*fasta --genomeInfo Pbac_v2_Quality.csv

We use dRep to dereplicate the genomes by clustering them at a 95% average nucleotide identity (ANI) threshold (-sa 0.95), which approximates species-level boundaries, and keeping one representative genome per cluster. Genomes below 50% completeness or above 10% contamination are excluded. Completeness and contamination were assessed with CheckM2 and supplied to dRep via --genomeInfo.
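
One way to prepare the quality file for `--genomeInfo` is to convert the CheckM2 report into a CSV with the columns `genome`, `completeness` and `contamination`. This sketch rests on two assumptions: that the CheckM2 report is tab-separated with `Name`, `Completeness` and `Contamination` as its first three columns, and that the `Name` values lack the file extension that dRep expects in the `genome` column. The helper name `checkm2_to_genomeinfo` is hypothetical.

```shell
# Sketch: convert a CheckM2 quality report into a dRep --genomeInfo CSV.
# Assumes Name values have no file extension, so ".<ext>" is appended to
# match the FASTA file names passed to dRep with -g.
checkm2_to_genomeinfo() {
    report="$1"
    ext="${2:-fasta}"     # genome file extension, illustrative default
    echo "genome,completeness,contamination"
    awk -F'\t' -v ext="$ext" 'NR > 1 { print $1 "." ext "," $2 "," $3 }' "$report"
}

# Usage: checkm2_to_genomeinfo ./Genomes_checkM2/quality_report.tsv > Pbac_v2_Quality.csv
```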

  6. Ribosomal RNA Prediction with Barrnap
barrnap -q -k bac ${genome}.fasta
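
Barrnap writes its predictions as GFF3 to standard output, so the command above is usually redirected to a file. Assuming that output was saved (the file name `${genome}_rRNA.gff` here is hypothetical) and that the ninth column carries an attribute such as `Name=16S_rRNA`, the number of predicted genes per rRNA type could be tallied roughly like this; `count_rrna` is an illustrative helper name.

```shell
# Sketch: count predicted rRNA genes per type in a saved Barrnap GFF3 file.
# Assumes column 9 contains an attribute of the form Name=<type>.
count_rrna() {
    awk -F'\t' '!/^#/ && NF >= 9 {
        # "Name=" is 5 characters long, so skip it when extracting the type
        if (match($9, /Name=[^;]+/))
            count[substr($9, RSTART + 5, RLENGTH - 5)]++
    }
    END { for (name in count) print name "\t" count[name] }' "$1" | sort
}

# Usage: barrnap -q -k bac ${genome}.fasta > ${genome}_rRNA.gff
#        count_rrna ${genome}_rRNA.gff
```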
  7. tRNA Annotation with tRNAscan-SE
tRNAscan-SE -B -o ${genome}_tRNA_result.txt -m ${genome}_tRNA_statistic.txt ${genome}.fasta