Info
🌱 From: bioinfo
trimming_with_trimmomatic
Trimmomatic
Trimmomatic is a flexible read trimming tool for Illumina NGS data.
Generic Trimmomatic command
java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
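The same generic command with each trimming step on its own line and a comment on what it does. The descriptions paraphrase the Trimmomatic manual, and Trimmomatic applies the steps in the order they are given. The Bash arrays and the DRY_RUN switch are only there for illustration and are not part of Trimmomatic. The file names are the same placeholders as in the command above.

```bash
#!/usr/bin/env bash
# Annotated version of the generic paired-end Trimmomatic command.
# File names are placeholders. By default the command is only printed;
# set DRY_RUN=0 to execute it (requires java and trimmomatic-0.39.jar).
set -euo pipefail

# Trimming steps, applied in the order listed.
steps=(
    # Adapter removal using TruSeq3-PE.fa with parameters
    # <seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:
    # <min adapter length>:<keepBothReads flag>
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads
    LEADING:3    # remove leading bases with quality below 3
    TRAILING:3   # remove trailing bases with quality below 3
    MINLEN:36    # drop reads shorter than 36 bases after trimming
)

cmd=(
    java -jar trimmomatic-0.39.jar PE
    input_forward.fq.gz input_reverse.fq.gz
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz
    "${steps[@]}"
)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show the assembled command line
else
    "${cmd[@]}"                 # run it
fi
```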
Created a Bash script to run Trimmomatic on multiple samples in a single batch
This script looks for files with “_R1.fastq.gz” in their names and runs Trimmomatic on each of them together with its paired “_R2.fastq.gz” file in the same directory.
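The trim.sh script itself is not reproduced in these notes. As a rough idea of what it does, here is a hypothetical version. It assumes the Bioconda trimmomatic wrapper command is on PATH (as it is after conda activate trim) and reuses the output names and trimming steps visible in the command-line output recorded at the end of these notes. The thread count is taken from the first argument; falling back to 4 threads when none is given is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# trim.sh (hypothetical sketch): run Trimmomatic PE on every *_R1.fastq.gz
# in the current directory together with its matching *_R2.fastq.gz.
# Usage: trim.sh [THREADS]   (THREADS defaults to 4, an assumption)
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing, not to itself

trim_all() {
    local threads="$1"
    local r1 sample r2
    for r1 in *_R1.fastq.gz; do
        sample="${r1%_R1.fastq.gz}"      # sample1_R1.fastq.gz -> sample1
        r2="${sample}_R2.fastq.gz"
        if [[ ! -e "$r2" ]]; then
            echo "Skipping ${sample}: ${r2} not found" >&2
            continue
        fi
        # Output names and steps match the recorded command-line output.
        trimmomatic PE -threads "$threads" \
            "$r1" "$r2" \
            "${sample}_R1_trimmed.fastq.gz" "${sample}_R1_unpaired.fastq.gz" \
            "${sample}_R2_trimmed.fastq.gz" "${sample}_R2_unpaired.fastq.gz" \
            ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
            LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    done
}

trim_all "${1:-4}"
```

Samples whose _R2 partner is missing are skipped with a warning instead of aborting the whole batch. Output names such as sample1_R1_trimmed.fastq.gz do not end in _R1.fastq.gz, so a rerun in the same directory does not pick up its own outputs as new inputs.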
Transferred BASH script from local computer to cloud instance
scp -C /Users/cm/JPL_Google_Drive/scripts/trim.sh cmicro@149.165.171.66:/home/cmicro/scripts
On the cloud instance
Create a directory for Trimming
mkdir trim
cd trim
Create symbolic links to FASTQ files
(Assuming all fastq files are in the fastqs_backup directory)
ln -s /home/cmicro/fastqs_backup/*.fastq.gz .
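If the glob in the ln -s command matches nothing, Bash passes the pattern through literally and ln creates a single dangling link named *.fastq.gz; a typo in the source directory has the same effect. A quick check for dangling links before trimming, written as a hypothetical helper named list_broken_links:

```bash
#!/usr/bin/env bash
# list_broken_links (hypothetical helper): print symlinks in the current
# directory whose targets do not exist. test -e follows the link, so only
# dangling links fail it and get printed.
list_broken_links() {
    find . -maxdepth 1 -type l ! -exec test -e {} \; -print
}
```

Run list_broken_links inside the trim directory; no output means every link resolves.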
Create a Conda environment & install Trimmomatic
conda create -y -n trim trimmomatic
Copy the TruSeq3 paired-end adapter file into the working directory so Trimmomatic can find it through the relative path TruSeq3-PE.fa given to ILLUMINACLIP.
cp /opt/miniconda3/pkgs/trimmomatic-*/share/trimmomatic-*/adapters/TruSeq3-PE.fa .
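The wildcard path above assumes Miniconda is installed in /opt/miniconda3 and that only one Trimmomatic version sits in the package cache. A slightly more portable sketch searches the whole Conda installation for the adapter file. find_adapter is a hypothetical helper name, and conda info --base is used only to locate the installation root.

```bash
#!/usr/bin/env bash
# find_adapter ROOT (hypothetical helper): print the path of the first
# TruSeq3-PE.fa found inside an "adapters" directory anywhere under ROOT.
# Prints nothing if no such file exists; permission errors are silenced.
find_adapter() {
    find "$1" -path '*/adapters/TruSeq3-PE.fa' -print -quit 2>/dev/null
}
```

Usage: cp "$(find_adapter "$(conda info --base)")" .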
Activate Trimmomatic Conda environment
conda activate trim
Make the Bash script executable and run it
chmod +x /home/cmicro/scripts/trim.sh
Run the Trimmomatic script with 40 threads (the script takes the thread count as its first argument)
/home/cmicro/scripts/trim.sh 40
Move trimmed FASTQ files to specific directory
mkdir trimmed_fastqs
find . -maxdepth 1 -type f -name "*trimmed*" -exec mv '{}' trimmed_fastqs/ \;
Command line output is below:
TrimmomaticPE: Started with arguments: -threads 42 sample1_R1.fastq.gz sample1_R2.fastq.gz sample1_R1_trimmed.fastq.gz sample1_R1_unpaired.fastq.gz sample1_R2_trimmed.fastq.gz sample1_R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 5342751 Both Surviving: 4283096 (80.17%) Forward Only Surviving: 1020644 (19.10%) Reverse Only Surviving: 15211 (0.28%) Dropped: 23800 (0.45%)
TrimmomaticPE: Completed successfully

TrimmomaticPE: Started with arguments: -threads 42 sample2_R1.fastq.gz sample2_R2.fastq.gz sample2_R1_trimmed.fastq.gz sample2_R1_unpaired.fastq.gz sample2_R2_trimmed.fastq.gz sample2_R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 5528625 Both Surviving: 4327888 (78.28%) Forward Only Surviving: 1156093 (20.91%) Reverse Only Surviving: 17389 (0.31%) Dropped: 27255 (0.49%)
TrimmomaticPE: Completed successfully

TrimmomaticPE: Started with arguments: -threads 42 sample3_R1.fastq.gz sample3_R2.fastq.gz sample3_R1_trimmed.fastq.gz sample3_R1_unpaired.fastq.gz sample3_R2_trimmed.fastq.gz sample3_R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 5342751 Both Surviving: 4283096 (80.17%) Forward Only Surviving: 1020644 (19.10%) Reverse Only Surviving: 15211 (0.28%) Dropped: 23800 (0.45%)
TrimmomaticPE: Completed successfully

TrimmomaticPE: Started with arguments: -threads 42 sample4_R1.fastq.gz sample4_R2.fastq.gz sample4_R1_trimmed.fastq.gz sample4_R1_unpaired.fastq.gz sample4_R2_trimmed.fastq.gz sample4_R2_unpaired.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 6718423 Both Surviving: 5270348 (78.45%) Forward Only Surviving: 1386864 (20.64%) Reverse Only Surviving: 22301 (0.33%) Dropped: 38910 (0.58%)
TrimmomaticPE: Completed successfully
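The per-sample numbers are easier to compare as a table. Below is a sketch of a helper (summarize_trim_log is a made-up name) that reads a captured Trimmomatic log and prints one tab-separated row per sample. It assumes the log was saved to a file; Trimmomatic prints these messages on standard error, so something like /home/cmicro/scripts/trim.sh 40 2> trim.log would capture them. It also assumes inputs follow the <sample>_R1.fastq.gz naming used here.

```bash
#!/usr/bin/env bash
# summarize_trim_log (hypothetical helper): read Trimmomatic PE log text on
# stdin and print a TSV with one row per sample:
# sample, input pairs, pairs with both reads surviving, % both, pairs dropped.
summarize_trim_log() {
    awk '
        BEGIN { OFS = "\t"; print "sample", "input_pairs", "both_surviving", "pct_both", "dropped" }
        {
            for (i = 1; i <= NF; i++) {
                # The R1 input file name identifies the sample being trimmed.
                if ($i ~ /_R1\.fastq\.gz$/) { sample = $i; sub(/_R1\.fastq\.gz$/, "", sample) }
                if ($i == "Pairs:") pairs = $(i + 1)
                if ($i == "Surviving:" && $(i - 1) == "Both") { both = $(i + 1); pct = $(i + 2) }
                if ($i == "Dropped:") dropped = $(i + 1)
            }
        }
        # The summary line closes a run: strip "(", ")" and "%" and emit the row.
        /Input Read Pairs:/ { gsub(/[()%]/, "", pct); print sample, pairs, both, pct, dropped }
    '
}
```

Usage: summarize_trim_log < trim.log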