NOTE

🌱 來自:researcher

bioinfo

Chapter 1 Sequencing and Raw Sequence Data Quality Control

  • 1.1 NUCLEIC ACIDS 1

  • 1.2 SEQUENCING 3

  • 1.2.1 First-Generation Sequencing 3

  • 1.2.2 Next-Generation Sequencing 4

  • 1.2.2.1 Roche 454 Technology 5

  • 1.2.2.2 Ion Torrent Technology 6

  • 1.2.2.3 AB SOLiD Technology 6

  • 1.2.2.4 Illumina Technology 7

  • 1.2.3 Third-Generation Sequencing 8

  • 1.2.3.1 PacBio Technology 9

  • 1.2.3.2 Oxford Nanopore Technology 10

  • 1.3 SEQUENCING DEPTH AND READ QUALITY 11

  • 1.3.1 Sequencing Depth 11

  • 1.3.2 Base Call Quality 11

  • 1.4 FASTQ FILES 13

  • 1.5 FASTQ READ QUALITY ASSESSMENT 18

  • 1.5.1 Basic Statistics 23

  • 1.5.2 Per Base Sequence Quality 24

  • 1.5.3 Per Tile Sequence Quality 25

  • 1.5.4 Per Sequence Quality Scores 28

  • 1.5.5 Per Base Sequence Content 28

  • 1.5.6 Per Sequence GC Content 28

  • 1.5.7 Per Base N Content 30

  • 1.5.8 Sequence Length Distribution 30

  • 1.5.9 Sequence Duplication Levels 31

  • 1.5.10 Overrepresented Sequences 31

  • 1.5.11 Adapter Content 32

  • 1.5.12 K-mer Content 33

  • 1.6 PREPROCESSING OF THE FASTQ READS 34

  • 1.7 SUMMARY 45

  • REFERENCES 46

  • Chapter 2 ◾ Mapping of Sequence Reads to the Reference Genomes 49

  • 2.1 INTRODUCTION TO SEQUENCE MAPPING 49

  • 2.2 READ MAPPING 55

  • 2.2.1 Trie 56

  • 2.2.2 Suffix Tree 56

  • 2.2.3 Suffix Arrays 57

  • 2.2.4 Burrows-Wheeler Transform 58

  • 2.2.5 FM-Index 62

  • 2.3 READ SEQUENCE ALIGNMENT AND ALIGNERS 63

  • 2.3.1 SAM and BAM File Formats 65

  • 2.3.2 Read Aligners 70

  • 2.3.2.1 Burrows-Wheeler Aligner 71

  • 2.3.2.2 Bowtie2 75

  • 2.3.2.3 STAR 76

  • 2.4 MANIPULATING ALIGNMENTS IN SAM/BAM FILES 79

  • 2.4.1 samtools 79

  • 2.4.1.1 SAM/BAM Format Conversion 79

  • 2.4.1.2 Sorting Alignment 80

  • 2.4.1.3 Indexing BAM File 80

  • 2.4.1.4 Extracting Alignments of a Chromosome 81

  • 2.4.1.5 Filtering and Counting Alignment in SAM/BAM Files 81

  • 2.4.1.6 Removing Duplicate Reads 82

  • 2.4.1.7 Descriptive Statistics 83

  • 2.5 REFERENCE-GUIDED GENOME ASSEMBLY 83

  • Chapter 3 ◾ De Novo Genome Assembly 89

  • 3.1 INTRODUCTION TO DE NOVO GENOME ASSEMBLY 89

  • 3.1.1 Greedy Algorithm 90

  • 3.1.2 Overlap-Consensus Graphs 90

  • 3.1.3 De Bruijn Graphs 91

  • 3.2 EXAMPLES OF DE NOVO ASSEMBLERS 93 3.2.1 ABySS 93 3.2.2 SPAdes 97

  • 3.3 GENOME ASSEMBLY QUALITY ASSESSMENT 99

  • 3.3.1 Statistical Assessment for Genome Assembly 100

  • 3.3.2 Evolutionary Assessment for De Novo Genome Assembly 103

  • 3.4 SUMMARY 106

  • Chapter 4 ◾ variant discovery 109

  • 4.1 INTRODUCTION TO GENETIC VARIATIONS 109

  • 4.1.1 VCF File Format 110

  • 4.1.2. Variant Calling and Analysis 113

  • 4.2 VARIANT CALLING PROGRAMS 114

  • 4.2.1 Consensus-Based Variant Callers 114

  • 4.2.1.1 BCF Tools Variant Calling Pipeline 115

  • 4.2.2 Haplotype-Based Variant Callers 125

  • 4.2.2.1 FreeBayes Variant Calling Pipeline 127

  • 4.2.2.2 GATK Variant Calling Pipeline 129

  • 4.3 VISUALIZING VARIANTS 143

  • 4.4 VARIANT ANNOTATION AND PRIORITIZATION 143

  • 4.4.1 SIFT 145

  • 4.4.2 SnpEff 148

  • 4.3.3 ANNOVAR 151

  • 4.3.3.1

  • 4.3.3.2 ANNOVAR Input Files 156 160 161

  • 4.5 SUMMARY REFERENCES

  • Annotation Databases 153

  • Chapter 5 ◾ RNA-Seq Data Analysis 163

  • 5.1 INTRODUCTION TO RNA-SEQ 163

  • 5.2 RNA-SEQ APPLICATIONS 165

  • 5.3 RNA-SEQ DATA ANALYSIS WORKFLOW 166

  • 5.3.1 Acquiring RNA-Seq Data 166

  • 5.3.2 Read Mapping 167

  • 5.3.3 Alignment Quality Assessment 171

  • 5.3.4 Quantification 172

  • 5.3.5 Normalization 174

  • 5.3.5.1 RPKM and FPKM 174

  • 5.3.5.2 Transcripts per Million 175

  • 5.3.5.3 Counts per Million Mapped Reads 175

  • 5.3.5.4 Trimmed Mean of M-values 175

  • 5.3.5.5 Relative Expression 176

  • 5.3.5.6 Upper Quartile 176

  • 5.3.6 Differential Expression Analysis 176

  • 5.3.7 Using EdgeR for Differential Analysis 180

  • 5.3.7.1 Data Preparation 181

  • 5.3.7.2 Annotation 183

  • 5.3.7.3 Design Matrix 184

  • 5.3.7.4 Filtering Low-Expressed Genes 185

  • 5.3.7.5 Normalization 186

  • 5.3.7.6 Estimating Dispersions 186

  • 5.3.7.7 Exploring the Data 189

  • 5.3.7.8 Model Fitting 194

  • 5.3.7.9 Ontology and Pathways 202

  • 5.3.8 Visualizing RNA-Seq Data 204

  • 5.3.8.1 5.3.8.2 5.3.8.3 5.3.8.4

  • Visualizing Distribution with Boxplots 206 Scatter Plot 207 Mean-Average Plot (MA Plot) 208 Volcano Plots 209

  • 209 211

  • Chapter 6 ◾ Chromatin Immunoprecipitation Sequencing 213

  • 6.1 INTRODUCTION TO CHROMATIN IMMUNOPRECIPITATION 213

  • 6.2 CHIP SEQUENCING 214

  • 6.3 CHIP-SEQ ANALYSIS WORKFLOW 215

  • 6.3.1 Downloading the Raw Data 217

  • 6.3.2 Quality Control 218

  • 6.3.3 ChIP-Seq and Input Read Mapping 219

  • 6.3.4 ChIP-Seq Peak Calling with MACS3 223

  • 6.3.5 Visualizing ChIP-Seq Enrichment in Genome Browser 226

  • 6.3.6 Visualizing Peaks Distribution 229

  • 6.3.6.1 ChIP-Seq Peaks’ Coverage Plot 230

  • 6.3.6.2 Distribution of Peaks in Transcription Start Site (TSS)

  • Regions 233

  • 6.3.6.3 Profile of Peaks along Gene Regions 234

  • 6.3.7 Peak Annotation 235

  • 6.3.7.1 Writing Annotations to Files 237

  • 6.3.8 ChIP-Seq Functional Analysis 239

  • 6.3.9 Motif Discovery 243

  • 6.4 SUMMARY 250 REFERENCES 251

  • Chapter 7 ◾ Targeted Gene Metagenomic Data Analysis 253 7.1. INTRODUCTION TO METAGENOMICS 253 7.2 ANALYSIS WORKFLOW 254

  • 7.2.1 Raw Data Preprocessing 254

  • 7.2.2 Metagenomic Features 255 7.2.2.1 Clustering 255 7.2.2.2 Denoising 256

  • 7.2.3 Taxonomy Assignment 258

  • 7.2.3.1 Basic Local Alignment Search Tool 258

  • 7.2.3.2 VSEARCH 259

  • 7.2.3.3 Ribosomal Database Project 259

  • 7.2.4 Construction of Phylogenetic Trees 260

  • 7.2.5 Microbial Diversity Analysis 261

  • 7.2.5.1 Alpha Diversity Indices 262

  • 7.2.5.2 Beta Diversity 262

  • 7.3 DATA ANALYSIS WITH QIIME2 263

  • 7.3.1 QIIME2 Input Files 265 7.3.1.1 Importing Sequence Data 265 7.3.1.2 Metadata 269

  • 7.3.2 Demultiplexing 269

  • 7.3.3 Downloading and Preparing the Example Data 271

  • 7.3.3.1 Downloading the Raw Data 271

  • 7.3.3.2 Creating the Sample Metadata File 272

  • 7.3.3.3 Importing Microbiome Yoga Data 274

  • 7.3.4 Raw Data Preprocessing 275

  • 7.3.4.1 Quality Assessment and Quality Control 275

  • 7.3.4.2 Clustering and Denoising 278

  • 7.3.5 Taxonomic Assignment with QIIME2 289

  • 7.3.5.1 Using Alignment-Based Classifiers 289

  • 7.3.5.2 Using Machine Learning Classifiers 291

  • 7.3.6 Construction of Phylogenetic Tree 297

  • 7.3.6.1 De Novo Phylogenetic Tree 297

  • 7.3.6.2 Fragment-Insertion Phylogenetic Tree 298

  • 7.3.7 Alpha and Beta Diversity Analysis 298

  • 7.4 SUMMARY 300 REFERENCES 301

  • Chapter 8 Shotgun Metagenomic Data Analysis 303

    初探 PLINK 檔案格式(bed,bim,fam) - 知乎 PLINK: Whole genome data analysis toolset Clonal evolutionary analysis of cancer - Genevia Technologies

  • 8.1 INTRODUCTION 303

  • 8.2 SHOTGUN METAGENOMIC ANALYSIS WORKFLOW 305

  • 8.2.1 Data Acquisition 305

  • 8.2.2 Quality Assessment and Processing 305

  • 8.2.3 Removing Host DNA Reads 306

  • 8.2.3.1 Download Human Reference Genome 306

  • 8.2.3.2 Mapping Reads to the Reference Genome 307

  • 8.2.3.3 Converting SAM to BAM Format 307

  • 8.2.3.4 Separating Metagenomic Reads in BAM Files 307

  • 8.2.3.5 Creating Paired-End FASTQ Files from BAM Files 308

  • 8.2.4 Assembly-Free Taxonomic Profiling 310

  • 8.2.4 Assembly of Metagenomes 315

  • 8.2.5 Assembly Evaluation 317

  • 8.2.6 Mapping Reads to the Assemblies 318

  • 8.2.7 Binning 321

  • 8.2.8 Bin Evaluation 323

  • 8.2.9 Prediction of Protein-Coding Region

ngs_articles