Abstract
Background: Esophageal cancer is the eighth most common cancer worldwide, with over 400,000 new cases each year. Early diagnosis of this cancer remains limited, resulting in a five-year survival rate of approximately 15%. Next-generation sequencing technologies have revolutionized cancer genomics by providing a holistic approach to detecting somatic mutations. Here, we describe a genomic analysis of 30 esophageal cancer patients using whole exome sequencing (WES). Subjects and methods: Ten sequencing datasets were analyzed with three different pipelines. Fastq2vcf, modified to use MuTect2, proved to be a better pipeline for esophageal cancer WES data analysis than SeqMule and IMPACT. The selected pipeline was then used to analyze the remaining 20 datasets. Results and conclusion: Across the 30 patient samples, variants identified by Fastq2vcf were concentrated on chr17, followed by chr9, and were very rare on chr21. Most variants were SNVs (1,034 of 1,200 variants), which were present in all samples; 841 of these were non-synonymous. Four types of damaging mutations that alter protein sequences and gene functions were found in exonic regions as well as splicing regions. This study provides a comparison of software pipelines for identifying potential mutations in whole exome sequencing data from cancer patients, which could support early detection and prevention of cancer. This information may be useful for other research on cancer diagnosis using molecular biology and bioinformatics.
* Keywords: Esophageal cancer; Whole exome sequencing; Fastq2vcf; MuTect2.