• A Modified Information Criterion in the 1d Fused Lasso for DNA Copy Number Variant Detection using Next Generation Sequencing Data

      Lee, Jaeeun; Department of Biostatistics and Epidemiology (8/3/2017)
      DNA Copy Number Variations (CNVs) are associated with many human diseases. Recent CNV studies have used Next Generation Sequencing (NGS) technology, which produces millions of short reads. We apply 1d fused lasso regression to NGS reads ratio data for CNV detection. Given the number of copy number changes, the corresponding genomic locations are estimated by fitting the 1d fused lasso. The estimated number of copy number changes, however, depends on a tuning parameter in the 1d fused lasso. In this dissertation, we propose a new modified Bayesian information criterion, called JMIC, to select the optimal tuning parameter. We prove that the number of change points estimated by JMIC converges to the true number of change points, and our simulation studies show that JMIC outperforms the other criteria considered. Finally, we apply the proposed method to the reads ratio data from the breast tumor cell line HCC1954 and its matched cell line provided by Chiang et al. (2009).
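
      For orientation, the role of the tuning parameter can be sketched with the standard textbook form of the 1d fused lasso. This is an assumption for illustration: the abstract does not state the dissertation's exact formulation, and the definition of JMIC is not given here, so it is represented only by a generic criterion IC. The symbols y_i (the reads ratio at genomic location i, possibly transformed), beta_i (the underlying signal), lambda (the tuning parameter), and K-hat are notation introduced for this sketch.

      ```latex
      % Standard 1d fused lasso: a piecewise-constant fit to y_1, ..., y_n
      \hat{\beta}(\lambda)
        = \arg\min_{\beta \in \mathbb{R}^{n}}
          \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta_i)^2
          + \lambda \sum_{i=1}^{n-1} \lvert \beta_{i+1} - \beta_i \rvert

      % Estimated number of change points implied by a given lambda
      \hat{K}(\lambda)
        = \#\bigl\{\, i : \hat{\beta}_{i+1}(\lambda) \neq \hat{\beta}_{i}(\lambda) \,\bigr\}

      % Tuning-parameter selection by a generic information criterion
      % (IC is a placeholder; the form of JMIC is not stated in the abstract)
      \hat{\lambda} = \arg\min_{\lambda \ge 0} \mathrm{IC}(\lambda)
      ```

      Under this form, a larger lambda fuses more neighboring estimates and yields fewer change points, which is why the choice of lambda determines the estimated number of copy number changes.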