## Search

Now showing items 1-10 of 12

- Subjects: Biostatistics (2); DNA Methylation (2); Algebraic statistical model (1); Algebraic statistics (1); Antigen Receptor (1)
- Authors: Department of Biostatistics and Epidemiology (7); Campbell, Jeff (1); Chen, Chen Chun (1); Daniel, Jeannie T (1); Department of Biostatistics and Epidemiology (1)
- Types: Dissertation (12)

Mathematical and Stochastic Modeling of HIV Immunology and Epidemiology

Lee, Tae Jin (8/3/2017)

In HIV dynamics, controlling viral load and maintaining CD4 counts at a high level are always primary goals for providers. In recent years, a new molecule, eCD4-Ig, was discovered; it mimics CD4 when introduced into the human body and has the potential to change existing HIV dynamics. To understand the dynamics of viral load, eCD4-Ig, and CD4 cells, we developed mathematical models that incorporate the interactions between this new molecule and other known immunological and virological information. We further investigated model-based implications for management and obtained the level of eCD4-Ig required to eliminate the virus. Next, we built an epidemiological model of HIV spread and control among discordant couples through the dynamics of PrEP (pre-exposure prophylaxis). For this, a stochastic model based on actuarial assumptions is used to obtain the mean remaining time for which a couple stays discordant. We generalized the single hook-up/marriage stochastic model to a multiple hook-up/marriage model.
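
The abstract does not give the model equations, so the following is only a minimal sketch under stated assumptions. It uses the standard target-cell-limited viral dynamics model (not the author's eCD4-Ig model) and represents an entry-blocking agent by a hypothetical efficacy `eps` that scales the infection rate by `(1 - eps)`, which yields a threshold efficacy for elimination. The discordant-couple function assumes constant (exponential) hazards, a simplification of the actuarial model described; all names and parameters are illustrative.

```python
def basic_r0(lam, d, beta, delta, p, c):
    """Basic reproductive number of the standard target-cell-limited model:
        dT/dt = lam - d*T - beta*T*V
        dI/dt = beta*T*V - delta*I
        dV/dt = p*I - c*V
    """
    t0 = lam / d  # uninfected steady state of target cells
    return beta * t0 * p / (delta * c)


def critical_blocking_efficacy(r0):
    """Smallest eps with (1 - eps) * R0 <= 1, i.e. the virus cannot persist.

    Hypothetical stand-in for an entry inhibitor: blocking a fraction eps of
    new infections scales beta, and hence R0, by (1 - eps).
    """
    return max(0.0, 1.0 - 1.0 / r0)


def mean_discordant_time(transmission_rate, dissolution_rate, prep_efficacy=0.0):
    """Mean remaining time a couple stays discordant, assuming constant hazards
    for transmission and for dissolution; PrEP scales transmission by (1 - efficacy).
    The first of two competing exponential events occurs after 1 / (sum of hazards)."""
    hazard = (1.0 - prep_efficacy) * transmission_rate + dissolution_rate
    return 1.0 / hazard
```

With these assumptions, a higher PrEP efficacy lowers the combined hazard and therefore lengthens the expected time a couple remains discordant.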

A New Method For Analyzing 1:N Matched Case Control Studies With Incomplete Data

Jin, Chan (5/8/2017)

1:n matched case-control studies are commonly used to evaluate the association between exposure to a risk factor and a disease, where each case is matched to up to n controls. The odds ratio is typically used to quantify this association. Difficulties in estimating the true odds ratio arise when the exposure status is unknown for at least one individual in a group. When the exposure status is known for all individuals in a group, the odds ratio is estimated as the ratio of the counts in the discordant cells of the observed two-by-two table. When all data are independent, the odds ratio is estimated using the cross-product ratio of the observed table. Conditional logistic regression is used for incompletely matched data. In this dissertation, we propose a simple method for estimating the odds ratio when the sample consists of a combination of paired and unpaired observations with 1:n matching. The method uses a weighted average of the odds ratio estimates described above, and we compare it with existing methods via simulation.
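
The estimators named above can be sketched as follows, assuming 1:1 matched pairs for the complete groups and a standard 2x2 table for the unpaired observations. The abstract does not give the dissertation's weights; this sketch assumes inverse-variance weights on the log scale, a common choice that may differ from the proposed method.

```python
import math


def matched_pair_log_or(b, c):
    """1:1 matched pairs: OR = b / c, where b counts pairs in which only the case
    is exposed and c counts pairs in which only the control is exposed."""
    log_or = math.log(b / c)
    var = 1.0 / b + 1.0 / c  # large-sample variance of log(b / c)
    return log_or, var


def unmatched_log_or(a, b, c, d):
    """Independent data: cross-product ratio OR = (a * d) / (b * c), with
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    log_or = math.log((a * d) / (b * c))
    var = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d  # Woolf variance of the log OR
    return log_or, var


def combined_or(estimates):
    """Inverse-variance weighted average of (log_or, var) pairs (assumed weighting)."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / sum(weights)
    return math.exp(pooled)
```

For example, `combined_or([matched_pair_log_or(20, 10), unmatched_log_or(30, 20, 15, 20)])` pools a paired and an unpaired estimate that both equal 2.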

Correlation Coefficient Inference for Left-Censored Biomarker Data with Known Detection Limits

McCracken, Courtney Elizabeth (2013-05)

Researchers are often interested in the relationship between biological concentrations obtained using two different assays, both of which may be biomarkers. Despite continuing advances in biotechnology, the value of a particular biomarker may fall below some known limit of detection (LOD). Such data values are referred to as non-detects (NDs) and can be treated as left-censored observations. When measuring the association between two concentrations that are both subject to NDs, serious complications can arise in the data analysis. Simple substitution, random imputation, and maximum likelihood estimation are just a few of the methods that have been proposed for handling NDs when estimating the correlation between two variables that are both subject to left-censoring. Unfortunately, many of the popular methods require that the data follow a bivariate normal distribution or that only a small percentage of the data for each variable fall below the LOD. These assumptions are often violated with biomarker data. In this paper, we evaluate the performance of seven methods for estimating the correlation, ρ, between two left-censored variables, including Spearman’s rho, when the data do not follow a bivariate normal distribution and when there are moderate to large censoring proportions in one or both variables. Performance is measured by bias, median absolute deviation, 95% confidence interval width, and coverage probability under various sample sizes, correlations, and censoring proportions. We show that substitution and imputation methods yield biased estimates of ρ and less-than-nominal coverage probability under most of the simulation parameters we examined. We recommend the maximum likelihood method for general use, even when the data depart significantly from bivariate normality.
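
For illustration, two of the simpler approaches mentioned above, substitution and Spearman's rho, can be sketched in plain Python. Non-detects are encoded as `None` (an assumption), substituted values use LOD/sqrt(2) (one common convention), and for Spearman's rho all non-detects are treated as tied below every detected value. These are not the recommended maximum likelihood method, and the abstract reports that substitution yields biased estimates.

```python
import math


def substitute_nds(values, lod, factor=1 / math.sqrt(2)):
    """Replace non-detects (None) by lod * factor."""
    return [lod * factor if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_with_nds(x, y):
    """Spearman's rho with every non-detect tied below all detected values."""
    rx = average_ranks([-math.inf if v is None else v for v in x])
    ry = average_ranks([-math.inf if v is None else v for v in y])
    return pearson(rx, ry)
```

A substitution estimate of ρ would then be `pearson(substitute_nds(x, lod_x), substitute_nds(y, lod_y))`.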

Multivariate Poisson Abundance Models for Analyzing Antigen Receptor Data

Greene, Joshua C. (2013-05)

Antigen receptor data are an important source of information for immunologists but are highly challenging to analyze statistically, owing to the huge number of T-cell receptors in mammalian immune systems and the severe undersampling bias associated with the commonly used data collection procedures. Many important immunological questions can be stated in terms of the richness and diversity of T-cell subsets under various experimental conditions. This dissertation presents a class of parametric models and uses a special case of them to compare the richness and diversity of antigen receptor populations in mammalian T-cells. The parametric models are based on a representation of the observed receptor counts as a multivariate Poisson abundance model (mPAM). A Bayesian model-fitting procedure is developed that fits the mPAM parameters using the complete likelihood, as opposed to the conditional version used previously. The new procedure is shown to be often considerably more efficient (as measured by the amount of Fisher information) in the regions of the mPAM parameter space relevant to modeling T-cell data. A richness estimator based on the special case of the mPAM is shown to be superior to several existing richness estimators from the statistical ecology literature under the severe undersampling conditions encountered in antigen receptor data collection. The comparative diversity analyses based on the mPAM special case yield biologically meaningful results when applied to T-cell receptor repertoires in mice. It is also shown that the time needed to run the Bayesian model-fitting procedure for the mPAM special case scales well as the dimension increases, and that the computational resources required for complete statistical analyses of the mPAM special case can be drastically lower with our Bayesian model-fitting procedure than with code based on the conditional likelihood approach.
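
The abstract does not name the existing richness estimators it compares against. As context only, the classic Chao1 estimator from statistical ecology, which extrapolates the number of unseen receptor types from the numbers of singletons and doubletons, can be sketched as follows. It is not the mPAM-based estimator, and the zero-doubleton fallback is one common convention.

```python
from collections import Counter


def chao1(counts):
    """Classic Chao1 richness estimate from per-receptor observed counts.

    counts: number of times each distinct receptor was observed (zeros ignored).
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq.get(1, 0), freq.get(2, 0)  # singletons and doubletons
    if f2 > 0:
        return len(observed) + f1 * f1 / (2 * f2)
    # Fallback when no doubletons were observed
    return len(observed) + f1 * (f1 - 1) / 2
```

Under severe undersampling most receptors are singletons, so the correction term dominates, which is one reason such estimators struggle on antigen receptor data.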

A Bayesian Framework To Detect Differentially Methylated Loci in Both Mean And Variability with Next Generation Sequencing

Li, Shuang (2015-07)

DNA methylation at CpG loci is the best-known epigenetic process and is involved in many complex diseases, including cancer. In recent years, next-generation sequencing (NGS) has been widely used to generate genome-wide DNA methylation data. Although substantial evidence indicates that the difference in mean methylation proportion between normal and disease samples is meaningful, it has recently been proposed that it may be important to consider DNA methylation variability underlying common complex diseases and cancer. We introduce a robust hierarchical Bayesian framework with a latent Gaussian model that incorporates both mean and variance to detect differentially methylated loci in NGS data. To identify methylation loci associated with disease, we consider Bayesian hypothesis testing for methylation mean and methylation variance using a two-dimensional highest posterior density region. To improve computational efficiency, we use the integrated nested Laplace approximation (INLA), which combines Laplace approximations and numerical integration in a very efficient manner to derive marginal posterior distributions. We performed simulations to compare the proposed method with alternative methods. The simulation results indicate that our proposed approach is more powerful, in that it detects fewer false positives and has a true positive rate comparable to the other methods.
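
The two-dimensional HPD test can be approximated simply if the joint posterior of (mean difference, log variance ratio) is assumed to be roughly bivariate normal, for example as summarized from posterior draws. Under that assumption, the 95% HPD region is an ellipse, and a locus is flagged when the null point (0, 0) falls outside it. This is a hypothetical simplification; it does not reproduce the latent Gaussian model or INLA, and the choice of coordinates is an assumption.

```python
import math


def mean_and_cov(samples):
    """Sample mean and 2x2 covariance of posterior draws [(d_mean, d_logvar), ...]."""
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    c00 = sum((s[0] - m0) ** 2 for s in samples) / (n - 1)
    c11 = sum((s[1] - m1) ** 2 for s in samples) / (n - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n - 1)
    return (m0, m1), ((c00, c01), (c01, c11))


def null_outside_hpd(mean, cov, level=0.95):
    """True if (0, 0) lies outside the level-HPD ellipse of a bivariate normal."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    x, y = -mean[0], -mean[1]  # vector from the posterior mean to the null point
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    d2 = (d * x * x - (b + c) * x * y + a * y * y) / det
    # The chi-square quantile with 2 degrees of freedom is -2 ln(1 - level)
    return d2 > -2.0 * math.log(1.0 - level)
```

The closed-form chi-square(2) quantile avoids any dependency beyond the standard library.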

A Modified Information Criterion in the 1d Fused Lasso for DNA Copy Number Variant Detection using Next Generation Sequencing Data

Lee, Jaeeun (8/3/2017)

DNA copy number variations (CNVs) are associated with many human diseases. Recently, CNV studies have been carried out using next-generation sequencing (NGS) technology, which produces millions of short reads. With NGS read-ratio data, we use 1d fused lasso regression for CNV detection. Given the number of copy number changes, the corresponding genomic locations are estimated by fitting the 1d fused lasso. The estimated number of copy number changes depends on a tuning parameter in the 1d fused lasso. In this dissertation, we propose a new modified Bayesian information criterion, called JMIC, to estimate the optimal tuning parameter in the 1d fused lasso. In theoretical studies, we prove that the number of change points estimated by JMIC converges to the true number of changes. Our simulation studies also show that JMIC outperforms the other criteria considered. Finally, we apply the proposed method to read-ratio data from the breast tumor cell line HCC1954 and its matched cell line provided by Chiang et al. (2009).
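
The role of an information criterion in choosing the number of change points can be illustrated with a simplified stand-in. The sketch below does not fit the 1d fused lasso and does not implement JMIC: it computes exact least-squares piecewise-constant fits by dynamic programming and selects the number of changes with a standard BIC, counting k change locations plus k + 1 segment means as parameters (an assumption).

```python
import math


def make_sse(y):
    """Return sse(i, j): residual sum of squares of y[i:j] about its mean, in O(1)."""
    s, s2 = [0.0], [0.0]
    for v in y:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return sse


def best_rss(y, max_changes):
    """rss[k] = smallest RSS of a piecewise-constant fit with exactly k change points."""
    n, sse = len(y), make_sse(y)
    inf = float("inf")
    # dp[k][j]: best RSS for the prefix y[:j] split into k + 1 segments
    dp = [[inf] * (n + 1) for _ in range(max_changes + 1)]
    for j in range(1, n + 1):
        dp[0][j] = sse(0, j)
    for k in range(1, max_changes + 1):
        for j in range(k + 1, n + 1):
            dp[k][j] = min(dp[k - 1][i] + sse(i, j) for i in range(k, j))
    return [dp[k][n] for k in range(max_changes + 1)]


def bic_num_changes(y, max_changes):
    """Number of changes k minimizing n * log(RSS_k / n) + (2k + 1) * log(n)."""
    n = len(y)
    rss = best_rss(y, max_changes)

    def bic(k):
        # Clamp RSS away from zero so perfectly fitting segmentations stay finite
        return n * math.log(max(rss[k], 1e-12) / n) + (2 * k + 1) * math.log(n)

    return min(range(max_changes + 1), key=bic)
```

As in the fused lasso setting, the penalty term is what stops the criterion from adding change points that only chase noise.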

Penalized Least Squares and the Algebraic Statistical Model for Biochemical Reaction Networks

Linder, Daniel F. II (2013-07)

Systems biology seeks to understand the formation of macro structures, such as cellular processes and higher-level cellular phenomena, by investigating the interactions of a system’s individual components. For cellular biology, the goal is to understand the dynamic behavior of biological materials within the cell, a container of smaller materials such as mRNA, proteins, enzymes, and other intermediates necessary for regulating intracellular functions and chemical species levels. Understanding these cellular dynamics is needed to help develop new drug therapies that can be targeted to specific molecules or genes in order to perturb the system toward a desired result. In this work, we develop inferential procedures to estimate reaction rate coefficients in cellular systems of ordinary differential equations (ODEs) from noisy data arising from realizations of molecular trajectories. These systems are assumed to obey the so-called chemical law of mass action kinetics, with a corresponding deterministic mass action limit as the system size becomes infinite. Estimation and inference are based on penalized least squares estimates, whose covariance structure corresponds to the solution of a system of coupled nonautonomous ODEs. Another topic discussed here is network topology estimation. The algebraic statistical model (ASM) offers a means of performing this topological inference for the special class of conic networks. We prove that the ASM recovers the true network topology as the number of samples grows without bound, a property known in the literature as sparsistency. We propose a method to extend the ASM to a wider class of networks that are decomposable into multiple cones.
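
As a small illustration of rate estimation under mass action kinetics, the sketch below fits the rate constant k of a single hypothetical reaction A + B → C to concentrations of A by ordinary least squares, using an RK4 solution of the deterministic ODE. It omits the penalty and the covariance computation described above; the reaction, parameter names, and search bounds are assumptions.

```python
import math


def simulate_a(k, a0, b0, times, steps_per_unit=200):
    """Concentration of A at each time in `times` (nondecreasing, >= 0) for the
    mass action reaction A + B -> C, i.e. dA/dt = -k * A * B, solved with RK4.
    A and B are consumed 1:1, so B(t) = A(t) + (b0 - a0)."""
    shift = b0 - a0

    def rate(a):
        return -k * a * (a + shift)

    out, a, t = [], a0, 0.0
    for t_next in times:
        n = max(1, round((t_next - t) * steps_per_unit))
        h = (t_next - t) / n
        for _ in range(n):
            r1 = rate(a)
            r2 = rate(a + h * r1 / 2)
            r3 = rate(a + h * r2 / 2)
            r4 = rate(a + h * r3)
            a += h * (r1 + 2 * r2 + 2 * r3 + r4) / 6
        t = t_next
        out.append(a)
    return out


def sum_sq_error(k, a0, b0, times, observed):
    predicted = simulate_a(k, a0, b0, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_rate(a0, b0, times, observed, lo=1e-6, hi=10.0, tol=1e-8):
    """Least squares estimate of k by golden-section search on [lo, hi]."""
    def loss(k):
        return sum_sq_error(k, a0, b0, times, observed)

    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = loss(x1), loss(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = loss(x1)
        else:  # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = loss(x2)
    return (lo + hi) / 2
```

Golden-section search suits this one-parameter case because each predicted concentration decreases monotonically in k, so the squared error has a single minimum.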

Bayesian Functional Clustering and VMR Identification in Methylation Microarray Data

Campbell, Jeff (2015-07)

The relation between DNA and health and disease has attracted a great deal of time, energy, and money over the years. As scientific knowledge has accumulated, it has become clear that this relation depends not only on the sequence of nucleotide bases but also on stable modifications of DNA that affect transcription and thus have a macroscopic effect on an individual. The study of modifications to DNA is known as epigenetics. Epigenetic changes have been shown to play a role in certain diseases, including cancer (Novak 2004). Finding locations of differential methylation between two groups of cells is an ongoing area of research in both the biological sciences and bioinformatics. Few statistical methods have been developed for establishing differential DNA methylation between two groups (Bock 2012). Many existing methods were developed for next-generation sequencing data and may not work for microarray data, and vice versa. Bisulfite sequencing, the next-generation sequencing technique for obtaining methylation data, often comes with limited sample sizes and requires accounting for low and variable coverage and for smoothing of methylation values. In addition, these methods can be sensitive to how individual CpG sites are grouped together into regions for analysis: if the differentially methylated regions (DMRs) are small relative to the sizes of established regions, a method may fail to detect a region as differentially methylated. Robust methods for clustering microarray data have also been an ongoing area of research. A method applicable to microarray data could increase the sample size and mitigate these problems, provided it is robust to missing values, outliers, and microarray noise. Functional clustering has been shown to be effective when properly conducted on gene expression data; when the data include temporal measurements, it can be used to identify genes that are possibly co-expressed. Clustering of methylation data has also been shown to identify epigenetic subgroups that can potentially be very useful (Wang, 2011). [introduction]
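
The sensitivity to how CpGs are grouped can be seen with a simple distance-based rule, sketched here under the assumption that a new region starts whenever consecutive CpGs are more than `max_gap` base pairs apart (a hypothetical parameter, not taken from the dissertation).

```python
def group_cpgs(positions, max_gap):
    """Group CpG positions into regions: sort them, then start a new region
    whenever the distance to the previous CpG exceeds max_gap."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][-1] <= max_gap:
            regions[-1].append(pos)
        else:
            regions.append([pos])
    return regions
```

For the positions `[100, 150, 180, 1500, 1520, 5000]`, `max_gap=300` yields three regions, while `max_gap=2000` merges the first five CpGs into one region; a small DMR inside such a merged region can be diluted.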

A modified bump hunting approach with correlation-adjusted kernel weight for detecting differentially methylated regions on the 450K array

Daniel, Jeannie T (8/3/2017)

DNA methylation plays an important role in the regulation of gene expression, as hypermethylation is associated with gene silencing. The general purpose of this dissertation is to develop a statistical method, called DMR Detector, for detecting differentially methylated regions (DMRs) on the 450K array. DMR Detector makes three key modifications to an existing method called Bumphunter. The first concerns which statistic to collect from the initial fit for further analysis. The second is to perform kernel smoothing under the assumption of correlated errors, using a newly proposed correlation-adjusted kernel weight. The third concerns how regions of interest are defined. In simulation, the method showed high power comparable to Bumphunter, with a consistently lower family-wise type I error rate, controlled well below the 0.1 FDR. When applied to real data, DMR Detector detected one DMR that was not detected by Bumphunter.
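
The bump hunting idea that DMR Detector modifies can be sketched as follows: smooth per-CpG statistics with a kernel, then report maximal runs of CpGs whose smoothed statistic exceeds a cutoff in the same direction. This sketch uses ordinary Gaussian kernel weights that ignore correlation, so it reflects the baseline idea rather than the proposed correlation-adjusted weight. The bandwidth and cutoff are hypothetical parameters, and significance assessment is omitted.

```python
import math


def kernel_smooth(positions, stats, bandwidth):
    """Gaussian-kernel weighted average of per-CpG statistics at each position
    (ordinary weights; correlation between CpGs is ignored)."""
    smoothed = []
    for p in positions:
        w = [math.exp(-0.5 * ((q - p) / bandwidth) ** 2) for q in positions]
        smoothed.append(sum(wi * s for wi, s in zip(w, stats)) / sum(w))
    return smoothed


def find_bumps(smoothed, cutoff):
    """Maximal runs (start, end), inclusive indices, where the smoothed statistic
    stays above cutoff or stays below -cutoff."""
    bumps, start, cur = [], None, 0
    for i, s in enumerate(smoothed):
        sign = 1 if s > cutoff else (-1 if s < -cutoff else 0)
        if sign != cur:
            if cur != 0:
                bumps.append((start, i - 1))
            start, cur = i, sign
    if cur != 0:
        bumps.append((start, len(smoothed) - 1))
    return bumps
```

A larger bandwidth borrows more strength from neighboring CpGs but also spreads a signal beyond its true boundaries, which is why the choice of weights and region definition matters.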

A resampling method of time course gene expression data for gene network inference

Garren, Jeonifer Margaret (2015)

Manipulating cellular functions may aid in the treatment and/or cure of a disease. Thus, identifying the topology of a gene regulatory network (GRN) and the molecular role of each gene is essential. Discovering GRNs from gene expression data is hampered by intrinsic attributes of the data: a small sample size n, a large number of variables (genes) p, and an unknown error structure. Numerous theoretical approaches to GRN inference attempt to overcome these difficulties; however, most of them either provide only point estimators, such as coefficient estimators, or make numerous assumptions that are often incompatible with the data. Furthermore, these different solutions cause GRN inference methods to give inconsistent results. This dissertation proposes a resampling method for time-course gene expression data that provides interval estimators for existing GRN inference methods without any distributional assumptions, via bootstrapping and a statistical model that considers components of the data structure such as trends in gene expression, errors in time-course data, and correlation between genes. This method will produce more precise GRNs that are consistent with the observed gene expression data. Furthermore, by applying our method to multiple existing GRN inference methods, the networks obtained from different inference methods can be combined using the joint confidence region for their parameters. Thus, this method can be used to validate identified networks and GRN inference methods.
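
A minimal sketch of bootstrap interval estimation for a single network edge, assuming a linear lagged model in which a regulator's expression at time t predicts its target's expression at time t + 1. It resamples residuals independently, so unlike the proposed method it ignores trends, autocorrelated errors, and correlation between genes; the model and names are illustrative.

```python
import random


def ols(x, y):
    """Intercept and slope of the least squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def bootstrap_lag_slope(regulator, target, n_boot=2000, alpha=0.05, seed=1):
    """Point estimate and percentile bootstrap interval for the slope of
    target(t + 1) on regulator(t), resampling residuals with replacement."""
    x, y = regulator[:-1], target[1:]
    a, b = ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rng = random.Random(seed)  # seeded for reproducibility
    slopes = []
    for _ in range(n_boot):
        y_star = [a + b * xi + rng.choice(resid) for xi in x]
        slopes.append(ols(x, y_star)[1])
    slopes.sort()
    lo = slopes[round(alpha / 2 * (n_boot - 1))]
    hi = slopes[round((1 - alpha / 2) * (n_boot - 1))]
    return b, lo, hi
```

An interval that excludes zero would support keeping the edge; comparing such intervals across inference methods is the kind of validation the abstract describes.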