• False coverage rate-adjusted smoothed bootstrap simultaneous confidence intervals for selected parameters

      Sun, Jing; Department of Biostatistics and Epidemiology (Augusta University, 2020-05)
      Many modern applications involve a large number of populations with high-dimensional parameters. Because there are so many parameters, researchers often draw inferences only about the most significant ones, which are called selected parameters. Benjamini and Yekutieli (2005) proposed the false coverage-statement rate (FCR) method for multiplicity correction when constructing confidence intervals for only the selected parameters. The FCR for confidence intervals parallels the false discovery rate for multiple hypothesis testing. In practice, FCR-adjusted approximate confidence intervals for selected parameters are typically constructed using either the bootstrap method or the normal approximation method. However, these approximate confidence intervals show a higher FCR for small and moderate sample sizes. Therefore, we suggest a novel procedure to construct simultaneous confidence intervals for the selected parameters by using a smoothed bootstrap procedure based on a kernel density estimator. A pertinent problem associated with the smoothed bootstrap approach is how to choose the unknown bandwidth in some optimal sense. We derive an optimal choice for the bandwidth and show that the resulting smoothed bootstrap confidence intervals asymptotically give better control of the FCR than their competitors. We further show that the suggested smoothed bootstrap simultaneous confidence intervals are FCR-consistent if the dimension of the data grows no faster than N^(3/2). The finite-sample performance of our method is illustrated through empirical studies, which show that the proposed method can be successfully applied in practice.
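For orientation, the normal-approximation baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the Benjamini and Yekutieli (2005) adjustment, not the smoothed bootstrap procedure of the dissertation. It assumes that parameters are selected by the Benjamini-Hochberg step-up rule applied to two-sided z-tests of zero, and that each of the R selected parameters then receives a marginal interval at level 1 - R*q/m. The function name `fcr_adjusted_intervals` and its arguments are hypothetical.

```python
from statistics import NormalDist


def fcr_adjusted_intervals(estimates, std_errors, q=0.05):
    """Sketch of FCR-adjusted normal intervals (assumed BH selection).

    Returns a dict mapping the index of each selected parameter to a
    (lower, upper) interval at marginal level 1 - R*q/m.
    """
    m = len(estimates)
    nd = NormalDist()

    # Two-sided p-values for H0: theta_i = 0 under a normal approximation.
    pvals = [2 * (1 - nd.cdf(abs(e / s))) for e, s in zip(estimates, std_errors)]

    # Benjamini-Hochberg step-up: R is the largest rank k with p_(k) <= k*q/m.
    order = sorted(range(m), key=lambda i: pvals[i])
    r = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            r = rank
    if r == 0:
        return {}  # nothing selected, so no intervals are reported

    # Each selected parameter gets a two-sided interval at level 1 - r*q/m,
    # which is wider than the unadjusted 1 - q interval whenever r < m.
    z = nd.inv_cdf(1 - r * q / (2 * m))
    return {
        i: (estimates[i] - z * std_errors[i], estimates[i] + z * std_errors[i])
        for i in sorted(order[:r])
    }
```

With four estimates of which two are clearly nonzero, the two selected intervals use a critical value of about 2.24 rather than 1.96, which is the widening that controls the FCR at level q.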
    • Multivariate Poisson Abundance Models for Analyzing Antigen Receptor Data

      Greene, Joshua C.; Department of Biostatistics and Epidemiology (2013-05)
      Antigen receptor data is an important source of information for immunologists that is highly statistically challenging to analyze due to the presence of a huge number of T-cell receptors in mammalian immune systems and the severe undersampling bias associated with the commonly used data collection procedures. Many important immunological questions can be stated in terms of richness and diversity of T-cell subsets under various experimental conditions. This dissertation presents a class of parametric models and uses a special case of them to compare the richness and diversity of antigen receptor populations in mammalian T-cells. The parametric models are based on a representation of the observed receptor counts as a multivariate Poisson abundance model (mPAM). A Bayesian model fitting procedure is developed which allows fitting of the mPAM parameters with the help of the complete likelihood as opposed to its conditional version which was used previously. The new procedure is shown to be often considerably more efficient (as measured by the amount of Fisher information) in the regions of the mPAM parameter space relevant to modeling T-cell data. A richness estimator based on the special case of the mPAM is shown to be superior to several existing richness estimators from the statistical ecology literature under the severe undersampling conditions encountered in antigen receptor data collection. The comparative diversity analyses based on the mPAM special case yield biologically meaningful results when applied to the T-cell receptor repertoires in mice. It is also shown that the amount of time to implement the Bayesian model fitting procedure for the mPAM special case scales well as the dimension increases and that the amount of computational resources required to conduct complete statistical analyses for the mPAM special case can be drastically lower for our Bayesian model fitting procedure than for code based on the conditional likelihood approach.
    • A New Method For Analyzing 1:N Matched Case Control Studies With Incomplete Data

      Jin, Chan; Department of Biostatistics and Epidemiology (5/8/2017)
      1:n matched case-control studies are commonly used to evaluate the association between exposure to a risk factor and a disease, where one case is matched to up to n controls. The odds ratio is typically used to quantify this association. Difficulties in estimating the true odds ratio arise when the exposure status is unknown for at least one individual in a group. In the case where the exposure status is known for all individuals in a group, the true odds ratio is estimated as the ratio of the counts in the discordant cells of the observed two-by-two table. In the case where all data are independent, the odds ratio is estimated using the cross-product ratio from the observed table. Conditional logistic regression estimates are used for incompletely matched data. In this dissertation we suggest a simple method for estimating the odds ratio when the sample consists of a combination of paired and unpaired observations, with 1:n matching. This method uses a weighted average of the odds ratio calculations described above. This dissertation compares the new method to existing methods via simulation.
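The two odds ratio calculations described in this abstract can be sketched for the simplest 1:1 case as follows. The function names and argument names are hypothetical, and the `weight` argument is only a placeholder: the abstract does not state how the dissertation chooses its weights, so the convex combination below illustrates the idea of a weighted average and is not the author's estimator.

```python
def matched_pair_odds_ratio(case_only_exposed, control_only_exposed):
    """Odds ratio from fully observed 1:1 matched pairs.

    Uses only the discordant pairs: pairs where the case alone is exposed,
    divided by pairs where the control alone is exposed.
    Undefined (ZeroDivisionError) if control_only_exposed is 0.
    """
    return case_only_exposed / control_only_exposed


def cross_product_odds_ratio(a, b, c, d):
    """Odds ratio from an unmatched 2x2 table of independent observations.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    Undefined (ZeroDivisionError) if b or c is 0.
    """
    return (a * d) / (b * c)


def weighted_odds_ratio(or_paired, or_unpaired, weight):
    """Convex combination of the two estimates.

    `weight` is an illustrative assumption supplied by the caller; it is
    not the weighting scheme derived in the dissertation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * or_paired + (1.0 - weight) * or_unpaired
```

For example, 30 pairs with only the case exposed and 10 pairs with only the control exposed give a matched-pair odds ratio of 3.0, while an unmatched table with a=20, b=10, c=10, d=20 gives a cross-product ratio of 4.0.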
    • Statistical Methods for Reaction Networks

      Odubote, Oluseyi Samuel; Department of Biostatistics and Epidemiology
      Stochastic reaction networks are important tools for modeling many biological phenomena, and understanding these networks is important in a wide variety of applied research, such as disease treatment and drug development. Statistical inference about the structure and parameters of reaction networks, sometimes referred to in this setting as model calibration, is often challenging due to intractable likelihoods. Here we use an idea similar to that of generalized estimating equations (GEE), which in this context are the so-called martingale estimating equations, to estimate the reaction rates of the network. The variance component is estimated using the approximate variance under the linear noise approximation, which is based on partial differential equations, the Fokker-Planck equations, that approximate the exact chemical master equation. The method is applied to data from the plague outbreak at Eyam, England, from 1665 to 1666 and to COVID-19 pandemic data. We show empirically that the proposed method gives good estimates of the parameters in a large volume setting and works well in small volume settings.