• Correlation Coefficient Inference for Left-Censored Biomarker Data with Known Detection Limits

      McCracken, Courtney Elizabeth; Department of Biostatistics and Epidemiology (2013-05)
      Researchers are often interested in the relationship between biological concentrations obtained from two different assays, both of which may be biomarkers. Despite continuing advances in biotechnology, the value of a biomarker may fall below a known limit of detection (LOD). Such values are referred to as non-detects (NDs) and can be treated as left-censored observations. When both concentrations are subject to NDs, measuring their association raises serious complications for the data analysis. Simple substitution, random imputation, and maximum likelihood estimation are among the methods that have been proposed for estimating the correlation between two variables that are both subject to left-censoring. Unfortunately, many popular methods require either that the data follow a bivariate normal distribution or that only a small percentage of each variable's values fall below the LOD, and these assumptions are often violated with biomarker data. In this paper, we evaluate seven methods, including Spearman's rho, for estimating the correlation, ρ, between two left-censored variables when the data do not follow a bivariate normal distribution and when one or both variables have moderate to large censoring proportions. We assess performance using bias, median absolute deviation, 95% confidence interval width, and coverage probability across a range of sample sizes, correlations, and censoring proportions. We show that substitution and imputation methods yield biased estimates of ρ and below-nominal coverage probability under most of the simulation settings we examined. We recommend the maximum likelihood method for general use, even when the data depart substantially from bivariate normality.
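To make the setting concrete, here is a minimal Python sketch of two of the simpler approaches mentioned above: simple substitution (replacing each ND with LOD/√2, one common choice) followed by a Pearson correlation on the log scale, and Spearman's rho computed with all NDs tied at the lowest ranks. It is an illustration under stated assumptions, not the simulation design or the recommended maximum likelihood method of this work. The assumptions are that concentrations are lognormal (their logs are bivariate normal with correlation `rho`) and that the LOD is set to censor a chosen fraction of each sample. All helper names (`simulate_lognormal_pair`, `lod_for_proportion`, `substitute`, and so on) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks; tied values (e.g. all NDs) share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lod_for_proportion(values, p):
    """Hypothetical helper: an LOD that censors a fraction p of the sample."""
    return sorted(values)[int(p * len(values))]


def substitute(values, lod, fill):
    """Simple substitution: every non-detect (value < lod) becomes fill."""
    return [fill if v < lod else v for v in values]


def simulate_lognormal_pair(n, rho, seed):
    """Assumed model: log-concentrations are bivariate standard normal with correlation rho."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(math.exp(z1))
        y.append(math.exp(rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return x, y


if __name__ == "__main__":
    x, y = simulate_lognormal_pair(n=200, rho=0.6, seed=1)
    lod_x, lod_y = lod_for_proportion(x, 0.4), lod_for_proportion(y, 0.4)
    # Pearson on the log scale, before censoring and after LOD/sqrt(2) substitution.
    full = pearson([math.log(v) for v in x], [math.log(v) for v in y])
    xs = substitute(x, lod_x, lod_x / math.sqrt(2))
    ys = substitute(y, lod_y, lod_y / math.sqrt(2))
    sub = pearson([math.log(v) for v in xs], [math.log(v) for v in ys])
    # Because the fill value is below every detected value, the NDs in each
    # variable tie at the bottom ranks when Spearman's rho is computed.
    rho_s = spearman(xs, ys)
    print(f"uncensored Pearson:               {full:.3f}")
    print(f"LOD/sqrt(2) substitution Pearson: {sub:.3f}")
    print(f"Spearman on censored data:        {rho_s:.3f}")
```

Spearman's rho estimates a rank correlation rather than ρ itself. Under bivariate normality, its population value is (6/π)·arcsin(ρ/2), so the printed Spearman value should not be read as a direct estimate of ρ. A maximum likelihood approach would instead treat each ND as contributing a normal CDF term, not a fixed value, to the likelihood.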