Biostatistics is the field of statistics applied to biology, medicine, and public health. The aim of biostatistics is to better understand the factors that affect human health through judicious use of statistical methods. Epidemiology is the study of the distribution and patterns of disease and injury in human populations and the application of this study to improve human health.

This collection contains the scholarly works of faculty in the Department of Biostatistics and Epidemiology.

Recent Submissions

  • Maternal Health Literacy Progression Among Rural Perinatal Women

    Mobley, Sandra C.; Thomas, Suzanne Dixson; Sutherland, Donald E.; Hudgins, Jodi; Ange, Brittany L.; Johnson, Maribeth H.; Department of Obstetrics and Gynecology (Springer, 2014-01-28)
    This research examined changes in maternal health literacy progression among 106 low-income, high-risk, rural perinatal African American and White women who received home visits from Registered Nurse Case Managers through the Enterprise Community Healthy Start Program. Maternal health literacy progression would enable women to better address intermediate factors in their lives that affected birth outcomes and, ultimately, infant mortality (Lu and Halfon in Mater Child Health J 7(1):13-30, 2003; Sharma et al. in J Natl Med Assoc 86(11):857-860, 1994). The Life Skills Progression Instrument (LSP) (Wollesen and Peifer, in Life skills progression. An outcome and intervention planning instrument for use with families at risk. Paul H. Brookes Publishing Co., Baltimore, 2006) measured changes in behaviors that represented intermediate factors in birth outcomes. Maternal Health Care Literacy (LSP/M-HCL) was a woman's use of information, critical thinking, and health care services; Maternal Self Care Literacy (LSP/M-SCL) was a woman's management of personal and child health at home (Smith and Moore in Health literacy and depression in the context of home visitation. Mater Child Health J, 2011). Adequacy was defined as a score of ≥4. Among the 106 women in the study, initial scores were inadequate (<4) on the LSP/M-HCL for 83% and on the LSP/M-SCL for 30%. Using McNemar's test of gain scores, significant positive changes in maternal health literacy progression were noted from the initial prenatal assessment to the first postpartum assessment (p < .01) and to the final postpartum assessment (p < .01). Numeric comparison of first and last gain scores indicated that women's scores progressed on both the LSP/M-HCL (p < .0001) and the LSP/M-SCL (p < .0001). Elevated depression scores were most frequent among women with LSP/M-HCL scores <4 and/or LSP/M-SCL scores <4. Visit notes indicated that lack or loss of a relationship with the baby's father and intimate partner discord contributed to higher depression scores.
  • A gene-based approach for testing association of rare alleles

    Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2011-11-29)
    Rare genetic variants have been shown to be important in susceptibility to common human diseases, and methods for detecting association of rare genetic variants are drawing much attention. In this report, we applied a gene-based approach to the 200 simulated data sets of unrelated individuals. The test can detect the association of some genes with multiple rare variants.
  • Epigenetic Silencing of Nucleolar rRNA Genes in Alzheimer's Disease

    Pietrzak, Maciej; Rempala, Grzegorz A.; Nelson, Peter T.; Zheng, Jing-Juan; Hetman, Michal; Department of Biostatistics and Epidemiology (2011-07-22)
    Background: Ribosomal deficits are documented in mild cognitive impairment (MCI), which often represents an early stage of Alzheimer's disease (AD), as well as in advanced AD. The nucleolar rRNA genes (rDNA), whose transcription is critical for ribosomal biogenesis, are regulated by epigenetic silencing, including promoter CpG methylation.
  • COGA phenotypes and linkages on chromosome 2.

    Wiener, Howard W; Go, Rodney C P; Tiwari, Hemant; George, Varghese; Page, Grier P; Department of Biostatistics and Epidemiology (2010-01-19)
    An initial linkage analysis of the alcoholism phenotype as defined by DSM-III-R criteria and alcoholism defined by DSM-IV criteria showed many, sometimes striking, inconsistencies. These inconsistencies are greatly reduced by making the definition of alcoholism more specific. We defined new phenotypes combining the alcoholism definitions and the latent variables: an individual was defined as affected if that individual was alcoholic under one of the definitions (either DSM-III-R or DSM-IV) and had a symptom defined by one of the latent variables. This was done for each of the two alcoholism definitions and five latent variables, selected from canonical discriminant analyses indicating that they formed significant groupings using the electrophysiological variables. We found that linkage analyses using these latent variables were much more robust and consistent than the linkage results based on the DSM-III-R or DSM-IV criteria for alcoholism. We also performed linkage analyses on two phenotypes derived from first principal components, one derived from the electrophysiological variables and the other from the latent variables. A region on chromosome 2 at 250 cM was found to be linked to both of these derived phenotypes. Further examination of the SNPs in this region identified several haplotypes strongly associated with these derived phenotypes.
  • Comparisons of mutation rate variation at genome-wide microsatellites: evolutionary insights from two cultivated rice and their wild relatives.

    Gao, Li-Zhi; Xu, Hongyan; Department of Biostatistics and Epidemiology (2008-02-13)
    BACKGROUND: Mutation rate (μ) per generation per locus is an important parameter in population genetics models. Studies of mutation rate and its variation are important for elucidating the extent and distribution of genetic variation, inferring evolutionary relationships among closely related species, and understanding the genetic variation of genomes. However, patterns of rate variation at microsatellite loci are still poorly understood in plant species. Furthermore, how mutation rates vary among di-, tri-, and tetra-nucleotide repeats within a species is largely uninvestigated across related plant genomes. RESULTS: Genome-wide variation of mutation rates was first investigated by means of the composite population parameter θ (θ = 4Nμ, where N is the effective population size and μ is the mutation rate per locus per generation) in four subspecies of Asian cultivated rice O. sativa and three related species, O. rufipogon, O. glaberrima, and O. officinalis. On the basis of three data sets of microsatellite allele frequencies throughout the genome, the population mutation rate (θ) was estimated for each locus. Our results reveal that the variation of population mutation rates at microsatellites within each studied species or subspecies of cultivated rice can be approximated with a gamma distribution. The mean population mutation rates of microsatellites do not differ significantly among di-, tri-, and tetra-nucleotide repeat motifs for the studied rice species. The shape parameter was also estimated for each subspecies of rice as well as for the related rice species. The different subspecies of O. sativa possess similar shape parameters (α) of the gamma distribution, whereas the other species vary extensively in their population mutation rates. CONCLUSION: Through the analysis of genome-wide microsatellite data, the population mutation rate can be approximately fitted with a gamma distribution in most of the studied species. In general, different population histories along different lineages may account for the observed variation of population mutation rates at microsatellites among the studied Oryza species.
  • Ranking analysis of F-statistics for microarray data.

    Tan, Yuan-De; Fornage, Myriam; Xu, Hongyan; Department of Biostatistics and Epidemiology (2008-04-15)
    BACKGROUND: Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identifying genes differentially expressed in microarray experiments is challenging because of the potentially high type I error rate. Methods for large-scale statistical analyses have been developed, but most of them are applicable only to two-sample or two-condition data. RESULTS: We developed a large-scale multiple-group F-test based method, named ranking analysis of F-statistics (RAF), which is an extension of ranking analysis of microarray data (RAM) for the two-sample t-test. In this method, we proposed a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate. Simulation results suggested that the method is more efficient at finding differentially expressed genes among multiple classes, at a lower false discovery rate, than some commonly used methods. By applying our method to experimental data, we found 107 genes with significantly different expression among 4 treatments at <0.7% FDR; of these, 31 are expressed sequence tags (ESTs) and 76 are unique genes with known functions in the brain or central nervous system, belonging to six major functional groups. CONCLUSION: Our method is suitable for identifying differentially expressed genes among multiple groups, particularly when the sample size is small.
  • A new measure of population structure using multiple single nucleotide polymorphisms and its relationship with FST.

    Xu, Hongyan; Sarkar, Bayazid; George, Varghese; Department of Biostatistics and Epidemiology (2009-03-16)
    BACKGROUND: Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. Population structure is a potential problem whose effects on genetic association studies are controversial. The first step to systematically quantify the effects of population structure is to choose an appropriate measure of population structure for human data. The commonly used measure is Wright's FST. For a set of subpopulations, a single value of FST is generally assumed; however, the estimates can differ between loci. Since population structure is a concept at the population level, a measure of population structure that uses information across loci would be desirable. FINDINGS: In this study we propose a C parameter adjusted for the sample size of each subpopulation. The new measure C is based on the c parameter proposed for SNP data, which was assumed to be subpopulation-specific and common to all loci. We performed extensive simulations of samples with varying levels of population structure to investigate the properties of, and relationship between, the two measures, and found that they generally agree well. CONCLUSION: The new measure simultaneously uses marker information across the genome. It has the advantage of easy interpretation as a single measure of population structure, and it can also assess population differentiation.
  • Family-based genome-wide association study for simulated data of Framingham Heart Study.

    Xu, Hongyan; Mathew, George; George, Varghese; Department of Biostatistics and Epidemiology (2009-12-18)
    ABSTRACT: Genome-wide association studies (GWAS) have quickly become the norm in dissecting the genetic basis of complex diseases. Family-based association approaches have the advantage of being robust to possible hidden population structure in samples. Most of these methods were developed for a limited number of markers, so their applicability and performance for GWAS need to be examined. In this report, we evaluated the properties of the family-based association method implemented by ASSOC in the S.A.G.E. package using the simulated data sets for the Framingham Heart Study, and found that ASSOC is a highly useful tool for GWAS.
  • A new transmission test for affected sib-pair families.

    Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2008-05-09)
    Family-based association approaches such as the transmission-disequilibrium test (TDT) are used extensively in the study of genetic traits because they are generally robust to the presence of population structure. However, these approaches necessarily involve recruiting families, which is more costly and time-consuming than sampling unrelated individuals as in population-based approaches. A family-based approach with high power would therefore be appealing, because the smaller sample size needed to attain adequate power saves time and cost. Here we introduce a new family-based transmission test using the joint transmission status from affected sib pairs. We show that by including the transmission status of both siblings, our method gives higher power than the TDT design while maintaining the correct type I error rate. We use the simulated data from affected sib-pair families with rheumatoid arthritis provided by Genetic Analysis Workshop 15 to illustrate our approach.
  • Simultaneous analysis of all single-nucleotide polymorphisms in genome-wide association study of rheumatoid arthritis.

    Mathew, George; Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2009-12-18)
    ABSTRACT: The availability of very large numbers of markers from modern technology has made genome-wide association studies very popular. The usual approach is to test single-nucleotide polymorphisms (SNPs) one at a time for association with disease status. However, marginally significant effects may not be detectable by single-SNP analysis. Simultaneous analysis of SNPs enables detection of even those SNPs with small effects by evaluating the collective impact of several neighboring SNPs. Also, false-positive signals may be weakened by the presence of other neighboring SNPs included in the analysis. We analyzed the North American Rheumatoid Arthritis Consortium data of Genetic Analysis Workshop 16 using HLasso, a new method for simultaneous analysis of SNPs. The simultaneous analysis approach has excellent control of type I error, and many of the previously reported results of single-SNP analyses were confirmed by this approach.
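Two of the abstracts above rely on the same paired-count statistic. The maternal health literacy study reports McNemar's test, and the transmission-disequilibrium test (TDT) used as the comparison in the sib-pair abstract has the same form, (b - c)^2 / (b + c), where b and c are the two kinds of discordant counts (for the TDT, the numbers of heterozygous parents transmitting one allele rather than the other). The following is a minimal sketch of that standard statistic only, not the procedure of any paper listed here; the function name `mcnemar_test` and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def mcnemar_test(b: int, c: int, continuity: bool = False) -> Tuple[float, float]:
    """Return the McNemar chi-square statistic and its 1-df p-value.

    b, c: counts of the two kinds of discordant pairs, e.g. inadequate->adequate
    vs adequate->inadequate, or (for the TDT) allele M1 vs allele M2 transmitted
    by heterozygous parents. Concordant pairs do not enter the statistic.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("at least one discordant pair is required")
    diff = abs(b - c)
    if continuity:
        # Edwards' continuity correction: |b - c| - 1, floored at zero
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only
    stat, p = mcnemar_test(20, 5)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # prints chi2 = 9.00, p = 0.0027
```

Setting `continuity=True` applies Edwards' correction, which is often suggested when b + c is small; an exact binomial test on b out of b + c is the usual alternative in that case.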