    • Anthropometric Predictors of Type 2 Diabetes Among White and Black Adults

      Hardy, Dale S.; Stallings, Davita T.; Garvin, Jane; Gachupin, Francine C.; Xu, Hongyan; Racette, Susan B. (2015-06)
      Objectives: To determine which anthropometric measures best discriminate type 2 diabetes (T2DM) among White and Black males and females, comparing a body shape index (ABSI), body adiposity index (BAI), body mass index (BMI), waist circumference (WC), waist-to-height ratio (WHtR), and waist-to-hip ratio (WHR); and to identify Youden index cut-points for each anthropometric measure.
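      As a minimal sketch of what a Youden index cut-point is (not the authors' analysis code): the Youden index is J = sensitivity + specificity - 1, and the cut-point is the threshold that maximizes J. The function name, the "value >= cut-point means positive" rule, and the waist circumference values below are assumptions made for illustration only.

```python
def youden_cut_point(values, has_disease):
    """Return (cut_point, J) maximizing J = sensitivity + specificity - 1.

    Assumption: a subject is classified positive when value >= cut_point.
    Ties in J are resolved in favor of the smallest cut-point.
    """
    positives = sum(1 for d in has_disease if d)
    negatives = len(has_disease) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one non-case")

    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        # True positives: cases at or above the candidate cut-point.
        tp = sum(1 for v, d in zip(values, has_disease) if d and v >= cut)
        # True negatives: non-cases below the candidate cut-point.
        tn = sum(1 for v, d in zip(values, has_disease) if not d and v < cut)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical waist circumference values (cm) and T2DM status (1 = case).
wc = [80, 85, 90, 95, 100, 105, 110, 115]
t2dm = [0, 0, 0, 1, 0, 1, 1, 1]
cut, j = youden_cut_point(wc, t2dm)
```

      The same function could be applied to each anthropometric measure in turn, within each race and sex group, to compare their best achievable J.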
    • Comparisons of mutation rate variation at genome-wide microsatellites: evolutionary insights from two cultivated rice and their wild relatives.

      Gao, Li-Zhi; Xu, Hongyan; Department of Biostatistics and Epidemiology (2008-02-13)
      BACKGROUND: Mutation rate (mu) per generation per locus is an important parameter in the models of population genetics. Studies on mutation rate and its variation are of significance to elucidate the extent and distribution of genetic variation, further infer evolutionary relationships among closely related species, and deeply understand genetic variation of genomes. However, patterns of rate variation of microsatellite loci are still poorly understood in plant species. Furthermore, how their mutation rates vary in di-, tri-, and tetra-nucleotide repeats within the species is largely uninvestigated across related plant genomes. RESULTS: Genome-wide variation of mutation rates was first investigated by means of the composite population parameter theta (theta = 4Nmu, where N is the effective population size and mu is the mutation rate per locus per generation) in four subspecies of Asian cultivated rice O. sativa and its three related species, O. rufipogon, O. glaberrima, and O. officinalis. On the basis of three data sets of microsatellite allele frequencies throughout the genome, population mutation rate (theta) was estimated for each locus. Our results reveal that the variation of population mutation rates at microsatellites within each studied species or subspecies of cultivated rice can be approximated with a gamma distribution. The mean population mutation rates of microsatellites do not significantly differ in motifs of di-, tri-, and tetra-nucleotide repeats for the studied rice species. The shape parameter was also estimated for each subspecies of rice as well as other related rice species. Among them, the different subspecies of O. sativa possess similar shape parameters (alpha) of the gamma distribution, while other species vary extensively in their population mutation rates. CONCLUSION: Through the analysis of genome-wide microsatellite data, the population mutation rate can be approximately fitted with a gamma distribution in most of the studied species. In general, different population histories that occurred along different lineages may have resulted in the observed variation of population mutation rates at microsatellites among the studied Oryza species.
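      To make the quantities in this abstract concrete, the sketch below computes theta = 4Nmu and fits a gamma distribution to a set of per-locus theta values. The abstract does not say how the shape parameter (alpha) was estimated, so the method-of-moments fit used here is an assumption, as are the function names and the example values.

```python
import statistics


def population_mutation_rate(effective_size, mutation_rate):
    """theta = 4 * N * mu, the composite parameter defined in the abstract."""
    return 4.0 * effective_size * mutation_rate


def gamma_moment_fit(thetas):
    """Method-of-moments gamma fit; returns (shape alpha, scale).

    Uses mean = alpha * scale and variance = alpha * scale**2.
    Assumption: this is a stand-in for whatever estimator the study used.
    """
    mean = statistics.mean(thetas)
    var = statistics.variance(thetas)  # sample variance (n - 1 denominator)
    if mean <= 0 or var <= 0:
        raise ValueError("theta values must be positive and not all equal")
    return mean * mean / var, var / mean


# Hypothetical per-locus theta estimates, for illustration only.
example_thetas = [1.0, 2.0, 3.0, 4.0]
alpha, scale = gamma_moment_fit(example_thetas)
```

      A small alpha corresponds to strongly uneven rates across loci, while a large alpha means the loci have similar rates, which is why comparing alpha between subspecies is informative.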
    • Family-based genome-wide association study for simulated data of Framingham Heart Study.

      Xu, Hongyan; Mathew, George; George, Varghese; Department of Biostatistics and Epidemiology (2009-12-18)
      Genome-wide association studies (GWAS) have quickly become the norm in dissecting the genetic basis of complex diseases. Family-based association approaches have the advantage of being robust to possible hidden population structure in samples. Most of these methods were developed with limited markers, so their applicability and performance for GWAS need to be examined. In this report, we evaluated the properties of the family-based association method implemented by ASSOC in the S.A.G.E. package using the simulated data sets for the Framingham Heart Study, and found that ASSOC is a highly useful tool for GWAS.
    • A gene-based approach for testing association of rare alleles

      Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2011-11-29)
      Rare genetic variants have been shown to be important to the susceptibility of common human diseases. Methods for detecting association of rare genetic variants are drawing much attention. In this report, we applied a gene-based approach to the 200 simulated data sets of unrelated individuals. The test can detect the association of some genes with multiple rare variants.
    • A Monte Carlo test of linkage disequilibrium for single nucleotide polymorphisms

      Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2011-04-14)
      Background: Genetic association studies, especially genome-wide studies, make use of linkage disequilibrium (LD) information between single nucleotide polymorphisms (SNPs). LD is also used for studying genome structure and has been valuable for evolutionary studies. The strength of LD is commonly measured by r², a statistic closely related to Pearson's χ² statistic. However, the computation and testing of linkage disequilibrium using r² requires known haplotype counts of the SNP pair, which can be a problem for most population-based studies where the haplotype phase is unknown. Most statistical genetic packages use likelihood-based methods to infer haplotypes. However, the variability of haplotype estimation needs to be accounted for in the test for linkage disequilibrium.
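      The relationship between the r-squared measure and Pearson's chi-square statistic mentioned above can be sketched as follows for the case where haplotype counts are known. This illustrates only the standard definition, not the Monte Carlo test proposed in the paper; the function name and the example counts are assumptions.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return (r_squared, chi_square) for two biallelic SNPs.

    Inputs are counts of the four haplotypes (alleles A/a at the first SNP,
    B/b at the second). With n haplotypes in total, chi_square = n * r_squared,
    which is Pearson's statistic for the 2 x 2 haplotype table.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    return r2, n * r2


# Hypothetical haplotype counts, for illustration only.
r2, chi2 = ld_r_squared(40, 10, 10, 40)
```

      When phase is unknown, the four counts above are not observed directly, which is the difficulty the abstract describes.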
    • A new measure of population structure using multiple single nucleotide polymorphisms and its relationship with FST.

      Xu, Hongyan; Sarkar, Bayazid; George, Varghese; Department of Biostatistics and Epidemiology (2009-03-16)
      BACKGROUND: Large-scale genome-wide association studies are promising for unraveling the genetic basis of complex diseases. Population structure is a potential problem, the effects of which on genetic association studies are controversial. The first step to systematically quantify the effects of population structure is to choose an appropriate measure of population structure for human data. The commonly used measure is Wright's FST. For a set of subpopulations, a single value of FST is generally assumed. However, the estimates could be different for distinct loci. Since population structure is a concept at the population level, a measure of population structure that utilizes the information across loci would be desirable. FINDINGS: In this study we propose an adjusted C parameter according to the sample size from each sub-population. The new measure C is based on the c parameter proposed for SNP data, which was assumed to be subpopulation-specific and common for all loci. In this study, we performed extensive simulations of samples with varying levels of population structure to investigate the properties and relationships of both measures. It is found that the two measures generally agree well. CONCLUSION: The new measure simultaneously uses the marker information across the genome. It has the advantage of easy interpretation as one measure of population structure and yet can also assess population differentiation.
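      The point that FST estimates can differ between loci can be illustrated with a small sketch. It uses the textbook heterozygosity form FST = (HT - HS) / HT for one biallelic locus and equal-sized subpopulations, which is an assumption made here for simplicity; it does not implement the adjusted C parameter, whose formula is not given in the abstract. The allele frequencies are hypothetical.

```python
def fst_biallelic(allele_freqs):
    """Wright's F_ST for one biallelic locus, as (H_T - H_S) / H_T.

    allele_freqs holds the frequency of one allele in each subpopulation.
    Assumption: subpopulations are weighted equally.
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)            # expected heterozygosity, pooled
    if h_total == 0:
        raise ValueError("locus is monomorphic across all subpopulations")
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / k  # mean within-group
    return (h_total - h_within) / h_total


# Two hypothetical loci in the same pair of subpopulations.
fst_locus_1 = fst_biallelic([0.2, 0.8])   # strongly differentiated locus
fst_locus_2 = fst_biallelic([0.5, 0.5])   # identical frequencies
```

      The two loci give very different values for the same subpopulations, which is the motivation the abstract gives for a measure that pools information across loci.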
    • A new transmission test for affected sib-pair families.

      Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2008-05-09)
      Family-based association approaches such as the transmission-disequilibrium test (TDT) are used extensively in the study of genetic traits because they are generally robust to the presence of population structure. However, these approaches necessarily involve recruitment of families, which is more costly and time-consuming than sampling unrelated individuals in population-based approaches. A family-based approach with high power would therefore be appealing, because it reduces the sample size, and hence the time and cost, required to attain adequate power. Here we introduce a new family-based transmission test using the joint transmission status from affected sib pairs. We show that by including the transmission status of both siblings, our method gives higher power than the TDT design, while maintaining the correct type I error rate. We use the simulated data from affected sib-pair families with rheumatoid arthritis provided by Genetic Analysis Workshop 15 to illustrate our approach.
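      For reference, the classical TDT that the new test is compared against can be sketched as a McNemar-type statistic on transmissions from heterozygous parents. This is only the baseline test; the joint sib-pair statistic proposed in the paper is not specified in the abstract and is not reproduced here. The function name and the counts are assumptions.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Classical TDT: (b - c)^2 / (b + c), with a 1-df chi-square p-value.

    transmitted (b): times heterozygous parents passed the candidate allele
    to an affected child; not_transmitted (c): times they passed the other.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - not_transmitted) ** 2 / total
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical transmission counts, for illustration only.
stat, p_value = tdt_statistic(60, 40)
```

      Under the null hypothesis of no linkage or association, the two transmission counts are expected to be equal, so a large statistic indicates preferential transmission of one allele.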
    • Ranking analysis of F-statistics for microarray data.

      Tan, Yuan-De; Fornage, Myriam; Xu, Hongyan; Department of Biostatistics and Epidemiology (2008-04-15)
      BACKGROUND: Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Methods for large-scale statistical analyses have been developed but most of them are applicable only to two-sample or two-condition data. RESULTS: We developed a large-scale multiple-group F-test based method, named ranking analysis of F-statistics (RAF), which is an extension of ranking analysis of microarray data (RAM) for the two-sample t-test. In this method, we proposed a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate. Simulation results suggested that it has higher efficiency in finding differentially expressed genes among multiple classes at a lower false discovery rate than some commonly used methods. By applying our method to the experimental data, we found 107 genes with significantly differential expression among 4 treatments at <0.7% FDR, of which 31 are expressed sequence tags (ESTs) and 76 are unique genes that have known functions in the brain or central nervous system and belong to six major functional groups. CONCLUSION: Our method is suitable for identifying differentially expressed genes among multiple groups, in particular when sample size is small.
    • Simultaneous analysis of all single-nucleotide polymorphisms in genome-wide association study of rheumatoid arthritis.

      Mathew, George; Xu, Hongyan; George, Varghese; Department of Biostatistics and Epidemiology (2009-12-18)
      The availability of a very large number of markers through modern technology has made genome-wide association studies very popular. The usual approach is to test single-nucleotide polymorphisms (SNPs) one at a time for association with disease status. However, it may not be possible to detect marginally significant effects by single-SNP analysis. Simultaneous analysis of SNPs enables detection of even those SNPs with small effect by evaluating the collective impact of several neighboring SNPs. Also, false-positive signals may be weakened by the presence of other neighboring SNPs included in the analysis. We analyzed the North American Rheumatoid Arthritis Consortium data of Genetic Analysis Workshop 16 using HLasso, a new method for simultaneous analysis of SNPs. The simultaneous analysis approach has excellent control of type I error, and many of the previously reported results of single-SNP analyses were confirmed by this approach.