DataCite Commons: Nonparametric Bayes methods for high dimensional data and group sequential design for longitudinal trials

High-dimensional unordered categorical data arise in many areas, including epidemiology and the behavioral and social sciences. Such data can be placed into a large contingency table with cell counts defined as the number of subjects with a given combination of variable values. The contingency table is often sparse in practice, in the sense that only a few cells have more than a few counts and most cells are empty. Traditional approaches to contingency table analysis do not scale even to moderate dimensions; approaches based on tensor decomposition are a promising alternative. This motivates us to develop sparse tensor decompositions for multivariate categorical variables where the number of variables can potentially exceed the sample size. The methods are shown to have excellent performance in simulations, and results on several data sets are presented.

In paper 2, we consider such high-dimensional data in case-control studies, with the main goal being detection of the sparse subset of predictors having a significant association with disease. We propose a new approach based on a nonparametric Bayesian low rank tensor factorization to model the retrospective likelihood. Our model is highly flexible: it treats the joint distribution of the multivariate predictors as unknown and avoids the linearity assumptions made in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach to screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have higher power and lower false discovery rates in simulation studies relative to existing methods, and we consider an application to an epidemiologic study of birth defects.

In paper 3, our goal is to design a longitudinal trial using a group sequential design. We propose an information-based sample size re-estimation method to update the sample size at each interim analysis, which maintains the target power while controlling the type-I error rate. We illustrate our strategy with data analysis examples and simulations, and compare the results with those obtained using a fixed design and a group sequential design without sample size re-estimation.
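
As a rough illustration of the sparsity described in the abstract, the sketch below counts contingency-table cells for simulated categorical data. It is not taken from the dissertation: the helper names `contingency_cells` and `sparsity_summary`, the choice of 10 variables with 4 levels each, and the 200 simulated subjects are all assumptions made for this example.

```python
from collections import Counter
import random


def contingency_cells(rows):
    """Count subjects per observed combination of variable values."""
    return Counter(tuple(row) for row in rows)


def sparsity_summary(rows, levels):
    """Compare occupied cells with the size of the full contingency table.

    `levels` lists the number of categories of each variable (hypothetical input).
    """
    counts = contingency_cells(rows)
    total_cells = 1
    for n_levels in levels:
        total_cells *= n_levels
    occupied = len(counts)
    return {
        "total_cells": total_cells,
        "occupied_cells": occupied,
        "empty_fraction": 1 - occupied / total_cells,
    }


# Simulated data (assumption): 200 subjects, 10 variables, 4 categories each.
random.seed(0)
levels = [4] * 10
rows = [[random.randrange(n) for n in levels] for _ in range(200)]

summary = sparsity_summary(rows, levels)
print(summary)
```

With these assumed sizes the full table has 4^10 = 1,048,576 cells, while at most 200 of them can be occupied, so almost every cell is empty. Only the observed combinations are stored, which is why a dictionary of counts is used here instead of a dense array.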

Dissertation published 2014 in Odum Institute Dataverse

Text, English

https://doi.org/10.17615/wxwk-3w37
