Data Science Seminar
Hosted by Department of Mathematical Sciences

Abstract

Gene-environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination are commonly encountered in the disease phenotypes of G×E studies, which has led to the development of a broad spectrum of robust regularization methods. Within the Bayesian framework, however, this issue has not been addressed in existing studies. In this talk, I will present a robust Bayesian variable selection method for G×E interaction studies. The proposed method effectively accommodates heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structural sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed on both the individual and group levels to robustly identify important main and interaction effects. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, together with analyses of diabetes data with SNP measurements from the Nurses' Health Study and TCGA melanoma data with gene expression measurements, demonstrate the superior performance of the proposed method over multiple competing alternatives.


Biography of the speaker: Dr. Wu is an Associate Professor of Statistics at Kansas State University and a faculty scientist at the Johnson Cancer Research Center of Kansas State University. His current research is motivated mainly by the data contamination and heavy-tailed distributions that are widespread in disease phenotypes and multi-level omics measurements from cancers and other complex diseases. Tackling these problems in a high-dimensional setting demands robust variable selection methods within both the frequentist and Bayesian frameworks. Dr. Wu’s statistical methodological work includes Bayesian sparse learning, high/ultra-high dimensional robust variable selection, and integrative analysis of cancer genomics data from multiple platforms.