Capstone Seminar

Fall 2017

Abstract: The diabetes data come from a study, conducted at the Stanford Clinical Research Center, of the relationship between three clinical classifications and five measurements on 145 subjects. Classifying patients from these measurements can aid the diagnosis and appropriate treatment of diabetes. Using all five variables, whose multivariate Gaussian distributions are questionable and whose covariance matrices are not plausibly equal, the leave-one-out cross-validation misclassification rates are 11% for LDA and 9.7% for QDA. We therefore discuss LDA via multiple regression and logistic discrimination. Logistic discrimination is a semi-parametric model and, when the Gaussian assumptions do hold, is asymptotically less efficient than Gaussian LDA. Within the logistic discrimination framework, we can apply data transformations, variable selection techniques, polynomial regression, and other powerful methods to improve on the results of LDA and QDA. We then verify that when the Gaussian distributional assumption or the common covariance matrix assumption is not satisfied, logistic discrimination performs much better than Gaussian LDA and is more robust to non-normality.
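The error rates quoted above come from leave-one-out cross-validation of Gaussian discriminant rules. The following is a minimal NumPy sketch of that procedure, assuming Gaussian class densities with either one pooled covariance matrix (LDA) or a separate covariance matrix per class (QDA), and classification by the largest log posterior score. The function names and the simulated data are hypothetical illustrations, not the seminar's code or the Stanford data, so the printed rates will not reproduce 11% and 9.7%.

```python
import numpy as np


def fit_gaussian_classes(X, y, pooled):
    """Estimate class priors, means and covariances.

    pooled=True  -> every class shares the pooled covariance (LDA)
    pooled=False -> each class keeps its own covariance (QDA)
    """
    classes = np.unique(y)
    n, p = X.shape
    priors, means, covs = [], [], []
    pooled_cov = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        priors.append(len(Xk) / n)
        means.append(Xk.mean(axis=0))
        Sk = np.cov(Xk, rowvar=False)  # unbiased estimate, divisor n_k - 1
        covs.append(Sk)
        pooled_cov += (len(Xk) - 1) * Sk
    if pooled:
        pooled_cov /= n - len(classes)
        covs = [pooled_cov] * len(classes)
    return classes, np.array(priors), means, covs


def predict(model, x):
    """Return the class with the largest Gaussian log posterior score."""
    classes, priors, means, covs = model
    scores = []
    for pi, mu, S in zip(priors, means, covs):
        diff = x - mu
        _, logdet = np.linalg.slogdet(S)
        # log prior + Gaussian log-density, dropping the constant shared by all classes
        scores.append(np.log(pi) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(S, diff))
    return classes[int(np.argmax(scores))]


def loocv_error(X, y, pooled):
    """Leave-one-out misclassification rate: refit without case i, then classify case i."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_gaussian_classes(X[keep], y[keep], pooled)
        errors += int(predict(model, X[i]) != y[i])
    return errors / n


if __name__ == "__main__":
    # Hypothetical simulated data: 3 classes, 5 variables, unequal covariances.
    rng = np.random.default_rng(0)
    X_parts, y_parts = [], []
    for k, n_k in enumerate((70, 40, 35)):
        A = rng.normal(size=(5, 5)) * 0.5 * (k + 1)
        cov = A @ A.T + np.eye(5)  # a different covariance for each class
        X_parts.append(rng.multivariate_normal(np.full(5, 1.5 * k), cov, size=n_k))
        y_parts.append(np.full(n_k, k))
    X, y = np.vstack(X_parts), np.concatenate(y_parts)
    print("LDA LOOCV error:", loocv_error(X, y, pooled=True))
    print("QDA LOOCV error:", loocv_error(X, y, pooled=False))
```

Because the pooled covariance is identical for every class, the log-determinant term cancels under LDA and the rule reduces to linear boundaries; under QDA it does not cancel, which is what lets QDA adapt to unequal covariance matrices.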

Spring 2017