Workshop on Statistics and Data Science
There is an increasing demand for statisticians and data scientists in academia, industry, and government sectors. This workshop is organized to help students see a broad range of problems that statisticians can address at the research level and encourage them to pursue a career in those important areas.
With talks from:
Prof. Xuming He, University of Michigan
Prof. Hongyu Zhao, Yale University
Prof. Lan Wang, University of Minnesota
Prof. Feifang Hu, George Washington University
Prof. Yong Zeng, University of Missouri at Kansas City
Prof. Bo Li, University of Illinois at Urbana-Champaign
Prof. Jingfei Zhang, University of Miami
Speakers and Abstracts
Statistical Methods for Genetic Risk Prediction
08:45-09:30; Prof. Hongyu Zhao, Department of Biostatistics, Yale University
Abstract: Accurate prediction of disease risk based on genetic and other factors is an important goal in human genetics research and precision medicine. Well-calibrated prediction models will lead to more effective disease prevention and treatment strategies. Despite the identification of thousands of disease-associated genetic variants through genome-wide association studies (GWAS) in the past decade, the accuracy of genetic risk prediction remains moderate for most diseases, largely because of the challenges in both identifying all the functionally relevant variants and accurately estimating their effect sizes. In this presentation, we will discuss a number of methods developed in recent years to improve prediction accuracy by jointly estimating effect sizes, incorporating functional annotations, and leveraging genetic correlations among complex diseases. We will demonstrate the utility of these methods through their applications to a number of complex diseases in large population cohorts, e.g. the UK Biobank data. This is joint work with Yiming Hu, Qiongshi Lu, Yixuan Ye, and others.
Sparse Concordance-assisted Learning for Optimal Treatment Decision
09:30-10:15; Prof. Lan Wang, Department of Statistics, University of Minnesota
Abstract: To find the optimal decision rule, Fan et al. (2016) proposed an innovative concordance-assisted learning algorithm based on the maximum rank correlation estimator. It makes better use of the available information through pairwise comparisons. However, the objective function is discontinuous and computationally hard to optimize. In this paper, we consider a convex surrogate loss function to solve this problem. In addition, our algorithm ensures sparsity of the decision rule, which makes it easy to interpret. We derive the L2 error bound of the estimated coefficients in the ultra-high-dimensional setting. Simulation results in various settings and an application to a clinical trial for depression treatment both illustrate that the proposed method can still estimate the optimal treatment regime successfully when the number of covariates is large. (Joint work with Shuhan Liang, Wenbin Lu, and Rui Song.)
AI, Big Data, and Data Science
10:40-11:25; Prof. Feifang Hu, Professor of Statistics, George Washington University
Abstract: With modern technology, it becomes easier and easier to collect data (Big Data). Data are not just numbers, but numbers that carry information about a specific setting; they need to be interpreted and help us make decisions (AI). Statisticians (Data Scientists) are experts in: (i) producing useful data; (ii) analyzing data to obtain meaningful results; and (iii) drawing practical conclusions. In the Big Data and AI era, statisticians (Data Scientists) face many new challenges. In this presentation, I will talk about: (1) some new challenges of Big Data and AI; (2) the importance of statistics in analyzing Big Data; (3) the role of statisticians in the new Big Data and AI era. Several examples are used to illustrate the success stories of Data Science in the Big Data and AI era.
Real-time Stochastic Volatility Estimation via Filtering Equation for a Partially-observed Heston Model
11:25-12:10; Prof. Yong Zeng, Department of Mathematics and Statistics, University of Missouri at Kansas City
Abstract: This talk first briefly reviews a general partially-observed framework of Markov processes with marked point process observations, recently proposed for ultra-high frequency data, and the Bayes Estimation via Filtering Equation (BEFE). In recent years, Graphics Processing Units (GPUs) have evolved from rendering graphics (linear algebra-like computations) for electronic games and video applications to becoming low-cost and green supercomputing units. With the aim of harnessing the newly available high-performance computing power of GPUs, and targeting a Heston stochastic volatility model, we develop a new easily-parallelized, uniformly consistent, recursive algorithm via BEFE for propagating and updating the joint posterior distributions. We show that the recursive algorithm is well suited for GPU parallel computing. We present simulation and empirical results obtained from supercomputers to demonstrate that the recursive algorithm works. Real-time tracking and feeding of stochastic volatility is thus made possible. This talk is based on joint work with B. Bundick and J. Yin.
Statistical Developments and Challenges in Past Climate Reconstruction
14:30-15:15; Prof. Bo Li, Department of Statistics, University of Illinois at Urbana-Champaign
Abstract: Understanding the dynamics of climate change in its full richness requires the knowledge of long temperature time series. Although long-term, widely distributed temperature observations are not available, there are other forms of data, known as climate proxies, that can have a statistical relationship with temperatures and have been used to infer temperatures in the past before direct measurements. How to effectively use climate proxies and other available information to recover the past climate is statistically challenging. I will first present some examples of new statistical methods that improve the climate reconstruction compared to the traditional methods and/or help to quantify the uncertainty of the reconstruction more rigorously. The examples are a blend of frequentist and Bayesian methods. After that, I will discuss the remaining statistical challenges in this research area.
Network Response Regression for Modeling Population of Networks with Covariates
15:15-16:00; Prof. Jingfei Zhang, University of Miami Business School
Abstract: Multiple-network data have emerged rapidly in recent years, where a separate network over a common set of nodes is measured for each individual subject, along with rich subject covariate information. Existing network analysis methods have primarily focused on modeling a single network, and are not directly applicable to multiple networks, especially multiple networks with subject covariates. In this work, we propose a new network response regression model, where the observed networks are treated as matrix-valued responses, and the individual covariates as predictors. The new model characterizes the population-level connectivity pattern through a low-rank intercept matrix, and the parsimonious effects of subject covariates on the network through a sparse slope tensor. We formulate the parameter estimation as a non-convex optimization problem, and develop an efficient alternating gradient descent algorithm. We establish the non-asymptotic error bound for the actual estimator from our optimization algorithm. Building upon this error bound, we derive the strong consistency for network community recovery, as well as the edge selection consistency. We demonstrate the efficacy of our method through two brain connectivity studies.
Statistics and Data Science Programs at University of Michigan
16:00-16:45; Prof. Xuming He, Department of Statistics, University of Michigan
Abstract: In this session, I will introduce graduate programs in Statistics and Data Science at the University of Michigan that might be of interest to students at CUHK – Shenzhen. An agreement between CUHK(SZ) and the University of Michigan makes it possible for a student at CUHK(SZ) to spend two years in Michigan after three years in Shenzhen to complete a Bachelor’s degree at CUHK and a Master’s degree from the University of Michigan. I will also introduce two other Master’s programs at Michigan, one in Data Science, and the other in Quantitative Finance and Risk Management. I will address any questions that the students may have about graduate education in statistics and data science at Michigan and other American universities.
---
All are warmly welcome!