Linda Zhao

Professor of Statistics at The Wharton School

Schools

  • The Wharton School

Biography

The Wharton School

After receiving her Ph.D. in Mathematics/Statistics from Cornell University, Linda taught at UCLA for one year. She joined the Wharton School in 1994. She holds a BS degree from the Mathematics Department of Nankai University, China.

Linda’s research covers Bayesian analysis, nonparametric analysis and numerical computation. She publishes mainly in leading international journals. Ongoing projects include forecasting house prices, inference for high-dimensional data, data with measurement errors and post-model-selection inference. Linda also enjoys teaching very much.

Selected Publications

Zhao, L. H. (2000) Bayesian aspects of some nonparametric problems, The Annals of Statistics, 28, 532–552

Mao, V. and Zhao, L. H. (2003) Free knot polynomial splines with confidence intervals, Journal of the Royal Statistical Society, Series B, 65, 901–919

Brown, L. D., Wang, Y. and Zhao, L. H. (2003) On the statistical equivalence at suitable frequencies of GARCH and stochastic volatility models with the corresponding diffusion model, Statistica Sinica, 993–1013

Brown, L. D., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. and Zhao, L. H. (2005) Statistical analysis of a telephone call center: A queueing-science perspective, Journal of the American Statistical Association, 100, 36–50

Cai, T., Low, M. and Zhao, L. H. (2007) Tradeoffs between global and local risks in nonparametric function estimation, Bernoulli, 13, 1–19

Berk, R., Brown, L. D. and Zhao, L. (2010) Statistical inference after model selection, Journal of Quantitative Criminology, 26, 217–236

Raykar, V., Yu, S., Zhao, L., Valadez, G., Florin, C., Bogoni, L. and Moy, L. (2010) Learning from crowds, Journal of Machine Learning Research, 11, 1297–1322

Brown, L. D., Cai, T., Zhang, R., Zhao, L. H. and Zhou, H. (2010) The root-unroot algorithm for density estimation as implemented via wavelet block thresholding, Probability Theory and Related Fields, 146, 401–433

Raykar, V. and Zhao, L. (2010) Nonparametric prior for adaptive sparsity, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR: 629–636

Nagaraja, C. H., Brown, L. D. and Zhao, L. (2010) An autoregressive approach to house price modeling, to appear in The Annals of Applied Statistics

Andreas Buja, Richard A. Berk, Lawrence D. Brown, Edward I. George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao (2015), Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression, Statistical Science, (in press).

Richard A. Berk, Lawrence D. Brown, Andreas Buja, Edward I. George, Emil Pitkin, Kai Zhang, Linda Zhao (2014), Misspecified Mean Function Regression: Making Good Use of Regression Models That Are Wrong, Sociological Methods & Research, 43 (3), pp. 422–445.

Abstract: There are over three decades of largely unrebutted criticism of regression analysis as practiced in the social sciences. Yet, regression analysis broadly construed remains for many the method of choice for characterizing conditional relationships. One possible explanation is that the existing alternatives sometimes can be seen by researchers as unsatisfying. In this paper, we provide a different formulation. We allow the regression model to be incorrect and consider what can be learned nevertheless. To this end, the search for a correct model is abandoned. We offer instead a rigorous way to learn from regression approximations. These approximations, not “the truth,” are the estimation targets. We provide estimators that are asymptotically unbiased and standard errors that are asymptotically correct even when there are important specification errors. Both can be obtained easily from popular statistical packages.

Kai Zhang, Lawrence D. Brown, Edward I. George, Linda Zhao (2014), Uniform Correlation Mixture of Bivariate Normal Distributions and Hypercubically Contoured Densities That Are Marginally Normal, American Statistician, 68 (3), pp. 183–187.

Richard A. Berk, Emil Pitkin, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao (2014), Covariance Adjustments for the Analysis of Randomized Field Experiments, Evaluation Review, 37 (3–4), pp. 170–196.

Abstract: It has become common practice to analyze randomized experiments using linear regression with covariates. Improved precision of treatment effect estimates is the usual motivation. In a series of important articles, David Freedman showed that this approach can be badly flawed. Recent work by Winston Lin offers partial remedies, but important problems remain.

Igar Fuki, Lawrence D. Brown, Xu Han, Linda Zhao (2014), Hunting for Significance: Bayesian Classifiers Under a Mixture Loss Function, Journal of Statistical Planning and Inference, 154, pp. 62–71.

Abstract: Detecting significance in a high-dimensional sparse data structure has received a large amount of attention in modern statistics. In the current paper, we introduce a compound decision rule to simultaneously classify signals from noise. This procedure is a Bayes rule subject to a mixture loss function. The loss function minimizes the number of false discoveries while controlling the false non-discoveries by incorporating the signal strength information. Based on our criterion, strong signals will be penalized more heavily for non-discovery than weak signals. In constructing this classification rule, we assume a mixture prior for the parameter which adapts to the unknown sparsity. This Bayes rule can be viewed as thresholding the “local fdr” (Efron, 2007) by adaptive thresholds. Both parametric and nonparametric methods will be discussed. The nonparametric procedure adapts to the unknown data structure well and outperforms the parametric one. Performance of the procedure is illustrated by various simulation studies and a real data application.

Emil Pitkin, Richard A. Berk, Lawrence D. Brown, Andreas Buja, Edward I. George, Kai Zhang, Linda Zhao (Under Review), Improved Precision in Estimating Average Treatment Effects.

Richard A. Berk, Lawrence D. Brown, Andreas Buja, Kai Zhang, Linda Zhao (2013), Valid Post-Selection Inference, Annals of Statistics, 41, pp. 802–837.

Richard A. Berk, Lawrence D. Brown, Andreas Buja, Edward I. George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao, “What You Can Learn From Wrong Causal Models”. In Handbook of Causal Analysis for Social Research, edited by Stephen Morgan, (2013), pp. 403–424.

Lawrence D. Brown and Linda Zhao (2012), A Geometrical Explanation of Stein Shrinkage, Statistical Science, 27, pp. 24–30.

C. H. Nagaraja, Lawrence D. Brown, Linda Zhao (2011), An Autoregressive Approach to House Price Modeling, The Annals of Applied Statistics, 5, pp. 124–149.

Past Courses

STAT101 INTRO BUSINESS STAT

Data summaries and descriptive statistics; introduction to a statistical computer package; Probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference of univariate data; Statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course will have a business focus, but is not inappropriate for students in the college.

STAT102 INTRO BUSINESS STAT

Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

STAT111 INTRODUCTORY STATISTICS

Introduction to concepts in probability. Basic statistical inference procedures of estimation, confidence intervals and hypothesis testing directed towards applications in science and medicine. The use of the JMP statistical package.

STAT112 INTRODUCTORY STATISTICS

Further development of the material in STAT 111, in particular the analysis of variance, multiple regression, nonparametric procedures and the analysis of categorical data. Data analysis via statistical packages.

STAT431 STATISTICAL INFERENCE

Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course. This course does not have business applications but has significant overlap with STAT 101 and 102.

STAT471 MODERN DATA MINING

Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class.

STAT511 STATISTICAL INFERENCE

Graphical displays; one- and two-sample confidence intervals; one- and two-sample hypothesis tests; one- and two-way ANOVA; simple and multiple linear least-squares regression; nonlinear regression; variable selection; logistic regression; categorical data analysis; goodness-of-fit tests. A methodology course.

STAT520 APPLIED ECONOMETRICS I

This is a course in econometrics for graduate students. The goal is to prepare students for empirical research by studying econometric methodology and its theoretical foundations. Students taking the course should be familiar with elementary statistical methodology and basic linear algebra, and should have some programming experience. Topics include conditional expectation and linear projection, asymptotic statistical theory, ordinary least squares estimation, the bootstrap and jackknife, instrumental variables and two-stage least squares, specification tests, systems of equations, generalized least squares, and an introduction to the use of linear panel data models.

STAT571 MODERN DATA MINING

Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class.

STAT701 MODERN DATA MINING

Modern Data Mining: Statistics or Data Science has been evolving rapidly to keep up with the modern world. While classical multiple regression and logistic regression techniques continue to be the major tools, we go beyond them to include methods built on top of linear models, such as LASSO and Ridge regression. Contemporary methods such as KNN (K nearest neighbor), Random Forest, Support Vector Machines, Principal Component Analysis (PCA), the bootstrap and others are also covered. Text mining, especially through PCA, is another topic of the course. While learning all the techniques, we keep in mind that our goal is to tackle real problems. Not only do we go through a large collection of interesting, challenging real-life data sets, but we also learn how to use the free, powerful software "R" in connection with each of the methods presented in the class.

STAT927 BAYESIAN STATISTICS

A course in Bayesian statistical theory and methods. Axiomatic developments of utility theory and subjective probability, and elements of Bayesian theory.

STAT932 SURV MODELS & ANALY METH

Parametric models, nonparametric methods for one- and two-sample problems, proportional hazards model, inference based on ranks. Problems will be considered from clinical trials, toxicology and tumorigenicity studies, and epidemiological studies.

STAT940 ADVANCED INFERENCE I

The topics covered will change from year to year. Typical topics include sequential analysis, nonparametric function estimation, robustness, bootstrapping and applications, decision theory, likelihood methods, and mixture models.

STAT941 ADVANCED INFERENCE II

A continuation of STAT 940.
