Andreas Buja

The Liem Sioe Liong/First Pacific Company Professor at The Wharton School

Schools

The Wharton School

Biography

The Wharton School

Education

PhD, Swiss Federal Institute of Technology (ETHZ), 1980

Career and Recent Professional Awards

Fellow, American Statistical Association, 1994

Academic Positions Held

Wharton: 2002–present (named Liem Sioe Liong/First Pacific Company Professor, 2003).
Previous appointment: University of Washington, Seattle. Visiting appointment: Stanford University

Other Positions

Member, Technical Staff, Bellcore/Telcordia, 1987–94
Member, Technical Staff, AT&T Bell Labs, 1994–96
Technology Consultant, AT&T Labs, 1996–2001

Professional Leadership

Editor, Journal of Computational and Graphical Statistics, 1997–2001
Advisory Editor, Journal of Computational and Graphical Statistics, 2001–present

Andreas Buja and Wolfgang Rolke (Forthcoming), Calibration for Simultaneity: (Re)sampling Methods for Simultaneous Inference with Applications to Function Estimation and Functional Data.

Abstract: We survey and illustrate a Monte Carlo technique for carrying out simple simultaneous inference with arbitrarily many statistics. Special cases of the technique have appeared in the literature, but there exists widespread unawareness of the simplicity and broad applicability of this solution to simultaneous inference. The technique, here called “calibration for simultaneity” or CfS, consists of 1) limiting the search for coverage regions to a one-parameter family of nested regions, and 2) selecting from the family that region whose estimated coverage probability has the desired value. Natural one-parameter families are almost always available. CfS applies whenever inference is based on a single distribution, for example: 1) fixed distributions such as Gaussians when diagnosing distributional assumptions, 2) conditional null distributions in exact tests with Neyman structure, in particular permutation tests, 3) bootstrap distributions for bootstrap standard error bands, 4) Bayesian posterior distributions for high-dimensional posterior probability regions, or 5) predictive distributions for multiple prediction intervals. CfS is particularly useful for estimation of any type of function, such as empirical QQ curves, empirical CDFs, density estimates, smooths, generally any type of fit, and functions estimated from functional data. A special case of CfS is equivalent to p-value adjustment (Westfall and Young, 1993). Conversely, the notion of a p-value can be extended to any simultaneous coverage problem that is solved with a one-parameter family of coverage regions.
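The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the null curves (sorted standard normal samples, i.e. null QQ curves), the family of bands indexed by a trimming count `k`, and all function names are assumptions chosen for illustration. Coverage is estimated on the same simulations that define the bands, which is slightly optimistic.

```python
import random


def simulate_null_curves(n_sims, n_points, rng):
    """Each curve is a sorted standard normal sample (a QQ curve under the null)."""
    return [sorted(rng.gauss(0.0, 1.0) for _ in range(n_points))
            for _ in range(n_sims)]


def band(sorted_cols, k):
    """k-th member of a nested one-parameter family of bands:
    per coordinate, trim the k smallest and k largest simulated values."""
    lower = [col[k] for col in sorted_cols]
    upper = [col[-1 - k] for col in sorted_cols]
    return lower, upper


def coverage(curves, lower, upper):
    """Fraction of simulated curves lying entirely inside the band."""
    inside = sum(
        all(lo <= v <= hi for v, lo, hi in zip(curve, lower, upper))
        for curve in curves
    )
    return inside / len(curves)


def calibrate(curves, target=0.95):
    """Pick the narrowest band in the family whose estimated
    simultaneous coverage is still at least `target`."""
    sorted_cols = [sorted(col) for col in zip(*curves)]
    best_k = 0  # k = 0 is the min/max envelope, which covers every curve
    for k in range(len(curves) // 2):
        lower, upper = band(sorted_cols, k)
        if coverage(curves, lower, upper) >= target:
            best_k = k  # bands shrink as k grows
        else:
            break
    lower, upper = band(sorted_cols, best_k)
    return best_k, lower, upper, coverage(curves, lower, upper)


rng = random.Random(0)
curves = simulate_null_curves(n_sims=1000, n_points=30, rng=rng)
k, lower, upper, cov = calibrate(curves, target=0.95)
print("trim count:", k)
print("pointwise coverage:", 1 - 2 * k / len(curves))
print("estimated simultaneous coverage:", cov)
```

The calibrated pointwise coverage comes out higher than the simultaneous target, which is the adjustment for simultaneity; a real analysis would compare an observed curve against `lower` and `upper`.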

Andreas Buja, W. Stuetzle, Y. Shen (Under Review), Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications.

Abstract: What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in a 1-1 correspondence with information measures for tree-based classification.
• Proper scoring rules are also in a 1-1 correspondence with Bregman distances that can be used to derive general approximation bounds for cost-weighted misclassification errors, as well as generalized bias-variance decompositions.
We illustrate the use of proper scoring rules with novel criteria for 1) Hand and Vinciotti’s (2003) localized logistic regression and 2) for interpretable classification trees. We will also discuss connections with exponential loss used in boosting.
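The mixture statement can be checked numerically. The Python sketch below is an illustration, not code from the paper; it assumes the convention that an estimate q is classified as 1 when q exceeds the cost threshold c, with a false positive costing c and a false negative costing 1 - c. Under that convention, integrating the cost-weighted loss against the constant weight 2 should reproduce squared error loss, and against the weight 1/(c(1 - c)) should reproduce log-loss. The function names and the midpoint integration rule are choices made here.

```python
import math


def cost_weighted_loss(y, q, c):
    """Misclassification loss at cost threshold c (classify as 1 when q > c).
    False negative costs (1 - c); false positive costs c."""
    if y == 1:
        return (1.0 - c) if q <= c else 0.0
    return c if q > c else 0.0


def mixture_loss(y, q, weight, n_grid=100_000):
    """Midpoint-rule approximation of the integral over c in (0, 1)
    of weight(c) * cost_weighted_loss(y, q, c)."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        c = (i + 0.5) * h
        total += weight(c) * cost_weighted_loss(y, q, c)
    return total * h


def squared_error_loss(y, q):
    return (y - q) ** 2


def log_loss(y, q):
    return -math.log(q if y == 1 else 1.0 - q)


def uniform_weight(c):
    """Constant weight: all cost thresholds emphasized equally."""
    return 2.0


def log_weight(c):
    """Weight that emphasizes extreme cost thresholds near 0 and 1."""
    return 1.0 / (c * (1.0 - c))


q = 0.3
for y in (0, 1):
    print(y, squared_error_loss(y, q), mixture_loss(y, q, uniform_weight))
    print(y, log_loss(y, q), mixture_loss(y, q, log_weight))
```

Each printed pair should agree up to the discretization error of the grid.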

Andreas Buja, Abba M. Krieger, Edward I. George, “A Tool for Mining Large Correlation Tables: The Association Navigator”. In Handbook of Big Data, edited by Peter Bühlmann, Petros Drineas, Michael Kane, Mark van der Laan (2016), pp. 73–102.

Dan Yang, Zongming Ma, Andreas Buja (2016), Rate optimal denoising of simultaneously sparse and low rank matrices, Journal of Machine Learning Research, 17 (92), pp. 1–27.

Andreas Buja, Richard A. Berk, Lawrence D. Brown, Edward I. George, Emil Pitkin, Mikhail Traskin, Kai Zhang, Linda Zhao (2015), Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression, Statistical Science, (in press).

Dan Yang, Zongming Ma, Andreas Buja (2014), A sparse singular value decomposition method for high-dimensional data, Journal of Computational and Graphical Statistics, 23 (4), pp. 923–942.

Richard A. Berk, Lawrence D. Brown, Andreas Buja, Edward I. George, Emil Pitkin, Kai Zhang, Linda Zhao (2014), Misspecified Mean Function Regression: Making Good Use of Regression Models That Are Wrong, Sociological Methods & Research, 43 (3), pp. 422–445.

Abstract: There are over three decades of largely unrebutted criticism of regression analysis as practiced in the social sciences. Yet, regression analysis broadly construed remains for many the method of choice for characterizing conditional relationships. One possible explanation is that the existing alternatives sometimes can be seen by researchers as unsatisfying. In this paper, we provide a different formulation. We allow the regression model to be incorrect and consider what can be learned nevertheless. To this end, the search for a correct model is abandoned. We offer instead a rigorous way to learn from regression approximations. These approximations, not “the truth,” are the estimation targets. We provide estimators that are asymptotically unbiased and standard errors that are asymptotically correct even when there are important specification errors. Both can be obtained easily from popular statistical packages.
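The abstract does not name the standard error estimator. As an illustration only, the Python sketch below assumes the Huber-White (HC0) sandwich form, a standard choice when the mean function may be misspecified, and compares it with the classical standard error for the slope of a simple linear regression. The data (a quadratic truth fitted with a straight line) and the function name `fit_line` are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares line with classical and sandwich (HC0) slope standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: assumes a correct linear mean and constant error variance.
    sigma2 = sum(r ** 2 for r in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # Sandwich (HC0): weights each squared residual by its leverage on the slope.
    meat = sum((xi - x_bar) ** 2 * r ** 2 for xi, r in zip(x, resid))
    se_sandwich = math.sqrt(meat) / sxx

    return {
        "slope": slope,
        "intercept": intercept,
        "se_classical": se_classical,
        "se_sandwich": se_sandwich,
    }


# Hypothetical example: the true mean is quadratic, the fitted model is a line.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [xi ** 2 + rng.gauss(0.0, 0.5) for xi in x]
fit = fit_line(x, y)
print("slope of the linear approximation:", fit["slope"])
print("classical SE:", fit["se_classical"])
print("sandwich SE: ", fit["se_sandwich"])
```

In this setup the fitted line is an approximation rather than the truth, and the sandwich standard error is noticeably larger than the classical one because the squared residuals grow with the distance of x from its mean.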

Kartik Hosanagar, Daniel Fleder, Dokyun Lee, Andreas Buja (2014), Will the Global Village Fracture into Tribes?, Management Science, 60 (4), pp. 805–823.

Richard A. Berk, Emil Pitkin, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao (2014), Covariance Adjustments for the Analysis of Randomized Field Experiments, Evaluation Review, 37 (3–4), pp. 170–196.

Abstract: It has become common practice to analyze randomized experiments using linear regression with covariates. Improved precision of treatment effect estimates is the usual motivation. In a series of important articles, David Freedman showed that this approach can be badly flawed. Recent work by Winston Lin offers partial remedies, but important problems remain.

Sivan Aldor-Noiman, Lawrence D. Brown, Andreas Buja, Wolfgang Rolke, Robert A. Stine (2013), The Power to See: A New Graphical Test of Normality, The American Statistician, 67 (4), pp. 249–260.

Abstract: Many statistical procedures assume that the underlying data-generating process involves Gaussian errors. Among the popular tests for normality, only the Kolmogorov–Smirnov test has a graphical representation. Alternative tests, such as the Shapiro–Wilk test, offer little insight as to how the observed data deviate from normality. In this article, we discuss a simple new graphical procedure which provides simultaneous confidence bands for a normal quantile–quantile plot. These bands define a test of normality and are narrower in the tails than those related to the Kolmogorov–Smirnov test. Correspondingly, the new procedure has greater power to detect deviations from normality in the tails. Supplementary materials for this article are available online.

Past Courses

STAT101 INTRO BUSINESS STAT

Data summaries and descriptive statistics; introduction to a statistical computer package; Probability: distributions, expectation, variance, covariance, portfolios, central limit theorem; statistical inference of univariate data; Statistical inference for bivariate data: inference for intrinsically linear simple regression models. This course will have a business focus, but is not inappropriate for students in the college.

STAT102 INTRO BUSINESS STAT

Continuation of STAT 101. A thorough treatment of multiple regression, model selection, analysis of variance, linear logistic regression; introduction to time series. Business applications.

STAT470 DATA ANALY & STAT COMP

This course will introduce a high-level programming language, called R, that is widely used for statistical data analysis. Using R, we will study and practice the following methodologies: data cleaning, feature extraction; web scrubbing, text analysis; data visualization; fitting statistical models; simulation of probability distributions and statistical models; statistical inference methods that use simulations (bootstrap, permutation tests).

STAT503 DATA ANALY & STAT COMP

STAT621 ACC REGRESSION ANALYSIS

STAT 621 is intended for students with recent, practical knowledge of the use of regression analysis in the context of business applications. This course covers the material of STAT 613, but omits the foundations to focus on regression modeling. The course reviews statistical hypothesis testing and confidence intervals for the sake of standardizing terminology and introducing software, and then moves into regression modeling. The pace presumes recent exposure to both the theory and practice of regression and will not be accommodating to students who have not seen or used these methods previously. The interpretation of regression models within the context of applications will be stressed, presuming knowledge of the underlying assumptions and derivations. The scope of regression modeling that is covered includes multiple regression analysis with categorical effects, regression diagnostic procedures, interactions, and time series structure. The presentation of the course relies on computer software that will be introduced in the initial lectures.

STAT770 DATA ANALY & STAT COMP

STAT926 MULTIVARIATE ANALY: METH

This is a course that prepares PhD students in statistics for research in multivariate statistics and data visualization. The emphasis will be on a deep conceptual understanding of multivariate methods to the point where students will propose variations and extensions to existing methods or whole new approaches to problems previously solved by classical methods. Topics include: principal component analysis, canonical correlation analysis, generalized canonical analysis; nonlinear extensions of multivariate methods based on optimal transformations of quantitative variables and optimal scaling of categorical variables; shrinkage and sparsity-based extensions to classical methods; clustering methods of the k-means and hierarchical varieties; multidimensional scaling, graph drawing, and manifold estimation.

STAT961 STATISTICAL METHODOLOGY

This is a course that prepares 1st year PhD students in statistics for a research career. This is not an applied statistics course. Topics covered include: linear models and their high-dimensional geometry, statistical inference illustrated with linear models, diagnostics for linear models, bootstrap and permutation inference, principal component analysis, smoothing and cross-validation.

Infovis best paper award for the article “Graphical inference for infovis” by Wickham, H., Cook, D., Hofmann, H., and Buja, A., IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis ’10), 2010
Journal of Marketing, finalist for the Harold H. Maynard Award and featured blog article of the October Issue, 2007
Fellow, Institute of Mathematical Statistics, 2006
IMS Medallion lecture, Joint Statistical Meetings, New York, 2002
Keynote speaker, European Meeting of the Psychometric Society, Leiden, 1995
Fellow, American Statistical Association, 1994
Award Medal for diploma thesis in mathematics, Swiss Federal Institute of Technology, 1975

Knowledge @ Wharton

Different Worlds: Do Recommender Systems Fragment Consumers’ Interests?, Knowledge @ Wharton 08/31/2011
