Academic Positions

• 2016 – Present

Associate Professor

MIT, Mathematics

• 2015 – 2016

Assistant Professor

MIT, Mathematics

• 2008 – 2014

Assistant Professor

Princeton University, ORFE

• 2007 – 2008

Visiting Assistant Professor

Georgia Tech, Mathematics

Education & Training

• Ph.D. 2006

Ph.D. in Mathematics

University of Paris VI

• B. Sc. 2002

B. Sc. in Applied Mathematics

University of Paris VI

• B. Sc. 2001

B. Sc. in Statistics

Institute for Statistics (ISUP)

Awards

• 2018
NSF grant IIS-1838071
BIGDATA:F: Statistical and Computational Optimal Transport for Geometric Data Analysis.
• 2018
MIT-Skoltech Seed Fund
Statistical Optimal Transport.
• 2018
Chan-Zuckerberg Initiative: Human Cell Atlas
Reconstruction of cellular trajectories via optimal transport.
• 2017
NSF grant TRIPODS-1740751
TRIPODS: Institute for Foundations of Data Science.
• 2017
NSF grant DMS-1712596
Statistical Estimation with Algebraic Structure.
• 2017
Office of Naval Research
Statistical Learning Theory of Complex Causal Models.
• 2016
DARPA -- Fundamentals of Learning
Foundations of Scalable Statistical Learning.
• 2016
NEC Corporation Fund for Research in Computers and Communications
Awarded by the MIT School of Science to support research on "Average-Case Complexity of Statistical Problems".
• 2015
Invited lecturer. Machine Learning Summer School.
The Machine Learning Summer School (MLSS) series promulgates modern methods of statistical machine learning and inference, presenting topics at the core of modern machine learning, from fundamentals to state-of-the-art practice, with lectures by leading experts in their fields.
• 2013
Best Paper Award. Conference on Learning Theory (COLT)
Berthet and Rigollet received the Best Paper Award at the Conference on Learning Theory (COLT) for their paper "Complexity Theoretic Lower Bounds for Sparse Principal Component Detection". In this work, they show that employing computationally efficient statistical methods in the context of sparse principal component analysis leads to an ineluctable deterioration of statistical performance, regardless of the efficient method employed. In particular, the paper draws new connections between statistical learning and computational complexity.
• 2013
Howard B. Wentz Jr. Junior Faculty Award
The Howard B. Wentz, Jr. Junior Faculty Award from the School of Engineering and Applied Science (SEAS) at Princeton University is "intended to recognize and assist promising junior faculty members".
• 2013
NSF grant DMS-1317308
Statistical and Computational Tradeoffs in High Dimensional Learning.
• 2011
NSF CAREER Award DMS-1053987
Large Scale Stochastic Optimization and Statistics.
• 2010
250th Anniversary Fund for Innovation in Undergraduate Education
Competitive award given by the Office of the President at Princeton University. The criteria for the award are innovation and excellence, the positive impact of the course on the undergraduate curriculum, and its effectiveness in demonstrating the faculty's continuing commitment to undergraduate teaching.
• 2009
NSF grant DMS-0906424
Optimal sequential allocation in dynamic environments.
• 2009
Dean's commendation list for outstanding teaching.
Awarded on the basis of outstanding teaching evaluations for the course "Statistical Learning Theory and Nonparametric Estimation".

Join us!

Students

If you are a prospective Ph.D. student interested in working with me, please apply through the MIT Department of Mathematics and indicate Statistics as your primary field of interest.

Postdocs

If you are a prospective postdoc interested in working with me, there are several positions you may apply to.

Alumni

• Ph.D. 2018

Cheng Mao

Current position: Postdoctoral researcher, Yale University

• Ph.D. 2015

Lucy Xia (co-advised by J. Fan)

Current position: Stein Fellow, Department of Statistics, Stanford

• Ph.D. 2014

Quentin Berthet

Current position: Lecturer, University of Cambridge

• Ph.D. 2012

Xin Tong (co-advised by J. Fan)

Current position: Assistant Professor, Marshall School of Business, University of Southern California

• Post-Doc 2018

Victor-Emmanuel Brunel

Current position: Associate Professor, ENSAE-CREST, Paris, France

• Post-Doc 2016

Afonso Bandeira

Current position: Assistant Professor, Courant Institute and Center for Data Science, NYU

• Post-Doc 2016

Irene Waldspurger

Current position: CNRS Researcher, Université Paris Dauphine

Reconstruction of developmental landscapes by optimal-transport analysis of single-cell gene expression sheds light on cellular reprogramming.

Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Siyan Liu, Stacie Lin, Peter Berube, Lia Lee, Jenny Chen, Justin Brumbaugh, Philippe Rigollet, Konrad Hochedlinger, Rudolf Jaenisch, Aviv Regev, Eric Lander (2018)
Journal Cell (to appear)

Abstract

Understanding the molecular programs that guide cellular differentiation during development is a major goal of modern biology. Here, we introduce an approach, WADDINGTON-OT, based on the mathematics of optimal transport, for inferring developmental landscapes, probabilistic cellular fates and dynamic trajectories from large-scale single-cell RNA-seq (scRNA-seq) data collected along a time course. We demonstrate the power of WADDINGTON-OT by applying the approach to study 65,781 scRNA-seq profiles collected at 10 time points over 16 days during reprogramming of fibroblasts to iPSCs. We construct a high-resolution map of reprogramming that rediscovers known features; uncovers new alternative cell fates including neural- and placental-like cells; predicts the origin and fate of any cell class; highlights senescent-like cells that may support reprogramming through paracrine signaling; and implicates regulatory models in particular trajectories. Of these findings, we highlight Obox6, which we experimentally show enhances reprogramming efficiency. Our approach provides a general framework for investigating cellular differentiation.

Entropic optimal transport is maximum-likelihood deconvolution

Philippe Rigollet and Jonathan Weed (2018)
Journal Comptes Rendus Mathematique, 356 (11-12), 1228--1235.

Abstract

We give a statistical interpretation of entropic optimal transport by showing that performing maximum-likelihood estimation for Gaussian deconvolution corresponds to calculating a projection with respect to the entropic optimal transport distance. This structural result gives theoretical support for the wide adoption of these tools in the machine learning community.

Uncoupled isotonic regression via minimum Wasserstein deconvolution

Philippe Rigollet and Jonathan Weed (2018)
Preprint arXiv:1806.10648

Abstract

Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown nondecreasing regression function $$f$$ from independent pairs $$(x_i, y_i)$$ where $$\mathbb{E}[y_i]=f(x_i), i=1, \ldots, n$$. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart where one is given only the unordered sets $$\{x_1, \ldots, x_n\}$$ and $$\{y_1, \ldots, y_n\}$$. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on $$y_i$$ and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
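In the coupled setting, the least-squares isotonic fit is classically computed by the pool-adjacent-violators algorithm (PAVA), while a naive heuristic for the uncoupled problem is to sort both sets and match ranks. A minimal numpy sketch of PAVA (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def pava(y):
    """Least-squares isotonic (nondecreasing) fit via pool adjacent violators."""
    means, weights = [], []  # current blocks: (block mean, block size)
    for v in y:
        m, w = float(v), 1.0
        # merge with previous blocks while monotonicity is violated
        while means and means[-1] > m:
            m = (w * m + weights[-1] * means[-1]) / (w + weights[-1])
            w += weights[-1]
            means.pop()
            weights.pop()
        means.append(m)
        weights.append(w)
    # expand the blocks back into a full-length nondecreasing sequence
    return np.concatenate([np.full(int(w), m) for m, w in zip(means, weights)])
```

For instance, `pava([1.0, 3.0, 2.0])` pools the last two violating observations into their average, returning `[1.0, 2.5, 2.5]`.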

Statistical Optimal Transport via Factored Couplings

Aden Forrow, Jan-Christian Hütter, Mor Nitzan, Philippe Rigollet, Geoffrey Schiebinger and Jonathan Weed (2018)
Preprint arXiv:1806.07348

Abstract

We propose a new method to estimate Wasserstein distances and optimal transport plans between two probability distributions from samples in high dimension. Unlike plug-in rules that simply replace the true distributions by their empirical counterparts, our method promotes couplings with low transport rank, a new structural assumption that is similar to the nonnegative rank of a matrix. Regularizing based on this assumption leads to drastic improvements on high-dimensional data for various tasks, including domain adaptation in single-cell RNA sequencing data. These findings are supported by a theoretical analysis that indicates that the transport rank is key in overcoming the curse of dimensionality inherent to data-driven optimal transport.

Sparse Gaussian ICA

Nilin Abrahamsen and Philippe Rigollet (2018)
Preprint arXiv:1804.00408

Abstract

Independent component analysis (ICA) is a cornerstone of modern data analysis. Its goal is to recover a latent random vector S with independent components from samples of X=AS where A is an unknown mixing matrix. Critically, all existing methods for ICA rely strongly on the assumption that S is not Gaussian, as otherwise A becomes unidentifiable. In this paper, we show that one can in fact handle the case of Gaussian components by imposing structure on the matrix A. Specifically, we assume that A is sparse and generic in the sense that it is generated from a sparse Bernoulli-Gaussian ensemble. Under this condition, we give an efficient algorithm to recover the columns of A given only the covariance matrix of X as input, even when S has several Gaussian components.

The sample complexity of multi-reference alignment

Amelia Perry, Jonathan Weed, Afonso Bandeira, Philippe Rigollet and Amit Singer (2017)
Preprint arXiv:1707.00943

Abstract

The growing role of data-driven approaches to scientific discovery has unveiled a large class of models that involve latent transformations with a rigid algebraic constraint. Among them, multi-reference alignment (MRA) is a simple model that captures fundamental aspects of the statistical and algorithmic challenges arising from this new paradigm. In this model, an unknown signal is subject to two types of corruption: a latent cyclic shift and the more traditional additive white noise. The goal is to recover the signal at a certain precision from independent samples. While at high signal-to-noise ratio (SNR), the number of observations needed to recover a generic signal is proportional to 1/SNR, we show that it rises to 1/SNR^3 in the more realistic low SNR regime. We propose an algorithm that achieves this optimal dependence on the SNR. Furthermore, we extend our results to cover a heterogeneous MRA model where the samples come from a mixture of signals, as is often the case in applications such as Cryo-Electron Microscopy, where molecules may have different conformations. We provide the first known procedure that provably achieves signal recovery in the low SNR regime for heterogeneous MRA.

Optimal rates of estimation for multi-reference alignment

Afonso Bandeira, Philippe Rigollet and Jonathan Weed (2017)
Preprint arXiv:1702.08546

Abstract

This paper describes optimal rates of adaptive estimation of a vector in the multi-reference alignment model, a problem with important applications in fields such as signal processing, image processing, and computer vision, among others. We describe how this model can be viewed as a multivariate Gaussian mixture model under the constraint that the centers belong to the orbit of a group. This enables us to derive matching upper and lower bounds that feature an interesting dependence on the signal-to-noise ratio of the model. Both upper and lower bounds are articulated around a tight local control of Kullback-Leibler divergences that showcases the central role of moment tensors in this problem.

Teacher improves learning by selecting a training subset

Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang and Xiaojin Zhu (2018)
Conference Proceedings of AISTATS 2018. PMLR 84:1366-1375

Abstract

We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better. We provide sharp super-teaching guarantees on two learners: the maximum likelihood estimator for the mean of a Gaussian, and the large margin classifier in 1D. For general learners, we provide a mixed-integer nonlinear programming-based algorithm to find a super teaching set. Empirical experiments show that our algorithm is able to find good super-teaching sets for both regression and classification problems.

Minimax rates and efficient algorithms for noisy sorting

Cheng Mao, Jonathan Weed and Philippe Rigollet (2018)
Conference Proceedings of ALT 2018. PMLR 83:821-847

Abstract

There has been a recent surge of interest in studying permutation-based models for ranking from pairwise comparison data. Despite being structurally richer and more robust than parametric ranking models, permutation-based models are less well understood statistically and generally lack efficient learning algorithms. In this work, we study a prototype of permutation-based ranking models, namely, the noisy sorting model. We establish the optimal rates of learning the model under two sampling procedures. Furthermore, we provide a fast algorithm to achieve near-optimal rates if the observations are sampled independently. Along the way, we discover properties of the symmetric group which are of theoretical interest.

High Dimensional Statistics

Philippe Rigollet and Jan-Christian Hütter (2017)
Lecture Notes

Abstract

This course is mainly about learning a regression function from a collection of observations. In this chapter, after defining this task formally, we give an overview of the course and the questions around regression. We adopt the statistical learning point of view where the task of prediction prevails. Last updated in June 2017.

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration

Jason Altschuler, Jonathan Weed and Philippe Rigollet (2017)
Conference Proceedings of NIPS 2017 (to appear). Spotlight

Abstract

Computing optimal transport distances such as the earth mover's distance is a fundamental problem in machine learning, statistics, and computer vision. Despite the recent introduction of several algorithms with good empirical performance, it is unknown whether general optimal transport distances can be approximated in near-linear time. This paper demonstrates that this ambitious goal is in fact achieved by Cuturi's Sinkhorn Distances, and provides guidance towards parameter tuning for this algorithm. This result relies on a new analysis of Sinkhorn iterations that also directly suggests a new algorithm Greenkhorn with the same theoretical guarantees. Numerical simulations illustrate that Greenkhorn significantly outperforms the classical Sinkhorn algorithm in practice.
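As a hedged illustration (not the paper's implementation), the classical Sinkhorn scaling alternately rescales the rows and columns of the Gibbs kernel exp(-ηC) until the coupling's marginals match the targets; the Greenkhorn algorithm of the paper instead greedily updates the single row or column whose marginal is furthest off. The parameters below (η, iteration count) are illustrative choices:

```python
import numpy as np

def sinkhorn(C, r, c, eta=5.0, n_iter=500):
    """Approximate OT plan by alternately matching the row and column
    marginals of the entropically regularized coupling u_i K_ij v_j."""
    K = np.exp(-eta * C)
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (K.T @ u)  # rescale to match column marginals
        u = r / (K @ v)    # rescale to match row marginals
    return u[:, None] * K * v[None, :]

# toy example: random cost matrix, uniform marginals
rng = np.random.default_rng(0)
C = rng.random((5, 5))
r = c = np.full(5, 0.2)
P = sinkhorn(C, r, c)
```

After the final row update the row marginals of `P` match `r` exactly, and the column marginals converge to `c` as the iterations proceed.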

Learning determinantal point processes with moments and cycles

John Urschel, Victor-Emmanuel Brunel, Ankur Moitra and Philippe Rigollet (2017)
Conference Proceedings of ICML 2017, PMLR 70:3511-3520.

Abstract

Determinantal Point Processes (DPPs) are a family of probabilistic models that have a repulsive behavior, and lend themselves naturally to many tasks in machine learning where returning a diverse set of objects is important. While there are fast algorithms for sampling, marginalization and conditioning, much less is known about learning the parameters of a DPP. Our contribution is twofold: (i) we establish the optimal sample complexity achievable in this problem and show that it is governed by a natural parameter, which we call the \emph{cycle sparsity}; (ii) we propose a provably fast combinatorial algorithm that implements the method of moments efficiently and achieves optimal sample complexity. Finally, we give experimental results that confirm our theoretical findings.

Maximum likelihood estimation of determinantal point processes

Victor-Emmanuel Brunel, Ankur Moitra, Philippe Rigollet and John Urschel (2017)
Conference Proceedings of COLT 2017, PMLR 65:343-345.

Abstract

Determinantal point processes (DPPs) have wide-ranging applications in machine learning, where they are used to enforce the notion of diversity in subset selection problems. Many estimators have been proposed, but surprisingly the basic properties of the maximum likelihood estimator (MLE) have received little attention. The difficulty is that it is a non-concave maximization problem, and such functions are notoriously difficult to understand in high dimensions, despite their importance in modern machine learning. Here we study both the local and global geometry of the expected log-likelihood function. We prove several rates of convergence for the MLE and give a complete characterization of the case where these are parametric. We also exhibit a potential curse of dimensionality where the asymptotic variance of the MLE scales exponentially with the dimension of the problem. Moreover, we exhibit an exponential number of saddle points, and give evidence that these may be the only critical points.

Marcenko-Pastur Law for Kendall's Tau

Afonso Bandeira, Asad Lohdia and Philippe Rigollet (2017)
Journal Electron. Commun. Probab., 22(32), 1--7.

Abstract

We prove that Kendall's rank correlation matrix converges to the Marcenko-Pastur law, under the assumption that the observations are i.i.d. random vectors X_1, …, X_n with components that are independent and absolutely continuous with respect to the Lebesgue measure. This is the first result on the empirical spectral distribution of a multivariate U-statistic.
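As a hypothetical numerical companion (not code from the paper), Kendall's tau matrix can be formed directly from pairwise sign differences across samples, and its empirical spectrum inspected:

```python
import numpy as np

def kendall_tau_matrix(X):
    """Kendall's rank correlation matrix of the columns of X
    (n samples by d coordinates), via pairwise sign differences."""
    n, d = X.shape
    a, b = np.triu_indices(n, 1)
    S = np.sign(X[a] - X[b])  # signs over all n(n-1)/2 sample pairs
    return S.T @ S / len(a)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))  # independent, absolutely continuous components
T = kendall_tau_matrix(X)
eigs = np.linalg.eigvalsh(T)    # empirical spectrum of the tau matrix
```

With continuous components there are no ties, so the diagonal of `T` is exactly 1 and the eigenvalues sum to the dimension d.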

Exact recovery in the Ising blockmodel

Quentin Berthet, Philippe Rigollet and Piyush Srivastava (2016)
Journal Ann. Statist. (to appear)

Abstract

We consider the problem of recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size and the interaction between those of the same block is stronger than across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case when the number of sites may be much larger than the sample size.

Optimal rates of Statistical Seriation

Nicolas Flammarion, Cheng Mao and Philippe Rigollet (2016)
Journal Bernoulli (to appear)

Abstract

Given a matrix, the seriation problem consists in permuting its rows in such a way that all its columns have the same shape, for example, they are monotone increasing. We propose a statistical approach to this problem where the matrix of interest is observed with noise and study the corresponding minimax rate of estimation of the matrices. Specifically, when the columns are either unimodal or monotone, we show that the least squares estimator is optimal up to logarithmic factors and adapts to matrices with a certain natural structure. Finally, we propose a computationally efficient estimator in the monotonic case and study its performance both theoretically and experimentally. Our work is at the intersection of shape constrained estimation and recent work that involves permutation learning, such as graph denoising and ranking.

Online learning in repeated auctions

Jonathan Weed, Vianney Perchet and Philippe Rigollet (2016)
Conference Proceedings of COLT 2016, PMLR 49:1562-1583.

Abstract

Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well behaved adversaries. In the adversarial model, the goods need not be identical and we simply compare our performance against that of the best fixed bid in hindsight. We show that sublinear regret is also achievable in this case and prove matching minimax lower bounds. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.

Optimal rates for total variation denoising

Jan-Christian Hütter and Philippe Rigollet (2016)
Conference Proceedings of COLT 2016, PMLR 49:1115-1146.

Abstract

Motivated by its practical success, we show that the two-dimensional total variation denoiser satisfies a sharp oracle inequality that leads to near optimal rates of estimation for a large class of image models such as bi-isotonic, H\"older smooth and cartoons. Our analysis hinges on properties of the unnormalized Laplacian of the two-dimensional grid such as eigenvector delocalization and spectral decay. We also present extensions to more than two dimensions as well as several other graphs.

Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Erik Snowberg (2016)
Journal Ann. Statist., 44(2), 660-681.

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Review of the book "Introduction to High-Dimensional Statistics" by C. Giraud.

Philippe Rigollet (2015)
Other JASA Book Reviews, 110(512), 1821

Abstract

Invited book review for the Journal of the American Statistical Association.

Comment on "Hypothesis testing by convex optimization"

Philippe Rigollet (2015)
Other Electron. J. Statist., 9(2), 1723-1726

Abstract

Invited comment on the discussion paper "Hypothesis testing by convex optimization" by Alexander Goldenshluger, Anatoli Juditsky and Arkadi Nemirovski.

Estimation of Functionals of Sparse Covariance Matrices

Jianqing Fan, Philippe Rigollet, and Weichen Wang (2015)
Journal Ann. Statist., 43(6), 2706-2737

Abstract

High-dimensional statistical tests often ignore correlations to gain simplicity and stability, leading to null distributions that depend on functionals of correlation matrices such as their Frobenius norm and other $$\ell_r$$ norms. Motivated by the computation of critical values of such tests, we investigate the difficulty of estimating functionals of sparse correlation matrices. Specifically, we show that simple plug-in procedures based on thresholded estimators of correlation matrices are sparsity-adaptive and minimax optimal over a large class of correlation matrices. Akin to previous results on functional estimation, the minimax rates exhibit an elbow phenomenon. Our results are further illustrated in simulated data as well as an empirical study of data arising in financial econometrics.

Batched Bandit Problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang and Erik Snowberg (2015)
Conference Proceedings of COLT 2015, PMLR 40:1456-1456.

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already gives close to minimax optimal regret bounds, and we evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Aggregation of Affine Estimators

Dong Dai, Philippe Rigollet, Lucy Xia, and Tong Zhang (2014)
Journal Electron. J. Stat., 8, 302-327.

Abstract

We consider the problem of aggregating a general collection of affine estimators for fixed design regression. Relevant examples include some commonly used statistical estimators such as least squares, ridge and robust least squares estimators. Dalalyan and Salmon (2012) have established that, for this problem, exponentially weighted (EW) model selection aggregation leads to sharp oracle inequalities in expectation, but similar bounds in deviation were not previously known. While results (Dai, Rigollet, Zhang, 2012) indicate that the same aggregation scheme may not satisfy sharp oracle inequalities with high probability, we prove a weaker notion of oracle inequality for EW that holds with high probability. Moreover, using a generalization of the newly introduced $$Q$$-aggregation scheme we also prove sharp oracle inequalities that hold with high probability. Finally, we apply our results to universal aggregation and show that our proposed estimator leads simultaneously to all the best known bounds for aggregation, including $$\ell_q$$-aggregation, $$q \in (0,1)$$, with high probability.

Optimal learning with Q-aggregation

Guillaume Lecue and Philippe Rigollet (2014)
Journal Ann. Statist., 42(1), 211-224.

Abstract

We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with a prior, we generalize the results obtained by Dai, Rigollet and Zhang (2012) for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the Q-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques somewhat depart from traditional proofs, making the most of standard arguments on the Laplace transform of the empirical process to be controlled.

Complexity Theoretic Lower Bounds for Sparse Principal Component Detection

Quentin Berthet and Philippe Rigollet (2013)
Conference Proceedings of COLT 2013, PMLR 30:1046-1066. Best Paper Award

Abstract

In the context of sparse principal component detection, we bring evidence towards the existence of a statistical price to pay for computational efficiency. We measure the performance of a test by the smallest signal strength that it can detect and we propose a computationally efficient method based on semidefinite programming. We also prove that the statistical performance of this test cannot be strictly improved by any computationally efficient method. Our results can be viewed as complexity theoretic lower bounds conditionally on the assumptions that some instances of the planted clique problem cannot be solved in randomized polynomial time.

Bounded regret in stochastic multi-armed bandits

Sebastien Bubeck, Vianney Perchet and Philippe Rigollet (2013)
Conference Proceedings of COLT 2013, PMLR 30:122-134.

Abstract

We study the stochastic multi-armed bandit problem when one knows the value $$\mu^{(\star)}$$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $$\Delta$$. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $$\Delta$$, and bounded regret of order $$1/\Delta$$ is not possible if one only knows $$\mu^{(\star)}$$.

Optimal detection of sparse principal components in high dimension

Quentin Berthet and Philippe Rigollet (2013)
Journal Ann. Statist., 41(4), 1780-1815.

Abstract

We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade off between statistical and computational performance.

The multi-armed bandit problem with covariates

Vianney Perchet and Philippe Rigollet (2013)
Journal Ann. Statist., 41(2), 693-721.

Abstract

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably localized static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax optimal regret bounds for the ABSE policy in the dynamic problem.
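The ABSE policy builds on Successive Elimination as its static subroutine. A minimal sketch of that subroutine (function name, confidence radius, and Gaussian rewards are illustrative assumptions, not the paper's exact policy):

```python
import numpy as np

def successive_elimination(means, horizon, sigma=1.0, delta=0.05, seed=0):
    """Pull every surviving arm once per round; eliminate an arm when its
    upper confidence bound falls below the best lower confidence bound."""
    rng = np.random.default_rng(seed)
    k = len(means)
    active = list(range(k))
    sums = np.zeros(k)
    n, t = 0, 0  # pulls per surviving arm, total pulls
    while t + len(active) <= horizon and len(active) > 1:
        for a in active:
            sums[a] += means[a] + sigma * rng.normal()  # noisy reward
            t += 1
        n += 1
        mu = sums[active] / n
        rad = np.sqrt(2 * np.log(k * horizon / delta) / n)  # confidence radius
        active = [a for a, m in zip(active, mu) if m + rad >= mu.max() - rad]
    return active
```

With the noise switched off (`sigma=0.0`) the radius shrinks deterministically and only the best arm survives, which makes the elimination mechanics easy to check.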

Sparse estimation by exponential weighting

Philippe Rigollet and Alexandre Tsybakov (2012)
Journal Statist. Sci., 27(4), 558-575.

Abstract

Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples.
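The paper's sparsity pattern aggregation averages least-squares fits over supports; the following numpy sketch shows only the basic exponential-weights step on an arbitrary finite family of candidate fits (the temperature β and the interface are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

def exponential_weights(preds, y, beta=4.0):
    """Aggregate candidate fits by exponential weights on their
    residual sum of squares (smaller RSS -> larger weight)."""
    rss = ((preds - y) ** 2).sum(axis=1)   # preds: (M, n), y: (n,)
    w = np.exp(-(rss - rss.min()) / beta)  # shift by the min for stability
    w /= w.sum()
    return w, w @ preds

# toy example: a perfect candidate and a bad one
y = np.zeros(10)
preds = np.stack([np.zeros(10), np.ones(10)])
w, agg = exponential_weights(preds, y)
```

The aggregate is a convex combination of the candidates that concentrates on the low-error ones, here placing over 90% of the weight on the perfect fit.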

Estimation of Covariance Matrices under Sparsity Constraints

Philippe Rigollet and Alexandre Tsybakov (2012)
Journal Statist. Sinica, 22(4), 1319-1378.

Abstract

Discussion of "Minimax Estimation of Large Covariance Matrices under L1-Norm" by Tony Cai and Harrison Zhou.

Deviation Optimal Learning using Greedy Q-aggregation

Dong Dai, Philippe Rigollet and Tong Zhang (2012)
Journal Ann. Statist., 40(3), 1878-1905.

Abstract

Given a finite family of functions, the goal of model selection is to construct a procedure that mimics the function from this family that is the closest to an unknown regression function. More precisely, we consider a general regression model with fixed design and measure the distance between functions by the mean squared error at the design points. While procedures based on exponential weights are known to solve the problem of model selection in expectation, they are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that addresses this limitation; namely, its solution leads to sharp oracle inequalities that are optimal in a minimax sense. Moreover, based on the new formulation, we design greedy Q-aggregation procedures that produce sparse aggregation models achieving the optimal rate. The convergence and performance of these greedy procedures are illustrated and compared with other standard methods on simulated examples.

Kullback-Leibler aggregation and misspecified generalized linear models

Philippe Rigollet (2012)
Journal Ann. Statist., 40(2), 639-665.

Abstract

In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally, all the bounds are proved to be optimal in a minimax sense.

Neyman-Pearson classification, convexity and stochastic constraints

Philippe Rigollet and Xin Tong (2011)
Journal J. Mach. Learn. Res., 12(Oct):2831-2855.

Abstract

Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss $$\varphi$$. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its $$\varphi$$-type I error is below a pre-specified level and (ii) it has $$\varphi$$-type II error close to the minimum possible. The proposed classifier is obtained by minimizing an empirical convex objective with an empirical convex constraint. The novelty of the method is that the classifier output by this computationally feasible program is shown to satisfy the original constraint on type I error. New techniques to handle such problems are developed and they have consequences for chance-constrained programming. We also evaluate the price to pay in terms of type II error for being conservative on type I error.
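The asymmetric-error constraint can be illustrated with a thresholding sketch: pick a decision threshold on classifier scores so the empirical type I error stays below a target level while the type II error is as small as possible. This is not the paper's convex program over classifier combinations, only a minimal one-classifier illustration; the function name is invented here.

```python
import numpy as np

def np_threshold(scores0, scores1, alpha):
    """Smallest threshold whose empirical type I error (class-0 points
    flagged as class 1) is at most alpha; reports the resulting
    empirical type II error as well."""
    # candidate thresholds: observed class-0 scores plus +inf fallback
    candidates = np.concatenate([np.sort(scores0), [np.inf]])
    for t in candidates:
        type1 = np.mean(scores0 >= t)
        if type1 <= alpha:
            type2 = np.mean(scores1 < t)   # class-1 points missed at threshold t
            return t, type1, type2
```

Enforcing the constraint on the empirical (rather than true) type I error is exactly what makes the conservativeness question studied in the paper nontrivial.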

Exponential Screening and optimal rates of sparse estimation

Philippe Rigollet and Alexandre Tsybakov (2011)
Journal Ann. Statist., 39(2), 731-771.

Abstract

In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. One of the key assumptions for the success of any statistical procedure in this setup is to assume that the linear combination is sparse in some sense, for example, that it involves only few covariates. We consider a general, not necessarily linear, regression with Gaussian noise and study a related question that is to find a linear combination of approximating functions, which is at the same time sparse and has small mean squared error (MSE). We introduce a new estimation procedure, called Exponential Screening, that shows remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured in terms of the number of non-zero entries in the combination ($$\ell_0$$ norm) or in terms of the global weight of the combination ($$\ell_1$$ norm). The power of this adaptation result is illustrated by showing that Exponential Screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the Exponential Screening estimator cannot be improved in a minimax sense, even if the optimal sparsity is known in advance. The theoretical and numerical superiority of Exponential Screening compared to state-of-the-art sparse procedures is also discussed.

Neyman-Pearson classification under a strict constraint

Philippe Rigollet and Xin Tong (2011)
Conference Proceedings of COLT 2011, PMLR 19:595-614.

Abstract

Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level and (ii) it has probability of type II error close to the minimum possible. The proposed classifier is obtained by minimizing an empirical objective subject to an empirical constraint. The novelty of the method is that the classifier output by this problem is shown to satisfy the original constraint on type I error. This strict enforcement of the constraint has interesting consequences on the control of the type II error and we develop new techniques to handle this situation. Finally, connections with chance-constrained optimization are evident and are investigated.

Optimal rates of sparse estimation and universal aggregation

Philippe Rigollet and Alexandre Tsybakov (2010)
Other Oberwolfach reports, 7(1), 924-927. In: Modern Nonparametric Statistics: Going Beyond Asymptotic Minimax, Mar.-Apr. 2010.

Nonparametric Bandits with Covariates

Philippe Rigollet and Assaf Zeevi (2010)
Conference Proceedings of COLT 2010. Omnipress, 54-66.

Abstract

We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably "localized" bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit literature.
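The localization idea can be sketched by running a standard UCB policy separately inside covariate bins. This is a simplified illustration, not the paper's algorithm: the function name, the fixed bin width, the noise level, and the exploration bonus are all ad hoc choices made here.

```python
import numpy as np

def binned_ucb(covariates, reward_fns, n_bins=5, seed=0):
    """UCB run independently in each covariate bin (illustrative sketch).

    covariates : (T,) covariate stream, values in [0, 1]
    reward_fns : list of functions x -> mean reward of each arm
    """
    rng = np.random.default_rng(seed)
    K = len(reward_fns)
    counts = np.zeros((n_bins, K))
    sums = np.zeros((n_bins, K))
    total = 0.0
    for x in covariates:
        b = min(int(x * n_bins), n_bins - 1)     # localize to a bin
        if counts[b].min() == 0:                 # play each arm once per bin
            a = int(counts[b].argmin())
        else:
            means = sums[b] / counts[b]
            bonus = np.sqrt(2 * np.log(counts[b].sum()) / counts[b])
            a = int((means + bonus).argmax())    # optimistic arm choice
        r = reward_fns[a](x) + 0.1 * rng.standard_normal()
        counts[b, a] += 1
        sums[b, a] += r
        total += r
    return total, counts
```

Within each bin the mean rewards are roughly constant, so each localized problem behaves like a classical two-armed bandit; the statistical work in the paper is in choosing the bin width optimally.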

Optimal rates for plug-in estimators of density level sets

Philippe Rigollet and Regis Vert (2009)
Journal Bernoulli, 15(4), 1154-1178.

Abstract

In the context of density level set estimation, we study the convergence of general plug-in methods under two main assumptions on the density for a given level $$\lambda$$. More precisely, it is assumed that the density (i) is smooth in a neighborhood of $$\lambda$$ and (ii) has $$\gamma$$-exponent at level $$\lambda$$. Condition (i) ensures that the density can be estimated at a standard nonparametric rate and condition (ii) is similar to Tsybakov's margin assumption which is stated for the classification framework. Under these assumptions, we derive optimal rates of convergence for plug-in estimators. Explicit convergence rates are given for plug-in estimators based on kernel density estimators when the underlying measure is the Lebesgue measure. Lower bounds proving optimality of the rates in a minimax sense when the density is Hölder smooth are also provided.
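The plug-in principle itself is simple to sketch: estimate the density with a kernel estimator, then threshold the estimate at the given level. The one-dimensional function below is a minimal illustration with an ad hoc bandwidth, not the estimator analyzed in the paper.

```python
import numpy as np

def plugin_level_set(sample, grid, level, bandwidth=0.3):
    """Plug-in estimate of the level set {x : p(x) >= level}: build a
    Gaussian kernel density estimate, then threshold it at the level."""
    # kernel density estimate evaluated at each grid point
    diffs = (grid[:, None] - sample[None, :]) / bandwidth
    kde = np.exp(-0.5 * diffs ** 2).sum(axis=1)
    kde /= len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
    return grid[kde >= level]          # grid points in the estimated set
```

The rates in the paper quantify how the error of this thresholding step inherits the error of the density estimate near the level $$\lambda$$.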

Learning by mirror averaging

Anatoli Juditsky, Philippe Rigollet and Alexandre Tsybakov (2009)
JournalAnn. Statist., 36(5), 2183-2206.

Abstract

Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation.

Generalization error bounds in semi-supervised classification under the cluster assumption

Philippe Rigollet (2009)
JournalJ. Mach. Learn. Res., 8(Jul), 1369-1392.

Abstract

We consider semi-supervised classification when part of the available data is unlabeled. These unlabeled data can be useful for the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level sets estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and the number of labeled examples.

Linear and convex aggregation of density estimators

Philippe Rigollet and Alexandre Tsybakov (2007)
Journal Math. Methods of Statist., 15(3), 260-280.

Abstract

We study the problem of finding the best linear and convex combination of M estimators of a density with respect to the mean squared risk. We suggest aggregation procedures and we prove sharp oracle inequalities for their risks, i.e., oracle inequalities with leading constant 1. We also obtain lower bounds showing that these procedures attain optimal rates of aggregation. As an example, we consider aggregation of multivariate kernel density estimators with different bandwidths. We show that linear and convex aggregates mimic the kernel oracles in an asymptotically exact sense. We prove that, for Pinsker's kernel, the proposed aggregates are sharp asymptotically minimax simultaneously over a large scale of Sobolev classes of densities. Finally, we provide simulations demonstrating the performance of the convex aggregation procedure.

Adaptive density estimation using the blockwise Stein method

Philippe Rigollet (2006)
Journal Bernoulli, 12(2), 351-370.

Abstract

We study the problem of nonparametric estimation of a probability density of unknown smoothness in $$L_2(\mathbb{R})$$. Expressing mean integrated squared error (MISE) in the Fourier domain, we show that it is close to mean squared error in the Gaussian sequence model. Then applying a modified version of Stein's blockwise method, we obtain a linear monotone oracle inequality. Two consequences of this oracle inequality are that the proposed estimator is sharp minimax adaptive over a scale of Sobolev classes of densities, and that its MISE is asymptotically smaller than or equal to that of kernel density estimators with any bandwidth provided that the kernel belongs to a large class of functions including many standard kernels.
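The blockwise Stein idea can be sketched in the Gaussian sequence model: split the coefficient sequence into blocks and apply James-Stein-type shrinkage within each block. The sketch below uses fixed-size blocks for simplicity, whereas the analysis relies on a weakly geometric block structure; the function name and parameters are choices made here.

```python
import numpy as np

def blockwise_stein(coefs, sigma2, block_size=8):
    """Blockwise James-Stein shrinkage of a coefficient sequence observed
    with per-coefficient noise variance sigma2 (illustrative sketch)."""
    out = np.zeros_like(coefs)
    for start in range(0, len(coefs), block_size):
        block = coefs[start:start + block_size]
        energy = np.sum(block ** 2)
        # shrink the whole block toward zero; kill blocks dominated by noise
        shrink = max(0.0, 1.0 - len(block) * sigma2 / energy) if energy > 0 else 0.0
        out[start:start + block_size] = shrink * block
    return out
```

Blocks whose empirical energy is at the noise level are set to zero, while high-energy blocks are kept almost intact, which is what produces the oracle behavior across smoothness classes.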

Mirror averaging, aggregation and model selection

Anatoli Juditsky, Philippe Rigollet and Alexandre Tsybakov (2005)
Other Oberwolfach reports, 2(4), 2688-2691. In: Meeting on Statistical and Probabilistic Methods of Model Selection, October 2005.

Abstract

Short note on aggregation published in Oberwolfach Reports following the Meeting on Statistical and Probabilistic Methods of Model Selection, October 2005.

Inégalités d'oracle pour l'estimation d'une densité de probabilité (Oracle inequalities for the estimation of a probability density)

Philippe Rigollet (2005)
Journal C. R. Math. Acad. Sci. Paris, 340(1), 59-62.

Abstract

We study the problem of the nonparametric estimation of a probability density in $$L_2(\mathbb{R})$$. Expressing the mean integrated squared error in the Fourier domain, we show that it is close to the mean squared error in the Gaussian sequence model. Then, applying a modified version of Stein's blockwise method, we obtain a linear monotone oracle inequality and a kernel oracle inequality. As a consequence, the proposed estimator is sharp minimax adaptive (i.e. up to a constant) on a scale of Sobolev classes of densities.

Associate Editor

• SIAM Journal on Mathematics of Data Science (2018-present)
• Mathematical Statistics and Learning (2018-present)
• Electronic Journal of Statistics (2016-present)
• Bernoulli (2013-present)
• Statistical Inference for Stochastic Processes (2015-16)
• Journal of Statistical Planning and Inference (2012-15)

Conferences Program Committee

• Conference on Learning Theory, COLT (2012-present)

At My Office

2-279 in the Department of Mathematics

E17-467 in the Institute for Data Systems and Society

photo courtesy of Peter Vanderwarker