Analytical Formula for Large Eigenvalues of Sample Covariance Matrix in Infinite Dimensional Case

S. Dai, H. Matzinger

2022, v.28, Issue 5, 773-816

ABSTRACT

In modern data science, the estimated covariance matrix, or
sample covariance,
plays an important role. Indeed, many of the main machine learning
algorithms take it as input. In that context, it is often referred to as a statistical kernel.
With high dimensional data, there is often a large discrepancy between
the covariance matrix and its estimate. This error often causes machine learning
algorithms to fail, in what is referred to as "the curse of dimensionality".
We present a simple analytical formula for the distortion between the
spectrum of the covariance matrix and estimated covariance matrix,
which we prove for eigenvalues with size above a certain order.
The traditional approach is based on Free Probability Theory and often
performs poorly with real-life high-dimensional data. Also, in real-life data
the spectrum of the covariance matrix usually contains eigenvalues of
different orders of magnitude at the same time; Free Probability
Theory does not a priori apply in that situation. This is the first in a series
of ongoing articles, each treating a different case in which our simple analytic formula
holds.

Keywords: sample covariance matrix, high dimensional case, spectrum reconstruction
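
The discrepancy described in the abstract can be seen in a small simulation. The following is a minimal sketch and not the authors' formula or method: it assumes zero-mean Gaussian data with a diagonal covariance matrix, and the dimension p, the sample size n, the three large eigenvalues and the helper name sample_spectrum are all hypothetical choices made for illustration. It only compares the true spectrum with the spectrum of the sample covariance when p exceeds n.

```python
import numpy as np

def sample_spectrum(true_eigs, n_samples, seed=0):
    """Eigenvalues (descending) of the sample covariance of Gaussian data.

    Assumption for illustration: the true covariance is diag(true_eigs)
    and the mean is known to be zero, so no centering is applied.
    """
    rng = np.random.default_rng(seed)
    p = len(true_eigs)
    # Column j of X has variance true_eigs[j].
    X = rng.standard_normal((n_samples, p)) * np.sqrt(true_eigs)
    S = X.T @ X / n_samples          # p x p sample covariance
    return np.sort(np.linalg.eigvalsh(S))[::-1]

p, n = 200, 100                      # more dimensions than samples
true_eigs = np.ones(p)
true_eigs[:3] = [50.0, 20.0, 10.0]   # a few large eigenvalues on top of a flat bulk

est = sample_spectrum(true_eigs, n)

print("three largest true eigenvalues:     ", true_eigs[:3])
print("three largest estimated eigenvalues:", est[:3])
print("smallest true eigenvalue:           ", true_eigs.min())
print("estimated eigenvalues near zero:    ", int(np.sum(np.abs(est) < 1e-8)))
```

Because the sample covariance is built from n = 100 observations, its rank is at most 100, so at least p - n = 100 of its eigenvalues vanish even though every true eigenvalue is at least 1. The estimated spectrum is therefore a distorted image of the true one, which is the kind of error the paper sets out to quantify for the large eigenvalues.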


Laboratory of large random systems, Dept. of Mechanics and Mathematics
Moscow State University, Vorobievy Gory, 119952 Moscow Russia
E-mail: editor@math-mprf.org

The journal was founded in 1995 by prof. Malyshev V. A.
Published by Polymat Publishing Company, Moscow, Russia
ISSN 1024-2953
The journal is published quarterly

Markov Processes And Related Fields

Scientific Journal
