Analytical Formula for Large Eigenvalues of Sample Covariance Matrix in Infinite Dimensional Case
2022, v.28, Issue 5, 773-816
In modern Data Science the estimated covariance matrix, or sample covariance matrix,
plays an important role. Indeed, many of the main Machine Learning
algorithms take it as input. In that context, it is often referred to as the statistical kernel.
With high dimensional data, there is often a large discrepancy between
the covariance matrix and its estimate. This error often causes machine learning
algorithms to fail, in what is referred to as ``the curse of dimensionality''.
We present a simple analytical formula for the distortion between the
spectrum of the covariance matrix and that of the estimated covariance matrix,
which we prove for eigenvalues above a certain order of magnitude.
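The distortion referred to above can be observed numerically. The following sketch (an illustration of the general phenomenon, not of the paper's formula) draws data from a population with identity covariance and shows that, when the dimension is comparable to the sample size, the sample covariance eigenvalues spread far from the true spectrum:

```python
import numpy as np

# Draw n samples in dimension p from a population whose true covariance
# is the identity, so every population eigenvalue equals 1.
rng = np.random.default_rng(0)
n, p = 200, 100                    # p / n = 0.5: a high-dimensional regime
X = rng.standard_normal((n, p))
S = X.T @ X / n                    # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

# The sample eigenvalues spread roughly over the Marchenko-Pastur
# interval [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2], here about [0.09, 2.91],
# even though every true eigenvalue is exactly 1.
print(eigs.min(), eigs.max())
```

Feeding such a distorted spectrum to a downstream algorithm in place of the true one is the failure mode the abstract attributes to the curse of dimensionality.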
The traditional approach is based on Free Probability Theory and often
performs poorly with real-life high-dimensional data. Moreover, in real-life data
the spectrum of the covariance matrix usually contains eigenvalues of
different orders of magnitude at the same time, a situation in which Free
Probability Theory does not, a priori, apply. This is the first of a series
of ongoing articles, each covering a different case in which our simple analytical formula applies.
Keywords: sample covariance matrix, high dimensional case, spectrum reconstruction