Analytical Formula for Large Eigenvalues of Sample Covariance Matrix in Infinite Dimensional Case

#### S. Dai, H. Matzinger

2022, v.28, Issue 5, 773-816

ABSTRACT

In modern Data Science the estimated covariance matrix, or sample covariance, plays an important role: many of the main machine learning algorithms take it as input. In that context, it is often referred to as a statistical kernel. With high-dimensional data there is often a large discrepancy between the covariance matrix and its estimate. This error often causes machine learning algorithms to fail, in what is referred to as "the curse of dimensionality". We present a simple analytical formula for the distortion between the spectrum of the covariance matrix and that of the estimated covariance matrix, which we prove for eigenvalues above a certain order. The traditional approach is based on Free Probability Theory and often performs poorly on real-life high-dimensional data. Moreover, in real-life data the spectrum of the covariance matrix usually contains eigenvalues of different orders of magnitude at the same time, a situation in which Free Probability Theory does a priori not apply. This is the first of a series of ongoing articles, each covering a different case in which our simple analytic formula holds.
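The spectral distortion described above is easy to observe numerically. The sketch below (not the authors' formula, just a standard illustration) draws i.i.d. samples from a distribution whose true covariance is the identity, so every true eigenvalue equals 1, and shows that when the dimension p is comparable to the sample size n, the sample-covariance eigenvalues spread out over a wide interval:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 500, 1000  # dimension p comparable to sample size n

# True covariance: identity, so every true eigenvalue equals 1.
X = rng.standard_normal((n, p))  # n i.i.d. standard-normal samples in R^p
S = X.T @ X / n                  # sample covariance matrix (p x p)
eigs = np.linalg.eigvalsh(S)

# Classical random-matrix theory (Marchenko-Pastur) predicts the sample
# eigenvalues fill the interval [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2]
# even though all true eigenvalues are 1.
gamma = p / n
print(eigs.min(), eigs.max())  # roughly (1 - sqrt(gamma))**2 and (1 + sqrt(gamma))**2
```

This is the discrepancy between the spectrum of the covariance matrix and that of its estimate that the paper's formula quantifies for sufficiently large eigenvalues.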

Keywords: sample covariance matrix, high dimensional case, spectrum reconstruction
