Abstract or Additional Information
Gaussian processes are widely used throughout the statistical and machine learning communities for modeling natural processes, with broad applications to fields such as nuclear engineering and climate science. However, carrying out maximum likelihood calculations for the resulting statistical models is difficult for very large data sets because of the need to work with the covariance matrices of the observations. In some cases, the covariance matrices (or their inverses) have exploitable properties, such as sparsity or Toeplitz structure, that reduce computation and/or storage, but in many applications the covariance matrices are dense and unstructured. Moreover, existing algorithms for maximizing the likelihood rely heavily on the Cholesky factorization, whose O(n^3) operation count and O(n^2) storage for n observations are prohibitively costly for many problems of practical interest.
We propose a sample average formulation of the maximum likelihood problem, which reduces the linear algebra challenges to solving linear systems with the covariance matrix for multiple right-hand sides. We further investigate two of the most important ingredients of the conjugate gradient solver: the conditioning of the covariance matrix and the multiplication of the covariance matrix by a vector. We demonstrate the scalable solution of the maximum likelihood problem for data sizes as large as a million points on a grid on a single desktop machine, whereas the Cholesky factorization approach would have needed a moderate-size supercomputer. In addition, we have proved that in some circumstances optimal preconditioning is achievable by means of a filtering approach. Parallel programs are under development to solve the problem for much larger data sets and in higher dimensions.
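To make the reduction to linear solves concrete, here is a minimal Python sketch of one way a sample average formulation can look. It is an illustration under stated assumptions, not the method of the talk: it assumes the trace term in the derivative of the log-likelihood is replaced by an average over random sign (Rademacher) probe vectors, it uses an unpreconditioned conjugate gradient loop, and it uses a dense matrix product where a fast matrix-vector routine would be used at scale. The names `sq_exp_cov`, `sq_exp_cov_dell`, `cg_solve` and `approx_score` are hypothetical, as is the choice of a squared-exponential kernel with a nugget.

```python
import numpy as np

def sq_exp_cov(x, ell, nugget=1e-2):
    """Dense squared-exponential covariance on 1-D points x (illustrative kernel)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2) + nugget * np.eye(len(x))

def sq_exp_cov_dell(x, ell):
    """Derivative of sq_exp_cov with respect to the length scale ell."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2) * d2 / ell**3

def cg_solve(matvec, B, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for K X = B, run independently on each column of B.

    Only products with K are needed, so matvec can be any fast routine.
    """
    B = np.asarray(B, dtype=float)
    tiny = np.finfo(float).tiny          # guards divisions once a column has converged
    X = np.zeros_like(B)
    R = B.copy()                         # residuals, one per column
    P = R.copy()                         # search directions
    rs = np.sum(R * R, axis=0)
    b_norm = np.sqrt(rs)
    for _ in range(max_iter):
        KP = matvec(P)
        alpha = rs / np.maximum(np.sum(P * KP, axis=0), tiny)
        X += alpha * P
        R -= alpha * KP
        rs_new = np.sum(R * R, axis=0)
        if np.all(np.sqrt(rs_new) <= tol * b_norm):
            break
        P = R + (rs_new / np.maximum(rs, tiny)) * P
        rs = rs_new
    return X

def approx_score(y, K, dK, n_probes=64, seed=0):
    """Sample-average estimate of d(log-likelihood)/d(theta) for y ~ N(0, K).

    Exact score: 0.5 * y' K^-1 dK K^-1 y - 0.5 * trace(K^-1 dK).
    Assumption: the trace is estimated as the mean of u' K^-1 dK u over
    random sign vectors u, so only linear solves with K are required.
    """
    rng = np.random.default_rng(seed)
    U = rng.choice([-1.0, 1.0], size=(len(y), n_probes))
    # One multi-right-hand-side solve: column 0 is K^-1 y, the rest are K^-1 u_j.
    S = cg_solve(lambda V: K @ V, np.column_stack([y, U]))
    Kinv_y, Kinv_U = S[:, 0], S[:, 1:]
    quad = Kinv_y @ (dK @ Kinv_y)
    trace_est = np.mean(np.sum(Kinv_U * (dK @ U), axis=0))
    return 0.5 * quad - 0.5 * trace_est

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 200)
    ell = 0.2
    K = sq_exp_cov(x, ell)
    y = np.linalg.cholesky(K) @ np.random.default_rng(1).standard_normal(len(x))
    print("estimated score at ell = 0.2:", approx_score(y, K, sq_exp_cov_dell(x, ell)))
```

In this sketch every quantity is obtained from a single solve with several right-hand sides, so the cost is governed by the number of conjugate gradient iterations (hence the conditioning of the covariance matrix) and by the cost of each matrix-vector product.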
This is joint work with Michael Stein, Jie Chen and Lei Wang.