Probabilistic Canonical Correlation Analysis in Detail
Probabilistic canonical correlation analysis reinterprets CCA as a latent variable model, which brings benefits such as generative modeling, principled handling of uncertainty, and composability with other probabilistic models. I define the model and derive its maximum likelihood solution in detail.
Standard CCA
Canonical correlation analysis (CCA) is a multivariate statistical method for finding two linear projections, one for each set of observations in a paired dataset, such that the projected data points are maximally correlated. For a thorough explanation, please see my previous post.
I will present an abbreviated explanation here for completeness and to fix notation. Let $\mathbf{x}_1 \in \mathbb{R}^{m_1}$ and $\mathbf{x}_2 \in \mathbb{R}^{m_2}$ denote paired observations from the two datasets, with covariance matrices $\Sigma_{11}$ and $\Sigma_{22}$ and cross-covariance matrix $\Sigma_{12}$. CCA finds weight vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ that maximize the correlation between the projections $\mathbf{w}_1^{\top} \mathbf{x}_1$ and $\mathbf{w}_2^{\top} \mathbf{x}_2$,

$$
\max_{\mathbf{w}_1, \mathbf{w}_2} \; \mathbf{w}_1^{\top} \Sigma_{12} \mathbf{w}_2,
$$

with the constraint that the projections have unit variance:

$$
\mathbf{w}_1^{\top} \Sigma_{11} \mathbf{w}_1 = \mathbf{w}_2^{\top} \Sigma_{22} \mathbf{w}_2 = 1.
$$

Since the correlation is invariant to the scale of $\mathbf{w}_1$ and $\mathbf{w}_2$, this constraint removes the scaling indeterminacy, and the resulting problem can be solved as a generalized eigenvalue problem whose top $d$ solutions give the canonical directions.
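To make the objective concrete, here is a minimal NumPy sketch (not the code from the repository linked below) that solves CCA via the SVD of the whitened cross-covariance matrix; the function name, variable names, and the small ridge term are my own choices.

```python
import numpy as np

def cca(X1, X2, d):
    """Top-d canonical directions via SVD of the whitened cross-covariance."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)  # mean-center both views
    X2 = X2 - X2.mean(axis=0)
    S11 = X1.T @ X1 / n + 1e-8 * np.eye(X1.shape[1])  # small ridge for stability
    S22 = X2.T @ X2 / n + 1e-8 * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / n
    L1 = np.linalg.cholesky(S11)  # S11 = L1 L1^T
    L2 = np.linalg.cholesky(S22)
    # K = L1^{-1} S12 L2^{-T}; its singular values are the canonical correlations.
    K = np.linalg.solve(L1, S12) @ np.linalg.inv(L2).T
    U, s, Vt = np.linalg.svd(K)
    W1 = np.linalg.solve(L1.T, U[:, :d])   # satisfies W1^T S11 W1 = I
    W2 = np.linalg.solve(L2.T, Vt[:d].T)   # satisfies W2^T S22 W2 = I
    return W1, W2, s[:d]
```

Here `s` holds the canonical correlations, and the returned weights satisfy the unit-variance constraints by construction of the whitening.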
Probabilistic interpretation of CCA
A probabilistic interpretation of CCA (PCCA) is one in which our two datasets, $\mathbf{x}_1$ and $\mathbf{x}_2$, are modeled as linear functions of a shared latent variable $\mathbf{z} \in \mathbb{R}^{d}$ plus Gaussian noise:

$$
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I_d),
\qquad
\mathbf{x}_1 \mid \mathbf{z} \sim \mathcal{N}(W_1 \mathbf{z} + \boldsymbol{\mu}_1, \Psi_1),
\qquad
\mathbf{x}_2 \mid \mathbf{z} \sim \mathcal{N}(W_2 \mathbf{z} + \boldsymbol{\mu}_2, \Psi_2),
$$

where $W_1 \in \mathbb{R}^{m_1 \times d}$ and $W_2 \in \mathbb{R}^{m_2 \times d}$ are loading matrices, and $\Psi_1$ and $\Psi_2$ are positive semidefinite noise covariance matrices.
It is worth being explicit about the differences between this probabilistic framing and the standard framing. In CCA, we take our data and perform matrix multiplications to get lower-dimensional representations, $\mathbf{z}_1 = X_1 \mathbf{w}_1$ and $\mathbf{z}_2 = X_2 \mathbf{w}_2$. The objective is to find projections $\mathbf{w}_1$ and $\mathbf{w}_2$ such that these representations are maximally correlated.
But the probabilistic model above runs in the opposite direction. It assumes that a shared latent variable $\mathbf{z}$ generates both views,

$$
\mathbf{x}_1 = W_1 \mathbf{z} + \boldsymbol{\mu}_1 + \boldsymbol{\varepsilon}_1,
\qquad
\mathbf{x}_2 = W_2 \mathbf{z} + \boldsymbol{\mu}_2 + \boldsymbol{\varepsilon}_2,
$$

where $\boldsymbol{\varepsilon}_1 \sim \mathcal{N}(\mathbf{0}, \Psi_1)$ and $\boldsymbol{\varepsilon}_2 \sim \mathcal{N}(\mathbf{0}, \Psi_2)$ are independent noise terms.
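This generative direction can be sketched directly. The following function is a hypothetical illustration (not from the linked repository): it samples $\mathbf{z}$ first, then produces each view by a linear map plus view-specific noise.

```python
import numpy as np

def sample_pcca(W1, W2, mu1, mu2, Psi1, Psi2, n, rng=None):
    """Draw n paired observations (and their latents) from the PCCA model."""
    if rng is None:
        rng = np.random.default_rng()
    d = W1.shape[1]
    Z = rng.normal(size=(n, d))  # z ~ N(0, I_d)
    E1 = rng.multivariate_normal(np.zeros(len(mu1)), Psi1, size=n)
    E2 = rng.multivariate_normal(np.zeros(len(mu2)), Psi2, size=n)
    X1 = Z @ W1.T + mu1 + E1     # x1 = W1 z + mu1 + eps1
    X2 = Z @ W2.T + mu2 + E2     # x2 = W2 z + mu2 + eps2
    return X1, X2, Z
```

Because the only dependence between the views flows through `Z`, the empirical cross-covariance of the samples should approach $W_1 W_2^{\top}$ as $n$ grows.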
And this in turn is nice because we can just reuse the maximum likelihood estimates we computed for factor analysis. To see this, let's use block matrices to represent our data and parameters:

$$
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix},
\qquad
W = \begin{bmatrix} W_1 \\ W_2 \end{bmatrix},
\qquad
\Psi = \begin{bmatrix} \Psi_1 & \mathbf{0} \\ \mathbf{0} & \Psi_2 \end{bmatrix},
$$

where $\mathbf{x} \in \mathbb{R}^{m_1 + m_2}$, $W \in \mathbb{R}^{(m_1 + m_2) \times d}$, and $\Psi$ is block diagonal because the noise terms of the two views are independent given $\mathbf{z}$.
Then our PCCA updates are identical to our updates for factor analysis. Assuming mean-centered data, the E-step computes the posterior moments of each latent variable,

$$
\mathbb{E}[\mathbf{z}_i \mid \mathbf{x}_i] = W^{\top} (W W^{\top} + \Psi)^{-1} \mathbf{x}_i,
\qquad
\mathbb{E}[\mathbf{z}_i \mathbf{z}_i^{\top} \mid \mathbf{x}_i] = I_d - W^{\top} (W W^{\top} + \Psi)^{-1} W + \mathbb{E}[\mathbf{z}_i \mid \mathbf{x}_i]\, \mathbb{E}[\mathbf{z}_i \mid \mathbf{x}_i]^{\top},
$$

and the M-step updates the parameters in closed form,

$$
W^{\star} = \Big( \sum_{i=1}^{n} \mathbf{x}_i\, \mathbb{E}[\mathbf{z}_i \mid \mathbf{x}_i]^{\top} \Big) \Big( \sum_{i=1}^{n} \mathbb{E}[\mathbf{z}_i \mathbf{z}_i^{\top} \mid \mathbf{x}_i] \Big)^{-1},
\qquad
\Psi^{\star} = \frac{1}{n} \sum_{i=1}^{n} \Big( \mathbf{x}_i \mathbf{x}_i^{\top} - W^{\star}\, \mathbb{E}[\mathbf{z}_i \mid \mathbf{x}_i]\, \mathbf{x}_i^{\top} \Big),
$$

with the one difference that $\Psi^{\star}$ is constrained to be block diagonal rather than diagonal, so we keep only its two within-view blocks.
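A compact EM sketch under these assumptions might look as follows. It treats PCCA as factor analysis on the stacked data and imposes the block-diagonal constraint by keeping only the within-view blocks of the noise covariance update; this is an illustrative sketch with my own initialization choices, not the repository's implementation.

```python
import numpy as np

def pcca_em(X1, X2, d, n_iter=500):
    """EM for PCCA: factor analysis on stacked views with block-diagonal Psi."""
    X = np.hstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    n, m = X.shape
    m1 = X1.shape[1]
    S = X.T @ X / n                          # sample covariance of stacked data
    W = np.random.default_rng(0).normal(scale=0.1, size=(m, d))
    Psi = np.eye(m)
    for _ in range(n_iter):
        # E-step in matrix form: B = W^T (W W^T + Psi)^{-1},
        # Ezz is the average posterior second moment of z.
        B = np.linalg.solve(W @ W.T + Psi, W).T
        Ezz = np.eye(d) - B @ W + B @ S @ B.T
        # M-step: closed-form factor analysis updates.
        W = S @ B.T @ np.linalg.inv(Ezz)
        Psi_full = S - W @ B @ S
        Psi_full = (Psi_full + Psi_full.T) / 2  # symmetrize for stability
        # Impose the block-diagonal constraint: zero the cross-view blocks.
        Psi = np.zeros_like(Psi_full)
        Psi[:m1, :m1] = Psi_full[:m1, :m1]
        Psi[m1:, m1:] = Psi_full[m1:, m1:]
    return W[:m1], W[m1:], Psi[:m1, :m1], Psi[m1:, m1:]
```

At convergence, the implied model covariance $W W^{\top} + \Psi$ should closely match the sample covariance of the stacked data.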
Furthermore, this definition gives us a joint density for $\mathbf{x}$:

$$
\begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}
\sim
\mathcal{N} \left(
\begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix},
\begin{bmatrix} W_1 W_1^{\top} + \Psi_1 & W_1 W_2^{\top} \\ W_2 W_1^{\top} & W_2 W_2^{\top} + \Psi_2 \end{bmatrix}
\right).
$$
See the Appendix for a derivation.
Related properties
It’s worth thinking about how the properties of CCA are converted to probabilistic assumptions in PCCA. First, in CCA, all of the structure shared between the two views is captured by the correlation between the canonical variables. In PCCA, all shared structure is routed through the latent variable $\mathbf{z}$: since the noise terms $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are independent, the cross-covariance between the views is $W_1 W_2^{\top}$, with no contribution from $\Psi_1$ or $\Psi_2$.

Furthermore, in CCA, we proved that the canonical variables are orthogonal. In PCCA, there is no such orthogonality constraint. Instead, we assume the latent variables are independent with an isotropic covariance matrix:

$$
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I_d).
$$

This independence assumption is the probabilistic equivalent of orthogonality. The covariance matrix of the latent variables is diagonal, meaning there is no covariance between the components of $\mathbf{z}$.

The final constraint of the CCA objective is that the vectors have unit length. In probabilistic terms, this is analogous to unit variance, which we have since the identity matrix $I_d$ gives each component of $\mathbf{z}$ unit variance.
Code
For an implementation of PCCA, please see my GitHub repository of machine learning algorithms, specifically this file.
Appendix
1. Derivations for $\boldsymbol{\mu}$ and $\Sigma$
Let’s solve for the mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$ first. First, note that, by linearity of expectation and the fact that $\mathbf{z}$ and $\boldsymbol{\varepsilon}$ are zero-mean,

$$
\mathbb{E}[\mathbf{x}] = \mathbb{E}[W \mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\varepsilon}] = W\, \mathbb{E}[\mathbf{z}] + \boldsymbol{\mu} + \mathbb{E}[\boldsymbol{\varepsilon}] = \boldsymbol{\mu}.
$$

If the data are mean-centered, as we assume, then

$$
\boldsymbol{\mu} = \mathbf{0}.
$$
To understand the covariance matrix $\Sigma = \text{Cov}(\mathbf{x})$, let’s consider one view at a time. Since $\mathbf{z}$ and $\boldsymbol{\varepsilon}_1$ are independent,

$$
\text{Cov}(\mathbf{x}_1) = \text{Cov}(W_1 \mathbf{z}) + \text{Cov}(\boldsymbol{\varepsilon}_1) = W_1\, \text{Cov}(\mathbf{z})\, W_1^{\top} + \Psi_1 = W_1 W_1^{\top} + \Psi_1.
$$

So for the two views, the diagonal blocks of $\Sigma$ are $W_1 W_1^{\top} + \Psi_1$ and $W_2 W_2^{\top} + \Psi_2$.

Now, let’s consider the cross-covariance between the views. Because $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are independent of each other and of $\mathbf{z}$, and the data are mean-centered,

$$
\text{Cov}(\mathbf{x}_1, \mathbf{x}_2) = \mathbb{E}\big[(W_1 \mathbf{z} + \boldsymbol{\varepsilon}_1)(W_2 \mathbf{z} + \boldsymbol{\varepsilon}_2)^{\top}\big] = W_1\, \mathbb{E}[\mathbf{z} \mathbf{z}^{\top}]\, W_2^{\top} = W_1 W_2^{\top}.
$$

Thus, the full covariance matrix for $\mathbf{x}$ is

$$
\Sigma = \begin{bmatrix} W_1 W_1^{\top} + \Psi_1 & W_1 W_2^{\top} \\ W_2 W_1^{\top} & W_2 W_2^{\top} + \Psi_2 \end{bmatrix}.
$$
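As a quick numerical sanity check of this derived covariance (a sketch, not a proof), one can sample from the generative model and compare the empirical covariance of the stacked data against the block formula; the dimensions and noise scales below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m1, m2, d, n = 3, 4, 2, 200_000
W1 = rng.normal(size=(m1, d))
W2 = rng.normal(size=(m2, d))
Psi1 = np.diag(rng.uniform(0.5, 1.0, size=m1))
Psi2 = np.diag(rng.uniform(0.5, 1.0, size=m2))

# Sample from the generative model: z ~ N(0, I_d), x_v = W_v z + eps_v.
Z = rng.normal(size=(n, d))
X1 = Z @ W1.T + rng.multivariate_normal(np.zeros(m1), Psi1, size=n)
X2 = Z @ W2.T + rng.multivariate_normal(np.zeros(m2), Psi2, size=n)
X = np.hstack([X1, X2])

# Derived covariance of the stacked vector x = [x1; x2].
Sigma = np.block([
    [W1 @ W1.T + Psi1, W1 @ W2.T],
    [W2 @ W1.T,        W2 @ W2.T + Psi2],
])
S_hat = X.T @ X / n  # empirical covariance (zero mean by construction)
print(np.max(np.abs(S_hat - Sigma)))  # shrinks toward 0 as n grows
```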
References

- Bach, F. R., & Jordan, M. I. (2005). A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California, Berkeley.
- Klami, A., Virtanen, S., Leppäaho, E., & Kaski, S. (2015). Group factor analysis. IEEE Transactions on Neural Networks and Learning Systems, 26(9), 2136–2147.