Contrastive learning has recently seen tremendous success in self-supervised learning. So far, however, it is largely unclear why the learned representations generalize so effectively to a large variety of downstream tasks. We here prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data. While the proofs make certain statistical assumptions about the generative model, we observe empirically that our findings hold even if these assumptions are severely violated. Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis, thereby furthering our understanding of the learned representations as well as providing a theoretical foundation to derive more effective contrastive losses.
We start with the well-known formulation of a contrastive loss (often called InfoNCE).
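One common way to write an objective from this family, assuming an encoder $f$, a temperature $\tau$, a positive pair $(x, \tilde{x})$ drawn from $p_{\text{pos}}$, and $M$ negative samples drawn i.i.d. from the data distribution (this particular notation is our choice for illustration), is

$$
\mathcal{L}(f) \;=\; \mathbb{E}_{\substack{(x,\tilde{x}) \sim p_{\text{pos}} \\ \{x_i^-\}_{i=1}^{M} \overset{\text{iid}}{\sim} p_{\text{data}}}} \left[ -\log \frac{e^{f(x)^\top f(\tilde{x})/\tau}}{e^{f(x)^\top f(\tilde{x})/\tau} + \sum_{i=1}^{M} e^{f(x_i^-)^\top f(\tilde{x})/\tau}} \right].
$$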
Our theoretical approach consists of three steps:
We follow this approach for the contrastive loss $\mathcal{L}$ defined above, and we use our theory as a starting point to design new contrastive losses (e.g., for latents distributed within a hypercube). We validate our predictions regarding identifiability of the latent variables (up to a transformation) with extensive experiments; a sketch of this evaluation is given below.
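As a minimal sketch of how identifiability up to a linear transformation is typically measured, one can fit a linear map from the learned representations to the ground-truth latents and report the resulting $R^2$. The array names and the synthetic data below are purely illustrative placeholders, not part of our released evaluation code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Placeholder data: in practice, z would be the ground-truth latents of a
# held-out set and z_hat the corresponding encoder outputs f(x).
rng = np.random.default_rng(0)
z = rng.normal(size=(10_000, 8))      # ground-truth latents (placeholder)
A = rng.normal(size=(8, 8))           # stands in for the learned map f∘g
z_hat = z @ A                         # representations related to z by a linear map

# Fit a linear map from representations to latents; a high R^2 indicates
# identifiability up to a linear transformation.
reg = LinearRegression().fit(z_hat, z)
print("linear R^2:", r2_score(z, reg.predict(z_hat)))
```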
We introduce 3Dident, a dataset with hallmarks of natural environments (shadows, different lighting conditions, 3D rotations, etc.). We publicly released the full dataset (including both the train and test sets) here. Reference code for evaluation is available in our repository.