On Deep Multi-View Representation Learning: Objectives and Optimization

2 Feb 2016 | Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes
This paper presents a comprehensive analysis of deep multi-view representation learning, comparing several deep neural network-based approaches with linear and kernel canonical correlation analysis (CCA) in the unsupervised multi-view feature learning setting, where the second view is available during training but not at test time. The authors propose a new model, deep canonically correlated autoencoders (DCCAE), which combines the CCA objective with autoencoder-based reconstruction objectives. They also explore a stochastic (mini-batch) optimization procedure for deep CCA and discuss the trade-offs between kernel-based and neural network-based implementations.

The paper studies several multi-view learning approaches built on deep feed-forward neural networks: split autoencoders (SplitAE), deep canonical correlation analysis (DCCA), deep canonically correlated autoencoders (DCCAE), correlated autoencoders (CorrAE), and minimum-distance autoencoders (DistAE). Each approach has its own objective function and optimization procedure. Comparing these methods on image, speech, and text tasks, the authors find that CCA-based approaches tend to outperform unconstrained reconstruction-based approaches, with DCCAE the most consistent winner across tasks.

The paper also reviews related work on multi-view feature learning with neural networks and on the kernel extension of CCA (KCCA), including kernel approximation techniques for scaling kernel machines to large datasets.

In the experiments, the methods are evaluated on a noisy MNIST dataset and on acoustic-articulatory data for speech recognition. DCCAE achieves the best clustering accuracy and classification error rates on noisy MNIST, and nonlinear CCA-based methods outperform linear CCA and the other methods on the acoustic-articulatory task. The paper concludes that DCCAE, which combines the strengths of CCA and autoencoders, is a promising approach for multi-view representation learning.
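For readers interested in the form of the combined objective, the following is a sketch of the DCCAE training criterion in notation loosely following the paper (treat the exact symbols as illustrative): f and g are the two view networks, p and q the corresponding decoders, U and V the CCA projection matrices, N the number of training pairs, λ > 0 the trade-off between the correlation and reconstruction terms, and r_x, r_y small covariance-regularization constants.

\[
\min_{\mathbf{W}_f,\mathbf{W}_g,\mathbf{W}_p,\mathbf{W}_q,\,U,V}\;
-\frac{1}{N}\operatorname{tr}\!\left(U^\top f(X)\,g(Y)^\top V\right)
+\frac{\lambda}{N}\sum_{i=1}^{N}\left(\lVert x_i - p(f(x_i))\rVert^2 + \lVert y_i - q(g(y_i))\rVert^2\right)
\]
subject to
\[
U^\top\!\left(\tfrac{1}{N} f(X)f(X)^\top + r_x I\right)U = I,\qquad
V^\top\!\left(\tfrac{1}{N} g(Y)g(Y)^\top + r_y I\right)V = I,\qquad
u_i^\top f(X)\,g(Y)^\top v_j = 0 \ \text{for } i \neq j,
\]
where f(X) and g(Y) stack the view networks' outputs on the N training samples column-wise. Setting λ to zero (and dropping the decoders) recovers a DCCA-style objective, while the reconstruction terms alone correspond to the autoencoder-based variants.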
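The stochastic optimization discussion hinges on estimating the canonical correlation term from mini-batches rather than the full training set. Below is a minimal NumPy sketch (not the authors' code) of how the total-correlation term can be computed from a mini-batch of the two view networks' outputs; the function name and the regularization constant reg are illustrative assumptions.

import numpy as np

def neg_total_correlation(h1, h2, reg=1e-4):
    # h1, h2: (batch_size, feature_dim) outputs of the two view networks on
    # the same mini-batch. Returns the negative sum of canonical correlations
    # between the two feature sets, estimated from this mini-batch.
    n = h1.shape[0]
    h1 = h1 - h1.mean(axis=0, keepdims=True)   # center each view
    h2 = h2 - h2.mean(axis=0, keepdims=True)
    # Regularized covariance and cross-covariance estimates.
    s11 = h1.T @ h1 / (n - 1) + reg * np.eye(h1.shape[1])
    s22 = h2.T @ h2 / (n - 1) + reg * np.eye(h2.shape[1])
    s12 = h1.T @ h2 / (n - 1)
    # Whitening: the singular values of T = S11^{-1/2} S12 S22^{-1/2}
    # are the canonical correlations of the two representations.
    def inv_sqrt(s):
        w, v = np.linalg.eigh(s)
        return v @ np.diag(w ** -0.5) @ v.T
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    return -np.linalg.svd(t, compute_uv=False).sum()

In a full DCCAE-style implementation this term would be combined with the two autoencoders' reconstruction losses, weighted by λ, and differentiated through the view networks with an automatic-differentiation framework. Because the objective depends on covariance estimates, reasonably large mini-batches are needed for reliable gradients, which is the central practical consideration in the stochastic optimization of such objectives.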