Published online: 9 January 2008 | Andreas Argyriou · Theodoros Evgeniou · Massimiliano Pontil
This paper presents a method for learning sparse representations shared across multiple tasks. The method generalizes the well-known single-task 1-norm regularization by introducing a novel non-convex regularizer that controls the number of learned features common to all tasks. The paper proves that this non-convex problem is equivalent to a convex optimization problem and gives an iterative algorithm that converges to an optimal solution. The algorithm alternates between a supervised step, which learns task-specific functions, and an unsupervised step, which learns the common sparse representation. An extension of the algorithm uses kernels to learn sparse nonlinear representations. Experiments on simulated and real data show that the method improves performance relative to learning each task independently, recovers a small set of features shared across related tasks, and can also be used for variable selection.
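For concreteness, the following is a minimal sketch of the alternating scheme the abstract describes, specialized to squared loss and linear features. It is an illustration based on the abstract's description, not the authors' reference implementation; the function names, the epsilon perturbation, and all default values are assumptions.

    import numpy as np

    def _sqrtm_psd(C):
        """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
        vals, vecs = np.linalg.eigh(C)
        return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    def multitask_feature_learning(Xs, ys, gamma=1.0, n_iter=50, eps=1e-6):
        """Alternating minimization for multi-task feature learning (sketch).

        Xs, ys -- lists of per-task data matrices (m_t x d) and targets (m_t,)
        gamma  -- regularization strength (assumed default)
        eps    -- small perturbation keeping D invertible (assumed default)
        Returns the task weight matrix W (d x T) and the shared matrix D.
        """
        d = Xs[0].shape[1]
        T = len(Xs)
        D = np.eye(d) / d            # start from the uniform feature weighting
        W = np.zeros((d, T))
        for _ in range(n_iter):
            # Supervised step: each task solves a ridge problem weighted by D^{-1}
            D_inv = np.linalg.inv(D + eps * np.eye(d))
            for t, (X, y) in enumerate(zip(Xs, ys)):
                W[:, t] = np.linalg.solve(X.T @ X + gamma * D_inv, X.T @ y)
            # Unsupervised step: closed-form update of the common representation,
            # D = (W W^T)^{1/2} / trace((W W^T)^{1/2})
            S = _sqrtm_psd(W @ W.T + eps * np.eye(d))
            D = S / np.trace(S)
        return W, D

In this sketch the supervised step decouples across tasks, while the unsupervised step concentrates the spectrum of D on the few directions shared by all task weight vectors, which is what drives the learned representation toward sparsity.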