Feature Hashing for Large Scale Multitask Learning

27 Feb 2010 | Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola
This paper introduces a novel approach to dimensionality reduction and multitask learning based on feature hashing. The authors propose mapping high-dimensional input vectors into a lower-dimensional space with a hash function, enabling efficient and scalable multitask learning. Their theoretical analysis shows that the interference between randomly hashed subspaces is negligible with high probability, so many tasks can be learned jointly in a single compressed space. The method is demonstrated on a large-scale email spam filtering task in which hundreds of thousands of users label emails as spam or non-spam and each user requires a personalized classifier. Combining hashed global and per-user features reduces spam by up to 30% relative to a single global classifier. The authors also derive bounds on the distortion of hashed inner products, bound the probability of error introduced by hash collisions, and note that the method applies to other problems such as collaborative filtering and massive multiclass estimation. They conclude that feature hashing provides a practical and efficient solution for large-scale multitask learning.
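To make the hashing trick concrete, here is a minimal Python sketch of a signed hashed feature map with per-user feature duplication for the personalized (multitask) spam-filtering setting. The function name `hashed_features`, the use of MD5 as the hash, and the user-prefixing scheme are illustrative assumptions, not the authors' implementation; the ingredients taken from the paper are a hash index h(j) and a sign ξ(j) ∈ {−1, +1}, which keeps hashed inner products unbiased in expectation.

```python
import hashlib

def hashed_features(tokens, m, user=None):
    """Signed hashing trick: map (token, value) pairs into an m-dim vector.

    Illustrative sketch only (MD5 and the user-prefix scheme are
    assumptions, not the paper's exact code). Each feature j goes to
    bucket h(j) with sign xi(j), so hashed inner products are unbiased.
    """
    x = [0.0] * m
    for token, value in tokens:
        # For personalized (multitask) features, prefix the token with the
        # user id so each user's features land in their own random subspace.
        key = token if user is None else f"{user}\x00{token}"
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % m  # h: feature -> {0..m-1}
        sign = 1.0 if digest[4] & 1 else -1.0             # xi: feature -> {-1,+1}
        x[index] += sign * value
    return x

# Joint representation for one user: hash global and user-specific copies
# of every term into the same m-dimensional space, then add them.
m = 2 ** 18
email = [("cheap", 1.0), ("pills", 1.0), ("meeting", 1.0)]
phi_global = hashed_features(email, m)
phi_user = hashed_features(email, m, user="user42")
phi_joint = [g + u for g, u in zip(phi_global, phi_user)]
```

Because the hash and sign functions are (approximately) uniform, different users' subspaces interact only through rare collisions, which is exactly the interaction the paper's analysis bounds.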