27 Feb 2010 | Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola
This paper introduces feature hashing as a dimensionality reduction technique for large-scale multitask learning. The authors provide exponential tail bounds for feature hashing, demonstrating that the interaction between random subspaces is negligible with high probability. They show that this approach is effective in multitask learning scenarios with hundreds of thousands of tasks, where different hash functions can be used for each task to map data into a joint space with minimal interference. The paper also discusses the application of feature hashing to collaborative email spam filtering, where each user expects a personalized classifier reflecting their preferences. Experimental results on real-world spam data sets validate the effectiveness of the proposed method, demonstrating significant improvements in classification accuracy and memory efficiency.
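A minimal sketch of the idea, in Python, may help. This is not the paper's code: `hashed_features`, the dimension `2**16`, the `user42` label, and the use of md5 are illustrative assumptions (the paper assumes a uniform hash family and larger hash spaces). The high hash bit supplies the ±1 sign that keeps hashed inner products unbiased, and prefixing tokens with a task identifier emulates a per-task hash function mapping into the shared space.

```python
import hashlib

def hashed_features(tokens, dim=2**16, task_id=None):
    """Hashing-trick feature map: bag of tokens -> dense vector of size dim.

    md5 stands in for the uniform hash family assumed in the paper; the
    low bits pick the bucket, one high bit supplies the +/-1 sign that
    keeps the hashed inner product unbiased in expectation. Prefixing
    tokens with task_id emulates a separate hash function per task.
    """
    vec = [0.0] * dim
    for tok in tokens:
        key = tok if task_id is None else f"{task_id}|{tok}"
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        sign = 1.0 if (h >> 127) & 1 else -1.0
        vec[h % dim] += sign
    return vec

# Personalized spam filtering: global and per-user features are hashed
# into one joint space, so a single weight vector serves all users while
# still allowing user-specific corrections.
email = ["cheap", "meds", "click", "here"]
phi = [g + u for g, u in zip(hashed_features(email),
                             hashed_features(email, task_id="user42"))]
```

With hundreds of thousands of users, this keeps memory fixed at `dim` weights regardless of vocabulary or user count, which is the source of the memory savings reported in the paper.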