Support Vector Machines for Spam Categorization

SEPTEMBER 1999 | Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir N. Vapnik
The paper "Support Vector Machines for Spam Categorization" by Harris Drucker, Donghui Wu, and Vladimir N. Vapnik compares support vector machines (SVMs) with three other classification algorithms (Ripper, Rocchio, and boosting decision trees) on two datasets for classifying emails as spam or non-spam. The study evaluates the algorithms on accuracy, speed, and training time. SVMs performed best with binary features, showing acceptable test performance on both datasets. Boosting trees and SVMs had similar error rates, but SVMs required significantly less training time. The authors conclude that SVMs, especially with binary features, are superior because of their faster training and better error dispersion. They also recommend using all features rather than a subset, since SVM performance does not degrade as more features are added. Additionally, they suggest avoiding stop lists and leveraging user-provided sender information to improve accuracy.
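
To make the binary-feature recommendation concrete, here is a minimal sketch in Python of a linear SVM trained on 0/1 word-presence features, using the full vocabulary and no stop list. It is an illustration under stated assumptions, not the authors' implementation: the toy messages, the function names (`binary_features`, `train_linear_svm`, `predict`), and the Pegasos-style stochastic subgradient solver are all chosen here for brevity and do not come from the paper. The bias term is omitted to keep the sketch short.

```python
import random

def binary_features(text, vocab):
    """Map a message to a 0/1 vector: 1.0 if the term occurs at all, else 0.0.
    Word counts are deliberately ignored (binary features)."""
    words = set(text.lower().split())
    return [1.0 if term in words else 0.0 for term in vocab]

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized
    hinge loss. Assumption: this solver is a stand-in for illustration."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step only when the example violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Return +1 (spam) or -1 (non-spam) from the sign of the score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1

# Hypothetical toy data: +1 = spam, -1 = non-spam.
messages = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday",
]
labels = [1, 1, -1, -1]

# Use every term as a feature: no feature selection and no stop list,
# so common words such as "for" and "on" are kept.
vocab = sorted({word for m in messages for word in m.lower().split()})
X = [binary_features(m, vocab) for m in messages]
w = train_linear_svm(X, labels)

print(predict(w, binary_features("claim free money", vocab)))
print(predict(w, binary_features("agenda monday lunch", vocab)))
```

Because the vocabulary is built from every word in the training messages, adding more messages simply widens the feature vector. A real mailbox would need a proper tokenizer, a bias term, and a held-out test set to measure error rates.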