June 16-17, 2016 | Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, Colin Cherry
SemEval-2016 Task 6 addressed stance detection in tweets: given a tweet and a target, systems had to determine whether the tweeter was in favor of the target, against it, or neither. The task had two parts. Task A was a standard supervised classification task with labeled training data for each target; Task B was weakly supervised, with no labeled training data for its target.

The dataset comprised 4,870 English tweets annotated for stance towards six targets commonly discussed in the United States. For Task A, roughly 70% of the data served as training and 30% as test; for Task B, all instances for the new target were used for testing. Performance was measured as the average of the F1-scores for the 'favor' and 'against' classes (a sketch of this metric follows below); the highest scores were 67.82 for Task A and 56.28 for Task B.

Participating systems relied on features such as n-grams, word vectors, and sentiment lexicons, with some teams applying deep learning models. Task B proved considerably harder because of the absence of training data for its target. Notably, the best-performing systems used standard text classification features (see the baseline sketch below), and results were consistently worse when the target was not the focus of the tweet.

The dataset, along with evaluation scripts and visualizations, is available for future research. The task highlighted the complexity of stance detection, especially when the target is not explicitly mentioned in the tweet, and showed that stance detection remains a challenging problem with considerable room for improvement. The shared task provides a common platform for evaluating and comparing approaches to stance detection in tweets.
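To make the reported numbers concrete, here is a minimal sketch of the task's official metric: the average of the F1-scores for the 'favor' and 'against' classes. 'None' is never scored directly, but confusions involving it still lower precision and recall on the other two classes. This is an illustrative reimplementation, not the organizers' released evaluation script, and the label strings are assumptions.

```python
def stance_f1(gold, pred):
    """Average of the F1-scores for FAVOR and AGAINST, the official
    SemEval-2016 Task 6 metric. NONE is not scored on its own, but
    NONE/FAVOR and NONE/AGAINST confusions still hurt both scores."""
    def f1(label):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return (f1("FAVOR") + f1("AGAINST")) / 2

# Toy example: perfect on FAVOR (F1 = 1.0), one miss on AGAINST (F1 = 0.667).
gold = ["FAVOR", "AGAINST", "NONE", "AGAINST"]
pred = ["FAVOR", "NONE",    "NONE", "AGAINST"]
print(stance_f1(gold, pred))  # 0.8333...
```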
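As for the "standard text classification features" the summary credits to the best systems, a hedged sketch of one typical recipe follows: a linear SVM over TF-IDF-weighted word and character n-grams, trained separately per target. The example tweets, labels, and n-gram ranges are illustrative assumptions, not the configuration of any particular submitted system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Word 1-3-grams plus character 2-5-grams, a common recipe for tweet
# classification; the exact ranges here are assumptions for illustration.
model = Pipeline([
    ("features", FeatureUnion([
        ("words", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
        ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ])),
    ("svm", LinearSVC()),
])

# Hypothetical training tweets for one target (not from the task data).
train_tweets = [
    "We owe it to our kids to act on climate now #SemST",
    "Another cold winter, so much for global warming #SemST",
]
train_labels = ["FAVOR", "AGAINST"]

model.fit(train_tweets, train_labels)
print(model.predict(["Cutting emissions is long overdue #SemST"]))
```

Per the summary above, simple feature-based classifiers of this kind remained competitive with deep learning approaches on this task, which is part of what made the results instructive.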